Getting lots of BIMI images using Python


I've written before about the moribund BIMI specification. It's a way for brands to include a trusted logo when they send emails. It isn't much used and, apparently, is riddled with security issues.

I thought it might be fun to grab all the BIMI images from the most popular websites, so I can potentially use them in my SuperTinyIcons project.

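BIMI images are SVGs. The URL of a domain's BIMI logo is stored in a DNS TXT record under a _bimi. subdomain. Records are looked up by selector, and the selector is "default" unless a message asks for a different one, so in practice the record lives at default._bimi.<domain>.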
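As a minimal sketch, the lookup name can be built from a domain and a selector like this. bimi_record_name is a hypothetical helper, not part of any library, and it assumes you'll usually want the "default" selector:

```python
def bimi_record_name(domain: str, selector: str = "default") -> str:
    """Return the DNS name that holds a domain's BIMI TXT record.

    BIMI records live at <selector>._bimi.<domain>; "default" is the
    selector used when a message doesn't ask for a different one.
    """
    # Normalise the input: drop surrounding whitespace and any trailing dot,
    # and lower-case it (DNS names are case-insensitive).
    domain = domain.strip().rstrip(".").lower()
    return f"{selector}._bimi.{domain}"


print(bimi_record_name("linkedin.com"))  # default._bimi.linkedin.com
```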

If you run dig TXT default._bimi.linkedin.com you'll receive back:

```
;; ANSWER SECTION:
default._bimi.linkedin.com. 3600 IN TXT "v=BIMI1; l=https://media.licdn.com/media/AAYQAQQhAAgAAQAAAAAAABrLiVuNIZ3fRKGlFSn4hGZubg.svg; a=https://media.licdn.com/media/AAYAAQQhAAgAAQAAAAAAALe_JUaW1k4JTw6eZ_Gtj2raUw.pem;"
```

Awesome! We can grab the .svg URL and download the file.

Getting a list of BIMI-enabled domains is difficult. Thankfully, Freddie Leeman has done some excellent analysis work and was happy to share a list of over 7,000 domains which have BIMI.

Let's get cracking with a little Python. First up, install dnspython (pip install dnspython) if you don't already have it.

This gets the TXT record from the domain name:

```python
import dns.resolver

# resolve() replaces the deprecated query() in dnspython 2.x
response = dns.resolver.resolve('default._bimi.linkedin.com', 'TXT')
result = response.rrset.to_text()
print(result)
```
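One caveat with to_text() is that it returns the record in zone-file form, quotes and all. DNS limits each TXT character-string to 255 bytes, so a long record is published as several strings, which to_text() prints as "..." "...". Receivers generally concatenate those strings with no separator (as with SPF and DKIM records). A minimal sketch of doing that yourself, assuming you pass in the strings attribute of a dnspython TXT answer (a sequence of bytes); txt_to_string is a hypothetical helper:

```python
from typing import Iterable


def txt_to_string(chunks: Iterable[bytes]) -> str:
    """Join the character-strings of one TXT record into a single string.

    Each chunk is at most 255 bytes; long records are split across several
    chunks that are meant to be glued back together with no separator.
    """
    return b"".join(chunks).decode("utf-8", errors="replace")


# Hypothetical example of a record split into two chunks
chunks = (b"v=BIMI1; l=https://example.com/lo", b"go.svg;")
print(txt_to_string(chunks))  # v=BIMI1; l=https://example.com/logo.svg;
```

With dnspython you would call it once per answer, for example txt_to_string(rdata.strings) for each rdata in dns.resolver.resolve(name, 'TXT').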

There are various ways of extracting the URL. I decided to be lazy and use a regex. Sue me.

```python
import re

pattern = r'l=(https[^;"]*[;"])'
match = re.search(pattern, result)
if match:
    # Remove the trailing semicolon or quote mark
    url = match.group(1).rstrip(';"')
    print(f'Matched URL: {url}')
else:
    print(f'No match: {result}')
```
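A less lazy alternative is to treat the record as what it is: a list of tag=value pairs separated by semicolons, where l= is the logo location and a= points to the authority evidence (the .pem certificate in the LinkedIn example). A minimal sketch, assuming you pass in just the record text (the quoted part, not the whole to_text() line with the name and TTL); parse_bimi_record is a hypothetical helper:

```python
def parse_bimi_record(record: str) -> dict[str, str]:
    """Split a BIMI record such as 'v=BIMI1; l=https://...; a=...;' into tags.

    Whitespace around tag names and values is ignored, so a stray space
    such as 'l= https://...' still parses.
    """
    tags = {}
    # Drop surrounding whitespace and the zone-file quotes, if present
    for part in record.strip().strip('"').split(";"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # skip empty fragments, e.g. after the final semicolon
        tags[name.strip().lower()] = value.strip()
    return tags


record = "v=BIMI1; l= https://example.com/logo.svg; a=;"
tags = parse_bimi_record(record)
print(tags.get("l"))  # https://example.com/logo.svg
```

Unlike the regex, this doesn't insist on https, so you can spot http logo URLs and decide what to do with them rather than silently skipping them.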

Putting it all together, this reads in the list of domains, finds the BIMI TXT record, grabs the URL, and saves the SVG.

```python
import re

import dns.exception
import dns.resolver
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'
}

pattern = r'l=(https[^;"]*[;"])'

# Read the domain list, skipping blank lines and stray whitespace
with open('domains.txt', 'r') as domain_file:
    domains = sorted(line.strip() for line in domain_file if line.strip())

for domain in domains:
    bimi_domain = "default._bimi." + domain
    try:
        response = dns.resolver.resolve(bimi_domain, 'TXT')
    except dns.exception.DNSException:
        print(f'DNS error with {bimi_domain}')
        continue

    result = response.rrset.to_text()
    match = re.search(pattern, result)
    if not match:
        print(f'No match from {domain}: {result}')
        continue

    # Remove the trailing semicolon or quote mark
    svg_url = match.group(1).rstrip(';"')
    print(f'Downloading: {svg_url}')
    try:
        svg = requests.get(svg_url, allow_redirects=True, timeout=30, headers=headers)
        svg.raise_for_status()  # don't save 404 pages and other errors as SVGs
        with open(domain + '.svg', 'wb') as svg_file:
            svg_file.write(svg.content)
    except (requests.RequestException, OSError) as error:
        print(f'Error with {domain}: {error}')
```

Obviously, it could be made a lot more efficient, for example by downloading the files in parallel.
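A minimal sketch of the parallel version, using a thread pool from concurrent.futures. download_all and fetch_svg are hypothetical helpers; fetch_svg uses urllib from the standard library rather than requests so the sketch has no third-party dependencies, and its User-Agent string is a placeholder:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional


def fetch_svg(url: str, timeout: float = 30) -> bytes:
    """Download one logo; urlopen follows redirects and raises on 404s."""
    request = urllib.request.Request(url, headers={"User-Agent": "bimi-fetcher"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def download_all(
    urls: Dict[str, str],
    fetch: Callable[[str], bytes] = fetch_svg,
    max_workers: int = 8,
) -> Dict[str, Optional[bytes]]:
    """Fetch {domain: url} concurrently; failed downloads map to None."""
    results: Dict[str, Optional[bytes]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download up front, remembering which domain each is for
        futures = {pool.submit(fetch, url): domain for domain, url in urls.items()}
        for future in as_completed(futures):
            domain = futures[future]
            try:
                results[domain] = future.result()
            except Exception as error:
                print(f"Error with {domain}: {error}")
                results[domain] = None
    return results
```

Threads are a reasonable fit here because the work is almost entirely waiting on the network, so Python's GIL isn't the bottleneck. The DNS lookups could be spread across the same kind of pool.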

I found a few bugs in various BIMI implementations, including:

  • ted.com and homeadvisor.com use an http URL
  • consumerreports.org and sleepfoundation.org have a misplaced space in their TXT records
  • audubon.org had an invalid certificate
  • mac.com was blank, as were discogs.com, livechatinc.com, icloud.com, me.com, lung.org, miro.com, protonmail.ch and many others.
  • alabama.gov timed out, as did nebraska.gov, uclahealth.org and several others.
  • politico.com's BIMI logo returned a 404, as did lots of others.
  • coopersurgical.com's SVG is 8MB!
  • Loads of SVGs bust the 32KB maximum size requirement, some by multiple megabytes.

I might spend some time over the next few weeks optimising the code and looking for any other snafus. I didn't find any with ECMAScript in them. Yet!
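Several of these problems are easy to flag automatically once a file has been downloaded. A minimal sketch, where logo_problems is a hypothetical helper and the checks are deliberately crude (substring searches are not a real SVG parser, and the 32KB figure is the limit from the list above):

```python
from urllib.parse import urlparse

MAX_SVG_BYTES = 32 * 1024  # the 32KB maximum size


def logo_problems(url: str, content: bytes) -> list[str]:
    """Return a list of issues found in one downloaded BIMI logo."""
    problems = []
    if urlparse(url).scheme != "https":
        problems.append("logo URL is not HTTPS")
    if not content.strip():
        problems.append("file is empty")
    elif b"<svg" not in content.lower():  # crude check, not a real parser
        problems.append("does not look like an SVG")
    if len(content) > MAX_SVG_BYTES:
        problems.append(f"{len(content):,} bytes exceeds the 32KB limit")
    if b"<script" in content.lower():
        problems.append("contains a <script> element")
    return problems


print(logo_problems("http://example.com/logo.svg", b""))
```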
