I've written before about the moribund BIMI specification. It's a way for brands to include a trusted logo when they send emails. It isn't much used and, apparently, is riddled with security issues.
I thought it might be fun to grab all the BIMI images from the most popular websites, so I can potentially use them in my SuperTinyIcons project.
BIMI images are SVGs. Links to a site's BIMI logo are stored in the domain's DNS records. All BIMI records must be on a `default._bimi.` subdomain.
If you run `dig TXT default._bimi.linkedin.com` you'll receive back:
```
;; ANSWER SECTION:
default._bimi.linkedin.com. 3600 IN TXT "v=BIMI1; l=https://media.licdn.com/media/AAYQAQQhAAgAAQAAAAAAABrLiVuNIZ3fRKGlFSn4hGZubg.svg; a=https://media.licdn.com/media/AAYAAQQhAAgAAQAAAAAAALe_JUaW1k4JTw6eZ_Gtj2raUw.pem;"
```
Awesome! We can grab the `.svg` URL and download the file.
Getting a list of BIMI enabled domains is difficult. Thankfully, Freddie Leeman has done some excellent analysis work and was happy to share a list of over 7,000 domains which have BIMI.
Let's get cracking with a little Python. First up, install DNSPython (`pip install dnspython`) if you don't already have it.
This gets the TXT record from the domain name:
```python
import dns.resolver

# Note: dnspython 2.x deprecates query() in favour of resolve()
response = dns.resolver.query('default._bimi.linkedin.com', 'TXT')
result = response.rrset.to_text()
print(result)
```
There are various ways of extracting the URL. I decided to be lazy and use a regex. Sue me.
```python
import re

pattern = r'l=(https[^;"]*[;"])'
match = re.search(pattern, result)
if match:
    # Remove the trailing semicolon or quote mark
    url = match.group(1).rstrip(';"')
    print(f'Matched URL: {url}')
else:
    print(f'No match: {result}')
```
Putting it all together, this reads in the list of domains, finds the BIMI TXT record, grabs the URL, and saves the SVG.
```python
import dns.resolver
import re
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'
}
pattern = r'l=(https[^;"]*[;"])'

with open('domains.txt', 'r') as domain_file:
    domains = domain_file.readlines()
domains.sort()

for domain in domains:
    bimi_domain = 'default._bimi.' + domain.strip()
    try:
        response = dns.resolver.query(bimi_domain, 'TXT')
        result = response.rrset.to_text()
        match = re.search(pattern, result)
        if match:
            # Remove the trailing semicolon or quote mark
            svg_url = match.group(1).rstrip(';"')
            print(f'Downloading: {svg_url}')
            try:
                svg = requests.get(svg_url, allow_redirects=True, timeout=30, headers=headers)
                with open(domain.strip() + '.svg', 'wb') as f:
                    f.write(svg.content)
            except requests.RequestException:
                print(f'Error with {domain}: {result}')
        else:
            print(f'No match from {domain}: {result}')
    except Exception:
        print(f'DNS error with {bimi_domain}')
```
Obviously, it could be made a lot more efficient by downloading the files in parallel.
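For instance, here's a rough sketch of the download step using a thread pool. The `fetch_svg` helper, the `urls_by_domain` dict, and the eight-worker pool size are my own illustrative choices, not part of the script above:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

# Same headers dict as the main script.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'
}

def fetch_svg(svg_url, filename):
    try:
        svg = requests.get(svg_url, allow_redirects=True, timeout=30, headers=headers)
        with open(filename, 'wb') as f:
            f.write(svg.content)
    except requests.RequestException as e:
        print(f'Error downloading {svg_url}: {e}')

# Hypothetical: built by the DNS loop, mapping domain -> SVG URL
urls_by_domain = {}

with ThreadPoolExecutor(max_workers=8) as pool:
    for domain, svg_url in urls_by_domain.items():
        pool.submit(fetch_svg, svg_url, domain + '.svg')
```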
I found a few bugs in various BIMI implementations, including:
- ted.com and homeadvisor.com use an `http` URL
- consumerreports.org and sleepfoundation.org have a misplaced space in their TXT records
- audubon.org had an invalid certificate
- mac.com was blank - as were discogs.com, livechatinc.com, icloud.com, me.com, lung.org, miro.com, protonmail.ch and many others.
- alabama.gov had a timeout - as did nebraska.gov, uclahealth.org and several others.
- politico.com had a 404 for their BIMI - as did lots of others.
- coopersurgical.com's SVG is 8MB!
- There are loads of SVGs which bust the 32KB maximum size requirement - some by multiple megabytes. A quick size check is sketched below.
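Spotting the oversized ones is easy enough once the files are downloaded. The `32 * 1024` threshold is my reading of the spec's 32KB limit:

```python
# Flag any downloaded SVG that busts the 32KB limit.
import pathlib

LIMIT = 32 * 1024  # bytes

for svg_file in sorted(pathlib.Path('.').glob('*.svg')):
    size = svg_file.stat().st_size
    if size > LIMIT:
        print(f'{svg_file.name} is {size / 1024:.0f}KB - over the 32KB limit')
```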
I might spend some time over the next few weeks optimising the code and looking for any other snafus. I didn't find any SVGs with ECMAScript in them. Yet!
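If you want to hunt for scripted SVGs yourself, a crude scan might look like this - it only catches literal `<script` tags, not `onload=` handlers or externally referenced scripts:

```python
# Crude check for embedded ECMAScript. Only matches literal <script tags;
# event-handler attributes and external references will slip through.
import pathlib

for svg_file in pathlib.Path('.').glob('*.svg'):
    if '<script' in svg_file.read_text(errors='ignore').lower():
        print(f'{svg_file.name} contains a <script> element')
```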