<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="https://shkspr.mobi/blog/wp-content/themes/edent-wordpress-theme/rss-style.xsl" type="text/xsl"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	     xmlns:dc="http://purl.org/dc/elements/1.1/"
	   xmlns:atom="http://www.w3.org/2005/Atom"
	     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	  xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>
<channel>
	<title>BIMI &#8211; Terence Eden’s Blog</title>
	<atom:link href="https://shkspr.mobi/blog/tag/bimi/feed/" rel="self" type="application/rss+xml" />
	<link>https://shkspr.mobi/blog</link>
	<description>Regular nonsense about tech and its effects 🙃</description>
	<lastBuildDate>Sat, 07 Jun 2025 16:31:15 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://shkspr.mobi/blog/wp-content/uploads/2023/07/cropped-avatar-32x32.jpeg</url>
	<title>BIMI &#8211; Terence Eden’s Blog</title>
	<link>https://shkspr.mobi/blog</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title><![CDATA[Getting lots of BIMI images using Python]]></title>
		<link>https://shkspr.mobi/blog/2024/06/getting-lots-of-bimi-images-using-python/</link>
					<comments>https://shkspr.mobi/blog/2024/06/getting-lots-of-bimi-images-using-python/#respond</comments>
				<dc:creator><![CDATA[@edent]]></dc:creator>
		<pubDate>Fri, 07 Jun 2024 11:34:09 +0000</pubDate>
				<category><![CDATA[/etc/]]></category>
		<category><![CDATA[BIMI]]></category>
		<category><![CDATA[dns]]></category>
		<category><![CDATA[svg]]></category>
		<guid isPermaLink="false">https://shkspr.mobi/blog/?p=50701</guid>

					<description><![CDATA[I&#039;ve written before about the moribund BIMI specification. It&#039;s a way for brands to include a trusted logo when they send emails.  It isn&#039;t much used and, apparently, is riddled with security issues.  I thought it might be fun to grab all the BIMI images from the most popular websites, so I can potentially use them in my SuperTinyIcons project.  BIMI images are SVGs. Links to a site&#039;s BIMI are…]]></description>
										<content:encoded><![CDATA[<p>I've written before about the <a href="https://shkspr.mobi/blog/2022/08/dns-esoterica-bimi-svg-in-dns-txt-wtf/">moribund BIMI specification</a>. It's a way for brands to include a trusted logo when they send emails.  It isn't much used and, apparently, is <a href="https://mailarchive.ietf.org/arch/msg/bimi/xzYRH72V2HE9xeUfXK_zUgYSI7k/">riddled with</a> <a href="https://mailarchive.ietf.org/arch/msg/bimi/PS8Xf1hQ41oCAwtsUvVsbRSs34Q/">security issues</a>.</p>

<p>I thought it might be fun to grab all the BIMI images from the most popular websites, so I can potentially use them in my <a href="https://shkspr.mobi/blog/2020/05/some-updates-to-supertinyicons/">SuperTinyIcons project</a>.</p>

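<p>BIMI images are SVGs. Links to a site's BIMI logo are stored in its domain's DNS records. The record lives at <code>selector._bimi.domain</code>, and when an email doesn't name a selector, the selector used is <code>default</code>. So, in practice, the record to look up is on the <code>default._bimi.</code> subdomain.</p>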

<p>If you run <code>dig TXT default._bimi.linkedin.com</code> you'll receive back:</p>

<pre><code class="language-dns">;; ANSWER SECTION:
default._bimi.linkedin.com. 3600 IN TXT "v=BIMI1; l=https://media.licdn.com/media/AAYQAQQhAAgAAQAAAAAAABrLiVuNIZ3fRKGlFSn4hGZubg.svg; a=https://media.licdn.com/media/AAYAAQQhAAgAAQAAAAAAALe_JUaW1k4JTw6eZ_Gtj2raUw.pem;"
</code></pre>

<p>Awesome! We can grab the <a href="https://media.licdn.com/media/AAYQAQQhAAgAAQAAAAAAABrLiVuNIZ3fRKGlFSn4hGZubg.svg">.svg URL</a> and download the file.</p>
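<p>The record itself is a list of <code>tag=value</code> pairs separated by semicolons: <code>v</code> is the version, <code>l</code> is the location of the logo, and <code>a</code> points to the authority evidence (the <code>.pem</code> certificate). As an alternative to a regex, a small parser could split it into a dictionary. This is only a sketch: <code>parse_bimi_record</code> is a hypothetical helper, it assumes the TXT record has already been joined into a single string, and it doesn't validate anything.</p>

<pre><code class="language-python">def parse_bimi_record(record: str) -> dict[str, str]:
    """Split a BIMI TXT record into its tag=value pairs.

    Hypothetical helper: assumes `record` is one already-joined string and
    does no validation beyond splitting on ';' and on the first '='.
    """
    tags = {}
    for part in record.split(';'):
        part = part.strip()
        if not part:
            continue  # skip the empty piece left by a trailing semicolon
        # partition() splits on the first '=' only, so '=' inside a URL survives
        name, sep, value = part.partition('=')
        if sep:
            tags[name.strip()] = value.strip()
    return tags


record = ('v=BIMI1; l=https://media.licdn.com/media/AAYQAQQhAAgAAQAAAAAAABrLiVuNIZ3fRKGlFSn4hGZubg.svg; '
          'a=https://media.licdn.com/media/AAYAAQQhAAgAAQAAAAAAALe_JUaW1k4JTw6eZ_Gtj2raUw.pem;')
tags = parse_bimi_record(record)
print(tags['l'])
</code></pre>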

<p>Getting a list of BIMI-enabled domains is difficult. Thankfully, <a href="https://www.uriports.com/blog/bimi/">Freddie Leeman has done some excellent analysis work</a> and was happy to share <a href="https://pastebin.com/si9e8dCc">a list of over 7,000 domains which have BIMI</a>.</p>

<p>Let's get cracking with a little Python. First up, <a href="https://www.dnspython.org/">install dnspython</a> if you don't already have it (the download script further down also uses <code>requests</code>).</p>

<p>This gets the TXT record from the domain name:</p>

<pre><code class="language-python">import dns.resolver

# dnspython 2.0 deprecated query() in favour of resolve()
response = dns.resolver.resolve('default._bimi.linkedin.com', 'TXT')
result = response.rrset.to_text()
print(result)
</code></pre>

<p>There are various ways of extracting the URL. I decided to be lazy and use a regex. Sue me.</p>

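<pre><code class="language-python">import re

# `result` is the TXT record text from the previous snippet.
# Capture everything after "l=https" up to the next semicolon or quote mark.
pattern = r'l=(https[^;"]*[;"])'
match = re.search(pattern, result)
if match:
    # Remove the trailing semicolon or quote mark
    url = match.group(1).rstrip(';"')
    print(f'Matched URL: {url}')
else:
    print(f'No match: {result}')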
</code></pre>
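<p>One caveat with the regex: a single DNS character-string can hold at most 255 bytes, so a long TXT record is published as several chunks, and <code>to_text()</code> prints each chunk in its own quote marks. If a logo URL straddles a chunk boundary, the pattern stops at the quote mark halfway through the URL. A hedged fix is to glue the raw chunks back together first. dnspython keeps each TXT record's chunks as bytes in its <code>strings</code> attribute; the helper <code>txt_record_values</code> and the stand-in answer below are hypothetical, for illustration.</p>

<pre><code class="language-python">from types import SimpleNamespace


def txt_record_values(answer):
    """Return each TXT record in a DNS answer as a single string.

    Assumes each item in `answer` has a `strings` attribute holding the
    record's chunks as bytes, which is how dnspython represents TXT data.
    The chunks are concatenated with no separator, then decoded.
    """
    return [b''.join(rdata.strings).decode('utf-8', errors='replace')
            for rdata in answer]


# A stand-in for a real DNS answer, with one record split into two chunks
fake_answer = [SimpleNamespace(strings=(b'v=BIMI1; l=https://example.com/lo', b'go.svg;'))]
print(txt_record_values(fake_answer))
</code></pre>

<p>With dnspython installed, <code>txt_record_values(dns.resolver.resolve('default._bimi.linkedin.com', 'TXT'))</code> should return the LinkedIn record as one unbroken string, ready for the regex.</p>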

<p>Putting it all together, this reads in the list of domains from <code>domains.txt</code> (one per line), finds each BIMI TXT record, grabs the URL, and saves the SVG.</p>

<pre><code class="language-python">import re

import dns.exception
import dns.resolver
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'
}

# Capture the logo URL after "l=", up to the next semicolon or quote mark
pattern = r'l=(https[^;"]*[;"])'

# One domain per line; skip blank lines
with open('domains.txt', 'r') as domain_file:
    domains = sorted(line.strip() for line in domain_file if line.strip())

for domain in domains:
    bimi_domain = "default._bimi." + domain
    try:
        response = dns.resolver.resolve(bimi_domain, 'TXT')
    except dns.exception.DNSException as error:
        print(f'DNS error with {bimi_domain}: {error}')
        continue

    result = response.rrset.to_text()
    match = re.search(pattern, result)
    if not match:
        print(f'No match from {domain}: {result}')
        continue

    # Remove the trailing semicolon or quote mark
    svg_url = match.group(1).rstrip(';"')
    print(f'Downloading: {svg_url}')
    try:
        svg = requests.get(svg_url, allow_redirects=True, timeout=30, headers=headers)
        # Treat 404s and other HTTP errors as failures rather than saving the error page
        svg.raise_for_status()
        with open(domain + '.svg', 'wb') as svg_file:
            svg_file.write(svg.content)
    except (requests.RequestException, OSError) as error:
        print(f'Error with {domain}: {error}')
</code></pre>

<p>Obviously, it could be made a lot more efficient, for example by doing the lookups and downloads in parallel.</p>
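<p>A minimal sketch of the parallel version, assuming a thread pool is good enough. The work is almost all waiting on the network, so threads should help despite Python's GIL. <code>fetch_all</code> is a hypothetical helper and <code>fetch</code> is whatever function does a single download; exceptions are recorded per URL rather than letting one broken domain stop the run.</p>

<pre><code class="language-python">from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=8):
    """Run fetch(url) for every URL on a pool of threads.

    Returns a dict mapping each URL to either its result or the exception
    it raised, so a single failure doesn't abort the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every job up front; the pool runs up to max_workers at once
        futures = {pool.submit(fetch, url): url for url in urls}
        # Collect results in whatever order the downloads finish
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as error:
                results[url] = error
    return results
</code></pre>

<p>For example, <code>fetch</code> could call <code>requests.get(url, timeout=30, headers=headers)</code>, then <code>raise_for_status()</code>, and return the <code>.content</code>. Keeping <code>max_workers</code> modest avoids hammering anyone's servers.</p>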

<p>I found a few bugs in various BIMI implementations, including:</p>

<ul>
<li>ted.com and homeadvisor.com use a plain <code>http</code> URL</li>
<li>consumerreports.org and sleepfoundation.org have a misplaced space in their TXT records</li>
<li>audubon.org had an invalid certificate</li>
<li>mac.com was blank, as were discogs.com, livechatinc.com, icloud.com, me.com, lung.org, miro.com, protonmail.ch and many others.</li>
<li>alabama.gov had a timeout, as did nebraska.gov, uclahealth.org and several others.</li>
<li>politico.com had a 404 for their BIMI, as did <em>lots</em> of others.</li>
<li>coopersurgical.com's SVG is 8MB!</li>
<li>There are <em>loads</em> of SVGs which bust the <a href="https://bimigroup.org/creating-bimi-svg-logo-files/">32KB maximum size requirement</a>, some by multiple megabytes.</li>
</ul>
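<p>Most of those snafus can be spotted mechanically. Here's a rough checker, assuming you already have the logo URL and the downloaded bytes. It is not a full BIMI validator: <code>bimi_svg_problems</code> is a hypothetical helper, it only looks for the problems listed above plus <code>script</code> elements, and it assumes "32KB" means 32 * 1024 bytes. Python's built-in XML parser isn't hardened against malicious input, so for files from strangers <code>defusedxml</code> is the safer choice.</p>

<pre><code class="language-python">import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumes the 32KB limit means 32 * 1024 bytes
MAX_BYTES = 32 * 1024


def local_name(tag):
    """Strip any namespace: '{http://www.w3.org/2000/svg}svg' becomes 'svg'."""
    return tag.rsplit('}', 1)[-1]


def bimi_svg_problems(url, data):
    """Return a list of problems found with a downloaded BIMI logo.

    `url` is the l= value from the TXT record; `data` is the response body
    as bytes. An empty list only means these particular checks passed.
    """
    problems = []
    if urlparse(url).scheme != 'https':
        problems.append('logo URL is not https')
    if not data.strip():
        problems.append('file is empty')
        return problems
    if len(data) > MAX_BYTES:
        problems.append(f'file is {len(data)} bytes, over the 32KB limit')
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # e.g. an HTML error page saved in place of the logo
        problems.append('file is not well-formed XML')
        return problems
    if local_name(root.tag) != 'svg':
        problems.append('root element is not svg')
    # Scripts have no business in a BIMI logo; flag any script element
    if any(local_name(element.tag) == 'script' for element in root.iter()):
        problems.append('contains a script element')
    return problems
</code></pre>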

<p>I might spend some time over the next few weeks optimising the code and looking for any other snafus. I didn't find any with ECMAScript in them. Yet!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://shkspr.mobi/blog/2024/06/getting-lots-of-bimi-images-using-python/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
