A Complete List of Every UK Government Domain Name


Eight years after I published this blog post, I helped officially release all these domain names as open data! Funny how life works out, eh?

Would you like to know every domain name the UK Government had registered? Of course you would! There could be all sorts of interesting tit-bits hidden in there (ProtectAndSurvive.gov.uk? EbolaOutbreak2017.nhs.uk? MinistryOfTruth.police.uk?)

Rather than relying on Freedom of Information requests, or Open Data, we can go straight to the source of domain names - the DNS!

Shut Up And Give Me The Codez!

Download all UK Government host names
.gov.uk 15,436 records
.nhs.uk 4,877 records
.police.uk 466 records
.mod.uk 268 records
.parliament.uk 91 records

That's... quite a lot! The majority are host names - only around 2,247 of the GOV.UK ones are domain names. Many of them are not currently live.

Still, I wonder how many are new?

The Gov.UK file is a CSV which also shows when the domain was first registered (if available).

Geeky Details

The Domain Name System (DNS) lists every single domain name (example.com). It tells your computer which IP Address is associated with a Domain Name. If your local DNS doesn't know where example.gov.uk lives, it goes to the ISP's DNS. If they don't know, they ask an upstream provider's DNS. And so on, until someone asks the .gov.uk nameserver for an authoritative response.

So, can you download every domain name in existence? No, not easily. It usually involves filling out lots of forms and giving some compelling reason why you want it.

However, Rapid7's Project Sonar provides a sort of "best guess" for all the domain names which it can see.

The entire file is a 12GB download. That's the zipped version.

Once unzipped, it's a whopping 67GB.

A quick look at the file shows it contains 1,408,097,159 records. Youch! That's a lot of domain names!
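If you don't fancy unpacking 67GB just to count lines, the compressed file can be streamed instead. This is only a rough sketch: `count_records` is a name made up for illustration, and it assumes the download is an ordinary gzip file with one record per line.

```python
import gzip


def count_records(path):
    """Count the lines in a gzip-compressed file without unpacking it to disk."""
    count = 0
    # Iterating over the handle reads one line at a time,
    # so memory use stays small however big the file is.
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            count += 1
    return count
```

Calling something like `count_records("20150926_dnsrecords_all.gz")` (the `.gz` file name is an assumption) will still take a while, but it never needs the unzipped copy.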

This is what the file looks like:

$ head 20150926_dnsrecords_all
cshengmei.com.h310.6dns.net,a,103.225.196.101
reseauocoz.cluster007.ovh.net,cname,cluster007.ovh.net
cse-web-cl.comunique-se.com.br,a,200.166.77.69
ext-cust.squarespace.com,a,198.185.159.176
ext-cust.squarespace.com,a,198.185.159.177
ext-cust.squarespace.com,a,198.49.23.176
ext-cust.squarespace.com,a,198.49.23.177
ghs.googlehosted.com,cname,googlehosted.l.googleusercontent.com
isutility.web9.hubspot.com,cname,a1049.b.akamai.net
sendv54sxu8f12g.ihance.net,a,54.241.8.193
sites.smarsh.io,a,199.47.168.63
www.triblocal.com.s3-website-us-east-1.amazonaws.com,cname,s3-website-us-east-1.amazonaws.com
*.01ete21.cn.cname.yunjiasu-cdn.net,a,162.159.210.34
*.01ete21.cn.cname.yunjiasu-cdn.net,a,162.159.211.34

As a brief primer, a CNAME points to another domain name. An A Record points to an IP address. There are lots of different domain records.
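To make the three-column layout concrete, here is a small sketch that splits a line into name, record type and value. `DnsRecord` and `parse_record` are illustrative names, not part of any library, and the code assumes the value field may itself contain commas, which is why it only splits on the first two.

```python
from typing import NamedTuple


class DnsRecord(NamedTuple):
    name: str
    rtype: str
    value: str


def parse_record(line):
    """Split 'name,type,value' on the first two commas only."""
    name, rtype, value = line.rstrip("\n").split(",", 2)
    return DnsRecord(name, rtype, value)


# Two lines taken from the sample output above.
sample = [
    "ext-cust.squarespace.com,a,198.185.159.176",
    "ghs.googlehosted.com,cname,googlehosted.l.googleusercontent.com",
]

for line in sample:
    record = parse_record(line)
    if record.rtype == "cname":
        print(f"{record.name} is an alias for {record.value}")
    elif record.rtype == "a":
        print(f"{record.name} has the IP address {record.value}")
```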

Ok, so let's get all the *.gov.uk records out of there...

grep "gov\.uk" 20150926_dnsrecords_all
0-19insalford.info,soa,ns0.ictservices.co.uk postmaster.salford.gov.uk 2010022204 28800 7200 604800 86400
019186.gov.ukpfl.cn,a,122.9.230.117
100days.local.gov.uk,a,198.154.241.231
101.gov.uk,a,216.146.46.10
101.gov.uk,a,216.146.46.11
101.gov.uk,mx,20 sms2.101.gov.uk
101.gov.uk,ns,ns1.p08.dynect.net

Ah! Ok, we're picking up some records which merely mention a gov.uk name in their data (potentially useful) and some false positives like "019186.gov.ukpfl.cn". Let's just look at records where the first column ends with ".gov.uk":

grep "\.gov\.uk," 20150926_dnsrecords_all
100days.local.gov.uk,a,198.154.241.231
101.gov.uk,a,216.146.46.10
101.gov.uk,a,216.146.46.11
101.gov.uk,mx,20 sms2.101.gov.uk
101.gov.uk,ns,ns1.p08.dynect.net
101.gov.uk,ns,ns2.p08.dynect.net
101.gov.uk,ns,ns3.p08.dynect.net
101.gov.uk,soa,ns1.p08.dynect.net hostmaster.cscdns.net 2014121100 3600 600 604800 1800
1901redirect.nationalarchives.gov.uk,a,193.132.104.151
1sttouch.powys.gov.uk,a,212.219.229.79
1t6c3c0p2r0m934.forestry.gov.uk,a,212.38.180.45
2011.census.gov.uk,a,94.126.106.132
2014.colneyheathparishcouncil.gov.uk,a,81.27.85.11
2050-calculator-tool-wiki.decc.gov.uk,cname,wiki.2050.org.uk
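One caveat: that grep pattern matches ".gov.uk," anywhere on the line, so a record whose value happens to contain it would slip through. A stricter check looks only at the first column. This is a sketch under that assumption; `is_gov_uk` is a made-up helper, and the last sample line is hypothetical, invented to show the edge case.

```python
def is_gov_uk(line, suffix=".gov.uk"):
    """True when the first column (the record name) ends with the suffix."""
    name = line.split(",", 1)[0].lower().rstrip(".")
    return name.endswith(suffix)


lines = [
    # Mentions gov.uk only in the SOA data, not in the name.
    "0-19insalford.info,soa,ns0.ictservices.co.uk postmaster.salford.gov.uk 2010022204 28800 7200 604800 86400",
    # Looks like gov.uk but is really a .cn name.
    "019186.gov.ukpfl.cn,a,122.9.230.117",
    "100days.local.gov.uk,a,198.154.241.231",
    "101.gov.uk,mx,20 sms2.101.gov.uk",
    # Hypothetical record: ".gov.uk," appears in the value, not the name.
    "example.com,txt,see www.example.gov.uk, thanks",
]

matches = [line for line in lines if is_gov_uk(line)]
for line in matches:
    print(line)
```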

OK, so how do we de-duplicate these? The first thing to do is manipulate the data. We only want the first column. There are a number of ways to do this in Linux; I prefer to use the Python tool CSVfilter.

To install: sudo pip install csvfilter

To grab only the first (zeroth) column:
cat 20150926_dnsrecords_all | csvfilter -f 0 > out.csv

Now, this doesn't quite work. Why? Because some DNS records contain incredibly strange data! You could clean up the data manually, but that's a bit boring, and the file is far too big to load into Excel or any other normal editor.

Here's what I did...

  1. Copy all the lines containing gov.uk into a new file
    grep "\.gov\.uk," 20150926_dnsrecords_all > govuk.csv
  2. Create a new file with only the first column
    cat govuk.csv | csvfilter -f 0 > govuk0.csv
  3. Sort the file and make sure each line is unique
    sort govuk0.csv | uniq > govuk.txt

Hey presto! A more-or-less complete list of every .gov.uk host name which Sonar can see. The same can be done for .NHS.uk, .police.uk, .MOD.uk etc.
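For anyone who would rather not chain grep, csvfilter and sort, the same three steps can be sketched in plain Python. This is not what I ran, just an equivalent under a few assumptions: `unique_hostnames` and `extract_to_file` are illustrative names, and odd bytes in the file are simply replaced rather than cleaned up. The suffix can be swapped (or given as a tuple) to cover the other zones.

```python
def unique_hostnames(lines, suffix=".gov.uk"):
    """Return the sorted, de-duplicated first column of matching records.

    `suffix` may be a single string or a tuple of strings.
    """
    names = set()
    for line in lines:
        # Only the first column is the record name; ignore the rest.
        name = line.split(",", 1)[0].strip().lower()
        if name.endswith(suffix):
            names.add(name)
    return sorted(names)


def extract_to_file(source_path, dest_path, suffix=".gov.uk"):
    """Stream the big file once and write one host name per line."""
    # errors="replace" keeps going when a record holds bytes that are not valid UTF-8.
    with open(source_path, encoding="utf-8", errors="replace") as source:
        names = unique_hostnames(source, suffix)
    with open(dest_path, "w", encoding="utf-8") as dest:
        dest.writelines(name + "\n" for name in names)
    return len(names)
```

Something like `extract_to_file("20150926_dnsrecords_all", "nhsuk.txt", ".nhs.uk")` would then build the NHS list. Python's sort order may differ slightly from the `sort` command's, depending on your locale.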

Getting The Dates

Time to crack out the Ruby!

Using the WHOIS library, I wrote a simple script to parse the text records and query when the domain name was created.

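#!/usr/bin/env ruby
# Print "domain,creation date" for every name listed in govuk.txt.
# Note: on whois 4 and later, record parsing lives in the separate whois-parser gem.
require 'whois'

c = Whois::Client.new

File.foreach( "govuk.txt" ) do |line|
   domain = line.chomp
   next if domain.empty?
   begin
      r = c.lookup(domain)
      puts "#{domain},#{r.created_on}"
   rescue StandardError => e
      # Whois::Error inherits from StandardError, so one rescue covers both.
      # Report the failure on stderr and carry on with the next name.
      warn "#{domain}: #{e.message}"
   end
end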

This isn't perfect - there are only records for the third level of gov.uk - and no records at all for Parliament, MOD, Police, and NHS. It is also a bit slow to run through the thousands of records - but we can see a few interesting bits and bobs.
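Since WHOIS only has answers at the third level, one way to speed things up is to collapse every host name to its third-level domain before querying, so each domain is only looked up once. A rough sketch: `registered_domain` is a made-up helper, and it assumes every registration sits exactly one label below the suffix.

```python
def registered_domain(hostname, suffix="gov.uk"):
    """Collapse a host name to the label directly below the suffix.

    Returns None when the name is not underneath the suffix at all.
    """
    labels = hostname.lower().rstrip(".").split(".")
    suffix_labels = suffix.split(".")
    depth = len(suffix_labels)
    if len(labels) <= depth or labels[-depth:] != suffix_labels:
        return None
    return ".".join(labels[-(depth + 1):])


# Host names taken from the grep output earlier in the post.
hosts = [
    "100days.local.gov.uk",
    "101.gov.uk",
    "2011.census.gov.uk",
    "1sttouch.powys.gov.uk",
    "101.gov.uk",
]

# De-duplicate, dropping anything that was not under gov.uk.
domains = sorted({registered_domain(host) for host in hosts} - {None})
for domain in domains:
    print(domain)
```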

Created in 2015

I suspect some of these are merely renewals, rather than brand new domains.

seemis.gov.uk,2015-10-29 00:00:00 +0000
yjb.gov.uk,2015-10-28 00:00:00 +0000
crbonline.gov.uk,2015-10-23 00:00:00 +0100
coi.gov.uk,2015-10-14 00:00:00 +0100
gibraltar.gov.uk,2015-07-29 00:00:00 +0100
dorsetforyou.gov.uk,2015-03-19 00:00:00 +0000
ico.gov.uk,2015-03-19 00:00:00 +0000
bridgnorthtowncouncil.gov.uk,2015-01-29 00:00:00 +0000

Oldest

wdc.gov.uk,2003-06-03 00:00:00 +0100
west-dunbarton.gov.uk,2003-06-03 00:00:00 +0100
clacks.gov.uk,2003-06-02 00:00:00 +0100
bassetlaw.gov.uk,2003-04-29 00:00:00 +0100
dti.gov.uk,2003-03-13 00:00:00 +0000

Sadly, clacks.gov.uk has very little to do with Terry Pratchett!
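Picking out the newest and oldest entries by eye gets tedious, so here is a sketch of how the script's output could be sorted by date. It assumes each line looks like the ones above ("domain,YYYY-MM-DD HH:MM:SS +ZZZZ") and that lines with no date should be skipped; `parse_row` is an illustrative name.

```python
from datetime import datetime


def parse_row(row):
    """Turn 'domain,date' into (domain, datetime), or None if the date is missing."""
    domain, _, created = row.strip().partition(",")
    if not created:
        return None
    return domain, datetime.strptime(created, "%Y-%m-%d %H:%M:%S %z")


# A handful of rows copied from the lists above.
rows = [
    "seemis.gov.uk,2015-10-29 00:00:00 +0000",
    "clacks.gov.uk,2003-06-02 00:00:00 +0100",
    "dti.gov.uk,2003-03-13 00:00:00 +0000",
    "ico.gov.uk,2015-03-19 00:00:00 +0000",
]

parsed = [item for item in map(parse_row, rows) if item is not None]
parsed.sort(key=lambda item: item[1])  # oldest first

oldest = parsed[0]
created_2015 = [domain for domain, when in parsed if when.year == 2015]

print("Oldest:", oldest[0], oldest[1].date())
print("Created in 2015:", ", ".join(created_2015))
```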

That's all folks!

Spotted anything unusual? Found a better way to do things? Stick a comment in the box!



5 thoughts on “A Complete List of Every UK Government Domain Name”

  1. Chris Keene says:

    This is really interesting, thanks.
    To point out to those who perhaps haven't looked at the file, many of these are hostnames, not domain names. And many of these are for non-web services, ie z3950.hants.gov.uk is a z39.50 server (api) which is probably used to allow third parties to search the library catalogue. Pointing this out before anyone jumps in with "15 thousand gov websites!"

  2. Mel Dymond Harper says:

    The "oldest" part above is a bit of a red herring. There must have been a flag day at some point in 2003 when domains were moved from one system to another, since there certainly _were_ .gov.uk domains in existence before 2003 -- but the WHOIS for those domains specifies all of them as created within 2003.

  3. a random person says:

    I asked HSCIC for a copy of the nhs.uk zone file once. They only offer a copy of the zonefile for the org(s) you belong to, they won't give out the entire thing for 'security reasons'. They did however respond very quickly on a weekend!

