I was having an argument (sorry, a spirited discussion) with my friend John the other day.
He was sympathising with Tony Blair's regrets over the introduction of the Freedom of Information Act. Apparently, FOI requests put too great a strain on departments and government bodies. Not only is there a huge cost of compliance, but there's an institutional fear of "dangerous" FOIs - requests which take too much time, are costly to process, or could embarrass someone powerful.
I want institutions to fear FOI requests. I want them to tremble and quake each time WhatDoTheyKnow.com sends them an FOI email.
Each FOI request is a reminder that a department isn't doing its job properly. It shows that the people who pay for the information (the public) have not been able to access it easily. In today's hyper-connected world, that is a tragedy.
At the moment, an FOI Officer's job is to search for information which hasn't yet been made public and then publish it. I'm greatly oversimplifying here - the work of FOI staff is vitally important, but at the moment they are fulfilling a stop-gap function.
The public should have immediate, read-only, direct access to all official sources of information. The data should be well structured, reliable, and comprehensive. It doesn't need to be simplified - but consideration should be given to how non-experts will use and interpret the data.
That's the minimum requirement for Open Data: giving the public the same level of access as officials.
Let me give you an example...
I wanted to know the location of every fire hydrant in the UK. Just because*.
People have asked for this before and have been fobbed off with bizarre excuses, including:
- Fire Hydrants are a national security issue!
- Their location is commercially sensitive!
- We don't have a comprehensive list!
All utter nonsense. A determined group of individuals could crowd-source the data for a restricted geographical area without much fuss. It's merely a case of walking around and looking!
I sent off an FOI request to my local fire service to see what they'd say. I was pleasantly surprised when they replied well within the legal time limit and gave me the information I asked for.
Death To The PDF!
Look, PDF is a perfectly fine way to represent a paper document in a digital format - but it should never be used for the transfer of data. It's a complex file which deliberately obfuscates data and makes it hard to perform any meaningful actions on its contents.
So, of course, that's how my FOI data were delivered to me - in a sodding PDF.
But, oh! Gentle reader, it gets so much worse.
There are many tools which will extract text from a PDF and render it as pure data. These tools are born out of necessity and - while they make a valiant effort to be correct - often fall short. They perfectly demonstrate why PDF is not a robust method of data transfer.
Sadly, the data in my PDF couldn't be extracted, because they were encoded as images.
Page after page of JPG, each one trapping the data as a picture rather than machine-readable text. 15 bloody megabytes of it!
This is Open Data in name only. A sham. A cruel trick played by vengeful gods determined to frustrate mere mortals! Wickedness perpetrated out of demented malice, serving only to enrage and disempower!
Or was it? I sent a polite email asking for the data in .CSV format, and it was quickly provided.
Which was nice.
In truth, there was nothing malicious about the way the data was sent to me. A busy FOI Officer hit the big "Export" button and accepted the default file format. If I had wanted a specific format, I should have been explicit in my request. Of course, I have no way of knowing what those defaults are - and the officer has no way of knowing what format suits me best.
My friend Andy Mabbett sensibly adds this note to all his FOI requests:
I would like this data, please, in a format which I can edit, such as a CSV file, or one which I can enter into a mapping service, such as a GPX/KML file.
My preference would be for you to publish this data on your website and to provide the relevant URL(s); and to keep it updated, with an indication of when and what changes are made. Otherwise, please send the file(s) by return.
It's that final paragraph on which I want to concentrate.
Solving (Some) Open Data Frustrations
Give the public access to Hydra, the database in which the hydrant data are stored.
Seriously. Do it. These data aren't secret. There's no need to hide them away other than "well, that's just how we've always done it."
To be clear, I'm not suggesting that any Tom, Dick, or Harry should be allowed to edit the data (I don't think the UK Civil Service is quite ready for the Wikipediaisation of their world) - but it solves so many problems by just providing direct access.
There are existing models for this throughout the UK - the Department for Education provides free and open access to EduBase - a database of all the schools in the country.
I'm suggesting that every single database held by the state should be available to everyone.
Working With The Data
Here is a sample of the data stored in Hydra.
- mains_size: 100
- ngr_easting: 458455000
- ngr_northing: 222364000
- address_property: OPP. BICESTER BRIDAL
- address_street: MANORSFIELD ROAD
- address_town: BICESTER
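Once the data arrive as CSV, reading them takes only a few lines. This is a minimal sketch, assuming a comma-separated export with a header row that uses the column names above. One assumption deserves a loud flag: National Grid eastings never exceed 700,000 m, so a value like 458455000 only makes sense as millimetres; divided by 1000 it puts the hydrant in the Bicester area. The `Hydrant` class and `read_hydrants` function are hypothetical names for illustration.

```python
import csv
import io
from dataclasses import dataclass

# ASSUMPTION: ngr_easting / ngr_northing are in millimetres; divide to get metres.
MM_PER_METRE = 1000


@dataclass
class Hydrant:
    mains_size: int     # as exported; the unit is not stated in the sample
    easting_m: float    # OS National Grid easting, metres
    northing_m: float   # OS National Grid northing, metres
    address: str        # property, street and town joined with commas


def read_hydrants(csv_text: str) -> list[Hydrant]:
    """Parse a CSV export whose header matches the sample's column names (assumed)."""
    hydrants = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parts = (row["address_property"], row["address_street"], row["address_town"])
        hydrants.append(
            Hydrant(
                mains_size=int(row["mains_size"]),
                easting_m=int(row["ngr_easting"]) / MM_PER_METRE,
                northing_m=int(row["ngr_northing"]) / MM_PER_METRE,
                # Skip blank address fields so we don't get ", ," in the output.
                address=", ".join(p.strip() for p in parts if p.strip()),
            )
        )
    return hydrants


sample = (
    "mains_size,ngr_easting,ngr_northing,address_property,address_street,address_town\n"
    "100,458455000,222364000,OPP. BICESTER BRIDAL,MANORSFIELD ROAD,BICESTER\n"
)
for hydrant in read_hydrants(sample):
    print(hydrant)
```

If the real export turns out to use a different delimiter or column order, only the `DictReader` call and the column names need changing.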
These are Ordnance Survey National Grid eastings and northings (hence the ngr_ prefix). Converting them into latitude and longitude is relatively simple, and means we can plot them on a map.
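"Relatively simple" hides two steps: an inverse transverse Mercator projection from the National Grid to OSGB36 latitude/longitude on the Airy 1830 ellipsoid, then a seven-parameter Helmert transformation to WGS84, which is what web maps expect. This sketch follows the formulas Ordnance Survey publishes in its guide to coordinate systems in Great Britain, assuming zero ellipsoidal height. The Helmert step is only good to a few metres, which is fine for finding a hydrant but not for surveying (OS's OSTN15 transformation is the precise tool). The function names are my own.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (Ordnance Survey).
AIRY_A, AIRY_B = 6377563.396, 6356256.909
F0 = 0.9996012717                                    # scale factor on central meridian
LAT0, LON0 = math.radians(49.0), math.radians(-2.0)  # true origin
E0, N0 = 400000.0, -100000.0                         # true origin's grid coordinates, m

# WGS84 ellipsoid.
WGS84_A, WGS84_B = 6378137.0, 6356752.314245

# Helmert OSGB36 -> WGS84: the OS ETRS89 -> OSGB36 parameters with signs reversed.
TX, TY, TZ = 446.448, -125.157, 542.060              # metres
S = -20.4894e-6                                      # scale (ppm converted to unitless)
RX, RY, RZ = (math.radians(arcsec / 3600) for arcsec in (0.1502, 0.2470, 0.8421))


def grid_to_osgb36(e: float, n: float) -> tuple[float, float]:
    """National Grid easting/northing (m) -> OSGB36 latitude/longitude (radians)."""
    a, b = AIRY_A, AIRY_B
    e2 = 1 - (b * b) / (a * a)
    k = (a - b) / (a + b)
    lat, m = LAT0, 0.0
    # Refine latitude until the meridional arc m matches the northing to 0.01 mm.
    while True:
        lat += (n - N0 - m) / (a * F0)
        dl, sl = lat - LAT0, lat + LAT0
        m = b * F0 * (
            (1 + k + 1.25 * k**2 + 1.25 * k**3) * dl
            - (3 * k + 3 * k**2 + 21 / 8 * k**3) * math.sin(dl) * math.cos(sl)
            + (15 / 8 * k**2 + 15 / 8 * k**3) * math.sin(2 * dl) * math.cos(2 * sl)
            - 35 / 24 * k**3 * math.sin(3 * dl) * math.cos(3 * sl)
        )
        if abs(n - N0 - m) < 1e-5:
            break
    s, t = math.sin(lat), math.tan(lat)
    nu = a * F0 / math.sqrt(1 - e2 * s * s)            # transverse radius of curvature
    rho = a * F0 * (1 - e2) / (1 - e2 * s * s) ** 1.5  # meridional radius of curvature
    eta2 = nu / rho - 1
    sec, t2 = 1 / math.cos(lat), t * t
    vii = t / (2 * rho * nu)
    viii = t / (24 * rho * nu**3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2)
    ix = t / (720 * rho * nu**5) * (61 + 90 * t2 + 45 * t2**2)
    x = sec / nu
    xi = sec / (6 * nu**3) * (nu / rho + 2 * t2)
    xii = sec / (120 * nu**5) * (5 + 28 * t2 + 24 * t2**2)
    xiia = sec / (5040 * nu**7) * (61 + 662 * t2 + 1320 * t2**2 + 720 * t2**3)
    de = e - E0
    return (lat - vii * de**2 + viii * de**4 - ix * de**6,
            LON0 + x * de - xi * de**3 + xii * de**5 - xiia * de**7)


def _to_cartesian(lat, lon, a, b):
    """Geodetic (radians, height 0) -> Earth-centred cartesian x, y, z in metres."""
    e2 = 1 - (b * b) / (a * a)
    nu = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    return (nu * math.cos(lat) * math.cos(lon),
            nu * math.cos(lat) * math.sin(lon),
            (1 - e2) * nu * math.sin(lat))


def _from_cartesian(x, y, z, a, b):
    """Cartesian -> geodetic latitude/longitude (radians) by fixed-point iteration."""
    e2 = 1 - (b * b) / (a * a)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))
    for _ in range(10):  # converges well within 10 rounds for near-surface points
        nu = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        lat = math.atan2(z + e2 * nu * math.sin(lat), p)
    return lat, math.atan2(y, x)


def grid_to_wgs84(e: float, n: float) -> tuple[float, float]:
    """National Grid metres -> approximate WGS84 (latitude, longitude) in degrees."""
    x1, y1, z1 = _to_cartesian(*grid_to_osgb36(e, n), AIRY_A, AIRY_B)
    x2 = TX + (1 + S) * x1 - RZ * y1 + RY * z1
    y2 = TY + RZ * x1 + (1 + S) * y1 - RX * z1
    z2 = TZ - RY * x1 + RX * y1 + (1 + S) * z1
    lat, lon = _from_cartesian(x2, y2, z2, WGS84_A, WGS84_B)
    return math.degrees(lat), math.degrees(lon)


print(grid_to_wgs84(458455, 222364))  # the sample hydrant, in metres
```

Assuming the sample's coordinates are millimetres and have been divided by 1000, the sample hydrant should land in Bicester, at roughly 51.9° N, 1.15° W.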
Using Google Street View's API, we can find the hydrant marker from its co-ordinates.
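A minimal sketch of building request URLs for Google's Street View Static API, which takes a `location` as a "lat,lng" string, a required `size`, an API `key`, and optionally `fov` to narrow the field of view. The companion metadata endpoint reports whether any imagery exists near a point before you fetch an image. The API key is a placeholder and `streetview_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

STREETVIEW_IMAGE_URL = "https://maps.googleapis.com/maps/api/streetview"
STREETVIEW_METADATA_URL = "https://maps.googleapis.com/maps/api/streetview/metadata"


def streetview_urls(lat: float, lon: float, api_key: str,
                    size: str = "640x400", fov: int = 60) -> dict[str, str]:
    """Build Street View Static API URLs for the imagery nearest a WGS84 point."""
    location = f"{lat:.6f},{lon:.6f}"
    image_params = {"size": size, "location": location, "fov": fov, "key": api_key}
    metadata_params = {"location": location, "key": api_key}
    return {
        # Fetching this URL returns an image from the nearest panorama.
        "image": f"{STREETVIEW_IMAGE_URL}?{urlencode(image_params)}",
        # Fetching this URL returns JSON whose "status" says whether imagery exists.
        "metadata": f"{STREETVIEW_METADATA_URL}?{urlencode(metadata_params)}",
    }


# "YOUR_API_KEY" is a placeholder, and the coordinates are illustrative only.
for kind, url in streetview_urls(51.9, -1.15, "YOUR_API_KEY").items():
    print(kind, url)
```

Street View cameras sit in the road, so the nearest panorama may be several metres from the hydrant; the API's `heading` parameter can point the camera towards it.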
We can also use the human-readable string "Opposite Bicester Bridal" to confirm the location of the hydrant itself.
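Turning the shouty, abbreviated address fields into something a human (or a geocoder) can read is mostly string tidying. In this sketch only "OPP." comes from the sample data; the other entries in the abbreviation table are hypothetical guesses at what a full export might contain.

```python
from string import capwords

# HYPOTHETICAL abbreviation table: only "OPP." appears in the sample data.
ABBREVIATIONS = {
    "OPP.": "Opposite",
    "O/S": "Outside",
    "ADJ.": "Adjacent to",
    "JCT.": "Junction of",
}


def readable_location(prop: str, street: str, town: str) -> str:
    """Expand abbreviations and fix the capitalisation of Hydra-style address fields."""
    words = [ABBREVIATIONS.get(w.upper(), w.capitalize()) for w in prop.split()]
    parts = [" ".join(words), capwords(street), capwords(town)]
    # Drop empty fields so a missing property description doesn't leave a stray comma.
    return ", ".join(part for part in parts if part)


print(readable_location("OPP. BICESTER BRIDAL", "MANORSFIELD ROAD", "BICESTER"))
# Opposite Bicester Bridal, Manorsfield Road, Bicester
```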
With this mix of open data and proprietary services, we can create new services. Would the Fire Brigade benefit from street-level photos? Could we crowdsource photos of each hydrant, and record how well maintained they are? Could I provide a competing service to the water companies?
That's the beauty of open data - it opens up endless possibilities.
Visualising The Data
* Ok, I wanted the data not "just because" but for quite a specific reason. When insuring a property, it would be helpful to know how far away the nearest fire hydrant is.
By using the quite remarkable CartoDB, we can plot all of these points on an easy-to-use map.
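CartoDB's importer accepts several formats, including CSV and GeoJSON. One option is to write the converted points out as GeoJSON; this sketch assumes each hydrant has already been converted to WGS84 and stored in a dict with `lat` and `lon` keys (a hypothetical layout, as is the function name). GeoJSON puts longitude before latitude, and getting that backwards is the classic way to put a Bicester hydrant in the wrong hemisphere.

```python
import json


def to_geojson(hydrants: list[dict]) -> str:
    """Serialise hydrants (dicts with WGS84 "lat"/"lon" plus attributes) as GeoJSON."""
    features = [
        {
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [h["lon"], h["lat"]]},
            # Everything else (mains size, address, ...) becomes a feature property.
            "properties": {k: v for k, v in h.items() if k not in ("lat", "lon")},
        }
        for h in hydrants
    ]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)


# Illustrative record only: the coordinates are approximate, not converted from real data.
print(to_geojson([{"lat": 51.9, "lon": -1.15, "mains_size": 100,
                   "address": "OPP. BICESTER BRIDAL, MANORSFIELD ROAD, BICESTER"}]))
```

Saved to a `.geojson` file, the output can be uploaded as a new dataset, with the properties available for styling or pop-ups.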
By panning and zooming, we can easily see if the house we're about to buy / rent is well served by hydrants. Useful!
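To answer the question in the footnote (how far away is the nearest hydrant?), the raw National Grid coordinates are enough: they are already planar metres, so straight-line distance on the grid is within a small fraction of a percent of the distance on the ground. A minimal sketch, assuming the house's location is also known as a grid reference; `GridPoint` and `nearest_hydrant` are hypothetical names.

```python
import math
from typing import NamedTuple


class GridPoint(NamedTuple):
    easting: float   # OS National Grid, metres
    northing: float  # OS National Grid, metres


def nearest_hydrant(home: GridPoint, hydrants: list[GridPoint]) -> tuple[float, GridPoint]:
    """Return (straight-line distance in metres, hydrant) for the hydrant closest to home."""
    if not hydrants:
        raise ValueError("no hydrants to search")
    # Grid coordinates are planar metres, so Euclidean distance is a good approximation.
    closest = min(hydrants, key=lambda h: math.dist(home, h))
    return math.dist(home, closest), closest


# Hypothetical house location and hydrant list, in grid metres.
house = GridPoint(458500, 222400)
distance, hydrant = nearest_hydrant(house, [GridPoint(458455, 222364), GridPoint(459000, 223000)])
print(f"Nearest hydrant is {distance:.0f} m away at {hydrant}")
```

A linear scan is fine for one house; checking thousands of properties against a whole county's hydrants would call for a spatial index such as a k-d tree.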