*Almost* Open Data


(Inspired by a session at Open Data Camp called "Open Data Horror Stories")

I was having an argument - sorry, a spirited discussion - with my friend John the other day.

He was sympathising with Tony Blair's regrets over the introduction of the Freedom of Information Act. Apparently, FOI requests put too great a strain on departments and government bodies. Not only is there a huge cost of compliance, but there's an institutional fear of "dangerous" FOIs - requests which take too much time, are costly to process, or could embarrass someone powerful.

Good.

I want institutions to fear FOI requests. I want them to tremble and quake each time WhatDoTheyKnow.com sends them an FOI email.

Each FOI request is a reminder that a department isn't doing their job properly. It shows that the people who pay for the information (the public) have not been easily able to access the information. In today's hyper-connected world, that is a tragedy.

At the moment, an FOI Officer's job is to search for information which hasn't yet been made public and then publish it. I'm greatly oversimplifying here - the work of FOI staff is vitally important, but at the moment they are fulfilling a stop-gap function.

The public should have immediate, read-only, direct access to all official sources of information. The data should be well structured, reliable, and comprehensive. It doesn't need to be simplified - but consideration should be given to how non-experts will use and interpret the data.

That's the minimum requirement for Open Data. Giving the public the same level of access as officials.

Let me give you an example...

Hail Hydra!

I wanted to know the location of every fire hydrant in the UK. Just because*.

People have asked for this before and have been fobbed off with bizarre excuses, including:

  • Fire Hydrants are a national security issue!
  • Their location is commercially sensitive!
  • We don't have a comprehensive list!

All utter nonsense. A determined group of individuals could crowd-source the data for a restricted geographical area without much fuss. It's merely a case of walking around and looking!

I sent off an FOI request to my local fire service to see what they'd say. I was pleasantly surprised when they replied well within the legal time limit and gave me the information I asked for.

Nearly.

Death To The PDF!

Adobe have created many abominable file formats. The hideously insecure Flash, the needlessly complicated Photoshop, and the tragically misused PDF.

Look, PDF is a perfectly fine way to represent a paper document in a digital format - but it should never be used for the transfer of data. It's a complex file which deliberately obfuscates data and makes it hard to perform any meaningful actions on its contents.

So, of course, that's how my FOI data were delivered to me - in a sodding PDF.

But, oh! Gentle reader, it gets so much worse.

There are many tools which will extract text from a PDF and render it as pure data. These tools are born out of necessity and - while they make a valiant effort to be correct - often fall short. They perfectly demonstrate why PDF is not a robust method of data transfer.

Sadly, the data in my PDF could not be extracted. The data in my PDF were encoded as images.

Page after page of JPG. Each one trapping the data as a picture, rather than machine readable text. 15 bloody MBs of it!

This is Open Data in name only. A sham. A cruel trick played by vengeful gods determined to frustrate mere mortals! Wickedness perpetuated out of demented malice serving only to enrage and disempower!

Or was it? I sent a polite email asking for the data in .CSV format, and it was quickly provided.

Which was nice.

In truth, there was nothing malicious about the way the data were sent to me. A busy FOI Officer hit the big "Export" button and accepted the default file format. If I had wanted a specific format, I should have been explicit in my request. Of course, I have no way of knowing what those defaults are - and the officer has no way of knowing what format suits me best.

My friend Andy Mabbett sensibly adds this note to all his FOI requests:

I would like this data, please, in a format which I can edit, such as a CSV file, or one which I can enter into a mapping service, such as a GPX/KML file.

My preference would be for you to publish this data on your website and to provide the relevant URL(s); and to keep it updated, with an indication of when and what changes are made. Otherwise, please send the file(s) by return.

It's that final paragraph on which I want to concentrate.

Solving (Some) Open Data Frustrations

The data I want is stored on a database known as Hydra (which is quite different from Marvel's Hydra!)

Give the public access to Hydra.

Seriously. Do it. These data aren't secret. There's no need to hide them away other than "well, that's just how we've always done it."

To be clear, I'm not suggesting that any Tom, Dick, or Harry should be allowed to edit the data (I don't think the UK Civil Service is quite ready for the Wikipediaisation of their world) - but simply providing direct access would solve so many problems.

There are existing models for this throughout the UK - the Department for Education provides free and open access to EduBase - a database of all the schools in the country.

I'm suggesting that every single database held by the state should be available to everyone.

Working With The Data

Here is a sample of the data stored in Hydra.

mains_size  ngr_easting  ngr_northing  address_property      address_street    address_town
100         458455000    222364000     OPP. BICESTER BRIDAL  MANORSFIELD ROAD  BICESTER

Converting the Eastings and Northings into Latitude and Longitude is relatively simple and means we can plot them on a map.
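For anyone who wants to try it, here is a minimal sketch of that conversion in plain Python. It uses the standard National Grid inverse projection followed by a Helmert datum shift, which is only good to within a few metres - fine for finding a hydrant on a map. Two things here are my assumptions rather than anything Hydra documents: that the sample values are in millimetres (458455000 would otherwise be far off the grid), and the function names, which are purely illustrative.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36).
A, B = 6377563.396, 6356256.909          # semi-major / semi-minor axes (m)
F0 = 0.9996012717                        # scale factor on the central meridian
LAT0, LON0 = math.radians(49.0), math.radians(-2.0)  # true origin
E0, N0 = 400000.0, -100000.0             # grid coordinates of the true origin (m)
E2 = 1 - (B * B) / (A * A)               # eccentricity squared


def grid_to_osgb36(easting, northing):
    """Inverse transverse Mercator: grid metres -> OSGB36 (lat, lon) in radians."""
    n = (A - B) / (A + B)
    n2, n3 = n * n, n ** 3
    lat, m = LAT0, 0.0
    for _ in range(50):  # iterate until the meridional arc matches the northing
        lat += (northing - N0 - m) / (A * F0)
        d, s = lat - LAT0, lat + LAT0
        m = B * F0 * (
            (1 + n + 1.25 * n2 + 1.25 * n3) * d
            - (3 * n + 3 * n2 + 2.625 * n3) * math.sin(d) * math.cos(s)
            + (1.875 * n2 + 1.875 * n3) * math.sin(2 * d) * math.cos(2 * s)
            - (35 / 24) * n3 * math.sin(3 * d) * math.cos(3 * s)
        )
        if abs(northing - N0 - m) < 1e-5:
            break
    sin2 = math.sin(lat) ** 2
    nu = A * F0 / math.sqrt(1 - E2 * sin2)
    rho = A * F0 * (1 - E2) / (1 - E2 * sin2) ** 1.5
    eta2 = nu / rho - 1
    t = math.tan(lat)
    t2, t4, t6 = t ** 2, t ** 4, t ** 6
    sec = 1 / math.cos(lat)
    de = easting - E0
    lat_out = (
        lat
        - t / (2 * rho * nu) * de ** 2
        + t / (24 * rho * nu ** 3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2) * de ** 4
        - t / (720 * rho * nu ** 5) * (61 + 90 * t2 + 45 * t4) * de ** 6
    )
    lon_out = (
        LON0
        + sec / nu * de
        - sec / (6 * nu ** 3) * (nu / rho + 2 * t2) * de ** 3
        + sec / (120 * nu ** 5) * (5 + 28 * t2 + 24 * t4) * de ** 5
        - sec / (5040 * nu ** 7) * (61 + 662 * t2 + 1320 * t4 + 720 * t6) * de ** 7
    )
    return lat_out, lon_out


def osgb36_to_wgs84(lat, lon):
    """Approximate Helmert shift: OSGB36 radians -> WGS84 (lat, lon) in degrees."""
    # Geodetic -> Cartesian on the Airy 1830 ellipsoid (height taken as zero).
    nu = A / math.sqrt(1 - E2 * math.sin(lat) ** 2)
    x = nu * math.cos(lat) * math.cos(lon)
    y = nu * math.cos(lat) * math.sin(lon)
    z = (1 - E2) * nu * math.sin(lat)
    # Published OSGB36 -> WGS84 Helmert parameters (metres, ppm, arc-seconds).
    tx, ty, tz = 446.448, -125.157, 542.060
    s = 1 - 20.4894e-6
    rx, ry, rz = (math.radians(sec / 3600) for sec in (0.1502, 0.2470, 0.8421))
    x2 = tx + x * s - y * rz + z * ry
    y2 = ty + x * rz + y * s - z * rx
    z2 = tz - x * ry + y * rx + z * s
    # Cartesian -> geodetic on the WGS84 ellipsoid.
    a, b = 6378137.0, 6356752.314245
    e2 = 1 - (b * b) / (a * a)
    p = math.hypot(x2, y2)
    lat2 = math.atan2(z2, p * (1 - e2))
    for _ in range(10):
        nu2 = a / math.sqrt(1 - e2 * math.sin(lat2) ** 2)
        lat2 = math.atan2(z2 + e2 * nu2 * math.sin(lat2), p)
    return math.degrees(lat2), math.degrees(math.atan2(y2, x2))


def hydra_to_wgs84(easting_mm, northing_mm):
    """Assumes Hydra stores grid references in millimetres (my guess, not documented)."""
    return osgb36_to_wgs84(*grid_to_osgb36(easting_mm / 1000, northing_mm / 1000))


lat, lon = hydra_to_wgs84(458455000, 222364000)
print(f"{lat:.5f}, {lon:.5f}")
```

The sample row should come out in Bicester, somewhere around 51.9 N, 1.15 W. If you need better than a few metres, use a proper library with the official Ordnance Survey transformation rather than this sketch.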

Using Google Streetview's API we can locate the hydrant marker via its co-ordinates.

[Image: Streetview Hydrant]

We can also use the human-readable string "Opposite Bicester Bridal" to confirm the location of the hydrant itself.

[Image: Streetview SV Hydrant]
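If you want to fetch those Street View images yourself, the lookup can be sketched as a simple URL builder. I'm assuming Google's Street View Static API here, with its `size`, `location`, `heading`, and `key` parameters. The function name, the placeholder key, and the rough Bicester co-ordinates are mine, purely for illustration - they are not the exact position of the hydrant.

```python
from urllib.parse import urlencode

# Endpoint of Google's Street View Static API (requires your own API key).
STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def streetview_url(lat, lon, api_key, size="600x400", heading=None):
    """Build an image URL for the panorama nearest to the given co-ordinates."""
    params = {
        "size": size,
        "location": f"{lat:.6f},{lon:.6f}",
        "key": api_key,
    }
    if heading is not None:
        # Compass bearing of the camera in degrees; omit to let Google choose.
        params["heading"] = str(heading)
    return f"{STREETVIEW_ENDPOINT}?{urlencode(params)}"


# Illustrative co-ordinates only - roughly Bicester, not the real hydrant.
print(streetview_url(51.899, -1.149, "YOUR_API_KEY"))
```

Pointing the camera at the hydrant marker usually takes a bit of trial and error with the heading, which is where the human-readable address comes in handy.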

With this mix of open data and proprietary services, we can create new services. Would the Fire Brigade benefit from street level photos? Could we crowdsource photos of each hydrant, or reports on how well maintained they are? Can I provide a competing service to the water companies?

That's the beauty of open data - it opens up endless possibilities.

Visualising The Data

* Ok, I wanted the data not "just because" but for quite a specific reason. When insuring a property, it would be helpful to know how far away the nearest fire hydrant is.

By using the quite remarkable CartoDB we can plot all of these points on an easy to use map.

[Image: map of Oxfordshire fire hydrants]

By panning and zooming, we can easily see if the house we're about to buy / rent is well served by hydrants. Useful!
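The insurance question - how far is the nearest hydrant? - doesn't even need a map. Eastings and Northings are already in metres on a flat grid, so a straight-line distance is just Pythagoras. Here's a minimal sketch. The first hydrant is the sample row from Hydra (assuming, as a guess, that the stored values are millimetres); the second hydrant and the property location are made up for illustration.

```python
import math


def nearest_hydrant(property_en, hydrants):
    """Return (distance_in_metres, hydrant) for the hydrant closest to the property."""
    px, py = property_en
    return min(
        ((math.hypot(h["easting"] - px, h["northing"] - py), h) for h in hydrants),
        key=lambda pair: pair[0],
    )


hydrants = [
    # Sample row from Hydra, converted to metres (assumed to be stored in mm).
    {"easting": 458455.0, "northing": 222364.0, "address": "OPP. BICESTER BRIDAL"},
    # Hypothetical second hydrant, invented for this example.
    {"easting": 459000.0, "northing": 223000.0, "address": "MADE-UP STREET"},
]

# Hypothetical property, 30 m east and 40 m north of the first hydrant.
distance, hydrant = nearest_hydrant((458485.0, 222404.0), hydrants)
print(f"{distance:.0f} m to {hydrant['address']}")
```

This is distance as the crow flies. A fire hose has to follow roads and go round buildings, so treat it as a lower bound.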

4 thoughts on “*Almost* Open Data”

  1. I'm suggesting that every single database held by the state should be available to everyone.

    Er. With an exemption for those which contain personal data, presumably... passports, several of the DVLA stable, ditto police, etc.

    1. Prude!

      But, yes, let me caveat that statement by saying that they should be appropriately treated to ensure that Personally Identifiable Information - and other data proscribed by the DPA and FOI - are not accessible.

  2. Awesome! In Ukraine we receive printed (paper) as a result of such inquiries. But that's just a start, really :) Special thanks for CartoDB
