Surrey Police and the Case of The Misleading Pie Charts

Surrey County Council have sent every household in the county a booklet explaining how our council tax is being spent. Within it is a highly political comment from Kevin Hurley, the newly elected Police and Crime Commissioner.

He presents a pie chart showing how the police force spends its money. Take a look at it and ask yourself this question: what percentage is spent on "Employees"?

Surrey Police Pie Chart

Please use this poll to record your guess - answers at the end of this blog.

Pie charts have a long and noble history. They were popularised by Florence Nightingale and were hugely effective in helping politicians understand the causes of death among soldiers during the Crimean War.
Nightingale-mortality

As we understand more about the human brain and how we perceive shapes, it is becoming clear that pie charts are ineffective for representing complex information.

2D pie charts can still serve a useful purpose in limited circumstances. The real problem is with 3D pie charts. As far as I can tell, these abominations were popularised by Microsoft's Excel charting software.

3D charts distort the view of the data in such a way that it becomes increasingly hard to understand the information being presented. A picture being worth 1000 words, allow me to demonstrate:
GraphJam3d

So, just how bad is Surrey Police's Pie Chart? In an extremely scientific study, I asked half a dozen people and they all guessed between 75% and 85%. That's quite a wide range considering it's a multi-million pound difference.

On the opposite page to the pie chart is this summary of spending.
Police Spending

In slightly more readable format, it is:

Category     Amount (£)   Share (%)
Employees    £181.70      81.9%
Premises     £8.00        3.6%
Supplies     £27.20       12.3%
Transport    £5.00        2.3%
Total        £216.90      97.7%

A few interesting things to note here.

Firstly, how do we calculate the percentages? The total spend (£216.90) isn't mentioned in the report. If we use that, "Employees" accounts for 81.9% of spending.

If we take into account the gross expenditure (£207.70) the figure jumps to 87.5%.
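A quick way to sanity-check figures like these is to recompute each share from the raw amounts. The sketch below is only illustrative: it assumes the amounts are in £ millions (the table doesn't say), and it uses the sum of the four listed categories as one possible denominator. The `share` helper is my own naming, not anything from the report.

```python
# Amounts as listed in the summary table; the unit (assumed £ millions) is a guess.
SPENDING = {
    "Employees": 181.7,
    "Premises": 8.0,
    "Supplies": 27.2,
    "Transport": 5.0,
}

# Gross expenditure figure quoted in the text above.
GROSS_EXPENDITURE = 207.7


def share(amount, denominator):
    """Return amount as a percentage of denominator, to one decimal place."""
    return round(100 * amount / denominator, 1)


# Assumption: use the sum of the listed categories as the denominator.
listed_total = round(sum(SPENDING.values()), 1)

for category, amount in SPENDING.items():
    print(f"{category:<10} {share(amount, listed_total):>5}%")

# The same Employees amount against the gross expenditure figure.
print("Employees vs gross:", share(SPENDING["Employees"], GROSS_EXPENDITURE))
```

With these inputs the four listed categories sum to 221.9, so the answer depends heavily on which total is chosen as the denominator.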

Secondly, if we do assume that we're using the unreported total spend, there is at least 2% missing. Some of that can be explained by rounding - but I wonder what the rest of the money is spent on.

Given the above, I don't think the provided pie chart allows Surrey residents to see an accurate view of how their hard earned money is being spent.

Hopefully, this side-by-side - of the above data - will show you how 3D pie charts distort data and end up misleading their audience.
3d 2d pie chart side by side

With this overlay, we can see the distortion much more clearly. The smaller sections of the chart look disproportionately large.
Pie Charts Overlayed
It's time to announce a zero tolerance crackdown on dodgy data representation.

Rewired State - UK Parliament 2012

This weekend, I went to Rewired State's Parliament hackday. I teamed up with amazing front end designer Max Bye and statistician par excellence John Sandall to create a data visualisation of Parliament's Demographics.

Are the houses representative of the people in terms of gender diversity? Are the Labour Party younger than the Conservatives? Are the parties in the Lords particularly dissimilar?

You can play with the hack at ParliamentDemographics.tk/ or watch a video demonstration.

  • Each bubble represents a political party
  • The size of the bubble represents how many members they have
  • The Y-Axis (vertical) represents the average age of MPs / Lords
  • The X-Axis (horizontal) represents how gender balanced the parties are

(As you can tell, the hack was heavily inspired by Hans Rosling)
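For the curious, the numbers behind each bubble can be sketched roughly like this. This is not the code we wrote on the day; the records are made-up placeholders (not real Parliament data) and the `bubbles` function and its field names are purely illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (party, gender, age). Not real Parliament data.
MEMBERS = [
    ("Party A", "F", 48),
    ("Party A", "M", 56),
    ("Party A", "M", 61),
    ("Party B", "F", 39),
    ("Party B", "F", 52),
]


def bubbles(members):
    """Group members by party and compute one bubble per party."""
    by_party = defaultdict(list)
    for party, gender, age in members:
        by_party[party].append((gender, age))

    result = {}
    for party, rows in by_party.items():
        women = sum(1 for gender, _ in rows if gender == "F")
        result[party] = {
            "size": len(rows),                           # bubble size
            "average_age": mean(age for _, age in rows),  # Y-axis
            "percent_female": 100 * women / len(rows),    # X-axis
        }
    return result


for party, bubble in bubbles(MEMBERS).items():
    print(party, bubble)
```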

Data

A quick word about the data we used.

The (beta) APIs had some reasonably good documentation - although the examples could have been better. It seemed to assume that a user was already intimately familiar with the (sometimes arcane) principles of Parliament.

It also only spat out XML, so that needed to be converted to JSON.
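A conversion along these lines can be sketched with Python's standard library. To be clear, this is an assumption about the general approach rather than the converter we actually used, and the sample XML is a guess at the shape of the feed, not a copy of the API's real output. Attributes become "-name" keys and text alongside attributes becomes "#text".

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an XML element into plain Python data.

    Attributes become "-name" keys; text that sits alongside attributes or
    children is stored under "#text". Repeated child tags would overwrite
    each other in this simplified sketch.
    """
    children = list(element)
    text = (element.text or "").strip()
    if not children and not element.attrib:
        return text
    node = {f"-{name}": value for name, value in element.attrib.items()}
    for child in children:
        node[child.tag] = element_to_dict(child)
    if text:
        node["#text"] = text
    return node


# Hypothetical XML, shaped to resemble a single member record.
SAMPLE_XML = """<Member>
  <FullTitle>Linda Perham MP</FullTitle>
  <Gender>F</Gender>
  <Party Id="15">Labour</Party>
  <House>Commons</House>
</Member>"""

member = element_to_dict(ET.fromstring(SAMPLE_XML))
print(json.dumps(member, indent=2))
```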

The main issue we had was with the quality of the data. Let's look at two examples.

First, Linda Perham (picked solely because she's a mate of my mum!)

{
  "FullTitle": "Linda Perham MP",
  "DateOfBirth": "1947-06-29T00:00:00",
  "DateOfDeath": {
    "-xsi:nil": "true"
  },
  "Gender": "F",
  "Party": {
    "-Id": "15",
    "#text": "Labour"
  },
  "House": "Commons",
  "MemberFrom": "Ilford North",
  "HouseStartDate": "1997-05-01T00:00:00",
  "HouseEndDate": "2005-05-05T00:00:00",
  "CurrentStatus": {
    "-IsActive": "False",
    "StartDate": {
      "-xsi:nil": "true"
    }
  }
}

That's pretty comprehensive. We can see when she joined, when she left, her age, that she's still alive, and which constituency she represented.
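Pulling those facts out of a record is straightforward. Here's a minimal sketch, assuming fields shaped like the ones above; `parse_date` and `age_on` are illustrative helpers of my own, not part of the Parliament API.

```python
from datetime import datetime


def parse_date(value):
    """Return a date, or None for the {"-xsi:nil": "true"} placeholder."""
    if isinstance(value, dict) and value.get("-xsi:nil") == "true":
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S").date()


def age_on(born, when):
    """Whole years between two dates, allowing for the birthday not yet reached."""
    years = when.year - born.year
    if (when.month, when.day) < (born.month, born.day):
        years -= 1
    return years


# A cut-down copy of the record shown above.
member = {
    "DateOfBirth": "1947-06-29T00:00:00",
    "DateOfDeath": {"-xsi:nil": "true"},
    "HouseStartDate": "1997-05-01T00:00:00",
    "HouseEndDate": "2005-05-05T00:00:00",
}

born = parse_date(member["DateOfBirth"])
joined = parse_date(member["HouseStartDate"])
left = parse_date(member["HouseEndDate"])

print("Age on joining:", age_on(born, joined))
print("Age on leaving:", age_on(born, left))
print("Still alive:", parse_date(member["DateOfDeath"]) is None)
```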

Now, let's take George Galloway who has had an... interesting... Parliamentary career.

{
  "FullTitle": "Mr George Galloway MP",
  "DateOfBirth": "1954-08-16T00:00:00",
  "DateOfDeath": {
    "-xsi:nil": "true"
  },
  "Gender": "M",
  "Party": {
    "-Id": "26",
    "#text": "Respect"
  },
  "House": "Commons",
  "MemberFrom": "Bradford West",
  "HouseStartDate": "2012-03-30T00:00:00",
  "HouseEndDate": {
    "-xsi:nil": "true"
  },
  "CurrentStatus": {
    "-Id": "0",
    "-IsActive": "True",
    "Name": "Current Member",
    "StartDate": "2007-10-31T00:00:00"
  }
}

All we have is his current status. It doesn't mention his previous life as a Labour MP, nor does it mention that he was the Respect MP for Bethnal Green and Bow in 2005.

For MPs who have subsequently gone to the House of Lords, the data is also unhelpful.

Betty Boothroyd was a Labour MP (for two different constituencies), then became The Speaker of the House of Commons, then went to the House of Lords. This is all the information we have on her.

{
  "FullTitle": "The Rt Hon. the Baroness Boothroyd OM",
  "DateOfBirth": "1929-10-08T00:00:00",
  "DateOfDeath": {
    "-xsi:nil": "true"
  },
  "Gender": "F",
  "Party": {
    "-Id": "6",
    "#text": "Crossbench"
  },
  "House": "Lords",
  "MemberFrom": "Life peer",
  "HouseStartDate": "2001-01-15T00:00:00",
  "HouseEndDate": {
    "-xsi:nil": "true"
  },
  "CurrentStatus": {
    "-Id": "0",
    "-IsActive": "True",
    "Name": "Current Member",
    "StartDate": "2001-01-15T00:00:00"
  }
}

There's also a significant lack of historical data. There are some Lords & MPs in the dataset who were in Parliament in the 1940s - but only a few. It would be great to have a comprehensive record of, say, the last 100 years.

There needs to be a better representation of when a member has "changed" - whether that's affiliation, leaving and then returning, being elevated, changing constituency, or even gender. (Although, as far as I'm aware, there have been no Trans MPs. Nor any MPs with non-ASCII characters in their name.)

The data represents a very monochromatic view of the world.

For examining broad trends, it was sufficient for a hackday. We had tried scraping Wikipedia to get full details of every election, but that was a bit beyond us (over 1000 people for every election, plus by-elections, for the last 50 years.)

What We Found

I was particularly surprised by how little gender diversity there is. 50% of the population is female, yet the Labour Party have roughly 33% women MPs. Caroline Lucas is the sole (female) representative of the Green Party - which doesn't quite balance out the entirely male Bishops in the House of Lords.

In our data, you can see the big jump after the 1997 election - where the number of female MPs doubled.

Labour are consistently older than the Tories. That was completely against my expectations.

So, play with the hack at ParliamentDemographics.tk/ and see what you notice.

Thanks

As well as my amazing team mates Max Bye and John Sandall, I must thank the team from Rewired State; they put on a storming hackathon. There was plenty of interesting data, a good mix of people, healthy food and drink (as well as the obligatory pizza).

While it would have been lovely to hold the event in Parliament - I appreciate that a horde of geeks turning up with a panoply of dodgy electronics may not have best pleased the Serjeant-at-Arms. So The Hub Westminster was a fine substitute.

Special mention to Alex Blandford who was very helpful in explaining the data and helping us navigate through the peculiarities of the system.

Finally, massive thanks to the Speaker for this fine certificate.
rewired state 2012 certificate

The Death Of The BlackBerry

For years I was a BlackBerry fanboy. I remember snatching a departing colleague's 6710 and lying to the IT department that I was authorised to have my email on my phone. I never looked back. Despite a brief flirtation with the Nokia N95 - I was a BlackBerry Boy through and through. Until this happened.

Dead BlackBerry

In early March 2010, my beloved BlackBerry Bold took a tumble out of a cab and died. I've been an Android man ever since. Magic, Hero, Nexus, Galaxy - all great phones, but none could hold a candle to the 'berry.

Or so I thought.

A Torch In The Night

A good friend of mine - who updates his phones as frequently as I do - offered me his discarded BlackBerry Torch. How could I refuse a chance to get back to a real phone?
Give Up Android
The main thing that was bothering me about Android was the lack of a physical keyboard and the general instability of the platform and radio software.

The Torch is a phenomenal BlackBerry. The action on the slider is exquisite. The keyboard is a joy to pound away on. The email and calendaring are rock solid with a powerful and practical UI. The browser has improved immeasurably. The range of apps is much broader than a year ago - and includes the all important trifecta of Foursquare, Dropbox, and Expensify.

And I hate it.

OpenTech 2010

A quick report on OpenTech 2010 - the London event for geeks interested in Government data, openness and generally doing good things with tech and data.

Get Excited And Make Things
Copyright Matt Jones used under a Creative Commons non-commercial, attribution, share-alike licence.

I attended last year's event which inspired me to create my "VoteUK" service for the 2010 general election. I had considered doing a talk about the trials and tribulations of using open - and not so open - data. Instead, I gave a more general talk about how to harness the power of the mobile web to empower people - and why iPhone apps are the wrong way to get data to the masses.

More details in a moment. First off, my thoughts on the rest of the presentations.

Unlimited?

ALL YOU CAN EAT BUFFET*

*One plate only, limit of half a sausage per person, no refills, persons weighing over 75Kg will have to pay a supplement, does not include ice-cream.

Doesn't really seem fair, does it?  The Internet industry loves to abuse the word "Unlimited" - the mobile industry is particularly bad.

Despite complaints from the public, the Advertising Standards Authority recently ruled that "unlimited means limited":

We noted that that information showed only a very small proportion of customers on the unlimited data package had exceeded the fair usage data limit of 250 MB per month. We considered that the vast majority of customers were unaffected by the data limit, and we therefore concluded that the fair usage policy did not contradict the claim "includes unlimited data".

Orange have recently faced the wrath of dictionary lovers everywhere by offering strict limits on their "unlimited" service.  Not only is it confusing, Orange are turning off potential customers.

Unlimited Has Two Meanings

To most of us, there are two ways we look at the word "Unlimited":

  1. Without physical limit.  Something will never end.
  2. Without practical limit.  Most people will never get to the end.

How high does a practical limit have to be before it can be considered a physical limit?  If your download speed is 8Mbps, and there are 2,592,000 seconds in a 30-day month, you could download around 2.5TB.  So, is a limit that large justified as "unlimited"?
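The arithmetic behind that figure, as a rough sketch. It assumes a 30-day month, a line running flat out the whole time, and decimal units (1TB = 1,000,000MB); real connections won't sustain that.

```python
# 30 days x 24 hours x 60 minutes x 60 seconds
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def max_monthly_download_tb(speed_mbps, seconds=SECONDS_PER_MONTH):
    """Upper bound on data downloaded in a month, in decimal terabytes."""
    megabits = speed_mbps * seconds
    megabytes = megabits / 8  # 8 bits per byte
    return megabytes / 1_000_000


print(SECONDS_PER_MONTH)
print(f"{max_monthly_download_tb(8):.3f} TB")
```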

I'm a heavy data user. I'm never off Twitter, I regularly upload images to Flickr, I stream video to Qik - I'm a data fanatic.  Yet, most months, I struggle to get close to the 500MB "unlimited" barrier that my price plan offers.  Short of watching YouTube all day, I'm not sure how I could get to that limit.

Arms Race

Advertising is lying, let's make that clear. It's about stretching the truth as far as you legally can.

"UNLIMITED WEEKEND CALLS!" screams one advert.  What can their rival do? "2880 MINUTES OF WEEKEND CALLS"?  It means the same thing, but looks considerably worse.

What if they offer "1000 MINUTES OF WEEKEND CALLS"?  That's over 16 hours of talk time.  Are you ever likely to get even close to that? No, probably not.  But your brain will say "Hmmmm.... but I might. Better to go with unlimited just to be on the safe side."
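If you want to check the sums, here's a tiny sketch. It assumes a weekend is a full 48 hours, which is my reading of where the 2880 figure comes from.

```python
WEEKEND_HOURS = 48  # assumption: Saturday and Sunday, midnight to midnight
weekend_minutes = WEEKEND_HOURS * 60


def minutes_to_hours(minutes):
    """Convert a bundle of minutes into hours of talk time."""
    return minutes / 60


print(weekend_minutes)
print(f"{minutes_to_hours(1000):.1f} hours")
print(f"{100 * 1000 / weekend_minutes:.0f}% of the weekend")
```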

Humans dislike limits.  We don't want people telling us to stop.  Even if we'll never reach the limit, we don't want to worry about it.  A buffet restaurant could say "No more than 6 trips to the buffet" - a limit that is probably excessive for most people - yet its next door rival will say "Unlimited trips" and get all the business.

How Much Is A Megabyte?

Try to visualise a Megabyte.  What does it look like to you?  How many Megabytes have you used today?

I'm a geek and I couldn't tell you how many MB I've used today.  I could make a rough guess - but I'd probably be wrong.  For most people, counting MB is an impossible task.  They haven't the faintest idea of how big an image is, whether it's larger than the email they sent or smaller than the last web page they viewed.

So, what can an Internet provider do?

  • Charge per MB and hope that people understand how much they'll be paying every month?
  • Charge per minute.  People understand minutes.  Let's call it €1 for 1 hour's surfing.
  • Charge per session.  Every time you connect to the net - no matter for how long or short - we'll call it €0.50
  • Charge per content. Pages on the BBC are free, pages on CNN will cost you €0.01 per page.

All of these charging schemes are in use throughout the world.  All of them cause confusion.  All of them cause bill-shock.  All of them annoy customers and prevent the uptake of mobile internet services.
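To see why this confuses people, here's a rough sketch that prices the same day's usage four different ways. The €1 per hour, €0.50 per session and €0.01 per page prices come from the list above; the per-MB price and the usage figures are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    megabytes: int
    minutes: int
    sessions: int
    paid_pages: int


# Hypothetical per-MB price in cents; not taken from any real tariff.
ASSUMED_CENTS_PER_MB = 10


def bills_in_cents(usage):
    """Price the same usage under each of the four charging schemes."""
    return {
        "per MB": usage.megabytes * ASSUMED_CENTS_PER_MB,
        "per minute": round(usage.minutes * 100 / 60),  # EUR 1 per hour
        "per session": usage.sessions * 50,             # EUR 0.50 per session
        "per content": usage.paid_pages * 1,            # EUR 0.01 per paid page
    }


# A made-up day of browsing.
day = Usage(megabytes=20, minutes=90, sessions=4, paid_pages=30)
for scheme, cents in bills_in_cents(day).items():
    print(f"{scheme:<12} €{cents / 100:.2f}")
```

Same person, same day, four quite different bills.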

The Practical Approach

So, we have customer confusion and an escalation of advertising terms.  What can a mobile Internet company do?  The answer is "Take the practical limit".

Here's a graph I made up - if anyone can point me to some hard data from ISPs, I'd be grateful.

Nonsense, but you get the idea

Here, 99% of customers use 500MB or less.  In fact, the vast majority use 200MB or less.  So, what should the practical limit be set at?  Less than 1% will ever hit the limit, so they're the only ones who'll be pissed off about unlimited technically being limited.

And, that's what most ISPs do.  Set the limit well above the needs of the majority of their customers.  There's still a limit, but hardly anyone gets near it.
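Picking a practical limit is really just picking a percentile. A minimal sketch, using invented usage figures (I have no real ISP data) and a hypothetical `practical_limit` helper:

```python
import math


def practical_limit(usage_mb, coverage_percent=99):
    """Smallest observed usage that covers the given percentage of customers."""
    ordered = sorted(usage_mb)
    # Number of customers who must be at or under the limit.
    covered = math.ceil(coverage_percent * len(ordered) / 100)
    return ordered[covered - 1]


# Made-up monthly usage for 100 customers: 5MB, 10MB, ... 500MB.
usage = list(range(5, 505, 5))

print(practical_limit(usage, 99))
print(practical_limit(usage, 95))
```

Re-running this each month on fresh figures shows whether the limit still covers the customers it is meant to.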

What Can Be Done

Network resources are finite. You can't offer infinite consumption of finite resources.

Customers don't understand Megabytes, sessions, PDP contexts - nor should they have to.

Here are some solutions to the problem - but I'd be interested in hearing how other people think it could be solved.

  • The ASA should clamp down on the use of the word "Unlimited" - make providers explicitly spell out what they are providing.
  • Mobile Internet providers should send users a daily / weekly notice telling them how much they've used and how much it has cost them or how much of their bundle they've got left.  e.g. "Today you used 20MB of data.  You have 480MB left until 01/01/2009"
  • Limits should alter month-by-month as more people use more data services.  The limit this month might be enough to satisfy 99% of customers, but if next month it's only 95% then the limit needs to change.
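The usage notice suggested above needn't be complicated. Here's a sketch of how such a message might be built; the function name and parameters are hypothetical, not any operator's real system.

```python
from datetime import date


def usage_notice(used_today_mb, used_this_period_mb, bundle_mb, renews_on):
    """Build a plain-text usage message for a customer."""
    remaining = max(bundle_mb - used_this_period_mb, 0)  # never show a negative
    return (
        f"Today you used {used_today_mb}MB of data. "
        f"You have {remaining}MB left until {renews_on:%d/%m/%Y}"
    )


# Example: a 500MB bundle, 20MB used so far, renewing on 1 January 2009.
print(usage_notice(20, 20, 500, date(2009, 1, 1)))
```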

Disclaimer

I work for a mobile telecoms company which uses the word "unlimited". The views in this blog do not represent those of my employers.   I've not based any of the figures in this post on confidential information.

This post was featured on Heroes of the Mobile Screen.