Bluetooth MAC, K-Anonymity, and Population Privacy

by @edent | 3 comments | Read ~125 times.

I recently went to a university hackathon, where students were trying to invent novel ways to help prevent pandemics. This was purely an academic exercise – they were not developing a fully-fledged app, nor were they creating official policies.

I spent some time with one group discussing the privacy implications of what they had built.

Thesis

By monitoring nearby Bluetooth devices, we can tell who has come into contact with an infectious person.

We can warn people that they may have been exposed, and request that they seek treatment.

Background

Every Bluetooth device has a unique identifier called a MAC address. This ID is a 48-bit serial number in EUI-48 format.

For example: AB:CD:EF:12:34:56

The first 3 bytes (the first 6 hex digits) of the MAC tell you the manufacturer of the device. These are public records, so you can easily see that a device near you is, for example, an Apple device or one made by Cisco.
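
Here is a minimal sketch of pulling the manufacturer prefix out of an address. The function name `split_mac` is my own, and it assumes the simple case where the manufacturer part is exactly the first 24 bits.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Split an EUI-48 address into (manufacturer prefix, device-specific part).

    Assumption: the manufacturer prefix is the first 3 octets (24 bits).
    """
    octets = mac.upper().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not an EUI-48 address: {mac!r}")
    int("".join(octets), 16)  # raises ValueError if any character is not hex
    return ":".join(octets[:3]), ":".join(octets[3:])


prefix, device = split_mac("AB:CD:EF:12:34:56")
print(prefix)  # AB:CD:EF
print(device)  # 12:34:56
```

Looking the prefix up in the public registry is what turns "AB:CD:EF" into a manufacturer name. That lookup is not shown here.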

Searching

There are loads of apps which will show you every Bluetooth device your phone can see. You can see the signal strength of that device which roughly correlates to distance.
List of Bluetooth devices.
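
To give a feel for "roughly correlates", here is a sketch using the standard log-distance path loss model. The default values (an expected reading of -59 dBm at one metre, and a path-loss exponent of 2.0) are illustrative assumptions of mine, not measurements. Real readings swing wildly with walls, pockets, and hardware.

```python
def estimate_distance_m(rssi_dbm: float,
                        rssi_at_1m_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Rough distance estimate from signal strength (log-distance model).

    Both defaults are assumed values: the reading expected at one metre,
    and an exponent of 2.0, which corresponds to open space.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


print(estimate_distance_m(-59))  # 1.0
print(estimate_distance_m(-79))  # 10.0
```

A 20 dB drop reads as ten times the distance under these assumptions, which is why small fluctuations in signal strength make the estimate so coarse.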

You can also see the name of the device. It might be generic, like “Bose Headphones”, or it may be specific, like “Jo Smith’s iPhone 7”.

Use

Imagine you had a similar app running continuously on your phone. It would record the Bluetooth MAC of any device which was close to you for more than, say, 15 minutes.
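
As a sketch of that logging rule, assume the app receives a stream of (timestamp, MAC) sightings from its scans. The names here are hypothetical, and the two-minute gap that ends an encounter is my own assumption.

```python
from collections import defaultdict

MIN_CONTACT_SECONDS = 15 * 60  # the "say, 15 minutes" threshold
MAX_GAP_SECONDS = 120          # assumption: a longer silence ends the encounter


def long_contacts(sightings):
    """Return the MACs seen for at least MIN_CONTACT_SECONDS in one encounter.

    `sightings` is an iterable of (unix_timestamp, mac) pairs.
    """
    by_mac = defaultdict(list)
    for timestamp, mac in sightings:
        by_mac[mac].append(timestamp)

    contacts = set()
    for mac, times in by_mac.items():
        times.sort()
        start = previous = times[0]
        for t in times[1:]:
            if t - previous > MAX_GAP_SECONDS:
                start = t  # gap too long: a new encounter begins
            previous = t
            if previous - start >= MIN_CONTACT_SECONDS:
                contacts.add(mac)
                break
    return contacts


# One device seen every minute for 15 minutes, another seen only in passing.
sightings = [(t, "AB:CD:EF:12:34:56") for t in range(0, 901, 60)]
sightings += [(0, "11:22:33:44:55:66"), (1000, "11:22:33:44:55:66")]
print(long_contacts(sightings))  # {'AB:CD:EF:12:34:56'}
```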

If, later, it was revealed that the owner of a device was infectious – you could be alerted. The hospital could upload the patient’s phone’s MAC onto a server. Then the server would alert people who had been close to that person while they were infectious.

Privacy

There are a few ways this sort of system could work. Let’s ignore (!) privacy for now.

  1. Upload a list of everyone you’ve seen to a central database. Or…
  2. Send everyone the MAC of an infectious person.

Neither of these are great, are they? I don’t feel comfortable sharing a list of my contacts with a central agency. That feels like a gross invasion of privacy.

Similarly, we don’t want to send everyone in the world an easily-identifiable MAC which could expose the identity of a patient.

K-Anonymity

Enter the magical mathematics of K-Anonymity. Here’s a brief and incomplete explanation:

  1. We identify an infected person and request their MAC address.
  2. The server splits the MAC into two pieces.
  3. The server sends everyone the first piece.
  4. If your phone has seen the first piece, send the second piece to the server.
    • You may have seen multiple devices with the first piece, so send all of the relevant 2nd pieces.
  5. If your 2nd piece matches the one on the server, you may be infected and will be alerted.
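
The steps above can be sketched in a few lines. This is an illustration only: the class and function names are hypothetical, and the split point (first three octets) is an assumption.

```python
def split_at(mac, prefix_octets=3):
    """Split a colon-separated MAC into (first piece, second piece)."""
    octets = mac.upper().split(":")
    return ":".join(octets[:prefix_octets]), ":".join(octets[prefix_octets:])


class Server:
    def __init__(self, infected_mac):
        # Step 2: split the patient's MAC; only the first piece is sent out.
        self.first_piece, self._second_piece = split_at(infected_mac)

    def broadcast(self):
        # Step 3: everyone receives the first piece.
        return self.first_piece

    def is_match(self, second_pieces):
        # Step 5: compare the phone's reply with the stored second piece.
        return self._second_piece in second_pieces


def phone_reply(seen_macs, first_piece):
    """Step 4: second pieces of every logged MAC that starts with first_piece."""
    n = first_piece.count(":") + 1
    reply = set()
    for mac in seen_macs:
        head, tail = split_at(mac, n)
        if head == first_piece:
            reply.add(tail)
    return reply


server = Server("AB:CD:EF:12:34:56")
log = ["AB:CD:EF:12:34:56", "AB:CD:EF:99:99:99", "11:22:33:44:55:66"]
reply = phone_reply(log, server.broadcast())
print(sorted(reply))           # ['12:34:56', '99:99:99']
print(server.is_match(reply))  # True
```

Notice that the reply also hands the server the second piece of an unrelated device which happened to share the first piece.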

There are all sorts of other privacy-protecting things we could do…

  • Your phone could send random / misleading data back to the server in response to any query. That would prevent a central agency tracking individuals.
  • Rather than two equal halves, the server could send the first quarter, and your device could respond with the last quarter.
  • The data could be one-way hashed before sending in either direction.
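
A sketch combining the last two ideas: hash the MAC first, send only a short prefix of the hash, and have the phone reply with the remainder of any matching hashes. SHA-256 and the 16-bit prefix length are my assumptions. Nothing above specifies either.

```python
import hashlib

PREFIX_HEX_CHARS = 4  # assumption: 4 hex characters = 16 bits of the hash


def mac_digest(mac):
    """One-way hash of a MAC address (SHA-256 is an assumed choice)."""
    return hashlib.sha256(mac.upper().encode("ascii")).hexdigest()


def server_query(infected_mac):
    """The server publishes only the start of the hash."""
    return mac_digest(infected_mac)[:PREFIX_HEX_CHARS]


def phone_reply(seen_macs, query_prefix):
    """The phone returns the rest of every logged hash that shares the prefix."""
    digests = (mac_digest(mac) for mac in seen_macs)
    return {d[PREFIX_HEX_CHARS:] for d in digests if d.startswith(query_prefix)}


def server_match(infected_mac, reply):
    """The server checks whether the remainder of its own hash came back."""
    return mac_digest(infected_mac)[PREFIX_HEX_CHARS:] in reply


infected = "AB:CD:EF:12:34:56"
query = server_query(infected)
reply = phone_reply([infected, "11:22:33:44:55:66"], query)
print(server_match(infected, reply))  # True
```

A shorter prefix puts more devices in each bucket, so the query says less about the patient, but the phone then sends back more about everyone else it has seen.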

Is this sufficiently private?

There are only 2^48 (about 281 trillion) possible 48-bit addresses. At one byte each, the whole range can be stored in 256TB. More storage than you have at home – but easily within the range of a well-financed organisation.

In reality, the address space is much less because not all addresses have been issued.
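
As a rough check on those numbers: 2^48 bytes is 256 TiB, which is where a figure like 256TB comes from if each address costs one byte (my assumption about the storage layout). The sketch below also shows why a smaller address space matters. If the manufacturer prefix is known, only 2^24 candidates remain, and a hashed MAC can simply be brute-forced. The function name and the use of SHA-256 are illustrative assumptions.

```python
import hashlib

total_addresses = 2 ** 48
print(total_addresses)             # 281474976710656
print(total_addresses // 2 ** 40)  # 256, i.e. 256 TiB at one byte per address


def recover_mac(target_digest, prefix, limit=2 ** 24):
    """Try each device-specific suffix for a known 24-bit manufacturer prefix."""
    for n in range(limit):
        suffix = f"{n:06X}"
        mac = f"{prefix}:{suffix[0:2]}:{suffix[2:4]}:{suffix[4:6]}"
        if hashlib.sha256(mac.encode("ascii")).hexdigest() == target_digest:
            return mac
    return None


# Demo with a low suffix so it finishes instantly; the full 2^24 search
# is only about 16.8 million hashes.
secret = "AB:CD:EF:00:01:02"
digest = hashlib.sha256(secret.encode("ascii")).hexdigest()
print(recover_mac(digest, "AB:CD:EF"))  # AB:CD:EF:00:01:02
```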

Using this system, how easy would it be to build up a database of everyone you have spent time with?

An open-source app could be audited to make sure that it wasn’t recording and transmitting anything it shouldn’t. But what’s to stop the server going on a fishing expedition and continually pinging your phone with pieces of a MAC?

How do you ensure that people only install the official app and not something which steals personal information?

Is the central database of infected people’s MACs a target for hackers? How could it be secured?

Are some MAC addresses so unique that even the first few bits are enough to reliably de-anonymise someone?

Is this politically acceptable?

And here’s the kicker. Would you install tracking software on your phone? What if Facebook quietly switched on this feature? What about GDPR? What about…

I politely remind you that this was an idea born out of a 24-hour hackathon, by a small group of students. To my knowledge – this isn’t something being developed for use.

But this sort of app could be built easily. And – if not designed correctly – it could be a privacy disaster.

3 thoughts on “Bluetooth MAC, K-Anonymity, and Population Privacy”

  1. Simon Farnsworth says:

    One important thing to think about for privacy that I suspect students won’t is that MAC addresses have internal structure. Depending on whether the assigning company has MA-L (formerly OUI), MA-M, or MA-S prefixes, the first 24, 28 or 36 bits of the address merely identify the assigner, and as long as assignment is dense, companies are allowed to add structure in the remaining bits.

    So, if you divide the address into pieces without considering the patterns, you could end up in a place where (e.g.) the server is actually sending a piece that corresponds to “all owners of an iPhone 11”, or “all owners of a Samsung Galaxy S10 5G”, and the client is mapping out potentially high-value targets for you to steal from.

    Worse, those patterns change over time – when I first looked at all this, MA-L was all there was, and it was called an OUI. If your splitting took (say) 1st octet, 3rd octet and 4th octet from the server, the introduction of MA-S means that you’re now allowing the server to query for groups of 256 MA-S owners at a time, instead of groups of 65536 devices from up to 256 assignees.

    This is a really challenging thing to get right, and you need lots of context to avoid privacy issues.

    1. @edent says:

      That’s exactly right. I suspect you’d have to do this in the same way that Have I Been Pwned works – take a hash of the information first, and compare the first few bytes of that.

  2. Isn’t this also based on the idea that a device always stays with an individual? What happens when a device gets resold to a different person? Great starting point of an idea from a hackathon.
