Big Data As A Lethal Weapon
Yesterday I attended an OII talk on the Ethical Treatment of Data in New Digital Landscapes.
Amy O'Donnell from Oxfam led a discussion about how the charity is seeking to improve the way that Aid Agencies deal with the data they collect.
Oxfam collects data for many different reasons - sometimes it is incidental (for example the bank account details it needs to make payments), sometimes it is deliberate (for example when conducting a survey about how aid is used).
Protecting personal data is often a legal requirement - although if you're a British NGO storing data about people from the Philippines on a Cloud service hosted in the USA, knowing which data protection principles to follow is not simple! For Oxfam, protecting personal data is a necessity - the data they deal with often involves ethnicity, corruption, fragility, abuse, and human misery. Every datum could be a weapon in someone else's hand.
Data is like energy; it can never truly be destroyed. At best it can be manipulated until it is hard to reconstruct. This is why data collectors have to be wary about how they create data to ensure that they're not leaving a trail for malicious actors to follow.
How do we make this stuff interesting?
At a high level, most organisations know that keeping personal data private is sacrosanct - but how do they convey the message to those collecting the data and those from whom the data are being collected?
Instilling a sense of responsibility needs to be a priority for agencies. OII researcher Sanna Ojanperä talked about designing research methodologies in such a way that one minimises the amount of "dangerous" data which is available. Two particularly good resources she mentioned were the Research Ethics Guidebook and the free ebook "Ways to Practise Responsible Development Data".
In my opinion, what's just as necessary is ensuring that the people on the ground not only understand why this is important - but have the tools to support them. One young researcher at the event bemoaned the fact that the secure storage service he had to use was far less convenient than Google Drive. Google Drive may be less safe, but it's easier.
Who can blame him? Users dislike sub-standard tools. And when a phone (the primary data capture device) offers such effortless access to an easier way of doing things, it is natural that researchers will gravitate to a moderately more convenient method rather than following the strict and restrictive best practices.
Researchers also need to understand that nearly everything is data! An excellent example of this is photography. The meta-data in a digital photograph can be used to determine the exact location where the photo was taken, which direction the photographer was facing, which camera was used, and what time the photo was taken. On top of that, the photograph itself can reveal the ethnicity of the subjects and who they associate with. When a researcher puts a photograph of their subjects up on their Facebook page, they may unwittingly be exposing multiple facets of personal data!
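To make the photo example concrete, here is a minimal sketch of removing embedded meta-data from a JPEG before sharing it. It assumes the meta-data lives in APP1 segments, which is where EXIF (including GPS position) is normally stored. The function name and the sample bytes are made up for illustration, and a real tool would also need to deal with other places meta-data can hide, such as IPTC blocks and comment segments. It does nothing about what is visible in the picture itself.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG
APP1 = 0xE1        # segment type normally used for EXIF (and XMP)
SOS = 0xDA         # start-of-scan: compressed image data follows


def strip_app1_segments(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string with every APP1 segment removed."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a segment marker")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data, so copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The two-byte length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != APP1:
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)


def segment(marker: int, payload: bytes) -> bytes:
    """Build one JPEG segment (hypothetical helper for the sample below)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# A fake, hand-built "photo": not a viewable image, just the right structure.
sample = (
    SOI
    + segment(0xE0, b"JFIF\x00")
    + segment(APP1, b"Exif\x00\x00made-up GPS position")
    + segment(SOS, b"")
    + b"pixels"
    + b"\xff\xd9"
)
cleaned = strip_app1_segments(sample)
```

In this sketch `cleaned` keeps the JFIF header and the image data but no longer carries the EXIF block.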
Does an elderly man, who has never used the Internet, fully understand the consequences of having his photo taken? Photos are tiny scraps of paper which live in a book and are rarely seen, aren't they?
This brings us round to the issue of "Informed Consent".
Informed Information
Most research guidelines are explicit that subjects have to give informed consent. They must understand what data are being collected and for what purpose they will be used. This brings with it several major problems.
What Are Known Unknowns?
Do people who rarely, if ever, use computers fully understand exactly what their information can be used for? Given that data researchers often don't know what they will find, or how research techniques will develop, can they ever truly inform a participant about how the data will be used?
Data Reuse
Suppose, in 30 years' time, a researcher wants to use this data set? Do participants have to be contacted again if the use is significantly outside that of the original research? What if the data were successfully pseudo-anonymised? Are we to accept that this data set is now off limits no matter what valuable information it may contain?
Data Sharing
Data does need to be shared sometimes. It could be via a court order, or it could be via a reciprocal agreement with another aid agency.
We've all heard tales about private data being thrown onto a USB stick and then lost. Once data are shared with a third party, how much control do you hold over how they are used? What if multiple pseudo-anonymised data sets are gathered and then combined to de-anonymise the data?
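Combining data sets to de-anonymise people is easier than it sounds. The sketch below is a toy illustration, not a description of any real attack on Oxfam's data: every record, field name and pseudonym is invented. It assumes an attacker holds a pseudo-anonymised survey and a second list with names attached, and that both happen to share a few innocuous-looking fields.

```python
from collections import defaultdict

# Fields which look harmless on their own (an assumption for this example).
QUASI_IDENTIFIERS = ("birth_year", "village", "gender")


def link_records(pseudonymised, named, keys=QUASI_IDENTIFIERS):
    """Map pseudonym -> name wherever the shared fields match exactly one person."""
    index = defaultdict(list)
    for person in named:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for record in pseudonymised:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches[record["pseudonym"]] = candidates[0]
    return matches


# Invented survey with the names replaced by pseudonyms.
survey = [
    {"pseudonym": "P-017", "birth_year": 1948, "village": "A", "gender": "M"},
    {"pseudonym": "P-018", "birth_year": 1990, "village": "A", "gender": "F"},
]

# Invented second data set, obtained elsewhere, which includes names.
register = [
    {"name": "Person One", "birth_year": 1948, "village": "A", "gender": "M"},
    {"name": "Person Two", "birth_year": 1990, "village": "A", "gender": "F"},
    {"name": "Person Three", "birth_year": 1990, "village": "A", "gender": "F"},
]

reidentified = link_records(survey, register)
```

Here the only 1948-born man in village A is picked out, while the two women born in 1990 shield each other. The fewer people who share your combination of attributes, the less a pseudonym protects you.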
Withdrawal
Subjects are given the right to withdraw at any time. That's a sensible policy which will hopefully reassure participants who may feel uncomfortable about their data being misused. But withdrawal has its downsides. A user with a pseudo-anonymised data set may be able to correlate changes to the data with known changes to the participant lists.
Convince your cousin to drop out of the research and you stand a good chance of working out which "anonymous" participant she was.
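A rough sketch of how little effort that takes, assuming someone can see the pseudo-anonymised data set both before and after a withdrawal. The pseudonyms and answers are invented for the example.

```python
def withdrawn_pseudonyms(earlier_release, later_release):
    """Return the pseudonyms present in the earlier release but not the later one."""
    before = {row["pseudonym"] for row in earlier_release}
    after = {row["pseudonym"] for row in later_release}
    return before - after


# Invented data: the same data set seen before and after one person withdraws.
release_before = [
    {"pseudonym": "P-01", "answer": "yes"},
    {"pseudonym": "P-02", "answer": "no"},
    {"pseudonym": "P-03", "answer": "yes"},
]
release_after = [
    {"pseudonym": "P-01", "answer": "yes"},
    {"pseudonym": "P-03", "answer": "yes"},
]

gone = withdrawn_pseudonyms(release_before, release_after)

# If you know who withdrew, the earlier copy now tells you what they said.
exposed = [row for row in release_before if row["pseudonym"] in gone]
```

If only one person is known to have withdrawn between the two copies, `gone` contains a single pseudonym and `exposed` holds everything they previously told the researchers.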
Does withdrawing from future collection also mean that past data ought to be removed? Participants may think so. That could lead to irreparable destruction of data rendering the research useless.
The Dangers Of Digital
Doing fieldwork with a laptop or a mobile phone has many advantages - near instant analysis for one - but it's not without its problems.
For a start, expensive electronic equipment can be an attractive target for thieves. While they may "only" want your iPhone, they may end up with a cache of highly confidential, unencrypted, personal information!
Asking people to fill in forms using a computer may unfairly skew the demographics of the data. Younger or wealthier people may be over-represented.
Even though mobile data signals penetrate most of the world, there are still large swathes without any coverage. For time sensitive data - or even just backup purposes - this can be a critical impediment.
Most countries with Data Protection laws allow data subjects to request data about themselves. In the UK this is known as a Subject Access Request. Despite the advances in technology, it may not be possible for a remote worker to access such information.
Finally, massive amounts of data available instantly may do more harm than good in the hands of non-professionals. While a village may be able to quickly see data on how aid money is benefiting them, do they have the statistical experience to understand and correctly interpret the data? That may sound patronising, but it is of real concern to people who produce data-driven reports - they have to assume a baseline understanding of, say, standard deviations which may not be present in those looking at the data.
The Data Tightrope
Data is power. It can be used for good, or it can be a volatile source of wanton destruction.
Collecting and using data is as much about understanding and mitigating risks as it is about physically collecting information.
There is a timeline for data - from pre-life (how should I collect this?) to after-life (can I dispose of this data when I'm done with it?) - every single moment of the data's existence should be carefully considered and controlled.
Do risks mean we shouldn't collect data? No - it is too valuable for that. But researchers have to remember that this isn't their data. Data are like the goblin-forged objects in Harry Potter:
Bill: "To a goblin, the rightful and true master of any object is the maker, not the purchaser. All goblin-made objects are, in goblin eyes, rightfully theirs."
Harry: "But if it was bought —"
Bill: "— then they would consider it rented by the one who had paid the money. They have, however, great difficulty with the idea of goblin-made objects passing from wizard to wizard."
Harry Potter and the Deathly Hallows
As mere custodians of people's data, we have a solemn responsibility to ensure that it is used only for good, and treated with the utmost care and respect.
You can read more about Oxfam's policies, and give your feedback, at http://policy-practice.oxfam.org.uk/.