Why API responses should be signed

api cryptography · 35 comments · 1,250 words · Viewed ~5,604 times

I'm going to start this discussion with the why and then move on to the how. Let's begin with a couple of user stories.

As the recipient of some data, I want to verify that it hasn't been tampered with.

and

As the recipient of some data, I want to verify who originally published it.

Here's why I think this is important. We are in an era of fake news. A screenshot can be easily altered. A webpage is trivial to edit. But data should be provably true.

Recently, a prominent person's private Twitter messages were leaked. They were presented as a series of JSON files, ostensibly taken directly from the API. But, of course, there was no way to prove the data dump didn't contain misinformation.

If I call https://api.example.com/?id=123 myself, then I can be reasonably sure that the data has come from the server unaltered.

But what if someone emails me a JSON file? Or someone re-hosts an XML dump? Or a hacker claims to have uncovered proof of...?

As data journalism gathers pace, it's vital to be able to check whether data is authentic.

Suppose you see a news site claiming that your favourite sports star endorses the opposing team. They have an image of a tweet. But you wouldn't trust an image of a website (I hope) without checking a live link. Right?

Fake tweet showing the Victoria Government announcing I am King of Australia.

But in the data world, we get no such assurances.

Here's data from a Tweet I've just faked:

 {
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "id": 105011862119892123,
  "text": "Terence Eden is now officially the King of Australia",
  "user": {
    "id": 6253282,
    "id_str": "6253282",
    "name": "Australian Government",
    "screen_name": "AusGovOfficia1",
    ...
  },
  ...
 }

I've just made that up. But you've got no way to prove that it did not come from Twitter. Perhaps it was a real Tweet which was subsequently deleted?

Where this matters

Do you rely on data for your job? If someone gives you a scrap of data claiming to be from a website, how can you prove it is authentic?

Perhaps a Police API tells you what the crime rate is, or a betting API tells you what the odds are, or a news API tells you who was convicted of a crime. Or a source leaks information from a social network. How can you prove that the data is authentic and unaltered? Equally, someone else could present false data as coming from a trusted source, or a trusted source could publish fake data and later claim they didn't.

Here's the end result that I want.

Given a piece of data,
Prove where it came from, and
Prove that it hasn't been altered.

How to do it

This is where I get stumped. I can clearly see the need for this. Disinformation is an attack on civilisation. But I can't see how to do it in a seamless, easy-to-understand way.

There have been lots of proposals for how to do this.

I think it boils down to three distinct design choices.

Keep the signature and the message separate.
Include the signature in the message.
Only sign part of the message.

Separation of Powers

The first is conceptually easiest.

The API call example.com/?id=123 returns some JSON containing the data requested.
The API call example.com/?id=123&signed=true returns the signature only.

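It relies on a few assumptions, chiefly that a GET request is immutable in terms of what it returns. That is, the data returned from a given query is always the same; otherwise the signature fetched by the second call might not match the data fetched by the first.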

It reduces complexity for the server and client. If the client doesn't care about verification, the server doesn't have to calculate it.

It might be complicated to store the information. You would need to keep response.json and response.signed.json as separate files.

It might be fragile. If a single bit or byte of the data is altered - say tabs get converted to spaces - then the signature will no longer verify.
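To see why, remember that a signature is computed over bytes, not over the meaning of the data. This sketch (Python standard library only) hashes two serialisations of the same JSON object; any signature over those digests would inherit the same mismatch. The `canonicalise` helper is an assumption, not part of the original proposal: one common mitigation is to normalise the JSON before signing and before verifying (here, sorted keys and no insignificant whitespace). A production scheme, such as the JSON Canonicalization Scheme in RFC 8785, also has to pin down number and string formatting.

```python
import hashlib
import json

# The "same" response serialised two ways: tab-indented vs. compact.
pretty = '{\n\t"id": 123,\n\t"text": "Hello"\n}'
compact = '{"id":123,"text":"Hello"}'


def digest(text: str) -> str:
    """SHA-256 of the UTF-8 bytes; a signature would be computed over these bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonicalise(text: str) -> str:
    """Hypothetical canonical form: parse, then re-emit with sorted keys and no spaces."""
    return json.dumps(
        json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )


print(digest(pretty) == digest(compact))                              # False
print(digest(canonicalise(pretty)) == digest(canonicalise(compact)))  # True
```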

If you want someone to verify an API response, they need two separate files and need to load them in the right order into any verification system. Again, not an impossible task, but might make things more complicated than necessary.
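As a rough sketch of the two-call design, assuming Ed25519 signatures via the third-party `cryptography` package, the two endpoints can be modelled as plain functions. `get_response`, `get_signature`, `PUBLISHER_KEY` and the stored data are hypothetical stand-ins for `example.com/?id=123` and `example.com/?id=123&signed=true`.

```python
# Sketch of the "separate signature" design: one call returns the data,
# a second returns a detached signature over exactly the same bytes.
# Assumes the third-party `cryptography` package (pip install cryptography).
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

# Hypothetical publisher key; in reality it would be generated once and kept secret.
PUBLISHER_KEY = Ed25519PrivateKey.generate()

# Hypothetical stored response body for ?id=123.
RESPONSES = {"123": b'{"id": 123, "text": "Hello, World!"}'}


def get_response(resource_id: str) -> bytes:
    """Stands in for GET example.com/?id=123."""
    return RESPONSES[resource_id]


def get_signature(resource_id: str) -> bytes:
    """Stands in for GET example.com/?id=123&signed=true (signature only)."""
    return PUBLISHER_KEY.sign(RESPONSES[resource_id])


def verify(data: bytes, signature: bytes, public_key: Ed25519PublicKey) -> bool:
    """True only if `signature` was made over exactly these bytes."""
    try:
        public_key.verify(signature, data)
        return True
    except InvalidSignature:
        return False


public_key = PUBLISHER_KEY.public_key()
data = get_response("123")
sig = get_signature("123")
print(verify(data, sig, public_key))         # True
print(verify(data + b" ", sig, public_key))  # False: one extra byte
```

Ed25519 signatures are deterministic, so the server could recompute the signature on demand rather than storing a second file, as long as the underlying bytes never change.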

Inclusivity

Does the set of all sets include itself? This isn't just a philosophical problem. If a signature is included in a response, does the signing process have to take account of the signature?

The usual way around this is to encode the result first. So, imagine your API response was BASE64 encoded and then signed. It might look like this:

 {
  "response": "data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D...",
  "signature": "0f60066673e0de...."
 }

That keeps the document and signature together, but does have some disadvantages. It loses the human-readable nature of the response. It also means that your code has to BASE64 decode the response before interpreting it. Both of those are annoying for debuggers - but not critical issues.
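A minimal sketch of that envelope, assuming Ed25519 via the third-party `cryptography` package. For simplicity it stores a bare Base64 string rather than a `data:` URI and hex-encodes the signature; the function names and both encoding choices are assumptions, not a standard format.

```python
# Sketch of the "encode first, then sign" envelope.
# Assumes the third-party `cryptography` package (pip install cryptography).
import base64
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def make_envelope(payload: dict, key: Ed25519PrivateKey) -> str:
    """Base64-encode the response, sign the encoded string, wrap both in JSON."""
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    # The signature covers only the encoded string, so it never has to cover itself.
    signature = key.sign(encoded.encode("ascii")).hex()
    return json.dumps({"response": encoded, "signature": signature})


def open_envelope(envelope: str, public_key: Ed25519PublicKey) -> dict:
    """Verify the signature, and only then decode and parse the response."""
    outer = json.loads(envelope)
    encoded = outer["response"]
    try:
        public_key.verify(bytes.fromhex(outer["signature"]), encoded.encode("ascii"))
    except InvalidSignature:
        raise ValueError("signature does not match response") from None
    return json.loads(base64.b64decode(encoded))


key = Ed25519PrivateKey.generate()
envelope = make_envelope({"id": 123, "text": "Hello, World!"}, key)
print(open_envelope(envelope, key.public_key()))  # {'id': 123, 'text': 'Hello, World!'}

# Re-indenting the outer JSON doesn't matter: the signed string is opaque.
reindented = json.dumps(json.loads(envelope), indent=2)
print(open_envelope(reindented, key.public_key()))  # {'id': 123, 'text': 'Hello, World!'}
```

A side effect worth noting: because the signature covers an opaque string, reformatting the outer JSON no longer breaks verification.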

Partial

What if we only sign the critical parts of the response, rather than all the metadata? This is perhaps the easiest to implement, but has some drawbacks.

 {
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "id": 105011862119892123,
  "text": "Terence Eden is now officially the King of Australia",
  "text-sig": "0f60066673e0de....",
  ...

That is, only sign the text. Or perhaps text + ID. Or any useful subset of the API response. It's simple to parse for humans and computers. But the metadata might be a crucial part of the information we want to verify. Knowing the date something was published, for example, could be as important as the content itself.
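Here is one way partial signing could work, assuming Ed25519 via the third-party `cryptography` package. The choice of signed fields (`id`, `created_at` and `text`, so the publication date is covered too), the `sig` field name and the `retweet_count` field in the example record are all hypothetical. The signed subset is serialised in a fixed form so that the verifier can rebuild exactly the same bytes.

```python
# Sketch of partial signing: only some fields are covered by the signature.
# Assumes the third-party `cryptography` package; SIGNED_FIELDS and "sig" are hypothetical.
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

SIGNED_FIELDS = ("id", "created_at", "text")


def signed_bytes(record: dict) -> bytes:
    """Serialise just the signed subset in a fixed form (sorted keys, no spaces)."""
    subset = {field: record[field] for field in SIGNED_FIELDS}
    return json.dumps(subset, sort_keys=True, separators=(",", ":")).encode("utf-8")


def add_signature(record: dict, key: Ed25519PrivateKey) -> dict:
    """Return a copy of the record with a hex-encoded signature over the subset."""
    return {**record, "sig": key.sign(signed_bytes(record)).hex()}


def check_signature(record: dict, public_key: Ed25519PublicKey) -> bool:
    try:
        public_key.verify(bytes.fromhex(record["sig"]), signed_bytes(record))
        return True
    except InvalidSignature:
        return False


key = Ed25519PrivateKey.generate()
pub = key.public_key()
tweet = add_signature(
    {
        "created_at": "Wed Oct 10 20:19:24 +0000 2018",
        "id": 105011862119892123,
        "text": "Terence Eden is now officially the King of Australia",
        "retweet_count": 0,
    },
    key,
)
print(check_signature(tweet, pub))                             # True
print(check_signature({**tweet, "retweet_count": 9999}, pub))  # True: field not signed
print(check_signature({**tweet, "created_at": "Thu Oct 11 09:00:00 +0000 2018"}, pub))  # False
```

Any field outside `SIGNED_FIELDS` can be changed freely without detection, which is exactly the drawback described above.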

Put it all on the blockchain!!!!!

No.

Well... OK. Maybe this is a use for a Merkle Tree or similar. If every API response has an ID, then a cryptographic hash could be appended to a public ledger. But that has all the disadvantages of the previous schemes without any real benefit.
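For the curious, this is roughly what hashing responses into a Merkle tree could look like (Python standard library only). It is a sketch under stated assumptions: the responses are made up, and the leaf and node prefix bytes borrow the domain-separation idea from Certificate Transparency (RFC 6962). Publishing the root commits to every response in the batch, but it says nothing about who produced them unless the root itself is signed.

```python
# Sketch: committing to a batch of API responses with a Merkle tree root.
# Leaf (0x00) and node (0x01) prefixes follow the domain-separation idea in RFC 6962.
from __future__ import annotations

import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(responses: list[bytes]) -> bytes:
    if not responses:
        raise ValueError("need at least one response")
    level = [sha256(b"\x00" + r) for r in responses]  # hash each leaf
    while len(level) > 1:
        # Hash neighbouring pairs; an odd node at the end is carried up unchanged.
        parents = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            parents.append(level[-1])
        level = parents
    return level[0]


responses = [b'{"id":1}', b'{"id":2}', b'{"id":3}']
root = merkle_root(responses)
print(root.hex())

# Altering any one response changes the root that would be published.
tampered = [b'{"id":1}', b'{"id":2,"x":1}', b'{"id":3}']
print(merkle_root(tampered) == root)  # False
```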

Security is hard

The problem with all of the above approaches is that the API provider needs to safely and securely manage cryptographic keys. That's expensive and complicated. If an attacker got access to the keys, they could issue fake statements.

But the good news is that the tech is here and it works. JOSE (JSON Object Signing and Encryption) is a suite of specifications, including JSON Web Tokens, which are widely used across the Internet, mostly for signing OAuth logins.

So, can we use it?
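As a rough illustration, here is a hand-rolled JWS compact serialisation (`header.payload.signature`, per RFC 7515) using the `EdDSA` algorithm from RFC 8037. It assumes the third-party `cryptography` package, the helper names are hypothetical, and it exists only to show the structure; a real deployment should use a maintained JOSE library. Structurally, it's the encode-then-sign idea from the Inclusivity option: the payload travels Base64url-encoded alongside its signature.

```python
# Hand-rolled JWS compact serialisation using EdDSA (Ed25519). Illustration only.
# Assumes the third-party `cryptography` package (pip install cryptography).
import base64
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWS requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def jws_sign(payload: bytes, key: Ed25519PrivateKey) -> str:
    header = b64url(json.dumps({"alg": "EdDSA"}).encode("utf-8"))
    body = b64url(payload)
    signature = key.sign(f"{header}.{body}".encode("ascii"))
    return f"{header}.{body}.{b64url(signature)}"


def jws_verify(token: str, public_key: Ed25519PublicKey) -> bytes:
    header_b64, body_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Decide the algorithm yourself; never trust the token's "alg" blindly.
    if header.get("alg") != "EdDSA":
        raise ValueError("unexpected alg")
    try:
        public_key.verify(b64url_decode(sig_b64), f"{header_b64}.{body_b64}".encode("ascii"))
    except InvalidSignature:
        raise ValueError("bad signature") from None
    return b64url_decode(body_b64)


key = Ed25519PrivateKey.generate()
token = jws_sign(b'{"id":123}', key)
print(jws_verify(token, key.public_key()))  # b'{"id":123}'
```

The explicit `alg` check matters: accepting whatever algorithm a token declares is a well-known source of JWT vulnerabilities.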

Will anyone care?

I suppose this is the crucial question. Right now I can easily fake a screenshot and the majority of people won't even look to see if the information is true - no matter how incredible the claim is.

But we need to build a better foundation for the future. One where information brokers can say "I have proof that this exact data came from this specific website at this precise time."

We have to believe that truth and trust are important.

35 thoughts on “Why API responses should be signed”

Andy Mabbett

Great post, your Majesty.

Reply 2020-01-17 12:42
Dan Brickley

how about signed HTTP exchanges? tools.ietf.org/id/draft-yassk… @jyasskin

Reply | Reply to original comment on twitter.com 2020-01-17 12:56
Steven Pears

Good read. Seeing signatures used more and more actually, I wonder from a tech point of view if that’s trust or easier to consume?

Slack and Alexa both put the sig data in the API response headers so the entire body is signed, always going to be based on trust somewhere

Reply | Reply to original comment on twitter.com 2020-01-17 12:56
1. George
  
  The requirement for the first case isn't idempotence. It might be that the request is idempotent in that it doesn't affect the resource, but the resource might still change over time! Good APIs might make guarantees about some URIs always referring to static resources, but certainly not all resources should be so.
  
  It's a hard problem. See also SAML: in principle every interesting fact is an assertion. Other more modern things in that space exist too.
  
  Reply 2020-01-17 18:12
Steven Pears

I think that's the use case with public data or webhooks where you need verification to move forward.

APIs with authorization it seems to be missing almost entirely - I think that has to catch up. I hope it does, makes my cynical and security focused nature less twitchy 😁

Reply | Reply to original comment on twitter.com 2020-01-17 13:31
Dan Brickley

much of the point of killing off SOAP-style web services in favour of REST APIs was to take advantage of these similarities, so let's try to find out. Your use case might be a motivation for long-term rather than ephemeral signing of sessions...

Reply | Reply to original comment on twitter.com 2020-01-17 14:03
Rosalie Marshall

Learnt a lot from this. Do you know if we have published guidance on this from NCSC?

Reply | Reply to original comment on twitter.com 2020-01-17 14:06
Terence Eden

I don't think so, no. All a bit new and experimental. I wanted to include electronic signatures in published docs on GOVUK - but I think it's too confusing at the moment.

Reply | Reply to original comment on twitter.com 2020-01-17 14:07
Ivan Dilber

What is the safe flow for a client to verify such signatures? If we use the same api service to fetch the public key then we're open to the same level of spoofing as with the api data, attacker can just send us their key instead.

Reply | Reply to original comment on twitter.com 2020-01-17 14:35
Simon_Lucy

Validate the chain with the CA.

en.m.wikipedia.org/wiki/Certifica…

Reply | Reply to original comment on twitter.com 2020-01-17 14:55
@rigo@mamot.fr

Oh, going back from the approved to the experimental only because of trend issues. The functionality required always existed in SOAP-style via xml sig and xml enc, xacml & saml. Not that I like SOAP-style, but we need to remain fair IMHO

Reply | Reply to original comment on twitter.com 2020-01-17 16:04
Dan Brickley

I don’t believe I was unfair

Reply | Reply to original comment on twitter.com 2020-01-17 17:10
@rigo@mamot.fr

Nothing that furthers Linked data can ever be unfair. 🤓 Still the XML world wasn't all dumb. That was my point.

Reply | Reply to original comment on twitter.com 2020-01-17 17:57
Andy Bennett

We had a robust way of doing this in the GOV.UK Registers project but we encountered 3 main problems.

Reply | Reply to original comment on twitter.com 2020-01-17 18:02
Andy Bennett

1) Publishers really didn't like the idea of being on the hook for absolutely everything. Transitive trust scared them silly. It's one thing to get something via https from gov.uk. It's quite another to be able to prove to someone else that you got it there.

Reply | Reply to original comment on twitter.com 2020-01-17 18:03
Andy Bennett

1a) What exactly does the signature tell you?
+ That the statement is correct? Mistakes happen.
+ That the provenance is correct? Leaks of sensitive data happen but this means that anyone with a copy can republish it with your signature intact.
+ That a process was followed?...

Reply | Reply to original comment on twitter.com 2020-01-17 18:04
Andy Bennett

2) Consumers didn't understand the idea of verifying the signature. They got it from the source and they were happy. They didn't understand the threat model and trusted their data suppliers.

Reply | Reply to original comment on twitter.com 2020-01-17 18:05
Andy Bennett

2a) Consumers didn't know how to verify the signature. It was extra work so they didn't bother. Once they had the data they'd done enough. The few people we saw who tried to use the signatures didn't use them robustly.

Reply | Reply to original comment on twitter.com 2020-01-17 18:06
Andy Bennett

3) Identity
How do you know who each key belongs to? What is each key for?
The underlying certificate problem is still not solved well enough for the signatures to mean much other than in very limited situations between a small number of parties.

Reply | Reply to original comment on twitter.com 2020-01-17 18:07
Andy Bennett

3a) How do you balance the concerns of key reputation vs. key longevity? How do you protect the identity of the underlying Civil Servants but still include accountability and the right granularity of audit trail?

Reply | Reply to original comment on twitter.com 2020-01-17 18:08
Andy Bennett

There are a number of problems with embedding the signature inside the data structure being signed. There has been a lot of fallout in the XML communities (especially SAML) about this and the JSON communities are just starting to rediscover the issues.

Reply | Reply to original comment on twitter.com 2020-01-17 18:10
Andy Bennett

"I have proof that this exact data came from this specific website at this precise time."

It's about risk and trust. For the vast majority of use cases, being able to point to a snapshot at archive.org is usually sufficient.

Reply | Reply to original comment on twitter.com 2020-01-17 18:13
Terence Eden

Exactly! You have been far more eloquent than I could have been.

Reply | Reply to original comment on twitter.com 2020-01-17 18:25
Endless Mason

Also JWT should be discouraged. Mostly because of foot guns:

paragonie.com/blog/2017/03/j…

Reply | Reply to original comment on twitter.com 2020-01-17 18:53
Jeni Tennison

@johnlsheridan might have something to say on this. See also theodi.org/article/archan… which isn’t quite the same, but relevant

Reply | Reply to original comment on twitter.com 2020-01-18 07:17
Roberto Polli

@RosalieMarshall We started a discussion on that inside the http-wg and support from other gov depts is welcome.

Reply | Reply to original comment on twitter.com 2020-01-18 09:22
Roberto Polli

The http-wg thread is lists.w3.org/Archives/Publi… if you want to talk on that, I'm available.

Reply | Reply to original comment on twitter.com 2020-01-18 12:04
@pndc

Some of the clumsiness in deciding what specific byte stream to sign seems to come down to the use of JSON. BitTorrent solves that by using bencode: part of the protocol relies upon an arbitrary structure being round-trippable and hashable back to the same value.

Reply | Reply to original comment on twitter.com 2020-01-18 13:23
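To illustrate the bencode point: a minimal encoder (Python standard library only, written for this example rather than taken from any BitTorrent client) shows why the format round-trips to the same bytes. Dictionary keys are always emitted in sorted byte order and there is no optional whitespace, so the same structure always produces the same byte stream. Bencode has no floats, booleans or null, so JSON data would not map onto it one-to-one.

```python
# Minimal bencode encoder (integers, strings, lists, dicts), for illustration.
def bencode(value) -> bytes:
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings are length-prefixed: <length>:<bytes>
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for k, v in value.items():
            if isinstance(k, str):
                k = k.encode("utf-8")
            if not isinstance(k, bytes):
                raise TypeError("bencode dict keys must be strings")
            items.append((k, v))
        # Sorting keys as raw bytes is what makes the encoding canonical.
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


a = bencode({"text": "hi", "id": 123})
b = bencode({"id": 123, "text": "hi"})
print(a)       # b'd2:idi123e4:text2:hie'
print(a == b)  # True
```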
Hacker News

API responses should be signed: shkspr.mobi/blog/2020/01/w… Comments: news.ycombinator.com/item?id=220740…

Reply | Reply to original comment on twitter.com 2020-01-20 07:10
raupach

Good question, how does one verify the response of a JSON API? Always thought digital certificates are enough to verify the sender. shkspr.mobi/blog/2020/01/w…

Reply | Reply to original comment on twitter.com 2020-01-20 07:42
Gregory Magarshak

So basically you are talking about this:

https://www.w3.org/TR/vc-data-model/

Reply 2020-01-20 07:50
1. @edent
  
  Sort of. Although VC is now Verifiable Credentials. And not all API responses are credential-like.
  
  Reply 2020-01-20 08:18
Alyssa Ross

Perhaps it was a real Tweet which was subsequently deleted?

This is exactly why Twitter can’t (or at least probably shouldn’t) implement signed API responses. A nice property of Twitter as it stands is that, if you delete a post, it is no longer possible to prove that you ever posted it. Your best option would be to use something like the Wayback Machine, but even then you can’t prove it.

If they signed their API responses, then suddenly deleting a post would become less effective. If you’d captured the signature, you could prove that the post did exist with that content. This isn’t necessarily desirable (see non-repudiation). In fact, this is something that many secure communication systems are designed to make impossible (TCP, Axolotl (Signal Protocol), etc.).

Whether Twitter should offer Delete functionality across the board is another discussion, but while they do, implementing signed API responses would hobble it.

Reply 2020-01-20 15:51
AlisonW

Coincidentally I recently saw a heavily-commented-upon tweet which looked a bit unbelievable so I searched through the alleged OT's stream to find if it was actually posted by them. It wasn't. Two things arise from that though; firstly that in many ways it ceased to matter - it now had a life of its own and accuracy had ceased to be relevant - and secondly that there was no obvious way to correct everyone else's misunderstanding.
The cartoon about "someone is wrong on the internet and I must tell them" might well be accurate (certainly is for me on occasion!) but there is little-to-nothing which can be done. I believe the similar situation applies with data as, unless the original source promulgates the security as an integral component of the data, it can be separated and lost far too easily.

Reply 2020-01-21 01:48
John Bob

Seems like the perfect use-case for HTTP headers (e.g., Slack does this).

The header (e.g. something like X-Signature) would contain the signature for the request body. (Note that you have to format the JSON response exactly the same way for the signature to work – simple way would be “sorted keys, zero whitespace”)

Reply 2020-01-21 08:15
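A sketch of the header approach described in the last comment, assuming Ed25519 via the third-party `cryptography` package. `X-Signature` is the hypothetical header name from the comment, and `canonical_body`, `build_response` and `verify_response` are illustrative helpers; standardised designs such as the IETF's HTTP Message Signatures (RFC 9421) cover this in far more detail.

```python
# Sketch: sign a canonical JSON body and carry the signature in a response header.
# Assumes the third-party `cryptography` package; "X-Signature" is hypothetical.
import base64
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonical_body(data) -> bytes:
    """'Sorted keys, zero whitespace', as suggested in the comment."""
    return json.dumps(
        data, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def build_response(data, key: Ed25519PrivateKey):
    body = canonical_body(data)
    headers = {
        "Content-Type": "application/json",
        "X-Signature": base64.b64encode(key.sign(body)).decode("ascii"),
    }
    return headers, body


def verify_response(headers: dict, body: bytes, public_key: Ed25519PublicKey) -> bool:
    try:
        signature = base64.b64decode(headers["X-Signature"], validate=True)
        # Re-canonicalise so harmless reformatting of the body doesn't break verification.
        public_key.verify(signature, canonical_body(json.loads(body)))
        return True
    except (KeyError, ValueError, InvalidSignature):
        return False


key = Ed25519PrivateKey.generate()
pub = key.public_key()
headers, body = build_response({"text": "hi", "id": 123}, key)
pretty = json.dumps(json.loads(body), indent=2).encode("utf-8")
print(verify_response(headers, body, pub))                            # True
print(verify_response(headers, pretty, pub))                          # True
print(verify_response(headers, body.replace(b"hi", b"ho"), pub))      # False
```

Because the verifier re-canonicalises, the signature vouches for the parsed data rather than the exact bytes received; every party then has to agree on how numbers and strings are serialised.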