Strange Encoding Errors in TOTP QR Codes
Not really a security issue, but one which I thought was worth highlighting. It shows the peril of slightly vague specifications.
When you scan a 2FA token into your authenticator app via QR code, you get presented with a bunch of information about your account. This lets you store things like the issuer and the account name.
I recently scanned a code, and it displayed my name as Terence+Eden. Which was a bit weird.
Checking the raw output of the code shows that the underlying data is:
otpauth://totp/
Private%20Bank%20Account:
Your+Name+Here?
secret=abcdefghijklmno&
digits=6&
issuer=Cayman%20Island%20Bank&
algorithm=SHA1&
period=30
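A quick way to see where Terence+Eden comes from is to decode the label both ways. This is a minimal Python sketch using only the standard library's urllib.parse. It assumes (I don't know what any particular authenticator actually uses) that the app treats the label like a URI path and applies plain percent-decoding, which leaves a + exactly as it is.

```python
from urllib.parse import urlsplit, unquote, unquote_plus

# The otpauth URI from above, joined onto a single line.
uri = ("otpauth://totp/Private%20Bank%20Account:Your+Name+Here?"
       "secret=abcdefghijklmno&digits=6&issuer=Cayman%20Island%20Bank&"
       "algorithm=SHA1&period=30")

# urlsplit treats "totp" as the host, so the label is the path minus its leading "/".
label = urlsplit(uri).path.lstrip("/")
issuer, account = label.split(":", 1)

# Plain percent-decoding: %20 becomes a space, but "+" is left untouched.
print(unquote(issuer))        # Private Bank Account
print(unquote(account))       # Your+Name+Here

# Form-style decoding treats "+" as a space as well.
print(unquote_plus(account))  # Your Name Here
```

So a generator that writes + for a space only gets the expected result if the app happens to use form-style decoding on the label.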
The One Time Password information is encoded as a URI. Generally speaking, a URI cannot contain spaces, so they have to be encoded. But how should they be encoded?
The HTML specification, RFC 1866, talks about when a space should be encoded as +.
The form field names and values are escaped: space characters are replaced by +, and then reserved characters are escaped … that is, non-alphanumeric characters are replaced by %HH
8.2.1. The form-urlencoded Media Type
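Python's standard library happens to expose both behaviours side by side, which makes the difference easy to see. In this minimal sketch, quote does generic percent-encoding and quote_plus does the form encoding described above. The last two lines show why %20 is the safer choice: it decodes to a space whichever decoder the receiving end uses, while + only does so with a form-aware decoder.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "Terence Eden"

# Generic URI percent-encoding: a space becomes %20.
print(quote(name))       # Terence%20Eden

# HTML form encoding (application/x-www-form-urlencoded): a space becomes +.
print(quote_plus(name))  # Terence+Eden

# %20 survives either decoder...
print(unquote(quote(name)))       # Terence Eden
print(unquote_plus(quote(name)))  # Terence Eden

# ...but + only turns back into a space with the form-aware decoder.
print(unquote(quote_plus(name)))  # Terence+Eden
```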
That is, a space becomes a plus only when it is encoded as submitted form data, not as a general rule for every URI.
This is slightly confused by another section:
The keywords are escaped according … and joined by +. For example … the user provides the keywords apple and berry, then the user agent must access the resource http://host/index?apple+berry
7.5. Queries and Indexes
But, in the case of OTP, the user's name is not multiple keywords. It is a single datum.
So it should use %20. That's pretty obvious, right?
Well… not quite. You'll find many people (correctly) saying that a + in the path of a URI doesn't have to be percent-encoded.
This example is perfectly valid:
https://example.com/My+Cool+Page.php?username=Terence%20Eden
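The same split shows up when a library decodes that example. In this minimal Python sketch, the path is percent-decoded with unquote, which keeps + as a literal character, while parse_qs applies form decoding to the query string, so both %20 and + become spaces there.

```python
from urllib.parse import urlsplit, unquote, parse_qs

url = "https://example.com/My+Cool+Page.php?username=Terence%20Eden"
parts = urlsplit(url)

# In the path, "+" is an ordinary character, so percent-decoding keeps it.
page = unquote(parts.path)
print(page)      # /My+Cool+Page.php

# In the query, parse_qs uses form decoding: %20 becomes a space.
username = parse_qs(parts.query)["username"][0]
print(username)  # Terence Eden

# Form decoding would also turn a "+" in the query into a space.
print(parse_qs("username=Terence+Eden")["username"][0])  # Terence Eden
```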
And, as we can see in the otpauth scheme above, the user's name does come before the ? character!
OK, let's take a look at the OTP scheme standard. The original is an archived GitHub page from Google.
The label is used to identify which account a key is associated with. It contains an account name, which is a URI-encoded string, optionally prefixed by an issuer string identifying the provider or service managing that account.
So that gives us Issuer:Name. But, as we've seen above, "URI-encoded" is a little ambiguous. It goes on to say "optional spaces may precede the account name" and gives the ABNF as:
label = accountname / issuer (":" / "%3A") *"%20" accountname
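The ABNF is perhaps easier to reason about as code. This is a minimal Python sketch of a hypothetical build_label / parse_label pair that follows it: ":" or "%3A" as the separator, any number of "%20" after it, and plain percent-decoding. It assumes neither the issuer nor the account name contains a colon, and it's an illustration rather than a reference implementation.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical helper regex: separator is ":" or "%3A", then any number of "%20".
LABEL_RE = re.compile(r"^(.*?)(?::|%3A)(?:%20)*(.*)$", re.IGNORECASE)


def build_label(account, issuer=None):
    """Build issuer:accountname with spaces percent-encoded as %20 (never +)."""
    encoded_account = quote(account, safe="")
    if issuer is None:
        return encoded_account
    return quote(issuer, safe="") + ":" + encoded_account


def parse_label(label):
    """Return (issuer, account); issuer is None when there is no separator."""
    match = LABEL_RE.match(label)
    if match is None:
        return None, unquote(label)
    # Plain percent-decoding only, so a literal "+" is kept as "+".
    return unquote(match.group(1)), unquote(match.group(2))


print(build_label("Alice Smith", "Provider1"))         # Provider1:Alice%20Smith
print(parse_label("Provider1:Alice%20Smith"))          # ('Provider1', 'Alice Smith')
print(parse_label("Provider1%3A%20%20Alice%20Smith"))  # ('Provider1', 'Alice Smith')
```

Fed the label from the QR code at the top of this post, parse_label returns ('Private Bank Account', 'Your+Name+Here'), which is exactly the oddity I saw.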
But, again, that doesn't say anything about how spaces inside the account name should be encoded.
Finally, it gives some non-normative examples, including Provider1:Alice%20Smith, which shows a space being percent-encoded.
Summary
The confusion arises, I think, because the label is not part of the query string. If it were, it would be obvious that it should have percent encoding applied to it.
But because it appears before the ?, it looks like it is part of the pathname. Therefore, some encoding libraries (and some humans) get a little confused.
I contacted a few organisations who had made this mistake, and they were quick to fix it.
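If you're generating these codes, percent-encoding every space as %20 seems the safest option, since it decodes correctly whether the app treats the label as a path or as form data. This is a minimal Python sketch with a hypothetical build_otpauth_uri helper whose parameters mirror the example URI at the top of this post. It reuses the issuer for both the label prefix and the issuer parameter; that's my choice for the sketch, not something taken from any vendor's code.

```python
from urllib.parse import quote, urlencode


def build_otpauth_uri(secret, account, issuer, digits=6, period=30, algorithm="SHA1"):
    """Hypothetical helper: build an otpauth://totp/ URI with every space as %20."""
    # Label: percent-encode each part separately, keeping the ":" separator literal.
    label = quote(issuer, safe="") + ":" + quote(account, safe="")
    params = {
        "secret": secret,
        "digits": digits,
        "issuer": issuer,
        "algorithm": algorithm,
        "period": period,
    }
    # urlencode defaults to quote_plus (space -> "+"); quote_via=quote gives %20.
    query = urlencode(params, quote_via=quote)
    return "otpauth://totp/" + label + "?" + query


uri = build_otpauth_uri("abcdefghijklmno", "Terence Eden", "Cayman Island Bank")
print(uri)
```

The quote_via argument matters: left at its default, urlencode would happily emit issuer=Cayman+Island+Bank.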
AlisonW 💜🦄⚾☕🧵🎹♿🏳️🌈🇪🇺🇬🇧 said on mastodon.social:
@Edent /me sees 'TOTP' and wonders why Top of the Pops got in the conversation.
rjc says:
Why are you using HTML specification here? Why not "Uniform Resource Identifier (URI): Generic Syntax" (RFC 3986)? Unless I'm reading it wrong, it is quite clear that 'sub-delims' can be used in 'userinfo', no? https://datatracker.ietf.org/doc/html/rfc3986#appendix-A