Why are QR Codes with capital letters smaller than QR codes with lower-case letters?
Take a look at these two QR codes. Scan them if you like, I promise there's nothing dodgy in them.
Left is upper-case HTTPS://EDENT.TEL/ and right is lower-case https://edent.tel/
You can clearly see that the one on the left is a "smaller" QR as it has fewer bits of data in it. Both go to the same URL, the only difference is the casing.
What's going on?
Your first thought might be that there's a different level of error-correction. QR codes can have increasing levels of redundancy in order to make sure they can be scanned when damaged. But, in this case, they both have Low error correction.
The smaller code is "Version 1" - it is 21 * 21 modules. The larger is "Version 2" with 25 * 25 modules.
The official specification describes the versions in more detail. The smaller code should be able to hold 25 alphanumeric characters. But https://edent.tel/ is only 18 characters long. So why is it bumped into a larger code?
Using a decoder like ZXing, it is possible to see the raw bytes of each code.
UPPER:
20 93 1a a6 54 63 dd 28
35 1b 50 e9 3b dc 00 ec
11 ec 11
lower:
41 26 87 47 47 07 33 a2
f2 f6 56 46 56 e7 42 e7
46 56 c2 f0 ec 11 ec 11
ec 11 ec 11 ec 11 ec 11
ec 11
You might have noticed that they both end with the same sequence: ec 11
Those are "padding bytes" because the data needs to completely fill the QR code. But - hang on! - not only does the UPPER one safely contain the text, it also has some spare padding?
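To put numbers on that, here is a rough Python sketch that counts the bytes in each dump and measures the trailing run of alternating ec 11 bytes. The name count_padding is my own invention, and matching on the tail is a heuristic that happens to work for these two dumps. It is not how a real decoder finds the end of the data.

```python
# Raw bytes copied from the two dumps above.
UPPER = bytes.fromhex(
    "20 93 1a a6 54 63 dd 28"
    "35 1b 50 e9 3b dc 00 ec"
    "11 ec 11"
)
LOWER = bytes.fromhex(
    "41 26 87 47 47 07 33 a2"
    "f2 f6 56 46 56 e7 42 e7"
    "46 56 c2 f0 ec 11 ec 11"
    "ec 11 ec 11 ec 11 ec 11"
    "ec 11"
)

PAD = (0xEC, 0x11)  # padding alternates between these two bytes


def count_padding(codewords: bytes) -> int:
    """Length of the longest tail that is purely ec 11 ec 11 ... (heuristic)."""
    best = 0
    for n in range(1, len(codewords) + 1):
        tail = codewords[-n:]
        if all(byte == PAD[i % 2] for i, byte in enumerate(tail)):
            best = n
    return best


for name, raw in (("UPPER", UPPER), ("lower", LOWER)):
    pad = count_padding(raw)
    print(f"{name}: {len(raw)} bytes, {len(raw) - pad} used, {pad} padding")
# UPPER: 19 bytes, 15 used, 4 padding
# lower: 34 bytes, 20 used, 14 padding
```

So the upper-case text fits in 15 of the 19 bytes available, while the lower-case text needs 20, which is one more than the smaller code can hold.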
The answer lies in the first couple of bytes.
Once the raw bytes have been read, a QR scanner needs to know exactly what sort of code it is dealing with. The first four bits tell it the mode. Let's convert the hex to binary and then split after the first four bits:
Type | HEX | BIN | Split |
---|---|---|---|
UPPER | 20 93 | 00100000 10010011 | 0010 000010010011 |
lower | 41 26 | 01000001 00100110 | 0100 000100100110 |
The UPPER code is 0010 which indicates it is Alphanumeric - the standard says the next 9 bits show the length of data.
The lower code is 0100 which indicates it is Byte mode - the standard says the next 8 bits show the length of data.
Type | HEX | BIN | Split |
---|---|---|---|
UPPER | 20 93 | 00100000 10010011 | 0010 0000 10010 |
lower | 41 26 | 01000001 00100110 | 0100 000 10010 |
Look at that! They both have a length of 10010 which, converted from binary, is 18 - the exact length of the text.
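The same split can be done in a few lines of Python. This is only a sketch: read_header and the MODES table are names I made up, it only knows the two modes discussed here, and the 9-bit and 8-bit length fields apply to small codes (versions 1 to 9). Larger codes use wider length fields.

```python
# Mode indicator -> (mode name, number of bits in the length field).
# Only the two modes from this post; field widths are for versions 1-9.
MODES = {
    0b0010: ("Alphanumeric", 9),
    0b0100: ("Byte", 8),
}


def read_header(codewords: bytes) -> tuple[str, int]:
    """Return (mode name, character count) from the first two bytes."""
    bits = "".join(format(byte, "08b") for byte in codewords[:2])
    mode_name, count_bits = MODES[int(bits[:4], 2)]
    length = int(bits[4:4 + count_bits], 2)
    return mode_name, length


print(read_header(bytes.fromhex("2093")))  # ('Alphanumeric', 18)
print(read_header(bytes.fromhex("4126")))  # ('Byte', 18)
```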
Alphanumeric uses 11 bits for every two characters, Byte mode uses (you guessed it!) 8 bits per single character.
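That difference is enough to tip an 18 character URL over the edge. Here is a back-of-the-envelope sketch of the arithmetic. The function names are mine, and the capacities of 19 and 34 data bytes are simply the lengths of the two raw dumps above (Low error correction, versions 1 and 2). The 6 bits for a leftover odd character comes from the standard and doesn't matter for this URL.

```python
# Data bytes available at Low error correction, per version.
# These match the lengths of the two dumps shown earlier.
DATA_CODEWORDS_L = {1: 19, 2: 34}


def alphanumeric_bits(n: int) -> int:
    """4-bit mode + 9-bit length + 11 bits per pair (+ 6 for an odd one out)."""
    return 4 + 9 + 11 * (n // 2) + 6 * (n % 2)


def byte_bits(n: int) -> int:
    """4-bit mode + 8-bit length + 8 bits per character."""
    return 4 + 8 + 8 * n


def smallest_version(bits: int):
    """First version in the table above with room for `bits`, else None."""
    for version, codewords in sorted(DATA_CODEWORDS_L.items()):
        if bits <= 8 * codewords:
            return version
    return None


n = len("https://edent.tel/")  # 18
print(alphanumeric_bits(n), smallest_version(alphanumeric_bits(n)))  # 112 1
print(byte_bits(n), smallest_version(byte_bits(n)))                  # 156 2
```

Version 1 has 19 * 8 = 152 bits of room. Alphanumeric needs 112, so it fits with space to spare. Byte mode needs 156, which is 4 bits too many, so it gets bumped up to Version 2.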
But why is the lower-case code pushed into Byte mode? Isn't it using letters and numbers?
Well, yes. But in order to store data efficiently, Alphanumeric mode only has a limited set of 45 characters available. Digits, upper-case letters, and a handful of punctuation symbols: space $ % * + - . / :
Luckily, that's enough for a protocol, domain, and path. Sadly, no GET parameters.
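If you want to check a URL before generating a code, something like this minimal Python sketch would do it. The function names and the total_codewords parameter are my own, not from any QR library. It assumes a small code (versions 1 to 9, so a 9-bit length field) and only builds the data bytes, not the error correction or the image.

```python
# The 45 characters of Alphanumeric mode; a character's value is its index.
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def fits_alphanumeric(text: str) -> bool:
    """True if every character is in the Alphanumeric set."""
    return all(ch in ALPHANUMERIC for ch in text)


def encode_alphanumeric(text: str, total_codewords: int = 19) -> bytes:
    """Data bytes for `text`; 19 is the size of the smaller dump in this post."""
    if not fits_alphanumeric(text):
        raise ValueError("text contains characters outside the Alphanumeric set")
    bits = "0010" + format(len(text), "09b")  # mode, then 9-bit length
    for i in range(0, len(text) - 1, 2):
        # Each pair becomes one number from 0 to 2024, stored in 11 bits.
        value = 45 * ALPHANUMERIC.index(text[i]) + ALPHANUMERIC.index(text[i + 1])
        bits += format(value, "011b")
    if len(text) % 2:
        bits += format(ALPHANUMERIC.index(text[-1]), "06b")  # odd one out: 6 bits
    capacity = 8 * total_codewords
    if len(bits) > capacity:
        raise ValueError("text does not fit in this many codewords")
    bits += "0" * min(4, capacity - len(bits))  # terminator
    bits += "0" * (-len(bits) % 8)              # round up to a whole byte
    data = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    pad = (0xEC, 0x11)
    n = 0
    while len(data) < total_codewords:          # fill the rest with padding
        data.append(pad[n % 2])
        n += 1
    return bytes(data)


print(fits_alphanumeric("https://edent.tel/"))      # False
print(fits_alphanumeric("HTTPS://EDENT.TEL/?Q=1"))  # False
print(encode_alphanumeric("HTTPS://EDENT.TEL/").hex(" "))
# 20 93 1a a6 54 63 dd 28 35 1b 50 e9 3b dc 00 ec 11 ec 11
```

The output is byte-for-byte the same as the UPPER dump from the decoder, which is a nice sanity check that the 11-bits-per-pair explanation is right.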
So, there you have it. If you want the smallest possible physical size for a QR code which contains a URL, make sure the text is all in capital letters.
Should be the reverse damn it, ruins any jokes about uppercase QR codes being bigger…
@blog I used this a while ago to make a QR code that needed to be tiny, saved a noticeable bit of space 👍🏻
@blog yeh, so many people have no clue about details like this. But I have always done this for URLs in QR codes for this very reason.
@blog 11 bits for a character encoding with 45 symbols, that's clever. (45*45 = 2025 possible two-character symbols, out of 2048 representable by 11 bits)
@Edent
Very cool observation.
Probably a subtle effect of compressibility. Looking at the ASCII table, uppercase letters up to "m" have one less "1" bit than lowercase.
Combined with statistical frequency of the first 16 letters vs the last 10 could make a difference.
Cool! Will remember to do this!
@Edent this is one of the reasons bitcoin switched to a new address format. even though base58 mixed-case addresses were shorter, their qr codes were bigger than the new case-insensitive ones based on base32.
@Edent of course, this only works if the URL has no path after the origin, or if the server treats paths as case-insensitive. So test the code before printing it on a card/poster/something else.