Why are QR Codes with capital letters smaller than QR codes with lower-case letters?


Take a look at these two QR codes. Scan them if you like, I promise there's nothing dodgy in them.


[Image: two QR codes, side by side.]


Left is upper-case HTTPS://EDENT.TEL/ and right is lower-case https://edent.tel/

You can clearly see that the one on the left is a "smaller" QR as it has fewer bits of data in it. Both go to the same URL; the only difference is the casing.

What's going on?

Your first thought might be that there's a different level of error-correction. QR codes can have increasing levels of redundancy in order to make sure they can be scanned when damaged. But, in this case, they both have Low error correction.

The smaller code is "Version 1" - it is 21 * 21 modules (the little squares). The larger is "Version 2" with 25 * 25 modules.

The official specification describes the versions in more detail. The smaller code should be able to hold 25 alphanumeric characters. But https://edent.tel/ is only 18 characters long. So why is it bumped into a larger code?

Using a decoder like ZXING it is possible to see the raw bytes of each code.

UPPER:

20 93 1a a6 54 63 dd 28   
35 1b 50 e9 3b dc 00 ec
11 ec 11

lower:

41 26 87 47 47 07 33 a2   
f2 f6 56 46 56 e7 42 e7
46 56 c2 f0 ec 11 ec 11  
ec 11 ec 11 ec 11 ec 11
ec 11

You might have noticed that they both end with the same sequence: ec 11. Those are "padding bytes", there because the data needs to completely fill the QR code. But - hang on! - not only does the UPPER one safely contain the text, it also has some spare padding?

The answer lies in the first couple of bytes.

Once the raw bytes have been read, a QR scanner needs to know exactly what sort of code it is dealing with. The first four bits tell it the mode. Let's convert the hex to binary and then split after the first four bits:

Type   HEX    BIN                Split
UPPER  20 93  00100000 10010011  0010 000010010011
lower  41 26  01000001 00100110  0100 000100100110

The UPPER code is 0010 which indicates it is Alphanumeric - the standard says the next 9 bits show the length of data.

The lower code is 0100 which indicates it is Byte mode - the standard says the next 8 bits show the length of data.

Type   HEX    BIN                Split
UPPER  20 93  00100000 10010011  0010 0000 10010
lower  41 26  01000001 00100110  0100 000 10010

Look at that! They both have a length of 10010 which, converted to decimal, is 18 - the exact length of the text.
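If you'd rather not do the binary by hand, here's a minimal Python sketch of that split. It assumes a small code (Versions 1 to 9), where the length field is 9 bits for Alphanumeric and 8 bits for Byte mode; the `read_header` name is just something made up for this example.

```python
# Length-field widths for QR Versions 1-9 only (an assumption of this sketch;
# bigger versions use wider length fields).
MODES = {
    0b0010: ("Alphanumeric", 9),
    0b0100: ("Byte", 8),
}

def read_header(raw: bytes) -> tuple:
    """Return (mode name, character count) from the first raw bytes."""
    bits = "".join(f"{b:08b}" for b in raw)
    name, width = MODES[int(bits[:4], 2)]        # first four bits: the mode
    return name, int(bits[4:4 + width], 2)       # next 8 or 9 bits: the length

print(read_header(bytes.fromhex("2093")))  # UPPER -> ('Alphanumeric', 18)
print(read_header(bytes.fromhex("4126")))  # lower -> ('Byte', 18)
```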

Alphanumeric uses 11 bits for every two characters; Byte mode uses (you guessed it!) 8 bits per single character.
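That difference is enough to explain the size jump. Here's a rough sketch of the arithmetic, assuming the smaller code has room for 19 bytes of data (the length of the UPPER dump above) and the header sizes described earlier. The function name is hypothetical.

```python
def bits_needed(length: int, mode: str) -> int:
    """Bits for the mode indicator, the length field, and the text itself."""
    if mode == "alphanumeric":
        # 4-bit mode + 9-bit length + 11 bits per pair (+ 6 for a leftover char)
        return 4 + 9 + 11 * (length // 2) + 6 * (length % 2)
    if mode == "byte":
        # 4-bit mode + 8-bit length + 8 bits per character
        return 4 + 8 + 8 * length
    raise ValueError(f"unknown mode: {mode}")

SMALL_CODE_BITS = 19 * 8  # assumed data capacity of the smaller code: 152 bits

for mode in ("alphanumeric", "byte"):
    bits = bits_needed(18, mode)
    verdict = "fits" if bits <= SMALL_CODE_BITS else "too big"
    print(f"{mode}: {bits} bits, {verdict}")
# alphanumeric: 112 bits, fits
# byte: 156 bits, too big
```

So the 18 lower-case characters overshoot the smaller code by just 4 bits, which is all it takes to bump them up a size.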

But why is the lower-case code pushed into Byte mode? Isn't it using letters and numbers?

Well, yes. But in order to store data efficiently, Alphanumeric mode only has a limited subset of 45 characters available. Digits, upper-case letters, and a handful of punctuation symbols: space $ % * + - . / :
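To see how that character set turns into the raw bytes from earlier, here's a sketch of an Alphanumeric encoder. It packs each pair of characters as 45 * first + second in 11 bits, then adds a terminator and the ec 11 padding. It assumes a code with 19 data bytes and does no error correction; `encode_alphanumeric` is a made-up name, not any library's API.

```python
# Position in this string is the character's value (0-44).
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

def encode_alphanumeric(text: str, capacity_bytes: int = 19) -> bytes:
    """Build the data bytes (before error correction) for a small QR code."""
    values = [ALPHANUMERIC.index(ch) for ch in text]  # ValueError on e.g. 'a'
    bits = "0010" + format(len(text), "09b")          # mode + 9-bit length
    for i in range(0, len(values) - 1, 2):
        bits += format(values[i] * 45 + values[i + 1], "011b")  # a pair
    if len(values) % 2:
        bits += format(values[-1], "06b")             # leftover single char
    if len(bits) > capacity_bytes * 8:
        raise ValueError("text too long for this code size")
    bits += "0" * min(4, capacity_bytes * 8 - len(bits))  # terminator
    bits += "0" * (-len(bits) % 8)                    # pad to a whole byte
    data = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    pad = (0xEC, 0x11)
    n = 0
    while len(data) < capacity_bytes:                 # alternate ec, 11
        data.append(pad[n % 2])
        n += 1
    return bytes(data)

print(encode_alphanumeric("HTTPS://EDENT.TEL/").hex(" "))
# 20 93 1a a6 54 63 dd 28 35 1b 50 e9 3b dc 00 ec 11 ec 11
```

That matches the UPPER dump byte for byte. Feed it the lower-case URL and it raises a ValueError, because "h" isn't in the table.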

Luckily, that's enough for a protocol, domain, and path. Sadly, no GET parameters: ? = and & aren't in the set.

So, there you have it. If you want the smallest possible physical size for a QR code which contains a URL, make sure the text is all in capital letters.


