Yes, pre-composed characters (as in the NFC normal form) may be trickier to find in text, since the same visible character can also be stored as a base letter followed by combining marks. I think that's why the Unicode docs devote a whole section/chapter to sorting, which involves text comparison just like searching does. I believe that the composed canonical form, NFC (https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms), is usually the most concise representation, but it's probably best to always rely on library code for text comparisons rather than byte-by-byte comparisons (or strncmp).
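
To illustrate the point about byte-by-byte comparison, here is a minimal Python sketch using the standard library's unicodedata module. The helper names canonically_equal and find_normalized are made up for this example, and normalizing both sides to NFC is just one reasonable choice (NFD would work equally well as long as both strings get the same form).

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def find_normalized(haystack: str, needle: str) -> int:
    """Search after normalizing both sides to NFC (hypothetical helper).

    The returned index refers to the normalized haystack, which may differ
    from the index in the original string.
    """
    h = unicodedata.normalize("NFC", haystack)
    n = unicodedata.normalize("NFC", needle)
    return h.find(n)


composed = "caf\u00e9"      # "e with acute" as one pre-composed code point
decomposed = "cafe\u0301"   # plain "e" followed by a combining acute accent

# The raw bytes differ, so a strncmp-style comparison reports a mismatch.
bytes_match = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization the two spellings compare equal.
normalized_match = canonically_equal(composed, decomposed)

# A naive search for the composed form misses the decomposed text.
naive_hit = ("un " + decomposed + " noir").find(composed)
normalized_hit = find_normalized("un " + decomposed + " noir", composed)
```

This only covers canonical equivalence. Case-insensitive or locale-aware matching needs more than normalization, which is another reason to lean on a proper library.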