A UTF-8 Aware substr_replace (for use in App.net)

So, I stayed up bashing my head against a brick wall all last night! PHP's string functions aren't (yet) UTF-8 aware.

This is a replacement for substr_replace which should work on UTF-8 strings:

    function utf8_substr_replace($original, $replacement, $position, $length)
    {
        // Everything before the replaced span, counted in characters rather than bytes.
        $startString = mb_substr($original, 0, $position, "UTF-8");
        // Everything after the replaced span, through to the end of the string.
        $endString = mb_substr($original, $position + $length, mb_strlen($original, "UTF-8"), "UTF-8");

        $out = $startString . $replacement . $endString;

        return $out;
    }

Take this typical string from App.net:

» Hello @bob how are you?

According to App.net's entities, the name in @bob (the part just after the @) occurs at position 9, counting characters from zero, and has a length of 3.

Normally, we would just use substr_replace.

However, PHP's standard string functions count bytes, not characters, and a character like "»" takes two bytes in UTF-8 (other characters can take up to four). So PHP thinks that position is 10.

Arse.
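To see the mismatch for yourself, here is a minimal sketch. It assumes the mbstring extension is loaded and that the file is saved as UTF-8; the variable names and the replacement text "alice" are made up for illustration.

```php
<?php
// The sample post from above (this source file is assumed to be UTF-8 encoded).
$post = "» Hello @bob how are you?";

// Byte-oriented functions: "»" occupies two bytes in UTF-8.
$byteLength = strlen("»");                          // 2
$byteOffset = strpos($post, "bob");                 // 10

// Multibyte functions count characters instead.
$charLength = mb_strlen("»", "UTF-8");              // 1
$charOffset = mb_strpos($post, "bob", 0, "UTF-8");  // 9

// substr_replace works on bytes, so the character offset 9 lands one byte
// too early and swallows the "@" while leaving the final "b" behind.
$broken = substr_replace($post, "alice", $charOffset, 3);

echo $broken, "\n"; // » Hello aliceb how are you?
```

The more multibyte characters there are before the mention, the further out the byte offset drifts.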

So, given we have the position of the substring, and its length, we can use PHP's multibyte functions to split the string in two.

First,

    $startString = mb_substr($originalString, 0, $position, "UTF-8");

Gives us:

» Hello @

Secondly,

    $endString = mb_substr($originalString, $position + $length, mb_strlen($originalString, "UTF-8"), "UTF-8");

Gives us:

 how are you?

Finally, we stitch them back together:

    $newString = $startString . $replacement . $endString;
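Putting the three steps together on the sample string looks roughly like this. It is a sketch rather than anything official: the function is repeated so the snippet runs on its own, the replacement text "alice" is invented for the example, and it assumes the mbstring extension is available.

```php
<?php
// Same approach as the function at the top of the post, repeated here
// so this snippet is self-contained.
function utf8_substr_replace($original, $replacement, $position, $length)
{
    // Characters before the span being replaced.
    $startString = mb_substr($original, 0, $position, "UTF-8");
    // Characters after the span, through to the end of the string.
    $endString = mb_substr($original, $position + $length, mb_strlen($original, "UTF-8"), "UTF-8");

    return $startString . $replacement . $endString;
}

$post = "» Hello @bob how are you?";

// Position 9 and length 3 are the values quoted for the sample string above.
$result = utf8_substr_replace($post, "alice", 9, 3);

echo $result, "\n"; // » Hello @alice how are you?
```

In real use the replacement would more likely be a link to the user's profile, but any string works.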
