A UTF-8 Aware substr_replace (for use in App.net)

by @edent | Read ~1,043 times.

So, I stayed up bashing my head against a brick wall all last night! PHP's core string functions aren't (yet) UTF-8 aware.

This is a replacement for substr_replace which should work on UTF-8 strings:

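function utf8_substr_replace($original, $replacement, $position, $length)
{
    // Everything before the substring, counted in characters rather than bytes.
    $startString = mb_substr($original, 0, $position, "UTF-8");
    // Everything after the substring, through to the end of the string.
    $endString = mb_substr($original, $position + $length, mb_strlen($original, "UTF-8"), "UTF-8");

    $out = $startString . $replacement . $endString;

    return $out;
}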

Take this typical string from App.net:

» Hello @bob how are you?

According to App.net's entities, @bob occurs at position 9 and has a length of 3.

Normally, we would just use substr_replace.

However, PHP's standard string functions count bytes, not characters, and a non-ASCII character like "»" takes two bytes in UTF-8. So PHP thinks that the position of @bob is 10.

Arse.
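
To see the mismatch for yourself, here is a minimal sketch comparing the byte-based functions with their multibyte cousins on the example string. It assumes the mbstring extension is loaded, and "alice" is just a made-up replacement for illustration. The "»" is written as its raw UTF-8 bytes so the snippet doesn't depend on how your file is saved.

```php
<?php
// "\xC2\xBB" is the two-byte UTF-8 encoding of "»".
$post = "\xC2\xBB Hello @bob how are you?";

// Byte-based functions see the extra byte...
echo strlen($post), "\n";                        // 26 (bytes)
echo strpos($post, "bob"), "\n";                 // 10 (byte offset)

// ...while the multibyte functions count characters.
echo mb_strlen($post, "UTF-8"), "\n";            // 25 (characters)
echo mb_strpos($post, "bob", 0, "UTF-8"), "\n";  // 9  (character offset)

// Feeding the character position (9) to the byte-based substr_replace
// lands one byte too early: it eats the "@" and leaves a stray "b".
$broken = substr_replace($post, "alice", 9, 3);
echo $broken, "\n";                              // » Hello aliceb how are you?
```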

So, given we have the position of the substring, and its length, we can use PHP's multibyte functions to split the string in two.

First,

$startString = mb_substr($originalString, 0, $position, "UTF-8");

Gives us:

» Hello @

Secondly,

$endString = mb_substr($originalString, $position + $length, mb_strlen($originalString, "UTF-8"), "UTF-8");

Gives us:

 how are you?

Finally, we stitch them back together:

$newString = $startString . $replacement . $endString;
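
Putting the steps together, here is a rough end-to-end sketch you can run. It repeats the function so the snippet stands alone, and again assumes mbstring is available. The replacement name "alice" and the second test string with a three-byte "€" are my own made-up examples, not anything from App.net.

```php
<?php
function utf8_substr_replace($original, $replacement, $position, $length)
{
    // Split on character offsets, not byte offsets.
    $startString = mb_substr($original, 0, $position, "UTF-8");
    $endString = mb_substr($original, $position + $length, mb_strlen($original, "UTF-8"), "UTF-8");

    return $startString . $replacement . $endString;
}

// "\xC2\xBB" is "»" (two bytes in UTF-8).
$post = "\xC2\xBB Hello @bob how are you?";
$fixed = utf8_substr_replace($post, "alice", 9, 3);
echo $fixed, "\n";   // » Hello @alice how are you?

// "\xE2\x82\xAC" is "€" (three bytes in UTF-8); "bob" starts at character 7.
$price = "\xE2\x82\xAC5 to @bob";
$fixedPrice = utf8_substr_replace($price, "alice", 7, 3);
echo $fixedPrice, "\n";   // €5 to @alice
```

Because the split is done in characters, it doesn't matter whether the characters in front of the substring are one, two, or three bytes wide.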
