You just tweeted this, so I read it for the first time, and I'd like to call you out on the "5ms" statement.
Human reaction to delay is not linear; the difference between 98ms and 103ms (close to the fusion threshold) is far more likely to be noticed than the difference between 25ms and 30ms. This means that, depending on the rest of the processing in a request, it can be worthwhile optimising out 5ms wherever it can be optimised out, so as to bring the total request down from 102ms to 97ms (taking it below a perceptual boundary for most viewers).
Further, if you have anything quantising time for you, the effect of that 5ms optimisation can be magnified by crossing a quantisation boundary - the difference between 32ms and 37ms is small, but on a 60 Hz display (one refresh roughly every 16.7ms), that's also the difference between a consistent 33ms per frame, and variation between 50ms and 33ms per frame depending on exactly where in the external 60 Hz clock cycle you hit your 37ms target.
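To make the quantisation arithmetic concrete, here is a minimal Python sketch. It assumes a simplified model that is not spelled out above: each frame's work starts exactly on a refresh tick, and the result is shown on the first tick at or after the work finishes. The function name `displayed_frame_ms` is made up for illustration. Work that is not aligned to the refresh clock would produce the mix of 33ms and 50ms frames described above, which this sketch does not model.

```python
import math


def displayed_frame_ms(work_ms: float, refresh_hz: int = 60) -> float:
    """Time from a frame's start to the refresh tick that can show it, in ms.

    Assumption (simplified model): work starts exactly on a refresh tick.
    """
    # Whole refresh ticks needed to cover the work; a frame that misses a
    # tick has to wait for the next one.
    ticks = math.ceil(work_ms * refresh_hz / 1000)
    return ticks * 1000 / refresh_hz


for work_ms in (32, 37):
    print(f"{work_ms}ms of work -> {displayed_frame_ms(work_ms):.1f}ms per frame")
```

Under this model, 32ms of work fits in two refresh ticks (about 33.3ms), while 37ms needs three (50ms), so a 5ms difference in work becomes a difference of roughly 16.7ms on screen.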
I suspect that "pixel-perfect" design is similar; it's not that any one bit of pixel-perfection makes a particular difference, it's that the combined effect of many small improvements is non-linear, such that a lot of small improvements add up to a much bigger improvement in feel than you would expect.