"Alexa… timer for fifteen minutes."
The problem with the English language is that it's full of homophones, or near-homophones. Fifteen and fifty sound basically the same; even humans have a hard time telling them apart. So it's no wonder that voice assistants also struggle.
Recently, I've noticed that my wife and I have adopted a very specific accent when talking to our Alexa. Certain consonants are emphasised, phonemes are enunciated with precision, and the pauses between words subtly lengthened - all to ensure the artificial "intelligence" can catch our drift.
"Alexa… turn on the downstairs… lights."
A decade ago I noticed that some voice-to-text services worked better if I spoke with a generic American accent rather than my delightful British one. It seems things haven't got much better in the intervening years. Maybe voice communication is just too hard a problem to solve? It requires such massive amounts of context and symbolic awareness that computers simply aren't there yet.
So we adapt. We adapt our speech to better fit our tools. Just like American toddlers adopt British accents from watching Peppa Pig, children around the world will grow up learning that you have to speak clearly in order to be understood by humdrum machines.
No human would speak to another human like this, would they?
Somehow, we've internalised the idea that the way computers listen to us is fundamentally different from the way that humans listen to us.
Perhaps this is the answer - we need to adopt a specific mode of speaking, lest we mistake robots for people.