This is probably me being a bit dense. I’m on a video call with two other people. Alice is on the left of my screen, Bob is on the right. Why isn’t the audio in stereo?
(Zoom lets you send stereo audio – but only if you have a stereo microphone. Whereas I’m talking about individuals sending mono and receiving in stereo.)
Every video conference system I’ve used delivers the audio in mono. That’s fine when one person is speaking, but gets really muddy and confusing when multiple people are speaking. So why not deliver stereo audio? Put Alice in my left ear and Bob in my right.
It would improve the audio quality, make it easier to hear multiple participants, and give a more immersive feel to tedious calls.
Individual audio would also mean that I could control the volume of participants with dodgy mics and reduce the bass on those with boomy voices. What’s not to like?
So, what are the reasons not to do this?
Bandwidth? Opus, the audio codec used by most systems, uses about 16kbps for mono speech. So, call it 32kbps for stereo.
A 720p video call, according to Zoom, requires 1.5Mbps. Adding another audio stream – or even half a dozen – would add a negligible amount of data. So it can’t be that.
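To put rough numbers on that, here's a back-of-the-envelope sketch. It only uses the two figures quoted above (16kbps for mono Opus speech and 1.5Mbps for 720p video); the function and constant names are my own, and real bitrates will vary with codec settings and network conditions.

```typescript
// Figures quoted in the post, in kilobits per second.
const MONO_SPEECH_KBPS = 16;  // Opus mono speech
const VIDEO_720P_KBPS = 1500; // Zoom's stated 720p requirement (1.5Mbps)

// Extra audio bitrate expressed as a percentage of the video bitrate.
function audioOverheadPercent(
  extraAudioKbps: number,
  videoKbps: number = VIDEO_720P_KBPS
): number {
  return (extraAudioKbps / videoKbps) * 100;
}

// Going from mono to stereo costs one extra mono channel's worth of data.
const stereoUpgrade = audioOverheadPercent(MONO_SPEECH_KBPS);

// Half a dozen extra mono streams, e.g. one per participant.
const sixStreams = audioOverheadPercent(6 * MONO_SPEECH_KBPS);

console.log(stereoUpgrade.toFixed(1) + "%"); // prints "1.1%"
console.log(sixStreams.toFixed(1) + "%");    // prints "6.4%"
```

So the stereo upgrade is about one percent on top of the video, and even six extra streams stay in single digits.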
Computational complexity? If the client (the web browser) is organising the order of the video on screen, it can organise the audio. There’s a stereo panning API (StereoPannerNode, part of Web Audio) built right into the browser.
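Here's a minimal sketch of how a client might do that, assuming each participant arrives as a separate audio stream and the page knows the left-to-right order of the video tiles. The `createMediaStreamSource` and `createStereoPanner` calls are standard Web Audio; `panForTile`, `panParticipant` and the `...Like` interfaces are names I've made up for illustration. I have no idea how any real conferencing client structures its audio.

```typescript
// Minimal structural types covering only the Web Audio members used below,
// so the sketch type-checks outside a browser. A real AudioContext should
// satisfy them.
interface NodeLike {
  connect(destination: NodeLike): unknown;
}
interface PannerLike extends NodeLike {
  pan: { value: number };
}
interface AudioContextLike {
  destination: NodeLike;
  createMediaStreamSource(stream: unknown): NodeLike;
  createStereoPanner(): PannerLike;
}

// Map a tile's position on screen (0 = leftmost) to a pan value:
// -1 is fully left, 0 is centre, +1 is fully right.
// Passing mono = true is the big ol' MONO button: everyone goes to the centre.
function panForTile(index: number, count: number, mono = false): number {
  if (mono || count <= 1) return 0;
  return -1 + (2 * index) / (count - 1);
}

// Route one participant's mono stream through a stereo panner to the speakers.
function panParticipant(
  ctx: AudioContextLike,
  stream: unknown,
  pan: number
): PannerLike {
  const source = ctx.createMediaStreamSource(stream);
  const panner = ctx.createStereoPanner();
  panner.pan.value = pan;
  source.connect(panner);
  panner.connect(ctx.destination);
  return panner;
}
```

In a browser you'd call something like `panParticipant(new AudioContext(), aliceStream, panForTile(0, 2))`, where `aliceStream` is a hypothetical `MediaStream` for Alice. With two remote participants that puts Alice hard left and Bob hard right; in practice you might want a narrower spread.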
If the server is sending out the video pre-arranged – then is it really a chore to mux a stereo stream rather than mono? Maybe it’s a bit of work, but that’s nothing compared to re-encoding video.
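For a sense of how little work that is, here's a sketch of mixing several decoded mono streams into one interleaved stereo buffer. It assumes the server already has each participant's audio as floating-point PCM samples in the range -1 to 1, and it uses an equal-power pan law, which is my choice rather than anything a particular service is known to do. `MonoSource` and `mixToStereo` are made-up names.

```typescript
// One participant's decoded mono audio plus where to place them.
interface MonoSource {
  samples: Float32Array; // mono PCM, each sample in [-1, 1]
  pan: number;           // -1 = fully left, 0 = centre, +1 = fully right
}

// Mix mono sources into one interleaved stereo buffer: [L0, R0, L1, R1, ...].
function mixToStereo(sources: MonoSource[]): Float32Array {
  // The output is as long as the longest input; shorter inputs just end early.
  const frames = sources.reduce((n, s) => Math.max(n, s.samples.length), 0);
  const out = new Float32Array(frames * 2);

  for (const { samples, pan } of sources) {
    // Equal-power pan law: map pan -1..1 onto an angle 0..pi/2.
    const angle = ((pan + 1) / 2) * (Math.PI / 2);
    const leftGain = Math.cos(angle);
    const rightGain = Math.sin(angle);
    for (let i = 0; i < samples.length; i++) {
      out[2 * i] += samples[i] * leftGain;
      out[2 * i + 1] += samples[i] * rightGain;
    }
  }

  // Hard-clamp the sum so several loud talkers can't exceed the valid range.
  for (let i = 0; i < out.length; i++) {
    out[i] = Math.max(-1, Math.min(1, out[i]));
  }
  return out;
}
```

That's a couple of multiply-adds per sample per participant, before the result goes to the encoder as a single stereo stream. A real mixer would want something gentler than a hard clamp, but the shape of the job is the same.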
Patents? I know software patents are ridiculous – but surely stereo audio isn’t held by an IP Troll, right?
Do users find it confusing? Maybe. But lots of podcasts have stereo audio with a host in each ear. Every TV show is broadcast in stereo (or better). I think people could handle it. And, if not, have a big ol’ MONO button on screen.
Come on Zoom, Skype, Teams, Meet, Jitsi et al! Give the people what they want! An audio feed for each ear!
Now, I’m being slightly disingenuous here. There are some video call services which use spatial audio. Here’s the Dolby service:
The HighFidelity.com service also does spatial audio:
I think both of those are much more immersive. Far easier to listen to. I’m not sure if I want to put my work conferences through a 5.1 surround sound system just yet – but even basic stereo is a vast improvement on boring mono.
So what’s stopping the more mainstream services from offering this?