Why don't video calls have stereo audio?
This is probably me being a bit dense. I'm on a video call with two other people. Alice is on the left of my screen, Bob is on the right. Why isn't the audio in stereo?
(Zoom lets you send stereo audio - but only if you have a stereo microphone. Whereas I'm talking about individuals sending mono and receiving in stereo.)
Every video conference system I've used delivers the audio in mono. That's fine when one person is speaking, but gets really muddy and confusing when multiple people are speaking. So why not deliver stereo audio? Put Alice in my left ear and Bob in my right.
It would improve the audio quality, make it easier to hear multiple participants, and give a more immersive feel to tedious calls.
Individual audio would also mean that I could control the volume of participants with dodgy mics and reduce the bass on those with boomy voices. What's not to like?
So, what are the reasons not to do this?
Bandwidth? Opus, the audio codec used by most systems, uses about 16kbps for mono speech. So, call it 32kbps for stereo.
A 720p video call, according to Zoom, requires 1.5Mbps. Adding another audio stream - or even half a dozen - would add a negligible amount of data. So it can't be that.
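As a rough back-of-the-envelope check, here is that arithmetic as a small sketch. The two bitrates are the figures quoted above; the function name is made up for illustration, and real calls vary with codec settings and network conditions.

```typescript
// Figures quoted in the post: roughly 16 kbps per mono Opus speech stream,
// and roughly 1.5 Mbps (1500 kbps) for a 720p video call.
const OPUS_MONO_KBPS = 16;
const VIDEO_720P_KBPS = 1500;

// Extra audio bandwidth, as a percentage of the video bitrate,
// for a given number of additional mono audio streams.
// (Hypothetical helper, named for this example only.)
function extraAudioPercent(extraStreams: number): number {
  return (extraStreams * OPUS_MONO_KBPS * 100) / VIDEO_720P_KBPS;
}

console.log(extraAudioPercent(1).toFixed(1)); // one extra stream
console.log(extraAudioPercent(6).toFixed(1)); // half a dozen extra streams
```

On those numbers, one extra stream is about 1% on top of the video, and six extra streams is still under 7%.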
Computational complexity? If the client (the web browser) is organising the order of the video on screen, it can organise the audio. There's a stereo panning API (the Web Audio StereoPannerNode) built right into the browser.
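A minimal sketch of how a client might tie audio position to screen position. The Web Audio `StereoPannerNode` exposes a `pan` parameter running from -1 (full left) to +1 (full right); everything else here (the `PannerLike` interface, the function names, the tile positions) is a hypothetical stand-in so the example runs outside a browser.

```typescript
// Minimal shape of a Web Audio StereoPannerNode that this sketch touches.
// Declared locally (an assumption for this example) so it also runs outside a browser.
interface PannerLike {
  pan: { value: number };
}

// Map a video tile's horizontal centre (in pixels) to a pan value in [-1, 1]:
// -1 is fully left, 0 is centre, +1 is fully right.
function panForTile(tileCenterX: number, viewportWidth: number): number {
  if (viewportWidth <= 0) return 0;
  const pan = (tileCenterX / viewportWidth) * 2 - 1;
  return Math.min(1, Math.max(-1, pan));
}

// Point a participant's panner at wherever their tile currently sits.
function positionParticipant(
  panner: PannerLike,
  tileCenterX: number,
  viewportWidth: number
): void {
  panner.pan.value = panForTile(tileCenterX, viewportWidth);
}

// Hypothetical layout: Alice's tile is centred a quarter of the way across
// a 1280px window, Bob's three quarters of the way across.
const alice: PannerLike = { pan: { value: 0 } };
const bob: PannerLike = { pan: { value: 0 } };
positionParticipant(alice, 320, 1280);
positionParticipant(bob, 960, 1280);
console.log(alice.pan.value, bob.pan.value);
```

In a browser, the `PannerLike` objects would be the nodes returned by `AudioContext.createStereoPanner()`, one per remote participant, each sitting between that participant's audio source node and the context's destination. Calling `positionParticipant` again whenever the layout re-orders would keep the sound following the picture.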
If the server is sending out the video pre-arranged - then is it really a chore to mux a stereo stream rather than mono? Maybe it's a bit of work, but that's nothing compared to re-encoding video.
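To give a feel for how little work the mixing itself is, here is a rough sketch of turning several mono streams into one stereo pair, with a per-participant volume thrown in. It assumes raw PCM samples are already decoded and uses equal-power panning; the `Source` type and `mixToStereo` function are invented for this example and are not any particular service's implementation.

```typescript
// One participant's decoded mono audio plus where and how loud to place it.
// (Hypothetical type for this sketch.)
interface Source {
  samples: Float32Array; // mono PCM, values in [-1, 1]
  pan: number;           // -1 (full left) .. +1 (full right)
  gain: number;          // per-participant volume, 1 = unchanged
}

// Mix any number of mono sources into a left and a right channel.
// Uses equal-power panning: left = cos(angle), right = sin(angle),
// with the angle running from 0 (full left) to pi/2 (full right).
// No limiter is applied, so loud overlapping speakers could exceed [-1, 1].
function mixToStereo(
  sources: Source[],
  length: number
): { left: Float32Array; right: Float32Array } {
  const left = new Float32Array(length);
  const right = new Float32Array(length);
  for (const src of sources) {
    const angle = ((src.pan + 1) / 2) * (Math.PI / 2);
    const gainL = Math.cos(angle) * src.gain;
    const gainR = Math.sin(angle) * src.gain;
    const n = Math.min(length, src.samples.length);
    for (let i = 0; i < n; i++) {
      left[i] += src.samples[i] * gainL;
      right[i] += src.samples[i] * gainR;
    }
  }
  return { left, right };
}

// Alice hard left at full volume, Bob hard right at half volume.
const mixed = mixToStereo(
  [
    { samples: new Float32Array([1, 1]), pan: -1, gain: 1 },
    { samples: new Float32Array([1, 1]), pan: 1, gain: 0.5 },
  ],
  2
);
console.log(mixed.left[0], mixed.right[0]);
```

That is a couple of multiplications and additions per sample per participant, which is tiny next to decoding and re-encoding video.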
Patents? I know software patents are ridiculous - but surely stereo audio isn't held by an IP Troll, right?
Do users find it confusing? Maybe. But lots of podcasts have stereo audio with a host in each ear. Every TV show is broadcast in stereo (or better). I think people could handle it. And, if not, have a big ol' MONO button on screen.
Come on Zoom, Skype, Teams, Meet, Jitsi et al! Give the people what they want! An audio feed for each ear!
Except...
Now, I'm being slightly disingenuous here. There are some video call services which use spatial audio. Here's the Dolby service:
The HighFidelity.com service also does spatial audio:
I think both of those are much more immersive. Far easier to listen to. I'm not sure if I want to put my work conferences through a 5.1 surround sound system just yet - but even basic stereo is a vast improvement on boring mono.
So what's stopping the more mainstream services from offering this?
Eugen said on mastodon.social:
@Edent Depends entirely on the app. In TeamSpeak you can configure sound from different participants to come from different angles ("3D sound").
James O'Malley said on twitter.com:
Man, every day your blog posts are interesting. Also gaming sorted this years ago: PSN and Xbox Live let you mute individual players, which means you must get a separate audio stream for each.
Ben Smith says:
Guesses because what is a question on the internet without a dude giving low-expertise answers with supreme confidence.
left-right pan on screen is too small relative to the 180° span of audio and 'stretching' it so 5 or 6 people are spread evenly across that spectrum would 'position' them too far from on-screen location.
apps re-order people based on who's speaking or just re-joined which makes audio positioning more confusing.
extended periods listening to audio mainly from one side is fatiguing (we tried slight spatial panning for our podcast but it didn't get great feedback and the pros advise against it).
the additional bandwidth needed to achieve stereo would be better spent on improving clarity in terms of user experience.
Philipp Hancke said on twitter.com:
this is a long story. The idea isn't new, see symonics.com/publications/p… There were some early services trying to do this. However, at some point chrome/webrtc changed its echo cancellation implementation. The new version (v3) didn't support stereo.
Philipp Hancke said on twitter.com:
As a result, Chrome mixed everything stereo to mono. See @anthmfs bug filed in 2017: bugs.chromium.org/p/webrtc/issue… You could (and it took two years to find that out) still enable it. However, that disabled echo cancellation. Until...
Owen says:
https://github.com/capnmidnight/Calla is interesting. Novel right this sec, but a cool idea. Works decently, all told
Chris Key said on twitter.com:
Better duplexing would be nice
Dan Brickley said on twitter.com:
I feel like there’s been a wider cultural backing off from stereo being appreciated- lots of cases audio comes from a single device rather than carefully spaced out pairs of speakers
Shawn Simister said on twitter.com:
Same. Seems like a lot of use cases where people want background noise instead of immersive audio.
Shawn Simister said on twitter.com:
OTOH, Apple is pushing their spatial audio feature. But I guess that’s a different audience?
macrumors.com/2021/01/14/net…
Jason Marshall says:
I've had a similar conversation a few times.
The way we do audio chat right now breaks the 'cocktail party effect', dumbing all interactions down to not much better than walkie-talkie levels.
I think if you go down this road far enough you might even be able to emulate the 'hallway track' effect of in-person conferences, by blending in other conversations into the audio.
Alex says:
Seems like it is doable, and FaceTime has been offering it since last September (I think) https://www.imore.com/how-use-spatial-audio-facetime-iphone-and-ipad