opus – Terence Eden’s Blog

Podcasts on Floppy Disk

@edent — Sat, 05 Sep 2020 11:12:41 +0000

An old 3.5 inch floppy disk holds 1.44 MB of data. According to my calculations, that's 1,424 blocks of 1 KB, for a total of 1,458,176 Bytes. Once formatted as FAT, you end up with 1,457,664 Bytes of storage. But how much audio can a floppy hold?

(Here I mean wave based audio of human speech. It's trivial to fit more in using MIDI or speech synthesis.)

I'm going to use "A Podcast Of Unnecessary Detail" to experiment with, as this blogpost also has too much detail. The podcast is a 39MB MP3, running just over half-an-hour.

Here's the first 40 seconds of the original MP3 file, so you can hear the music, and male and female voices.

That's about 800KB, or roughly 20KB per second. A floppy can hold just over a minute of that quality audio.

Squash it down

A floppy disk holds about 11,000 Kilobits. So, to hold 1,800 seconds (30 minutes) of audio, we need to encode audio at about 6kbps (Kilobits per second)
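
Here's a rough Python sketch of that sum. The function name is made up for illustration, it uses the formatted capacity quoted above, and it ignores any overhead from the file container, so treat the answer as a ceiling rather than a target.

```python
# Usable bytes on a FAT-formatted floppy, as quoted above.
FLOPPY_BYTES = 1_457_664


def max_bitrate_kbps(duration_seconds, capacity_bytes=FLOPPY_BYTES):
    """Highest average bitrate, in kilobits per second, that still fits."""
    total_bits = capacity_bytes * 8
    return total_bits / duration_seconds / 1000


print(round(max_bitrate_kbps(30 * 60), 2))      # a round half hour
print(round(max_bitrate_kbps(33 * 60 + 8), 2))  # 33 minutes, 8 seconds
```

A round half hour gets a budget of about 6.5kbps. Stretch that to 33 minutes, 8 seconds and the budget drops to a shade under 6kbps.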

By coincidence, the Opus audio format supports encoding speech at around 6Kbps.

Here's the same sample of the podcast bounced down to mono and encoded at 6Kbps. The voices are pretty clear, but the music is extremely mushy. (The following files are in .opus format. They should play fine on Android and most desktop browsers).

Encoded using opusenc in.wav --downmix-mono --bitrate 6 out.opus

But that's still a little too big. The 33 minute podcast weighs in at 1,581,781 Bytes. Too large for a floppy.

Using the --framesize option, we can set the Framesize to 60 milliseconds. Not great for streaming, but we don't care about that, and it makes the files much smaller.

But opusenc has a few more tricks up its sleeve! Using --cvbr we can force the encoder never to go above a bitrate limit.

So, using opusenc in.wav --downmix-mono --bitrate 6 --cvbr --framesize 60 out.opus we can save 33 minutes, 8 seconds, in 1,422,676 Bytes. Enough space left over on a floppy disk for an image.
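
To check those two file sizes against the disk, here's a small sketch. The helper names are my own, and the capacity is the FAT-formatted figure from the top of the post.

```python
# Usable bytes on a FAT-formatted floppy, as quoted at the top of the post.
FLOPPY_BYTES = 1_457_664


def bytes_to_spare(file_bytes, capacity_bytes=FLOPPY_BYTES):
    """Positive means the file fits; negative means it is too big."""
    return capacity_bytes - file_bytes


def average_kbps(file_bytes, duration_seconds):
    """Average bitrate of the whole file, container overhead included."""
    return file_bytes * 8 / duration_seconds / 1000


print(bytes_to_spare(1_581_781))  # first attempt
print(bytes_to_spare(1_422_676))  # with --cvbr and --framesize 60
print(round(average_kbps(1_422_676, 33 * 60 + 8), 2))
```

The first attempt overshoots by 124,117 Bytes. The second leaves 34,988 Bytes free, and averages out at roughly 5.7kbps once the container is counted.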

But wait! There's more!

Surprisingly, the opusenc documentation is not quite telling us the whole story! You can pass any number lower than 6 to opusenc and it will try its best.

In practice, the lowest bitrate it will generate for speech is about 4Kbps. Here's the same sample:

opusenc in.wav --downmix-mono --bitrate 4 out-4.opus

What's the lowest we can go? How low before we lose all meaning?

The absolutely lowest encoding which produced any sound at all was 1.2Kbps. I warn you, this sounds awful!

opusenc in.wav --downmix-mono --bitrate 1.2 --hard-cbr out-1.2-hard.opus

This uses the --hard-cbr option which forces the encoder to a specific constant bitrate.

About the lowest you can go and still have things even vaguely intelligible was 2Kbps. Again, this sounds horrible, but it is just about possible to understand most of the speech. Even if they do sound like Daleks with low batteries!

opusenc in.wav --downmix-mono --bitrate 2 --hard-cbr out-2-hard.opus

If you were prepared to have your audio that shitty, you can just about squeeze a full hour of speech onto an old floppy disk.

So, guess what tomorrow's blog post is going to be...?

Removing default metadata from .opus files

@edent — Fri, 24 Apr 2020 11:03:00 +0000

I'm trying to create some ridiculously tiny audio files. The sort where every single byte matters.

I've encoded a small sample. But the opusenc tool automatically adds metadata - even if you don't specify any.

Using the amazing Mutagen Python library I was able to completely strip out all the metadata!

import mutagen
mutagen.File("example.opus").delete()

It edits the file immediately - so be careful!

But what is it actually doing? I wanted to understand a bit more - so let's go hex diving!

What the user sees

Running opusinfo example.opus gives:

New logical stream (#1, serial: 03fe3cc9): type opus
Encoded with libopus 1.3.1, libopusenc 0.2.1
User comments section follows...
    ENCODER=opusenc from opus-tools 0.2
    ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0
Opus stream 1:
    ...
Logical stream 1 ended

There are two "mandatory" comments. The ENCODER and the ENCODER_OPTIONS. I can't find a way to stop those being generated by opusenc.

The Opus File API gives some idea about the binary structure of the file.

But the real magic happens in the Opus Format Specification RFC. It details the header format in 32 bit clumps.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'O'      |      'p'      |      'u'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'T'      |      'a'      |      'g'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Vendor String Length                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     :                        Vendor String...                       :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   User Comment List Length                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 User Comment #0 String Length                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     :                   User Comment #0 String...                   :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 User Comment #1 String Length                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     :                                                               :

Let's take a look at our file in binary, jumping straight to the comment section.

0000004b: 4f70 7573  Opus
0000004f: 5461 6773  Tags

Starts as expected. Next is the Vendor String Length

00000053: 1f00 0000  ....

0x1f is 31 bytes. This is a 32 bit, unsigned, little endian number, so the least significant byte comes first. Hence it is written as 1f00 0000, which reads back as 0000 001f.
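
If the byte order is confusing, Python will do the flip for you. This is only a sketch: the four bytes are copied from the dump above, and the variable names are mine.

```python
import struct

# The four bytes at offset 0x53, copied from the hex dump.
raw = bytes.fromhex("1f000000")

# Two equivalent ways to read a 32 bit unsigned little endian number.
vendor_length = int.from_bytes(raw, "little")
(also_vendor_length,) = struct.unpack("<I", raw)

# Reading the same bytes as big endian gives a very different answer.
wrong_way_round = int.from_bytes(raw, "big")

print(vendor_length, also_vendor_length, hex(wrong_way_round))
```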

00000057: 6c69 626f  libo
0000005b: 7075 7320  pus 
0000005f: 312e 332e  1.3.
00000063: 312c 206c  1, l
00000067: 6962 6f70  ibop
0000006b: 7573 656e  usen
0000006f: 6320 302e  c 0.
00000073: 322e 31    2.1

According to the spec, no terminating null octet is necessary. So the next bytes are the User Comment List Length. Continuing on from the previous line:

00000073:        02     .
00000077: 0000 00    ...

There are two comments (again, 32 bit little endian).

This field indicates the number of user-supplied comments. It MAY indicate there are zero user-supplied comments, in which case there are no additional fields in the packet.

This means we can have an empty comment section! This is what you get by default:

00000077:        23  ...#
0000007b: 0000 00    ...

First string length is 0x23 = 35 bytes long. Again, little endian.

0000007e: 454e 434f  ENCO
00000082: 4445 523d  DER=
00000086: 6f70 7573  opus
0000008a: 656e 6320  enc 
0000008e: 6672 6f6d  from
00000092: 206f 7075   opu
00000096: 732d 746f  s-to
0000009a: 6f6c 7320  ols 
0000009e: 302e 3240  0.2@

After exactly 35 bytes, we get our next little endian number 0x40 = 64.

000000a1: 4000 0000  @...
000000a5: 454e 434f  ENCO
000000a9: 4445 525f  DER_
000000ad: 4f50 5449  OPTI
000000b1: 4f4e 533d  ONS=
000000b5: 2d2d 6269  --bi
000000b9: 7472 6174  trat
000000bd: 6520 3620  e 6 
000000c1: 2d2d 636f  --co
000000c5: 6d70 2031  mp 1
000000c9: 3020 2d2d  0 --
000000cd: 6672 616d  fram
000000d1: 6573 697a  esiz
000000d5: 6520 3630  e 60
000000d9: 202d 2d70   --p
000000dd: 6164 6469  addi
000000e1: 6e67 2030  ng 0

And that's the end of the comment section!
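
The whole walk-through can be sketched in a few lines of Python. This isn't how opusinfo or Mutagen do it. It's a minimal reader for the layout in the diagram, and it assumes you've already pulled the comment packet out of its Ogg page. The sample packet is rebuilt from the strings in the dump rather than read from a real file, and every name is my own.

```python
import struct


def parse_opus_tags(packet):
    """Return (vendor, comments) from a raw OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_string(pos):
        # Each string is a 32 bit little endian length, then that many bytes.
        (length,) = struct.unpack_from("<I", packet, pos)
        start = pos + 4
        return packet[start:start + length].decode("utf-8"), start + length

    vendor, pos = read_string(pos)
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments = []
    for _ in range(count):
        comment, pos = read_string(pos)
        comments.append(comment)
    return vendor, comments


def le32(number):
    """Encode a number as 4 little endian bytes."""
    return number.to_bytes(4, "little")


# Rebuild the comment packet from the strings seen in the hex dump.
VENDOR = b"libopus 1.3.1, libopusenc 0.2.1"
ENCODER = b"ENCODER=opusenc from opus-tools 0.2"
OPTIONS = b"ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0"
SAMPLE = (b"OpusTags" + le32(len(VENDOR)) + VENDOR + le32(2)
          + le32(len(ENCODER)) + ENCODER + le32(len(OPTIONS)) + OPTIONS)

vendor, comments = parse_opus_tags(SAMPLE)
print(vendor)
print(comments)
```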

Manually editing the file

I started by setting the User Comment List Length to zero, and removing all the subsequent comment data. That didn't work. opusinfo gave the following errors:

WARNING: Hole in data (28 bytes) found at approximate offset 1492 bytes. Corrupted Ogg.
WARNING: Hole in data (51 bytes) found at approximate offset 1492 bytes. Corrupted Ogg.
WARNING: sequence number gap in stream 1. Got page 2 when expecting page 1. Indicates missing data.
WARNING: discontinuity in stream (1)

Back to the documentation!

An Ogg Opus stream is organized as follows (see Figure 1 for an example).

        Page 0         Pages 1 ... n        Pages (n+1) ...
     +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
     |            | |   | |   |     |   | |           | |         | |
     |+----------+| |+-----------------+| |+-------------------+ +-----
     |||ID Header|| ||  Comment Header || ||Audio Data Packet 1| | ...
     |+----------+| |+-----------------+| |+-------------------+ +-----
     |            | |   | |   |     |   | |           | |         | |
     +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
     ^      ^                           ^
     |      |                           |
     |      |                           Mandatory Page Break
     |      |
     |      ID header is contained on a single page
     |
     'Beginning Of Stream'

    Figure 1: Example Packet Organization for a Logical Ogg Opus Stream

There are two mandatory header packets. The first packet in the logical Ogg bitstream MUST contain the identification (ID) header, which uniquely identifies a stream as Opus audio. The format of this header is defined in Section 5.1. It is placed alone (without any other packet data) on the first page of the logical Ogg bitstream and completes on that page. This page has its 'beginning of stream' flag set.

The second packet in the logical Ogg bitstream MUST contain the comment header, which contains user-supplied metadata. The format of this header is defined in Section 5.2. It MAY span multiple pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it completes.

I tried saying there was one comment, with a length of zero and a null comment. That didn't work either.

I think this is because before the start of the comment header there is something describing how long the packet will be.
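
If that hunch is right, the number to hunt for is the size of the comment packet itself. Here's a quick sketch which adds it up from the layout in the RFC diagram, using the strings from the hex dump. The function name is invented for illustration.

```python
def comment_packet_length(vendor, comments):
    """Size in bytes of an OpusTags packet holding these strings."""
    length = len(b"OpusTags")             # 8 byte magic signature
    length += 4 + len(vendor.encode())    # vendor length field + vendor string
    length += 4                           # user comment list length field
    for comment in comments:
        length += 4 + len(comment.encode())  # length field + comment string
    return length


VENDOR = "libopus 1.3.1, libopusenc 0.2.1"
COMMENTS = [
    "ENCODER=opusenc from opus-tools 0.2",
    "ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0",
]

with_comments = comment_packet_length(VENDOR, COMMENTS)
without_comments = comment_packet_length(VENDOR, [])
print(hex(with_comments), hex(without_comments))
```

That gives 154 bytes (0x9a) with both comments and 47 bytes (0x2f) with none, so those are the values to look out for in the headers.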

Headers

Here are the headers from the original file, and the one stripped by Mutagen.

Original Header

00000000: 4f67 6753 0002 0000  OggS....
00000008: 0000 0000 0000 c93c  .......<
00000010: fe03 0000 0000 f90e  ........
00000018: f775 0113 4f70 7573  .u..Opus
00000020: 4865 6164 0101 3801  Head..8.
00000028: 80bb 0000 0000 004f  .......O
00000030: 6767 5300 0000 0000  ggS.....
00000038: 0000 0000 00c9 3cfe  ......<.
00000040: 0301 0000 0035 dfaf  .....5..
00000048: 0601 9a4f 7075 7354  ...OpusT
00000050: 6167 731f 0000 006c  ags....l
00000058: 6962 6f70 7573 2031  ibopus 1

Stripped Header

00000000: 4f67 6753 0002 0000  OggS....
00000008: 0000 0000 0000 c93c  .......<
00000010: fe03 0000 0000 f90e  ........
00000018: f775 0113 4f70 7573  .u..Opus
00000020: 4865 6164 0101 3801  Head..8.
00000028: 80bb 0000 0000 004f  .......O
00000030: 6767 5300 0000 0000  ggS.....
00000038: 0000 0000 00c9 3cfe  ......<.
00000040: 0301 0000 00ae 941c  ........
00000048: 4e01 2f4f 7075 7354  N./OpusT
00000050: 6167 731f 0000 006c  ags....l
00000058: 6962 6f70 7573 2031  ibopus 1

The Difference

Original                                  Stripped
00000040: 0301 0000 0035 dfaf  .....5.. | 00000040: 0301 0000 00ae 941c  ........
00000048: 0601 9a4f 7075 7354  ...OpusT | 00000048: 4e01 2f4f 7075 7354  N./OpusT

So, something is happening in bytes 0x45 to 0x4a. But what?

A page is a header of 26 bytes, followed by the length of the data, followed by the data. The constructor is given a file-like object pointing to the start of an Ogg page. After the constructor is finished it is pointing to the start of the next page.

Mutagen Source Code

Unfortunately, my brain freezes up when I see things like

header = struct.unpack('<4sBBqIIiB', header_data)

But the code does point to the Ogg page format specification.

The LSb (least significant bit) comes first in the Bytes. Fields with more than one byte length are encoded LSB (least significant byte) first.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
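
With the diagram to hand, that scary struct format string is less frightening. Each character maps to one field: 4s is the 4 byte magic, B is a single byte, q is the 8 byte granule position, I and i are 4 byte numbers, and the leading < means little endian. Here's a sketch which runs it over the second page header of the original file, with the bytes copied from offsets 0x2f to 0x4a of the dump. The function and dictionary keys are my own naming, not Mutagen's.

```python
import struct

# Bytes 0x2f to 0x4a of the original file, copied from the hex dump.
PAGE_START = bytes.fromhex(
    "4f676753"          # capture_pattern "OggS"
    "00"                # version
    "00"                # header_type
    "0000000000000000"  # granule_position
    "c93cfe03"          # bitstream_serial_number
    "01000000"          # page_sequence_number
    "35dfaf06"          # CRC_checksum
    "01"                # page_segments
    "9a"                # segment_table (a single entry)
)


def parse_page_header(data):
    """Unpack the fixed 27 byte Ogg page header plus its segment table."""
    (magic, version, header_type, granule,
     serial, sequence, crc, segments) = struct.unpack("<4sBBqIIiB", data[:27])
    return {
        "magic": magic,
        "version": version,
        "header_type": header_type,
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "crc": crc & 0xFFFFFFFF,  # "i" is signed, so mask back to unsigned
        "segment_table": list(data[27:27 + segments]),
    }


header = parse_page_header(PAGE_START)
print(hex(header["serial"]), header["sequence"], header["segment_table"])
```

The serial number comes out as 0x3fe3cc9, which matches what opusinfo reported, and the single segment table entry is 154 (0x9a).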

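So, it is the CRC Checksum which is different (bytes 0x45 to 0x48), along with the one byte segment_table entry at 0x4a, which holds the length of the comment packet: 0x9a before, 0x2f after. The Vorbis framing documentation has a brief description of how the CRC is calculated - but the full documentation 404s.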
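
For the curious, here's a bit-at-a-time sketch of that checksum as I understand the Ogg framing description: generator polynomial 0x04c11db7, initial value zero, no final XOR, calculated over the whole page with the four CRC bytes set to zero. It's slow and the names are my own, so it's for poking at pages rather than anything serious.

```python
def ogg_crc(data):
    """CRC-32 with polynomial 0x04c11db7, zero initial value, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def page_checksum(page):
    """Checksum of a complete Ogg page, ignoring whatever CRC it carries."""
    zeroed = page[:22] + b"\x00\x00\x00\x00" + page[26:]
    return ogg_crc(zeroed)


def stored_checksum(page):
    """The CRC written in bytes 22 to 25 of the page, little endian."""
    return int.from_bytes(page[22:26], "little")


# A page is intact when the two agree.
def page_is_intact(page):
    return page_checksum(page) == stored_checksum(page)
```

After hand editing a page, you'd write the four bytes of page_checksum(page), little endian, back into bytes 22 to 25.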

Conclusion

Hand editing binary files is for mugs.

Bouncing all my music down to Opus

@edent — Sun, 26 Jan 2020 18:00:06 +0000

As much as technology marches forward, there are two truths I need to accept.

File transfer speeds are always going to be slower than I can be bothered to wait
My ears aren't going to get any better at hearing

For years, I ripped all of my music as FLAC. I collected ridiculously high-resolution audio files. I devoured disk drive space for surround sound soundtracks.

"One day," I thought, "I'll have an amazing audio system to play these back on."

The reality is that I spend most of my time listening to music on £40 bluetooth headphones. I have a nice 5.1 surround sound system - but it isn't exactly THX certified. And, thanks to the construction of British houses, I can't turn it up to 11 without my neighbours complaining.

Yesterday, as I was waiting for a couple of GB of new music to fly through the aether to my phone, I was struck by a realisation...

I am not an archivist.

I don't need to preserve all my commercially-bought music in the highest resolution possible. It isn't my job to faithfully preserve every ultrasonic decibel. And I am never going to own a set of speakers which will super-charge my old ears.

It's OK to bounce my music down to a more convenient file format.

Enter Opus

I've written before about the Opus file format. It's the modern and open successor to MP3. It isn't lossless - but I've compared the quality, and I can't hear a damned difference.

Turns out, everyone agrees with me. Even at extremely low bitrates it is superior to every other format.

Opus plays back natively on Android, it supports all the normal music metadata / ID3-style tags, and works perfectly with surround sound. The codec and tools are Open Source and Linux friendly.

And, best of all, it's small! Even when I encode at the maximum possible bitrate (I'm not a total savage!) an hour of 5.1 audio is about 20% of the size of FLAC.

I know I could buy a bigger disk. But while home storage is relatively cheap, mobile storage is still expensive. Yes, WiFi 6 will make everything better - but I don't need to fling gigabytes through the air to my tin ears.

So, from now on, everything is getting run through: opusenc --bitrate 1536 in.flac out.opus

Extracting DVD-Audio on Linux, the modern(ish) way

@edent — Thu, 24 Jan 2019 17:16:04 +0000

DVD-Audio (henceforth DVDA) is an unloved and mostly forgotten audio format. Nevertheless, there's a large back-catalogue of music which is still trapped on ancient discs encoded in the proprietary MLP format.

A few years ago I wrote about how to extract the audio using the obsolete Windows program DVD-Audio Explorer. I wanted to be able to run the extraction via the command line, which means trying to find a native Linux app. I tried Python AudioTools but I got lost in an endless maze of incompatible dependencies.

So I went with Brian "tuffy" Langenberger's libDVD-Audio.

To install, simply run:

sudo make install

That will give you two new programs. To get info about your DVDA, run:

dvda-debug-info -A /path/to/your/AUDIO_TS

That will pump out details about each track like so:

Title  Track  Length  PTS Length  First Sector  Last Sector
    1      1    3:30    13450000             0        86547
    1      2    4:11    12500000         73144       122600
    1      3    2:11    16010000        370601       233337

Extract

To extract the tracks, run:

dvda2wav -A /path/to/your/AUDIO_TS

That will spit out the files in WAV format.

Encode

WAV is pretty large - about 20MB per minute per channel. Converting to FLAC (the Free Lossless Audio Codec) gets you down to about 10MB. I just go straight for the modern Opus Codec which does excellent quality surround sound at low file sizes.

opusenc --bitrate 4096 track-01-01.wav 1.opus

That's about 2MB/minute/channel and I promise that you won't hear the difference.

Metadata

If you want to add metadata to a track, it's done like this:

opusenc --bitrate 4096 in.wav out.opus --title "Yesterday" --artist "The Beatles" --tracknumber "02"

Older versions of Opusenc, oddly, don't have a native way to express track numbers, so you'll need to do it manually using --comment "tracknumber=02"

Newer versions can use --tracknumber to add track numbers.

Automating

You can make it slightly easier to add the metadata if you give the files predictable names. For example: 01-Yesterday-The Beatles.wav

Here's a scrappy bash script:

#!/bin/bash
# Expects files named like "01-Yesterday-The Beatles.wav"
for FILE in *.wav
do
    # Strip the extension
    FILENAME="${FILE%.*}"

    # Split on hyphens: track number, title, artist
    TRACK=$(echo  "$FILENAME" | cut -d'-' -f 1)
    TITLE=$(echo  "$FILENAME" | cut -d'-' -f 2)
    ARTIST=$(echo "$FILENAME" | cut -d'-' -f 3)

    OUTPUT="[$TRACK] $ARTIST - $TITLE.opus"

    opusenc --bitrate 4096 "$FILE" "$OUTPUT" --title "$TITLE" --artist "$ARTIST" --tracknumber "$TRACK"
done
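
The same filename trick can be sketched in Python, which is a bit kinder to awkward names. This only builds the argument list for opusenc, it doesn't run anything. The function name is invented, and it assumes the track-title-artist naming scheme above.

```python
from pathlib import Path


def opusenc_args(wav_name, bitrate=4096):
    """Build an opusenc command line from a 'track-title-artist.wav' name."""
    stem = Path(wav_name).stem
    # Split on the first two hyphens only, so a hyphen in the artist survives.
    track, title, artist = stem.split("-", 2)
    output = f"[{track}] {artist} - {title}.opus"
    return [
        "opusenc", "--bitrate", str(bitrate), wav_name, output,
        "--title", title, "--artist", artist, "--tracknumber", track,
    ]


print(opusenc_args("01-Yesterday-The Beatles.wav"))
```

Pass the list to subprocess.run() to actually encode. A hyphen in the title will still confuse it, just as it confuses cut.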

I hope future me finds these notes useful!

Convert Surround Sound WAV albums to individual opus files

@edent — Sun, 18 Mar 2018 18:38:16 +0000

As ever, notes to myself. This is a method to take a .wav and .cue and transform them into individual files. In this case, .opus. The script below works on a single .mkv with chapters, splitting it into one file per chapter.

#!/bin/bash
# The album, as a single Matroska file with chapters
input="file.mkv"

# Count the chapters
json=$(ffprobe -i "$input" -print_format json -show_chapters -loglevel error)
count=$(echo "$json" | jq ".chapters | length")

# Split into one file per chapter: 01.mkv, 02.mkv, ...
mkvmerge "$input" -D -S --split chapters:all -o "%02d.mkv"

COUNTER=1
while [ "$COUNTER" -le "$count" ]; do

  # Zero-pad the track number to match the split filenames
  printf -v zerotrack "%02d" "$COUNTER"

  json=$(ffprobe -i "$zerotrack.mkv" -print_format json -show_chapters -loglevel error)
  title=$(echo "$json" | jq ".chapters[0].tags.title" -r)
  filename="[$zerotrack] $title"

  # Pull track 0 out of the split file
  mkvextract tracks "$zerotrack.mkv" 0:"$filename.opus"
  COUNTER=$((COUNTER + 1))
done
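
The jq steps can also be sketched in Python. This takes the JSON text which ffprobe prints with -show_chapters and works out the output filenames. Unlike the script, it reads every title from the unsplit file in one go, on the assumption that the chapters come out in track order. The function name and the sample JSON are made up for illustration.

```python
import json


def chapter_filenames(ffprobe_json):
    """Turn ffprobe's chapter listing into '[01] Title.opus' style names."""
    chapters = json.loads(ffprobe_json).get("chapters", [])
    names = []
    for number, chapter in enumerate(chapters, start=1):
        title = chapter.get("tags", {}).get("title", "")
        names.append(f"[{number:02d}] {title}.opus")
    return names


# A cut-down, hypothetical example of what ffprobe prints.
SAMPLE = """
{"chapters": [
    {"id": 0, "tags": {"title": "First Track"}},
    {"id": 1, "tags": {"title": "Second Track"}}
]}
"""

print(chapter_filenames(SAMPLE))
```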