Removing default metadata from .opus files
I'm trying to create some ridiculously tiny audio files. The sort where every single byte matters.
I've encoded a small sample. But the opusenc tool automatically adds metadata - even if you don't specify any.
Using the amazing Mutagen Python library I was able to completely strip out all the metadata!
import mutagen
mutagen.File("example.opus").delete()
It edits the file in place, immediately - so be careful, and work on a copy if you want to keep the original!
But what is it actually doing? I wanted to understand a bit more - so let's go hex diving!
What the user sees
Running opusinfo example.opus gives:
New logical stream (#1, serial: 03fe3cc9): type opus
Encoded with libopus 1.3.1, libopusenc 0.2.1
User comments section follows...
ENCODER=opusenc from opus-tools 0.2
ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0
Opus stream 1:
...
Logical stream 1 ended
There are two "mandatory" comments: ENCODER and ENCODER_OPTIONS. I can't find a way to stop opusenc generating them.
The Opus File API gives some idea about the binary structure of the file.
But the real magic happens in the Opus format specification, RFC 7845. It details the comment header format in 32 bit clumps.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'O'      |      'p'      |      'u'      |      's'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      'T'      |      'a'      |      'g'      |      's'      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Vendor String Length                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                        Vendor String...                       :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   User Comment List Length                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 User Comment #0 String Length                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                   User Comment #0 String...                   :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 User Comment #1 String Length                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                                                               :
Let's take a look at our file in binary, jumping straight to the comment section.
0000004b: 4f70 7573 Opus
0000004f: 5461 6773 Tags
Starts as expected. Next is the Vendor String Length
00000053: 1f00 0000 ....
0x1f is 31 bytes. This is a 32 bit, unsigned, little endian number, so the least significant byte comes first. Hence it is written as 1f00 0000, which is read as 0x0000001f.
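As a quick sanity check, here is a minimal Python sketch of that little endian read, using only the standard library's struct module. The four bytes are the ones at offset 0x53 in the dump above; the variable names are just for illustration.

```python
import struct

# The four bytes found at offset 0x53, straight after "OpusTags".
raw = bytes.fromhex("1f000000")

# "<I" means: little endian, unsigned, 32 bits.
(length,) = struct.unpack("<I", raw)
print(length)  # 31

# int.from_bytes gives the same answer without struct.
assert int.from_bytes(raw, "little") == 0x1F
```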
00000057: 6c69 626f libo
0000005b: 7075 7320 pus
0000005f: 312e 332e 1.3.
00000063: 312c 206c 1, l
00000067: 6962 6f70 ibop
0000006b: 7573 656e usen
0000006f: 6320 302e c 0.
00000073: 322e 31 2.1
According to the spec, no terminating null octet is necessary. So the next bytes are the User Comment List Length. Continuing on from the previous line:
00000073: 02 .
00000077: 0000 00 ...
There are two comments (again, 32 bit little endian).
This field indicates the number of user-supplied comments. It MAY indicate there are zero user-supplied comments, in which case there are no additional fields in the packet.
This means we can have an empty comment section! But that isn't what opusenc writes by default. Instead, the next four bytes are the length of the first comment:
00000077: 23 ...#
0000007b: 0000 00 ...
First string length is 0x23 = 35 bytes long. Again, little endian.
0000007e: 454e 434f ENCO
00000082: 4445 523d DER=
00000086: 6f70 7573 opus
0000008a: 656e 6320 enc
0000008e: 6672 6f6d from
00000092: 206f 7075 opu
00000096: 732d 746f s-to
0000009a: 6f6c 7320 ols
0000009e: 302e 3240 0.2@
After exactly 35 bytes, we get our next little endian number 0x40 = 64.
000000a1: 4000 0000 @...
000000a5: 454e 434f ENCO
000000a9: 4445 525f DER_
000000ad: 4f50 5449 OPTI
000000b1: 4f4e 533d ONS=
000000b5: 2d2d 6269 --bi
000000b9: 7472 6174 trat
000000bd: 6520 3620 e 6
000000c1: 2d2d 636f --co
000000c5: 6d70 2031 mp 1
000000c9: 3020 2d2d 0 --
000000cd: 6672 616d fram
000000d1: 6573 697a esiz
000000d5: 6520 3630 e 60
000000d9: 202d 2d70 --p
000000dd: 6164 6469 addi
000000e1: 6e67 2030 ng 0
And that's the end of the comment section!
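The walk through the bytes above can be sketched as a small parser. This is only a sketch, not a full RFC 7845 implementation: parse_opus_tags and pack_string are hypothetical helper names, the packet is rebuilt from the strings opusinfo printed rather than read from example.opus, and it assumes the whole comment packet is already in memory as one bytes object.

```python
import struct


def parse_opus_tags(packet: bytes):
    """Split a comment header packet into (vendor, [comments])."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    pos = 8

    def read_string():
        # Each string is a 32 bit little endian length, then that many bytes.
        nonlocal pos
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments = [read_string() for _ in range(count)]
    return vendor, comments


def pack_string(text: str) -> bytes:
    """Length-prefix a string the same way, with no terminating null."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


# Rebuild the packet from the strings opusinfo printed.
vendor = "libopus 1.3.1, libopusenc 0.2.1"
comments = [
    "ENCODER=opusenc from opus-tools 0.2",
    "ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0",
]
packet = (
    b"OpusTags"
    + pack_string(vendor)
    + struct.pack("<I", len(comments))
    + b"".join(pack_string(c) for c in comments)
)

print(len(packet))  # 154
print(parse_opus_tags(packet))
```

The 154 is the 8 byte magic, plus 4 + 31 for the vendor string, plus 4 for the comment count, plus 4 + 35 and 4 + 64 for the two comments.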
Manually editing the file
I started by setting the User Comment List Length to zero, and removing all the subsequent comment data. That didn't work. opusinfo gave the following errors:
WARNING: Hole in data (28 bytes) found at approximate offset 1492 bytes. Corrupted Ogg.
WARNING: Hole in data (51 bytes) found at approximate offset 1492 bytes. Corrupted Ogg.
WARNING: sequence number gap in stream 1. Got page 2 when expecting page 1. Indicates missing data.
WARNING: discontinuity in stream (1)
An Ogg Opus stream is organized as follows (see Figure 1 for an example).
   Page 0         Pages 1 ... n        Pages (n+1) ...
+------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
|            | |   | |   |     |   | |           | |         | |
|+----------+| |+-----------------+| |+-------------------+ +-----
|||ID Header|| ||  Comment Header || ||Audio Data Packet 1| | ...
|+----------+| |+-----------------+| |+-------------------+ +-----
|            | |   | |   |     |   | |           | |         | |
+------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
^      ^                           ^
|      |                           |
|      |                           Mandatory Page Break
|      |
|      ID header is contained on a single page
|
'Beginning Of Stream'
Figure 1: Example Packet Organization for a Logical Ogg Opus Stream
There are two mandatory header packets. The first packet in the logical Ogg bitstream MUST contain the identification (ID) header, which uniquely identifies a stream as Opus audio. The format of this header is defined in Section 5.1. It is placed alone (without any other packet data) on the first page of the logical Ogg bitstream and completes on that page. This page has its 'beginning of stream' flag set. The second packet in the logical Ogg bitstream MUST contain the comment header, which contains user-supplied metadata. The format of this header is defined in Section 5.2. It MAY span multiple pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it completes.
I tried saying there was one comment, with a length of zero and a null comment. That didn't work either.
I think this is because before the start of the comment header there is something describing how long the packet will be.
Headers
Here are the headers from the original file, and the one stripped by Mutagen.
Original Header
00000000: 4f67 6753 0002 0000 OggS....
00000008: 0000 0000 0000 c93c .......<
00000010: fe03 0000 0000 f90e ........
00000018: f775 0113 4f70 7573 .u..Opus
00000020: 4865 6164 0101 3801 Head..8.
00000028: 80bb 0000 0000 004f .......O
00000030: 6767 5300 0000 0000 ggS.....
00000038: 0000 0000 00c9 3cfe ......<.
00000040: 0301 0000 0035 dfaf .....5..
00000048: 0601 9a4f 7075 7354 ...OpusT
00000050: 6167 731f 0000 006c ags....l
00000058: 6962 6f70 7573 2031 ibopus 1
Stripped Header
00000000: 4f67 6753 0002 0000 OggS....
00000008: 0000 0000 0000 c93c .......<
00000010: fe03 0000 0000 f90e ........
00000018: f775 0113 4f70 7573 .u..Opus
00000020: 4865 6164 0101 3801 Head..8.
00000028: 80bb 0000 0000 004f .......O
00000030: 6767 5300 0000 0000 ggS.....
00000038: 0000 0000 00c9 3cfe ......<.
00000040: 0301 0000 00ae 941c ........
00000048: 4e01 2f4f 7075 7354 N./OpusT
00000050: 6167 731f 0000 006c ags....l
00000058: 6962 6f70 7573 2031 ibopus 1
The Difference
Original Stripped
00000040: 0301 0000 0035 dfaf .....5.. | 00000040: 0301 0000 00ae 941c ........
00000048: 0601 9a4f 7075 7354 ...OpusT | 00000048: 4e01 2f4f 7075 7354 N./OpusT
So, something is happening between bytes 0x45 and 0x4a. But what?
A page is a header of 26 bytes, followed by the length of the data, followed by the data. The constructor is given a file-like object pointing to the start of an Ogg page. After the constructor is finished it is pointing to the start of the next page. (From the Mutagen source code.)
Unfortunately, my brain freezes up when I see things like
header = struct.unpack('<4sBBqIIiB', header_data)
But the code does point to the Ogg page format specification.
The LSb (least significant bit) comes first in the Bytes. Fields with more than one byte length are encoded LSB (least significant byte) first.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| capture_pattern: Magic number for page start "OggS"           | 0-3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version       | header_type   | granule_position              | 4-7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               | 8-11
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | bitstream_serial_number       | 12-15
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | page_sequence_number          | 16-19
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | CRC_checksum                  | 20-23
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               | page_segments | segment_table | 24-27
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ...                                                           | 28-
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
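To make that layout concrete, here is a sketch that unpacks the second page header from the original file, using the same format string as the Mutagen line quoted earlier. The bytes are hand-copied from offsets 0x2f to 0x4a of the dump, so treat it as an illustration rather than a parser.

```python
import struct

# Bytes 0x2f to 0x4a of the original file: the 27 byte header of the
# second page, followed by its one-entry segment table.
page = bytes.fromhex(
    "4f676753"          # capture_pattern "OggS"
    "00"                # version
    "00"                # header_type
    "0000000000000000"  # granule_position
    "c93cfe03"          # bitstream_serial_number
    "01000000"          # page_sequence_number
    "35dfaf06"          # CRC_checksum
    "01"                # page_segments
    "9a"                # segment_table (one lacing value)
)

# Same format string as the Mutagen source: 4+1+1+8+4+4+4+1 = 27 bytes.
(magic, version, header_type, granule,
 serial, sequence, crc, segments) = struct.unpack("<4sBBqIIiB", page[:27])

# One lacing value per segment. There is only one here, so it is the
# length of the whole comment packet.
lacing = page[27:27 + segments]

print(magic)                      # b'OggS'
print(f"{serial:08x}")            # 03fe3cc9, the serial opusinfo reported
print(sequence)                   # 1
print(f"{crc & 0xffffffff:08x}")  # 06afdf35
print(sum(lacing))                # 154
```

Note that the byte numbers on the right of the diagram label the rows, not the fields: the checksum actually sits at bytes 22 to 25 of the page, which is offset 0x45 in this file.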
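So, it is the CRC checksum which is different. The byte after page_segments has changed too, from 0x9a (154) to 0x2f (47): that is the segment table, and it holds the length of the comment packet. The Vorbis framing documentation has a brief description of how the CRC is calculated - but the full documentation 404s.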
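For the curious, here is a rough sketch of what a hand edit would have to do: rebuild the comment page with a new segment table and a fresh checksum. It assumes the usual description of the Ogg checksum (a CRC-32 with polynomial 0x04c11db7, an initial value of zero, no bit reflection and no final XOR, computed over the whole page with the checksum field set to zero) and that the comment packet fits on a single page. ogg_crc and build_page are hypothetical names, not Mutagen's API.

```python
import struct


def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32, most significant bit first, under the assumptions above."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def build_page(header_type, granule, serial, sequence, packet):
    """Wrap one packet in one Ogg page (no spanning across pages)."""
    # Lacing values: 255 per full segment, then the remainder (0 to 254).
    full, rest = divmod(len(packet), 255)
    lacing = bytes([255] * full + [rest])
    if len(lacing) > 255:
        raise ValueError("packet too long for a single page")
    # The checksum field is zero while the checksum is being computed.
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, header_type, granule,
                         serial, sequence, 0, len(lacing))
    page = bytearray(header + lacing + packet)
    struct.pack_into("<I", page, 22, ogg_crc(bytes(page)))
    return bytes(page)


# A comment packet with the vendor string kept and zero user comments.
vendor = b"libopus 1.3.1, libopusenc 0.2.1"
packet = (b"OpusTags" + struct.pack("<I", len(vendor)) + vendor
          + struct.pack("<I", 0))

page = build_page(0, 0, 0x03FE3CC9, 1, packet)
print(len(packet), hex(page[27]))  # 47 0x2f
print(page[22:26].hex())           # the recomputed checksum bytes
```

If the sketch is right, the checksum bytes it prints should match the ae94 1c4e at offset 0x45 of the stripped header above, followed by the same 01 and 2f.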
Conclusion
Hand editing binary files is for mugs.
Kevin Marks said on twitter.com:
The Ogg file format is a right mess. They decided to break it into packet sized chunks for streaming, and that made it very annoying to parse or seek.