Removing default metadata from .opus files

by @edent

I’m trying to create some ridiculously tiny audio files. The sort where every single byte matters.

I’ve encoded a small sample. But the opusenc tool automatically adds metadata – even if you don’t specify any.

Using the amazing Mutagen Python library I was able to completely strip out all the metadata!

import mutagen
mutagen.File("example.opus").delete()

It edits the file immediately – so be careful!

But what is it actually doing? I wanted to understand a bit more – so let’s go hex diving!

What the user sees

Running opusinfo example.opus gives:

New logical stream (#1, serial: 03fe3cc9): type opus
Encoded with libopus 1.3.1, libopusenc 0.2.1
User comments section follows...
    ENCODER=opusenc from opus-tools 0.2
    ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0
Opus stream 1:
    ...
Logical stream 1 ended

There are two “mandatory” comments: ENCODER and ENCODER_OPTIONS.
I can’t find a way to stop those being generated by opusenc.

The Opus File API gives some idea about the binary structure of the file.

But the real magic happens in the Opus Format Specification RFC. It details the header format in 32 bit clumps.

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'O'      |      'p'      |      'u'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |      'T'      |      'a'      |      'g'      |      's'      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                     Vendor String Length                      |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     :                        Vendor String...                       :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                   User Comment List Length                    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 User Comment #0 String Length                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     :                   User Comment #0 String...                   :
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                 User Comment #1 String Length                 |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     :                                                               :

Let’s take a look at our file in binary, jumping straight to the comment section.

0000004b: 4f70 7573  Opus
0000004f: 5461 6773  Tags

Starts as expected. Next is the Vendor String Length:

00000053: 1f00 0000  ....

0x1f is 31 bytes. This is a 32 bit, unsigned, little endian number, so the least significant byte comes first. Hence it is written as 1f00 0000, which is read as 0x0000001f.
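The byte-order flip is easy to check in Python. A minimal sketch, with the four bytes re-typed from the dump above:

```python
import struct

raw = bytes.fromhex("1f000000")        # the four bytes at offset 0x53

# "<" means little endian, "I" means unsigned 32 bit integer.
(length,) = struct.unpack("<I", raw)
print(length)                          # 31

# The same conversion without struct.
print(int.from_bytes(raw, "little"))   # 31
```

Every length field in the comment header is decoded this way.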

00000057: 6c69 626f  libo
0000005b: 7075 7320  pus 
0000005f: 312e 332e  1.3.
00000063: 312c 206c  1, l
00000067: 6962 6f70  ibop
0000006b: 7573 656e  usen
0000006f: 6320 302e  c 0.
00000073: 322e 31    2.1

According to the spec, no terminating null octet is necessary. So the next bytes are the User Comment List Length. Continuing on from the previous line:

00000073:        02     .
00000077: 0000 00    ...

There are two comments (again, 32 bit little endian).

This field indicates the number of user-supplied comments. It MAY indicate there are zero user-supplied comments, in which case there are no additional fields in the packet.

This means we can have an empty comment section! But by default, this is what comes next:

00000077:        23  ...#
0000007b: 0000 00    ...

First string length is 0x23 = 35 bytes long. Again, little endian.

0000007e: 454e 434f  ENCO
00000082: 4445 523d  DER=
00000086: 6f70 7573  opus
0000008a: 656e 6320  enc 
0000008e: 6672 6f6d  from
00000092: 206f 7075   opu
00000096: 732d 746f  s-to
0000009a: 6f6c 7320  ols 
0000009e: 302e 3240  0.2@

After exactly 35 bytes, we get our next little endian number 0x40 = 64.

000000a1: 4000 0000  @...
000000a5: 454e 434f  ENCO
000000a9: 4445 525f  DER_
000000ad: 4f50 5449  OPTI
000000b1: 4f4e 533d  ONS=
000000b5: 2d2d 6269  --bi
000000b9: 7472 6174  trat
000000bd: 6520 3620  e 6 
000000c1: 2d2d 636f  --co
000000c5: 6d70 2031  mp 1
000000c9: 3020 2d2d  0 --
000000cd: 6672 616d  fram
000000d1: 6573 697a  esiz
000000d5: 6520 3630  e 60
000000d9: 202d 2d70   --p
000000dd: 6164 6469  addi
000000e1: 6e67 2030  ng 0

And that’s the end of the comment section!
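The whole walk through the comment header can be sketched as a small parser. This is a rough illustration rather than how Mutagen or opusinfo do it. The function names `parse_opus_tags` and `lp` are made up for this example, and the sample packet is rebuilt from the strings shown in the dump instead of being read from the file.

```python
import struct

def parse_opus_tags(packet: bytes):
    """Return (vendor, [comments]) from the bytes of an OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")

    def read_string(pos):
        # Each string is a 32 bit little endian length followed by that many bytes.
        (length,) = struct.unpack_from("<I", packet, pos)
        start = pos + 4
        return packet[start:start + length].decode("utf-8"), start + length

    vendor, pos = read_string(8)
    (count,) = struct.unpack_from("<I", packet, pos)   # User Comment List Length
    pos += 4
    comments = []
    for _ in range(count):
        comment, pos = read_string(pos)
        comments.append(comment)
    return vendor, comments

def lp(text: str) -> bytes:
    """Hypothetical helper: length-prefix a string the way the header does."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data

sample = (
    b"OpusTags"
    + lp("libopus 1.3.1, libopusenc 0.2.1")
    + struct.pack("<I", 2)
    + lp("ENCODER=opusenc from opus-tools 0.2")
    + lp("ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0")
)

vendor, comments = parse_opus_tags(sample)
print(vendor)
for comment in comments:
    print(len(comment), comment)
```

The two comment lengths it prints are 35 and 64, matching the 0x23 and 0x40 fields in the dump.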

Manually editing the file

I started by setting the User Comment List Length to zero, and removing all the subsequent comment data. That didn’t work. opusinfo gave the following errors:

WARNING: Hole in data (28 bytes) found at approximate offset 1492 bytes. Corrupted Ogg.
WARNING: Hole in data (51 bytes) found at approximate offset 1492 bytes. Corrupted Ogg.
WARNING: sequence number gap in stream 1. Got page 2 when expecting page 1. Indicates missing data.
WARNING: discontinuity in stream (1)

Back to the documentation!

An Ogg Opus stream is organized as follows (see Figure 1 for an example).

        Page 0         Pages 1 ... n        Pages (n+1) ...
     +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
     |            | |   | |   |     |   | |           | |         | |
     |+----------+| |+-----------------+| |+-------------------+ +-----
     |||ID Header|| ||  Comment Header || ||Audio Data Packet 1| | ...
     |+----------+| |+-----------------+| |+-------------------+ +-----
     |            | |   | |   |     |   | |           | |         | |
     +------------+ +---+ +---+ ... +---+ +-----------+ +---------+ +--
     ^      ^                           ^
     |      |                           |
     |      |                           Mandatory Page Break
     |      |
     |      ID header is contained on a single page
     |
     'Beginning Of Stream'

    Figure 1: Example Packet Organization for a Logical Ogg Opus Stream

There are two mandatory header packets. The first packet in the logical Ogg bitstream MUST contain the identification (ID) header, which uniquely identifies a stream as Opus audio. The format of this header is defined in Section 5.1. It is placed alone (without any other packet data) on the first page of the logical Ogg bitstream and completes on that page. This page has its ‘beginning of stream’ flag set.
The second packet in the logical Ogg bitstream MUST contain the comment header, which contains user-supplied metadata. The format of this header is defined in Section 5.2. It MAY span multiple pages, beginning on the second page of the logical stream. However many pages it spans, the comment header packet MUST finish the page on which it completes.

I tried saying there was one comment, with a length of zero and a null comment. That didn’t work either.

I think this is because before the start of the comment header there is something describing how long the packet will be.
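That hunch can be checked with a little arithmetic. A minimal sketch (the function name `opus_tags_length` is invented for this example) that adds up the size of the comment packet with and without the two default comments:

```python
vendor = b"libopus 1.3.1, libopusenc 0.2.1"
comments = [
    b"ENCODER=opusenc from opus-tools 0.2",
    b"ENCODER_OPTIONS=--bitrate 6 --comp 10 --framesize 60 --padding 0",
]

def opus_tags_length(vendor: bytes, comments: list) -> int:
    # 8 bytes "OpusTags" + 4 byte vendor length + vendor string
    # + 4 byte comment count + (4 byte length + string) per comment
    return 8 + 4 + len(vendor) + 4 + sum(4 + len(c) for c in comments)

with_comments = opus_tags_length(vendor, comments)
without_comments = opus_tags_length(vendor, [])
print(with_comments, hex(with_comments))        # 154 0x9a
print(without_comments, hex(without_comments))  # 47 0x2f
```

Those two values, 0x9a and 0x2f, are the bytes sitting immediately before OpusTags in the two header dumps below, which suggests that this is where the packet length is recorded.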

Headers

Here are the headers from the original file, and the one stripped by Mutagen.

Original Header

00000000: 4f67 6753 0002 0000  OggS....
00000008: 0000 0000 0000 c93c  .......<
00000010: fe03 0000 0000 f90e  ........
00000018: f775 0113 4f70 7573  .u..Opus
00000020: 4865 6164 0101 3801  Head..8.
00000028: 80bb 0000 0000 004f  .......O
00000030: 6767 5300 0000 0000  ggS.....
00000038: 0000 0000 00c9 3cfe  ......<.
00000040: 0301 0000 0035 dfaf  .....5..
00000048: 0601 9a4f 7075 7354  ...OpusT
00000050: 6167 731f 0000 006c  ags....l
00000058: 6962 6f70 7573 2031  ibopus 1

Stripped Header

00000000: 4f67 6753 0002 0000  OggS....
00000008: 0000 0000 0000 c93c  .......<
00000010: fe03 0000 0000 f90e  ........
00000018: f775 0113 4f70 7573  .u..Opus
00000020: 4865 6164 0101 3801  Head..8.
00000028: 80bb 0000 0000 004f  .......O
00000030: 6767 5300 0000 0000  ggS.....
00000038: 0000 0000 00c9 3cfe  ......<.
00000040: 0301 0000 00ae 941c  ........
00000048: 4e01 2f4f 7075 7354  N./OpusT
00000050: 6167 731f 0000 006c  ags....l
00000058: 6962 6f70 7573 2031  ibopus 1

The Difference

Original                                  Stripped
00000040: 0301 0000 0035 dfaf  .....5.. | 00000040: 0301 0000 00ae 941c  ........
00000048: 0601 9a4f 7075 7354  ...OpusT | 00000048: 4e01 2f4f 7075 7354  N./OpusT
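Rather than comparing the rows by eye, the changed offsets can be listed mechanically. A small sketch, with both 16 byte rows re-typed from the dumps above:

```python
base = 0x40  # file offset of the first byte in each row

original = bytes.fromhex("0301 0000 0035 dfaf 0601 9a4f 7075 7354")
stripped = bytes.fromhex("0301 0000 00ae 941c 4e01 2f4f 7075 7354")

# Collect the file offsets where the two rows disagree.
changed = [base + i for i, (a, b) in enumerate(zip(original, stripped)) if a != b]
print([hex(offset) for offset in changed])
# ['0x45', '0x46', '0x47', '0x48', '0x4a']
```

Four consecutive bytes change, one stays the same, and then one more changes.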

So, something is happening between bytes 0x45 and 0x4a. But what?

A page is a header of 26 bytes, followed by the length of the data, followed by the data. The constructor is given a file-like object pointing to the start of an Ogg page. After the constructor is finished it is pointing to the start of the next page.
Mutagen Source Code

Unfortunately, my brain freezes up when I see things like

header = struct.unpack('<4sBBqIIiB', header_data)

But the code does point to the Ogg page format specification.

The LSb (least significant bit) comes first in the Bytes. Fields with more than one byte length are encoded LSB (least significant byte) first.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | capture_pattern: Magic number for page start "OggS"           | 0-3
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | version       | header_type   | granule_position              | 4-7
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               | 8-11
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | bitstream_serial_number       | 12-15
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_sequence_number          | 16-19
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | CRC_checksum                  | 20-23
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                               | page_segments | segment_table | 24-27
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | ...                                                           | 28-
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
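The format string from the Mutagen source lines up with the fields in this diagram. Here is a minimal sketch that applies it to the second page of the original file. The bytes from 0x2f to 0x4a are re-typed from the header dump above, and the variable names are chosen for this example rather than taken from Mutagen.

```python
import struct

page = bytes.fromhex(
    "4f676753"          # capture_pattern "OggS"
    "00"                # version
    "00"                # header_type
    "0000000000000000"  # granule_position
    "c93cfe03"          # bitstream_serial_number
    "01000000"          # page_sequence_number
    "35dfaf06"          # CRC_checksum
    "01"                # page_segments
    "9a"                # segment_table (one entry)
)

# "<" little endian; 4s = 4 raw bytes, B = 1 byte, q = 8 bytes, I and i = 4 bytes.
(capture, version, header_type, granule,
 serial, sequence, crc, segments) = struct.unpack("<4sBBqIIiB", page[:27])

# The segment table follows the 27 fixed bytes, one byte per segment.
segment_table = page[27:27 + segments]
data_length = sum(segment_table)  # number of data bytes carried by this page

print(capture, f"serial={serial:08x}", f"page={sequence}")
print(f"crc={crc & 0xFFFFFFFF:08x}", f"segments={segments}", f"data bytes={data_length}")
```

The serial number matches the 03fe3cc9 that opusinfo reported. The offsets also line up with the diff: the CRC occupies 0x45 to 0x48, page_segments is at 0x49, and the single segment_table entry is at 0x4a.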

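So, it is the CRC Checksum which is different. The byte after page_segments also changes: it is the lone segment_table entry, which holds the length of the comment packet (0x9a = 154 bytes in the original, 0x2f = 47 bytes after stripping). The Vorbis framing documentation has a brief description of how the CRC is calculated – but the full documentation 404s.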
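For what it is worth, the Ogg specification (RFC 3533) describes the checksum as a 32 bit CRC with generator polynomial 0x04c11db7, an initial value and final XOR of zero, calculated over the whole page with the CRC field set to zero. Below is a slow but short bitwise sketch under those assumptions. The function names are invented for this example, and the test data is the first page of the file (offsets 0x00 to 0x2e) re-typed from the dump above.

```python
import struct

def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32: polynomial 0x04c11db7, initial value 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def page_crc(page: bytes) -> int:
    """CRC of a whole page, with the checksum field (bytes 22-25) zeroed first."""
    return ogg_crc(page[:22] + b"\x00" * 4 + page[26:])

# 27 byte header + 1 byte segment table + 19 byte OpusHead packet = 47 bytes.
first_page = bytes.fromhex(
    "4f67675300020000" "000000000000c93c" "fe0300000000f90e"
    "f77501134f707573" "4865616401013801" "80bb0000000000"
)

stored = struct.unpack_from("<I", first_page, 22)[0]
print(f"stored   {stored:08x}")
print(f"computed {page_crc(first_page):08x}")
```

If the page was transcribed correctly, the two values should agree. Any hand edit to a page, such as shortening the comment packet, means recalculating this value and writing it back little endian.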

Conclusion

Hand editing binary files is for mugs.

One thought on “Removing default metadata from .opus files”

  1. The Ogg file format is a right mess. They decided to break it into packet sized chunks for streaming, and that made it very annoying to parse or seek.
