Opus Decoding in Swift

Introduction

Friend is an open-source wearable AI device that captures your conversations, transcribes and summarises them, and offers a ton of features based on that information.

The wearable device captures the audio signal and sends it to your phone over BLE, which offers limited bandwidth. Initially, Friend was sending raw PCM data (16-bit samples at 8 kHz, which already amounts to 128 kbit/s for a single channel), meaning the audio quality was quite low and so was the transcription accuracy.

To improve that, last weekend I re-enabled the Opus codec in their firmware, allowing higher audio quality while using less bandwidth.

Opus is an open, royalty-free codec that works great with voice content. Its reference library, libopus, can be used on various platforms, including embedded devices.

Although you can use the official Friend app, I’m developing a native iOS app that records conversations from the Friend device as part of my PAL project. And now that the device supports Opus, I needed to add that to my codebase.

Integrating libopus in a Swift codebase

I started by doing a quick search for a Swift package that supports Opus and found alta/swift-opus: Opus audio codec for Swift Package Manager.
It provides an API that integrates nicely with native AVFoundation concepts such as AVAudioFormat and AVAudioPCMBuffer, which I was already using in my code to record the PCM data.

And there is recent activity on the project, which is always a plus.

The way the Friend firmware works is that it takes 10 ms of audio data, encodes it (or leaves it untouched in PCM mode) and sends that packet as one BLE characteristic notification.

To optimise writes to the audio file, I collect a series of packets, appending everything to a Data structure, and every 100 packets I convert that to an AVAudioPCMBuffer and write it to the file.
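In sketch form, the batching looks roughly like this (the helper names and the Int16 handling are mine, not the actual PAL code):

import AVFoundation

// Illustrative helper: wrap raw 16-bit mono PCM bytes in an AVAudioPCMBuffer.
// The format is assumed to use .pcmFormatInt16, otherwise int16ChannelData is nil.
func makePCMBuffer(from data: Data, format: AVAudioFormat) -> AVAudioPCMBuffer? {
    let frameCount = AVAudioFrameCount(data.count / MemoryLayout<Int16>.size)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frameCount) else {
        return nil
    }
    buffer.frameLength = frameCount
    data.withUnsafeBytes { raw in
        buffer.int16ChannelData![0].update(from: raw.bindMemory(to: Int16.self).baseAddress!,
                                           count: Int(frameCount))
    }
    return buffer
}

var pending = Data()
var packetCount = 0

// Collect BLE packets and only touch the audio file every 100 packets.
// The file is assumed to be open with a matching Int16 processing format.
func handlePCMPacket(_ packet: Data, file: AVAudioFile, format: AVAudioFormat) throws {
    pending.append(packet)
    packetCount += 1
    guard packetCount == 100 else { return }
    if let buffer = makePCMBuffer(from: pending, format: format) {
        try file.write(from: buffer)
    }
    pending.removeAll(keepingCapacity: true)
    packetCount = 0
}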

And so my first naive try was to keep the same mechanism and pass my big data blob to the decode method, which conveniently returns an AVAudioPCMBuffer:

public func decode(_ input: Data) throws -> AVAudioPCMBuffer {

And the code did not work: the decode call was always throwing an Opus.Opus.Error with code -4, which is OPUS_INVALID_PACKET.

Packet boundaries are important

Why was my packet invalid? I reviewed the low-level code that gets the data from Bluetooth and populates the buffer, but everything looked OK, and it worked for PCM, so why not here?

Glancing at the Opus site, I saw this information:

  • Frame sizes from 2.5 ms to 60 ms

Got it, my packets are too big, I need to feed the decoder smaller ones. So I updated the code to split my big buffer into small chunks and iterated on the call to decode. Same error, although at some point, while playing around with the chunk size, I could successfully decode one packet?!?

I went back to the firmware code, checking parameters such as the sampling rate, making sure everything was aligned on both sides. And then it hit me: on the firmware side, we're encoding one 10 ms packet at a time and sending that encoded packet over BLE. So it makes perfect sense that, on the decoding side, that exact packet is what needs to be fed to the decoder.

But the code in place packed together the packets it received over BLE before handing them to the decoder. And because we're encoding with a variable bit rate, the encoded packets all have different sizes, so there is no way to split the buffer back into the original packets on the decoding side.

I reworked the code so that instead of storing the raw bytes in a Data structure, I collect the packets in an Array, each entry holding an individual packet. When the decoding code receives the Array, it can decode each packet individually, as sketched below.
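Assuming a 16 kHz mono stream (the Opus.Decoder initializer shown is how I understand the package's API, so treat it as an approximation):

import AVFoundation
import Opus

// Opus only supports a fixed set of sample rates (8, 12, 16, 24 and 48 kHz).
let format = AVAudioFormat(commonFormat: .pcmFormatInt16,
                           sampleRate: 16000, channels: 1, interleaved: false)!
let decoder = try Opus.Decoder(format: format)

// `packets` holds one encoded Opus packet per Array entry,
// exactly as received over BLE.
func write(_ packets: [Data], to file: AVAudioFile) throws {
    for packet in packets {
        let pcm = try decoder.decode(packet)  // one encoded packet per call
        try file.write(from: pcm)
    }
}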

And this worked. It seems obvious in hindsight, but by fitting the decoding logic into the structure of the existing code without looking at the bigger picture, I had gone in the wrong direction.

State matters

No more decoding errors, the audio file was properly written and it had the correct length. Great.
But when I listened to the recording, the sound was heavily distorted. It was not complete garbage, I could just about make out what I had said during the recording, but there was a lot of crackling, saturation and other artefacts.

Back to looking at the code: did I make a mistake in my byte-manipulation logic? Was it an endianness problem? I tried different changes, but to no avail.

The Friend native app, written in Dart with Flutter, was producing correct audio, so the issue had to be in my decoding code. I looked at their code and at the Dart Opus plugin (which uses the same libopus library underneath), but nothing stood out.

So I applied a technique I use a lot when faced with a problem in my apps: I try to reproduce it in an isolated context, as small and focused as possible, with minimal overhead, so I can iterate fast.
I updated my code to store the raw packets I receive from BLE to a file, so I had a test fixture and did not need to connect to a BLE device and record a bit of audio for every test. From there, I created a simple command-line tool on the Mac to read those packets and decode them to a WAV file.
And to my surprise, the audio was perfect: clean, pristine sound.
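Storing the packets requires some framing so that the boundaries survive the round trip to disk. A minimal sketch, assuming a hypothetical two-byte length prefix per packet (not necessarily the format PAL uses):

import Foundation

// Read back packets stored as: [UInt16 little-endian length][packet bytes]...
func readPackets(from url: URL) throws -> [Data] {
    let blob = try Data(contentsOf: url)
    var packets: [Data] = []
    var offset = 0
    while offset + 2 <= blob.count {
        let length = Int(blob[offset]) | (Int(blob[offset + 1]) << 8)
        offset += 2
        guard offset + length <= blob.count else { break }
        packets.append(blob.subdata(in: offset ..< offset + length))
        offset += length
    }
    return packets
}

From there, the command-line tool simply feeds each packet to the decoder and writes the result to a WAV file.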

Huh? Comparing the code in my app with the simple test I made, I saw that in the app, again because of the structure in place, and because I first wanted to get something working before refactoring, I was re-creating the Opus decoder for every packet. So I updated my test to do the same. And boom, the same artefacts appeared in the decoded audio.

Updating the app code to use the same decoder for the whole duration of a single recording session fixed the issue.
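The explanation is that Opus is a stateful codec: the decoder carries internal prediction state from one frame to the next, so creating a fresh instance for every packet throws that state away. The difference, in sketch form (same assumed format as before):

// Broken: a new decoder per packet resets the codec state every 10 ms,
// which is what produced the crackling and saturation.
for packet in packets {
    let decoder = try Opus.Decoder(format: format)
    try file.write(from: decoder.decode(packet))
}

// Fixed: one decoder instance for the whole recording session.
let sessionDecoder = try Opus.Decoder(format: format)
for packet in packets {
    try file.write(from: sessionDecoder.decode(packet))
}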

Once again, after fixing the issue, it makes perfect sense. My limited knowledge of how the codec works had made me take a shortcut that had a huge impact on its output.

Optimising the processing

I now had a working app, but fixing these problems had meant adapting the code already in place, which left room for optimisation.

Because of the API of the Opus Decoder

public func decode(_ input: Data) throws -> AVAudioPCMBuffer {

and of AVAudioFile

open func write(from buffer: AVAudioPCMBuffer) throws

and the fact that there’s no API to easily concatenate two AVAudioPCMBuffer instances, I was now writing each decoded packet to the audio file individually.

I did not investigate the topic scientifically, but it bothered me to make frequent calls to an API that results in disk I/O.

My solution was to fork the swift-opus package and add a method that decodes to a Data structure instead of an AVAudioPCMBuffer:

public func decodeToData(_ input: Data) throws -> Data {

With that in place, I can properly decode each packet, build one big data buffer by appending the decoded bytes together, create a single AVAudioPCMBuffer from it, and write that to the audio file.
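The optimised path then becomes something like this, reusing the decoder, the format and the hypothetical makePCMBuffer helper from the earlier sketches:

// Decode every packet, accumulate the raw PCM bytes, and end with a single
// buffer conversion and a single file write for the whole batch.
func flush(_ packets: [Data], to file: AVAudioFile) throws {
    var pcmBytes = Data()
    for packet in packets {
        pcmBytes.append(try decoder.decodeToData(packet))
    }
    if let buffer = makePCMBuffer(from: pcmBytes, format: format) {
        try file.write(from: buffer)  // one disk I/O call instead of one per packet
    }
}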

The fork also includes an update of libopus to the latest 1.4.x version. I tried updating to 1.5.1 (the latest version at the time of writing), but the project did not compile and I did not bother to investigate further.

I think that, given my limited usage, I could also include libopus directly in my app and bypass the need for an extra package altogether. That might be a task for another time.

Key takeaways

Specifically for Opus decoding:

  • decode the packets exactly as they’ve been encoded, don’t try to split or merge them in any way
  • make sure to use the same decoder instance for the whole audio segment; state matters, so the lifetime of the decoder should mirror that of the encoder

More generally with regards to software development:

  • don’t let existing code constrain your decisions too much; take the time to think about the bigger picture and how the new feature impacts what’s in place
  • when faced with a problem, try to replicate it in an isolated context, with as little overhead as possible on the debugging process

If you want to take a look at the final source code, the PAL project is fully open-source on GitHub.