Game Instance


Let the games begin

FLAC decoder

A comprehensible, lightweight C++ implementation

Every now and then, one reinvents the wheel. To better grasp techniques, for keeping dependencies in check or simply to get a glimpse of attention, we all did it at some point and were so proud. Most do it in their youth, some still do it in their adulthood and the rest know better. The first two reasons got me a native C++ FLAC decoder with all the essential code grouped in one file, requiring one custom bit streaming and an appendable error message component. The excitement you get hearing music instead of white noise after hours of debugging is toxic... but so fun.

Why

PCM56 player was hindered by ESP8266's limitations and in a parallel effort to deal with that, I started investigating the time complexity of libFLAC, memory requirements and how'd be configured, if possible, to use just enough RAM to decode CD-quality tracks. Failing to achieve the latter, I turned to the FLAC format description which was drafted in such a terse and intuitive manner that by the time I got to the end of it, I'd written 2/3 of the decoding code. All I was missing was a rice decoder for the residuals, the linear prediction and few other details that were exhaustively covered by the IETF specification.

Since the algorithm is not locked by any patents and the creators of FLAC made everything open-source, I too will be following their lead. As such, the audio library code and tools, as well as their dependencies, are published under GPL-3.0.

Before getting any further, this was released with a strong belief that it can be helpful to others, as it was for me. It's a comprehensive, yet compact, native C++ implementation of the specification. Nothing more! That said, I advise not to use this library for anything other than educative applications. It is partially tested, decodes no more than 2 channels, executes twice as slow as libFLAC and, most importantly, it won't be maintained as well or often.

The API

There are some differences from libFLAC++ in terms of dev interface. The decoder is constructed with a stream object that reads bytes off an input source. The user program then needs to call dedicated member functions that decode metadata or audio blocks and propel the object's state transitions. Calling decode_marker() moves the state from "init" to "has_marker", looping decode_metadata() sufficient times bumps it to "has_metadata" and looping decode_audio() eventually leads it to a "complete" state. After each audio decoding call, audio samples are accessible directly from the inner buffer via block_data(). This is an alternative to the use of callbacks, preferred by libFLAC.

namespace audio {
namespace flac {

using sample_type = int64_t;
using audio_data = std::vector<std::vector<sample_type>>;

struct streaminfo_type;
enum class decoder_state;

template<typename INPUT_STREAM, size_t BUFFER_SIZE = 8192>
class decoder {
public:
    explicit decoder(INPUT_STREAM &istream);

    void decode_marker();
    void decode_metadata();
    void decode_audio();

    inline const decoder_state &state() const;
    inline const streaminfo_type &streaminfo() const;
    inline const uint32_t &block_sample_rate() const;
    inline const audio_data &block_data() const;
    inline const uint16_t &block_size() const;

private:
    // ...
    decoder_state _state;
    streaminfo_type _streaminfo;
    // ...
    audio_data _buffer;
};

} // namespace flac
} // namespace audio

All class members use the stack (under 200 bytes) apart from the buffer, which as a std::vector ends up having elements created on the heap. The buffer is constructed with all BUFFER_SIZE samples for both channels. Besides avoiding re-allocations as the vector grows, this has a second, more important purpose. A FLAC requiring more than BUFFER_SIZE samples per block will cause a handled exception. Sure, libFLAC would return a NULL pointer if there's no memory left in store to allocate the decoder. However, what if it succeeds and allocates it ALL? For a limited device may be a problem as it makes other components needing memory to fail later-on. That's the kind of control I was looking for and succeeded to obtain with this implementation.

Application code

looks pretty tight and readable as well. The code sample below transcodes a FLAC file to WAVE. It does, however, use another lightweight WAVE encoding C++ component, also available in the audio library.

#include <basics/error.hh>
#include <basics/file.hh>
#include <audio/flac.hh>
#include <audio/wave.hh>

using namespace basics;
using namespace audio;

int main(int argc, char *argv[])
{
    try {
        if (argc < 3)
            throw error{"Usage: %s <input.flac> <output.wav>", argv[0]};

        auto file_istream = file::input{argv[1]};
        auto flac_istream = flac::decoder{file_istream};
        auto file_ostream = file::output{argv[2], true};
        auto wave_ostream = wave::encoder{file_ostream};

        flac_istream.decode_marker();
        while (flac_istream.state() != flac::decoder_state::has_metadata)
            flac_istream.decode_metadata();

        auto info = flac_istream.streaminfo();
        wave_ostream.encode_header(wave::streaminfo_type{info.sample_rate,
                                                         info.sample_bit_size,
                                                         info.channel_count,
                                                         info.sample_count});

        while (flac_istream.state() != flac::decoder_state::complete) {
            flac_istream.decode_audio();
            if (flac_istream.block_sample_rate() != info.sample_rate)
                throw error{"variable sample rate not supported"};

            for (auto i = size_t{0}; i < flac_istream.block_size(); ++i)
                for (auto ch_idx = uint8_t{0}; ch_idx < info.channel_count; ++ch_idx)
                    wave_ostream.encode_sample(flac_istream.block_data()[ch_idx][i]);
        }
    } catch (const error &err) {
        err.dump();

        return 1;
    }

    return 0;
}

OK, so how does it perform? As I mentioned speed is not its strength at this point, decoding the same FLAC file in twice as much time as the original flac -d. A core design feature is its control over the space complexity. Considering that most encodings prefer block sizes of 4096 or 4608, the default value of 8192 is largely sufficient. This, however, was initially designed to run on a memory limited MCU and for 16bit CD extracted FLAC files. With that in mind, the buffer_sample_type can be reduced to 32bit, practically halving the memory requirements. So, for a buffer length of 4608 we'd get to 36KB (heap) plus the 176 bytes (stack) needed by the decoder instance.

Any way you look at it, the decoder has linear time complexity. Even the constant subframe value must populate all samples in the block. Verbatim subframes are O(N) whereas the LPC encoded frames need O(k*N), with N being the number of samples and k the polynomial order (which can theoretically be up to 32). The complexity is indirectly granted by the specification which states that every frame is encoded independently.

If any conclusion is to be drawn out of this, re-writing community validated code should be considered carefully. However, I chose not to fork libFLAC and then work my way to match some technical constraints but instead I preferred understanding the concepts and intricacies behind this lossless codec. In my doing so, I've written a compact decoder that can be tweaked such that it'll process a stream in a given amount of memory or not at all. Although not important in today's terms, I also reached the conclusion that memory usage on decoder side could have further been reduced should other encoding choices have been made in the protocol.

Update 2024-06-01: Check-out the ESP32 PCM56 player application of the FLAC decoder.