Game Instance

Let the games begin

PCM56 audio player

ESP32 evolution

Since the initial PCM56 player prototype, things have evolved quite a bit. On the hardware side, the device now plays little-endian, 16-bit signed integer PCM samples from an SD-card. The firmware contains a bare-bones HTML/CSS/JavaScript interface served from the MCU's flash memory through an async web server, which exposes a basic JSON backend that takes play/stop commands and lists audio files. All of this is handled by the ESP8266 acting as a WiFi station... in under 80KiB of user-data RAM.
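As a small illustration of the sample format mentioned above, here is a portable C sketch that turns raw little-endian bytes into signed 16-bit samples. It assumes interleaved stereo frames (left, then right), which is not stated here, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build one signed 16-bit sample from two
 * little-endian bytes (low byte first), whatever the host endianness. */
int16_t pcm16le_sample(const uint8_t *p)
{
    uint16_t raw = (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
    /* Explicit two's complement conversion, well defined on any compiler. */
    return (raw & 0x8000u) ? (int16_t)((int32_t)raw - 65536) : (int16_t)raw;
}

/* Hypothetical helper: split an interleaved stereo byte buffer
 * (L, R, L, R, ...) into per-channel arrays.
 * Returns the number of whole frames converted. */
size_t pcm16le_deinterleave(const uint8_t *bytes, size_t len,
                            int16_t *left, int16_t *right, size_t max_frames)
{
    assert(bytes != NULL && left != NULL && right != NULL);
    size_t frames = len / 4;              /* 2 channels x 2 bytes per frame */
    if (frames > max_frames) {
        frames = max_frames;
    }
    for (size_t i = 0; i < frames; i++) {
        left[i]  = pcm16le_sample(bytes + 4 * i);
        right[i] = pcm16le_sample(bytes + 4 * i + 2);
    }
    return frames;
}
```

Assembling the sample byte by byte avoids any dependence on the endianness or alignment rules of the processor reading the card.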

This article follows the architectural preview, technical challenges and choices for an audio system mod kit with the following specs:

  • Play CD-quality lossless formats from a storage medium.
  • Must be remotely controlled via a web GUI.
  • Capable of (re)connecting to a preset WiFi access point.
  • Able to switch audio output between host unit and the device itself.
  • Have a small footprint, easy to install in most audio devices.
  • Cause no RF or thermal interference to its host.
  • Optional:
    • Remotely maintainable.
    • Low power stand-by.

From the start, the device is useless if it plays just raw PCM waves. True, this is the same format CD players feed their DACs, but there's no good reason not to use a lossless encoding, FLAC being the obvious one. Fitting such a decoder in the already memory-constrained ESP8266 is simply impossible because of the usual block size a FLAC decoder needs. So the next step up the family ladder is the ESP32 which, besides having 4x more RAM, comes with a 160-240MHz dual-core 32-bit LX6 processor.
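To put the block size argument in numbers, here is a rough back-of-the-envelope estimate in C. It assumes the decoder keeps one 32-bit output buffer and one 32-bit residual buffer per channel, each a full block long, which is roughly how reference-style decoders are understood to work. The function name and the accounting are illustrative, not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative estimate only: bytes needed for the per-channel sample
 * buffers of one FLAC block, under the assumption stated above. */
size_t flac_block_buffers_bytes(size_t block_size, size_t channels)
{
    assert(block_size > 0 && channels > 0);
    const size_t buffers_per_channel = 2;   /* output + residual (assumption) */
    return block_size * channels * buffers_per_channel * sizeof(int32_t);
}
```

With a block size of 4096 samples, a common encoder choice for 44.1kHz material, a stereo stream would need 65536 bytes (64KiB) for these buffers alone. That leaves next to nothing of the ESP8266's under-80KiB budget for the bitstream buffer, the WiFi stack and the web server.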

Hardware requirements

The circuit must be fed three voltages relative to a common power and signal GND point. The first one, capable of sourcing a maximum current of 1A at 3.3V, powers the MCU and must be sufficiently decoupled and bypassed close to the chip to cope with its power surges. The other two are well-regulated power rails feeding both DAC chips, giving anywhere between 5V and 12V at 250mA and centered around GND. Ideally, the digital and analog parts of the chips should have different power rails, but most manufacturers of the 80s chose to reuse rails, provided they're stable enough. As an observation, the DAC output signal is influenced by variations in the power supply rails, hence the emphasis on their regulation. Preferably, the MCU's supply should not come from the same transformer core as the DAC rails. Heat produced by the power supplies should be minimal and must be taken into account. Proper fuses must protect the mains input. Never rely on the host's transformer for power!

The micro-controller's WiFi antenna must be routed outside the host unit's case, for the obvious radiation reasons. Some ESP32 variants feature male U.FL connectors for this exact reason. A 6dBi omnidirectional stick hidden at the back of the case should suffice for most use cases.

For a functional SD-card interface, one must use pull-up resistors on the data, CMD and CS pins, taking care not to affect the MCU's boot selection. If possible, the SD-card slot should be placed in an accessible region of the host unit. This will be immensely helpful for changing the memory card without having to open the case.

The device should contain small signal relays capable of controlled switching between the host's and the device's DACs. This means one should disconnect the host's output, route it to the relay's NC (normally closed) input, then route the relay's output to the unit's output. Preferably, both jumper wires should be screened to the signal ground.

The device's board must be mounted with no or minimal effect on the host's aesthetics.

The device's output level should match that of the host unit. That could be achieved with an audio-grade op-amp, which could also bring the signal's output impedance to healthier levels.


One can no longer rely on Arduino, mainly because of the high-level, generic nature of its API and the lack of support for all MCU settings in the IDE. The biggest problem, however, is the inability to specify memory placement attributes for 3rd party library functions. By default, all functions are linked to flash memory, which is slow and thus unsuitable for ISRs and time-critical tasks. Another inconvenience is the build chain, which recompiles everything after every small change. That is, of course, if it doesn't break for some weird reason such as "panic: runtime error: index out of range [N] with length N". Build automation is out the window from the start, and one's only hope of make-like builds is using arduino-cli, which is convoluted and too command-centric to yield a stable configuration to work with.
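To illustrate the memory placement point: Espressif's native SDK offers the IRAM_ATTR attribute (declared in esp_attr.h), which places a function in internal RAM instead of flash. The sketch below is hypothetical. The ring buffer, its names and the idea of feeding samples from a timer ISR are assumptions, not the actual firmware, and the empty fallback define exists only so the snippet also compiles on a host machine.

```c
#include <stdint.h>

#if defined(__has_include)
#  if __has_include("esp_attr.h")
#    include "esp_attr.h"      /* Espressif SDK: defines IRAM_ATTR */
#  endif
#endif
#ifndef IRAM_ATTR
#  define IRAM_ATTR            /* host build: no special placement */
#endif

#define RING_SIZE 256u         /* power of two, so indices wrap with a mask */

static volatile int16_t  ring[RING_SIZE];
static volatile uint32_t head; /* advanced by the producer task only */
static volatile uint32_t tail; /* advanced by the ISR only */

/* Producer side (normal task context). Returns 0 when the buffer is full. */
int ring_push(int16_t sample)
{
    if (head - tail >= RING_SIZE) {
        return 0;
    }
    ring[head & (RING_SIZE - 1u)] = sample;
    head++;
    return 1;
}

/* Consumer side, meant to be called from a sample-rate timer ISR.
 * IRAM_ATTR keeps the code in RAM so it never waits on a flash fetch. */
int16_t IRAM_ATTR ring_pop_isr(void)
{
    if (tail == head) {
        return 0;              /* underrun: output silence */
    }
    int16_t s = ring[tail & (RING_SIZE - 1u)];
    tail++;
    return s;
}
```

This only works for code one compiles oneself. A function buried in a prebuilt or unmodified 3rd party library cannot be annotated this way, which is exactly the limitation described above. A real dual-core implementation would also have to consider memory ordering between the producer and the ISR.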

ESP32 driven PCM56 audio player prototype. DAC chips completely shadowed by jumper wires.

Espressif, on the other hand, maintains a CMake-based build chain for their MCUs that bundles just about everything, from project and source configuration management to flash manipulation, under one tool. It is called IDF and I highly recommend it. This not only brings the developer closer to the native API, but also to sane low-level notions and data structures. It opens the door to FreeRTOS, providing multi-tasking, critical sections, queues, stream buffers... you name it, they got it. Needless to say, it takes one out of the Arduino IDE comfort zone, but offers so much more... like Morpheus's red pill.


In spite of running logic jumper wires in close proximity to the audio signal, the breadboard prototype works pretty well. The migration to Espressif's IDF poses a succession of hill slopes, but conquering them is as satisfying as popping bubble-wrap pads... one by one. One should fear mountains. Mine is the libFLAC integration right now, but more about that and others in a future post.