spoticli
a music streaming tui application
Prologue
Streaming is a pretty cool concept; sometimes it seems like something only the cool kids do. Then it hit me: I am a backend developer and I don’t know how to stream? I set out to learn how, and the most rewarding part was arriving at a solution through an evolution of attempts.
I knew a few things when I started: the project would consist of a frontend with a TUI (terminal user interface), and the backend would get the files from an S3 bucket.
In my first few attempts, I tried to segment the mp3 file on arbitrary bounds. Only the first segment would play. I learned that the first segment worked because it began with an mp3 header, allowing the mp3 player (decoder) to play the data. I then figured I could just prepend that header to the other segments, and maybe the decoder would work. Well, it didn’t.
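The approach where I naively attached a dummy header could have worked on older mp3 files because they tend to use CBR (constant bit rate), where every frame is encoded at the same bit rate. More modern files utilize VBR (variable bit rate), meaning the bit rate, and with it the frame size, can change from frame to frame. The header itself is always 4 bytes, but its contents differ between frames, so one copied header does not describe them all.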
My solution got even more involved, and it was pretty cool. I had to start doing bitwise stuff. I began reading the mp3 spec, as I realized that if I wanted to know the exact bounds of an mp3 frame, I would need to calculate them.
It was not easy, but it was definitely a fun project.
Description
A program to stream music from the command line. This program can stream or download mp3 files from AWS S3 to a command-line user. Downloading can be done directly (presigned URL) or via the backend serving the content. Streaming cannot be done using presigned URLs, as some backend processing is required.
Streaming is made possible by the Decoder service. It decodes what’s needed to compute the sizes of frames, tags, and metadata.
See the README in spoticli-backend for more on the algorithm and backend architecture.
Getting Started
The main parts are spoticli-cli and spoticli-backend, and a README.md describing their setup will be in each of these subprojects.
Example
Running the stream or download command will generate a prompt of songs to choose from. The songs listed are the ones stored in the database, which runs in a Docker container.
❯ ./spoticli-cli song play
Use the arrow keys to navigate: ↓ ↑ → ←
? Select Song:
blinded_in_chains.mp3
the_wicked_end.mp3
▸ bat_country.mp3
sidewinder.mp3
↓ blinded_in_chains.mp3
This backend provides an API for downloading and streaming music.
For streaming, a naive approach is to segment an MP3 file at arbitrary byte offsets. This won’t work, though, because mp3 decoders normally require an mp3 frame header for decoding. Even adding a fake header to arbitrarily segmented partitions will not work, because the frame header contains information that may be exclusive to a given frame.
The approach I ended up devising was to partition the file at the frame boundaries. Then, when data is sent, a cluster of frames is merged into a single byte array and sent.
This means that I first decode the ID3v2 tag, so I can remove it. Then I decode each frame’s header to get the frame’s size. I can then use this information to move to the start of the next frame. I repeat this until no frames are left.
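The walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the Decoder service itself: the function names (`id3v2_tag_size`, `frame_length`, `frame_offsets`) are made up for this example, and it assumes MPEG-1 Layer III frames only, with no free-format bitrates. The synchsafe size field and the `144 * bitrate / sample_rate + padding` formula come from the ID3v2 and MPEG audio specs.

```python
import struct

# Lookup tables for MPEG-1 Layer III only (an assumption of this sketch).
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLE_RATES_HZ = [44100, 48000, 32000]


def id3v2_tag_size(data: bytes) -> int:
    """Return the total size of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    # The size field is "synchsafe": four bytes carrying 7 bits each.
    b0, b1, b2, b3 = data[6:10]
    body = (b0 << 21) | (b1 << 14) | (b2 << 7) | b3
    footer = 10 if data[5] & 0x10 else 0  # ID3v2.4 footer flag
    return 10 + body + footer


def frame_length(header: bytes) -> int:
    """Return the length in bytes (header included) of one frame."""
    (word,) = struct.unpack(">I", header[:4])
    if word >> 21 != 0x7FF:
        raise ValueError("missing frame sync")
    version = (word >> 19) & 0b11
    layer = (word >> 17) & 0b11
    if version != 0b11 or layer != 0b01:
        raise ValueError("only MPEG-1 Layer III is handled in this sketch")
    bitrate_index = (word >> 12) & 0b1111
    rate_index = (word >> 10) & 0b11
    padding = (word >> 9) & 0b1
    if bitrate_index in (0, 15) or rate_index == 3:
        raise ValueError("free-format or reserved value")
    bitrate = BITRATES_KBPS[bitrate_index] * 1000
    return 144 * bitrate // SAMPLE_RATES_HZ[rate_index] + padding


def frame_offsets(data: bytes) -> list[tuple[int, int]]:
    """Skip the ID3v2 tag, then return (start, end) offsets of each frame."""
    offsets = []
    pos = id3v2_tag_size(data)
    while pos + 4 <= len(data):
        try:
            size = frame_length(data[pos:pos + 4])
        except ValueError:
            break  # e.g. a trailing ID3v1 tag or junk bytes
        if pos + size > len(data):
            break  # truncated final frame
        offsets.append((pos, pos + size))
        pos += size
    return offsets
```

For a 128 kbps, 44.1 kHz frame the formula gives 417 bytes, or 418 when the padding bit is set, which is why the size has to be read from every header rather than assumed.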
The anatomy of an mp3 is as shown below,
+---------------------+
| MP3 File Header | --> Metadata (e.g., ID3v2 tags)
+---------------------+
| Audio Frame 1 | --> Contains header + data
+---------------------+
| Audio Frame 2 |
+---------------------+
| Audio Frame 3 |
+---------------------+
| Audio Frame N | --> Last audio frame
+---------------------+
| Optional Metadata | --> Footer (e.g., ID3v1 tags)
+---------------------+
In my algorithm, I strip the initial ID3v2 tag, then break the file apart by frames, as shown below,
+-------------------+ +-------------------+ +-------------------+ +-------------------+
| Frame 1 Header | | Frame 2 Header | | Frame 3 Header | | Frame N Header |
+-------------------+ +-------------------+ +-------------------+ +-------------------+
| Frame 1 Data | | Frame 2 Data | | Frame 3 Data | | Frame N Data |
+-------------------+ +-------------------+ +-------------------+ +-------------------+
| | | |
(1152 samples) (1152 samples) (1152 samples) (1152 samples)
The frame slices are then grouped together, such that there are x frames per cluster. The size calculations required for this technique can be found in the Decoder service.
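As a rough sketch of the grouping step, assuming the frame boundaries are already known as a list of `(start, end)` byte offsets: because consecutive frames sit back to back in the file, a cluster is just the slice from the first frame's start to the last frame's end. The name `cluster_frames` and its parameters are hypothetical, chosen for this example only.

```python
def cluster_frames(data: bytes, offsets: list[tuple[int, int]],
                   frames_per_cluster: int) -> list[bytes]:
    """Group contiguous frames into byte arrays of `frames_per_cluster` frames."""
    if frames_per_cluster < 1:
        raise ValueError("frames_per_cluster must be at least 1")
    clusters = []
    for i in range(0, len(offsets), frames_per_cluster):
        group = offsets[i:i + frames_per_cluster]
        start = group[0][0]   # start of the first frame in the group
        end = group[-1][1]    # end of the last frame in the group
        clusters.append(data[start:end])
    return clusters


# Five 2-byte "frames" grouped two at a time; the last cluster is shorter.
data = bytes(range(10))
offsets = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
clusters = cluster_frames(data, offsets, 2)
```

Every cluster begins on a frame header, so a decoder can start playing from any of them.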
Another feature of this backend: when handling a range request, only the start position is respected.
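To illustrate what "only the start position is respected" could look like, here is a small sketch that parses an HTTP `Range` header value and keeps only the start offset. The helper name `range_start` is hypothetical, and what the backend does with that offset afterwards is not covered here.

```python
import re

# Matches "bytes=START-" or "bytes=START-END"; END is parsed but ignored.
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def range_start(header_value: str) -> int:
    """Return the start offset of a single-range `Range` header value."""
    match = _RANGE_RE.match(header_value.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {header_value!r}")
    return int(match.group(1))
```

With this approach `bytes=1000-1999` and `bytes=1000-` are treated the same way: both ask for data beginning at offset 1000.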