I think ever since @carlc27843's Impossible Mission R.T. cart came out, people have been wondering if they could make background music for a cart using PCM synthesis. carlc27843's Emulated Amstrad CPC Chiptunes post discusses using its engine that way, and @luchak has had to let people know that the RP-8 groovebox can't be used that way. Folks are curious.
I don't know a lot about digital audio synthesis, but from the conversations that have happened in the PICO-8 Discord, it sounds like there are roughly three sides to the equation:
Cost
- How many tokens and bytes are cart designers willing to give up to the soundtrack? @bikibird's Speako8 Speech Synthesis Library is under a thousand tokens - is that a good target?
- What percentage of PICO-8's CPU budget? Four voices with 25% CPU seems possible in a few different ways, but is that too much to give up to background music?
- How much memory, both Lua and addressable? Most forms of synthesis probably run out of CPU first, but this could become a concern if you're making a lot of lookup tables.
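For scale on the CPU question: at PICO-8's 5512.5 Hz PCM rate, the synth has to produce a fixed number of samples every frame, whatever else the game is doing. A quick Python check of that arithmetic (Python only for checking numbers; a cart would do the actual work in PICO-8 Lua):

```python
# Back-of-envelope PCM budget: how many samples a synth must produce each frame.
SAMPLE_RATE = 5512.5  # PICO-8 PCM output rate in Hz


def samples_per_frame(fps: int) -> float:
    """Samples drained from the PCM buffer per frame at a given frame rate."""
    return SAMPLE_RATE / fps


def voice_samples_per_frame(fps: int, voices: int) -> float:
    """Oscillator evaluations per frame if every voice is computed separately."""
    return samples_per_frame(fps) * voices


for fps in (30, 60):
    print(f"{fps} fps: {samples_per_frame(fps)} samples/frame, "
          f"{voice_samples_per_frame(fps, 4)} oscillator samples/frame with 4 voices")
```

So four voices at 30 fps means computing 735 oscillator samples (plus mixing) every frame, and the 25%-CPU question is really about how much Lua each of those costs.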
Usability
- How do you program tracks? Does it use PICO-8's built-in tracker with its own sound sources? Does it have a custom editor?
- How do you add them to your games? Presumably you copy a bunch of code into memory and add a function or coroutine to your game loop, but where and how do you store tracks?
Quality
- How many simultaneous voices? PICO-8's built-in tracker allows 4 simultaneous sounds, but most game BGM is built with 2 or 3.
- What effects can you add? Reverb is probably out of budget, but echo is possible (if memory-expensive), and distortion and compression are totally feasible. So are filters: certainly low-pass, high-pass, band-pass, and notch.
- What kind of synth do you make?
It's definitely possible to make:
- Simple waveform synthesis (e.g. sine, square/pulse, sawtooth, triangle)
- FM synthesis
- Sample-based synthesis (very storage-expensive!)
- Wavetable synthesis (the original PPG Wave synthesizers only had 8-bit samples! but this is also storage-expensive, if less than sample-based synthesis)
- Subtractive synthesis with any of the above as oscillators
...but the more processing you do, the more sound design tools you add, the more expensive your result will be.
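As a minimal sketch of the cheap end of that list, here's a naive sawtooth oscillator fed through a one-pole low-pass filter (about the simplest possible subtractive setup), with the result converted to unsigned 8-bit samples. It's Python for readability, not a PICO-8 port, and the filter choice, cutoff, and byte centring are illustrative assumptions rather than anyone's actual engine:

```python
import math

SAMPLE_RATE = 5512.5


def saw(phase: float) -> float:
    """Naive sawtooth in [-1, 1) from a phase in [0, 1). This one aliases."""
    return 2.0 * phase - 1.0


def one_pole_coeff(cutoff_hz: float) -> float:
    """Smoothing coefficient for a one-pole low-pass; always between 0 and 1."""
    return 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)


def to_byte(x: float) -> int:
    """Map [-1, 1] to an unsigned byte centred on 128, clamping any overshoot."""
    return max(0, min(255, 128 + round(127 * x)))


def render(freq_hz: float, cutoff_hz: float, n: int) -> list[int]:
    """Render n filtered sawtooth samples as 8-bit unsigned PCM."""
    phase, state = 0.0, 0.0
    a = one_pole_coeff(cutoff_hz)
    out = []
    for _ in range(n):
        state += a * (saw(phase) - state)  # y[n] = y[n-1] + a * (x[n] - y[n-1])
        out.append(to_byte(state))
        phase = (phase + freq_hz / SAMPLE_RATE) % 1.0
    return out


samples = render(220.0, 800.0, 184)  # roughly one 30 fps frame's worth
```

Because `a` stays between 0 and 1, this filter can't go unstable; resonant designs such as biquads are where bad parameters can make the output blow up.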
Conclusion?
I don't really have one? But I think it would be good to have a space on the official forums where people who are thinking about this stuff can talk about it. I haven't made a lot of games, so I don't know what a good budget would be for game developers ... and I haven't made much music with anything outside PICO-8 and, like, an actual piano, so I don't know what a good synthesizer would be for game soundtrack composers. And I know barely anything about software synthesis, so I don't know what's possible, what's easy, what's hard, or why my low-pass filter makes hell noises if I give it the wrong parameters.
I think it would be cool to share knowledge, and the forums seem like the best place to do it.
For my part, I've been slowly trying to make a PCM synth button keyboard using stat(28,scancode) for raw keyboard input. In my head, I feel like this is a good intermediate step between simple test carts that do nothing but make basic waves and any kind of PCM tracker project: it's a fun toy on its own, the CPU cap is much more generous, and it lets me test out if the sounds sound good and also if it runs well enough to even try to make a tracker.
My code has some real problems right now, but I'll probably share a work-in-progress once it reaches a point where it's good enough to scavenge pieces from. Programming the input system for playing a computer keyboard like it's a piano keyboard is boring, and if I have something that works and others can copy+paste, I'm down with making it copyable and pasteable.
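For anyone else building the keyboard-as-piano part, here's a small Python sketch of the mapping. It assumes stat(28, scancode) uses SDL scancodes (where letter keys a to z are 4 to 29), and it uses a common layout where the home row plays white keys and the row above plays black keys; the layout and the base note are hypothetical choices, not taken from any existing cart:

```python
# Hypothetical computer-keyboard piano: home row = white keys, row above = black keys.
PIANO_KEYS = "awsedftgyhujk"  # C, C#, D, D#, E, F, F#, G, G#, A, A#, B, C
SAMPLE_RATE = 5512.5
BASE_MIDI_NOTE = 60  # hypothetical: 'a' plays middle C


def sdl_scancode(letter: str) -> int:
    """SDL scancode for a lowercase letter key (SDL_SCANCODE_A is 4)."""
    return ord(letter) - ord("a") + 4


def midi_to_hz(note: int) -> float:
    """Equal temperament with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


# scancode -> per-sample phase increment, precomputed so the audio loop only adds.
KEY_TABLE = {
    sdl_scancode(k): midi_to_hz(BASE_MIDI_NOTE + i) / SAMPLE_RATE
    for i, k in enumerate(PIANO_KEYS)
}


def held_notes(is_down) -> list[float]:
    """Phase increments for every held key; is_down(code) stands in for stat(28, code)."""
    return [inc for code, inc in KEY_TABLE.items() if is_down(code)]
```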
Just occurred to me that this thread could be used to share, like, fundamental knowledge that would be helpful to anyone who wants to try and figure out how to do PCM synthesis. Here's a thing about aliasing.
Aliasing: The Explanation
So, like, if you want the scientific explanation with a really great animation, the Wikipedia page on aliasing has one, but here's a terrible drawing I made in about thirty seconds in MS Paint:
Each vertical line is a sample point, and the top (blue) line is being sampled more-or-less correctly: there's one point at -1 for each trough and one point at 1 for each peak.
The bottom (red) line is going at three times the frequency - there are three peaks for every peak of the blue one - but there aren't enough sample points to capture it, so it ends up looking exactly the same as the blue one. The frequency of this wave has been reflected (folded) down to the blue wave's frequency.
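The drawing can be checked numerically. This Python sketch samples a cosine at exactly two points per cycle (one on each peak and trough), then samples a cosine at three times that frequency at the same points; the two sets of samples come out identical:

```python
import math


def sample_cosine(cycles_per_sample: float, n: int) -> list[float]:
    """Sample cos(2*pi*f*t) at integer sample times t = 0 .. n-1."""
    return [math.cos(2 * math.pi * cycles_per_sample * t) for t in range(n)]


blue = sample_cosine(0.5, 8)  # 2 samples per cycle: one on every peak and trough
red = sample_cosine(1.5, 8)   # three times the frequency, same sample points

# Rounded because floating-point cos is not exact at these points.
print([round(x, 9) for x in blue])  # alternating 1.0 and -1.0
print([round(x, 9) for x in red])   # the same alternating 1.0 and -1.0
```

Sampling at exactly two points per cycle is an edge case chosen to match the drawing; the same folding happens to any component above half the sample rate.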
Now, why does it matter? Because as far as computer audio goes, everything is effectively a sum of sine waves recreated from the samples. And with something like a square wave or a sawtooth wave or even a triangle wave, when you try to recreate that from sine waves, the stack of sine waves goes all the way up to infinity ... which means it hits your folding/Nyquist frequency and comes back down out of tune to make your notes sound all harsh and bad.
So, you wanna get rid of all those harmonics that 5512.5 Hz audio can't capture (anything above its Nyquist frequency of 2756.25 Hz). And there are a lot of ways to do it, most of which are very expensive, but a few of which are cheaper.
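The folding itself is simple arithmetic: a frequency above half the sample rate reflects back down below it. This Python sketch lists where the first few harmonics of a naive 440 Hz sawtooth actually land at 5512.5 Hz:

```python
SAMPLE_RATE = 5512.5
NYQUIST = SAMPLE_RATE / 2  # 2756.25 Hz


def aliased_frequency(freq_hz: float, rate: float = SAMPLE_RATE) -> float:
    """Frequency actually heard after sampling: fold freq_hz into [0, rate / 2]."""
    f = freq_hz % rate  # sampling can't tell f apart from f + k * rate
    return rate - f if f > rate / 2 else f  # the upper half mirrors down


fundamental = 440.0
for k in range(1, 11):
    h = k * fundamental
    print(f"harmonic {k}: {h:7.1f} Hz -> heard at {aliased_frequency(h):7.2f} Hz")
```

For example, the 7th harmonic (3080 Hz) comes back at 2432.5 Hz, which isn't a multiple of 440 Hz; that's the out-of-tune harshness.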
Cheap Anti-Aliasing
Now moving on to just blatantly stealing from @luchak's off-the-cuff post from last May, which illustrates this really well and with sound: if you have a wave, like a sawtooth or a square, where the value jumps up or down from one point to the next, then you want to modify the two samples around the changeover with something called polyBLEP. (Which apparently stands for ... Band Limited stEP? I guess they wanted it to be pronounceable.)
Someone who's good at PICO-8 programming could probably give you (and me, I would also want this) code to copy that's efficient and stuff, but the idea of it is that you're making a little parabola - something with an x*x bit, an x bit, and a constant number bit - so when you hit that one-sample-wide snippet before or after the flip, the curve goes from flat on the side away from the flip to a steep slope at the flip.
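A sketch of the widely circulated two-sample polyBLEP, in Python rather than PICO-8 Lua and not tuned for speed. Here `t` is the oscillator phase in [0, 1) and `dt` is the phase increment per sample (frequency / 5512.5); the two branches are the parabolas applied to the samples just after and just before the jump:

```python
SAMPLE_RATE = 5512.5


def poly_blep(t: float, dt: float) -> float:
    """Two-sample polyBLEP residual for a jump at phase 0 (equivalently 1)."""
    if t < dt:  # sample just after the wrap
        x = t / dt
        return x + x - x * x - 1.0  # -(1 - x)^2
    if t > 1.0 - dt:  # sample just before the wrap
        x = (t - 1.0) / dt
        return x * x + x + x + 1.0  # (1 + x)^2
    return 0.0


def blep_saw(freq_hz: float, n: int) -> list[float]:
    """Rising sawtooth (-1 to 1) with its downward jump smoothed by polyBLEP."""
    dt = freq_hz / SAMPLE_RATE
    t, out = 0.0, []
    for _ in range(n):
        out.append(2.0 * t - 1.0 - poly_blep(t, dt))
        t = (t + dt) % 1.0
    return out
```

For a square wave, the same function covers both edges: add `poly_blep(t, dt)` for the upward jump at phase 0 and subtract `poly_blep((t + 0.5) % 1.0, dt)` for the downward jump at phase 0.5.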
The harder thing to anti-alias is a change of slope. The keyword to search for here is polyBLAMP (Band Limited rAMP); I haven't tried it, but when I did a search, I turned up:
- this paper I haven't read by Fabián Esqueda, Vesa Välimäki, and Stefan Bilbao which talks about changing four points instead of two and using fifth-order polynomials instead of quadratic (second-order) ones; and
- this paper I haven't read by the same authors which uses only two points and only a cubic polynomial, but reportedly doesn't sound as good.
...and honestly that leaves me full of curiosity, so I might give it a go? But if I don't, then now you know what to look for.
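For the curious, a sketch of the two-point version. Integrating the standard two-sample polyBLEP residual (for a unit step) once gives a correction of (1 - d)^3 / 6 on each of the two samples nearest a corner, where d is the distance to the corner in samples; it gets scaled by how much the slope changes per sample. This is derived here rather than taken from either paper, so the coefficients are an assumption worth checking against them. Python, applied to a triangle wave:

```python
SAMPLE_RATE = 5512.5


def poly_blamp(t: float, dt: float) -> float:
    """Two-sample polyBLAMP residual for a unit slope change at phase 0 (and 1).

    Obtained by integrating the two-sample polyBLEP residual once:
    (1 - d)^3 / 6, where d is the distance to the corner in samples.
    """
    if t < dt:
        d = t / dt  # sample just after the corner
    elif t > 1.0 - dt:
        d = (1.0 - t) / dt  # sample just before the corner
    else:
        return 0.0
    return (1.0 - d) ** 3 / 6.0


def blamp_triangle(freq_hz: float, n: int) -> list[float]:
    """Triangle from -1 (phase 0) up to 1 (phase 0.5), both corners smoothed."""
    dt = freq_hz / SAMPLE_RATE
    jump = 8.0 * dt  # slope flips between -4 and +4 per cycle: 8 * dt per sample
    t, out = 0.0, []
    for _ in range(n):
        naive = 1.0 - 4.0 * abs(t - 0.5)
        # Trough at phase 0: slope increases. Peak at phase 0.5: slope decreases.
        value = naive + jump * poly_blamp(t, dt) - jump * poly_blamp((t + 0.5) % 1.0, dt)
        out.append(value)
        t = (t + dt) % 1.0
    return out
```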
@packbat Somewhat related: ICYMI, I made a one-shot sampler that syncs w the tracker for 5ch audio. Docs and code are in the comments on the demo page. It has a lot of issues and limitations, lol, notes on those are also on the page. What I didn't mention there is, it only consumes about 0.06 CPU @ 60fps, so I think one could do simple manipulations of samples on the fly, like some you've written about (i.e., re-pitch/time-stretch), and still have plenty of headroom for a game, though samples would probably need to be stored in separate carts bc of compressed space. I think it could be a best-of-both-worlds solution and I have a big list of ideas on how to improve/expand the functionality, but the sync problems I ran into killed my motivation to pursue the project any further, so it's just waiting for someone smarter than me to take a look at it, lol
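On the re-pitch idea: a common cheap approach is to step through the sample at a fractional rate and linearly interpolate between neighbouring bytes. A Python sketch, assuming the sample is a list of unsigned 8-bit values; this is a generic technique, not a description of how the sampler above works:

```python
def repitch(sample: list[int], semitones: float) -> list[int]:
    """Resample unsigned 8-bit PCM to play `semitones` higher (negative = lower).

    Like speeding a tape up, this also changes the length: it's re-pitching,
    not time-stretching.
    """
    step = 2.0 ** (semitones / 12.0)  # read-position increment per output sample
    last = len(sample) - 1
    out = []
    pos = 0.0
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = sample[min(i + 1, last)]  # hold the final byte at the end
        # linear interpolation between the two neighbouring bytes
        out.append(round(sample[i] + (nxt - sample[i]) * frac))
        pos += step
    return out
```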
@ridgekuhn I did neglect to mention that! And that's really cool - I didn't know samples were that cheap to play!
And I've thought about sync - I feel like the best way to do it is to count samples and hope your serial PCM buffer never empties, because one music tick in the tracker is 183/22050 seconds, so four music ticks are 183/5512.5 seconds (exactly 183 samples at the PCM rate), and the two clocks stay synchronized.
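The arithmetic behind that: a tick of 183/22050 s is 45.75 samples at 5512.5 Hz, so tick boundaries only land on whole samples every fourth tick. A small Python sketch of counting samples with exact fractions (the tick length is from the post; the helper name is hypothetical):

```python
from fractions import Fraction

TRACKER_RATE = 22050
PCM_RATE = Fraction(11025, 2)  # 5512.5 Hz, kept exact
TICK_SECONDS = Fraction(183, TRACKER_RATE)  # one music tick


def tick_to_sample(tick: int) -> Fraction:
    """PCM sample index at which a given music tick starts."""
    return tick * TICK_SECONDS * PCM_RATE


print(tick_to_sample(1))  # 183/4, i.e. 45.75 samples
print(tick_to_sample(4))  # 183: a whole number, so the clocks realign every 4 ticks
```

Since this works out to tick * 183 / 4, one option in a cart is to keep an integer tick count and multiply, rather than repeatedly adding 45.75 and risking drift.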
I'm sorry the stat(56) approach didn't work - that's frustrating.
@packbat Yeah, I'm not sure where the problem is. It was fun to put the demo together though, I'd never worked w raw PCM data before, so I'm just happy that it even works at all, lol!
@ridgekuhn This might not be helpful, but thinking about note timing makes me think about the lag between sending bytes to serial() and sound coming out, and specifically the intermittent way PICO-8 pulls samples out of the buffer. I was testing out how the PCM buffer as measured by stat(108) changes over time, and I noticed that it seems to grab samples intermittently in batches - so I started playing with different strategies for filling it:
The three comparison points here are the naive "add enough samples to get it to target" strategy, the "add 94 samples unless the buffer's full" strategy that I'm told RP-8 uses, and the "I took a control theory class in college once" strategy I came up with for my button keyboard, where the number of samples ramps up or down depending on how far from the target it is.
Edit: this might actually be really helpful to think about for embedded BGM synth makers - the RP-8 strategy of either sending 94 or 0 samples would probably keep the maximum CPU used by the synth as low as possible. That said, you probably want to initialize the buffer by filling several frames' worth of samples, just so it doesn't immediately run out.
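A toy Python model of the three strategies, to make the peak-work point concrete. The consumer is an assumption standing in for PICO-8's batched reads (nothing pulled on even frames, 184 samples on odd frames, at 60 fps), and the target, capacity, gain, and prefill values are hypothetical; only the fixed 94-sample chunk comes from the description of RP-8:

```python
SAMPLE_RATE = 5512.5
FPS = 60
NOMINAL = SAMPLE_RATE / FPS  # 91.875 samples per frame
TARGET = 512     # hypothetical buffer target, in samples
CAPACITY = 1024  # hypothetical "buffer is full" threshold


def naive_fill(buffered: int) -> int:
    """Top the buffer straight up to the target."""
    return max(0, TARGET - buffered)


def fixed_chunk_fill(buffered: int, chunk: int = 94) -> int:
    """Send a fixed chunk unless that would overfill the buffer."""
    return chunk if buffered + chunk <= CAPACITY else 0


def proportional_fill(buffered: int, gain: float = 0.25) -> int:
    """Send about one frame's worth, nudged toward the target."""
    return max(0, int(NOMINAL + gain * (TARGET - buffered) + 0.5))


def simulate(strategy, frames: int = 600, prefill: int = 3 * 92):
    """Return (most samples generated in one frame, number of underruns)."""
    buffered, peak, underruns = prefill, 0, 0
    for frame in range(frames):
        n = strategy(buffered)
        peak = max(peak, n)
        buffered += n
        pull = 184 if frame % 2 else 0  # assumed batched consumer
        if buffered < pull:
            underruns += 1
            buffered = 0
        else:
            buffered -= pull
    return peak, underruns


for name, fn in [("naive", naive_fill), ("fixed 94", fixed_chunk_fill),
                 ("proportional", proportional_fill)]:
    print(name, simulate(fn))
```

In this toy, the naive top-up generates 184 samples every other frame (and more on the first frame), while the fixed chunk never generates more than 94 in a frame.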
@packbat This might be helpful indeed! Thanks so much, I'll take a look at this when I have some time to dedicate to it! I believe I'm using the "naive add enough samples to get it to target" strategy; ie, I use stat(56) and stat(108) to calculate the number of "empty" bytes (value 127 since that's the 0-crossing) needed to fill the buffer so the sample buffered immediately after that padding hits on-time. That calculation seems to be correct, as sometimes the value will be negative, indicating a sample is going to be played late. Assuming I designed the demo song and sample lengths correctly, that value should never be negative in the demo, but sometimes it is. I'm not sure if it's a chicken or egg problem though, and it left me suspecting that stat(108) is somehow the culprit. I tried cropping off the front-end of the next loaded sample to keep everything in time; it worked initially but sounded bad, and at some point I broke that part of the function, lol. Anyway, hopefully your cart has the answer!
To repost what I said on the Discord: my current prototype for a 4-voice phase-mod kinda thing is at 0.34 CPU.
I think that's probably on the higher edge of acceptable, so maybe cutting a voice would be better.
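For anyone who hasn't met phase modulation before, here's a minimal two-operator PM voice in Python: a sine modulator offsets the phase of a sine carrier (the classic Yamaha DX-series "FM" works this way). The 2:1 ratio, modulation index, and chord are hypothetical example values, and this is the basic technique only, not the prototype described above:

```python
import math

SAMPLE_RATE = 5512.5
TAU = 2.0 * math.pi


def pm_voice(freq_hz: float, ratio: float, index: float, n: int) -> list[float]:
    """Two-operator phase modulation: sin(carrier_phase + index * sin(mod_phase))."""
    dt = freq_hz / SAMPLE_RATE  # carrier phase increment, in cycles per sample
    carrier = modulator = 0.0
    out = []
    for _ in range(n):
        mod = math.sin(TAU * modulator)
        out.append(math.sin(TAU * carrier + index * mod))
        carrier = (carrier + dt) % 1.0
        modulator = (modulator + dt * ratio) % 1.0
    return out


# Four voices mixed and scaled by 1/4 so the sum stays within [-1, 1].
voices = [pm_voice(f, 2.0, 1.5, 184) for f in (220.0, 277.18, 329.63, 440.0)]
mix = [sum(s) / 4 for s in zip(*voices)]
```

In a cart, the two sine calls per voice per sample are where most of the cost goes, so a sine lookup table is one common way to trade memory for CPU. Note that PICO-8's own sin() takes its angle in turns (0 to 1) and is inverted.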