Scaling the input frequency
So, I’m writing this wave file converter, adding all kinds of support and options, and I came across the issue of storing multiple samples of the same origin at different octaves. This is an optimization technique to help retain frequency ranges within an instrument.
So to visualize this: the main driver always outputs at 7 kHz. It doesn’t matter what the input frequency is, the output frequency is fixed. Now, to scale a frequency on the fly, the fastest way is nearest neighbor scaling. If you think of it in terms of graphics, a waveform is just a single scanline. That’s it. One scanline. The brightness of a pixel is the amplitude of the waveform. And how does nearest neighbor scaling work on a fixed resolution scanline? You either repeat pixels or skip pixels, depending on whether you want to inflate or shrink the image on that single scanline.
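To make the repeat-or-skip idea concrete, here is a minimal Python sketch of nearest neighbor scaling on a looping waveform. It is not the actual PCE driver code: the 8.8 fixed-point phase format, the function name and the sample values are all assumptions made for illustration.

```python
FRACTION_BITS = 8  # assumed 8.8 fixed-point phase; the real driver's format is not stated


def nearest_neighbor_scale(samples, step_fp, count):
    """Produce `count` output samples at the fixed output rate.

    `step_fp` is how far the read position moves through `samples`
    per output tick, in 1/256ths of a sample (256 means 1:1).
    """
    out = []
    phase = 0
    for _ in range(count):
        index = (phase >> FRACTION_BITS) % len(samples)  # drop the fraction: nearest neighbor
        out.append(samples[index])
        phase += step_fp
    return out


# Hypothetical 5-bit waveform (values 0..31).
wave = [0, 8, 16, 24, 31, 24, 16, 8]

half_speed = nearest_neighbor_scale(wave, 128, 8)    # 1:2, every sample is repeated
double_speed = nearest_neighbor_scale(wave, 512, 4)  # 2:1, every other sample is skipped
```

At 1:2 every input sample shows up twice in the output, and at 2:1 half of the input never reaches the output at all.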
Anybody who’s worked in photo editing software has seen this effect first hand. But there’s an even simpler example: SNES mode 7. We’re all aware what happens when the SNES scales up: it gets blocky. But pay attention to when the image is smaller than 1:1, i.e. in a shrunken state. The pixels become distorted. This is because the pixels in the shrunken image cannot appear in between the real pixels of the 256 pixel resolution. One option is to increase the horizontal resolution so the steps become finer. The SNES obviously doesn’t have this option. But even then there comes a point where the image shimmers as it moves in and out of zoom. That’s where other fancy techniques come into play that interpolate between pixels and distribute the result, etc.
Audio works much in the same way, but our brain is more forgiving when it comes to sounds than visual data anomalies. So what’s the issue here? How can we solve this?
First, the issue is this: the output frequency of the PCE TIMER driver is 7 kHz MAX. If you scale a waveform to play at a frequency below 7 kHz, then it’ll play just fine with all the data intact (no missing samples). But here’s the catch: you get no resolution benefit from those repeated samples. In other words, you are not working with the optimal frequency band of 7 kHz. If I scale a waveform that has a 1:1 rate of 7 kHz to a 1:2 rate of 3.5 kHz, that 7 kHz main driver gives it zero benefit. Quite the opposite; I’m losing potential frequency resolution in the output.
Now, this needs to be understood in a larger context. The idea is to avoid issues when the input frequency is higher than the output frequency: anything greater than 1:1 (like 2:1, 4:1, etc). When this happens, you get all sorts of frequency anomalies as well as unintended reflections back into the output (nyquist frequency artifacts). So one approach is to store an instrument sample in multiple octaves. When you move out of that octave range, you switch to a different sample in that group. It cleans up the sound and removes these audio artifacts from the scaling routine. There’s an added benefit, too. Scaling with nearest neighbor takes nothing into account. You can be destroying potential frequency ranges simply by skipping <n> amount of samples. If you properly resample, with an external app, you can get those frequency ranges back. Well, to a point, but it’s so much better than simply skipping samples. Think of this as mip-mapping of textures for 3D graphics. It’s much the same application, although for different reasons.
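A small Python sketch of how switching between octave samples could work. The selection rule is an assumption on my part, not something the converter or driver is confirmed to do: level 0 is the original sample, each higher level is a pre-resampled copy that plays one octave higher at 1:1, and the level is chosen so the scaler never has to step faster than 1:1 while a level is available. The function name `pick_mip_level` is hypothetical.

```python
import math


def pick_mip_level(ratio, levels):
    """Choose a stored octave sample for a requested input:output ratio.

    `ratio` is relative to the level-0 sample (1.0 means 1:1).
    Returns (level, step), where `step` is the ratio the scaler
    applies to the chosen level.
    """
    if ratio <= 1.0:
        return 0, ratio  # at or below 1:1 the original sample is used directly
    # Each level up halves the required step, so take enough levels to reach <= 1:1.
    level = min(math.ceil(math.log2(ratio)), levels - 1)
    return level, ratio / (2 ** level)


level, step = pick_mip_level(3.0, 4)  # three times the base pitch, four stored octaves
```

With enough stored levels the step always lands between 1:2 and 1:1, so no samples get skipped. Once the top level is reached, the step goes above 1:1 again and the skipping artifacts come back.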
Ok. So we have this approach that fixes upward scaling (shrinking) by mip-mapping our octave range for a given instrument. The second issue arises now. If all samples in a mip-map range are 1:1, then the difference from (octave+1) to (octave) is the frequency divided by 2. So you work with 7 kHz from the top, and as you go down in notes (notes that approach the octave one step below the current one), the input frequency falls below 7 kHz. Normally, if the output driver is of high enough frequency, this wouldn’t be so big of an issue. But 7 kHz is pretty low as it is, and you definitely want to keep every single Hz of that driver output at such a low rate. One way to do that is to fill the gaps with interpolated samples instead of repeated ones. Linear interpolation is built on the slope between two points of the main function: [f(x)-f(a)]/(x-a). It’s the slope formula. The change in Y over the change in X. In this case, the change in X is one sample, so dividing by (x-a) is redundant. Using the delta symbol, [f(x+Δx)-f(x)]/Δx. The sample halfway between two neighbors is then the first one plus half that slope, which is just the average of the two.
Doing linear interpolation on the PCE isn’t difficult, but doing it in real time still requires some additional steps. If a sound engine is approaching 20%, 30%, even 50% of the cpu resource, then you want to save as many cycles as you can. The idea, instead, is to encode this one-sample interpolation into the waveform itself. Where would you put it? In between the two samples that it’s derived from. This automatically doubles the waveform in size, but more importantly, a doubled waveform played back at a fixed frequency is the original waveform played back one octave lower (at half its speed). That doesn’t help us directly, but what if the input driver (the frequency scaler) always stepped two samples at a time, regardless of the waveform? Or rather, if you want 1:1 playback of a waveform, you set the input driver to 2:1. Since the waveform now has double the samples, and the input driver defaults to 2:1 for the top frequency, the original waveform plays back without any artifacts, even though we’re nominally above the nyquist limit (the output driver).
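Here is a minimal Python sketch of that embedding step, under a few assumptions: the waveform loops (so the last sample interpolates toward the first), midpoints are rounded down with integer division, and the scaler is a hypothetical 8.8 fixed-point nearest neighbor stepper rather than the real driver.

```python
FRACTION_BITS = 8  # assumed 8.8 fixed-point phase, for illustration only


def embed_midpoints(samples):
    """Insert the linear midpoint after every sample, doubling the length.

    Assumes a looping waveform: the last sample interpolates toward the first.
    """
    out = []
    n = len(samples)
    for i, current in enumerate(samples):
        following = samples[(i + 1) % n]
        out.append(current)
        out.append((current + following) // 2)  # average of the two neighbors, rounded down
    return out


def play(samples, step_fp, count):
    """Nearest neighbor playback: advance `step_fp`/256 samples per output tick."""
    out = []
    phase = 0
    for _ in range(count):
        out.append(samples[(phase >> FRACTION_BITS) % len(samples)])
        phase += step_fp
    return out


wave = [0, 8, 16, 24, 31, 24, 16, 8]  # hypothetical 5-bit waveform
doubled = embed_midpoints(wave)

top_note = play(doubled, 512, len(wave))         # driver at 2:1, lands only on original samples
octave_down = play(doubled, 256, len(doubled))   # driver at 1:1, midpoints fill the gaps
```

At 2:1 the scaler only ever lands on the even positions, so the output is the untouched original. At 1:1 the output is the original with interpolated samples in between, where plain nearest neighbor would have repeated each sample.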
The benefit shows up as you go down from there: if the top limit of the input driver is n:1, with n=2, then as n approaches 1 the interpolated samples get played back instead of repeated samples. At 5bit resolution depth, it might not have a huge impact on the output range. But as you approach 8bit resolution, this becomes more significant. 6bit is twice the resolution of 5bit in audio output. That’s a lot.
So anyway, this approach maximizes the output quality of an instrument sample as you work your way through the mip-map set. The only question now is, what’s better than linear interpolation? What would make a smoother transition from one mip-map sample to another? Maybe if you actually embedded the sample below it into the gaps of the sample above it? Of course, those individual samples would have to be resampled separately by themselves before being inserted back into the “gaps”, else it just becomes the original again. I’m not sure which method is better. Maybe a blend of the sample below it with the linear interpolation, with the weight of that blend changing as it approaches the lower octave. I’d have to do some tests to see if there’s an audio benefit.
This is one of the features I’m working on adding to my wave converter. I was going to do the resampling myself, on the input waveform, but Cool Edit Pro does such a nice job for me. It wins out on sheer laziness factor. I just added the option for either linear interpolation or embedding two input wave files.