ADPCM this time..

Posted in Uncategorized on November 24, 2015 by pcedev

Black Tiger had made the comment that if HuCards had enough storage, they could have used streaming voices for cinemas using ADPCM and 10bit paired channel output.

That got me thinking, is that really feasible? And the answer is: yes, yes it is. I put out a demo playing two songs. One was 20khz and the other was 33khz, but neither was interrupt driven. So it got me wondering, what kind of acceptable ADPCM playback can I get from timed interrupts? What kind of resource am I looking at? Storage-wise, ADPCM is 4bits per sample. 4bits for a 13bit output is pretty decent IMO (clipped to 10bit for the paired channels).

The mednafen author wrote the decompressor, and I’ve modified it slightly with a few case optimizations, but otherwise it’s pretty fast. So for 15.3khz (not 15.7khz) output, I’m looking at 50-55% cpu resource. And that’s the normal, non-self-modifying code version. Everything is contained within the VDC interrupt routine, so it’s self-managing. That’s always nice, because the other option is buffer fill and buffer read, and that gets tricky with timing.
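
For reference, here’s a minimal C sketch of the general OKI-style 4-bit ADPCM step that the MSM5205-family chip in the CD unit is built around, and that decoders modeled on it follow. The tables and 12bit clamp below are the standard OKI ones – the modified routine I’m actually using may differ in constants and accumulator width (the 13bit figure above suggests it accumulates wider):

    /* Standard OKI-style 4-bit ADPCM step (illustrative constants). */
    #include <stdint.h>

    static const int16_t step_size[49] = {
         16,  17,  19,  21,  23,  25,  28,  31,  34,  37,
         41,  45,  50,  55,  60,  66,  73,  80,  88,  97,
        107, 118, 130, 143, 157, 173, 190, 209, 230, 253,
        279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
        724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552
    };
    static const int8_t step_adjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

    static int16_t sample = 0;   /* running signed output           */
    static int     step   = 0;   /* current index into step_size[]  */

    int16_t adpcm_decode(uint8_t nibble)    /* nibble: 0..15 */
    {
        int16_t ss   = step_size[step];
        int16_t diff = ss >> 3;              /* always ss/8         */
        if (nibble & 4) diff += ss;
        if (nibble & 2) diff += ss >> 1;
        if (nibble & 1) diff += ss >> 2;
        if (nibble & 8) diff = -diff;        /* top bit is the sign */

        sample += diff;
        if (sample >  2047) sample =  2047;  /* saturate at 12bit,  */
        if (sample < -2048) sample = -2048;  /* don't wrap/overflow */

        step += step_adjust[nibble & 7];
        if (step < 0)  step = 0;
        if (step > 48) step = 48;

        return sample >> 2;   /* drop to 10bit for the paired channels */
    }

Note the saturation step – whether the real CD hardware saturates like this or wraps is exactly the question I get into below.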

So this soft playback ADPCM streaming sounds great at 20khz, but what does it sound like at 15khz? Hopefully pretty decent. From what I’ve heard in comparison to ADPCM on the CD unit itself, this soft playback routine seems to sound better. It might have to do with how the original ADPCM chip in the PCE CD unit is 10bit output too, but clips and overflows rather than saturating at the positive or negative amplitude limits (i.e. does it clip at 10bit, or at 12bit but output 10bit?). Or maybe it’s something else, like a filtering difference between the PCE audio circuit and the ADPCM output circuit of the CD unit.

Typically, CD games use 8khz ADPCM output for sound FX, and sometimes streaming.

So where is all this going? Well, I have a SF2 mapper and a flash card.. and if I reserve 2048k just for streaming audio, I can do a small demo (shmup) with streaming music. I only have 274 seconds to work with, if I reserve the lower 512k for the game/demo itself. 274 seconds isn’t a lot, but I can loop tracks. At a minimum, I would need two level tracks and a boss track. Optimally though, I would want a fourth ending track. So something like three 70-second tracks and one 64-second track. Or whatever. How it’s divided up isn’t really an issue.
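
A quick sanity check on that budget, assuming 15.3khz playback at 4 bits (half a byte) per sample:

    #include <stdio.h>

    int main(void)
    {
        unsigned long bytes = 2048ul * 1024ul;  /* 2048k reserved for audio */
        unsigned long rate  = 15300ul;          /* samples per second       */
        unsigned long bps   = rate / 2ul;       /* 4bit = 7650 bytes/sec    */
        printf("%lu seconds\n", bytes / bps);   /* prints 274               */
        return 0;
    }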

I spent yesterday reworking the ADPCM routine into a VDC interrupt routine. I also picked out two levels from two shooter games on other consoles. The demo is going to be a simple vertical shmup/shooter. I was toying with the idea of the canyon level of Musha, and the 3D fire level of Axelay, with the Axelay level preceding the Musha level (kinda makes sense). The graphics won’t be exact, but the effects will be similar. I plan to rip other enemy sprites from vertical shmups too, and probably do a different boss for the Musha stage. I have 512k to work with for graphic assets. For both the Axelay 3D level and the Musha canyon stage, I spent quite a bit of time doing calculations for effects as well as redesigning the approach to those effects (with 60fps in mind). It’ll be kinda tight, but I’ve worked with worse.

As for the PCM engines, I did some work on those as well. The first XM player is done and I’ll probably release a very simple demo for it, and then one with a song demo afterwards.


But back to Black Tiger’s ponderings: if you did 7khz ADPCM for voice, then that’s 3.5k per second. If you reserved 512k of rom for ADPCM, that gives 150 seconds of speech or audio. If you used PSG/chip for music and some sound FX, you could easily put together cinema audio tracks. The silence between speaking or other audio parts doesn’t need to be stored. Cinemas don’t take a whole lot of resource; I could even do realtime linear interpolation for that 7khz on a 15khz output (see the sketch below).
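
Here’s what that realtime interpolation amounts to – a C sketch doubling a 7khz stream onto a 2x output by emitting the midpoint between source samples (buffer names are hypothetical):

    #include <stdint.h>

    /* Upsample 2:1 by inserting midpoints between source samples.
       out must hold 2*n entries. The midpoint is just an add and a
       shift, which is why this is cheap enough to do in real time. */
    void upsample_2x(const int8_t *in, int8_t *out, int n)
    {
        if (n < 1) return;
        for (int i = 0; i < n - 1; i++) {
            out[2 * i]     = in[i];
            out[2 * i + 1] = (int8_t)((in[i] + in[i + 1]) >> 1);
        }
        out[2 * n - 2] = in[n - 1];   /* last sample: nothing ahead */
        out[2 * n - 1] = in[n - 1];   /* to blend with, so repeat   */
    }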

But all this talk about compression makes me wonder how some other compression schemes out there would sound. Maybe something lighter on cpu resource than ADPCM. Something like range encoding delta PCM via block segments (kinda like the SNES).


More PCM player stuffs

Posted in Audio on November 16, 2015 by pcedev

The wave conversion tool is up and running and looping support is working flawlessly. It’s forward looping only, but I’m gonna add ping-pong loop support soon. Ping-pong support will be hard coded into the wave, since it’s too much cpu overhead, and work, to change how the PCM frequency driver works.
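
Hard-coding ping-pong into the wave just means unrolling it: append a reversed copy of the loop body, and the result is a plain forward loop the driver already knows how to play. Roughly what the converter would do (names are mine, not the tool’s):

    #include <stdint.h>
    #include <string.h>

    /* Turn a ping-pong loop over [loop_start, loop_end) into a forward
       loop by appending the loop body reversed. dst needs room for
       len + (loop_end - loop_start) samples; returns the new length.
       The new forward loop spans [loop_start, new length). */
    int unroll_pingpong(const int8_t *src, int len,
                        int loop_start, int loop_end, int8_t *dst)
    {
        memcpy(dst, src, (size_t)len);
        int n = len;
        for (int i = loop_end - 1; i >= loop_start; i--)
            dst[n++] = src[i];
        return n;
    }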

So the tool outputs a specialized format for the player with a small header containing the loop points. Looping on the player side doesn’t take any additional cpu resource, which is nice.

Did you know that Batman on the PCE uses precalculated frequency-scaled waves of a single bass guitar instrument, and actually has a loop point section? There’s the attack part and the loop part of the waveform. It means a tiny sample can be made out to be a really long sound. IIRC, there’s 2 octaves, which means a total of 24 notes or 24 samples. It gives the bass guitar instrument a nice punchy sound that the normal PCE channels just don’t quite reach when emulating/modeling it (although they do a good job).

Anyway, more on the player itself. Octaves follow a formula of 2^a, with a being the octave. This means the rate of change increases between octaves. Notes are subdivisions of the frequency range within an octave, and they follow that same exponential format, existing in between octave boundaries. It becomes something along the lines of 2^(a+(b/12)), with b being the note (ranging from 0 to 11). The frequency difference/distance between each note increases as the frequency increases. It’s not linear.

Since octaves are an exponential function with a base of 2, I can simply binary shift the frequency to get my octave range. Therefore I only need to store 12 note frequencies in a table for one octave, and the rest can be derived from there. But I need more than just notes and octaves; I need to be able to slide a frequency up and down. So I expanded the table from 12 notes to 12 notes with 32 frequency steps between each note – for a total of 384 entries. Still not bad. Not only do I now have frequency sliding control with precision that I can track, but I also have a method of fine tuning.

It works like this: O:N:S. O is the octave, N is the note, and S is the step. S ranges from 0 to 31; any carry/borrow gets added to/subtracted from N. N ranges from 0 to 11, and carry/borrow affects O. O ranges from 0 to 7, with 3 being 1:1 – or rather, no binary shifting. I build the frequency divider from the O:N:S number, but I only need to do this when there’s a change in frequency. And when it is performed, it’s pretty fast. There’s no multiplication: only one table fetch, a few shifts, and one addition (finetune). The other nice thing is that all inter-note frequency steps are the same ratio for any octave. So if you do a vibrato effect, it has the same strength whether the note is high pitch or low pitch – unlike period-based music players, which rarely ever compensate for this. Under period-based players, this presents a problem if you have an instrument with a vibrato effect going on and you want to slide (portamento-to-note) to a higher frequency note – the vibrato that sounded perfect might sound too extreme at the higher frequency. You would have to compensate by having a function scale back the vibrato strength while going up in frequency, and this would be trial and error (I have yet to see a music driver do this, but it’s doable). So this resolves that issue.
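
A C sketch of that O:N:S divider build, assuming the 384-entry table holds the octave-3 values (table contents come from the converter; the names are illustrative):

    #include <stdint.h>

    /* 12 notes x 32 slide steps for the reference octave (octave 3).
       Filled with the driver's frequency/divider units by the tool. */
    extern const uint16_t note_table[12 * 32];

    /* O:N:S -> frequency value: one table fetch, a shift, one add. */
    uint16_t build_frequency(int oct, int note, int step, int finetune)
    {
        uint16_t f = note_table[note * 32 + step];
        if (oct >= 3) f <<= (oct - 3);   /* octaves are pure */
        else          f >>= (3 - oct);   /* binary shifts    */
        return f + finetune;
    }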

Anyway, the player (driver) is done, but I’m building what amounts to a small music engine to demo it off. So that takes time. Plus, I’m trying to modularize this into sections: the driver, interface support, and a music engine example. I need to keep them all separate so it’s easy to pick and use just what you need (assuming anyone uses this).


Do I expect this to revolutionize the PCE sound? No. This is more of a proof of concept. There are definitely strengths and weaknesses to this approach. Some samples will sound dirty, or unpleasant; it really depends on the sample itself, and I believe that puts a limit on how useful this is. The 5bit resolution of the PCM is another issue; for some samples it’s fine, and for others it can be hissy or noisy. Techniques like a 1 or 2 point (difference) volume-map to keep noise at a minimum might help, but it really depends on the specific waveform. That said, I think the 6bit PCM player (the other approach) has more promise than this, but this approach, this player, is simpler in both execution and interfacing. I’m also reusing some stuff from here for that other driver.

Scaling the input frequency

Posted in Uncategorized on November 11, 2015 by pcedev

So.. I’m writing this wave file converter, adding all kinds of support and options, when I came across the issue of storing multiple samples of the same origin but at different octaves. This is an optimization technique to help retain frequency ranges within an instrument.

So to visualize this: the main driver always outputs 7khz. It doesn’t matter what the input frequency is, the output frequency is fixed. Now, to scale a frequency on the fly, the fastest way is to do nearest neighbor scaling. If you think of it in terms of graphics, a waveform is just a single scanline. That’s it. One scanline. The brightness of a pixel is the amplitude of the waveform. And how does nearest neighbor scaling work on a fixed resolution scanline? You either repeat pixels or skip pixels, depending on whether you want to shrink or inflate the image on that single scanline.

Anybody who’s worked in photo editing software has seen this effect first hand. But there’s an even simpler example: SNES Mode 7. We’re all aware of what happens when the SNES scales up – it gets blocky. But pay attention when the image is at a point smaller than 1:1, i.e. in a shrunken state. The pixels become distorted. This is because the pixels of the shrunken image cannot appear in between the real pixels of the 256-pixel resolution. One option is to increase the horizontal resolution so the steps become finer. The SNES obviously doesn’t have this option. So there still comes a point where the image shimmers as it moves in and out of zoom. That’s where other fancy techniques come into play, interpolating the distance between pixels and distributing that.. etc.

Audio works in much the same way, but our brain is more forgiving of sound anomalies than of visual ones. So what’s the issue here? How can we solve this?

First, the issue is this: the output frequency of the PCE TIMER driver is 7khz MAX. If you scale a waveform to play at a frequency below 7khz, then it’ll play just fine with all the data intact (no missing samples). But here’s the catch: you get no resolution benefit for those repeated samples. In other words, you are not working with the optimal frequency band of 7khz. If I scaled a waveform that has a 1:1 rate of 7khz to a 1:2 rate of 3.5khz.. that 7khz main driver gives it zero benefit. Quite the opposite; I’m losing potential frequency resolution on output.

Now, this needs to be understood in a larger context. The idea is to avoid issues when the input frequency is higher than the output frequency – anything greater than 1:1 (like 2:1, 4:1, etc). When this happens, you get all sorts of frequency anomalies as well as unintended reflections back into the output (Nyquist frequency artifacts). So one approach is to store an instrument sample at multiple octaves. When you move out of one octave range, you switch to a different sample in the group. It cleans up the sound and removes these audio artifacts from the scaling routine. There’s also the fact that nearest neighbor scaling takes nothing into account: you can destroy potential frequency ranges simply by skipping <n> samples. If you properly resample with an external app, you can get those frequency ranges back. Well, to a point – but it’s so much better than simply skipping samples. Think of this as mip-mapping of textures in 3D graphics. It’s much the same application, although for different reasons.

Ok. So we have this approach that fixes upward scaling (shrinking) by mip-mapping our octave range for a given instrument. Now a second issue arises. If all samples in a mip-map range are 1:1, then the difference from (octave+1) to (octave) is the frequency divided by 2. So you work with 7khz at the top, and as you go down in notes (notes that approach the octave one step below the current one), the input frequency falls below 7khz. Normally, if the output driver were of high enough frequency, this wouldn’t be so big of an issue. But 7khz is pretty low as it is, and you definitely want to keep every single Hz of that driver output at such a low rate. Linear interpolation is just the slope formula between two points: [f(x+Δx) - f(x)] / Δx. The change in Y over the change in X. In this case, the change in X is one, so the divide by Δx is redundant.

Doing linear interpolation on the PCE isn’t difficult, but doing it in real time still requires some additional steps. If a sound engine is approaching 20%, 30%,.. 50% cpu resource, then you want to save as many cycles as you can. The idea, instead, is to encode this one-sample interpolation into the waveform itself. Where would you put it? In between the two samples it’s derived from. This automatically doubles the waveform in size, but more importantly, a waveform doubled in size and played back at a fixed frequency is the original waveform played back one octave lower (at half speed). That doesn’t help us directly, but what if the input driver (the frequency scaler) always skipped two samples – regardless of the waveform? Or rather, if you want 1:1 playback of a waveform, you set the input driver to 2:1. Since the waveform is doubled, and the input driver now defaults to 2:1 for the top frequency, the original waveform plays back without any artifacts – even though we’re above the Nyquist limit (of the output driver).
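
Both halves of that idea, sketched – the offline midpoint embedding, and an input driver whose step defaults to 2:1 (8.8 fixed point here just for readability; the real accumulator is wider):

    #include <stdint.h>

    /* Offline: double the waveform, placing the linearly interpolated
       midpoint between each pair of source samples. */
    void embed_midpoints(const int8_t *in, int8_t *out, int n)
    {
        for (int i = 0; i < n; i++) {
            int next = (i + 1 < n) ? in[i + 1] : in[i];
            out[2 * i]     = in[i];
            out[2 * i + 1] = (int8_t)((in[i] + next) >> 1);
        }
    }

    /* Playback: with step = 0x200 (2:1) only original samples get
       fetched, so 1:1 playback is artifact-free. As step approaches
       0x100 (1:1), the embedded midpoints start being played instead
       of plain repeated samples. */
    int8_t next_sample(const int8_t *wave, uint16_t *phase, uint16_t step)
    {
        int8_t s = wave[*phase >> 8];
        *phase += step;
        return s;
    }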

The benefit here is that if the top limit of the input driver is n:1, with n=2, then as n approaches 1… the interpolated samples get played back instead of repeated samples. At 5bit resolution depth, it might not have a huge impact on the output range. But as you approach 8bit resolution, this becomes more significant. 6bit is twice the resolution of 5bit in audio output. That’s a lot.

So anyway, this approach maximizes the output quality of an instrument sample as you work your way through the mip-map set. The only question now is: what’s better than linear interpolation? What would make a smoother transition from one mip-map sample to another? Maybe if you actually embedded the sample below it into the gaps of the sample above it? Of course, those individual samples would have to be resampled separately by themselves before being inserted back into the “gaps” – else it just becomes the original again. I’m not sure which method is better. Maybe a blend of the sample below with the linear interpolation, with the weight of that blend shifting as you approach the lower octave. I’d have to do some tests to see if there’s an audio benefit.

This is one of the features I’m working on adding to my wave converter. I was going to do the resampling myself, on the input waveform, but Cool Edit Pro does such a nice job for me. It wins out on sheer laziness factor. I just added the option for linear interpolation or embedding two input wave files.

PCM engines

Posted in Uncategorized on November 9, 2015 by pcedev

I’ve already stated that I’m redoing the 4 channel XM engine, and it’s up and running with a few looped notes until I finish parsing a particular mod file for a simple demonstration. But I also have another engine I threw together over a couple of hours on the weekend: an 8 channel static PCM player. 8 PCM channels at 6bit (higher than the native 5bit), and it still leaves 4 normal PCE channels. Yeah, a total of 12 channels.

The second engine required more support, though. The first engine only required a small 384-word table. This second engine, because all the PCM channels are mixed in software, requires volume tables – multiplying each sample is waaayy out of the scope of the PCE, and tables do the work even faster. The PCM format also needs to be in 2’s complement numbers. The PCM data might be 6bit, but it’s stored in 8bit format as signed numbers. Maybe this is overkill, but it just feels cleaner than risking side effects from a waveform that isn’t centered (on a relative centerline). I’ve done mixing in software before with unsigned samples, and the center line moves around (the waveforms still accumulate the same). It just doesn’t sit well with me, so 2’s complement signed format it is. But that means the volume table has to include all 256 entries, even if it only uses 6bit resolution/values. At 32 levels of volume control, that’s an 8k table. Doesn’t need to be ram; fits anywhere in rom.
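
The table trick in sketch form: index by (volume, signed sample byte) and the multiply disappears. Scaling and names here are mine; the real table would be pregenerated and sit in rom:

    #include <stdint.h>

    /* 32 volume levels x 256 sample values = 8k, rom-able. */
    static int8_t vol_table[32][256];

    void build_vol_table(void)
    {
        for (int v = 0; v < 32; v++)
            for (int s = -128; s < 128; s++)
                vol_table[v][(uint8_t)s] = (int8_t)((s * v) / 31);
    }

    /* Mix 8 signed channels: per channel it's one table fetch and one
       add, instead of a multiply the cpu doesn't have. */
    void mix8(int8_t *chan[8], const uint8_t vol[8], int16_t *out, int len)
    {
        for (int i = 0; i < len; i++) {
            int16_t acc = 0;
            for (int c = 0; c < 8; c++)
                acc += vol_table[vol[c]][(uint8_t)chan[c][i]];
            out[i] = acc;   /* signed sum; re-biased at output time */
        }
    }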

But yeah, so it needs more support/tools surrounding it. I had to make a wave file converter in C. I needed to make one anyway, so this isn’t a total waste of time. So what does this engine eat up, cpu-wise? About 21-22% cpu resource. I kid you not. 8 PCM channels, at 6bit vs 5bit, with 4 PCE channels still left over – all faster than what Air Zonk spends to play a single PCM channel. Yeah, Air Zonk has a horrible PCM routine that eats up 30-33% cpu resource. I couldn’t believe it, but I checked it about 20 times over, and each time it was 30 to 33% (33% when it has to fetch a new sample to bit shift, 30% when it’s just playing that sample).

Keep in mind, none of these 8 channels in the second engine scale in frequency like the first engine’s. It’s actually nothing super special or radical. 8 channels are soft-mixed, with volume control for each channel, into a single buffer. That buffer is played using two PCE DDA channels to output 10bit audio (see the sketch below). That’s it. The other downside is that it’s mono. If you want stereo, you have to take away another two PCE channels for a second 10bit paired output. Not enough channels, you say? Want stereo, you say? Well, bump that number up to 37.3% cpu resource and get 16 PCM channels – in stereo. You still have 2 regular PCE channels left over. For an extra 5% cpu resource on top of that, I can make 4 of those stereo channels frequency-scale XM style. So many options…
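
For the paired output itself, the commonly described trick is: one channel at full volume carries the top 5 bits, a second channel attenuated to 1/32 of it (about 30dB down) carries the bottom 5 bits, and the analog sum reconstructs roughly 10 bits. Just the split, sketched (the volume-register setup is hardware work I’m not showing):

    #include <stdint.h>

    /* Split a 10bit unsigned sample across two 5bit DDA channels.
       'low' must play on a channel set 1/32 as loud as 'high'. */
    void split_10bit(uint16_t s10, uint8_t *high, uint8_t *low)
    {
        *high = (s10 >> 5) & 0x1F;   /* coarse 5 bits */
        *low  =  s10       & 0x1F;   /* fine 5 bits   */
    }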

That’s ridiculous! What the hell would someone do with 16 PCM 6bit stereo channels!? But hey, 18 channels on the PCE would make a great demo – no? Bragging rights and all that sort of thing, I guess. Does the PCE have enough power to do more than 18 channels? Don’t ask. But yes. It does.

PCM player

Posted in Audio on October 31, 2015 by pcedev

That old PCM player that’s in the download links section? No good. I was looking over it and realized that not only is it convoluted and complex, but the interface really doesn’t show how to use the damn thing. It’s a poor example.

So I revisited it. I’m going to attempt to modularize it into an easy library of sorts. I’ll include macros and routines for the basic functions controlling each channel. I also made it a little friendlier to memory page layouts, compared to the fixed-bank requirement from before. This bumped the cpu resource up by 2.7%. In addition, I added dynamically controlled fixed-frequency sample streaming for the last two channels. So 4 XM channels and 2 PCM streaming channels.

I refer to them as XM channels because they use a linear frequency approach, compared to the period-based system of MODs. It’s a phase accumulator. The nice thing about this is that I don’t need a huge table to translate notes to frequencies. I actually only need a table of 12 notes at C3 (octave 3). Everything else can be derived by shifting that left or right. The phase accumulator is 19bits long, with the top three bits being the whole number and the lower 16bits being the fractional part. So I have a finetune table of 16 points between notes (plus or minus direction) for finetuning the waveform without the need for an external program. And lastly, I have 32 frequency steps in between notes. I could have more… waaayyy more, given the 16bit fractional part of the frequency divider, but honestly, from my experience 16 usually cuts it anyway, so 32 should be enough. The steps are for frequency sliding (or vibrato).
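
A sketch of that accumulator in C, holding the 3.16 layout in a 32-bit int (names hypothetical):

    #include <stdint.h>

    /* 19bit phase accumulator: bits 18..16 whole, bits 15..0 fraction.
       'step' is the value built from note + octave shift + finetune. */
    typedef struct {
        uint32_t phase;       /* only the low 19 bits matter      */
        uint32_t step;        /* 3.16 fixed-point step per tick   */
        const int8_t *wave;
        uint16_t pos;         /* current index into the waveform  */
    } xm_channel;

    int8_t channel_tick(xm_channel *ch)
    {
        int8_t s = ch->wave[ch->pos];
        ch->phase += ch->step;
        ch->pos   += (uint16_t)(ch->phase >> 16);  /* advance 0..7  */
        ch->phase &= 0xFFFF;                       /* keep fraction */
        return s;
    }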

I’ll include a small command line utility that preps a wave file into the format needed for the player. The player supports loop points, so those need to be included in the waveform directly for that support. I also don’t track the waveform by its length, but rather have an end-of-stream marker value in the stream itself. So those need to be included as well.

So the numbers I have for 4 XM channels: 29% cpu resource from the call to the RTI, at 117 times per frame (1/60) – for all channels playing. The frequency division is all handled inside the timer routine, so you just need to provide the note to play. For the 4 XM and 2 PCM, that number jumps to 36.6% if all six channels are playing. Still not bad at all. The timer interrupt is actually 6.991khz and not 7khz, but I reset the timer counter every vblank to keep it in sync, and do one extra mid-frame call to the timer routine – so it ends up being 117 calls per frame, or 7.02khz. Honestly, if I skipped that and did 116, I doubt there would be an audible difference.
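
For reference, the call-rate math:

    6991 Hz / 60 fps ≈ 116.5 interrupts per frame (free-running timer)
    117 calls/frame x 60 fps = 7020 Hz ≈ 7.02khz (vblank resync plus
    the one extra mid-frame call)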


On the topic of audio, have you guys ever heard the PCE stream 55khz 5bit PCM using the channel’s waveform memory as a buffer? Check it out: http://pcedev.net/6280a/sgx_dump.wav   That was recorded from my SuperGrafx. Basically it’s a 1.7khz timer interrupt, which means an interrupt called 29 times per frame. Each interrupt call just refills the buffer. On the SGX, you only need one channel to do this (or on any of the CoreGrafx consoles with the 6280a processor). The regular 6280s need two channels to do this. Sounds pretty good for 5bit audio. Because of how audio tends to get filtered/blended at that higher range, you can actually interleave channel streams and have them mix together. You could do a ~64khz output setup, but have four streams feeding it in interleaved format for a 4 channel PCM stream at 16khz each. Not bad at all, especially considering how light the cpu resource is compared to something like 7khz single channel output the old fashioned way. But here’s the crazy thing, as if that wasn’t crazy enough: you can do this with two channels on the SGX and stream 10bit audio in the same fashion. Just let that sink in for a minute…
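
The interleave idea, sketched: one high-rate buffered output with four streams feeding alternate slots, so each stream effectively plays at a quarter of the output rate (~16khz each on a ~64khz output). Names are hypothetical:

    #include <stdint.h>

    /* Refill one 32-sample waveform-memory buffer per timer interrupt.
       Slot i is fed round-robin from stream (i & 3); at the output
       rate each stream is effectively heard at rate/4. */
    void fill_buffer(uint8_t *buf, const uint8_t *stream[4], uint32_t pos[4])
    {
        for (int i = 0; i < 32; i++) {
            int s = i & 3;                  /* round-robin select  */
            buf[i] = stream[s][pos[s]++];   /* 5bit samples, 0..31 */
        }
    }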

Email and server is back up..

Posted in Uncategorized on October 30, 2015 by pcedev

I’m currently reconciling all the stuff between my laptop and new desktop setup. I don’t think my laptop is going to survive much longer.

SamIam and I talked about the possibility of a dub for Spriggan Mark 2. That’s all I’ll mention at the moment on that.

I finished my final for one of my classes, so I’ve got more free time, which is one of the reasons why I’m setting up my desktop system: pce dev stuffs. In an attempt to get more organized, I’m going through and trying to make a list of all the wonderful things and possibilities of PCE stuffs that I want to do. I have a terrible time trying to keep track of all these things, so I need to organize them into a system. I tend to forget about things. It’s not that I lose interest, but I forget and pick up other interesting stuffs.

Anyway, just an update. Also, if you come across anything that appears damaged (file wise) for my links, let me know.

Email and file server is down

Posted in hacks, homebrew, NES2PCE, retro on October 17, 2015 by pcedev

My server account got suspended cause I didn’t pay the bill. Hopefully I can get it back online in the next few days.

I’ve been messing with PCE hardware this weekend. Also looking over what projects are due, what needs attention, and what is going to be put on the back burner (for time resource reasons).

That, and just some gaming on the SGX+SCD. I was playing SMB (nes2pce) and realized I could optimize some of the nes2pce PPU emulation code. Anytime the cpu writes to vram, the internal emulation code has to do all these checks to make sure it’s transferring data to the right area (and to know how to interpret that data). I needed this because I found games could just load PPU tile, sprite, and tilemap stuffs all in one shot.

Of course, doing all these checks slows things down. Sometimes it pushes what would normally be cpu work done in vblank into the start of active display. I figured one way to speed this up is to have dedicated $2007 write functions. NES games tend to set up strings of data in a buffer to be updated during vblank (because game logic is processed during active display). This means it’s very rare that a string of data will cross from a tile bank into a nametable area, etc. So for those areas of code, I could potentially use faster, region-specific $2007 write functions (sketched below).
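
The shape of that optimization, sketched in C rather than the actual nes2pce code: pick the handler once when the $2006 address latch completes, so the hot $2007 path does no region checks. Region bounds and names are illustrative:

    #include <stdint.h>

    void write_tiles(uint8_t v);       /* pattern data -> VDC vram    */
    void write_nametable(uint8_t v);   /* nametable    -> PCE tilemap */

    static void (*ppu_2007_write)(uint8_t);
    static uint16_t vram_addr;

    /* Called when the $2006 address latch completes: one region check
       here instead of one per $2007 write, since a vblank string of
       writes almost never crosses from tiles into a nametable. */
    void ppu_2006_set(uint16_t addr)
    {
        vram_addr = addr;
        ppu_2007_write = (addr < 0x2000) ? write_tiles : write_nametable;
    }

    void ppu_2007(uint8_t value)   /* the hot path */
    {
        ppu_2007_write(value);
        vram_addr++;               /* increment-by-32 mode omitted */
    }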

Nes2pce code was never meant as general emulation for NES games to run on the PCE. It was to get a game up and running as close as possible, to help the transition of replacing the original NES sprite and map routines with native PCE stuffs. Though none of the nes2pce stuffs I’ve released have these changes.

Ideally, the sprite and map routines should be hacked for direct PCE writes, bypassing some of the PPU emulation. So the optimization I listed is kind of counterintuitive to the goal of nes2pce. But I just don’t have the time to alter each game for these kinds of changes. I used to have that kind of time, but I squandered it :/ Eventually, I will have that kind of time again – but will people really care about nes stuffs on PCE by then? One example is Dragon Warrior. It’s the first RPG I ever played, so I have nostalgia for it. I’ve already hacked the map routines somewhat. The game is simple enough to keep modifying. But should I really put my time into it? This is the dilemma I’m faced with for nes2pce.

Anyway, just thinking out loud.