A few old utilities..

Posted in homebrew, PCE PC tools on April 15, 2016 by pcedev

I have a few private utilities that I’m seriously thinking about cleaning up and releasing to the community. One is a palette sorting utility for PCE subpalettes and pics. It’s not as fancy as, say, Image2PCE or Promotion, since it doesn’t generate lossy results, but it does seem to get better results with its sorting algorithm. In other words, you can feed it the output of either program and crunch the number of subpalettes down further. What I’ve found works best is Promotion with the color limit for each tile set to 15 instead of 16, then letting my Pic2PCE app (for lack of a better name) re-crunch the image. It also works great with utilities like Mappy: export a tileset, use this utility to sort all the tiles into proper subpalettes, and then encode them (the colors of each tile correlating to its subpalette number) into ranges of 16 colors inside a 256 color block (thus 16 subpalettes). This isn’t too useful on its own, but I have another powerful Mappy conversion tool that can extract this information. It does all kinds of nice things with the FMP file during conversion, including clipping X and Y ranges (great for making Mappy prioritize tiles into specific slots, aka dynamic tile animation), as well as stuff like VRAM offsets, subpalette offsetting for the tilemap, horizontal or vertical strip output formats, using additional layers in the file to create collision or object maps, conversion of 16×16 Mappy files into 8×8 with or without a LUT, native 8×8 support, etc.
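
To give a rough idea of the kind of crunching involved, here’s a minimal sketch of greedy subpalette packing. This is a generic illustration, not the tool’s actual algorithm; the function names are made up, and the 16/16 limits are just the obvious PCE values.

    /* Generic sketch of greedy subpalette packing (not the tool's exact
       algorithm): merge each tile's color set into one of up to 16
       16-color subpalettes, reusing a subpalette whenever the union
       still fits. */
    #include <string.h>

    #define MAX_SUBPAL   16
    #define SUBPAL_SIZE  16

    typedef struct {
        int colors[SUBPAL_SIZE];
        int count;
    } palset_t;

    static int contains(const palset_t *p, int c)
    {
        for (int i = 0; i < p->count; i++)
            if (p->colors[i] == c) return 1;
        return 0;
    }

    /* how many colors would p hold after merging t into it? */
    static int merged_size(const palset_t *p, const palset_t *t)
    {
        int n = p->count;
        for (int i = 0; i < t->count; i++)
            if (!contains(p, t->colors[i])) n++;
        return n;
    }

    static void merge(palset_t *p, const palset_t *t)
    {
        for (int i = 0; i < t->count; i++)
            if (!contains(p, t->colors[i]))
                p->colors[p->count++] = t->colors[i];
    }

    /* assign[] receives a subpalette index for every tile; returns the
       number of subpalettes used, or -1 if 16 isn't enough. */
    int pack_subpalettes(const palset_t *tiles, int ntiles, int *assign)
    {
        palset_t subpal[MAX_SUBPAL];
        int nsub = 0;

        for (int t = 0; t < ntiles; t++) {
            int best = -1, best_size = SUBPAL_SIZE + 1;
            for (int s = 0; s < nsub; s++) {
                int sz = merged_size(&subpal[s], &tiles[t]);
                if (sz <= SUBPAL_SIZE && sz < best_size) {
                    best = s;
                    best_size = sz;
                }
            }
            if (best < 0) {                       /* open a new subpalette */
                if (nsub == MAX_SUBPAL) return -1;
                memset(&subpal[nsub], 0, sizeof subpal[nsub]);
                best = nsub++;
            }
            merge(&subpal[best], &tiles[t]);
            assign[t] = best;
        }
        return nsub;
    }

Sorting the tiles by color count (largest sets first) before a greedy pass like this usually helps quite a bit.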

The Pic2PCE app has an internal GUI already. I’ll be adding GUI support to some of the other apps as well, since I’m working on this area of stuff for the DIY Shmup suite, but you can still use command line arguments for batch processing. I think it would probably be a good idea to make some video tutorials on how to use these tools, too.

SHMUP suite musings

Posted in homebrew, Uncategorized on April 7, 2016 by pcedev

I have two issues. One is that I don’t want to have to learn a GUI setup for Windows in order to make my utilities. This is probably completely the wrong approach to a solution; I’m reinventing the wheel. But I’m soo used to it. Ok, here’s how I’ve made my GUI-ish apps in the past: it’s all internal. My GUI needs have always been simple. I just need a few buttons and a display window or two. Nothing fancy, nothing much. I currently just create the buttons and layouts manually. Yes, that means providing the X/Y coords and which functions they correspond to. I’ve decided that if I’m not going to actually use an existing GUI toolset, I should at least create my own toolset for creating GUI frames. Duh! So I’m doing just that. Just a simple visual editor for placing buttons and display windows. That gets compiled into a text define file which I can include in my code and just associate functions via pointers, etc. This probably makes some people cringe, but I think it’s fun.
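
To show what I mean, the idea is something like this; the widget names and layout values are made up for illustration, not actual output:

    /* gui_frame.h -- the kind of define file the frame editor would spit out */
    #define BTN_LOAD_X   8
    #define BTN_LOAD_Y   8
    #define BTN_LOAD_W   64
    #define BTN_LOAD_H   16
    #define BTN_SAVE_X   8
    #define BTN_SAVE_Y   28
    #define BTN_SAVE_W   64
    #define BTN_SAVE_H   16

    /* in the app itself: hook the placed widgets up to handlers via pointers */
    typedef struct {
        int x, y, w, h;
        void (*on_click)(void);
    } button_t;

    static void do_load(void) { /* ... */ }
    static void do_save(void) { /* ... */ }

    static button_t buttons[] = {
        { BTN_LOAD_X, BTN_LOAD_Y, BTN_LOAD_W, BTN_LOAD_H, do_load },
        { BTN_SAVE_X, BTN_SAVE_Y, BTN_SAVE_W, BTN_SAVE_H, do_save },
    };

    static void dispatch_click(int mx, int my)
    {
        for (unsigned i = 0; i < sizeof buttons / sizeof buttons[0]; i++) {
            button_t *b = &buttons[i];
            if (mx >= b->x && mx < b->x + b->w &&
                my >= b->y && my < b->y + b->h)
                b->on_click();
        }
    }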

Secondly, enemy pattern AI. Well, AI might be a strong word here. Reactive might be a better word for it. But that aside, simple dumb enemy pattern movement.. how do I make this flexible enough that people using the suite don’t have to rely on a list of fixed pattern movements? I’ve taken a cue from the FM playbook. Enemies can be defined by phases. Inside each phase, there are three operators. These operators are a type of behavior; sine, cos, etc. You can assign where the output of the operator goes: the X or Y coord, or as a modulator for another operator. Each operator also has assigned attributes like frequency and scale control. The phase has two attributes: time and reaction. The time parameter is self explanatory (how long the current phase lasts); the reaction parameter is a little different. It could be something like: if the enemy gets shot (doesn’t die in one hit, or does and spawns new objects, etc.), or if the enemy gets close enough to the player, change phase. This might sound expensive processing-wise, but really it’s not. Of course there will always be predefined patterns designers can default to (familiar enemy patterns), to keep the resource load lighter for the bulk of the level. It’s all about balancing where the ‘spice’ is anyway.
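
As a rough sketch of what that description could boil down to in data (the field names and enums here are mine, nothing final):

    /* Hypothetical enemy-pattern data: phases of three "operators", each
       routed to X, to Y, or into the next operator as a modulator. */
    enum op_wave  { OP_SINE, OP_COS, OP_SAW, OP_CONST };
    enum op_route { ROUTE_X, ROUTE_Y, ROUTE_MOD_NEXT };
    enum reaction { REACT_NONE, REACT_ON_HIT, REACT_PLAYER_NEAR };

    typedef struct {
        enum op_wave  wave;     /* type of behavior                     */
        enum op_route route;    /* where the operator's output goes     */
        int           freq;     /* frequency control                    */
        int           scale;    /* amplitude/scale control              */
    } operator_t;

    typedef struct {
        operator_t    op[3];    /* three operators per phase            */
        int           time;     /* how long the phase lasts (frames)    */
        enum reaction react;    /* event that forces a phase change     */
        int           next;     /* phase to jump to when it triggers    */
    } enemy_phase_t;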

I’ve been playing around with another idea: how to emulate the PCE video and audio in the simulator. This is easy enough, but it’s tied directly to the game engine and not the real PCE. So I’m not worried about accurate emulation – just enough simulation for the game engine. The audio part is going to be tricky for me, as I’ve not had a whole lot of experience with it, but from the little that I have done on the PC, I’m sure it’ll be fine. Again, I’m not really emulating the PCE’s audio so much as simulating it for the chiptune and SFX engine. In my opinion, that’s a pretty big difference (and one in my favor – lol).

Ever wanted to make a game for the PCE?

Posted in Uncategorized on April 4, 2016 by pcedev

Sorry for the lack of updates. This semester is a bit tough (two tough, high-credit classes push me above normal full-time enrollment).

On top of that, I found out that I’m forced to take a computer science course this summer because my school’s CS program sucks (not really). You can’t take two introductory courses at the same time, so what I could have gotten done in one semester turns into a full year.. and I can’t take any other CS related courses until these two are completed, which opens up two more, which opens up three more, which opens up all the rest… wooh! I’m supposed to be transferring into this university as a junior, yet I’m behind because of this. Apparently most other fields/majors don’t have this problem.

The problem is not time, but actually money. I have enough aid left for two years and one semester (and not full time for that last semester either). So basically, taking this summer course helps meet that goal, but it’s going to financially kill me (summer classes are expensive; $1750 for this one class). Apply for scholarships, you say? Yeah, I’m doing that, but that’s never a guarantee. I actually did receive a small transfer scholarship, but that’s already factored in. And the requirement for keeping it is that I take 15 credits a semester (12 is full time).

So what options do I have? Well, one option would be to do a crowd funding project. My brother is making his own pencils and pens from titanium and other metals. His campaign already hit over $10k. I know a little bit about machining, and could learn to use his CNC setups, but my stuff probably wouldn’t be as nice as his (he has more experience with the designs).

The other option I thought of, and have had kicking around in the back of my mind, is a PCE project. The drawback to doing a PCE project is that I’m neither an artist nor a musician. It’s not that I don’t have some talent and capability in either of these two areas – it’s that I’m not up to snuff compared to people who specialize in them. I can copy/emulate these two areas on my own – but the results will never be on the level of dedicated people in those fields. There are a billion and one shmups for the PCE, and I want to add one more – lol. This has always been a goal of mine. I definitely have the capability to do a great technical shmup on the PCE. But that’s not enough to make a good game.

So why not just make the damn shmup already? Well, I kinda have. The problem, though, is time. Making a game engine is easy. It’s one of the easiest parts. The real work is polishing a game until it’s fun to play, etc. 10% of the work is making the game, 90% of the work is polishing and tweaking it. Ever play a technically impressive game, but you’d rather play something that’s not as fancy simply because it’s soo much more fun? Well, I sure have. The homebrew community tends to be more forgiving, but I hold myself to the same standards as the commercial software of the system’s era. I.e. I will always find fault in the presentation of my own stuff.

Where all this rambling is going.. PC Engine Shmup Maker. I would create the game engine. I would create a suite of tools to create a shmup for this game engine. You wouldn’t need to know how to code in assembly, or small C, in order to create an awesome shmup for the PCE. You would get the benefits of speed and flexibility, like a game created from scratch. I’ve been researching the most popular features of shmups from that era (16bit), and have been working on design implementations of those features. For this first project, the shmup will be vertical scrolling. There will be a whole range of effects available on a stage by stage basis, as well as hsync effects and dynamic tiles, split screen scrolling, the BG used as objects for large enemies or bosses, enemy and horde AI behavior, etc. The engine would be one giant script interpreter; a sketch of what I mean is below. This is how I’ve made my engines in the past (it makes development soo much easier). The tools will be gui/mouse based, but create files that contain text with definitions and a scripting language structure (which means people can create new/additional tools as well). The tool set will also have a WYSIWYG approach for designing the levels; you can simulate the play-through of a level without having to constantly export to rom format. There will be a chiptune engine with a script format (and probably a visual editor as well), an SFX editor, sample support, etc.
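
Very roughly, “one giant script interpreter” means something like this. The opcodes and stub functions are purely invented for the example; the real command set would be much bigger:

    /* Bare-bones sketch of a script-driven engine loop. Opcode set and the
       stub functions are invented for illustration only. */
    #include <stdio.h>

    enum { OP_END, OP_WAIT, OP_SPAWN, OP_SCROLL, OP_MUSIC };

    static void wait_frames(int n)                { printf("wait %d frames\n", n); }
    static void spawn_enemy(int id, int x, int y) { printf("spawn %d at %d,%d\n", id, x, y); }
    static void set_scroll(int speed)             { printf("scroll speed %d\n", speed); }
    static void play_track(int t)                 { printf("music track %d\n", t); }

    static void run_script(const unsigned char *pc)
    {
        for (;;) {
            switch (*pc++) {
            case OP_END:    return;
            case OP_WAIT:   wait_frames(*pc++);                        break;
            case OP_SPAWN:  spawn_enemy(pc[0], pc[1], pc[2]); pc += 3; break;
            case OP_SCROLL: set_scroll(*pc++);                         break;
            case OP_MUSIC:  play_track(*pc++);                         break;
            }
        }
    }

    int main(void)
    {
        /* a tiny "level": start music, set scroll, spawn two enemies, wait */
        static const unsigned char level[] = {
            OP_MUSIC, 1, OP_SCROLL, 2,
            OP_SPAWN, 3, 40, 0, OP_SPAWN, 3, 88, 0,
            OP_WAIT, 120, OP_END
        };
        run_script(level);
        return 0;
    }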

So this is the kickstarter or whatever crowd funding project I want to make. This, to me, is much easier to do than the time it takes to polish a game, let alone relying on other people on a team to get assets in on time. I want to give back to the community and I’ve always wanted fans to be involved in making stuff for the PCE – what better way than this!? I’m sure many of you who can’t code for the PCE have always wanted to make a game for it. Because this would be a crowd funding project, I’d break it down into a free version and a licensed version (sell your own game, etc). The free version will definitely be capable and you can freely distribute your work within the community. The licensed version will be for those who wish to sell their game. Have you guys seen the recent advances in repro hucards? Totally doable. I’ll have support for the SF2 mapper as well, for those that want it. What about CD? That will probably be a goal marker thing – hucard development first. The complexity of the engine AND the toolset capability will be broken down into goal sets, basically because it takes longer to implement more features and/or more advanced features.

I’ll be working up the details and prototyping some stuff to show. I want to open the crowd funding thing around the end of May. If you’re excited about this kind of project/idea, please share this post. I’d like to get feedback from the community.

 

VDMA tests

Posted in Uncategorized on January 3, 2016 by pcedev

Ok, so I confirmed what others have tested: VDMA is somewhere between 81-85 WORDs per scanline in 5.37MHz mode. So in 10.74MHz mode, that’s 330+ bytes per scanline. The more you clip the active display, the more bandwidth you get for VDMA. I did a 209 line display and was able to transfer 17.6+ kbytes during vblank. This is perfect for the bitmap mode and all that I’ve talked about in that regard (free buffer clear, free transfer for a double buffer system, etc).
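
Just to show where that figure comes from, here’s the back-of-envelope math, assuming a 262-line NTSC frame and the upper end of the measured range (both assumptions on my part):

    /* Rough VDMA budget check: ~85 words/scanline measured at 5.37MHz,
       double that in 10.74MHz mode, times the scanlines left outside a
       209-line active display (assuming 262 total lines). */
    #include <stdio.h>

    int main(void)
    {
        int words_per_line = 85;                      /* measured: 81-85     */
        int bytes_per_line = words_per_line * 2 * 2;  /* hi-res mode, ~2x    */
        int blank_lines    = 262 - 209;               /* lines free for VDMA */
        int total          = blank_lines * bytes_per_line;

        printf("%d bytes/line x %d lines = %d bytes (%.1f KB) per vblank\n",
               bytes_per_line, blank_lines, total, total / 1024.0);
        return 0;
    }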

Semester has ended.. and I did some PCE timing tests today

Posted in Uncategorized on December 17, 2015 by pcedev

Some bad news for the TSB/TRB instructions on VDC port $0003: they’re damn slow. It’s not the instruction itself, but apparently there’s a delay when the VDC switches from reading to writing vram back to back. I.e. doing a bunch of TRB $0003 to increment the vram pointer is going to block for about 6.5 cycles more than the expected 9 cycles. This is even the case for back to back LDA $0003 / STA $0003 instructions – the processor is stalled by the VDC and the pair ends up being ~15.6 cycles instead of 12. Of course, this was only tested at the lowest resolution. The overhead should be half that for high res, but I’ll need to test for that. I still have more vdc read/write setups to test.

PCE bitmap mode

Posted in Uncategorized on December 10, 2015 by pcedev

I’ve talked about the PCE bitmap mode, albeit vaguely, in the past. It’s a 128x128x16 color mode. It’s a linear bitmap mode, which makes it fast for certain types of effects or operations. The SGX extends this to 128x128x241 colors. But I was thinking this morning: is it possible to push this limit? I approached it from a dithering angle; if 256 tiles is all that’s needed to show 16 colors, then surely 1024 tiles could be used as extended dither patterns of those same colors. The limit is near 240 colors, but the PCE doesn’t have enough room in vram for that (technically it does, but it would need column scrolling, which it doesn’t have – so it’s the long way round by wasting memory).

Then I decided to flip the problem on its head. In the SGX approach, each VDC pixel is one of the 16 colors of its tile, combined with one of the sixteen 16-color subpalettes. In the PCE method, each tile (or BAT entry) is two pixels. Normally, each pixel has a max value of 16 colors. Together they don’t exceed the limit of 16 colors – in any combination. And at 16 colors, I can’t just apply different subpalettes to the whole tile or BAT entry, because both pixels would be affected. Technically I could, but I would need a lot of subpalettes to achieve this goal. More than what the PCE has.

So I thought: what if I reduced the color count per pixel in the tile/BAT entry to 8 colors each? I can still only show two different pixels, but now I have an easier set of differences to work with. The upper 4 bits of the tile number, the pixel to the left of the pair, would be colors 8-15. And the lower 4 bits of the tile number, the pixel to the right of the pair, would be colors 0-7. I can now set up the subpalettes in groups of 4: four for the right pixel, and four for the left pixel. This combination gives a total of 32 independent colors per pixel (technically 29 because of color #0).
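
Packing a pixel pair into a BAT entry would then look something like this. The tile base address and the exact numbering of the subpalette groups (left group in the upper two bits here) are my assumptions for the sketch:

    /* Sketch of packing one pixel pair into a BAT entry for the 8+8 color
       idea. Each pixel carries a 3-bit color index and a 2-bit subpalette
       group; the left pixel uses colors 8-15, the right pixel colors 0-7. */
    #include <stdint.h>

    #define TILE_BASE 0x100   /* wherever the pattern tiles live (assumed) */

    uint16_t pack_pair(unsigned left_idx,  unsigned left_grp,
                       unsigned right_idx, unsigned right_grp)
    {
        unsigned tile   = ((8 + (left_idx & 7)) << 4) | (right_idx & 7);
        unsigned subpal = ((left_grp & 3) << 2) | (right_grp & 3);

        return (uint16_t)((subpal << 12) | (TILE_BASE + tile));
    }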

What it gains in colors, it loses in speed. The 16 color method is fast because each pixel is a nybble. In this higher color method, you now have 5-bit pixels: 3 bits for the index and 2 bits for the palette. It’s still faster than using planar graphics AND having to deal with tile segment boundaries. And to top it off, you can still add dithering back into it. The indexes go back to 4 bits, with the high bit selecting a dither pattern of C and C-1.. within that 8 color subpalette. It’s more convoluted in that the colors need to be set up appropriately in the subpalette. Or, and this is more specific to certain applications, the dithering could come not from the subpixel group itself, but across the two paired pixels. If you visualize this, of the 128 pixels going across the screen, each one is actually 4 highres pixels from the 512px graphics mode. So blending two pixel colors across 8 highres pixels should provide a nice gradient effect – although strictly horizontal in application. The blending would happen by setting the 8’s bit (of the 0-7 tile index) in the BAT entry, giving both the C/C-1 option and a blend mode option, switchable by high bit selection. I guess you could call it blend mode, as one color blends into the next for that pair. And the second set of tiles would be this fixed pattern.

Wolfenstein 3D on the PC-Engine

Posted in Uncategorized on November 25, 2015 by pcedev

The SNES has it. The Genesis now has it. But can it be done on the PCE? Or rather, a game based on that style of engine? The glorious answer is: yes.

If you look at this logically, the PCE has enough cpu resource to pull something off along those lines. But the devil is alwaaayyyss in the details. As the SNES proves, it’s not so much about the cpu resource as it is the supporting format. In this case, for the SNES, mode 7 allows a byte-per-pixel format (256 colors per pixel). Packed pixel format, to be more precise. The Genesis has packed pixels as well, but its format is 4 bits (16 colors).

So what’s the PCE got? Planar graphics. Planar graphics are not Wolf 3D engine friendly. Ahh, but there’s a trick you can do. There’s always a trick, right? You see, it’s possible to set up the PCE in such a way that the tilemap becomes a bitmap display. The quick and dirty details: set the PCE res to 512×224. That’s roughly 64 tilemap entries wide. That’s a bit low, so what if a single byte in the low byte of the tilemap entry could be the index to two pixels? Two 4-bit pixels, to be exact. Now you have 128 “pixels” in that 64-wide tilemap screen/res. You need a total of 256 tiles in vram to show these pixels. Not bad, not bad at all. But there’s more to it.

The largest PCE tilemap size is 128×64. But it can be rearranged, assuming no scrolling, to a layout of 64×128. Now remember what I said above about the 64 tilemap entries equating to double the pixels? That means a 64×128 map gives a linear bitmap of 128×128. The 128 pixels across are actually double wide, so they will fill the screen.
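
Addressing a pixel in that pseudo-bitmap is then simple math. A quick sketch, assuming the BAT sits at VRAM address 0 and the left pixel lives in the high nibble:

    /* Locate a pixel in the 128x128 pseudo-bitmap, assuming a 64x128 BAT
       at VRAM address 0; each BAT entry's low byte holds two pixels. */
    #include <stdint.h>

    typedef struct { uint16_t word_addr; int high_nibble; } pixel_loc_t;

    pixel_loc_t locate_pixel(unsigned x, unsigned y)   /* x, y in 0..127 */
    {
        pixel_loc_t p;
        p.word_addr   = (uint16_t)(y * 64 + (x >> 1)); /* one entry = 2 pixels     */
        p.high_nibble = (x & 1) == 0;                  /* left pixel = high nibble */
        return p;
    }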

So now we have a linear bitmap, but zero room for double buffering. That sucks, because no one likes screen tearing. There are a few ways around this, but one very convenient way is the vram-to-vram DMA on the VDC side. According to some tests by another coder, which I haven’t verified yet, if you set the VDC to high res mode during vblank, the VDMA will transfer ~330 bytes per scanline. The understanding is that the VDC transfers two words per 8 pixel dot clock. I still need to verify this, but if it’s true, that means not only would you not need to keep a buffer in local ram, you also wouldn’t need to waste cpu cycles clearing that buffer or parts of it (rendering empty pixels to clear sections). Vram also provides a self-incrementing pointer. With this kind of bitmap, that means you could do both vertical and horizontal sequential writing, which speeds up writing sequential data. Just to note, SATB vram DMA is 84 words per scanline in low res mode (a little bit over 3 scanlines for the whole table), so it’s reasonable to think it would be the same for vram-to-vram DMA as well (84 words is 168 bytes per scanline in low res, and 336 per scanline in high res mode).

So now we have a fast bitmap, a free clear routine, and no need to transfer a local buffer to vram. Now, the 2D raycasting itself is simple in design, but you still have the issue of pixel fill. You have to read a sliver of texture, at a specific scale factor, and copy it to a pixel column of the bitmap. This is going to dictate the amount of cpu resource needed to draw the 3D view. The fastest method to write data to vram is embedded opcodes, but that immediately doubles the data in size. Looking at Wolf 3D as an example, the textures are 64×64 pixels. If you embedded them as opcodes, you would still need to have 32 pre-scaled vertical images. 64x64x2x32 is 256k of memory. That doesn’t leave a whole lot of room for textures, plus if the window height is 128, you need 64 versions, not 32. To make matters worse, on a fresh/clean bitmap the first pass of pixels simply writes, but since this is a double-nibble format, the second column write needs to be OR’d against the previous data. Doing that as embedded code against all the scaled frames.. is going to be a horrendous storage requirement.

The normal approach is just to have the texture stored as normal bitmap data (a nibble stored per byte). So, a slower approach, but with prescaled data. This is still going to be a fair amount of data, and it’s going to be slower. So what can be done to speed this up, while staying reasonable when it comes to storage space? When in doubt, flip the problem on its head. What if the data, the bitmap texture, remained fixed in size… but the code that read it was different depending on the output size needed? Let that sink in…

If the bitmap data is aligned to specific bank boundaries and offsets, then you can create a series of pre-calculated code paths that read that data from the logical address (bank mapped) and write it directly to vram. No indexing needed. No indirection needed. Simply LDA addr / STA $0002 / ST2 #fade. There’s no looping. There’s no check system for skipping a read (pixel). Everything is hard coded as specific paths. It’s brilliant. No, I’m not the first to think of this idea (pre-calculated code paths), but it did occur to me that this style of rendering engine would really benefit from it. Yes, the code is going to bloat in size, but now I can store lots of different textures in rom.

The catch here is that you need two sets of code paths and two copies of each texture. One copy has the nibble stored in the left side of the byte (bits 4-7), and the other has the nibble stored in the right side of the byte (bits 0-3). The reason for this is that on the first pass, the pixel data is just written as-is to vram (the bitmap buffer). All even columns of pixels are like this – nice and fast. But the odd columns need to OR their nibble together with the even column already in vram. This is only +3 more cycles, thanks to the TSB opcode (TSB $0002). So the average is just +1.5 cycles of overhead per pixel. That’s still really good. Not only do I have sequential access, vertical in this case, but I also have a fast means of READ-MODIFY-WRITE operations.
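
A PC-side generator for those code paths could be as dumb as this. The addresses, the fixed 64-texel source height, and how the ST2 fade value gets patched in are all illustrative assumptions, not the final layout:

    /* Emit one unrolled column-draw path: read texels from a bank-mapped
       texture and write them straight to the VDC ports. Even columns use
       STA; odd columns use TSB so the nibble is OR'd into what's in vram. */
    #include <stdio.h>

    void emit_column_path(FILE *out, unsigned tex_base, int height, int odd_column)
    {
        const char *write_op = odd_column ? "tsb" : "sta";

        fprintf(out, "draw_col_%d_%s:\n", height, odd_column ? "odd" : "even");
        for (int y = 0; y < height; y++) {
            int texel = y * 64 / height;   /* which source texel this row shows */
            fprintf(out, "    lda  $%04X\n", tex_base + (unsigned)texel);
            fprintf(out, "    %s  $0002\n", write_op);
            fprintf(out, "    st2  #$00      ; fade subpalette, patched per column\n");
        }
        fprintf(out, "    rts\n");
    }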

Did you notice that ST2 #fade opcode above? Since the pseudo bitmap is only 16 colors, I can use all 16 subpalettes for precalculated fades of those 16 colors; 16 fades. I already know the distance of the texture column from the camera, so I can use this to do 3D light shading. That’s pretty freaking cool. What about objects? I’m still in the planning stages for that, but I can treat them as simple texture overlays. And I can optimize for horizontal or vertical rendering, whichever is faster for the object design. Also, the objects can be overlaid with a fade-distance subpalette as well, on a per-pixel basis. Oh, and the weapon or hand can be a sprite.

I ran some numbers, and a full texture read-out (max height across the screen) to a 128×128 display is ~2.3 frames. So 20fps with room to spare in that last frame. To get an idea here: 3 frames is 20fps, 4 frames is 15fps, 5 frames is 12fps, 6 frames is 10fps. I think the ideal place to be would be between 15fps and 12fps, with a decent amount of action and objects on screen. I should note here that the max-height case, the player facing a wall up close, is pure pixel fill rate. An open, normal area would actually yield a higher frame rate.. up to 30fps (without objects). Another correction too; that 2.3 frames number assumes the wall texture is 128 real pixels tall. If the game was limited to 64 pixel tall textures (like the real Wolf 3D game), double-pixel write mode kicks in and drops the overall cycles per pixel write to a much lower rate. It would be less than 2 frames of render time (more like 1.33 frames), which puts 30fps in reach. Double tall pixels (scaled up textures) get a boost in pixel fill rate. Of course that’s just pixel fill. That doesn’t include the 2D raycast or the small overhead of the hsync routine to reposition the map as a bitmap display – but even given both of those, it should just about make it at 30fps.

ADPCM this time..

Posted in Uncategorized on November 24, 2015 by pcedev

Black Tiger had made the comment that if hucards had enough storage, they could have used streaming voices for cinemas using ADPCM and 10bit paired channel output.

That got me thinking: is that really feasible? And the answer is yes, yes it is. I put out a demo playing two songs. One was 20khz and the other was 33khz, but neither was interrupt driven. So it got me thinking, what kind of acceptable ADPCM playback can I get from timed interrupts? What kind of resource am I looking at? Storage-wise, ADPCM is 4 bits per sample. 4 bits for a 13-bit output is pretty decent IMO (clipped to 10 bits for the paired channels).

The mednafen author wrote the decompressor, and I’ve modified it slightly with a few case optimizations, but otherwise it’s pretty fast. So for 15.3khz (not 15.7khz) output, I’m looking at 50-55% cpu resource. And that’s the normal, non-self-modifying code version. Everything is contained within the VDC interrupt routine, so it’s self managing. That’s always nice, because the other option is buffer fill and buffer read, and that gets tricky with timing.

So this soft-playback ADPCM streaming sounds great at 20khz, but what does it sound like at 15khz? Hopefully pretty decent. From what I’ve heard in comparison to ADPCM on the CD unit itself, this soft playback routine seems to sound better. It might have to do with how the original ADPCM chip in the PCE CD unit is 10-bit output too, but it can overflow and wrap rather than saturate at the positive or negative amplitudes (i.e. does it clip at 10 bits, or at 12 bits but output 10?). Or maybe it’s something else, like a filtering difference between the PCE audio circuit and the ADPCM output circuit of the CD unit.

Typically, CD games use 8khz ADPCM output for sound FX, and sometimes streaming.

So where is all this going? Well, I have an SF2 mapper and a flash card.. and if I reserve 2048k just for streaming audio, I can do a small demo (shmup) with streaming music. I only have 274 seconds to work with if I reserve the lower 512k for the game/demo itself. 274 seconds isn’t a lot, but I can loop tracks. At a minimum, I would need two level tracks and a boss track. Optimally though, I would want a fourth ending track. So something like three 70-second tracks and one 64-second track. Or whatever. How it’s divided up isn’t really an issue.
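
The 274-second figure is just the straight storage math (two 4-bit samples per byte at the 15.3khz playback rate):

    /* Streaming budget: 2048K of rom at 4-bit ADPCM, 15.3khz playback. */
    #include <stdio.h>

    int main(void)
    {
        long bytes   = 2048L * 1024;   /* reserved on the SF2 card        */
        long samples = bytes * 2;      /* two 4-bit samples per byte      */
        double rate  = 15300.0;        /* playback rate in Hz             */

        printf("%.0f seconds of audio\n", samples / rate);   /* ~274 sec  */
        return 0;
    }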

I spent yesterday reworking the ADPCM routine into a VDC interrupt routine. I also picked out two levels from shooter games on other consoles. The demo is going to be a simple vertical shmup/shooter. I was toying with the idea of the canyon level of Musha and the 3D fire level of Axelay, with the Axelay level preceding the Musha level (kinda makes sense). The graphics won’t be exact, but the effects will be similar. I plan to rip other enemy sprites from vert shmups too, and probably do a different boss for the Musha stage. I have 512k to work with for graphic assets. For both the Axelay 3D level and the Musha canyon stage, I spent quite a bit of time doing calculations for the effects as well as redesigning the approach to those effects (with 60fps in mind). It’ll be kinda tight, but I’ve worked with worse.

As for the PCM engines, I did some work on those as well. The first XM player is done and I’ll probably release a very simple demo for it, and then one with a song demo afterwards.

 

But back to Black Tiger’s ponderings: if you did 7khz ADPCM for voice, then that’s 3.5k per second. If you reserved 512k of rom for ADPCM, that gives roughly 150 seconds of speech or audio. If you used the PSG/chip for music and some sound FX, you could easily put together cinema audio tracks. The silence between speaking or other audio parts doesn’t need to be stored. Cinemas don’t take a whole lot of resource; I could even do realtime linear interpolation for that 7khz on a 15khz output.

But all this talk about compression makes me wonder how some other compression schemes out there would sound. Maybe something lighter on cpu resource than ADPCM. Something like range-encoded delta PCM via block segments (kinda like the SNES).

 

More PCM player stuffs

Posted in Audio on November 16, 2015 by pcedev

The wave conversion tool is up and running and looping support is working flawlessly. It’s forward looping only, but I’m gonna add ping-pong loop support soon. Ping-pong support will be hard coded into the wave itself, since it’s too much cpu overhead, or work, to change how the PCM frequency driver works.

So the tool outputs a specialized format for the player with a small header containing the loop points. Looping on the player side doesn’t take any additional cpu resource, which is nice.
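
For the curious, the converted file boils down to something like the layout below. This is just a hypothetical sketch of such a header, not the actual format the tool writes:

    /* Hypothetical converted-wave layout: a small header with loop points,
       followed by the raw sample data. The real format may differ. */
    #include <stdint.h>

    typedef struct {
        uint16_t length;       /* total number of samples                    */
        uint16_t loop_start;   /* playback jumps back here...                */
        uint16_t loop_end;     /* ...when it reaches this offset             */
        uint8_t  flags;        /* e.g. forward loop vs. ping-pong (unrolled) */
    } wave_header_t;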

Did you know that Batman on the PCE uses precalculated frequency-scaled waves of a single bass guitar instrument, and actually has a loop point section? There’s the attack part and the loop part of the waveform. It means a tiny sample can be made out to be a really long sound. IIRC, there are 2 octaves, which means a total of 24 notes or 24 samples. It gives the bass guitar instrument a nice punchy sound that the normal PCE channels just don’t quite reach when emulating/modeling it (although they do a good job).

Anyway, more on the player itself. So octaves follow a formula of 2^a, with a being the octave. This means the rate of change increases between octaves. Notes are subdivisions of the octave, and they follow that same 2^a form, except they sit in between octave boundaries. It becomes something along the lines of 2^(a+(b/12)), with b being the note (ranging from 0 to 11). The frequency difference/distance between each note increases as the frequency increases. It’s not linear.

Since octaves are an exponential function with a base of 2, I can simply binary shift the frequency to get my octave range. Therefore I only need to store 12 note frequencies in a table for one octave, and the rest can be derived from there. But I need more than just notes and octaves; I need to be able to slide a frequency up and down. So I increased the table from 12 notes to 12 notes with 32 frequency steps between each note, for a total of 384 entries. Still not bad. Not only do I now have frequency sliding control with precision that I can track, but I now have a method of fine tuning as well.

It works like this: O:N:S. O is the octave, N is the note, and S is the step. S ranges from 0 to 31; any carry/borrow gets added to/subtracted from N. N ranges from 0 to 11, and carry/borrow affects O. O ranges from 0 to 7, with 3 being 1:1, or rather no binary shifting. I build the frequency divider from the O:N:S number, but I only need to do this when there’s a change in frequency. And when it is performed, it’s pretty fast. There’s no multiplication: only one table fetch, a few shifts, and one addition (finetune). The other nice thing is that all inter-note frequency steps are the same ratio for any octave. So if you do a vibrato effect, it has the same strength whether the note is high pitch or low pitch – unlike period-based music players, which rarely ever compensate for this. Under period-based players, this presents a problem if you have an instrument with a vibrato effect going on and you want to slide (portamento-to-note) to a higher frequency note – that vibrato effect that sounded perfect might sound too extreme at the higher frequency. You would have to compensate by having a function scale back the vibrato strength while going up in frequency, and this would be trial/error (I have yet to see a music driver do this, but it’s doable). So this resolves that issue.
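
In code, the O:N:S lookup comes down to something like this. The table contents and finetune units are placeholders, and I’m writing it as if the table stores frequencies; if it stores timer dividers instead, the shift direction just flips:

    /* Sketch of turning O:N:S into a frequency: one 384-entry table
       (12 notes x 32 steps) for the reference octave (O = 3, no shift),
       a few shifts for the octave, and one addition for finetune. */
    #include <stdint.h>

    static const uint16_t note_step_tbl[12 * 32] = { 0 /* generated offline */ };

    uint16_t ons_to_freq(int octave, int note, int step, int finetune)
    {
        uint32_t f = note_step_tbl[note * 32 + step];   /* one table fetch */

        if (octave >= 3)
            f <<= (octave - 3);    /* each octave up doubles the frequency */
        else
            f >>= (3 - octave);    /* each octave down halves it           */

        return (uint16_t)(f + finetune);                /* finetune add    */
    }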

Anyway, the player (driver) is done but I’m building what amounts to a small music engine to demo it off. So that takes time. Plus, I’m trying to modularize this into sections; the driver, interface support, music engine example. I need to keep them all separate so it’s easy to pick and use just what you need (assuming anyone uses this).

 

Do I expect this to revolutionize PCE sound? No. This is more of a proof of concept. There are definitely strengths and weaknesses to this approach. Some samples will sound dirty, or unpleasant, so it really depends on the sample itself, and I believe that puts a limit on how useful this is. The 5-bit resolution of the PCM is another issue; for some samples it’s fine and for others it can be hissy or noisy. Techniques like a 1 or 2 point (difference) volume map to keep noise at a minimum might help, but it really depends on the specific waveform. That said, I think the 6-bit PCM player (the other approach) has more promise than this, but this approach, this player, is simpler in both execution and interfacing. I’m also reusing some stuff here for that other driver.

Scaling the input frequency

Posted in Uncategorized on November 11, 2015 by pcedev

So.. I’m writing this wave file converter, adding all kinds of support and options, when I came across the issue of storing multiple samples of the same origin but at different octaves. This is an optimization technique to help retain frequency ranges within an instrument.

So to visualize this: the main driver always outputs at 7khz. It doesn’t matter what the input frequency is, the output frequency is fixed. Now, to scale a frequency on the fly, the fastest way is to do nearest-neighbor scaling. If you think of it in terms of graphics, a waveform is just a single scanline. That’s it. One scanline. The brightness of a pixel is the amplitude of the waveform. And how does nearest-neighbor scaling work on a fixed resolution scanline? You either repeat pixels or skip pixels, depending on whether you want to shrink or inflate the image on that single scanline.
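
That repeat-or-skip idea is basically a fixed-point phase accumulator. A generic sketch (8.8 fixed point picked arbitrarily):

    /* Generic nearest-neighbor resample: step through the source at a fixed
       ratio, repeating samples (step < 1.0) or skipping them (step > 1.0). */
    #include <stdint.h>
    #include <stddef.h>

    void resample_nearest(const int8_t *src, size_t src_len,
                          int8_t *dst, size_t dst_len, uint16_t step_8_8)
    {
        uint32_t phase = 0;                      /* 8.8 fixed-point position */

        for (size_t i = 0; i < dst_len; i++) {
            size_t s = phase >> 8;
            if (s >= src_len) s = src_len - 1;   /* clamp at the end        */
            dst[i] = src[s];
            phase += step_8_8;                   /* 0x100 = 1:1 playback    */
        }
    }

A step of 0x200 plays the source back an octave up (2:1), 0x080 an octave down (1:2), and so on.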

Anybody who’s worked in photo editing software has seen this effect first hand. But there’s an even simpler example: SNES mode 7. We’re all aware of what happens when the SNES scales up – it gets blocky. But pay attention to when the image is at a point smaller than 1:1, i.e. in a shrunken state. The pixels become distorted. This is because the pixels in the shrunken image cannot appear in between the real pixels of the 256-wide resolution. One option is to increase the horizontal resolution so the steps become finer. The SNES obviously doesn’t have this option. But there still comes a point where the image shimmers as it moves in and out of zoom. That’s where other fancy techniques come into play and interpolate the distance between pixels, and distribute that.. etc.

Audio works much in the same way, but our brain is more forgiving when it comes to sounds than visual data anomalies. So what’s the issue here? How can we solve this?

First, the issue is this: the output frequency of the PCE TIMER driver is 7khz MAX. If you scale a waveform to play at a frequency below 7khz, then it’ll play just fine with all the data intact (no missing samples). But here’s the catch: you get no resolution benefit for those repeated samples. In other words, you are not working with the optimal frequency band of 7khz. If I scaled a waveform that has a 1:1 rate of 7khz to a 1:2 rate of 3.5khz.. that 7khz main driver gives it zero benefit. Quite the opposite; I’m losing potential frequency resolution on output.

Now, this needs to be understood in a larger context. The idea is to avoid issues when the input frequency is higher than the output frequency – anything greater than 1:1 (like 2:1, 4:1, etc). When this happens, you get all sorts of frequency anomalies as well as unintended reflections back into the output (Nyquist frequency artifacts). So one approach is to store an instrument sample at multiple octaves. When you move out of one octave range, you switch to a different sample in that group. It cleans up the sound and removes these audio artifacts from the scaling routine. There’s also the fact that nearest-neighbor scaling takes nothing into account; you can be destroying potential frequency content simply by skipping <n> samples. If you properly resample with an external app, you can get those frequency ranges back. Well, to a point – but it’s soo much better than simply skipping samples. Think of this as mip-mapping of textures for 3D graphics. It’s much the same application, although for different reasons.

Ok. So we have this approach that fixes upward scaling (shrinking) by mip-mapping our octave range for a given instrument. Now the second issue arises. If all samples in a mip-map range are 1:1, then the difference from (octave+1) to (octave) is the frequency divided by 2. So you work with 7khz from the top, and as you go down in notes (notes that approach the octave one step below the current one), the input frequency falls below 7khz. Normally, if the output driver were of a high enough frequency, this wouldn’t be such a big issue. But 7khz is pretty low as it is, and you definitely want to keep every single Hz of that driver output at such a low rate. Linear interpolation uses the slope between two points, which approximates the derivative: [f(x+Δx) - f(x)] / Δx, the change in Y over the change in X. In this case the change in X is one, so the division by Δx drops out and the interpolated sample is built straight from the difference between the two neighboring samples.

Doing linear interpolation on the PCE isn’t difficult, but doing it in real time still requires some additional steps. If a sound engine is approaching 20%, 30%,.. 50% cpu resource, then you want to save as many cycles as you can. The idea instead is to encode this one-sample interpolation into the waveform itself. Where would you put it? In between the two samples it’s derived from. This automatically doubles the waveform in size, but more importantly, a waveform doubled in size and played back at a fixed frequency is the original waveform played back one octave lower (at half its speed). That doesn’t help us directly, but what if the input driver (the frequency scaler) always stepped by two samples, regardless of the waveform? Or rather, if you want 1:1 playback of a waveform, you set the input driver to 2:1. Since the sample count of the waveform is doubled, and the input driver now defaults to 2:1 for the top frequency, the original waveform plays back without any artifacts – even though we’re up against the Nyquist limit (the output driver).
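
Baking that interpolation in is a one-pass job in the converter. A rough sketch:

    /* Bake the one-step linear interpolation into the wave itself: each
       output pair is (original sample, midpoint to the next sample), so the
       data doubles and the driver's "1:1" rate becomes a 2:1 step. */
    #include <stdint.h>
    #include <stddef.h>

    /* dst must hold 2 * src_len samples */
    void embed_midpoints(const int8_t *src, size_t src_len, int8_t *dst)
    {
        for (size_t i = 0; i < src_len; i++) {
            int a = src[i];
            int b = (i + 1 < src_len) ? src[i + 1] : src[i];

            dst[2 * i]     = (int8_t)a;              /* original sample  */
            dst[2 * i + 1] = (int8_t)((a + b) / 2);  /* linear midpoint  */
        }
    }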

The benefit here is that if the top limit of the input driver is n:1, with n=2, then as n approaches 1… the interpolated samples get played back instead of repeated samples. At a 5-bit resolution depth, it might not have a huge impact on the output range. But as you approach 8-bit resolution, this becomes more significant. 6-bit is twice the resolution of 5-bit in audio output. That’s a lot.

So anyway, this approach maximizes the output quality of an instrument sample as you work your way through the mip-map set. The only question now is, what’s better than linear interpolation? What would make a smoother transition from one mip-map sample to another? Maybe if you actually embedded the sample below it into the gaps in the sample above it? Of course, those individual samples would have to be resampled separately by themselves before being inserted back into the “gaps”, else it just becomes the original again. I’m not sure which method is better. Maybe a blend of the sample below it with the linear interpolation, shifting the weight of that blend as it approaches the lower octave. I’d have to do some tests to see if there’s an audio benefit.

This is one of the features I’m working on adding to my wave converter. I was going to do the resampling myself, on the input waveform, but Cool Edit Pro does such a nice job for me. It wins out on the sheer laziness factor. I just added the option for linear interpolation or embedding two input wave files.