Archive for April, 2010

Unreal Superhero 3 chiptune

Posted in Uncategorized on April 27, 2010 by pcedev

Yeeeaaaahhhhh! Unreal Superhero 3 for the PCE. The famous chiptune now PCE-fied. This one was a pretty quick and dirty conversion. Not a lot of attention to detail went into it. Anyway, enjoy 😀

Note: Recorded this one at a lower res to get a better frame rate. Also uses the Amiga 1200 filter (non-LED one). Doesn’t filter/smooth the lows, doesn’t cause aliasing in the highs. Perfect for Milkytracker and PCE stuff.

Satellite One…

Posted in Uncategorized on April 27, 2010 by pcedev

Made a video of it. Been too busy to get any real work done (away from home), so I made this video. I have another one I’ll put up soon (very famous chip tune PCE-fied) 😀

Oh, and the video’s in HD… for whatever reason.. you’d .. want to watch it in HD >_>

Emulator audio differences..

Posted in Uncategorized on April 25, 2010 by pcedev

In light of recent tests, I decided to take a closer look at different emulator outputs – relative to the real system.

There are a few major things an emulator needs to get right:
– Volume levels. Doubt any emulator is using linear volume levels, but it is important to get the “correct” non linear value system.
– Sampling of the audio reg writes. In a perfect world, every write to an audio reg would register immediately. After all, it does on the real system.
– How to handle frequencies above the Nyquist limit when down sampling.

From the tests I’ve done:
Volume. Mednafen, Blargg Music player, Yame, TE, and Ootake seem to get this right. Mednafen, Blargg player, and TE definitely do. Magic Engine has some issues (as do a lot of other 3rd party HES players).
Register sampling. I was only able to test mednafen, yame, te, and ootake. Well, I tested ME, but didn’t record it (it’s just what you expect). For low frequency stuff, the less accurate ones do fairly fine – relatively speaking. But for higher rates of register updating, only Mednafen and TE were correct. Mednafen supposedly caps pretty high (I don’t remember the rate, but you’ll never hit it). I think TE caps at about 32khz. That is, reg updates are polled twice a scanline (probably splitting the scanline evenly in half: once at the beginning, once at the end).

Down sampling. All the emulators appear to have some sort of method to deal with this. Blargg uses band limited step synthesis (and mednafen uses Blargg’s blip sound engine IIRC). Yame appears to be doing something similar. Ootake is doing it too, but it looks too exaggerated. That is, the throws seem too extreme. TE is a bit higher in the throw like Ootake, but not as extreme, though still more than mednafen/blargg, ME, and yame.

Now, this is the first test: Here (wave file).
(right click and save. Do not normal click and “stream”. Direct streaming doesn’t work).

This is the Jackie Chan HES file. The part in question is a 7khz 5bit sample being played (IIRC track #89 or 90). In order of first to last: Blargg player, Ootake, Blargg down sampled to 4bit (not 5bit), mednafen, Turbo Engine.
A pic here: Here.
Right away, you can see all but 1 of them have the same symmetrical shape. The positive and negative swings of the waveform are about equally proportioned. Now, not all audio will look like that – but most will. What we are interested in is that Ootake is different from the others. This is key. This means Ootake is suppressing the audio on one side of the swing/throw (from peak to trough). This is a repeating pattern with Ootake. Now, it doesn’t look like much. But when playing the ootake recorded part against the 4bit blargg converted part, the ootake recording sounds even more static-y/noisy than the 4bit conversion. That shouldn’t be. That definitely shouldn’t be.

Two pics comparing the band limited step synthesis of Ootake and Blargg/mednafen:
Here and here.
Top one is Ootake. I circled the spots in red, so you can see what I’m talking about.
Here’s some reading about it:
http://www.fly.net/~ant/bl-synth/
http://www.fly.net/~ant/bl-synth/11.implementation.html
The second link shows how to implement it. It costs too much to calculate this in realtime (or it’s just wasteful), so you build out pre-calculated “steps”. The whole reason for this is to keep frequencies above what your sound card can output (from the PSG or any audio that’s generated at a higher frequency) from folding/inverting downward into the lower frequency band. Nyquist frequency artifacts.
While I doubt it’s a big deal that Ootake seems to have some extreme steps for such low res sample changes, it is interesting to note. If I were to take a logical guess, I would say Ootake is using fewer/coarser steps.
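To make the folding concrete, here is a rough sketch in Python. It assumes plain, unfiltered sampling (no band limiting at all), and both the helper name `folded_frequency` and the 44.1khz output rate are assumptions made for the example, not anything taken from the emulators above.

```python
def folded_frequency(f_hz, sample_rate_hz):
    """Frequency you end up hearing if a tone at f_hz is sampled
    at sample_rate_hz with no filtering beforehand."""
    nyquist = sample_rate_hz / 2
    # Sampling cannot tell f apart from f plus any multiple of the rate.
    f = f_hz % sample_rate_hz
    # Anything above Nyquist is mirrored back down into the lower band.
    return f if f <= nyquist else sample_rate_hz - f


# A 30khz component sampled at 44.1khz lands at 14.1khz, right in the audible range.
print(folded_frequency(30_000, 44_100))
```

Band limited step synthesis exists to keep that mirrored energy out of the output in the first place.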

Back to the waveform itself. At 7khz, it shouldn’t be much of a problem to capture those reg updates (writes to $806). You’d have to sample at about 14khz to capture all of them. 15.7khz is the scanline frequency, and most emulators I would think – would probably capture the reg updates once per scanline (giving 15.7khz sampling). More than enough for 7khz TIMER sample playback.

Now to the second recording example. This time, both volume and reg sample rate are extremely important. This is from a rom, so blargg player couldn’t be used at this time. But mednafen substitutes fine. So from first to last, mednafen, ootake, yame, turbo engine.
A little info on this rom. It’s a software ADPCM decoder that outputs a 12bit sample clipped to 10bit (last two bits are dropped). Two DACs paired provide the 10bit linear range. The output rate is a little bit above 33khz. An odd number I know, but it was based on cpu cycle timing.
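For anyone wondering how two 5bit DACs add up to a 10bit range, here is a minimal sketch. It assumes the usual coarse/fine arrangement, where one channel carries the upper 5 bits and the other carries the lower 5 bits at 1/32 of the volume. The post doesn’t spell out how the rom pairs them, so treat the split and the function names as assumptions.

```python
def split_10bit(sample12):
    """Drop the low two bits of an unsigned 12bit sample, then split the
    remaining 10 bits into two 5bit DAC values: (coarse, fine)."""
    if not 0 <= sample12 <= 0xFFF:
        raise ValueError("expected an unsigned 12bit sample")
    sample10 = sample12 >> 2      # 12bit -> 10bit, last two bits dropped
    coarse = sample10 >> 5        # upper 5 bits, played at full volume
    fine = sample10 & 0x1F        # lower 5 bits, assumed played at 1/32 volume
    return coarse, fine


def recombine(coarse, fine):
    """What the summed output represents if the fine DAC is 32x quieter."""
    return coarse * 32 + fine


coarse, fine = split_10bit(0xABC)
print(coarse, fine, recombine(coarse, fine) == (0xABC >> 2))
```

This also shows why the volume levels matter so much for this test: if the quieter channel isn’t attenuated by the right amount, the two halves no longer line up into one linear range.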
Wave file here. (again, right click and save as…)
And the rom file: http://alexandria66.2mhost.com/~pcengine//sound/lonely_soldier_boy.pce
And finally, a pic: here.

Again, we can see the same trend in Ootake in this recording. The amplitude of the waveforms is pretty loud this time, so you can see the effect even more now. Ootake is definitely crippling one side of the waveform (doesn’t matter what channel you look at, left or right). Ootake audio actually appears to be inverted from all the other emulators. But this isn’t a problem (it sounds the same). As long as both outputs are inverted, it’s the same as non inverted. It’s only when one channel is inverted relative to the other that you get canceling out of the audio frequencies.

When you zoom in, you can see strange artifacts in Ootake’s recording. Oddly enough, you can see the same for Yame. Both play static-y because of this, but Yame is a little clearer because it outputs an uncrippled waveform (relatively speaking). But zooming in also reveals something else. It seems all emulators get the volume correct. Because, looking between the artifacts in the recordings, the waveform output is really high and almost correct (if it were moved to its proper place relative to the rest of the waveform). Not quite correct, but on that scale – damn near enough.

Here’s a zoomed in pic of each part right at the beginning: here.
Top is Ootake, middle is mednafen, bottom is TE.

Looking at the top one, you can see the parts I circled to point out the incorrect position of the waveform. The weird thing is, if you slide the parts in between these artifacts up or down, the waveform will be in the correct spot. So it’s more than just missed sample writes (though missed sample writes probably account for some of the coarser parts in that pic for Ootake). This artifact is also present, almost identically, in Yame.

What does the ADPCM rom test tell us? It shows a continuing trend in Ootake for the incorrect waveform output, but also shows that Yame and Ootake can’t handle higher reg update rates. You might think, “well, that’s not really a problem since I’m not doing 32khz+ sample output”. But this effect trickles down. It might not be very audible for 7khz output, but it does affect the “phase” of normal channels. That is, if a specific phasing of two channels is required to get a specific sound (whether this was done on purpose with exact timing, or by accident but you kept it because it sounded good), it can vary depending on when the second (or third or fourth) channel’s reg gets written to, to start the output of that channel. This is what I was running into in the XM player. Certain phasing effects sounded either completely off or non existent on anything other than the real system or mednafen. To safeguard against this, cache your channel updates to happen ALL at the same time, separate from the parser. I have yet to do this for the XM player, but I will. I most definitely will. Strangely enough, the PCM driver is cached, but that’s to reduce jitter as well as allow VDC INT to operate at the same time 🙂
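The caching idea can be sketched like this. It’s Python for readability; on the real thing it would be 6280 assembly. `ChannelWriteCache`, `queue`, `flush` and the register names are all hypothetical, made up for the example, and none of this is the XM player’s actual code.

```python
class ChannelWriteCache:
    """Collect channel register writes while parsing, then apply them in one burst."""

    def __init__(self, write_reg):
        self._write_reg = write_reg   # callable(channel, reg, value) that hits the hardware
        self._pending = {}

    def queue(self, channel, reg, value):
        # The parser calls this instead of touching the hardware.
        # A later write to the same register simply replaces the earlier one.
        self._pending[(channel, reg)] = value

    def flush(self):
        # Called once per tick, after parsing: all channels get updated back to back,
        # so their relative phase no longer depends on how long the parser took.
        for (channel, reg), value in self._pending.items():
            self._write_reg(channel, reg, value)
        self._pending.clear()


log = []
cache = ChannelWriteCache(lambda ch, reg, val: log.append((ch, reg, val)))
cache.queue(0, "freq", 100)
cache.queue(1, "freq", 200)
cache.queue(0, "freq", 150)   # overrides the first write to channel 0
cache.flush()
print(log)
```

The point of the design is that the time gap between channel updates becomes constant and tiny, instead of depending on how much work the parser did in between.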

Conclusion: Mednafen is king. Blargg player is king too. YAME will never be updated and is only used as a reference between these emulators (though it still holds its own compared to Ootake). And Ootake needs some more audio work before I can recommend it. TE, well – it has a few bugs in weird areas of audio, but most of it is spot on. ME – ok for games and simplistic HES files I guess, and it doesn’t have the major audio issue I encountered in Ootake, but I can’t really recommend it. It’s not as bad as nesticle for the NES emu scene by comparison, but it’s close (relatively speaking).

If anyone has any different results with Ootake, please post them in the comments section. I’d love to be able to count on Ootake (the more accurate emus, the merrier) for sound listening (HES or custom roms). I really wasn’t expecting the output I got, so I’m thinking it might have something to do with my setup. If someone could record that ADPCM demo rom from their setup of Ootake, compare it, and post whether the problem is present or not, that would help. Thanks 🙂

More PCM and MOD stuff

Posted in Uncategorized on April 22, 2010 by pcedev

Some of you already know that David Michael of Magic Engine made a MOD player for the PCE. No intermediate step for converting the format or anything. You just include the MOD file inside HuC. The player converts the samples over in realtime, etc. Not optimal at all for PCE, for practical purposes. But I’m sure this would have led to something more PCE friendly.

It was after experimenting and hearing many examples of MODs playing on this PCE player that I decided I wasn’t going to do a MOD player. The sound was just too poor for most music examples. At first, I thought it was because the samples were only 5bit. So I hacked the engine to play on a 10bit DAC setup. I soft mixed the channels (4x8bit = 10bit, so no clipping). Didn’t really hear any difference. Just really slight here and there. So I figured it must be the low frequency of the TIMER irq. That was probably over a year ago or more. Still, in the back of my mind, it nagged at me…
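The “4x8bit = 10bit, so no clipping” bit is just arithmetic, sketched here in Python. It assumes unsigned 8bit samples, and the function name is made up for the example.

```python
def soft_mix_4x8(samples):
    """Sum four unsigned 8bit samples into one 10bit value."""
    if len(samples) != 4 or any(not 0 <= s <= 255 for s in samples):
        raise ValueError("expected four unsigned 8bit samples")
    # Worst case is 4 * 255 = 1020, which still fits under 2**10 = 1024,
    # so the sum never needs to be clipped.
    return sum(samples)


print(soft_mix_4x8([255, 255, 255, 255]))  # 1020
```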

Fast forward to last spring. Digging through PCE sound engines. Looking at them in depth, but also looking at how they do sample playback. Some do 4bit samples, most do 5bit. Some do funny stuff like pseudo frequency scaling like Batman (and more, looping certain parts of a waveform to shorten it. I.e. compress the repeating parts for final waveform shaping). One thing I noticed. Crappier samples tend to sound better when something else is playing at the same time. It helps hide the noise of the sample from the low bitrate and playback rate. But a few games were interesting. Some samples were really clear when played by themselves. Something I didn’t expect. Obviously these better-than-expected sounding samples must have been preprocessed to sound better for the output than others. I know there’s a few techniques for preprocessing and I’ll not get into that in this post, but that got me thinking (as of doing all this work lately).

So I fired up some of these old MOD pce roms that I had made from DM’s huc source yesterday. Testing them out in different emulators because of the problems I’ve been having with certain ones not outputting sound correctly – with stuff I’ve been doing lately. At last I came to YAME. I overclocked the CPU to 21mhz on a whim. Now, normally – this shouldn’t have any effect. Because all the timing is straight off the TIMER IRQ and when I was looking at the code originally, it looked fine on the scaler side. That is to say, it didn’t appear to have any issues with “jitter”. Well, I was wrong. Playing those files in yame at that higher speed removed all the jitter (or almost all of it). I was impressed by the sound. It fixed quite a few issues that these MODs were having. I also noticed something else. The MODs played slower than normal. I don’t think this had to do with the timer code, but more with DM trying to compensate for the fact that most MODs use a BPM of 125. That number is derived from the PAL vblank of 50hz (50hz x 60 seconds = 3000 ticks / 6 ticks per line = 500 lines / 4 lines per beat = 125 BPM). Later on they changed the spec so you could independently change the BPM, which made calculating the real BPM a pain in the ass (kind of useless even naming it “bpm”).

Anyway, his MOD player naturally should have been playing all these MODs faster, since the PCE has a 60hz vblank rate. That’s 120% of the normal speed. But they aren’t. They play at about the speed of 100BPM. Not even the correct speed, but slower. I haven’t yet re-looked at the code (and probably won’t), but this also might account for some sound issues. Either way, yame at 21mhz playing these MOD files… this was inspiring. No, it doesn’t rival the Amiga, but the sound is fairly decent/acceptable for what it is.
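The BPM arithmetic above, as a small Python sketch. It assumes the classic MOD defaults of 6 ticks per line and 4 lines per beat, with one tick per vblank; the function name is made up for the example.

```python
def mod_bpm(vblank_hz, ticks_per_line=6, lines_per_beat=4):
    """Beats per minute when the player advances one tick per vblank."""
    ticks_per_minute = vblank_hz * 60
    lines_per_minute = ticks_per_minute / ticks_per_line
    return lines_per_minute / lines_per_beat


print(mod_bpm(50))                 # 125.0 on a 50hz PAL vblank
print(mod_bpm(60))                 # 150.0 on a 60hz vblank
print(mod_bpm(60) / mod_bpm(50))   # ratio between the two
```

So a player ticking off a 60hz vblank with no compensation would come out at 150 BPM, which makes the observed 100 BPM or so look like an overcorrection somewhere.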

So with that, I’ve decided to modify my existing XM player (which is just a MOD/XM FX/pattern compatible player) to do a 4 channel MOD/XM for long samples. Not something that I’d probably use for dev, but for demo purposes or novelty. Just for fun.

More PCM driver stuff

Posted in Uncategorized on April 22, 2010 by pcedev

Here’s a working, assemble-able version. I changed quite a bit and fixed a bunch of bugs (that’s what I get for writing code and not testing it). I also modified the code and made it more flexible. Now, it can be loaded anywhere from rom. Address calculation is done at assemble time, so it doesn’t matter where it is in rom – as long as you define some destination address in ram (be it a BSS label already defined or a specific address). It still needs a few more things, but at least you can hear all 4 channels (I hard panned them to left/right). And they all play at their own speed.

🙂

Update: Man, this is getting annoying. Magic Engine, Ootake, and Turbo Engine emulators play this simple rom… incorrectly. ME and TE have a weird speed issue (which should be impossible since this is synced to a TIMER!). Ootake is just grainy. Mednafen, YAME, and real hardware play it fine.

Posted in Uncategorized on April 19, 2010 by pcedev

Added “4 Channel PCM driver” example code to the download and links section. But here’s a quick link to it. It’s a MOD style playback driver. No, this is not the driver I use in my XM player. Those are normal PCE channels and 1 static sample channel. But this is something a bit different. It’s just the driver itself, it still needs the support code. I’ll put together an example of how to use it.. soon. Warning: Not HuC friendly. Is a bit on the intermediate level side of coding. In the file, there are some explanations of how the code works and how to interface to it. Even if you don’t use it, you can copy the concept or whatever.

Blog structure update..

Posted in Uncategorized on April 19, 2010 by pcedev

Added Download and links page on the side bar. And, you know, I put some stuffs in there.

<o.

Custom XM player WIP

Posted in Uncategorized on April 17, 2010 by pcedev

Yeah, she’s a coming along nicely. Still needs quite a few things (like a new period LUT, the current one is missing finetune for low octaves – which this song uses). Some FX still need implementing, etc. I sorted out a few bugs in the FX I do have working, and added long sample playback support, so I figured I’d throw up a WIP vid/example.

What is it? It’s an XM player for the PCE. I have a converter that converts XM and MOD files into a more PCE friendly format (a reformat), but all the FXs and such are still there. What is XM format? Go look up MilkyTracker

🙂

Oh and btw – the song is: HuC6280 on Fire! by Louis Gorenfeld. Thanks to Louis for providing a PCE legal MOD. <o.

A new sprite and tile mode for the PCE

Posted in Uncategorized on April 16, 2010 by pcedev

Yup. I’ve mentioned this to Charles in IRC a couple of times in passing, but in context of a different use. This requires no hsync interrupt correction logic and such, either.

The standard method: Normally the PCE builds sprites from 16×16 cells. It also builds BGs from 8×8 cells. Now, the 16×16 cell is nice – but sometimes it’s too much in certain situations and even causes un-necessary flicker and… even wastes a little bit of vram memory. 16×16 can be too much when all you need is 16×8 or 8×8. The rest of the invisible data in the sprite is still contributing to sprite overflow (normally shown as either flicker or just “blank out”).

The premise: Get the PCE to display sprites in cells of 16×8 instead of 16×16. As a side result, tiles will now also be 8×4 instead of 8×8. For BGs, this increase isn’t as important as sprite cells but there are some advantages:

  • More color definition in an 8×8 area. Since you have 8×4 tilemap entries, you have up to 31(30) colors per 8×8 area instead of the normal 16(15). For the very obvious reason that the tile right below the above 8×4 tile can have a different subpalette attached to it.
  • Smaller tiles means you can exploit “patterns” easier and save tile space. This isn’t a huge benefit as 8×8 is pretty capable of doing what you need most of the time, but this added effect is practical in the right hands or situations. Especially when you have repeated parts of patterns that you don’t necessarily want to appear as reused – but as unique. Late generation 16bit titles tried to expand on this idea (not the 8×4 thing, but reusing tiles in ways that don’t appear to be strictly pattern limited).

For sprites, I think this would be self evident for anyone that’s worked with them – relative to sprite scanline limit. So I’ll not bother really listing the benefits of a 16×8 cell over a 16×16.

The method: Now come the gory details. Yeah, you’ll love them. Actually, it’s not that complex but we need a little understanding about how the VDC and VCE work together. The VDC is the heart of the graphics for the PCE. It pretty much does everything; subpalette association, building the BG, building the sprites, etc. The VCE doesn’t do as much. It holds the external color ram (the RGB values for the pixels the VDC outputs), it builds the NTSC frame (timings and such), and it also sets the resolution of the display (it outputs the clock source that the VDC runs on).

Now, the VDC is perfectly capable of doing ALL of this on its own. And it does… on the PCE arcade boards (like Bloody Wolf). So it’s capable of driving its own display frame (even wildly non standard NTSC stuff), but the VDC is also capable of running in subordinate mode. That is to say, it takes its cue from something external instead of generating it itself. This means the VDC could work in conjunction with a series of graphics chips (including other VDCs). This is called input mode.

The VDC has this weird behavior in that, when in input mode – it can still define its “own” frame. Well maybe not weird, but it can set the number of scanlines from 242 all the way down to 1, in steps of 1 scanline. It can also define the horizontal resolution by “clipping”. But this is in steps of 8 pixels. I believe you can have a horizontal width of just 8 pixels (and you can center this too). It’s nice. You could define a 232 or 224 wide screen and not have to worry about clipping the sprites or background yourself. This also has a nice effect that sprites take up relatively more “space”. And this is perceived as just that. ‘Cause you know, any “frame” filled with sprites… is a frame filled with sprites. Regardless of its width. I’m surprised developers didn’t take advantage of this psychological perception. Oh well, back on topic.

So here comes the weird part. The VCE drives the sync signals. Those would be hsync and vsync. These sync signals are monitored or triggered on the VDC side… because it’s input mode. Normally, you would define the display frame parameters to match that of the VCE resolution mode. But an interesting thing happens when you don’t. Say you define a horizontal width that’s larger than the VCE frame. Guess what? The frame gets terminated because the VCE generated hsync and the VDC “obeyed”. Now, what happens if you define a frame horizontal width that’s smaller than the VCE timings for a scanline width? The VDC will generate an interrupt when this internal frame ends, and start processing the next scanline. But… the VCE is still outputting the same scanline! Bingo. Something’s got to happen. And that something is the VCE eventually generating hsync and the VDC obeying. So you have the VDC stopping in the middle of this scanline that it started generating, so it starts the next scanline.

Ok, but how does this help us? For starters, that means a whole scanline of sprites and BG data will now be skipped instead of being displayed. Actually, that’s wrong. If the VDC scanline ends early enough and there’s room on the VCE scanline, you’ll see the “next” scanline being drawn on the same scanline. Ever tried out those funny cheat passcodes in Coryoon? The one that displays 4 frames on screen? That’s what it’s doing.

For our purposes, we don’t want to display this “next” scanline. BUT, we do want the VDC to engage and give the internal mechanisms enough time to increment the logic for looking at the next scanline for tiles and sprites. So we define a frame that just starts to do this, then bam – the VCE forces it to start the scanline mechanism again.

This results in every other scanline being skipped. This is literally scaling the whole display down by 2. But don’t think of it as such. You need to think of the display now in terms of being “interleaved”. Still 240p 60fps mind you. This isn’t interlaced stuff. But interleaved. What does that mean? Well, since the display is skipping every other scanline – it will display either an odd or even interleaved version of this display allllll depending on the very last bit of the map Y position register and the SATB’s sprite Y position register. If you kept it at zero, you would only display one interleaved frame no matter where the sprites were put and how the BG scrolled. The VDC will need a new frame vertical length number now. Luckily the VDC can hold up to a value of 512 (and actually build a 512 scanline display, but that’s another topic). So this will need to be set accordingly, or you’ll get two vertical frames in a single window (split screen). We’re not after that effect in this instance, so we won’t be needing that. So if the VDC was set to 224 displayable lines before, it will need to be set to 448. Etc.
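A tiny model of which VDC lines actually reach the screen might help here. This is a sketch under simple assumptions: exactly every other internal line is skipped, the first line shown is the one at the Y scroll value, and wrap-around is ignored. The function names are invented for the example.

```python
def visible_source_lines(scroll_y, displayed_lines):
    """Internal VDC line numbers that end up on screen in interleaved mode."""
    return [scroll_y + 2 * n for n in range(displayed_lines)]


def vdc_frame_height(displayed_lines):
    """Vertical length the VDC must be given so the display isn't split in two."""
    return displayed_lines * 2


print(visible_source_lines(0, 4))   # even field: [0, 2, 4, 6]
print(visible_source_lines(1, 4))   # odd field:  [1, 3, 5, 7]
print(vdc_frame_height(224))        # 448
```

Bit 0 of the Y position picks the field, and everything above bit 0 does the actual scrolling.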

If you’ve followed along fairly closely, you might be asking yourself – won’t that waste vram? Because it’s only showing every other row of pixels. The short answer is: no. Not at all. But the long answer requires you to now think of vram layout as much more complex. Now everything in vram is interleaved as well (well almost everything). Tile data is now interleaved. Sprite data is now interleaved. Hell, even the tilemap is now interleaved. And since only corresponding interleaved data will be displayed, you can use the opposite interleaved data to store your other tiles/sprites.
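For tile data, the interleaving could look something like the sketch below. Rows are treated as opaque values here, not the real planar vram format, and the assumption is that the even field shows rows 0, 2, 4, 6 of a pattern while the odd field shows rows 1, 3, 5, 7. The helper names are made up.

```python
def interleave_tiles(even_tile, odd_tile):
    """Pack two 8x4 tiles (4 rows each) into one 8-row pattern slot."""
    if len(even_tile) != 4 or len(odd_tile) != 4:
        raise ValueError("each 8x4 tile needs exactly 4 rows")
    pattern = []
    for even_row, odd_row in zip(even_tile, odd_tile):
        pattern += [even_row, odd_row]
    return pattern


def rows_shown(pattern, field):
    """Rows of an 8-row pattern that are displayed for field 0 (even) or 1 (odd)."""
    return pattern[field::2]


tile_a = ["a0", "a1", "a2", "a3"]
tile_b = ["b0", "b1", "b2", "b3"]
packed = interleave_tiles(tile_a, tile_b)
print(packed)
print(rows_shown(packed, 0), rows_shown(packed, 1))
```

Nothing is wasted in this model: the rows the current field skips are exactly where the other 8×4 tile lives.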

Some more details..

You’ll have to make the tile map taller to compensate for the 4 pixel tall rows. A height of 64 would do nicely. That interleaves to two 32 height maps. And 32 is enough to handle any vertical scrolling needs. The complex part comes when you have a screen that’s a multiple of 32 tiles wide. Say 64 or 128. I mean, those are already interleaved to begin with and now you’ve just added this whole new interleaved format on top of that. Needless to say, you’ll have to write ALL new map updating routines :O

Oh come on, it’s not that hard 😉

The sprite table, on the other hand, is less complex. Simply treat bit #0 of the Y register as the “bank select” for which interleaved sprite data to show. And that’s on a per sprite table entry level. You can mix and match sprites from different interleaved areas of vram, in a single screen instance. To avoid artifacts, you’ll have to scroll the sprites in the Y direction in multiples of 2, not 1. But that’s not difficult and sprites have a 512 value system for Y.

How convenient for us 🙂
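In code form, the bank select idea boils down to something like this. It’s a sketch that ignores any fixed offset the sprite table’s Y value carries on real hardware, and `sprite_y` is a hypothetical helper name.

```python
def sprite_y(display_line, bank):
    """Y value to store in the sprite table for a sprite at a given on-screen
    line, using bit 0 to pick which interleaved set of rows gets shown."""
    if bank not in (0, 1):
        raise ValueError("bank must be 0 or 1")
    # Each on-screen line is two VDC lines, so the position moves in steps of 2
    # and bit 0 is left free for the bank select.
    return (display_line * 2) | bank


print(sprite_y(10, 0), sprite_y(10, 1))   # same screen position, different bank
print(sprite_y(11, 1) - sprite_y(10, 1))  # moving down one line changes Y by 2
```

Moving in steps of 2 is what keeps bit 0, and therefore the sprite’s bank, from flipping as it scrolls.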

Of course sprite cell organization will be affected by this. As if it wasn’t already complex for some sprite sizes, this will make your head hurt even more. Yay! Sprite size NN x 64 never really had a good use. I mean, the sprite table even with 64 entries, was enough to handle most engines that did 32×16 and 32×32 size. But since we’ve now halved our base sprite cell vertically, that NNx64 mode is looking more useful. But (always a but), ever deal with setting up sprites in that format? Talk about alignment issues. Very strict on where the start (offsets) of that type of pattern is in vram. And now we’ve just added interleav..ation on top of that. Another YAY! 😀

What? Were you expecting to get something for nothing? All good FX come at a price. I think, given what you gain, the negatives are just complexities of layouts and not something more severe. Maybe I should list the plus sides? Well, one of them is that you don’t need an hsync interrupt system to change -blah-blah parameters every scanline. It’s done for you. And it’s already compatible with hsync scrolling/fx. Secondly, if you really, really, really disliked having 8×4 tiles, interleaved at that, and additional interleaved layout on the tilemap – you could just do an hsync interrupt to correct this.. on EVERY scanline! Yeah, I wouldn’t want that overhead. Imagine 224 interrupts. No thanks.

But, you can turn “interleaved mode” on and off during the active display. Say, if you had a status window at the top or bottom of the screen. Like you would normally do, you turn sprites off or on for clipping in that area, use a reserved part of the tilemap for that window, sew any seams for vertical scrolling like a good boy in the non window area (like most games like that do on the PCE). You just make sure to have an interrupt right before the needed change from interleaved to normal, or vice versa, and change the horizontal regs. Don’t worry, they’ll love you for it 🙂

I’m not sure if I’m forgetting anything. Did I mention sprite cells of 16×8!? Much better optimization for flicker. Man, gotta love that. Not just specific to any game type either. That’s universal appeal. :3 You don’t even know how many times 8 tall sprite cells have helped Genesis games. It’s just that big of a difference for optimization.

Of course, this is an advanced skill level effect. So I don’t expect everybody to replicate this. Hell maybe even no one. I dunno. I mean, to design a game around this type of setup. But you demo boys, up to the challenge?

Now, take this effect and apply it to the SGX ._. You could do it “per” VDC. You’re not limited to doing it to both.

TG16 or Turbo Duo dev kit?

Posted in Uncategorized on April 14, 2010 by pcedev

Marshallh passed on this bit of news to me, that was posted on a non gaming forum…

“I have been trying to get my cousin to let go of his NEC turbographx dev kit, which is the only one I know of in existence, he worked at NEC/turbotechnologies back in the late 80s early 90s, I think that would bring in some serious bucks as well as a ton of unfinished projects that are in it as well as floppy”

Interesting. I wonder who will snatch this up and if they’ll release any info about it. “Guy’s” cousin worked for NEC/TTi BITD, eh? I wouldn’t mind finding out who his cousin is and if he’d like to share any stories and/or experiences from there.