A new sprite and tile mode for the PCE

Yup. I’ve mentioned this to Charles in IRC a couple of times in passing, but in the context of a different use. This requires no hsync interrupt correction logic and such, either.

The standard method: Normally the PCE builds sprites from 16×16 cells. It also builds BGs from 8×8 cells. Now, the 16×16 cell is nice, but in certain situations it’s too much: it causes unnecessary flicker and even wastes a little bit of vram. 16×16 can be too much when all you need is 16×8 or 8×8. The rest of the sprite, the invisible data, still contributes to sprite overflow (normally shown as either flicker or just “blank out”).

The premise: Get the PCE to display sprites in cells of 16×8 instead of 16×16. As a side result, tiles will now also be 8×4 instead of 8×8. For BGs, this change isn’t as important as it is for sprite cells, but there are some advantages:

  • More color definition in an 8×8 area. Since you have 8×4 tilemap entries, you get up to 31(30) colors per 8×8 area instead of the normal 16(15), for the very obvious reason that the 8×4 tile right below another one can have a different subpalette attached to it.
  • Smaller tiles mean you can exploit “patterns” more easily and save tile space. This isn’t a huge benefit, as 8×8 is pretty capable of doing what you need most of the time, but this added effect is practical in the right hands or situations. Especially when you have repeated parts of patterns that you don’t necessarily want to appear as reused, but as unique. Late generation 16bit titles tried to expand on this idea (not the 8×4 thing, but reusing tiles in ways that don’t appear to be strictly pattern limited).
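
The color arithmetic in the first bullet can be written out as a tiny sketch. This is only my reading of the 16(15) and 31(30) figures: 4-bit pixels give 16 entries per subpalette, and entry 0 is treated as one color shared by every subpalette. The function name and the `shared_color0` flag are made up for illustration.

```python
COLORS_PER_SUBPALETTE = 16  # 4 bits per pixel

def colors_in_area(tiles_stacked, shared_color0=True):
    """Upper bound on distinct colors when each stacked tile uses its own subpalette.

    Assumption: entry 0 is common to all subpalettes, so it is counted once
    (or not at all when shared_color0 is False).
    """
    unique = (COLORS_PER_SUBPALETTE - 1) * tiles_stacked
    return unique + (1 if shared_color0 else 0)

print(colors_in_area(1))                       # one 8x8 tile: 16
print(colors_in_area(2))                       # two 8x4 tiles in the same 8x8 area: 31
print(colors_in_area(2, shared_color0=False))  # not counting the shared entry: 30
```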

For sprites, I think this would be self-evident for anyone that’s worked with them, relative to the sprite scanline limit. So I won’t bother really listing the benefits of a 16×8 cell over a 16×16.
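
For anyone who hasn’t fought the scanline limit, here is a rough Python sketch of the effect. It is a toy model, not hardware-accurate: every object is assumed to be one 16-wide cell across, and `cells_on_line` and the sample `bullets` list are hypothetical names. It only counts how many cells have to be fetched on a given line when an object’s height gets rounded up to whole cells.

```python
def cells_on_line(sprites, line, cell_height):
    """Count sprite cells that overlap one scanline.

    sprites: list of (top_y, visible_height) in screen pixels.
    Each sprite is built from whole cells, so the height that counts
    toward the per-line limit is rounded up to a multiple of cell_height,
    even when the extra rows are blank.
    """
    count = 0
    for top_y, visible_height in sprites:
        cells = -(-visible_height // cell_height)  # ceiling division
        fetched_height = cells * cell_height
        if top_y <= line < top_y + fetched_height:
            count += 1
    return count

# Three 8-pixel-tall bullets stacked 8 lines apart.
bullets = [(0, 8), (8, 8), (16, 8)]
print(cells_on_line(bullets, 10, 16))  # 2: the blank half of the first cell still counts
print(cells_on_line(bullets, 10, 8))   # 1: only the bullet actually on this line
```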

The method: Now come the gory details. Yeah, you’ll love them. Actually, it’s not that complex, but we need a little understanding of how the VDC and VCE work together. The VDC is the heart of the graphics for the PCE. It pretty much does everything: subpalette association, building the BG, building the sprites, etc. The VCE doesn’t do as much. It holds the external color ram (the RGB values for the pixels the VDC outputs), it builds the NTSC frame (timings and such), and it also sets the resolution of the display (it outputs the clock source that the VDC runs on).

Now, the VDC is perfectly capable of doing ALL of this on its own. And it does… on the PCE arcade boards (like Bloody Wolf). So it’s capable of driving its own display frame (even wildly non-standard NTSC stuff). But the VDC is also capable of running in subordinate mode. That is to say, it takes its cue from something external instead of generating the timing itself. This means the VDC could work in conjunction with a series of graphics chips (including other VDCs). This is called input mode.

The VDC has this weird behavior in that, when in input mode, it can still define its “own” frame. Well, maybe not weird, but it can set the number of scanlines from 242 all the way down to 1, in steps of 1 scanline. It can also define the horizontal resolution by “clipping”, but this is in steps of 8 pixels. I believe you can have a horizontal width of just 8 pixels (and you can center this too). It’s nice. You could define a 232 or 224 wide screen and not have to worry about clipping the sprites or background yourself. This also has the nice effect that sprites take up relatively more “space”. And this is perceived as just that. ‘Cause you know, any “frame” filled with sprites… is a frame filled with sprites. Regardless of its width. I’m surprised developers didn’t take advantage of this psychological perception. Oh well, back on topic.

So here comes the weird part. The VCE drives the sync signals. Those would be hsync and vsync. These sync signals are monitored or triggered on the VDC side… because it’s input mode. Normally, you would define the display frame parameters to match that of the VCE resolution mode. But an interesting thing happens when you don’t. Say you define a horizontal width that’s larger than the VCE frame. Guess what? The frame gets terminated because the VCE generated hsync and the VDC “obeyed”. Now, what happens if you define a frame horizontal width that’s smaller than the VCE timings for a scanline width? The VDC will generate an interrupt when this internal frame ends and start processing the next scanline. But… the VCE is still outputting the same scanline! Bingo. Something’s got to happen. And that something is the VCE eventually generating hsync and the VDC obeying. So the VDC stops in the middle of the scanline it had just started generating and starts the next one.

Ok, but how does this help us? For starters, that means a whole scanline of sprite and BG data will now be skipped instead of being displayed. Actually, that’s wrong. If the VDC scanline ends early enough and there’s room on the VCE scanline, you’ll see the “next” scanline being drawn on the same scanline. Ever tried out those funny cheat passcodes in Coryoon? The one that displays 4 frames on screen? That’s what it’s doing.

For our purposes, we don’t want to display this “next” scanline. BUT, we do want the VDC to engage and give the internal mechanisms enough time to increment the logic for looking at the next scanline for tiles and sprites. So we define a frame that lets the VDC just start to do this, then bam: the VCE forces it to start the scanline mechanism again.

This results in every other scanline being skipped. This is literally scaling the whole display down by 2 vertically. But don’t think of it as such. You need to think of the display now in terms of being “interleaved”. Still 240p 60fps mind you. This isn’t interlaced stuff. But interleaved. What does that mean? Well, since the display is skipping every other scanline, it will display either the odd or the even interleaved version of this display, allllll depending on the lowest bit of the map Y position register and of the SATB’s sprite Y position register. If you kept that bit at zero, you would only ever display one interleaved frame, no matter where the sprites were put and how the BG scrolled. The VDC will need a new frame vertical length number now. Luckily the VDC can hold up to a value of 512 (and actually build a 512 scanline display, but that’s another topic). So this will need to be set accordingly, or you’ll get two vertical frames in a single window (split screen). We’re not after that effect in this instance, so we won’t be needing that. So if the VDC was set to 224 displayable lines before, it will need to be set to 448. Etc.
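
A small Python model of that bookkeeping, as a sketch only. It assumes the VDC consumes exactly two of its own lines per VCE scanline and that the first of each pair is the one that reaches the screen; the function names are hypothetical and no real register is touched.

```python
def displayed_vdc_lines(scroll_y, screen_lines):
    """VDC line numbers that reach the screen when every other line is skipped."""
    return [scroll_y + 2 * n for n in range(screen_lines)]

def interleave_parity(scroll_y):
    """0 = even interleaved set, 1 = odd interleaved set (lowest bit of Y)."""
    return scroll_y & 1

def vdc_frame_length(screen_lines):
    """Vertical length the VDC needs so one VDC frame fills the whole screen."""
    return 2 * screen_lines

print(displayed_vdc_lines(0, 4))  # [0, 2, 4, 6]
print(displayed_vdc_lines(1, 4))  # [1, 3, 5, 7]
print(displayed_vdc_lines(2, 4))  # [2, 4, 6, 8]
print(vdc_frame_length(224))      # 448
```

Scrolling Y by 2 moves the picture one screen line and keeps the same parity. Scrolling by 1 flips to the other interleaved set.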

If you’ve followed along fairly closely, you might be asking yourself: won’t that waste vram? Because it’s only showing every other row of pixels. The short answer is: no. Not at all. But the long answer requires you to now think of the vram layout as much more complex. Now everything in vram is interleaved as well (well, almost everything). Tile data is now interleaved. Sprite data is now interleaved. Hell, even the tilemap is now interleaved. And since only the corresponding interleaved data will be displayed, you can use the opposite interleaved data to store your other tiles/sprites.
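
Here is the tile case sketched in Python at the level of pixel rows. It deliberately ignores the real bitplane layout of tile data in vram and only shows the idea: two 8×4 tiles share one 8×8 slot, one on the even rows and one on the odd rows. `interleave_tiles` and `visible_rows` are hypothetical helper names.

```python
def interleave_tiles(even_tile, odd_tile):
    """Pack two 8x4 tiles (4 rows each) into one 8-row tile slot.

    Rows 0, 2, 4, 6 of the slot come from even_tile,
    rows 1, 3, 5, 7 come from odd_tile.
    """
    if len(even_tile) != 4 or len(odd_tile) != 4:
        raise ValueError("each 8x4 tile must have exactly 4 rows")
    slot = []
    for even_row, odd_row in zip(even_tile, odd_tile):
        slot.append(even_row)
        slot.append(odd_row)
    return slot

def visible_rows(slot, parity):
    """Rows of the slot that get displayed for a given Y parity (0 or 1)."""
    return slot[parity::2]

tile_a = ["A0", "A1", "A2", "A3"]
tile_b = ["B0", "B1", "B2", "B3"]
slot = interleave_tiles(tile_a, tile_b)
print(slot)                   # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2', 'A3', 'B3']
print(visible_rows(slot, 0))  # ['A0', 'A1', 'A2', 'A3']
print(visible_rows(slot, 1))  # ['B0', 'B1', 'B2', 'B3']
```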

Some more details..

You’ll have to make the tile map taller to compensate for the 4 pixel tall rows. A height of 64 would do nicely. That interleaves to two 32 height maps. And 32 is enough to handle any vertical scrolling needs. The complex part comes when you have a screen that’s a multiple of 32 tiles wide. Say 64 or 128. I mean, those are already interleaved to begin with, and now you’ve just added this whole new interleaved format on top of that. Needless to say, you’ll have to write ALL new map updating routines :O

Oh come on, it’s not that hard 😉

The sprite table, on the other hand, is less complex. You simply treat bit #0 of the Y register as the “bank select” for which interleaved sprite data to show. And that’s on a per sprite table entry level. You can mix and match sprites from different interleaved areas of vram in a single screen instance. To avoid artifacts, you’ll have to move the sprites in the Y direction in multiples of 2, not 1. But that’s not difficult, and sprites have a 512 value system for Y.
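
As a sketch, building the Y value for a sprite table entry could look like this in Python. The helper name `satb_y` is hypothetical, and the sketch ignores any fixed offset the hardware applies to sprite coordinates. It only shows the doubling and the bank bit.

```python
def satb_y(screen_line, bank):
    """Y value for a sprite entry: doubled screen position, bank select in bit 0.

    bank 0 shows the even interleaved sprite data, bank 1 the odd data.
    """
    if bank not in (0, 1):
        raise ValueError("bank must be 0 or 1")
    y = (screen_line << 1) | bank
    if not 0 <= y < 512:
        raise ValueError("sprite Y only has 512 values")
    return y

print(satb_y(100, 0))  # 200
print(satb_y(100, 1))  # 201
print(satb_y(101, 1))  # 203: one screen line down is a step of 2, bank unchanged
```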

How convenient for us 🙂

Of course sprite cell organization will be affected by this. As if it wasn’t already complex for some sprite sizes, this will make your head hurt even more. Yay! Sprite size NN x 64 never really had a good use. I mean, the sprite table, even with 64 entries, was enough to handle most engines that did 32×16 and 32×32 sizes. But since we’ve now halved our base sprite cell vertically, that NNx64 mode is looking more useful. But (always a but), ever deal with setting up sprites in that format? Talk about alignment issues. It’s very strict on where the start (offset) of that type of pattern is in vram. And now we’ve just added interleav..ation on top of that. Another YAY! 😀

What? Were you expecting to get something for nothing? All good FX come at a price. I think, given what you gain, the negatives are just complexities of layout and not something more severe. Maybe I should list the plus sides? Well, one of them is that you don’t need an hsync interrupt system to change -blah-blah parameters every scanline. It’s done for you. And it’s already compatible with hsync scrolling/fx. Secondly, if you really, really, really disliked having 8×4 tiles, interleaved at that, and the additional interleaved layout on the tilemap, you could just do an hsync interrupt to correct this.. on EVERY scanline! Yeah, I wouldn’t want that overhead. Imagine 224 interrupts. No thanks.

But, you can turn “interleaved mode” on and off during the active display. Say, if you had a status window at the top or bottom of the screen. Like you would normally do, you turn sprites off or on for clipping in that area, use a reserved part of the tilemap for that window, and sew any seams for vertical scrolling like a good boy in the non-window area (like most games of that kind do on the PCE). You just make sure to have an interrupt right before the needed change from interleaved to normal, or vice versa, and change the horizontal regs. Don’t worry, they’ll love you for it 🙂

I’m not sure if I’m forgetting anything. Did I mention sprite cells of 16×8!? Much better optimization for flicker. Man, gotta love that. Not just specific to any game type either. That’s universal appeal. :3 You don’t even know how many times 8 tall sprite cells have helped Genesis games. It’s just that big of a difference for optimization.

Of course, this is an advanced skill level effect. So I don’t expect everybody to replicate this. Hell maybe even no one. I dunno. I mean, to design a game around this type of setup. But you demo boys, up to the challenge?

Now, take this effect and apply it to the SGX ._. You could do it “per” VDC. You’re not limited to doing it to both.


One Response to “A new sprite and tile mode for the PCE”

  1. Hmm. Maybe I could adapt one of the NES emulation engines to this. I would have to do an hsync interrupt every 2 VDC scanlines to straighten out the tilemap back to 8×8. Not sure if that would be worth the 16×8 advantage to remove some flicker in some situations, or even to free up half the sprite vram area. I could stick tiles in there, for games that use the 8×8 sprite bank as tiles (on those rare occasions). I already do this for 8×16 sprite based games (duplicate tiles into the sprite bank, since 8×16 sprites take up less vram when emulating the NES).
