Archive for the Uncategorized Category

SHMUP suite musings

Posted in homebrew, Uncategorized on April 7, 2016 by pcedev

I have two issues. One is that I don’t want to have to learn a GUI setup for Windows in order to make my utilities. This is probably the completely wrong approach to a solution; reinventing the wheel. But I’m soo used to it. Ok, here’s how I’ve made my GUI-ish apps in the past: it’s all internal. My GUI needs have always been simple. I just need a few buttons and a display window. Nothing fancy, nothing much. I currently just create the buttons and layouts manually. Yes, that means providing the X/Y coords and which functions they correspond to. I’ve decided that if I’m not going to actually use an existing GUI toolset, I should at least create my own toolset for creating GUI frames. Duh! So I’m doing just that. Just a simple visual editor for placing buttons and display windows. That gets compiled into a text define file which I can include into my code and just associate functions via pointers, etc. This probably makes some people cringe, but I think it’s fun.

Secondly, enemy pattern AI. Well, AI might be a strong word here. Reactive might be a better word for it. But that aside, simple dumb enemy pattern movement.. how do I make this flexible enough that people using the suite don’t have to rely on a list of fixed pattern movements? I’ve taken a cue from the FM playbook. Enemies can be defined by phases. Inside each phase, there are three operators. Each operator is a type of behavior; sine, cos, etc. You can assign where the output of the operator is going: X or Y coords, or as a modulator for another operator. Each operator also has assigned attributes like frequency and scale control. The phase has two attributes: time and reaction. The time parameter is self explanatory (how long the current phase lasts); the reaction parameter is a little different. It could be something like the enemy getting shot (it doesn’t die in one hit, or it does and spawns new objects, etc), or the enemy getting close enough to the player – change phase. This might sound expensive in processing time, but really it’s not. Of course there will always be predefined patterns designers can default to (familiar enemy patterns), to keep resource use lighter for the bulk of the level. It’s all about balancing where the ‘spice’ is anyways.
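To make the phase/operator idea a bit more concrete, here is a rough sketch in Python (not PCE code). Everything named here (Operator, the "mod:N" target string, phase_finished) is made up for illustration, and it assumes a modulator simply adds to the phase angle of the operator it targets, FM style. The real suite may well do it differently.

```python
import math
from dataclasses import dataclass

WAVES = {"sin": math.sin, "cos": math.cos}

@dataclass
class Operator:
    wave: str     # behavior type: "sin" or "cos"
    freq: float   # cycles per second (hypothetical unit)
    scale: float  # output amplitude
    target: str   # "x", "y", or "mod:<n>" to modulate operator n

def evaluate(op, t, phase_offset):
    """Output of one operator at time t, with optional phase modulation."""
    return op.scale * WAVES[op.wave](2 * math.pi * op.freq * t + phase_offset)

def phase_offset_xy(ops, t):
    """X/Y offset produced by the operators of one phase at time t."""
    # Pass 1: gather the output of operators routed to other operators.
    mod = [0.0] * len(ops)
    for op in ops:
        if op.target.startswith("mod:"):
            mod[int(op.target[4:])] += evaluate(op, t, 0.0)
    # Pass 2: evaluate the operators routed to the X and Y coords.
    x = y = 0.0
    for i, op in enumerate(ops):
        if op.target == "x":
            x += evaluate(op, t, mod[i])
        elif op.target == "y":
            y += evaluate(op, t, mod[i])
    return x, y

def phase_finished(elapsed, duration, reacted):
    """A phase ends when its time runs out or its reaction triggers."""
    return reacted or elapsed >= duration
```

A three operator phase could then be something like a sine on X, a cosine on Y, and a slow sine wobbling the X operator: `[Operator("sin", 1.0, 10.0, "x"), Operator("cos", 1.0, 5.0, "y"), Operator("sin", 0.5, 1.0, "mod:0")]`.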

I’ve been playing around with another idea: how to emulate the PCE video and audio in the simulator. This is easy enough, but it’s tied directly to the game engine and not the real PCE. So I’m not worried about accurate emulation – just enough simulation for the game engine. The audio part is going to be tricky for me, as I’ve not had a whole lot of experience with it, but I’m sure from the little that I have done on the PC – it’ll be fine. Again, I’m not really emulating the PCE’s audio so much as simulating it for the chiptune and SFX engine. In my opinion, that’s a pretty big difference (and in my favor – lol).

Ever wanted to make a game for the PCE?

Posted in Uncategorized on April 4, 2016 by pcedev

Sorry for the lack of updates. This semester is a bit tough (two big credit tough classes pushes me above normal fulltime enrollment).

On top of that, I found out that I’m forced to take a computer science course this summer because my school’s CS program sucks (not really). You can’t take two introductory courses at the same time, so what I could get done in one semester, turns out to be a full year.. and I can’t take any other CS related courses until these two are met, which opens up two more, which opens up three more, which opens up all the rest… wooh! I’m supposed to be transferring into this university as a junior, yet I’m behind because of this. Apparently most other fields/majors don’t have this problem.

The problem is not time, but actually money. I have enough aid left for two years and one semester (and not full time for this last semester either). So basically, taking this summer course helps meet that goal, but is going to financially kill me (summer classes are expensive; $1750 for this one class). Apply for scholarships you say? Yeah, I’m doing that but that’s never a guarantee. I actually did receive a small transfer scholarship, but that’s already factored in. And the requirement for that is I take 15 credits a semester (12 is full time).

So what options do I have? Well, one option would be to do a crowd funding project. My brother is making his own pencils and pens from titanium and other metals. His campaign already hit over $10k. I know a little bit about machining, and could learn to use his CNC setups, but probably wouldn’t be as nice as his stuffs (he has more experience with designs).

The other option I thought of, and have had kicking around in the back of my mind, is a PCE project. The drawback to doing a PCE project is that I’m neither an artist nor a musician. It’s not that I don’t have some talent and capabilities in either of these two areas – it’s that I’m not up to snuff compared to people that specialize in these areas. I can copy/emulate these two areas on my own – but they will never be on the level of dedicated people in these fields. There are a billion and one shmups for the PCE, and I want to add one more – lol. This has always been a goal of mine. I definitely have the capability to do a great technical shmup on the PCE. But that’s not enough to make a good game.

So why not just make the damn shmup already? Well, I kinda have. The problem though, is time. Making a game engine is easy. It’s one of the easiest parts. But the real work is polishing a game until it’s fun to play, etc. 10% of the work is making the game, 90% of the work is polishing and tweaking it. Ever play a technically impressive game, but you’d rather play something that’s not as fancy simply because it’s soo much more fun? Well, I sure have. The homebrew community tends to be more forgiving, but I hold myself to the same standards as the commercial softs of the system’s era. I.e. I will always find fault in the presentation of my own stuff.

Where all this rambling is going.. PC Engine Shmup Maker. I would create the game engine. I would create a suite of tools to create a shmup for this game engine. You wouldn’t need to know how to code in assembly, or small C, in order to create an awesome shmup for the PCE. You would get the benefits of speed and flexibility, like a game created from scratch. I’ve been researching the most popular features of shmups from that era (16bit), and have been working on design implementations of those features. For this first project, the shmup will be vertical scrolling. There will be a whole range of effects available on a stage by stage basis, as well as hsync effects and dynamic tiles, split screen scrolling, BG used as objects for large enemies or bosses, enemy and horde AI behavior, etc. The engine would be one giant script interpreter. This is how I’ve made my engines in the past (makes development soo much easier). The tools will be gui/mouse based, but create files that contain text with definitions and scripting language structure (which means people can create new/additional tools as well). The tool set will also have a WYSIWYG approach for designing the levels; you can simulate the play through of the level without having to constantly export to rom format. There will be a chip tune engine, with script format (and probably a visual editor as well), SFX editor, sample support, etc.

So this is the kickstarter or whatever crowd funding project I want to make. This, to me, is much easier to do than the time it takes to polish a game, let alone rely on other people on a team to get assets in time. I want to give back to the community and I’ve always wanted fans to be involved in making stuff for the PCE – what better way than this!? I’m sure many of you that can’t code for the PCE have always wanted to make a game for the PCE. Because this would be a crowd funding project, I’d break it down into a free version and a licensed version (sell your own game, etc). The free version will definitely be capable and you can freely distribute your work with the community. The licensed version will be for those who wish to sell their game. Have you guys seen the recent advances in repro hucards? Totally doable. I’ll have support for the SF2 mapper as well, for those that want it. What about CD? That will probably be a goal marker thing – hucard development first. The complexity of the engine AND the toolset capability will be broken down into goal sets. Basically because it takes longer to implement more features and/or more advanced features.

I’ll be working up the details and prototyping some stuff to show. I want to open the crowd funding thing around the end of May. If you’re excited about this kind of project/idea, please share this post. I’d like to get feedback from the community.


VDMA tests

Posted in Uncategorized on January 3, 2016 by pcedev

Ok, so I confirmed what others have tested: vdma is somewhere between 81-85 WORDs per scanline in 5.37mhz mode. So in 10.74mhz mode, that’s 330+ bytes per scanline. The more you clip the active display, the more bandwidth you get for vdma. I did a 209 line display and was able to transfer 17.6+ kbytes during vblank. This is perfect for the bitmap mode and all that I’ve talked about in that regard (free buffer clear, free transfer for double buffer system, etc).
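As a sanity check on those numbers, here is the arithmetic as a quick Python sketch. The 81-85 words per line and the 209 line display are from the test above; the 263 total scanlines per frame is an assumption on my part, and the function names are just for illustration.

```python
WORDS_PER_LINE_LOW = (81, 85)  # measured range at 5.37mhz, from the test above
TOTAL_LINES = 263              # assumption: scanlines in one full frame
ACTIVE_LINES = 209             # the clipped display used in the test

def vdma_bytes_per_line(words_low_res, clock_multiplier=2):
    """Bytes per scanline; 10.74mhz mode is taken as double the 5.37mhz rate."""
    return words_low_res * 2 * clock_multiplier

def vblank_budget(bytes_per_line, total_lines=TOTAL_LINES, active_lines=ACTIVE_LINES):
    """Bytes transferable while the display is not active."""
    return bytes_per_line * (total_lines - active_lines)

for words in WORDS_PER_LINE_LOW:
    per_line = vdma_bytes_per_line(words)
    print(words, "words ->", per_line, "bytes/line ->", vblank_budget(per_line), "bytes per vblank")
```

Under those assumptions the two ends of the measured range bracket the 17.6 kbytes figure, which is at least consistent with the test.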

Semester has ended.. and I did some PCE timing tests today

Posted in Uncategorized on December 17, 2015 by pcedev

Some bad news for the TSB/TRB instructions for VDC port $0003; it’s damn slow. It’s not the instruction, but apparently there’s a delay when the VDC switches between reading and writing vram back to back. I.e. doing a bunch of TRB $0003 to increment the vram pointer is going to block by 6.5 cycles more than the expected 9 cycles. This is even the case for back to back LDA $0003 STA $0003 instructions – the processor is stalled by the VDC and the pair ends up being ~15.6 cycles instead of 12. Of course, this was only tested at the lowest resolution. The overhead should be half that for high res; I’ll need to test for that. I still have more vdc read/write setups to test.

PCE bitmap mode

Posted in Uncategorized on December 10, 2015 by pcedev

I’ve talked about the PCE bitmap mode, albeit vaguely, in the past. It’s a 128x128x16 color mode. It’s a linear bitmap mode, which makes it fast for certain types of effects or operations. The SGX extends this to 128x128x241 colors. But I was thinking this morning: is it possible to push this limit? I approached it from a dithering angle; if 256 tiles is all that is needed to show 16 colors, then surely 1024 tiles could be used as extended dither patterns of those same colors. The limit is near 240 colors, but the PCE doesn’t have enough room in vram for that (technically it does, but it would need column scrolling which it doesn’t have – so it’s the long way round by wasting memory).

Then I decided to flip the problem on its head. In the SGX approach, each VDC pixel is one of the 16 colors of that tile along with one of 16 color subpalettes. In the PCE method, each tile (or BAT entry) is two pixels. Normally, each pixel has a max value of 16 colors. Together they don’t exceed the limit of 16 colors – in any combination. And in 16 colors, I can’t just apply different subpalettes to the whole tile or bat entry, because both pixels would be affected. Technically I could, but I would need a lot of subpalettes to achieve this goal. More than what the PCE has.

So, I thought about what if I reduced the color count per pixel in the tile/bat to 8 colors each. I still can only show two different pixels, but now I have an easier set of differences to work with. The upper 4 bits of the tile number, the pixel to the left of the pair, would be colors 8-15. And the lower 4 bits of the tile number, the pixel to the right of the pair, would be colors 0-7. I can now set up the subpalettes in groups of 4. Four for the right pixel, and four for the left pixel. This combination gives a total of 32 independent colors per pixel (technically 29 because of color #0).
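Here is how I picture one BAT entry for this 8 color scheme, as a small Python sketch. The BAT word layout (palette in the top 4 bits, tile number in the low 12) is the normal one; the way the two 2bit palette selects are combined into one of the 16 subpalettes, and the tile_base parameter, are assumptions for illustration only.

```python
def bat_entry(left_index, left_pal, right_index, right_pal, tile_base=0):
    """Build a 16bit BAT word for one pixel pair.

    left_index / right_index: 0-7, the 3bit color index of each pixel.
    left_pal / right_pal:     0-3, the 2bit palette select of each pixel.
    tile_base: hypothetical tile number of the first pair tile in vram.
    """
    if not (0 <= left_index < 8 and 0 <= right_index < 8):
        raise ValueError("color index must be 0-7")
    if not (0 <= left_pal < 4 and 0 <= right_pal < 4):
        raise ValueError("palette select must be 0-3")
    # Left pixel uses colors 8-15 (upper nibble), right uses 0-7 (lower nibble).
    tile_low = ((8 + left_index) << 4) | right_index
    # Assumed arrangement: 4 x 4 combinations fill the 16 subpalettes.
    subpalette = (left_pal << 2) | right_pal
    return (subpalette << 12) | ((tile_base + tile_low) & 0x0FFF)
```

With that arrangement, subpalette N would hold the right pixel’s 8 colors for select N & 3 in entries 0-7 and the left pixel’s 8 colors for select N >> 2 in entries 8-15.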

What it gains in colors, it loses in speed. The 16 color method is fast because each pixel is a nybble. In this higher color method, you now have 5bit pixels; 3bits for the index and 2bits for the palette. It’s still faster than using planar graphics AND having to deal with tile segment boundaries. And to top it off, you can still add dithering back into it. The indexes go back to 4bit, with the high bit selecting a dither pattern of C and C-1.. within that 8 color subpalette. It’s more convoluted in that colors need to be set up appropriately in the subpalette. Or, and this is more specific to certain applications, the dithering could be not from the subpixel group itself, but across the two paired pixels. If you visualize this, with the 128 pixels going across the screen, each pixel is actually 4 highres pixels from the 512px graphics mode. So blending two pixel colors across 8 highres pixels should provide a nice gradient effect – although strictly horizontal in application. The blending would happen by setting bit 8 of the BAT entry (the bit just above bits 0-7 of the tile index), giving both the C/C-1 option and a blend mode option, switchable by high bit selection. I guess you could call it blend mode, as one color blends into the next for that pair. And the second set of tiles would be this fixed pattern.

Wolfenstein 3D on the PC-Engine

Posted in Uncategorized on November 25, 2015 by pcedev

The SNES has it. The Genesis now has it. But can it be done on the PCE? Or rather, a game based on that style of engine? The glorious answer is: yes.

If you look at this logically, the PCE has enough cpu resource to pull something off along those lines. But the devil is alwaaayyyss in the details. As the SNES proves, it’s not so much about the cpu resource as it is the support format. In this case, for the SNES, mode 7 allows byte pixel format (256 colors per pixel). Packed pixel format, to be more precise. The Genesis does as well, but that format is 4bits (16colors).

So what’s the PCE got? Planar graphics. Planar graphics is not Wolf 3D engine friendly. Ahh, but there’s a trick you can do. There’s always a trick, right? You see, it’s possible to set up the PCE in such a way that the tilemap becomes a bitmap display. The quick and dirty details: set the PCE res to 512×224. That’s roughly 64 tilemap entries wide. That’s a bit low, so what if the low byte of a tilemap entry could be the index to two pixels? Two 4bit pixels to be exact. Now you have 128 “pixels” in that 64 wide tilemap screen/res. You need a total of 256 tiles in vram to show these pixels. Not bad, not bad at all. But there’s more to it.

A PCE tilemap has the largest size of 128×64. But it can be rearranged, assuming no scrolling, to a layout of 64×128. Now remember what I said above about the 64 tilemap entries equating to double the pixels? That means a 64×128 map gives a linear bitmap of 128×128. The 128 pixels wide are actually double wide, so they will fill the screen.
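To show what “linear bitmap” means here, a small Python model of the addressing. It treats the tilemap as a flat list of low bytes, 64 entries per row, with the left pixel of each pair in the upper nibble (matching the tile layout described above). The names are just for illustration and nothing here touches real hardware.

```python
MAP_W, MAP_H = 64, 128  # BAT entries; each entry holds two double wide pixels

def plot(bat_low, x, y, color):
    """Set pixel (x, y) of the 128x128 bitmap to a 4bit color."""
    idx = y * MAP_W + (x >> 1)      # one tilemap entry per pixel pair
    if x & 1:                       # odd x: right pixel, lower nibble
        bat_low[idx] = (bat_low[idx] & 0xF0) | (color & 0x0F)
    else:                           # even x: left pixel, upper nibble
        bat_low[idx] = (bat_low[idx] & 0x0F) | ((color & 0x0F) << 4)

def peek(bat_low, x, y):
    """Read back the 4bit color at pixel (x, y)."""
    value = bat_low[y * MAP_W + (x >> 1)]
    return value & 0x0F if x & 1 else value >> 4
```

Usage is just `buf = [0] * (MAP_W * MAP_H)` followed by `plot(buf, x, y, color)`; stepping the index by 1 moves two pixels across, and stepping by 64 moves one pixel down.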

So now we have a linear bitmap, but zero room for double buffering. That sucks, because no one likes screen tearing. There are a few ways around this, but one very convenient way around this is the vram-vram DMA on the VDC side. According to some tests by another coder, which I haven’t verified yet, if you set the VDC in high res mode during vblank – the VDMA will transfer ~330 bytes per scanline. The understanding is that the VDC is transferring two words per 8 pixel dot clock. I still need to verify this, but if this is true – that means not only would you not need to keep a buffer in local ram, but you also don’t need to waste cpu cycles clearing that buffer or parts of it (render empty pixels to clear sections). Vram also provides a self incrementing pointer. With this kind of bitmap, this means you could do both vertical and horizontal sequential writing. This speeds up writing sequential data. Just to note, SATB vram DMA is 84 words per scanline in low res mode (a little bit over 3 scanlines) – so it’s reasonable to think that it would be the same for vram-vram DMA as well (84 words is 168 bytes in low res per scanline, and 336 in high res mode per scanline).

So now we have a fast bitmap and a free clear routine, with no need to transfer a local buffer to vram. Now 2D raytracing is simple in design, but you still have an issue of pixel fill. You have to read a sliver of bitmap, at a specific scale factor, and copy it to a pixel column of the bitmap. This is going to dictate the amount of cpu resource to draw the 3D view. The fastest method to write data to vram is embedded opcodes. But that immediately doubles the data in size. Looking at Wolf 3D as an example, the textures are 64×64 pixels. If you embedded them as opcodes, you would still need to have 32 pre-scaled vertical images. 64x64x2x32 is 256k of memory. Doesn’t leave a whole lot of room for textures, plus if the window height is 128 – you need 64 versions not 32 versions. To make matters worse, on a fresh/clean bitmap the first pass of pixels simply writes – but since this is a double nibble format the second column write needs to be OR’d against the previous data. Doing that as embedded code against all the scaled frames.. is going to be a horrendous storage requirement.

The normal approach is just to have the bitmap stored as normal bitmap data (nibble stored in byte format). So take a slower approach but with prescaled data. This is still going to be a fair amount of data, and it’s going to be slower. So what can we do to speed this up, but be reasonable when it comes to storage space? When in doubt, flip the problem on its head. What if the data, the bitmap texture, remained fixed in size… but the code that read it was different depending on the output size needed? Let that sink in…

If the bitmap data is aligned to specific bank boundaries and offsets, then you can create a series of pre-calculated code paths that look to read that data from the logical address (bank mapped) and write it directly to vram. No indexing needed. No indirection needed. Simple LDA addr / sta $0002 / st2 #Fade. There’s no looping. There’s no check system for skipping a read (pixel). Everything is hard coded as specific paths. It’s brilliant. No, I’m not the first to think of this idea (pre-calculated code paths), but it did occur to me that it would really benefit from this rendering style engine. Yes, the code is going to bloat in size, but now I can store lots of different textures in rom.
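A rough idea of what generating those code paths could look like, written as a Python script that spits out assembly text. The LDA / STA $0002 / ST2 pattern is the one described above; the column address, the nearest-neighbour stepping, and the trailing RTS are assumptions for the sake of the example, not the actual generator.

```python
def code_path(out_height, tex_height=64, column_addr=0x4000, fade=0):
    """Emit one unrolled code path that draws a texture column at out_height.

    column_addr is a hypothetical logical address where the texture column
    has been mapped in; fade is the subpalette byte written with ST2.
    """
    lines = []
    for row in range(out_height):
        # Which texel this output row samples (nearest neighbour, assumed).
        src = (row * tex_height) // out_height
        lines.append(f"    lda ${column_addr + src:04X}")
        lines.append("    sta $0002")
        lines.append(f"    st2 #${fade:02X}")
    lines.append("    rts")
    return lines

if __name__ == "__main__":
    print("\n".join(code_path(8)))
```

The point to notice is that the scale factor lives entirely in which addresses get baked into the LDA operands, so one unscaled texture can be drawn at any height that has a path generated for it.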

The catch here is that you need two sets of code paths and two bitmaps. One bitmap has the nibble stored to the left side of the byte (bits 4-7), and the other bitmap has the nibble stored on the right side of the byte (bits 0-3). The reason for this is that on the first pass, the pixel data is just written as is to vram (bitmap buffer). All even columns of pixels are like this – nice and fast. But the odd columns need to OR together the second nibble with the even column. This is only +3 more cycles, thanks to the TSB opcode (TSB $0002). So the average is just +1.5 cycles of overhead per pixel. That’s still really good. Not only do I have sequential access, vertical in this case, but I also have a fast means of READ-MODIFY-WRITE operations.
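Here is the even/odd column idea modeled in Python, just to show the data flow. The two texture lists stand in for the two pre-shifted bitmaps, and the `|=` stands in for what TSB does on the real thing; the function name and the flat list used for vram are illustrative only.

```python
def draw_column(vram_low, x, tex_hi, tex_lo, map_w=64):
    """Draw one vertical pixel column into the pair-packed bitmap.

    tex_hi: texture bytes with the nibble in bits 4-7 (for even columns).
    tex_lo: the same texture with the nibble in bits 0-3 (for odd columns).
    """
    col = x >> 1
    for y in range(len(tex_hi)):
        idx = y * map_w + col
        if x & 1 == 0:
            vram_low[idx] = tex_hi[y]    # even column: plain write
        else:
            vram_low[idx] |= tex_lo[y]   # odd column: OR into the pair
```

This only works if the even column of each pair is drawn first (or the buffer was cleared), which is why the free clear from the vram-vram DMA matters.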

Did you notice that ST2 #fade opcode above? Since the pseudo bitmap is only 16 colors, I can use all 16 subpalettes for precalculated fades of those 16 colors; 16 fades. I already know the distance of the texture from the camera, so I can now use this to do 3D light shading. That’s pretty freaking cool. What about objects? I’m still in the planning stages for that, but I can treat them as simple texture overlays. And I can optimize for horizontal or vertical rendering – for whatever is faster for the object design. Also, the objects can be overlaid with a fade distance subpalette as well, on a per pixel basis. Oh, the weapon or hand can be a sprite.

I ran some numbers and a full texture read out (max height across the screen) to a 128×128 screen is ~2.3 frames. So 20fps with room to spare in that last frame. To get an idea here, 3 frames is 20fps, 4 frames is 15fps, 5 frames is 12fps, 6 frames is 10fps. I think the ideal place to be would be between 15fps and 12fps, with a decent amount of action and objects on screen. I should note here that the max height, player facing a wall up close, is pure pixel fill rate. An open, normal area would actually yield a higher frame rate.. up to 30fps (without objects). Another correction too; that 2.3 frames number assumes the wall texture is 128 real pixels tall. If the game was limited to 64 pixel tall textures (like the real Wolf 3D game), double pixel write mode kicks in and drops the overall cycles per pixel write to a much lower rate. It would be less than 2 frames (more like 1.33 frames), so 30fps. Double tall pixels get a boost (scaled up textures) in pixel fill rates. Of course that’s just pixel fill. That doesn’t include the 2D raytrace or the small overhead of the hsync routine to reposition the map as a bitmap display – but given both of those, it should just about make it in 30fps.
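The frames-to-fps conversion above is just rounding the render time up to whole display frames at 60hz. As a tiny Python sketch (the function name is mine):

```python
import math

def fps_for(render_frames, refresh=60):
    """Frame rate when a render takes render_frames display frames.

    The render has to be rounded up to whole display frames, since the
    buffer can only be swapped on a vblank.
    """
    return refresh / math.ceil(render_frames)

for frames in (1.33, 2.3, 4, 5, 6):
    print(frames, "frames ->", fps_for(frames), "fps")
```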

ADPCM this time..

Posted in Uncategorized on November 24, 2015 by pcedev

Black Tiger had made the comment that if hucards had enough storage, they could have used streaming voices for cinemas using ADPCM and 10bit paired channel output.

That got me thinking, is that really feasible? And the answer is; yes, yes it is. I put out a demo playing two songs. One was 20khz and the other was 33khz, but neither was interrupt driven. So it got me thinking, what kind of acceptable ADPCM playback can I get from timed interrupts? What kind of resource am I looking at? Storage-wise, ADPCM is 4bit per sample. 4bits for a 13bit output is pretty decent IMO (clipped to 10bit for the paired channels).

The mednafen author wrote the decompressor, and I’ve modified it slightly with a few case optimizations, but otherwise it’s pretty fast. So for 15.3khz (not 15.7khz) output, I’m looking at 50-55% cpu resource. And that’s the normal, non self-modifying, code version. Everything is contained within the VDC interrupt routine, so it’s self managing. That’s always nice because the other option is buffer fill and buffer read, and that gets tricky with timing.

So this soft playback ADPCM streaming sounds great at 20khz, but what does it sound like at 15khz? Hopefully pretty decent. From what I’ve heard in comparison to ADPCM on the CD unit itself, this soft playback routine seems to sound better. It might have to do with how the original ADPCM chip in the PCE CD unit is 10bit output too, but it can clip and overflow rather than saturate into positive or negative amplitudes (i.e. does it clip at 10bit, or 12bit but output 10bit?). Or maybe it’s something else, as in a filtering effect of the PCE audio circuit compared to the ADPCM output circuit of the CD unit.

Typically, CD games use 8khz ADPCM output for sound FX, and sometimes streaming.

So where is all this going? Well, I have a SF2 mapper and a flash card.. and if I reserve 2048k just for streaming audio, I can do a small demo (shmup) with streaming music. I only have 274seconds to work with, if I reserve the lower 512k for the game/demo itself. 274 seconds isn’t a lot, but I can loop tracks. At a minimum, I would need two level tracks and a boss track. Optimally though, I would want a fourth ending track. So something like three 70second tracks and one 64second track. Or whatever. How it’s divided up isn’t really an issue.
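For anyone wanting to check the 274 seconds figure, the math is just rom size, 4 bits per sample, and the 15.3khz output rate mentioned earlier. A quick Python sketch (the function name is made up):

```python
def adpcm_seconds(rom_kib, sample_rate_hz, bits_per_sample=4):
    """Playback time that fits in rom_kib kilobytes of ADPCM data."""
    samples = rom_kib * 1024 * 8 // bits_per_sample
    return samples / sample_rate_hz

print(int(adpcm_seconds(2048, 15300)), "seconds in 2048k at 15.3khz")
```

The same function works for other splits, e.g. swapping in a smaller reserve or a lower sample rate to see how much time that buys.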

I spent yesterday reworking the ADPCM routine into a VDC interrupt routine. I also picked out two levels from two other shooter games of other consoles. The demo is going to be a simple vertical shmup/shooter.  I was toying with the idea of the canyon level of Musha, and the 3D fire level of Axelay, with the Axelay level proceeding the Musha level (kinda makes sense). The graphics won’t be exact, but the effects will be similar. I plan to rip other enemy sprites from verty shmups too, and probably do a different boss for the Musha stage. I have 512k to work with for graphic assets. For both the Axelay 3D level and the Musha canyon stage, I spent quite a bit of time doing calculations for effects as well as redesigning the approach to those effects (with 60fps in mind). It’ll be kinda tight, but I’ve worked with worse.

As for the PCM engines, I did some work on those as well. The first XM player is done and I’ll probably release a very simple demo for it, and then one with a song demo afterwards.


But back to Black Tiger’s ponderings, if you did 7khz ADPCM for voice then that’s 3.5k per second. If you reserved 512k of rom for ADPCM, that gives 150 seconds of speech or audio. If you used PSG/chip for music and some sound FX, you could easily put together cinema audio tracks. The silence between speaking or other audio parts doesn’t need to be stored. Cinemas don’t take a whole lot of resource; I could even do realtime linear interpolation for that 7khz on a 15khz output.
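The linear interpolation part is about as simple as it gets if the output rate is treated as exactly double the source rate (an assumption here; 7khz to 15khz is only roughly 2x). A Python sketch of the idea, with a made up function name:

```python
def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between neighbours."""
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)               # original sample
        out.append((a + b) // 2)    # interpolated sample halfway to the next
    if samples:
        out.append(samples[-1])     # last sample has no neighbour to blend with
    return out
```

On the real hardware this would be one add and one shift per extra output sample, done after the ADPCM decode.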

But all this talk about compression makes me wonder how some other compression schemes out there would sound. Maybe something that takes less cpu resource than ADPCM. Something like range encoding delta PCM via block segments (kinda like the snes).