Monday, March 31, 2014

Progress towards CPU speed-ups

The last few days I have been fighting against some subtle bugs introduced recently while trying to speed the processor up a bit.  The machine is almost working, but not booting properly for reasons I have yet to figure out.

Today at lunch Redback dropped by the lab to say hello and see the hardware in action.  However, I didn't have a working FPGA bitstream from the last good point.

But after some poking and fiddling with the latest bitstream, it spontaneously booted to the C64 READY prompt after I wrote to a random piece of memory from the serial monitor interface.  Most bizarre.

Nonetheless, we seized the moment to run synthmark64, which was showing x18.69, I think because the read-modify-write (RMW) optimisation was running, even on IO addresses.  I need to fix that.  I then set the bit to enable the optimisation for hiding memory read wait states when possible.  That did work and, as the results below show, provides a roughly 30% speed-up.

This makes the current C65GS prototype 24 times faster than a stock C64, and faster than all other known accelerators overall and for each instruction group, except for reading zero page, where the Chameleon is slightly faster.

Anyway, here is the screen shot from before things went bad when I tried to toggle the RMW optimisation.

I still hope to push the acceleration closer to x50 in due course.


Sunday, March 30, 2014

Running synthmark64 before optimisation of CPU

I have a few CPU optimisations I plan to implement soon, primarily relating to fast instruction decode by making use of the 64-bit wide bus to fetch whole instructions in a single cycle*.

Thanks to the folks on #c-64, I found out about the synthmark64 benchmark program and summary.

Interesting to read through.  As can be seen, things like the SuperCPU and Chameleon are both around 20x faster than a stock C64, provided you don't touch IO.

I managed to get synthmark64 running on the C65GS in C64 mode, although not without some weird fiddling with the serial monitor to get it unstuck.  [Update: found the problem. I had left a CPU breakpoint enabled, which synthmark was triggering.  Cleaning that up lets it run properly without fiddling].

Once unstuck, all the tests run through repeatedly without further intervention.

This means I have a synthmark64 speed rating for the C65GS as it stands, before I optimise the CPU, which will allow me to properly evaluate the utility of the improvements I make.

[Update: Here is a screenshot after I fixed the CIA timer bug that was causing the display to show 37x instead of 18x, as discussed further down]

Display showing correct speed up after I fixed the CIA timer bug.
Before you look at the screen shot and go all googly-eyed over the prospect of the machine currently being 37x faster, and thus almost 2x the speed of the SuperCPU or Chameleon, be aware that there is a bug in the CIA: phi0 is being fed in at 500kHz instead of 1MHz.  The counters are therefore running at half speed, which means you need to halve the results.
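For anyone wanting to apply the correction to individual results, here is a minimal sketch of the arithmetic.  It assumes the displayed rating scales linearly with the clock feeding the CIA timers; the function and constant names are purely illustrative.

```python
# Figures from the post: the CIA is clocked at 500 kHz rather than 1 MHz,
# so the timers count only half of the elapsed time.
INTENDED_PHI0_HZ = 1_000_000
ACTUAL_PHI0_HZ = 500_000


def corrected_speedup(displayed: float) -> float:
    """Scale a displayed rating by the ratio of actual to intended timer clock."""
    return displayed * ACTUAL_PHI0_HZ / INTENDED_PHI0_HZ


print(corrected_speedup(37.0))  # 18.5
```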

Nonetheless, this means the machine is already about as fast as the Chameleon or SuperCPU for most operations, despite the dreadful 2-cycles-per-byte-of-instruction decoder it is currently using.  And for IO operations it is much, much faster.

Don't forget to halve these values for real comparison, because a CIA clock bug makes the machine look 2x faster than it is!

The obvious thing to see here is that the C65GS doesn't slow down when it hits I/O, because the I/O all runs at 48MHz instead of 1MHz like on a real C64.

It is also interesting to see that the C65GS has a relatively fast JSR compared with the others.  This is probably because JSR causes three pushes to the stack, which are writes, and so don't incur any wait-states.

More on this as I implement the various improvements, and also get around to fixing that CIA clock bug.


Tuesday, March 25, 2014

Hello World from the C65 ROM

After fixing a variety of things, the C65 ROM now boots to the READY prompt, and the screen editor works.  Pressing F1 even toggles between 40 and 80 column display modes.

This is a very nice milestone.  See the video below:



The BASIC interpreter isn't parsing commands for some reason, so every time you press enter you get a READY prompt, even if you have entered gibberish.  I will try to look at that soon.

Tuesday, March 18, 2014

Two steps forward, one step back, and perhaps one sideways as well

Trying to track down why the C65 ROM was continually redrawing the startup banner, I discovered that part of the ROM in memory was wrong.  Either it was being corrupted, was being loaded incorrectly from the SD card, or the copy of the ROM on the SD card was corrupted.

It turns out that two of these had happened.

During some earlier testing of the DMAgic controller, it seems that a DMA happened that wrote to the SD card registers, resulting in at least one sector of the C65 ROM being overwritten with gibberish.

That was fairly easy to fix.  However, once it was done, the startup banner was not being displayed at all.

I hadn't changed anything else, so I knew it had to be something related to the ROM.

Some poking around confirmed that at least part of the ROM was still not being loaded correctly, and that it seemed to change behaviour if I deleted and recreated the ROM file so that it had different cluster numbers on the FAT file system.

So I wrote a little utility in C that replicates the FAT file system code in the kickstart ROM to make sure that that logic was okay.

That was fine.  So I wrote another utility that can read the memory out of the machine while running to see exactly what parts of the ROM were corrupt.

It turned out that $06A00-$06FFF and $16A00-$16FFF were consistently being loaded with the wrong contents, suggesting a reproducible file system bug.

The addresses affected were bizarre.  I managed to find the location on the SD card that was being loaded, and augmented the kickstart ROM to show the location of each sector being loaded.

The sector addresses were nicely incrementing by 512 ($200) bytes, until it reached $004FFE00, after which it advanced to $014F0000 instead of $00500000.  I had found my bug.
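To illustrate the kind of mistake involved, here is a small sketch of adding $200 to a 32-bit address one byte at a time, as an 8-bit CPU has to.  The first function ripples the carry correctly.  The second is only a hypothetical faulty variant that happens to reproduce the addresses I saw; it is not the actual kickstart code.

```python
def to_bytes(value):
    """Split a 32-bit value into four little-endian bytes."""
    return [(value >> (8 * i)) & 0xFF for i in range(4)]


def to_int(b):
    """Join four little-endian bytes back into a 32-bit value."""
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)


def next_sector(addr):
    """Add $200: add 2 to byte 1, then ripple the carry through bytes 2 and 3."""
    b = to_bytes(addr)
    carry = 2
    for i in range(1, 4):
        total = b[i] + carry
        b[i] = total & 0xFF
        carry = total >> 8
    return to_int(b)


def next_sector_buggy(addr):
    """Hypothetical bug: the carry out of byte 1 is applied to byte 3, not byte 2."""
    b = to_bytes(addr)
    total = b[1] + 2
    b[1] = total & 0xFF
    if total > 0xFF:
        b[3] = (b[3] + 1) & 0xFF  # wrong byte; should have been b[2]
    return to_int(b)


print(f"${next_sector(0x004FFE00):08X}")        # $00500000
print(f"${next_sector_buggy(0x004FFE00):08X}")  # $014F0000
```

Both versions agree until the carry has to leave byte 1, which is why the early sectors loaded fine and only certain ranges of the ROM came out wrong.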

A quick patch to the "increment address of next sector" routine in the kickstart ROM did the trick (fortunately not requiring a 2-hour FPGA rebuild to test).  As a result, all the odd ROM misbehaviours I was seeing earlier disappeared.  But some strange side-effects have popped up, as can be seen in the screen shot.  (I had to restrict the screen shot to the upper left so that the camera would get the exposure reasonable):


Three things to note:

1. For the first time, I have the C65 ROM displaying a READY prompt! This is really good.  This is the two steps forward.

2. For some unknown reason, the colour RAM is being filled with $E6 instead of $06 for blue.  $E6 has the bold attribute set, causing it to use palette entry $16 instead of $06, and that palette entry is black.  This is weird, because it looks like the DMA to fill the screen is okay.

3. Starting with IRQs enabled, or enabling IRQs later, still causes major problems.  I need to keep debugging what is going on with the VIC-IV raster interrupt implementation, since that seems to be the source of the problem in one way or another.  It is a bit annoying to debug, because the raster interrupts are happening at 60Hz, which makes it tricky to see just what is going on.  Until IRQs work, keyboard entry won't work.
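The colour RAM effect in point 2 can be sketched as follows.  The only thing taken from the behaviour described above is that bold selects palette entry $16 instead of $06; the choice of bit 6 as the bold attribute is an assumption made for illustration.

```python
BOLD_BIT = 0x40  # assumption: bit 6 of a colour RAM byte is the bold attribute


def palette_entry(colour_byte):
    """Low nybble picks the colour; bold moves it into the upper 16 palette entries."""
    entry = colour_byte & 0x0F
    if colour_byte & BOLD_BIT:
        entry |= 0x10
    return entry


print(f"${palette_entry(0x06):02X}")  # $06
print(f"${palette_entry(0xE6):02X}")  # $16
```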

So a bit of a mixed bag of progress.

Sunday, March 16, 2014

C65 start up message without tweaking

In previous posts I had reached the point where the startup display was partially right, but with @'s everywhere because the DMAgic DMA controller had not been implemented.  Also, the colour bars were not showing, which I had presumed to be due to the same reason.

Implementing the FILL DMAgic command got rid of the @'s but the colour bars were still missing.

So I set about implementing the COPY functions of the DMAgic in case that was the reason, which as I suspected was fairly easy to do, and was verified by seeing that the screen scrolled, since the C65 uses the DMAgic chip to scroll the screen.

I wasn't confident that this would fix the problem, but it made sense to implement this DMA function at the time, anyway.  And indeed the colour bars remained missing.

Some digging through the C65 ROM tracked down the code that sets and clears reverse character mode.  This is done by setting bit 7 in $F4. On the C64 or C128 this would require an LDA / ORA / STA or LDA / AND / STA instruction sequence, requiring six bytes and a dozen or so cycles.

The C65's 4510 on the other hand has instructions for setting and clearing bits in bytes directly.  SMB0 through SMB7 set the corresponding bit in a zero-page memory location, and RMB0 through RMB7 clear the bit.  As a result the C65's reverse-on routine is simply SMB7 $F4 followed by an RTS.  Three bytes instead of six, and just four cycles for the memory modification, and no registers or flags modified in the process.  Those new instructions really do help to write faster and more compact bit-fiddling code.
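As a rough model of what those instructions do, here is a sketch in Python, with a 256-byte array standing in for zero page.  The function names simply mirror the mnemonics; this is not how the VHDL is written.

```python
def smb(zero_page, bit, addr):
    """SMBn addr: set bit n of a zero-page byte; no registers or flags involved."""
    zero_page[addr & 0xFF] |= 1 << bit


def rmb(zero_page, bit, addr):
    """RMBn addr: clear bit n of a zero-page byte."""
    zero_page[addr & 0xFF] &= ~(1 << bit) & 0xFF


zp = bytearray(256)
smb(zp, 7, 0xF4)          # reverse on:  SMB7 $F4
print(f"${zp[0xF4]:02X}")  # $80
rmb(zp, 7, 0xF4)          # reverse off: RMB7 $F4
print(f"${zp[0xF4]:02X}")  # $00
```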

As I discovered this I had the sudden realisation that I had never got around to implementing those instructions.  Oops.

After a quick bit of VHDL coding to add them in, followed by the usual slow synthesis process (about 90 minutes at present), the C65 startup display finally shows correctly, as can be seen below in the screen grab, which was taken without any hand tweaking of the display and only with the CPU paused after drawing the complete display.

All I did was pause the CPU at the right point, since it still loops continuously redrawing the display for reasons I have yet to discover.  I will probably need to disassemble the C65 ROM a bit more, and use the serial monitor interface I built to better understand what is going on here.

Nonetheless, it is a nice milestone that the CPU is compatible enough now to actually draw the display correctly.


Note that rather inexplicably the C65 ROM now thinks that it can see expansion RAM.  I will need to figure out what is going on there.  I'll look into that as part of the process of understanding why it doesn't work with IRQs enabled.

There is also an issue where the colour memory doesn't scroll with the text, which I think is because the DMA command used to scroll colour memory operates on the memory at $001F800 - $001FFFF, rather than the IO-mapped access at $D800-$DFFF.  I'll have to think about a solution for this, because this is a difference with the C65GS where to save display cycles, and also have enough colour RAM for large text displays (up to 240x150 characters is possible), I have colour RAM in a separate 64KB RAM.  I might have to implement a C65 compatibility mode where accesses to $001F800 - $001FFFF map to that RAM instead of to the chip RAM.
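One way such a compatibility mode might look, sketched in Python rather than VHDL.  It assumes the $001F800 - $001FFFF window would map onto the first 2KB of the separate colour RAM; the names and the return format are hypothetical.

```python
COMPAT_BASE = 0x1F800  # start of the window the C65 ROM's DMA job addresses
COMPAT_END = 0x1FFFF   # inclusive end of the window


def route_access(address):
    """Return (target, offset) for an access; target names are illustrative only."""
    if COMPAT_BASE <= address <= COMPAT_END:
        return ("colour_ram", address - COMPAT_BASE)
    return ("chip_ram", address)


print(route_access(0x1F800))  # ('colour_ram', 0)
print(route_access(0x1F7FF))  # ('chip_ram', 129023)
```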

Saturday, March 15, 2014

Fill command of DMAgic DMA controller implemented

In the previous post, the C65 startup display had lots of @ characters that I had to remove by hand, because the DMAgic DMA controller was not at all implemented.

DMAgic has four functions: FILL, COPY, MIX and SWAP.

FILL is the easiest, and COPY and SWAP aren't that hard to implement either.  MIX is not implemented in the real C65 prototypes, so that is a lower priority.

As FILL is easiest, and has direct impact on the startup display, since DMA is used to clear screen lines on the C65, I set about implementing that first.

DMAgic is a separate IC on the C65, but in the C65GS I have made it part of the CPU, since it simplifies a number of things, and avoids me having to duplicate the address decoding logic.

After pulling my hair out over some precedence of assignment issues in VHDL, I managed to get it working.  I had to guess whether 1 or 0 in the DIRection bit meant forwards, and initially guessed wrong, but that was easy to fix.
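For reference, here is a simplified model of FILL and COPY with a direction option.  It ignores the real DMAgic job list format entirely, and uses a plain boolean for direction rather than committing to which value of the DIRection bit means forwards.

```python
def dma_fill(memory, dest, count, value, backwards=False):
    """Write `value` to `count` bytes starting at `dest`, stepping up or down."""
    step = -1 if backwards else 1
    for i in range(count):
        memory[dest + i * step] = value


def dma_copy(memory, src, dest, count, backwards=False):
    """Copy `count` bytes one at a time, stepping both addresses up or down."""
    step = -1 if backwards else 1
    for i in range(count):
        memory[dest + i * step] = memory[src + i * step]


# Uninitialised screen memory is full of $00, which is the screen code for '@'.
screen = bytearray(160)           # two 80-column lines
dma_fill(screen, 80, 80, 0x20)    # clear the second line to spaces
print(screen[78:82])              # bytearray(b'\x00\x00  ')
```

The direction matters once COPY is used on overlapping regions, such as scrolling the screen, because stepping the wrong way overwrites source bytes before they have been read.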

With all that in place, the C65 ROM now displays almost correctly, as can be seen below.  This screen grab was captured as-is, with no hand modification.


All those pesky @'s have gone away, giving me confidence that DMAgic is working more or less correctly for the FILL command.

The colour bars are still not showing, which I had originally thought might have been DMAd over the top.  A bit of investigating reveals that they are drawn just using RVS-ON and spaces.  So they really should be showing up.

I'll look into that after I implement the DMAgic COPY command, just in case DMA copies are used to set other important structures up.  I also know that COPY is used heavily in the BASIC interpreter, and COPY should be so easy to implement now that FILL is working, anyway, that there really is no reason not to knock it off now.  SWAP and MIX will likely wait.

Other niggly problems are related to IRQ handling, where if IRQs are enabled on power up the machine does something odd instead of booting.  Similarly enabling IRQs after boot causes lots of "BREAK" messages to scroll past (well, they will scroll when DMAgic COPY works).  So I'll have to have a look into that once DMAgic is working satisfactorily.

Thursday, March 13, 2014

Progress towards booting C65 ROM

After working on getting the kickstart ROM (which should probably be more correctly called a bootstrap ROM) going, I have been able to load the C65 ROM into slow RAM on the C65GS for a couple of weeks now.

Efforts have been turning to getting the C65 ROM to actually start up properly.  This has been a bit slow, as I have chased CPU bugs of various sorts, and dealt with some subtle issues in the bank switching side of things.  For example, the $0000/$0001 CPU register only appears in bank 0, and the MAP and $D030 methods of banking take precedence over it, except for controlling the appearance of IO at $D000.  Or at least that's what seems to be the case.  I will need to get a C65 owner to verify some of this for me at some point.

Anyway, I had it working to the point where it would select 80-column output (and have the VIC-IV actually switch to the right mode to drive it) for a while, but there was no visible text, just the 2nd half of the screen filled with blinking @ characters because the colour RAM was presumably not being initialised.

After bashing my head on that for a while, I discovered that the start message was in the screen memory, but not visible.  I tracked the problem down to the palette being set incorrectly: the high and low nybbles of the palette needed to be reversed, as the C65 only uses the lower nybble in palette entries.
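The nybble reversal itself is a one-liner.  Here it is as a sketch in Python; where exactly the swap is applied in the VIC-IV is not shown, and the function name is made up for illustration.

```python
def swap_nybbles(value):
    """Exchange the high and low 4-bit halves of a byte."""
    return ((value << 4) | (value >> 4)) & 0xFF


print(f"${swap_nybbles(0x0F):02X}")  # $F0
print(f"${swap_nybbles(0xA5):02X}")  # $5A
```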

After fixing that, suddenly I had a visible and correctly coloured startup display.  Well sort of. As you can see there are some rather glaring problems.


Most obviously, the screen is largely full of @ characters.  This is because the C65 clears the screen lines (filling them with spaces) using the DMAgic chip to make this process quite a bit quicker than using the CPU alone.  I haven't implemented the DMAgic chip yet.

Also, the 2nd kilobyte of colour RAM doesn't seem to be getting initialised.  But that could also be related to DMA, as the kickstart ROM initialises the first KB of colour RAM.

One funny thing I need to debug is that the ROM only displays the startup message if IRQs are initially disabled.  It is pure luck that I had IRQs physically disabled (I have an override hardware switch on the FPGA board to disable IRQs as it makes single-stepping the CPU easier if IRQs aren't being triggered all the time).  Otherwise I would have just seen a blue screen full of @ characters.  However, the ROM still doesn't display a READY prompt. This might be due to a ROM mapping bug that I just discovered, that is likely messing up the 1581 DOS, and preventing it from returning after looking for the disk drive.

Finally, the colour bars on the left of the screen are missing, with just the sloped graphic character visible, but nicely in the right colour.  Looking at the colour RAM, the correct colour is set for the other characters, but the reverse video bit is not set, and only a space character is being written there.  I will need to dig to find out what the problem is here.  It might again be that DMA is used to fill the colour RAM bytes with the reverse-attributed colour.  So there is probably little point in trying to figure out the cause until I get the DMAgic chip implemented.  DMAgic is also needed for the BASIC ROM, so that really is the priority.  The DMAgic chip will be fairly easy to implement when I get the chance.

So while not quite there yet, it is progress.

EDIT: I couldn't resist fixing the @'s, colour memory values and reverse attributes on the running machine, just to see what it would look like.  Screen grab below:


Ah. That's better.  While I had to hand-doctor the display memory, this is being run in the real C65GS (FPGA) hardware, with a real 8-bit computer inside, and a real, if rather incomplete and imperfect VIC-IV video controller generating it in real time.

Note that 80-column mode has no side-borders, because 640 pixels * 3 = 1920, the native horizontal resolution of the C65GS.  There are some vertical borders.  They could be stretched out if desired, or left in to keep the more 8-bit feel, which is my preference.

Anyway, onward and upward to get this display happening without hand-tweaking of memory after boot ...