Thursday, August 7, 2014

Deja vu: C65 startup banner and dead BASIC, again.

The title says it all: The 48MHz CPU is almost working, but there is clearly something fruity going on.  Sadly it isn't the same issue as when I got to this point last time, but I have some extra clues, such as BASIC in C64 mode behaving similarly. There are also some other nuisances, such as IRQs not seeming to get masked properly.  But it is getting pleasingly close.

I am also interested to know whether people think the display is better with little side borders, or whether people would prefer no side borders in 80 column mode.  I personally think that having little ones adds to the authentic feel, but without sacrificing acres of screen.  Also, the real C65 has quite narrow side borders compared with the C64.  But I would like to hear what you think.


Oh, and if you got this far, you might be able to spot the display glitch that I have to fix.  If you can, and can accurately explain the cause (I already know it), I'll greet you on the next screen shot I post on the blog.

80 column now has side borders and other misc progress

I am conscious that I haven't had many fun screenshots to share lately as the CPU redesign drags on.

Fortunately, the CPU design is almost complete, and I am now shaking out the bugs that are still lurking so that I can get back to where I was up to before the redesign.

Today I made some changes so that I could simulate booting the C65 ROM.  This requires disabling Kickstart, since that would look for the SD card, and I don't have the means to simulate that (yet).  So instead I made a model for the slowram that holds the ROM after it has been loaded by Kickstart, and made a control flag to suppress Kickstart when in simulation.

This means I can easily simulate the system booting the C65 ROM, and look for anything odd with nice cycle-by-cycle memory access information at my fingertips.  This has already allowed me to fix some DMAgic bugs in the redesigned CPU (DMAgic lives in the CPU in the C65GS).

Simulation is nice because, with the tools I created and described in an earlier post, I can capture the VGA output digitally, which is handy for the screenshots here.

However, simulation is REALLY slow -- taking between 20 minutes and an hour to simulate one frame of activity.  This isn't quite as bad as it sounds, because the C65GS can do a LOT in one frame. With the CPU at 48MHz and a 60Hz display, this means 800,000 CPU cycles per frame.
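The arithmetic behind that figure can be checked with a few lines of Python. This is only a back-of-envelope sketch: the 48MHz clock, 60Hz frame rate and the 20 to 60 minute simulation times come from the text above, while the helper name is made up for illustration.

```python
# Back-of-envelope check of the per-frame cycle budget.
CPU_HZ = 48_000_000   # CPU clock, from the post
FRAME_HZ = 60         # display refresh rate, from the post

cycles_per_frame = CPU_HZ // FRAME_HZ


def wall_seconds_per_simulated_cycle(minutes_per_frame):
    """Hypothetical helper: wall-clock seconds spent per simulated CPU cycle."""
    return minutes_per_frame * 60 / cycles_per_frame


if __name__ == "__main__":
    print("cycles per frame:", cycles_per_frame)
    for minutes in (20, 60):
        cost = wall_seconds_per_simulated_cycle(minutes)
        print(f"{minutes} min/frame -> {cost * 1000:.2f} ms per simulated cycle")
```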

The slowness of simulation means that it is a good idea to avoid any unnecessary delays in the startup process of the ROM.  The main problem in this area is the PAL/NTSC detection routine, which has to wait until the end of the frame to work out if the machine is PAL or NTSC based on whether it has more than 263 raster lines.  

Fortunately, when you control the hardware, you can tweak things, and so on power up, the VIC-IV's VIC-II raster number register is purposely incorrectly set to raster 264 at the start of the frame. This means that the PAL/NTSC detection routine takes just a couple of dozen cycles to decide that it is on a PAL machine.  The register gets reset at the bottom of the frame, and then works as per normal in all successive frames, so the solution is a nice one.
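To see why the trick saves so much time, here is a toy model of the detection in Python. It is not the ROM's actual routine, just a sketch that assumes detection works by watching the raster counter until it either exceeds 263 (PAL) or wraps to the next frame first (NTSC). The frame heights of 312 and 263 rasters are assumptions for the example.

```python
# Toy model of raster-count based PAL/NTSC detection (not the real ROM code).
NTSC_MAX_RASTER = 263  # threshold quoted in the post


def detect_pal(start_raster, rasters_per_frame):
    """Return (is_pal, rasters_waited) for a given initial raster number."""
    raster = start_raster
    waited = 0
    while True:
        if raster > NTSC_MAX_RASTER:
            return True, waited       # seen a raster NTSC can never reach
        raster += 1
        waited += 1
        if raster >= rasters_per_frame:
            return False, waited      # frame ended without passing 263


if __name__ == "__main__":
    # Assumed frame heights: 312 rasters for PAL, 263 for NTSC.
    print(detect_pal(0, 312))    # normal power-up: waits most of a frame
    print(detect_pal(264, 312))  # counter preset to 264: decides at once
    print(detect_pal(0, 263))    # NTSC-sized frame: wraps before 264
```

With the counter preset to 264 the model answers "PAL" without waiting for a single raster, which matches the behaviour described above.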

The end result is that the C65 ROM does all its preliminary work in <100 physical rasters.  You get an idea for how fast this is in the following screen shot. The black part is the rest of the frame that would have been drawn, except that I stopped the simulation before it got that far.  

If you make it full size, you can see a couple of rasters of junk at the very top.  That is the time from when the CPU powers up until the ROM sets the VIC-II/VIC-III video registers for 80-column mode and sets the border colour. 

Then it is all border until the 80-column text starts, during which time various bits of memory are being set up by the Kernal and possibly the internal drive DOS, and then the DMAgic is used to clear the screen.  The first line of text shows the wrong contents either because the badline happened at the very top of the display, and no badline happens because of the changing value of $D011 in the middle, or because the CPU does actually take a few dozen rasters to set everything up.  I haven't looked into which is actually the case, and it doesn't really matter.  What is clear is that by the second row of text the CPU has already cleared the screen.  Of course, it should also be showing the C65 start-up banner, which it isn't, and so I have some more bugs to hunt.

The other interesting feature of the screen shot is that it shows the reworked 80-column mode using the new horizontal hardware scaler that allows non-integer numbers of physical pixels per logical pixel.  By judicious selection of the scale factor, we now have some little side borders so that it feels much more 8-bit than when it occupied the full width of the display.


Monday, August 4, 2014

Ethernet receiving now has hardware CRC check, and IRQs are close

I managed to get a little time this evening to work on the remaining bugs in ethernet reception.

The main bug was caused by using a bad example ethernet frame that had reversed bit-order in every byte.  I confirmed this by capturing a real ethernet frame, and using that as the test vector in simulation.  CRC check failed, so I tried reversing the bit order, and voilà, I had valid CRC detection.
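The effect of a bit-order mix-up is easy to reproduce in software. The sketch below uses Python's `zlib.crc32`, which implements the same CRC-32 that Ethernet uses for its frame check sequence, and assumes the FCS is stored as the last four bytes of the frame, least significant byte first. The payload is made-up test data, not a captured frame, and this is a software illustration only, not the hardware checker.

```python
# Software illustration of how per-byte bit reversal breaks the Ethernet CRC.
import struct
import zlib


def reverse_bits(byte):
    """Mirror the 8 bits of a byte, e.g. 0x01 -> 0x80."""
    return int(f"{byte:08b}"[::-1], 2)


def append_fcs(payload):
    """Append the CRC-32 frame check sequence, low byte first."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def fcs_ok(frame):
    """Check that the last four bytes match the CRC-32 of the rest."""
    payload, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(payload)) == fcs


# Made-up 60 byte payload standing in for a real captured frame.
frame = append_fcs(bytes(range(60)))
swapped = bytes(reverse_bits(b) for b in frame)
restored = bytes(reverse_bits(b) for b in swapped)

if __name__ == "__main__":
    print("good frame passes:        ", fcs_ok(frame))
    print("bit-reversed frame passes:", fcs_ok(swapped))
    print("reversed back passes:     ", fcs_ok(restored))
```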

Of course, I could have left CRC calculation to software, but it is a rather boring thing to implement, and modern network cards all do this, so it seemed like a good idea to do.  After all, if I can make it work once in the hardware, then all software can ignore it forever after.

This is one of the nice things about FPGA design, in that it is much more realistic to make the hardware nice to program, because the incremental cost of making an interface that bit nicer is quite low.

Another thing I have almost working for the ethernet is IRQ triggering on packet sending and receiving, so that you don't have to poll the ethernet adapter all the time to know if there is a packet to receive, or if you can send another frame yet.

To give an idea of how simplified the interface to this ethernet adapter is, here are the register addresses that matter:

$DE800 - $DEFFF - 2KB received frame buffer.  First two bytes are the length of the frame.  If bit 15 is high, then the frame failed the CRC check.  Frames with a bad CRC don't trigger IRQs.
$DE041 -  bit 7 - Enable IRQ on frame RX
$DE041 -  bit 6 - Enable IRQ on completion of frame TX
$DE041 -  bit 5 - A frame has been received since $DE041 was last written.
$DE041 -  bit 4 - A frame has been sent since $DE041 was last written.
$DE041 -  bit 2 - Which RX buffer was last written to by the ethernet adapter.
$DE041 -  bit 1 - Which RX buffer is mapped at $DE800 - $DEFFF for reading.
$DE041 -  bit 0 - Set to 1 to force ethernet PHY into reset state.  Reset to 0 to allow normal operation.
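The bit layout above can be summarised as a small decoder. This is a sketch in Python rather than real driver code: the bit positions are taken from the list, the field names are invented for the example, and the byte order of the two-byte length field is assumed to be low byte first, which the list does not actually state.

```python
# Sketch of decoding the $DE041 status byte and the RX buffer length header.
def decode_status(value):
    """Split a byte read from $DE041 into named fields (names are illustrative)."""
    return {
        "irq_on_rx": bool(value & 0x80),       # bit 7
        "irq_on_tx": bool(value & 0x40),       # bit 6
        "frame_received": bool(value & 0x20),  # bit 5
        "frame_sent": bool(value & 0x10),      # bit 4
        "last_written_rx_buffer": (value >> 2) & 1,  # bit 2
        "mapped_rx_buffer": (value >> 1) & 1,        # bit 1
        "phy_in_reset": bool(value & 0x01),    # bit 0
    }


def decode_rx_header(buffer):
    """Return (length, crc_failed) from the first two bytes of the RX buffer.

    Assumes the length is stored low byte first; bit 15 flags a failed CRC.
    """
    word = buffer[0] | (buffer[1] << 8)
    return word & 0x7FFF, bool(word & 0x8000)


if __name__ == "__main__":
    print(decode_status(0xA0))                   # RX IRQ enabled, frame received
    print(decode_rx_header(bytes([0x40, 0x80])))  # 64 byte frame, CRC failed
```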

As the register list above implies, there are two receive buffers, so that you don't have to worry too much about losing frames while reading them out, or similarly having part of a frame overwritten in the buffer while you are reading it out.

To clear any IRQs, write anything to $DE041, although this should normally be whatever you read from it, so an LDA / STA pair or similar is the best bet for now.  I might change the interface in time so that DEC can be used to do it a little quicker, but not just yet.

To send a frame, you write the bytes to $DE800 - $DEFFF, write the frame length to $DE043/$DE044, and then write $01 to $DE045.  Note that the TX buffer is mapped to the same address range as the RX buffer.   In other words, the TX buffer is write-only, while the RX buffers are read-only.  When you transmit a frame, the ethernet adapter automatically calculates and appends the ethernet CRC to the end of the frame.  This still has some bugs, so the CRC is incorrect, but you hopefully get an idea of how easy it is to send and receive ethernet frames with this ethernet adapter.  I might add automatic IP checksum calculation later on, too.
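The send sequence can be written out as a short sketch. Python stands in for 6502 code here, and `poke` is a hypothetical callback that writes one byte to a 20-bit address. The addresses and the order of the steps are from the paragraph above; putting the low byte of the length in $DE043 and the high byte in $DE044 is an assumption.

```python
# Sketch of the frame transmit sequence against a hypothetical poke(addr, value).
TX_BUFFER = 0xDE800    # start of the write-only TX buffer ($DE800 - $DEFFF)
TX_BUFFER_SIZE = 0x800
TX_LEN_LO = 0xDE043    # assumed: low byte of frame length
TX_LEN_HI = 0xDE044    # assumed: high byte of frame length
TX_TRIGGER = 0xDE045   # write $01 here to start transmission


def send_frame(poke, frame):
    """Copy a frame into the TX buffer, set its length, and trigger sending."""
    if len(frame) > TX_BUFFER_SIZE:
        raise ValueError("frame does not fit in the 2KB TX buffer")
    for offset, byte in enumerate(frame):
        poke(TX_BUFFER + offset, byte)
    poke(TX_LEN_LO, len(frame) & 0xFF)
    poke(TX_LEN_HI, len(frame) >> 8)
    poke(TX_TRIGGER, 0x01)


if __name__ == "__main__":
    writes = []  # record (address, value) pairs instead of touching hardware
    send_frame(lambda addr, value: writes.append((addr, value)), bytes(300))
    for addr, value in writes[-3:]:
        print(f"${addr:05X} <- ${value:02X}")
```

Recording the writes in a list like this also makes it easy to compare the sequence against what the simulator sees on the bus.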