Thursday, February 26, 2015

I haven't blown up my DDR RAM, so that's good

The funny problems I am having with the DDR RAM on the new board, combined with the fact that I accidentally ran the bitstream intended for the non-DDR board, had me a bit worried that I had blown up the DDR RAM or some of the pins on the FPGA.

To see if this was the case, I asked the guy who kindly supplied me with my Nexys4 DDR board, and who has one of his own, to test the latest bitstream on his board.  If it booted to the C65 ROM on his board, that would indicate that I had fritzed my DDR RAM.  If not, it would indicate that the problem was with the design, and that I just need to do more work on fixing it, without worrying about whether the cause was a faulty board.

Here is his board running the bitstream:


And, thankfully, his board behaves just like mine -- so there is nothing wrong with my board.


Now I just have to figure out what is causing the unreliability with the DDR controller interface. 

I have largely taken the example DDR controller as it is, but to keep things simple, I am running it on the same 193.75MHz clock that is driving the VIC-IV, instead of the 200MHz clock that was used in the example.  I had figured that the small difference would not upset things.  

Now given that it is having problems, and I can't think of any more probable cause, I will take the extra effort to derive a 200MHz clock for the DDR RAM controller.  

Unfortunately, the clock generator I am using can't generate ~192MHz and 200MHz at the same time.  I did think about just pushing the VIC-IV to 200MHz, but that would result in 62.5Hz video output, which is not that appealing. It would also put more pressure on the timing closure.
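The 62.5Hz figure can be checked with a quick sketch.  It assumes (this is my reading, not something stated above) that the video timing stays the same and that the ~192MHz clock corresponds to 60Hz output; the function name is purely illustrative.

```python
def scaled_refresh_hz(base_refresh_hz, base_clock_mhz, new_clock_mhz):
    """Refresh rate after changing the video clock while keeping the
    same number of clock ticks per frame (assumed, not from the design)."""
    return base_refresh_hz * new_clock_mhz / base_clock_mhz


# Assumed baseline: 60Hz output at 192MHz; candidate clock: 200MHz.
print(scaled_refresh_hz(60.0, 192.0, 200.0))  # 62.5
```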

This means I need to figure out how to chain two clock generators, which I have previously not managed to do.  I have so far worked out that I need to remove the global buffers from the front of the clock generators.  What I have yet to work out is whether a global buffer will be automatically instantiated and attached to the front of each clock generator. I have it trying to synthesise right now, so we will find out soon enough.

Tuesday, February 24, 2015

Still almost working on the Nexys4DDR board

I have managed to get the DDR RAM on the board almost working.

Sometimes it works fine, and I can write and read back data, and it appears in the correct place in memory.

Other times, all manner of fascinating odd things happen.

Having dug through some information, it looks like I will have to implement temperature compensation between the FPGA and DDR RAM.  Basically, as the FPGA, RAM and traces between them change temperature, their electrical properties vary enough to cause problems.

Did I mention that I really don't like DDR memory, because it is too complex?  Oh how much nicer it would be if there were cheap 128MB SRAM chips available.  If you happen to know of any, I'd love to hear.

Fortunately the FPGA has an internal temperature sensor, and the DDR RAM controller that Digilent and Xilinx provide has an input for the temperature, so that it can recalibrate the communications between the FPGA and the RAM whenever necessary.

I'll take a look at implementing this when I get a chance, and then, hopefully, the C65 ROM will work on the new Nexys4DDR boards.

Almost all working on the Nexys4DDR board

First, confession time.

It turns out that the funny pin assignment problems I reported in the previous post were because I hadn't correctly compiled the FPGA bitstream, and so I was using a bitstream that was meant for the original model of the Nexys4 board. So one dunce cap for me.

Second, good progress.

Having identified the cause of the compilation failure, I was able to build a bitstream, and it did indeed have the correct pin outs all of a sudden.  Funny that.

Reading from the slowram was now giving some results, but they were all a bit weird, like the following:

.M8000000                                                       
 :8000000 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000010 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000020 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000030 70 71 70 71 70 71 70 71 70 71 70 71 70 71 70 71
 :8000040 70 71 70 71 70 71 70 71 70 71 41 08 41 08 41 08
 :8000050 41 08 41 08 41 08 41 08 41 08 41 08 41 08 41 08
 :8000060 41 08 41 08 41 08 41 08 41 08 41 08 41 08 41 08
 :8000070 41 08 41 08 41 08 41 08 41 08 41 08 41 08 41 08
...

The same values were being repeated.  Using the serial monitor like this can be very diagnostic, because it makes the CPU perform the 16 memory reads of each row in consecutive CPU cycles, so any lag in the memory reading shows up as repeated values.  

This had me suspecting that I needed to increase the number of waitstates on the slowram.  This was a bit annoying, because it already had six wait states, so each access takes 7 cycles, giving an effective speed of just 6.8MHz when working in slowram.  
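The wait-state arithmetic can be sketched as follows.  It assumes a 48MHz CPU clock and that each access costs one cycle plus the configured wait states; the function and constant names are made up for illustration.

```python
CPU_CLOCK_MHZ = 48.0  # assumed CPU clock


def effective_mhz(wait_states, clock_mhz=CPU_CLOCK_MHZ):
    """Effective access rate when every access takes wait_states + 1 cycles."""
    return clock_mhz / (wait_states + 1)


# Six wait states means 7 cycles per access.
print(round(effective_mhz(6), 2))  # 6.86
```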

Some experimentation discovered that the minimum stable setting was $22 = 34 waitstates. In other words, less than 2MHz. There was also a funny thing happening, as can be seen below where I set 16 memory locations to sequential values:


.sffc00a0 22

.m8000000 
 :8000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.s8000000 1 2 3 4 5 6 7 8 9 a b c d e f 0

.m8000000
 :8000000 03 04 03 04 07 08 07 08 0B 0C 0B 0C 0F 00 0F 00

.

Basically the memory lookup was ignoring address line 1, resulting in reading the same 16 bits twice.  I think I have found and fixed the cause of that problem, but I will only know for sure after I resynthesise the design.  I have also fixed the address mapping of the slowram, so that it is available as one 127MB contiguous block.
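As a sanity check on that diagnosis, here is a small model of the readback.  It assumes the fault behaves as if address bit 1 were stuck high on reads, which is one interpretation that happens to reproduce the dump above; the real cause in the design may differ.

```python
def read_with_stuck_bit1(memory, address):
    """Model a read where address line 1 is ignored and behaves as if
    stuck at 1 (an assumption that matches the observed dump)."""
    return memory[address | 0x02]


# The 16 values written with ".s8000000 1 2 3 ... f 0".
written = [int(digit, 16) for digit in "123456789abcdef0"]

# What the faulty read path would return for the 16 addresses of the row.
readback = " ".join(f"{read_with_stuck_bit1(written, a):02X}" for a in range(16))

observed = "03 04 03 04 07 08 07 08 0B 0C 0B 0C 0F 00 0F 00"
print(readback == observed)  # True
```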

The amazingly abysmal latency of the DDR2 slowram has me thinking about simple caching strategies.  About 30 of the 34 wait states are due to the RAS and CAS latency of the RAM, and so are unavoidable in that sense, although I could in theory work out when only a CAS select needs to occur, and trim some cycles off when that happens.  

But a much better idea is to take advantage of the fact that those 34 cycles get you 16 bytes of data, and make a nice little cache.  That cache could be accessed in perhaps 2 waitstates (so 48/(2+1) = 16MHz effective speed), could hold a few KB of data, and could do some pre-emptive reading to hide the 34 cycle delay when it is incurred.  The end result should be faster, on average, than the old slowram was.  The motivating factor for implementing this is that the C65 DOS will be about four times slower until I implement the cache, because it runs out of "ROM", which is really held in slowram.  As a result this might happen sooner rather than later.
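A back-of-envelope estimate of what such a cache might buy for purely sequential access is sketched below.  The assumptions are mine: a miss costs 34 wait states plus one cycle and delivers the requested byte along with the rest of its 16-byte line, every other byte in the line is a hit at 2 wait states plus one cycle, and there is no pre-fetching at all.  Real access patterns would do better or worse than this.

```python
CPU_CLOCK_MHZ = 48.0    # clock used in the 48/(2+1) calculation
LINE_BYTES = 16         # bytes delivered per DDR access
MISS_CYCLES = 34 + 1    # 34 wait states plus the access cycle
HIT_CYCLES = 2 + 1      # hoped-for cache access cost
OLD_SLOWRAM_CYCLES = 7  # six wait states plus the access cycle


def average_cycles_sequential(line_bytes=LINE_BYTES,
                              miss_cycles=MISS_CYCLES,
                              hit_cycles=HIT_CYCLES):
    """Average cycles per byte when reading memory sequentially:
    one miss fills a line, the remaining bytes of that line are hits."""
    return (miss_cycles + (line_bytes - 1) * hit_cycles) / line_bytes


avg = average_cycles_sequential()
print(avg)                       # 5.0 cycles per byte
print(CPU_CLOCK_MHZ / avg)       # 9.6 (MHz effective)
print(avg < OLD_SLOWRAM_CYCLES)  # True
```

Under these assumptions even a cache with no pre-fetching averages fewer cycles per sequential byte than the 7 cycles of the old slowram, which is consistent with the expectation above.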

If I get really excited, I might also add some IO registers that allow you to ask the cache to pre-fetch a line of memory, so that if you know you will need some memory soon, you can ask for it to be fetched ahead of time. That will take a bit of effort, though, and since it won't provide any immediate benefit, it won't be too high up the priority list. 

Monday, February 23, 2015

First light on the Nexys4DDR board -- but some significant problems remain

I have managed to get the C65GS to synthesise targeting the new Nexys4DDR board; however, there are still some big problems to solve.

First, here is the hypervisor trying to load the kernal ROM.  Now that we have a new lab at work, I have a new way to get decent analog screen-shots, by simply taking a picture of the wall-mounted 60" flat-screen.  It also makes for a very big cursor :)


Back to the technical bits.

On the plus side, it can see the SD card, and can read from it, so that side of things is all working nicely.  However, it is going through the cluster numbers one at a time, because it thinks that I have a physical switch on the FPGA board set to the debug position, which I don't.

Taking a look at the board, I can see that the pinout on the FPGA clearly isn't what I think it is, as the segmented LEDs are doing odd things, and those two super-bright LEDs shouldn't be trying to beam light to the moon.  The green indicator LEDs look like they are wired up properly, though, so that's a plus.


I let it go through the slow debug-mode loading of the ROM, to see if I could get it to think that the debug switch had been released.  While the debug switch is set, it won't actually boot.


Unfortunately, I couldn't.  

A bit of digging in the serial monitor confirmed that the pinout, or something else, is wrong, as watching $D6F0/$D6F1 for the switch positions showed no change when moving the switches. 

Supporting the idea that the pinout is all mixed up, I did see that one of the switches gets interpreted as one of the directional buttons on the FPGA board.

It was thus no surprise to find that the DDR memory was also not being responsive, since it is also likely to have messed up pin assignments.

On the positive side, the DDR controller synthesised, and the system seems to be mostly working, apart from these issues.

Now my challenge is to find out why the pinouts are all messed up, given that I copied the new pin assignments from Digilent's demo project for the Nexys4DDR2 board.  I might have to confirm the schematic with them.

I had a few minutes to work on moving to the Nexys4 DDR board

I had a few minutes this evening to try to move the code over to the new DDR memory based version of the Nexys4 board.  

After fighting with all the signals and things that I had to rename, it looks like it is trying to synthesise, and in a couple of hours I will know whether there are more errors to deal with.

I had thought about implementing my own DDR memory controller, but it all looks a bit too much like magic from my initial look.  I may yet change my mind if I have too much trouble building with it.

The big problem for now is that there are silly "all rights reserved" messages in some of the files, which seems rather at odds with it being "sample code" designed to make one's life easier. I will try to poke Digilent to see if they will update the copyright notices to better reflect the intended purpose of the code.  In the interim, I guess I will just update the makefile so that, if given the zipped-up demo project, it can extract the files it needs.

Hopefully it won't take too much longer to get the DDR memory working, and we can get back to more fun posts with screen-shots.

Tuesday, February 17, 2015

A screenshot for those who are getting bored looking at FPGA boards and want to see something fun

Well, following the sale of a C65 recently for what amounts to about AUD$29,800, a few more people are coming here.

Seeing as the last several posts are all rather esoteric, and don't have anything much fun for those just wanting to see a crazy 8-bit computer at work, here is a rerun of the C65GS running BoulderMark, the Boulder Dash inspired benchmark for C64-compatible computers.  For the record, a regular C64 scores 313 points on BoulderMark.  As you can see below, the C65GS is ever so slightly faster.  With any luck, the C65GS prototype board will be making a visit to Revision this Easter for those wanting to see it first hand, and perhaps have a play.


Oh, and just for more fun, here is what happens when you try to play Stunt Car Racer on a computer that can do as much work in a second as a real C64 can in a whole minute:


You would need some pretty good reflexes to play that.

Finally, here is another C64 benchmark program (yes, there are two major C64 benchmark programs!)



With the above, the C64 benchmark table really should be updated.  Maybe someone can poke them to add the C65GS.

Monday, February 9, 2015

Nexys4 DDR board arrives

Today a parcel arrived in the post containing the updated Nexys4 board, the Nexys 4 DDR. The main difference is that the 16MB Cellular RAM is replaced by a 128MB DDR2 RAM, apparently because Digilent can't get the Cellular RAM part any more.  The pinouts have changed a little as well.  Anyway, because of these changes, I need to slightly tweak the design to work on this new board, which I hope to do in a couple of weeks' time when I finish writing some grant applications for work (which is also why not much has been happening on the project the last few months).

I must also say a big thank you to the supporter who bought this board for me!  The generosity is very appreciated.  But more than that, it excites me when people support this project, including when someone else provided me with joysticks for testing, as it tells me that a community is growing -- which is what I had hoped.


My German is slowly improving, and so I was able to respond to "Hier offen" ("open here"), upside down, without having to resort to the English cheat text  (just don't expect me to discuss philosophy with you yet ;).


As you can see, the board and case are very similar to the old one.  The DDR RAM is located below the FPGA.  The increased bandwidth of the RAM does open up some new possibilities, but it will be some time before I can explore those.  For now, it will just be mapped into the conveniently already almost 128MB (well, 112MB) piece of address space from $8000000-$EFFFFFF.  When I finish implementing the 32-bit addressing modes, I will also map it somewhere above 256MB in its entirety.
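The sizes quoted above can be double-checked with a few lines.  The only assumption is that "MB" here means 2^20 bytes and that "above 256MB" refers to the top of a 28-bit address space.

```python
WINDOW_START = 0x8000000
WINDOW_END = 0xEFFFFFF  # inclusive
MB = 1024 * 1024

# Size of the address window the DDR RAM will initially be mapped into.
window_mb = (WINDOW_END - WINDOW_START + 1) // MB
print(window_mb)  # 112

# A 28-bit address space tops out at 256MB.
print((1 << 28) // MB)  # 256
```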