Thursday, April 30, 2015

Nexys4 DDR Boards are out of stock everywhere it seems

So, it seems that there are no Nexys4DDR boards in stock in the USA, UK or continental Europe.

We probably aren't the major cause of this, but it would be funny to find if we were.

For those desperate in the meantime, it might be possible to get them from the Australian distributor, although where I am at the moment it isn't trivial for me to find out their stock levels.  Perhaps someone can leave a comment with what they find out.

Otherwise, Digilent seem to have another production run scheduled for later this month, with stock shipping again from the 29th.

Monday, April 27, 2015

Wir sind in den Nachrichten

That is, we are in the news, in German, in Germany.

Two fairly well known IT/computing magazines there have run stories on the MEGA65 / C65GS in the last couple of days:


I have been trying to respond to comments on the Heise.DE article, which has been stretching my limited German, but this is a good thing, because I am trying to improve my German :)

I'll try to summarise some of the recurring questions there and elsewhere here, for your convenience:

What will the case look like?

We are trying to make the case look as much like the original C65 prototypes as we can.

Will it have a real 3.5" floppy drive?

We hope so! We are certainly intending that this will be the case.

Do we have to choose between SD Card support and having a real floppy drive?

No, if we add floppy drive support, it will do both at the same time.  Probably how this will work in practice is that the hypervisor will read sectors from the floppy drive, and put them on the SD card, from where they will be read using our C65 F011 floppy drive controller compatible SD card controller.  Writing sectors to disk will do something similar in reverse, again using the hypervisor to trap the disk accesses.

Isn't all this going to cost a lot of money (just think of the injection moulds alone)?

Yes, it will.  This is why we are asking for donations on  Also, if you happen to have the skills and facilities to make injection moulding tools, and would like to volunteer to help us, we would love to hear from you, and probably even put your name on the inside of every case.

Isn't this whole project rather pointless?

This is a rather common question on forums and in comments to these articles.

The answer, of course, is that it is rather insane. That's the point.  So now we can all move on happily: they can be right, and we will just keep making it anyway ;)

However, while it is rather pointless, it isn't completely pointless. It's like the difference between mostly dead and completely dead in The Princess Bride.  The MEGA65 will be fun for those for whom it is fun, which is one purpose.  Also, I intend to build and use a set of MEGA65 computers in teaching University and high-school students about "hardware intimate programming", to hopefully improve their understanding of computers and improve their future career prospects and enjoyment.

Wednesday, April 22, 2015

Introducing the MEGA65 Retro Computer

Hello all,

For a few months now I have been working behind the scenes with the good folks at, exploring our mutual desire to create a physical 8-bit computer in the spirit of the C65, but that is open-source and open-hardware so far as is possible, so that the community can sustain, improve and explore it.

Basically, we agreed that we wanted to do this, and that the C65GS was the logical basis for this, and thus the MEGA65 project was born, to take the C65GS core, to work together to improve it, and plan towards creating a physical form that is strongly reminiscent of the C65 prototypes.

Our initial announcement is online at

This is also the point at which we are ready to offer pre-prepared FPGA bitstreams.

These bitstreams are still experimental, and updated bitstreams may break things that were working in previous bitstreams as we progressively spiral in on the final product.  This process might take another year or two depending on what support we can raise, being done voluntarily in our discretionary time as it is.

What we have decided to do to help support the project is to ask for a donation of your choosing to be given access to our FPGA bitstream build server that will contain the latest bitstreams.  Our desire is not to exclude anyone -- which is why we are not dictating a donation that is in any way representative of the costs of undertaking this project.  Also, everyone will still be free to compile the bitstreams themselves, but we hope that you will appreciate both the convenience and opportunity to support the project.

Finally, we now have a Twitter account (@Mega65Retro), and Facebook page ( -- so feel free to spread the word.

Sunday, April 19, 2015

Generalised disk access routines now work

After a fair bit of refactoring and debugging, I have the new generalised disk access code working. It still has a few wrinkles (like file names longer than 14 letters get chopped off at 14 letters), but the C65GS can now boot up completely using the new code.

To give an idea of the interface now, here is the code that loads the C65 ROM into place:

ldx #<txt_c65gsrom
ldy #>txt_c65gsrom
jsr dos_setname

; Prepare pointer for load address ($0020000)
lda #$00
sta <dos_file_loadaddress+0
sta <dos_file_loadaddress+1
sta <dos_file_loadaddress+3
lda #$02
sta <dos_file_loadaddress+2

jsr dos_readfileintomemory
bcs loadedok

As you can see, for the simplest use-cases, it is pretty simple: just set the name of the file you want to load, provide the load address, and then use the dos_readfileintomemory utility function that looks after everything else.  We are using the convention of setting the carry flag if a function completes successfully, or clearing it otherwise.  In the case of an error, dos_error_code contains the reason for failure. I have tried to provide meaningful error codes for all things that can go wrong at the moment.

Drilling down a little, we can see how the dos_readfileintomemory function works.  There are a few little complexities in that function that I will explain:

; file name must be already loaded into
        ; dos_requested_filename,
; with length in dos_requested_filename_length

We keep track of the number of sectors read, so that we don't get stuck forever if we hit a file with a tangled cluster chain.
; Clear number of sectors read
ldx #$00
stx dos_sectorsread
stx dos_sectorsread+1

Next, we need to find the file and get its details in the single directory entry structure (dos_dirent). A call to dos_findfirst leaves the file descriptor for the directory search open, in case you want to call dos_findnext to find more matching files. So we need to take care to close the file descriptor so that we don't run out.  This is really important, because the hypervisor supplies a grand total of only four file descriptors.  We can't trust dos_closefile to not mess with the carry flag, so we save the processor flags on the stack before closing the file descriptor, and then propagate any error up by jumping to dos_return_error_already_set, which preserves the value in dos_error_code.

jsr dos_findfirst
; save the flags (carry = success) across the close
php
; close directory now that we have what we were
; looking for so that we don't leak file descriptors ...
jsr dos_closefile
plp
; ... but report if we hit an error
bcc dos_return_error_already_set

jsr dos_openfile
bcc dos_return_error_already_set

Next we need to make the SD card sector buffer visible at $DE00-$DFFF.  In future, this and the following code will need to be generalised a bit to allow disks on other media, in which case we will need to instead have a pointer to the sector buffer.  But for now we have a structure that works, and has the flexibility to add such functionality later without breaking compatibility.
        ; Make sector buffer visible at $DE00-$DFFF
jsr sd_map_sectorbuffer

Now we enter the main loop where we read the data into memory.  We read the current sector, then copy the 512 bytes into place, update the pointer, and ask for the next sector, if there is one.  For now, we just use a simple copy loop using the 32-bit pointer access.  It would be quite a bit faster if we used DMA to transfer the sector buffer. It might even take a few less bytes, especially if we embed the load address pointer directly into the DMA list.

drfim_sector_loop:
jsr dos_file_read_current_sector
bcc drfim_eof

; copy sector to memory
ldx #$00
ldz #$00
drfim_rr1:
lda $de00,x
nop ; 32-bit pointer access follows
sta (<dos_file_loadaddress),z
inz
inx
bne drfim_rr1
; first 256 bytes copied: move the pointer on by one page
inw <dos_file_loadaddress+1
drfim_rr1b:
lda $df00,x
nop ; 32-bit pointer access follows
sta (<dos_file_loadaddress),z
inz
inx
bne drfim_rr1b

jsr dos_file_advance_to_next_sector
bcc drfim_eof

Here we have a little security feature: You cannot load across a 16MB memory boundary. This is designed to prevent calls from user-space inadvertently or maliciously trying to load code that will end up loading over the top of the hypervisor.  The 16MB boundary is enforced simply by refusing to increment the upper byte of the load address when we update the memory pointer used to write the data into memory. It isn't actually sufficient yet to be completely effective in this function, because any load to the IO memory space, which includes the hypervisor, could potentially overwrite the hypervisor, but I am already thinking about how to make these calls secure, so that, (a) the machine doesn't crash easily, and (b) so that it actually has good security.  
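The boundary behaviour is easy to model outside the machine. The following Python is only a sketch of the arithmetic, not the hypervisor code: advance_load_address is a made-up name, and the pointer is treated as four bytes of which only bytes 1 and 2 are incremented, which is what a 16-bit increment at dos_file_loadaddress+1 amounts to.

```python
def advance_load_address(addr: int, pages: int = 1) -> int:
    """Advance a 32-bit load address by whole 256-byte pages.

    Only bytes 1 and 2 are incremented (as one 16-bit word), so a carry
    out of byte 2 is dropped and byte 3 never changes.
    Hypothetical helper, for illustration only.
    """
    top = addr & 0xFF000000           # byte 3: never modified
    middle = (addr >> 8) & 0xFFFF     # bytes 1-2: the incremented word
    low = addr & 0xFF                 # byte 0: untouched
    middle = (middle + pages) & 0xFFFF
    return top | (middle << 8) | low


# A 512-byte sector is two page-sized steps.
print(hex(advance_load_address(0x00020000, 2)))
# At the top of a 16MB region the pointer wraps inside that same region.
print(hex(advance_load_address(0x00FFFF00)))
```

The second call wraps back to the start of the same 16MB region instead of carrying into the upper byte, which is why the start address still has to be checked separately.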

; We only allow loading into a 16MB space
; Provided that we check the load address before starting,
; this ensures that a user-land request cannot load a huge file
; that eventually overwrites the hypervisor and results in privilege
; escalation.
inw <dos_file_loadaddress+1

; Increment number of sectors read (16 bit value)
inc dos_sectorsread
bne drfim_sector_loop
inc dos_sectorsread+1
; see if there is another sector
bne drfim_sector_loop

This point in the code is reached if we have read 64K sectors = 64K x 512 bytes = 32MB.  Note that we supply a meaningful error code so that the caller knows what has happened.
jsr dos_closefile

; File is >65535 sectors (32MB), report error
lda #dos_errorcode_file_too_long
jmp dos_return_error

Finally, when we reach the end of file, we need to close the file to protect our precious few file descriptors, and indicate success to the caller.
drfim_eof:
jsr dos_closefile
jmp dos_return_success
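For readers who prefer to see the control flow in one piece, here is a rough Python model of the loop described above. It is a sketch under assumptions, not a translation of the real routine: read_file_into_memory, the write callback and FileTooLong are names invented for illustration, and the file is simply an iterable of 512-byte sectors.

```python
import itertools

SECTOR_SIZE = 512


class FileTooLong(Exception):
    """Stands in for dos_errorcode_file_too_long (illustrative only)."""


def read_file_into_memory(sectors, write, load_address):
    """Copy each sector via write(addr, data); return the number of sectors read.

    `sectors` is any iterable of 512-byte blocks. A tangled cluster chain
    can be modelled as an iterator that never ends.
    """
    sectors_read = 0
    for sector in sectors:
        write(load_address, sector)
        # Stay inside the current 16MB region: the upper byte never changes.
        load_address = (load_address & 0xFF000000) | (
            (load_address + SECTOR_SIZE) & 0x00FFFFFF
        )
        # 16-bit counter: wrapping to zero means 65536 sectors (32MB) were read.
        sectors_read = (sectors_read + 1) & 0xFFFF
        if sectors_read == 0:
            raise FileTooLong("file is >65535 sectors (32MB)")
    return sectors_read


memory = bytearray(0x40000)


def write_to_memory(addr, data):
    memory[addr:addr + len(data)] = data


count = read_file_into_memory(
    [bytes([1]) * SECTOR_SIZE, bytes([2]) * SECTOR_SIZE], write_to_memory, 0x20000
)
print(count, memory[0x20000], memory[0x20200])

try:
    # An endless "file" trips the sector limit instead of looping forever.
    read_file_into_memory(itertools.repeat(bytes(SECTOR_SIZE)), lambda a, d: None, 0)
except FileTooLong as err:
    print("stopped:", err)
```

The write callback is there only so the endless-file case can be exercised without allocating 32MB of memory.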

You can also hopefully see in the above that there are functions for various other operations that are necessary.  So I now have all the basic building pieces to make a few useful disk access functions available to user-land, which I will look to do soon, so that hopefully in the near future you will be able to load a file of up to a few megabytes quickly and easily with just a few lines of assembly code.

Friday, April 17, 2015

More work on disk routines

I have made some more progress on the reworked disk routines.

I have almost completely refactored the code so that there are convenient routines for opening and reading a directory, retrieving each entry from a directory, finding a file in a directory, opening a file from its directory entry, as well as a convenience routine for finding, opening and loading a file all in one action.

I have also mostly implemented VFAT long file names.

Of course it doesn't actually work yet, but the infrastructure is all there now, and I will move into debugging mode when I get the chance.

Thursday, April 16, 2015

Making the SD access routines available from outside the hypervisor

I have had working FAT file system reading code for some time now, because it has been needed to load the ROM into the running machine.

The code was, however, rather specialised, only allowing a single partition, and not really made to be callable from outside of the hypervisor.  So I am starting to refactor this so that it can be called from outside, and to set up a logical mechanism for calling it from outside of the hypervisor.

The first step is setting up a suitable calling mechanism from a running machine into the hypervisor.

For this I have taken some inspiration from modern CPUs that have nice ways to call into the operating system.  However, unlike CPUs like the MIPS and x86 CPUs, which can dedicate a separate instruction to this process, the 4502 had already allocated all opcodes.  So we need another way.

What I have implemented is a block of IO registers at $D640-$D67F that if written to, cause the CPU to switch into the hypervisor.  Depending on which register is written to, the hypervisor enters at a different address. In other words, these 64 registers correspond to a jump table in the hypervisor.

When the hypervisor is all done, it writes to $D67F, causing it to exit back to the caller.

Saving registers is a time consuming process, so I wanted this to be as fast as possible. So the GS4510 has about 30 dedicated shadow registers that save various aspects of the processor state simultaneously on trapping to the hypervisor. This means it takes only one clock cycle -- about 20 nanoseconds -- to trap into the hypervisor.  The contents of A, X, Y and Z are passed into the hypervisor, as well as being saved in the shadow registers, so the hypervisor doesn't need to load them on entry. The shadow registers all get restored on exit from the hypervisor, restoring the CPU state, also in a single cycle.  This also means that when we enter the hypervisor, we can set a specific memory configuration, so that the hypervisor can get right to work.
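As a way of picturing the mechanism, here is a toy Python model of the trap registers and shadow registers. It is a sketch only: the Cpu class and its fields are invented for illustration, only A, X, Y and Z are modelled out of the roughly 30 shadow registers, and what happens if the hypervisor writes to a trap register other than $D67F is an assumption (the model simply ignores it).

```python
from dataclasses import dataclass, field

TRAP_BASE = 0xD640  # first trap register
TRAP_LAST = 0xD67F  # last trap register; the hypervisor writes here to exit


@dataclass
class Cpu:
    """Toy model: A/X/Y/Z plus a shadow copy taken on trap entry."""
    a: int = 0
    x: int = 0
    y: int = 0
    z: int = 0
    in_hypervisor: bool = False
    shadow: dict = field(default_factory=dict)

    def write_io(self, addr: int):
        """Model a write to an IO address; return the trap number entered, if any."""
        if not (TRAP_BASE <= addr <= TRAP_LAST):
            return None
        if self.in_hypervisor:
            if addr == TRAP_LAST:
                # Exit: restore the whole saved state in one step.
                self.a, self.x, self.y, self.z = (self.shadow[r] for r in "axyz")
                self.in_hypervisor = False
            return None  # assumption: other trap writes do nothing in hypervisor mode
        # Enter: snapshot the registers; A/X/Y/Z stay visible to the hypervisor.
        self.shadow = {"a": self.a, "x": self.x, "y": self.y, "z": self.z}
        self.in_hypervisor = True
        return addr - TRAP_BASE


cpu = Cpu(a=0x11, x=0x22)
trap = cpu.write_io(0xD640)      # user code: STA $D640
cpu.a = 0x99                     # hypervisor scribbles on the live register
cpu.shadow["x"] = 0x05           # hypervisor stores a result in the saved X
cpu.write_io(TRAP_LAST)          # hypervisor: exit
print(trap, hex(cpu.a), hex(cpu.x))
```

Note that in this model a result only reaches the caller if the hypervisor writes it into the saved copy, since the live registers are overwritten on exit.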

Let's think about how fast this can actually be in practice.

From the user process, you must write to one of the trap registers, e.g., with:

STA $D640

We don't have to set A to anything first, because the trap process ignores all register values (although the hypervisor might look at them once trapped in).

STA absolute takes 5* cycles on the GS4510. Add 1 cycle for the trap process, and we are in the hypervisor. Let's consider a minimal trap, that just returns to the caller without doing anything, and that will require a write to $D67F, so another 5 cycles, and then 1 more cycle to exit the trap.

Thus the total overhead is 12 cycles, or about 240ns.  That is, you could do an empty trap like this around 4 million times per second. In this regard, the GS4510 is much closer to the performance of much faster processors.
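The same sum can be checked in a couple of lines of Python. This is just the arithmetic from the paragraph above, assuming a flat 20ns per cycle; the variable names are mine.

```python
CYCLE_NS = 20  # "about 20 nanoseconds" per cycle, as quoted above


def cycles_to_ns(cycles: int) -> int:
    return cycles * CYCLE_NS


# STA to a trap register, the trap itself, STA $D67F, and the exit.
empty_trap_cycles = 5 + 1 + 5 + 1
empty_trap_ns = cycles_to_ns(empty_trap_cycles)
traps_per_second = 1_000_000_000 // empty_trap_ns

print(empty_trap_cycles, empty_trap_ns, traps_per_second)
```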

With that in place, we can start implementing a useful system call facility.  We'll focus on the disk access (DOS) calls for now.

First, we don't want to use up all 64 system call addresses for one major function, so we will use a register to indicate a sub-function.  We will use $D640, which traps to $8000 in the hypervisor, and have that jump to our call dispatch routine:

; Sub-function is selected by X.
; Bits 6-1 are the only ones used.
; Mask out bit 0 so that indirect jmp's are valid.
txa
and #$FE
tax
; to save memory we only allow this table to be 128 bytes long,
; thus we have to check that bit 7 is clear.
bmi invalid_subfunction
jmp (dos_and_process_trap_table,x)

dos_and_process_trap_table:
; $00 - $0E
.word trap_dos_getversion
.word trap_dos_getdefaultdrive
.word trap_dos_selectdrive
.word trap_dos_getdisksize
.word trap_dos_getcwd
.word trap_dos_chdir
.word trap_dos_mkdir
.word trap_dos_rmdir

The first part of this routine takes the X register to use it to pick which routine to call.  The 4502 has a nice JMP indirect indexed mode that is made for jump tables, which we will use.  

However, there is a bit of an unfortunate design decision in that instruction, in that it doesn't double X before indexing, so odd values of X will cause it to jump to an address which consists of one byte each from two neighbouring vectors in the jump table.  That would be bad, as it would cause it to jump into strange places in the hypervisor, probably messing memory up and never returning, or at least posing a security risk.  So we clear the lowest bit of X before doing the lookup, which we have to do by using the accumulator. This costs us 4 cycles just to do the TXA / AND #$FE / TAX, but more importantly means that if we want to inspect the value of A in the syscall, we have to load it from the relevant shadow register, which costs another 5 cycles.  So there is an almost 200ns penalty due to this one little thing!  I am thinking I might change the behaviour of this instruction when the CPU is in hypervisor mode to make the X index be doubled when used in this instruction so that the traps can be much faster.

To save memory in the hypervisor (it is limited to 16KB) I also check that the upper bit of X is clear, so that we have only 64 vectors available in this system call.  This costs another 60 - 80 ns, plus the 6 cycles for the indirect jump (another 120ns).
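To make the odd-index problem concrete, here is a small Python sketch of the lookup. The table contents are made-up addresses and the function names are mine; the only behaviour taken from the text is that X is used as a byte offset without being doubled, that bit 0 is masked off, and that bit 7 is rejected.

```python
def read_vector(table: bytes, x: int) -> int:
    """Fetch the little-endian 16-bit vector at byte offset x (X is not doubled)."""
    return table[x] | (table[x + 1] << 8)


def dispatch(table: bytes, x: int):
    """Return the handler address for sub-function x, or None if rejected."""
    x &= 0xFE        # drop bit 0 so we always land on a vector boundary
    if x & 0x80:     # bit 7 set: beyond the 128-byte table
        return None
    return read_vector(table, x)


# Two made-up handler addresses ($1234 and $5678), stored low byte first,
# padded out to the full 128-byte table.
table = bytes([0x34, 0x12, 0x78, 0x56]) + bytes(124)

print(hex(read_vector(table, 0)))   # a proper vector
print(hex(read_vector(table, 1)))   # odd offset: high byte of one, low byte of the next
print(hex(dispatch(table, 1)))      # masked back onto the first vector
print(dispatch(table, 0x80))        # rejected
```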

Let's look at the one system call that I have implemented so far:

; Return OS and DOS version.
; A/X = OS Version major/minor
; Z/Y = DOS Version major/minor
lda #<os_version
sta hypervisor_x
lda #>os_version
sta hypervisor_a
lda #<dos_version
sta hypervisor_z
lda #>dos_version
sta hypervisor_y
jmp return_from_trap_with_success

This basically just sets A, X, Y and Z to contain version information.  Each LDA / STA takes 7 cycles, so this block of code takes 28 cycles = 560 ns.

Then we jump to return_from_trap_with_success which sets the carry flag (our convention for success) and exits from the hypervisor:

; Return from trap with C flag set to indicate success
; set C flag for caller to indicate success
lda hypervisor_flags
ora #$01   ; C flag is bit 0
sta hypervisor_flags
; return from hypervisor
sta hypervisor_enterexit_trigger

This takes about 17 cycles, so another 340ns.

So all up we have 200ns overhead for the trap, then about 260ns for the dispatch, 560ns for the useful work, and another 340ns to return.  So all up, our system call to get the OS and DOS version requires about 1360ns - about 1.4 microseconds, allowing for better than 700,000 requests per second for a call of this complexity.

Now to work on making more useful disk functions available via this interface.

Sunday, April 12, 2015

Making kickstart a little prettier

The hypervisor boot process is currently pretty ugly.  It is just plain text with largely debug information.

We do need a facility for this sort of thing, something that can act as the Blue Screen of Death or Guru Meditation for the C65GS.

However, like those, we can make it a little more distinctive.

The current model I am looking at is to have a 64x64 256-colour image in the top left of the display. To the right of this will be the message, either a version indicator on boot, or an indication of what has gone wrong, e.g., Page Fault, Illegal Action of Some Sort etc.  Below this will be the lines of output.

To do this, I need to enable 16-bit text mode, so that I can have some full-colour characters on the screen.  These characters use one byte per pixel instead of one bit per pixel, with the colour looked up from the 256-colour VIC-III palette.  So I need a colour cube in the palette.

I also need a way to load the 4K pixels into RAM.  Fortunately, recent refactoring of the ROM loading code to make it easier to load a CHARROM means I have a handy fs_readfileintomemory function in the hypervisor already that I can re-use to load the pixels.

Okay, so I can load pixels. But I need pixels to load.  So I wrote a little program, pngprepare that can convert a 64x64 PNG file to the right format.  Incidentally, it can also be used to create the default character ROM for the C65GS.
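To give a feel for the kind of rearrangement involved, here is a Python sketch that turns a 64x64 image of palette indices into 64 full-colour character cells of 64 bytes each. This is not pngprepare itself: the function name is invented, and the cell order (left to right, top to bottom) and the row-by-row layout inside each cell are assumptions on my part.

```python
def image_to_fullcolour_chars(pixels, width=64, height=64):
    """Rearrange row-major palette indices into 8x8 character cells.

    Each cell becomes 64 consecutive bytes (8 rows of 8 pixels, one byte
    per pixel). Cell order and in-cell layout are assumed, not taken
    from the real tool.
    """
    out = bytearray()
    for cell_y in range(0, height, 8):
        for cell_x in range(0, width, 8):
            for row in range(8):
                start = (cell_y + row) * width + cell_x
                out.extend(pixels[start:start + 8])
    return bytes(out)


# Test image: every pixel holds its own x coordinate.
pixels = [x for _y in range(64) for x in range(64)]
data = image_to_fullcolour_chars(pixels)
print(len(data), list(data[:8]), list(data[64:72]))
```

Whatever the real layout is, the size works out the same: 64 cells of 64 bytes is the 4K of pixel data mentioned above.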

Along the way I also discovered a nasty little bug in the VIC-IV full-colour text mode, where the character generator accidentally skips one extra character when switching from full-colour back to normal VIC-II/III character mode on a raster line.  This is pretty easy to work around, but it is rather annoying. I'll have to work my way through the character generator state machine and work out where the problem lies and fix it at some point though.

So with that done, I could create a BOOTLOGO.G65 file and put it on the SD card.

Then, with a bit of fiddling around, I had managed to massage the kickstart ROM into displaying the logo at boot time, along with the relocated version message, with the regular boot messages below, like this:

Thanks go to Adam on IRC for providing me with the initial image.  The colour distortion artifacts are my fault, not his.  I'll have to look into that at some point.

For those who have been reading lately, you will notice that the C64 charset has been loaded in the process, and thus is being used by kickstart at this point.  If the screenshot were taken earlier in the boot process, or if CHARROM.G65 was missing from the SD card, then it would be showing our new default font, the modified VGA 8x8 one.  That font is going to be replaced by an improved default font fairly soon.

Friday, April 10, 2015

Character ROM Copyright (Part 2)

The other day I posted about the legal quagmire surrounding the C64 character ROM.

It was really interesting to see everyone's comments.  I also did a bit more digging, and now I believe that it is very unlikely that any copyright impediments exist to using most of the characters in the ROM, and very likely using the whole ROM would be fine, too.  

That said, the best defense in these situations to avoid the project getting held up, is to just avoid them.

So I have reworked the FPGA design a little so that the character "ROM" can be written at $FF7Exxx, and the Kickstart ROM so that it loads CHARROM.G65 at that location, and then loads the full 128KB ROM after that.

Until CHARROM.G65 is loaded, we still need some sort of character set of course, even if it is just so that it can tell you "CHARROM.G65 NOT LOADED (BROKEN OR MISSING)".

To solve this, I have used the VGA 8x8 font which is in the public domain, and then started removing the serifs and other things that (to me) make it a rather unpleasant font.

To make this easier, I made a little program that displays the bytes of the fonts as comments in the VHDL file for the ROM, and can then be re-run over the VHDL file to update it. So for example, the first few characters look like:

type ram_t is array (0 to 4095) of std_logic_vector(7 downto 0);
signal ram : ram_t := (
  -- PIXELS:  *****  
  -- PIXELS: **   ** 
  -- PIXELS: ** **** 
  -- PIXELS: ** **** 
  -- PIXELS: ** **** 
  -- PIXELS: **      
  -- PIXELS:  ****   
  -- PIXELS:         
  x"7c", x"c6", x"de", x"de", x"de", x"c0", x"78", x"00",
  -- PIXELS:   **    
  -- PIXELS:  ****   
  -- PIXELS: **  **  
  -- PIXELS: **  **  
  -- PIXELS: ******  
  -- PIXELS: **  **  
  -- PIXELS: **  **  
  -- PIXELS:         
  x"30", x"78", x"cc", x"cc", x"fc", x"cc", x"cc", x"00",
  -- PIXELS:  *****  
  -- PIXELS:  **  ** 
  -- PIXELS:  **  ** 
  -- PIXELS:  *****  
  -- PIXELS:  **  ** 
  -- PIXELS:  **  ** 
  -- PIXELS:  *****  
  -- PIXELS:         
  x"7c", x"66", x"66", x"7c", x"66", x"66", x"7c", x"00",
  -- PIXELS:   ****  
  -- PIXELS:  **  ** 
  -- PIXELS: **      
  -- PIXELS: **      
  -- PIXELS: **      
  -- PIXELS:  **  ** 
  -- PIXELS:   ****  
  -- PIXELS:         
  x"3c", x"66", x"c0", x"c0", x"c0", x"66", x"3c", x"00",

You still have to update the hex values, but that's all.
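The comment format is simple enough to sketch in Python. This is not the program described above, just an illustration of the mapping between a font byte and its PIXELS line (most significant bit on the left). The function names are mine, and the reverse direction shown here is an extra that the tool above does not do.

```python
def pixels_comment(byte: int) -> str:
    """Render one font byte as a '-- PIXELS:' line, most significant bit leftmost."""
    bits = "".join("*" if byte & (0x80 >> i) else " " for i in range(8))
    return "  -- PIXELS: " + bits


def comment_to_byte(line: str) -> int:
    """Rebuild the byte from the eight characters after 'PIXELS: '.

    Pads the line first, in case an editor stripped trailing spaces.
    """
    bits = line.split("PIXELS: ", 1)[1].ljust(8)[:8]
    return sum(0x80 >> i for i, c in enumerate(bits) if c == "*")


for b in (0x7C, 0xC6, 0xDE):
    line = pixels_comment(b)
    print(line + "|", hex(comment_to_byte(line)))
```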

So now we have a bitstream that is safe to distribute without legal problems.  I will work out how we are going to do that in the next few days.

Wednesday, April 8, 2015

Is the C64 font protected by copyright?

This is an interesting question that I am having to tackle, because the C65GS bitstream needs to contain a default font for the hypervisor to show on power-up.  C64 and C65 ROMs are loaded from SD card, so distribution of the C65GS FPGA bitstream is not affected by any copyrights that may still exist on those.

First, copyright law in many countries makes it almost certain that the C64 font resulted in the creation of copyrights for someone at some point.  This is the easy starting point.

The next question is who owns the resulting copyrights today, following the multiple liquidations of Commodore-related entities.

Another question is, given that no one has been prosecuting those copyrights for many years now, what impact does this have on their enforceability.

Yet more questions arise because it has been noticed that the C64 and Atari 8-bit computer fonts have identical lower-case characters, with the ones on the Atari pre-dating the C64 ones considerably. I.e., it is probable that the lower-case characters of the C64 font were created by someone other than Commodore.

Then arise some more interesting questions.

Commodore had tried on occasion to update the C64 font, but gave up when they realised that in various subtle ways that software depended on the exact contents of the C64's character ROM to function correctly.  Apparently they tried just moving the dot on the lower case I and found that it caused at least one program to fail.

Thus it may be argued that the C64 font is required, 100% verbatim, to create a computer that is compatible with the C64. This rather exotic legal argument is important for countries that have a copyright exemption for the purposes of interoperability.  Australia is one of those.

The relevant Australian legislation is section 47D of the Copyright Act, 1968:

Reproducing computer programs to make interoperable products

(1)  Subject to this Division, the copyright in a literary work that is a computer program is not infringed by the making of a reproduction or adaptation of the work if:
    (a)  the reproduction or adaptation is made by, or on behalf of, the owner or licensee of the copy of the program (the original program) used for making the reproduction or adaptation; and
    (b)  the reproduction or adaptation is made for the purpose of obtaining information necessary to enable the owner or licensee, or a person acting on behalf of the owner or licensee, to make independently another program (the new program), or an article, to connect to and be used together with, or otherwise to interoperate with, the original program or any other program; and
    (c)  the reproduction or adaptation is made only to the extent reasonably necessary to obtain the information referred to in paragraph (b); and
    (d)  to the extent that the new program reproduces or adapts the original program, it does so only to the extent necessary to enable the new program to connect to and be used together with, or otherwise to interoperate with, the original program or the other program; and
    (e)  the information referred to in paragraph (b) is not readily available to the owner or licensee from another source when the reproduction or adaptation is made.
(2)  Subsection (1) does not apply to the making of a reproduction or adaptation of a computer program from an infringing copy of the computer program.

Given that the character ROM has been acknowledged by Commodore employees in the past as being vital to providing interoperability with the C64, does this section apply?

First, if you own a C64 character ROM, it is clear that you could rely on sub-section 1(a) to make a copy, and include that in your own FPGA bitstream.  That would require you to run the synthesis process yourself.  Not very convenient.

Since that test fails, we must now satisfy 1(a) through 1(e) for a developer of the C65GS to be able to distribute a bitstream that includes the C64 character ROM.

1(a) requires that the ROM be sourced from a legal copy.  This can be done by reading the ROM from a real C64.

1(b) is satisfied if such a legally sourced copy is used to satisfy the needs of interoperability with any program. Since there exist programs that I would like to make the C65GS interoperable with, that are understood to rely on the exact contents of the C64 character ROM, this is satisfied.

1(c) is satisfied because we need the entire ROM to provide the interoperability.  Having only part of the ROM would not suffice.

1(d) is satisfied because such programs require the ROM to be available just as it was on the original C64 -- i.e., mapped to a particular memory location, and visible by default to the video controller in particular ways -- in order for the C65GS to interoperate with them.  Failing on any point of this would not suffice to provide interoperability with the C64's software portfolio.

1(e) requires that the information (the contents of the ROM) is not readily available from another source.  This is where it gets a bit interesting again.

The C64 & C65 ROMs can be freely downloaded from the internet in many places.  Thus, it can be argued that they are readily available. However, if they are readily available in this way, then it must be on the basis of those copies being legal copies.  If they are not legal copies, then they are not candidates to contradict 1(e).

This is a bit interesting, because the answer to this unanswerable question determines where the ROM should be sourced from -- an existing online source, or a real C64 ROM chip (or other undeniably legal source).

To add to the complexity is JiffyDOS, which is an adaptation of the C64 ROM set, and for C64C's the JiffyDOS ROM contains the character ROM.  JiffyDOS is apparently still available for purchase. However, what is not clear is whether JiffyDOS includes the C64 character ROM on the basis of an interoperability exemption, or on the basis of a license from a past or present owner of the ROM.

The JiffyDOS situation has yet another facet: To provide interoperability with a JiffyDOS-enabled C64, it is just as necessary to have the complete C64 character ROM as for a non-JiffyDOS-enabled C64.

So what is the situation? Can I include the C64 character ROM or not, and if so, where should it be copied from?

I think the answer is clear as mud.

Then there is the question of whether any of the symbols defined in ASCII 1963, ASCII 1967 or Unicode which have only one or very few possible representations in an 8x8 grid are actually copyrightable at all.  Similarly, the publishing of PETSCII as a de facto standard probably makes the ordering of the characters unenforceable in terms of copyrights.

It is rather frustrating that the situation is so horribly vexed.

The simpler solution to avoid all possible problems in this regard, is that I should include a freely-distributable 8x8 font, which gets replaced in memory by the correct font when the C65GS loads a ROM from the SD card, so that the bitstream contains no material that could be subject to an external copyright claim.

Fortunately, there are some free 8x8 fonts out there, like this one or this one.

So the only down side is that the kickstart display will not be in the C64 font.

Of course, if I get the OS a little further, it could boot up with a proportional font for the boot display, and side-step this sticky problem altogether... So I have a couple of solutions open to me.

Sunday, April 5, 2015

C65GS Plays Games at Revision 2015

After the DDR saga, finally some eye candy for you.

Deft managed to find some games to run on the C65GS at Revision 2015, as you can see in the following video:

There are some bugs with sprite colour that I have to track down, but otherwise it is really nice to see some games running on it.

Leaderboard Golf is very fast to redraw!

More as it happens.

Friday, April 3, 2015

About C65GS Memory

As promised, here is a very brief introduction to the memory available to a programmer on the C65GS, and some related notes:

First, there are three types of addresses you need to know about on the C65GS:

1. C64-style 16-bit CPU mapped addresses.  These are your good old friends like $0800, $D020 and $FFD2 and so on.  They are how you reference things with the CPU if you are using the machine like a normal 6502-based C64.

2. C65-style 20-bit addresses.  These are the addresses you can reference using the C65's MAP instruction.  You identify which 20-bit address, like $20000 where the C65 DOS ROM lives, you want to map somewhere in the 64KB address space, do some strange calculations, and voila, you have some piece of the 1MB address space mapped.

3. C65GS 28-bit addresses.  There aren't too many computers with 28-bit address busses, so I thought that we should have one.  The first 1MB of these match up with the C65's 20-bit address space, so $0020000 is also the C65 DOS ROM.  Needless to say, in the 256MB of address space there are other interesting things.
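
For the curious, the "strange calculations" behind MAP can be sketched on a host machine. The Python below is only an illustration. It assumes the register layout described in the C65 documentation: an offset in 256-byte units that is added to the CPU address, plus one enable bit per 8KB block, with A/X covering the lower 32KB and Y/Z the upper 32KB. The function name map_offset_registers is made up for this example.

```python
BLOCK_SIZE = 0x2000  # MAP is enabled per 8KB block of the 64KB CPU space


def map_offset_registers(physical, cpu_addr):
    """Work out the register pair that makes cpu_addr resolve to physical.

    Returns (half, offset_byte, nibble_byte). For the "lower" half the two
    bytes would go in A and X, for the "upper" half in Y and Z (assumed
    layout, see the lead-in).
    """
    if not 0 <= cpu_addr <= 0xFFFF:
        raise ValueError("cpu_addr must be a 16-bit address")
    if not 0 <= physical <= 0xFFFFF:
        raise ValueError("physical must be a 20-bit address")
    # The offset is added to the CPU address, wrapping within 20 bits.
    offset = (physical - cpu_addr) & 0xFFFFF
    if offset & 0xFF:
        raise ValueError("MAP offsets have 256-byte granularity")
    block = cpu_addr // BLOCK_SIZE            # 0..7
    enable = 1 << (block % 4)                 # one enable bit per 8KB block
    offset_byte = (offset >> 8) & 0xFF        # offset bits 15..8
    nibble_byte = (enable << 4) | (offset >> 16)  # enables + offset bits 19..16
    half = "lower" if block < 4 else "upper"
    return half, offset_byte, nibble_byte


half, offset_byte, nibble_byte = map_offset_registers(0x20000, 0x8000)
print(half, f"${offset_byte:02X}", f"${nibble_byte:02X}")
```

Under these assumptions, making the DOS ROM at $20000 appear at $8000 needs an offset of $18000, which works out to Y=$80 and Z=$11.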

Now, for a horribly simplified memory map with only the relevant parts left in:

$0000000 - Same as $00 on C64 (CPU port)
$0000001 - Same as $01 on C64 (CPU port)
$0000002 - $000FFFF - C64 ~64KB RAM. Zero wait states. VIC-IV can see this.
$0010000 - $001F7FF - 62KB. C65 2nd 64KB RAM. Zero wait states. VIC-IV can see this.
$001F800 - $001FFFF - First 2KB of colour RAM. 1 wait state. VIC-IV can see this only for colour information, so don't try putting character sets or bitmap data there.
$0020000 - $003FFFF - 128KB C65 "ROM". Zero wait states. VIC-IV can't see this. Really a RAM! You can replace the contents.  Think of it as like fastram on an Amiga.
$8000000 - $FEFFFFF - 127MB of DDR RAM, well it will be when I get the DDR controller working. VIC-IV can't see this yet.
$FF80000 - $FF8FFFF - 64KB colour RAM. 1 wait state. VIC-IV can see this only for colour information.
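
As a cross-check of the map above, here is a small Python sketch that encodes the listed ranges and looks up which region a 28-bit address falls in. The address ranges and the VIC-IV visibility notes are copied from the list; the names REGIONS, region_for, size_in_bytes and chipram_bytes are just for this illustration.

```python
KB = 1024

# (first address, last address, name, VIC-IV visibility as listed above)
REGIONS = [
    (0x0000000, 0x0000001, "CPU port", "not stated"),
    (0x0000002, 0x000FFFF, "C64 RAM", "yes"),
    (0x0010000, 0x001F7FF, "C65 second RAM bank", "yes"),
    (0x001F800, 0x001FFFF, "first 2KB of colour RAM", "colour only"),
    (0x0020000, 0x003FFFF, 'C65 "ROM"', "no"),
    (0x8000000, 0xFEFFFFF, "DDR RAM", "no"),
    (0xFF80000, 0xFF8FFFF, "colour RAM", "colour only"),
]


def region_for(address):
    """Return the name of the region containing address, or None."""
    for first, last, name, _vic in REGIONS:
        if first <= address <= last:
            return name
    return None


def size_in_bytes(name):
    """Return the size of the named region, computed from its bounds."""
    for first, last, region_name, _vic in REGIONS:
        if region_name == name:
            return last - first + 1
    raise KeyError(name)


def chipram_bytes():
    """Total RAM the VIC-IV can use for bitmaps, sprites and so on."""
    return sum(last - first + 1
               for first, last, _name, vic in REGIONS if vic == "yes")


print(region_for(0x0FF80028))
print(chipram_bytes() / KB)
```

The two regions the VIC-IV can fully see add up to two bytes short of 126KB, because the CPU port occupies the first two addresses.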

So all up, you have:

126KB "chipram" that the VIC-IV can use for bitmaps, sprites and so on.
128KB "fastram" which will have the ROM in it when you start, and after you replace the ROM, you won't have the ROM any more.
64KB colour RAM.  Works great, except when you try to run code from it for reasons I have yet to investigate.
and later, 128MB DDR RAM, which has horrible latency, made up for only slightly by a little cache. It is currently very buggy, as described in previous posts.

So you thus have about 256KB RAM for code, and 64KB of colour RAM which can also double for storing stuff in a more general sense, including code, just don't try to run code from there right now.  In time I will fix this.

Now, for the truly dedicated, you can delaminate the 128KB chipram from its shadow RAM.  The shadow RAM is what the CPU really reads from.  The chipram then becomes write-only to the CPU, i.e., the VIC-IV still reads it, but the CPU can't tell what was put there anymore.  In many cases, this isn't a real problem.  You can then map the 128KB shadow RAM somewhere else in the first 8MB of address space (it is configurable), and have an extra 128KB of fast RAM.  Thus you can have 126KB chipram, 256KB fastram and 64KB colour RAM.  I might later disable this option, because it makes it almost impossible to freeze a program that uses it: you would have to use sprite collision tricks to read the data back out of the chipram, which would take many frames, or I'd have to add some sort of horrible reflection process from the VIC-IV, all while trying not to overstrain the already overstrained memory bus on the VIC-IV side of things.

I completely concede that this is a very bizarre arrangement. It is however what you get when providing backward compatibility with a machine that was never really finished, and which in turn provides backwards compatibility with a machine that was almost ten years old at the time.  I'd also say that it adds a certain degree of charm.

Now, for those wanting an easy way to access any byte in the 28-bit address space, you can use the new-and-improved Z-indirect addressing mode.  Ordinarily it works like the indirect-Y addressing mode you know:

LDA ($nn),Y

The 4502 has the Z version:

LDA ($nn),Z

This also behaves as you would expect, de-referencing the 16-bit pointer at $nn and $nn+1.

However, if you precede this instruction with a NOP, then it dereferences a 32-bit pointer at $nn through to $nn+3, allowing easy access to single bytes anywhere in memory. So the following routine would read the colour RAM byte for the first column of the 2nd row of the C64-mode screen (address $FF80028).

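pointer: .byte $28,$00,$F8,$0F   ; $0FF80028, low byte first; must be in zero page

LDZ #$00
NOP                ; the NOP prefix selects the 32-bit pointer form
LDA (pointer),Z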

This instruction takes just 2 more cycles than the normal 16-bit indirect version, making for very fast access to arbitrary memory.  If the C65 BASIC were re-written using this, it would likely be quite a lot faster (as it stands it is about 3x slower than C64 BASIC).

Note that this mode allows access to 4GB of address space, allowing for fun future expansion.  Consequently, you should always make sure the upper nybble is $0, so that your programs will work on the future C65GS+.
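
To see where the four pointer bytes in the example come from, here is a rough host-side sketch in Python. It assumes a 40-column C64-mode screen and that Z is simply added to the 32-bit pointer, as it is in the 16-bit form. The helper names are invented for this illustration.

```python
COLOUR_RAM_BASE = 0xFF80000   # 64KB colour RAM in the 28-bit address space
SCREEN_COLUMNS = 40           # C64-mode screen width (assumption)


def colour_ram_address(row, column):
    """28-bit address of the colour byte for a zero-based row and column."""
    return COLOUR_RAM_BASE + row * SCREEN_COLUMNS + column


def pointer_bytes(address):
    """Lay out a 32-bit pointer low byte first, as it sits in zero page."""
    if address < 0 or address >> 28:
        # Only 28 bits are used today, so the upper nybble has to stay $0.
        raise ValueError("address does not fit in 28 bits")
    return address.to_bytes(4, "little")


def effective_address(pointer, z):
    """Address read by the NOP-prefixed LDA (pointer),Z (assumed: pointer + Z)."""
    return int.from_bytes(pointer, "little") + z


ptr = pointer_bytes(colour_ram_address(1, 0))   # 2nd row, first column
print(" ".join(f"${b:02X}" for b in ptr))
print(f"${effective_address(ptr, 0):07X}")
```

Rejecting anything above 28 bits when building the pointer is one cheap way to keep the upper nybble at $0.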

To make running programs bigger than 64KB easier, I am also part-way through implementing 32-bit addressed JMP, JSR and RTS instructions.  These will automatically map the correct 16KB of RAM to $4000-$7FFF and jump to it.  They will be selected by preceding JMP, JSR or RTS with two SED instructions.  For example, the following would jump to the routine located at addr, where addr is a 32-bit address:

SED
SED
JMP <<addr
.word >>addr

This avoids the need to do crazy bank switching calculations every time you want to call a little routine, or return from one.
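
Since the far-jump instructions are not finished, the following Python is only a guess at the arithmetic involved, not a description of the real hardware. It assumes that the 16KB window is aligned to 16KB boundaries in the 32-bit address space, and that <<addr and >>addr mean the low and high 16-bit words of the target. Both function names are hypothetical.

```python
WINDOW_BASE = 0x4000   # CPU address where the 16KB window appears
WINDOW_SIZE = 0x4000   # 16KB, i.e. $4000-$7FFF


def far_target(addr):
    """Split a 32-bit target into (16KB bank number, CPU-visible address).

    Assumes banks are 16KB-aligned, which the post does not confirm.
    """
    bank = addr // WINDOW_SIZE
    offset = addr % WINDOW_SIZE
    return bank, WINDOW_BASE + offset


def split_operand(addr):
    """Return (low word, high word) of a 32-bit address.

    Assumed meaning of `JMP <<addr` followed by `.word >>addr`.
    """
    return addr & 0xFFFF, (addr >> 16) & 0xFFFF


bank, cpu_addr = far_target(0x0021234)
print(bank, f"${cpu_addr:04X}")
print(tuple(f"${w:04X}" for w in split_operand(0x0021234)))
```

Under these assumptions a routine at $0021234 sits in bank 8 and would be entered at $5234 once that bank is mapped, which is the bookkeeping a compiler or assembler would otherwise have to do by hand.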

Indeed both these enhancements were planned in response to how horrible and inefficient it was to endlessly use DMA lists and the MAP instruction.  Now we just need for me to finish the far-jump stuff, and then to make some tools that can actually use them.  Of course, these two features make it much easier to contemplate targeting a C compiler at the C65GS, because the compiler can ignore all banking apart from the requirement that each function be less than 16KB long, or be broken into 16KB long pieces.

C65GS at Revision 2015, and running on DDR

I still haven't fixed the DDR RAM controller.  However, I have found a workaround to get the C65GS working on the DDR version of the board.  I am very relieved, because the DDR controller is still doing weird things, and I have yet to figure it out, which as readers of this blog will know has me very frustrated.

But, as I say, it is now possible to run the DDR board just fine, booting to C65 BASIC and everything.  This means I should be able to live without a "Don't ask me about the DDR controller" T-Shirt.

The solution is that I am using almost every last spare bit of BRAM in the FPGA to have a 128KB "ROM" in the design, instead of storing the C65 ROM in DDR RAM. This has three main effects for now:

1. The "ROM" is now zero wait state, and as a result BASIC and the DOS routines fly. They are about 6x to 8x faster than they were previously.  The C65's horribly slow DOS routines can load somewhere around 50 - 100 blocks per second. You can also do FORI=1TO25000:NEXTI in about 1 second in C64 mode, i.e., around 50x faster than on a stock C64. In fact, BASIC is now so fast that you can use POKE to change the border colour about every 5 or 6 C64 raster lines.

2. I was planning to use the BRAM for enhanced sprites.  Clearly that can't happen now. I am thinking about how to feed sprites direct from DDR RAM, which would be fun.

3. The "ROM" is of course really RAM.  The Hypervisor can make it read-only for compatibility with any of the very few C65 programs that exist, if any happen to try writing over the ROM address space.  But for C65GS specific programs you can of course just use it as an extra 128KB of RAM.  I'll do another post about this soon, describing all of the RAM that is usefully available on the C65GS for programmers.

Play with a C65GS at Revision 2015!

Now, for anyone who happens to be in Saarbrücken for Revision 2015, a C65GS prototype will be there with deft.  He will be on the #revision IRC channel.  He has his FPGA board mounted in a C64 case with real C64 keyboard.  It is of course something that has been quickly pulled together, and a real C65GS would have a special PCB so that all the C64/C65 ports are available.  But it is still really nice to be able to interact with a C65GS with a real "body".  Hopefully deft will have Turbo Assembler and some other goodies on it (including a lame little demo I wrote for the C65 back in 1994) that you can try out if you would like.