Tuesday, January 23, 2018

Freezing and Unfreezing - part 1

Freezer cartridges are a common tool on the C64, to allow for saving, debugging and a bunch of other useful tasks.  However, getting the things to work on the MEGA65 is not easy, and even if we could get them working, they wouldn't be aware of the extra memory and special functionality of the MEGA65. So for these and other reasons, the MEGA65 will have a built-in freeze function, which will be accessed by pressing the RESTORE key for 0.5 - 2 seconsds, i.e., longer than a quick tap as would be used for RUN/STOP-RESTORE, but shorter than the 2 - 4 seconds which is used to trigger a reset (we might remove the reset function, once we have a functional reset button on the MEGA65, since it will become a bit redundant).

(Talking about the difficulties of supporting real freezer/fast load cartridges, the Epyx Fastload is a good example of the kind of strange problems that can come up.  As I discovered here, there is a little Resistor-Capacitor circuit on that cartridge, that causes it to disable itself, if it is not accessed for ~400 clock cycles.  Kickstart takes longer than this to check the SD card and load the user's preferences etc, and so the cartridge is never visible.  The Action Replay doesn't do that, but gets upset if you try to read from $DExx, and I have yet to actually get it to present its ROM. Apparently the Action Replay also uses some illegal opcodes. All in all, I have discovered them to be rather a pain to work with.  All the more reason to have a native freeze function on the MEGA65.)

One of the advantages we have on the MEGA65 is that we have the 16KB Hypervisor memory, plus Hypervisor traps, so we can completely suspend a running program, without having to overwrite even a single byte of the stack.  Thus, the resulting freeze function should be quite robust.

A challenge, however, is the saving and restoring of the SD card access registers and sector buffer.  The access registers are only a few bytes, so we can copy those to a scratch area in the Hypervisor memory.  Then we can write whatever is in the sector buffer to a reserved area on the SD card, so that it is safely recorded.  At that point we can safely make use of the SD card direct access facilities, without corrupting the state of the program being frozen, which should probably start with the stashed SD card access registers.

To un-freeze, we can finish off by loading the stashed sector buffer contents, and reading those stashed access registers back into the actual SD card access registers.  At that point, even a program that was making direct SD card access should be safely restored.

There are a couple further complications.

First, the F011 floppy controller sector buffer and the SD card direct access sector buffer cannot be direct memory mapped at the same time, so copying the F011 sector buffer to the SD card sector buffer is rather painful.  In fact, it is painful enough, that I have already written a fix, that makes the buffers visible at $FFD6E00-FFF (SD card direct access) and $FFD6C00-DFF (F011 floppy sector buffer).  A pleasant side-effect of this is that it also gives us 3KB of scratch space at $FFD6000-$FFD6BFF, which we can use in the freeze process, or indeed elsewhere in the Hypervisor.

Second, if a program was accessing the SD card directly, the things it was accessing may not be there any longer, which may result in the reading of sensitive data, or the corruption of who knows what.  This is actually a very strong argument for not allowing programs to have direct SD card access, except in the rarest of situations, and even then, having the user confirm granting of the permission to do this before the Hypervisor grants direct SD card access.  In fact, all storage access has similar problems. However, at least for F011 disk images and the real floppy drive, these can be reasonably managed by remembering the disk image that needs to be re-mounted on unfreeze, or similarly, if the F011 floppy controller should instead be connected to the real 3.5" floppy drive.

Apart from the above, it should just be a reasonably straight-forward process of writing blocks of memory out to the SD card. It would be really nice to have a DMA to/from SD card option to rather automate this, however, we don't (yet) have such a beast.

In terms of the various blocks of memory, some are not entirely trivial, such as the VIC-IV palette blocks, which we have to select which is visible in the memory map.  To solve this, and the fact that all of the state that we need to save is not contiguously arranged in the 28-bit address space, there is a list of regions that need to be saved/restored, together with their lengths, and a one-byte option that is used to select a little routine that is run before access, so that the region is visible and otherwise ready for copying.

As I have started to write the freeze routing, it hasn't surprised me that I have managed to mess up the SD card file system by writing sectors in the wrong place -- notably over the Master Boot Record, i.e., the partition table.  While I can use the native MEGA65 FDISK and FORMAT utility to put it back, it does wipe the FAT32 file system in the process, and thus the C65  ROM is removed, as well as the MEGA65 system partition.  This is important, because the freeze function can only work if there is a MEGA65 system partition, as otherwise it doesn't know where it can safely save the frozen program, and without the C65 ROM, the machine doesn't ever leave the Hypervisor, and so the freeze trap to the Hypervisor cannot happen.

My solution to this is to allow the native FDISK/FORMAT utility to realise when a ROM has been loaded, and to save that to the newly formatted file system at the end of the FORMAT function.  So now, if I mess up the SD card, I can just reset while holding the ALT key to get the Hypervisor Utility menu, select FDISK/FORMAT, reformat the SD card, and be on my way again, ready to mess it up again in no time at all -- and without having to pull the SD card out, which is a bit annoying to do with the prototype PCB.

The problem I am facing now is some reliability problems when wrting to the SD card: Basically the SD card controller locks up under certain conditions, and has to be reset to fix it.  These problems have become annoying enough, that I want to solve them once and for all.  This ties in with the lack of SDHC support, which is related to the same module. So my very next step is to try to switch out the old SD controller for the newer version of the open-source one that has been released in the meantime. Hopefully that will fix both SDHC support, as well as the lock-ups, and after that it should hopefully be relatively smooth sailing to having a working freeze function.

Saturday, January 20, 2018

Improving the DMAgic controller interface

The C65 has a DMA controller, the "DMAgic". Well, actually, each C65 has either the A or B revision of the F018 DMAgic IC.  This means we already have some magic to support different revisions of the C65 ROM that use assume either one or the other revision of the F018.

Then add to that that the MEGA65 already has extensions to DMAgic to support memory access outside of the 1st mega-byte of memory. Until now, those were implemented as memory mapped registers, as that was the most convenient way to implement them.  However, as those extensions have grown, having more and more memory mapped registers that affect the way that a DMA job is interpreted was getting a bit hairy, as it meant that each caller of a DMAgic job needed to be aware of those registers, or alternatively, the previous caller had to be well behaved and clear all the extra registers out. Neither seemed a really satisfactory option.

So instead, I have removed all those extra registers, leaving only two additional registers, that are required for issuing DMA jobs, plus the register that allows selection between F018A and F018B mode for normal DMA jobs:

$D703 - Select F018A/B mode.
$D704 - Sets the upper 8 bits of the 28-bit address where the DMA list is to be loaded from, i.e., which mega-byte of memory the DMA list lives in.
$D705 - Like $D700, it sets the bottom 8 bits of the DMA list address, and triggers the start of a DMA job, but unlike $D700, it triggers an enhanced DMA job.

All the extra options, like which MB of RAM is being copied from and to, and DMA stepping rates are now specified in a set of variable-length options prefixed to the front of the DMA list.

For example, the DMA job to clear the screen in the hypervisor now looks like:

        ; Set bottom 22 bits of DMA list address as for C65
        ; (8MB address range)
        ;
        lda #$ff
        sta $d702

        ; Kickstart ROM is at $FFFE000 - $FFFFFFF, so
        ; we need to tell DMAgic that DMA list is in $FFxxxxx.
        ; this has to be done AFTER writing to $d702, as $d702
        ; clears bits 27 - 22 of the DMA list address to help with
        ; compatibility.
        ;
        lda #$ff
        sta $d704

        lda #>erasescreendmalist
        sta $d701

        ; set bottom 8 bits of address and trigger DMA.
        ;
        lda #<erasescreendmalist
        sta $d705


erasescreendmalist:
        ; Clear screen RAM
        ;
        ; MEGA65 enhanced DMA options
        .byte $0A      ; Request format is F018A
        .byte $00 ; end of options marker
       
; F018A DMA list
        .byte $04   ; COPY + chained request
        .word 1996  ; 40x25x2-4 = 1996
        .word $0400 ; copy from start of screen at $0400
        .byte $00   ; source bank 00
        .word $0404 ; ... to screen at $0402
        .byte $00   ; screen is in bank $00
        .word $0000 ; modulo (unused)



The bold lines are the ones that are different to the old method of calling such a job: We write to $D705 instead of $D700 to initiate the DMA job, and then the DMA job now begins, in this case, with two option bytes: The first tells the DMAgic that the DMA list will be in F018A format, after the end of option marker ($00).  Now there is no confusion for a particular list as to whether it expects an F018A or B, and any further extensions that we add can be safely ignored, because they are all disabled by default for each job.  Also, the job setup code is shorter, because it doesn't need to set or clear any DMA options, and there is no longer any need for DMA cleanup code, to put options back to how a naive caller might expect them.

The current list of supported options are:

$00 = End of options
$06 = Use $86 $xx transparency value (don't write source bytes to destination, if byte value matches $xx)
$07 = Disable $86 $xx transparency value.
$0A = Use F018A list format
$0B = Use F018B list format
$80 $xx = Set MB of source address
$81 $xx = Set MB of destination address
$82 $xx = Set source skip rate (/256ths of bytes)
$83 $xx = Set source skip rate (whole bytes)
$84 $xx = Set destination skip rate (/256ths of bytes)
$85 $xx = Set destination skip rate (whole bytes)
$86 $xx = Don't write to destination if byte value = $xx, and option $06 enabled


$00 and $0A we have already met.
$0B is the opposite of $0A, and tells the DMAgic to expect an F018B format DMA list.
$06 and $07 allow enabling/disabling of a "transparent value", that is a value that is not written during a DMA copy. For example, if you were copying an image with a transparent colour, you can now tell the DMAgic what colour that is, and it will copy all the bytes that don't have that value. The value is set via the $86 option
Then $80 and $81 allow setting of the upper 8 bits of the 28-bit source and destination addresses, i.e., which mega-byte of memory to copy to/from.
$82 - $85 allow setting the stepping rate of the DMA. This allows memory copies that smear out or squish up the source, say, for example, if you wanted to scale a texture when drawing it.

Anyway, the net result is a nicely extensible architecture for the DMAgic in the MEGA65, and one that results in increased compatibility with the C65 when faced with lazy programmers, as well as saving bytes  for the typical case where most of the options are not required.  It also makes it easier to freeze and resume a MEGA65 program, because there are now fewer registers to save and restore.

Wednesday, January 17, 2018

Repairing an Amiga Mouse, and then using it on the MEGA65

As anticipated earlier, I have added transparent support for Amiga mouses to the MEGA65, so that people don't have to find a 1351, and can use the existing USB to Amiga mouse adapters, to allow use of newer mouses.

To get that working, I put out an appeal on Facebook for anyone who had an Amiga mouse to spare (not necessarily 100% working, just enough so that I could test with), and a kind person from Tasmania sent me one with a problem with the X axis.  Nonetheless, it was enough for me to do what I needed to do, and as a result the MEGA65 works very nicely indeed when an Amiga mouse is plugged in, and can automatically detect an Amiga mouse, 1351 mouse, paddles and joystick -- in real time -- and switch modes on the joystick port as required, so that no user fiddling is required.

This all works by looking at what the digital and analog lines are doing on each joystick port, to see whether the behaviour matches one or the other type of device.  It all turned out to be relatively simple in the end, requiring only a modest amount of fiddling to get it stable.

However, as I mentioned, the Amiga mouse that was donated was a bit sick, and I would really like it fully working, as I can use it with the MEGA65 r1 PCB without any funny adapters to route the POT lines, as I currently do for the 1351 mouse (this will of course be fixed on the r2 PCB).  So I started poking around in the mouse today to find out what is wrong, exactly.

It didn't take long with the multi-meter to work out that the voltage output of one of the infra-red light sensors on the mouse was a bit lower than it should be. The mouse shows signs of having been physically repaired around that part of the assembly, so it isn't too surprising, really.  A bit of looking at Amiga mouse schematics I could see that the mouse sensors are fed through a LM339 type comparator to produce a nice 5V square-wave output from the fuzzy light sensor readings. 

Each light sensor is paired with a reference voltage that is used to determine whether to output 5V or 0V, depending whether the sensor voltage exceeds the reference voltage or not.  Since the problem was the sensor voltage was a bit low, I figured the easy solution was to put a bit of extra bias towards 0V on the reference voltage, so that the sensor voltage would again be correct.  It looked like there was about 1.5K resistance between the reference voltage and ground, so I thought I would start by adding a 2K Ohm resistor between the reference voltage and ground, like this:

Only I realised when testing that I had put the bias on the sensor voltage instead, i.e., making the problem worse, rather than better.  So then I though, well, why don't I just put the bias towards 5V on the sensor voltage, like this:


Only it turns out that that doesn't work. I didn't bother exploring why.  So, back to Plan A, only this time with the correct pins:


The astute observer will notice that the above picture shows a 1K not 2K resistor. This was because in testing I found a 1K resistor provided the correct bias correction.

Then it was time to test on the MEGA65 using the Mouse Test 2 program, which as we can see below I was able to use to move the 1351 mouse indicator, with the others staying put -- even though I was using an Amiga mouse:

 However, that is all a bit dull to look at, so I made a short video showing it in action, with me turning on and off the 1351 emulation mode on and off in real-time:


So now the mouse all works very nicely, and I can work on software that uses the mouse.

Tuesday, January 16, 2018

Rebuilding SD card access

In my enthusiasm reworking the SD card and F011 code, it turns out I got a bit overzealous, and stripped out the code that provides the memory-mapped sector buffer for SD card access. Clearly not optimal. So I had better fix that up.

Before the rework, there were three separate sector buffers:
1. One for the CPU to read as a memory mapped sector buffer;
2. One for the CPU to write to, and the SD controller to read from; and
3. One for the F011 emulation.

At the moment, only the third one is still there.

This means that SD card access is currently only working for mounted disk images -- but there is no way to mount a disk image, as the Hypervisor can't even read the partition table of the SD card.

We could go back to having three buffers, but that seems a waste of precious BRAMs.  The question is whether we can do everything we need with just the one buffer, or whether we need two.  The answer to that is that we do need a second buffer if we want to have memory mapped access to the sector buffer.  That second buffer would be connected to the MEGA65's FastIO bus for reading, so that the CPU can read from it when it wants to. This buffer would be written to when the SD card controller reads data from the SD card.  To allow the CPU to write to the buffer, we would trap writes to the sector buffer location, and pass them to the SD controller to actually perform the write operation.

The complication is that when the SD card controller is asked to write a sector, it has no way to read this buffer, as only the CPU can cause it to be read.  We can solve this by making the F011 sector buffer 1K instead of 512 bytes (since BRAMs are 4K, this doesn't cost us anything extra), and whenever the CPU writes to the SD card sector buffer, we also write to the second 512 bytes of the F011 buffer.  When the CPU is asked to write a sector to the SD card, it uses this second copy of the data, which it does have read access to, in order to perform the write.

Another wrinkle is virtualised F011 mode, where the hypervisor gets a trap whenever a sector read or write is attempted.  This is used with monitor_load to allow feeding a disk image via the serial monitor interface, instead of using the SD card (handy for some software development tasks, and if your SD card interface is broken, as it is on Falk's r1 PCB).  So I need to preserve that.

Probably the best solution here is to have the two buffers, each with 2 x 512 bytes, with the lower half the F011 buffer, and the upper half the SD card buffer, and have a register bit that allows selecting which one is being accessed.

After a bit of fiddling about, this is all done now and working nicely, and the saved BRAM is also a nice result.

Switching back and forth between the SD card and floppy drive works, but it seems that it is possible for the track numbers to get out of sync, so it is necessary to seek to track 0 after switching from SD card, to make sure that everything matches up.   Ideally we would have some program to allow switching back and forth in a really easy way. Initially modifying the disk chooser menu program is probably the right way to do this, so that there is an option to select "internal drive" as one of the disk choices.

After that, the next step on the floppy drive now is to get writing working, including ideally formatting disks which requires unbuffered writing.

Improving my bench MEGA65 prototype hardware

After the last several posts focussing on VHDL implementation of various interfaces and things, here is a much shorter read with more pictures, following the improvement of my bench-test MEGA65 revision 1 PCB. 

Here it is before improvement:

 

The main problems I wanted to solve were, in no particular order:

1. The floppy drive had to live externally and loose, and be powered by an adapter on the joystick port. I want it internal, and firmly held in place.

2. No keyboard. Is further explanation really necessary?

3. The headphone jack and FPGA programming port are very close, which I can't change, but the hole in the case for them was too small to allow both to be plugged in at the same time.  Very annoying when the kids want to play games, and I have to keep unplugging the sound to plug in the data interface and vice-versa.

4. The hole for the joystick ports was very tight, and needed a little enlarging.

5. The hole for the HDMI port was also too small.

6. The cartridge port hole was also too small.

7. I wanted to install pull-ups for the IEC bus, so that it would just work, without having to plug anything strange in (and so that it wouldn't lock the C65  ROM during boot-up).
 
8. I wanted all the improvements to result in a reasonably physically robust arrangement, that wouldn't be at risk of falling down and damaging itself or shorting itself out when used, whether by myself, or by the kids.

The clear plastic case is big enough for everything, so I figured I would just enlarge holes, and in the case of the floppy drive, make some new holes.  It would be nice if I had the correct tools for cutting holes in plastic. Instead, I have a power drill and some 3mm wood drill bits.  Making the hole for t he floppy drive consisted of drilling perforations around the outline, and then joining them up using the drill as a kind of power saw.  Not ideal at all, but it worked. Here it is with most of the holes drilled:


Then with all the holes joined up, and the piece knocked out, but yet to be filed into a nice rectangle. Sorry the shot is a bit blurry. Cameras don't like photographing nearly invisible objects very much.


 We'll come back to that a bit later.

Then it was time to think about how to attach me genuine C65 keyboard (without printet key caps) to the top of the box, such that it couldn't fall off, fall in, or snag the fragile ribbon cable that connects it to the mother board.

I had a piece of acrylic the right size to sit on top of the box, so I traced out around where I wanted the keyboard to be mounted on it:


Then fitted a couple of scrap plastic blocks to the underside, so that when it rests in the top of the plastic case, it can't move in any direction.  Here is the arrangement from underneath:


With those in place, and a couple of extra holes to fix the keyboard to the top (using the existing two holes in the keyboard, which presumably were designed with a similar purpose in mind), I had a keyboard sitting nicely and securely on top of the box:


Here you can see it from the side, with the green plastic thingoes you put in walls before you put screws in.  If only I could remember their name. Anyway, even nameless they work just fine for this job:


Then it was time to fix a few issues with the PCB, adding a floppy power connector, and pull-ups for the IEC serial bus, so that that can work without any special cable attached. By permanently fitting the pull-ups, it means I can't use the IEC port to drive the POT lines on the joystick, as I did while implementing 1351 mouse support, however as that is done, and since I can use an Amiga mouse transparently (a blog post on that coming up soon), I figured this was no great loss. The expansion/cartridge port was the easiest place to find power:


So after doing that, and having finished making the hole for the floppy drive (which is held in place with four screws on the underside, like in an old PC), I had all the electronics inside. By this time I had also already enlarged the various holes in the case.


Connecting the ribbon cable for the keyboard was fairly straightforward:


Here it is all together. They keyboard ribbon pokes up a bit, which I don't really like, as it is still at some risk of damage like that. But it is not snagging on anything or under any tension, so it will have to do for now:
 And the view from the lest side with the power and joystick ports etc:


 And set up in my office, ready for use:
So while it is clearly a bench prototype, it is now all assembled and functional, without a mess of cables and having to plug and unplug things all the time.

Monday, January 15, 2018

Bringing the internal 3.5" floppy drive to life - part 2

Again, a longer post documenting the process of making the real 3.5" floppy drive work.  What I thought would be hard, the MFM decoding sector reading from the real disk, turned out to be quite easy -- taking only a day or so.  But what I thought would be easy, plumbing this into my existing F011 implementation, turned out to be a long string of strange bugs to track down, and took a week or so to work through.  Anyway, here it is.

So, yesterday, I managed to successfully decode MFM pulses from the 3.5" floppy drive, but only in a little C test program on my laptop.  Today, I want to move that into VHDL, make sure I can decode MFM data there, and then correctly parse the sector markers etc, to find a requested sector, and push the sector bytes out, and do all the other little bits and pieces, like CRC checking, required to plumb it into my F011 floppy controller implementation in the MEGA65. The result should be that the MEGA65 can read data from a real 1581 floppy disk in the 3.5" floppy drive.

Yesterday I had broken down the MFM parsing into a set of clearly defined steps: measure pulse gaps, quantise pulse gaps, turn pulse gaps into bits/sync markers, turn bits into bytes.  Turning those things into VHDL was rather easy, give or take the odd spot of debugging.  Similarly, implementing a parser that looks for sync marks and captures the track/sector numbers and compares them with requested track/sector, and works out whether the following sector data should be read out was also fairly easy.  Then came the CRC checks.

The 1581 uses a CRC check that is similar to, but not exactly like the CCITT CRC16 algorithm.  The C65 specifications manual provides an explanation and example routine:


Generating the CRC

     The  CRC  is a sixteen bit value that must be generated serially,
one  bit  at  a  time.  Think of it as a 16 bit shift register that is
broken in two places. To CRC a byte of data, you must do the following
eight  times,  (once  for each bit) beginning with the MSB or bit 7 of
the input byte.

     1. Take the exclusive OR of the MSB of the input byte and CRC
        bit 15. Call this INBIT.
     2. Shift the entire 16 bit CRC left (toward MSB) 1 bit position,
        shifting a 0 into CRC bit 0.
     3. If INBIT is a 1, toggle CRC bits 0, 5, and 12.

     To  Generate a CRC value for a header,  or for a data field,  you
must  first  initialize the CRC to all 1's (FFFF hex).  Be sure to CRC
all bytes of the header or data field, beginning with the first of the
three  A1  marks,  and ending with the before the two CRC bytes.  Then
output  the  most  significant CRC byte (bits 8-15) and then the least
significant CRC byte  (bits 7-0).  You may also CRC the two CRC bytes.
If you do, the final CRC value should be 0.

     Shown below is an example of code required to CRC bytes of data.

;
; CRC a byte. Assuming byte to CRC in accumulator and cumulative
;             CRC value in CRC (lsb) and CRC+1 (msb).

        CRCBYTE LDX  #8          ; CRC eight bits
                STA  TEMP
        CRCLOOP ASL  TEMP        ; shift bit into carry
                JSR  CRCBIT      ; CRC it
                DEX
                BNE  CRCLOOP
                RTS

;
; CRC a bit. Assuming bit to CRC in carry, and cumulative CRC
;            value in CRC (lsb) and CRC+1 (msb).

       CRCBIT   ROR
                EOR CRC+1       ; MSB contains INBIT
                PHP
                ASL CRC
                ROL CRC+1       ; shift CRC word
                PLP
                BPL RTS
                LDA CRC         ; toggle bits 0, 5, and 12 if INBIT is 1.
                EOR #$21
                STA CRC
                LDA CRC+1
                EOR #$10
                STA CRC+1
       RTS      RTS

It is super helpful to have an example implementation, as well as explanation. Nonetheless, it took me about 2 hours to actually get the CRC calculating correctly, as CRC routines are notorious to get exactly right, as they require considerable attention to detail and very sound comprehension of the algorithm.

Another interesting problem was how to test this in simulation.  I already have a debug register on the MEGA65 that allows me to read the FDC data read line. However, as some signals can be as narrow as 120ns, this requires sampling at at least 5MHz.  Using DMA I could sample it at around 20MHz, however, this meant being able to capture only a part of a sector.  And even at 50MHz, the CPU is fractionally too slow, unless I completely unrolled the data capture loop, in which case I would still only be able to capture a relatively few samples, as 64KB of data capture loop would only be able to cover about 10K samples.  What I realised after, was that I should just write a C program that MFM encodes a set of sectors, and feed that in.  If my existing C program for decoding MFM data can read it, then it should make fine test input data for the VHDL. Writing such a program would also be the first step towards being able to write data to floppies, so it is work that needs to happen, anyway.  If I have problems with the current work, I will certainly follow this path.

Then it was a case of debugging some out-by-one errors and the like on sector read lengths, and making sure the right flags are asserted at the right times.  Finally, the whole new MFM assembly had to be plumbed into the existing VHDL, and some new registers added to the SDcard (and now, floppy drive) controller, so that it is possible to select the SD card or the floppy drive as the source for C65 F011 floppy controller accesses.

At this stage, all the machinery is now in place for this to work, assuming that there are no bugs in my VHDL.  Because synthesis takes a long time, I have added some debug registers that will allow me to interrogate more closely what the floppy drive and MFM decoder are doing.  While it would be nice to think that I won't need them, and the test-benching of the MFM decoder with real captured data helps reduce the risks, I suspect I will be getting familiar with them.  We will see how that pans out in a few hours.

So, synthesis has finished, and the selection register to switch to using the real floppy drive seems to work, and the read request line gets asserted to the MFM decoder, but there is no sign of bytes being read from the drive.  So, I will do what I should have done before synthesising the first time, and debug registers to see right down to the lowest level of the MFM decoder.

Okay, next resynthesis, and I can see that it is finding gaps, and decoding bytes, but it never finds a sync byte. This is likely due to the register I setup for setting the number of cycles per MFM bit to be limited to too low a value, thus preventing the gaps from being properly detected.  I rather confused myself here for a while, because I couldn't remember definitively the data rate for a 720K drive. Was it 250K, 500K or 1Mbit?  It took a long time to actually find a list. It is 250K, so 4 usec per bit. At 50MHz, this means we should be using 200 as the cycles per interval. So I should be seeing gaps of around 200, 300 and 400 cycles.

So, in terms of gaps, I am seeing plenty of them around 200 cycles, corresponding to runs of identical bits. This makes sense.  I am also seeing a reasonable number around 400, which also makes sense, as well as some around 300. However, I am also seeing a lot of seemingly randomly distributed gaps, anywhere upto at least 1000 cycles, which is more than 2x the maximum we should see.  There is something clearly wrong here.  However, looking at the source code for the gap collecting code, it is outrageously simple, and of course, it works fine in simulation.  What is particularly curious is that the failure mode appears to be the missing of pulses, not seeing spurious pulses, for example, if there was high frequency glitching on the floppy read-data line.  Yet to be missing pulses is very strange, since the debug register that allows direct reading of the floppy read-data line shows no sign of these random-length pulse intervals. This makes me a little worried about intermittent glitches I have seen, where a couple of registers in particular are read with the wrong values.  As much as I would like to blame the synthesis tools, it is quite possible I have a subtle timing bug that might just be tickling things here.  My immediate next step is to resynthesise with the mfm_gaps module outputting a history of the values it has seen on the f_rdata line, so that I can see if it is indeed missing pulses.

And things get stranger.  While waiting for synthesis of the above debug register, I tried again, this time simply having the M65 try to boot the C65 ROM using the real floppy drive.  In this situation, the C65 ROM will try to load an auto-start file from the floppy.  And all of a sudden, I am seeing better distribution of gaps. I think that what is happening here is the ROM steps the head, which results in it being definitively on a track, where as if the drive was powered off, the head might have been able to move off track a little (or outside tracks 0 - 79).  I'm still not really sure, but it seems to be something mechanical.

To avoid the long resynthesis times,  I have imported my MFM controller into my joystick controlled test bench.  That way I can iterate after only a few minutes. The down side is I have very limited input and output, and can't capture and stream signal values.  This is what I used to work out the stepping the head helps.  However, even after stepping, while I see quantised gaps being detected, with essentially no invalid gap lengths, it is still not detecting any sync bytes.
So, I added yet another debug option (and had another 8,000 second synthesis) to log the quantised pulse gaps, and wrote a little program to interpret a capture of those signals.  That is, I tested the function of the pulse detection and gap quantisation.  And the results are good. Here is what I saw from my captured trace after decoding:

 $42 $42 $42 $42 $42 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72
 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72 $72
 $72 $72 $72 $70 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $05
Sync $A1
Sync $A1
Sync $A1
 $fe $00 $01 $01 $02 $fd $5f $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00
Sync $A1
Sync $A1
Sync $A1
 $fb $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
...

$00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00 $00 $00 $da $6e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
 $00 $00
Sync $A1
Sync $A1
Sync $A1
 $fe $00 $01 $02 $02 $a8 $0c $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e


First, we can see that Sync $A1 bytes are correctly detected.  Second, we can see that a complete sector is observed, with the $FE header ID byte and track, side, sector and sector size bytes. Then there is a $FB ID byte following the next burst of Sync bytes indicating the sector, and a completely empty sector, followed by two CRC bytes s($da and $6e), and the $4e and $00 gap bytes, before the header of another sector is visible. I did notice an error in checking this: I was pulling the header bytes out as Track, Sector, then Side, not Track, Side and then Sector. That's easy enough to fix.

So now we know that the intervals are being correctly quantised, and the stream of intervals that should be detected as a Sync mark are being generated.  Attention thus turns to the mfm_gaps_to_bits module, where the sync bytes are generated. Assuming that the problem must be in the collection and assembly of the strings of gaps into bytes, I am modifying my VHDL test rig so that I can sample the most recent four gaps at any point in time, and see that sample held.  I am wondering if the gap_valid signal is being asserted for more than one clock cycle, for example, which is causing a gap to be registered more than once.  It is also possible that gaps are being missed, but that will be harder to determine.

The logged sets of recent gaps seem to be okay, and I even managed to see a sync mark. So I added a counter for the number of times the sync marks are found. I also added a safety catch in case the strobes to indicate a new pulse has been seen were sticking around for more than one clock cycle.  Whether that was the issue or not, I am seeing of the order of 256 sync marks per second.  Given that there should be 5 rotations per second, and 10 physical sectors each with 2 x 3 sync marks per second, this means I should be seeing 5 x 10 x 2 x 3 = 300 sync marks per second.  This basically matches my eye-ball investigation of ~256 per second, watching the highest order bit of an 8-bit counter blinking about twice per second.  In short, it seems reasonable to conclude that sync marks are now being reliably recognised.  The question is whether that tweak to the strobe line handling fixed anything, or not. This is at least easy to test: Change one line of VHDL to remove the extra check I put in place, and then resynthesise the VHDL test rig. Hmm. Seems that it wasn't necessary.

So, it seems that the problem must be higher up the processing chain.  So I took a closer look at the top level mfm_decoder module, and noticed a couple of errors in CRC checking of sectors, and when the found_track/sector/side variables are set.  I then realised that my test bench tested that the sectors were found on the disk, but not that this information was communicated back up to the caller.  Needless to say I have fixed those now, and am resynthesising.

While waiting for synthesis of the whole MEGA65, I also made the same change to my VHDL test rig.  Finally, that is now finding sectors! I can also see which track I am on, from the reported track number, as well as watching the sector numbers cycle through.  Now I am impatiently waiting for synthesis to complete, so that I can test it in the MEGA65 again...

Resynthesis has finished, so now trying it out.  I have hit the problem that sometimes happens where some registers for the SD card interface and the non-existent input switches misbehave. This mean booting up without an SD card, and hence, without the F011 disk present flag being set by mounting a D81 image.  It turns out that this is the only source of the disk present flag at the moment. So I need to plumb the disk ready line. 

However, I have just hit the exact same problem that causes Amiga 600 and 1200 disk drives to tick constantly when there is no disk inserted: The /DISKCHANGE line does double duty as the disk present signal.  As it doesn't seem right for an 8-bit computer that is likely to only sometimes have a disk in the drive to tick like a bomb, I'll just lie and tell the F011 that it always has a disk inserted when it is talking to a real floppy drive.

I also found a similar problem in the logic that checks if a disk is available before dispatching a sector read or write job via the F011.  I have now fixed that logic ready for next synthesis. This was again because the existing F011 code assumed that the SD card was the only source of floppy disks.  It also meant that I could work around it by setting the disk image enable flag for floppy drive 0, so that it would look like a disk is available to the internal logic.

With those two fixes, the C65 BASIC no longer reports DRIVE NOT READY when attempting to run a DIR command.  Instead it hangs... because I don't assert the Request Not Found (RNF) flag if a requested sector is not found.  Reading the C65 specifications, the RNF flag should be set if the requested sector has not been found before six index pulses, i.e., within about a second.
This is implemented now, ready for the next synthesis run.

The hanging, I think, was because I had stepped the disk drive to a different track, which confused it, as the sector headers would not have matched, and so it would have kept on searching.  Now with the drive on the correct track, the DIR command returns a broken directory, that looks to me as though it is reading all zeroes from disk, or some similar problem.

Taking a look at the sector buffer for the F011, it looks like it is finding all zeroes.  Ah, that would be because I set the "match any sector" flag.  This is actually encouraging, as it means that it is reading the empty sectors on the disk.  With that flag turned off, DOS is back to hanging. Perhaps I have the side 1/2 head select flag wrong?  The request that it is currently stuck on is track 39 (this is floppy track numbering, the C65 is talking about track 40, the directory track on a 1581 disk), sector 1, side 0.  Checking the sectors that the MFM decoder is seeing (via the debug registers I added at $D6A3-5), I can see that it is in fact correctly finding the sectors for that track and side.  More specifically, I can see that it finds track 39, sector 1, side 0.  Yet it never seems to indicate to the F011 that it has read the sector, and the busy flag stays asserted, which is why DOS is hanging.

At this point, I could try to work my way through the C65 DOS to figure out everything that is going on. However, as the regular SD card mode of operation works fine, I am instead going to write a little test program that tries to drive the F011 to read a sector, and see if that works. If not, then I will know where the process is failing.  If it does work, then I can take a look at the DOS after all.

This reveals that reading a sector doesn't seem to properly complete. So I finally got around to writing a program that produces a whole sector's worth of MFM data to input into simulation, to see if that sheds any light on the situation. Found one more bug: sector_found was being reset the same cycle as the sector_end flag was being asserted, and the MEGA65 was not handling that correctly.  So I have added a fix for that, which is now synthesising.  I have also added some extra debug registers so that I can see if the MEGA65 thinks it is accepting sector bytes from the MFM decoder.  Synthesis time again.

Following synthesis of the above, I have worked out that the SIDE flag from the F011 registers is being inverted in sense, so I have pushed a fix for that, which will require another synthesis.  The next mystery is that the Request Not Found flag is still being set, even when the track and side are correct. Setting the match any sector flag solves this, by accepting any sector. So this suggests that there is still something faulty with the track/sector/side match logic.  Indeed, I can see that the found_track/sector/side matches what is being asked for from the F011.
This problem was caused by comparing unsigned values, instead of first casting them to integers.  This is one of many really annoying "features" of VHDL.

After various other little fixes, I can now see the correct sector is being found, and the various signals are being asserted, that should be telling the MEGA65 that bytes are ready for writing to the sector buffer as they are read from the floppy drive. However, still no bytes are read.  The combinations of signals that I am seeing, using the debug register $D6A1 as the source of my information are recorded by incrementing a byte of RAM corresponding to each value read using a routine like LDX $D6A1, INC $0400,X. In this way the contents of $0400-$04FF form a histogram, which I can see on screen as it is gathered, and yet the sampling rate is still able to be ~1MHz.  This will miss some instances of short-lived signals, however, by repeatedly plotting in this way, those will still tend to show up over time.  So here are the combinations of signals I am seeing:

fdc_sector_end - Very frequent. This is pulsed each time the end of a sector is reached, whether or not it is the sector we are looking for.

fdc_sector_found - Is held for a period of time, ~5 times per second, i.e., each time the sector we are searching for passes under the head.  This is encouraging, as it means we are finding our sector quite reliably.

fdc_sector_found | fdc_sector_data_gap - As above. This indicates that we have passed the sector header, and are now in the gap between the sector header, and the data of the sector itself. Again, a very healthy sign.

fdc_sector_found | fdc_byte_valid - This also pulses a number of times each rotation, indicating that the MFM decoder has the bytes off the sector to serve.

fdc_sector_found | fdc_byte_valid | fdc_first_byte - This gets indicated only once per reading of the sector, to indicate the first byte of the sector is being presented. Again, a very healthy sign.

So, we have clear evidence that the sequence of signals is more or less correct when the MFM decoder is doing its job.  It should be noted that these are generated automatically, whether or not the F011 has asked for a sector to be read or not.  When a read sector command is given, there is an extra signal that is asserted by the F011 so that it knows that it should accept the bytes presented by the MFM decoder, fdc_read_request.  By running a bunch of read requests while my histogram is collecting, I can verify if all of these combinations also occur while fdc_read_request is asserted. 

Because the histogram is running continuously collecting, I have to manually issue the read command via the serial monitor, so the number of samples is much smaller.  As a result I didn't detect any instances of the rather rare combination of fdc_sector_found | fdc_byte_valid | fdc_first_byte, however the face that all of the others are showing up gives me confidence that it should be showing up.

For a byte to be written, fdc_read_request, at least one of fdc_sector_found and fdc_sector_end, and fdc_byte_valid must be simultaneously asserted. From my histogram, I can see that fdc_read_request | fdc_byte_valid | fdc_sector_found happens quite frequently, as one would expect. That is, the conditions are satisfied for writing the bytes read from the floppy into the buffer. However, there is not so much as a single byte changed in the buffer. The logic that makes this decision consists of a couple of nested if statements, so I might put something in one of the outer if statements, so that I can see if it is getting there.  I'll also go through the synthesis warnings to see if there is anything fishy there that looks like it could be causing this problem.

Nothing fishy turned up in the synthesis warnings, however instrumenting those if statements also showed that it never gets to the appropriate test.  A little further searching, and it looks like the fdc_sector_end signal was not being handled correctly: Whenever it occurred -- whether it was the end of the sector that was being searched or not -- it would cancel the current read request.  I have now added this extra check, and am resynthesising. 

As I was thinking about this bug, I was initially thinking that this should mean I have a 1 in (sectors per track) chance of reading a sector. However, it instead required that the read request be initiated in the gap after the end of the sector just before the one desired, and the sector to be read.  This was quite a comforting revelation, as it explained why I was never seeing a sector being read -- because the probability was probably less than one in a thousand that this condition would be met. Even when set to accept any sector, the inter-sector gap is only a small percentage of the length of a sector.  Now we await synthesis to see if this theory is correct.
 
Indeed, with that fix, the sector data thinks it is being read into the sector buffer. I say thinks, because the sector buffer data doesn't change.  I lost several hours to this problem, until Falk reminded me that we have two copies of the sector buffer, one where the CPU controls access to it, and the other where the SD card writes directly into it, because the SD card can't be held off while the CPU is looking at the sector buffer, and vice-versa, when the CPU is reading from the buffer, there isn't a mechanism to add an extra wait-state if the SD card (or now also the floppy controller) is writing to the buffer.  There is special logic that allows each side to cause writes to happen to the other copy of the sector buffer, and I had forgotten to implement that for floppy reads. That is now synthesising.

Although the data itself is not being read into the sector buffer, in theory, the status signalling should be mostly complete, and I should be able to boot the C65 ROM with the real floppy drive enabled, so I tried doing this. This revealed that there is a problem with the track stepping: The DOS gets stuck looking on boot trying to read 1581 track 40, sector 0, because no data ever arrives, which is because the floppy drive has stepped only to track 38, not track 40. I'm not quite sure what the cause for this is.  If I manually step the drive forward to the correct track, the read can then complete, although I think I am seeing a problem with the buffer empty/full flag being erroneously set to indicate the buffer is empty, rather than full. This is used by the C65 ROM to work out if a sector has been read or not.  This might just be because of how I am probing the registers to debug, as the EQ flag inhibit when reading gets cleared if $D087 (the port register to the sector buffer) is read. Since I am reading 16 registers at $D080-$D08F as a batch, this would cause that problem.  So I expect that this is a non-problem. So it really just leaves the track stepping problem.

The C65 DOS steps tracks assuming that this works without problem, and always lands on the correct track. The routine for seeking to a track is at $9B98, and looks something like:

findtrack:
   phy
   ldy #$00
   ldx $1FE8    ; presumably the current drive ID
   lda $d084    ; The target track
waitforready:
   jsr $9A88    ; Wait for F011 busy flag to clear
   cmp $010F,x ; compare with the track we think we are on?
   beq foundtrack
   bcc lowertrack

   inc $010f,x   ; increase the track we think we are on?
   ldy #$18
   bra issuestepcommand

lowertrack:
   dec $010f,x      ; decrease the track we think we are on?
   ldy #$10

issuestepcommand:
   sty $D081     ; Tell F011 to step
   bra waitforready

foundtrack:
   tya
   beq done      ; if we didn't step, skip head settling time
   jsr waitforheadsettle

done:
   ply
   rts

Single-stepping through this routine, it seems to do what it should. However, after that $14 gets written to the command register in the waitforheadsettle routine. And that's where the problem is: I was interpreting that as a head step command, because the high-nybl is $1.  Again, a couple of lines to fix it, and probably a couple of hours of waiting for synthesis to run.  Then I found that the F011 disk-side select line was inverted, and I wasn't updating the pointer to the location in the second copy of the sector buffer, i.e., the one that the CPU sees.  So, it's off to synthesis again, but very much with the feeling now that it is the last few barriers this time.

It turns out there were still a number of other little niggly bugs to track down with filling the sector buffer, which once solved frustratingly still haven't quite got it working.  Sectors do now get read, and loading a directory goes through the motions of working -- but it is reading the wrong sectors for some reason.  Or perhaps looking at the wrong halves of the sectors, since the 1581 uses standard 512 byte sectors as the on-disk format, which each contain two logical 256 byte sectors, presumably because the PC-style floppy controller IC it uses doesn't support 256 byte sectors, or because using 256 byte sectors would have resulted in a slightly lower disk capacity.  Whatever the cause, the question is how SD card and floppy can behave differently, when all the buffer pointer management is the same regardless of whether a sector is read from the SD card or from the floppy drive -- and yet the problem only occurs when reading from the floppy drive.  Thus there must still be some subtle difference. 

To try to figure out what the difference might be, I tried loading the same sector from both SD card and from the real floppy drive.  Lo and behold, the sector from the floppy drive was rotated by one byte through the sector buffer.  Looking through the source code, I can see that the buffer pointer in question was not being zeroed when the read sector job was accepted, i.e., the part common to both SD card and floppy drive access, but instead in the setup stage for SD card access. Satisfying that I can see that this makes sense. In retrospect, I saw that the buffer write addresses were out by one in simulation, but the consequences didn't fully occur to me at the time.  Anyway, time for synthesis again...

Okay, so now the sector buffer rotation bug is fixed, and yet DIR still shows gibberish, as though it is reading the sector data incorrectly.  The C65 DOS uses unbuffered reading, where it accepts each byte as it is supplied by the F011 floppy controller, rather than waiting for it all to arrive. So I wondered if there was some sort of bug in that handling -- although, again, for SD card reads, it works without problem.  So I wrote a test program to make sure that unbuffered reading works correctly, and the correct data is read out, and in order. All seems to be correct there.  However, when trying to load the directory of a disk, it still looks very much like it is accessing the wrong half of the sector.

Unlike for the sector write offset, I can see no difference in the way the read pointer into the sector buffer is handled between the FDC and SD card data paths. Yet, like the last time I said this, there must be some difference, or else it would be working.

The F011 has a SWAP bit, that allows the halves of the sector buffer to be switched from the CPU's perspective, and the handling of this is the prime suspect as the root cause for this bug.  Single stepping through the C65 DOS, it turns out that this is not the problem.  Rather, the problem is that the floppy drive reads data at a slower rate than the CPU can read it, and the EQ flag that indicates if the buffer is empty claims not-empty as soon as the sector starts being read, regardless of whether the data has been read far enough or not.  As a result, the problem is actually that the C65 DOS is reading the contents of the buffer, before it has been filled from the floppy drive.  So the SWAP flag can stay as it is, but the EQ flag requires correction.

In the end, I looked at the whole handling of the EQ flag, and with the distance of time from when I first implemented, it seemed rather over complicated to me.  So I re-worked the entire sector buffer handling code, so that it now works much more like the real F011, and the duplicate copies of the sector buffer to solve bus contention have been reduced to a single buffer, with a bit of clever pre-fetching and write-buffer. The result is a whole lot simpler, and, it works.  FINALLY I can stick a 1581 disk in the real floppy drive, type DIR from C65 BASIC, and get a directory listing. Here is one I specially formatted the other day using my resurrected 1581:



And here is an old disk from the mid-90s when I was using the C65 I had at the time as my main computer:


I tried a bunch of my other old 1581 disks, but only a couple would work.  What I don't know is whether that means those disks have simply rotted away, which is entirely possible after a 20+ years, or that my error correction for MFM decoding could do with a bit more work.  I guess I will pull them out again at some point, and do some data captures from them, and from there work out whether my error correction could be improved. But adding write-support to the floppy drive is a higher priority than that.

Wednesday, January 10, 2018

Bringing the internal 3.5" floppy drive to life - part 1

While it will take a while longer before it can read or write disks, the next step for the internal floppy drive is to make it mirror what the SD-card floppy emulation thinks the drive should be doing, for example, when the motor should spin, the light come on, and when the head should step in and out tracks. I also want to have debug registers that allow me to directly control and read the floppy drive state, so that I can work towards being able to read and write real disks.

So, first step is to plumb the floppy status and control signals into the VHDL module that handles F011 emulation and SD card access, and provide a debug register for those.  This is already done, and $D6A0 is the register.  Writing sets control lines, and reading reads the status lines. As this is a debug register, you have to remember the state of the control lines yourself.  The control lines are:

7              f_density - 0=1.44MB, 1=720K
6              f_motor - 0 = motor on, 1 = motor off
5              f_select - 0 = drive selected (and LED on), 1 = drive not selected
4              f_stepdir  - 0 = step inwards, 1 = step outwards
3              f_step - 0 = generate step pulse (two required per track)
2              f_wdata - bit to write
1              f_wgate - 0 = turn write head on
0              f_side1 - 0 = head on side 1 selected, 1 = head on side0 selected

The status lines are:

7              f_index - 0 when passing over index hole
6              f_track0 - 0 when head is over track 0
5              f_writeprotect - 0 when disk is write protecteed
4              f_rdata - data bit read from head
3              f_diskchanged - 0 when disk has been changed

I have already confirmed that I can make the motor and LED turn on and off. Next step is to write a little test program that lets me test all of these functions.

To do this, I need to make a set of pull-up resistors for the floppy interface, as the revision 1 PCB lacks pull-ups on the status lines, so once they go to 0, they never go back to 1.  The pull-up resistors gently pull the lines high again, so that when the floppy stops driving them low, they return high, and thus back to a 1.  As previously mentioned, it is rather annoying that the 34-pin floppy cable has no +5V line, which makes it a nuisance to make a pull-up kit that can fit on the cable. Fortunately, however, there are some lines that we control, and can drive high.  For current testing, the density select line can stay +5V, since we don't need to do any 1.44MB disk access just yet. Here is my little home-made pull-up kit:


With this plugged in, I can easily see when the track 0 sensor etc, so all is good. I can also see data being read from the test disks, so that answers that question. Writing is more complex, so can't be immediately tested.

XXX - Drive stepping, LED and motor tracking SD card working (short video).

So, with that working, now I want to get reading data from the floppy working.  The first step is to acquire a decent slab of data from one of the tracks on the floppy.  The trick is how to capture it at a decent rate, since floppy data pulses come at approximately 500Hz, but the pulses are only 150ns - 800ns wide.  The easiest solution is to DMA read the register, as this will read every two clock cycles (the alternate clock cycle will be writing the value that had been read into memory), for a sample time of 40ns. This creates a data capture problem, because the MEGA65 has only a limited amount of RAM. Using a single DMA to read 56K samples (we could push it to 64K, but that would require fiddling with memory a bit more). At 40ns, that equates to 2,293,760ns = 2.29 milliseconds.  Given that a floppy spins at 300 RPM = 5 revolutions per second, a single rotation is 200 ms, so we are sampling only ~1% of a track this way. Admittedly at very high resolution.  This is not really enough to capture even a single sector for decoding. However, what it is useful for is to let me see just how long the pulses are on this floppy.  Here is a sample of the captured data:

:0012310 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012320 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012330 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012340 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012350 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012360 1F 1F 0F 0F 0F 0F 0F 0F 0F 0F 1F 1F 1F 1F 1F 1F
 :0012370 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012380 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :0012390 1F 1F 1F 1F 1F 0F 0F 0F 0F 0F 0F 0F 0F 1F 1F 1F
 :00123A0 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :00123B0 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :00123C0 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F
 :00123D0 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F

We know from the table of input signals above, that it is only the upper 5 bits that matter.  These samples all have bit 7 (index hole sense), bit 6 (track 0) and bit 5 (write-protect) equal to 0, which means asserted. That is, we are reading track 0 on a write-protected disk, while the index hole is passing.  Bit 4 is the data bit itseslf, and we see that it is mostly high, with a couple of pulses lasting 8 samples each, i.e., 8 x 40 ns = 320 ns in duration.  The length of the pulses is within the specified range, so that looks good.  In this case, the pair of pulses are 51 samples, i.e., 51 x 40 = 2040 ns = ~2 usec apart.  Then pulses appear throughout the capture at varying intervals, as we expect.

Without going into the gory detail of how MFM works, a summary is that the gaps between the pulses vary to encode the information.  For a given data rate, gaps of 1, 1.5 and 2 time units are possible, corresponding to the reception of the following bit sequences:

+-----------------+--------+--------+
|Last received bit|Interval|New Bits|
+-----------------+--------+--------+
|      NONE       |   1.0  |11 or 00|
|      NONE       |   1.5  |   01   |
|      NONE       |   2.0  |  101   |
|        0        |   1.0  |    0   |
|        1        |   1.0  |    1   |
|        0        |   1.5  |    1   |
|        1        |   1.5  |   00   |
|        X        |   2.0  |   01   |
+-----------------+--------+--------+
There are some exceptions to this for synchronisation marks, where the pattern can be different.  In particular, the common "A1" sync mark consists of intervals of 2.0, 1.5, 2.0 and 1.5.  Encoding $A1 using MFM would normally result in gaps of 2.0 (101), 1.5 (00), 1.0 (0), 1.0 (0), 1.5 (1), where values in brackets are the bits of the byte $A1 (= 101000001 in binary) being encoded. The ambiguity at the start of a byte is solved by using these sync bytes to first synchronise the decoder at the start of a string of bytes.

We should therefore expect to see disk data consisting of long runs of intervals that are 1.0, 1.5 and 2.0 times the basic time interval.  The first time I captured data, I was seeing all sorts of crazy intervals, which had me thinking that something was terribly wrong.  However, second time around, the stream looks much better, with intervals all within a 1:2 range of size ratios, as we expect.

So now I need to write a little program that tries to find the sync marks in the data stream, and then begin decoding the data from there, to see if it looks like what it should be. Here is how one of the traces decoded:

$e4 $e4 $e4 $e4 $ce $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00 $00
Sync $A1
Sync $A1
Sync $A1
 $fe $01 $01 $01 $02 $8b $eb $4e $4e $4e $4e $4e $4e $4e $4e $4e
 $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e $4e
$00 $00 $00 $00
 $00 $00 $00 $00 $00 $00 $00 $00
Sync $A1
Sync $A1
Sync $A1
 $fb $00 $00 $00 $00 $00 $00 $00 $00 $00 $00


The 1581 service manual describes the byte sequences that we should expect to see, as:

12 bytes of 00
3 bytes of Hex A1 (Data Hex A1, Clock Hex 0A) <- i.e., $A1 Sync
1 byte of FE (ID Address Mark)
1 byte (Track number)
1 byte (Side number)
1 byte (Sector number)
1 byte (Sector length. 02 for 512 byte sectors)
2 bytes CRC (cyclic redundancy check)
22 bytes of Hex 22
12 bytes of 00
3 bytes of Hex A1 (Data Hex A1, Clock Hex 0A) <- i.e., $A1 Sync
1 byte of Hex FB (Data Address Mark)
512 bytes of data
2 bytes CRC (cyclic redundancy check)
38 bytes of Hex 4E

If we try to match this with what we saw, it is pretty close.  There is some junk at the beginning, that looks like the tail end of the 38 bytes of $4E, allowing for lack of synchronisation, then we see the start of sector 2, track 1, side 1 (bold), following the prescribed format exactly, with the exception that the 22 bytes are hex $4E, not $22 (underlined). It is possible that $22 is a typo in the 1581 service guide, given that there are 22 bytes stipulated.  Indeed, we find evidence that this is the case from the C65 specifications guide, which describes the on-disk format as:


  quan      data/clock      description
  ----      ----------      -----------
    12      00              gap 3*
    3       A1/FB           Marks
            FE              Header mark
            (track)         Track number
            (side)          Side number
            (sector)        Sector number
            (length)        Sector Length (0=128,1=256,2=512,3=1024)

    2       (crc)           CRC bytes
    23      4E              gap 2
    12      00              gap 2
    3       A1/FB           Marks
            FB              Data mark
    128,
    256,
    512, or
    1024    00              Data bytes (consistent with length)
    2       (crc)           CRC bytes
    24      4E              gap 3*

    * you may reduce the size of gap 3 to increase diskette capacity,
      however the sizes shown are suggested.

Here we have the byte value $4E specified for gap 2, however, it suggests that the gap should be 23 bytes long, not the 22 we have observed, or that is stipulated in the 1581 service manual.

Nonetheless, we have proven that we can read sensible data from the disk, and use a simple table of relative gap size to drive decoding of the MFM data.  What we have not discussed here, is how to deal with the variable (and varying) disk rotation speed, and the errors that it can introduce.  A simplistic perspective on this is that we have to have something approximating a phase-locked loop that recaptures the clock on each transition that is encountered.  There are various ways to do this.  The C65 floppy controller supports three such algorithms, none of which exactly correspond to the method I have described here, of considering the gap intervals, rather than the arrival time of the transitions.  The error correction in my scheme relies on quantifying the gap intervals to be either 1.0, 1.5 or 2.0, with values in between being rounded to the nearest of those.  It stands to be seen how well (or badly) my scheme works in practice. I have also not yet covered generating or checking the CRC of sectors.

However, we now have enough information to be able to create a VHDL implementation that takes the raw input, extracts gap intervals, quantifies the gap intervals, detects sync marks, and extracts the sync and byte stream, and can feed this into a higher-level decoder that can check the track and sector that has been found, and extract the sector data.  Indeed, I can use the captured sequence as a test vector into this.  In short, we are well on our way to being able to read 3.5" floppies using the internal drive.