LLX > Neil Parker > Apple II > Speeding Up DOS 3.3
Let's face it: Apple DOS 3.3 is infamous for its slowness. Anyone who has ever watched it load a hi-res image is familiar with the problem, waiting as first a little bit of the image appears, then a little bit more, until the whole image has finally loaded, typically after around twenty seconds. Clearly it doesn't need to be that way: Apple's ProDOS, for example, can usually load the same image in under five seconds.
So what's making it so slow, and what can we do about it?
An Apple DOS 3.3 disk's surface is divided into sectors of 256 bytes each. At a low level this is the only way to access the disk, in units of 256 bytes at a time.
But file accesses don't conform to one-sector-at-a-time rules. It's normal, and quite common, to want to access just part of one sector, or to start partway through one sector and finish partway through another. So when a file is opened, DOS sets aside a 256-byte area of memory (the data sector buffer) for accessing its sectors, and then copies bytes into or out of it, one byte at a time. That way programs can read or write anywhere in the file, without regard to sector boundaries.
But all that one-byte-at-a-time copying takes time—time during which the disk continues to spin. In fact, while the copying is taking place, chances are the file's next sector is spinning past the read/write head unnoticed, so that by the time DOS is ready for the next sector, it has to wait a whole disk revolution for it to come around again. Whenever you load a file with more than one data sector, DOS spends most of its time waiting for the disk to spin (this is called blowing revs).
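To make the cost of a missed sector concrete, here is a toy timing model in Python. It is only a sketch under stated assumptions: time is measured in units of one sector passing under the head, a track holds 16 sectors, and the file's sectors are assumed to sit in consecutive slots (real DOS 3.3 sector ordering is more involved). The name `sector_times` is made up for this illustration.

```python
def sector_times(slots, busy, per_track=16):
    """Return elapsed time, in sector-times, to read the given slots in order.

    slots     -- physical slot number of each wanted sector, in read order
    busy      -- sector-times the CPU spends copying after each read
    per_track -- number of sector slots in one revolution (assumed 16)
    """
    t = 0  # the head is over slot (t % per_track) at time t
    for slot in slots:
        t += (slot - t) % per_track  # wait for the wanted slot to come around
        t += 1                       # read the sector itself
        t += busy                    # copy bytes while the disk keeps spinning
    return t


if __name__ == "__main__":
    # Hypothetical layout: 16 sectors in consecutive slots on one track.
    print(sector_times(range(16), busy=0))  # no copying delay
    print(sector_times(range(16), busy=1))  # one sector-time of copying
```

In this model, reading with no copying delay takes 16 sector-times (one revolution), while a copying delay of just one sector-time stretches the same read to 257 sector-times, roughly one revolution per sector.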
Several third parties produced modified versions of DOS that are much faster, such as Diversi-DOS, DAVID-DOS, and ProntoDOS. All of these work essentially the same way, by recognizing when a program is accessing a part of a file that starts at a sector boundary and still has at least 256 bytes to go. In that case, they load or save the sector directly to or from the program's memory, skipping the data sector buffer and all that one-byte-at-a-time copying. This is usually sufficient to catch the file's next sector the first time it comes around instead of the second, and the file access is up to five times faster.
Below, a patch is developed for DOS 3.3 that speeds up loading, and hopefully clarifies the mechanics of how fast versions of DOS work. (Saving is not patched here—saving is more complicated than loading, and speeding it up requires additional memory, and a slightly different approach.)
In preparing what follows, I've relied heavily on the classic DOS reference, Beneath Apple DOS by Don Worth and Pieter Lechner, and on the final DOS source code, rescued from oblivion by David T. Craig and findable on the usual Apple II download sites. Several snippets of assembly language below are taken from an assembly listing made from this source code—the addresses in the listing differ from those in Beneath Apple DOS because the source code assembles a "master" DOS that loads from $1B00 to $3FFF and then relocates itself. Addresses can be matched with normal 48K DOS, and with Beneath Apple DOS, by adding $8000.
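The address conversion described above is simple enough to check mechanically. The following Python sketch applies the $8000 offset to a few master-listing addresses that appear later in this article; the helper name `to_48k` is hypothetical.

```python
RELOCATION_OFFSET = 0x8000  # offset stated in the text for a normal 48K DOS


def to_48k(master_addr):
    """Convert a master-listing address to its 48K DOS equivalent."""
    return (master_addr + RELOCATION_OFFSET) & 0xFFFF


if __name__ == "__main__":
    # Addresses taken from the listings and JSR operands quoted in this article.
    for name, addr in [("RNXBLK", 0x2C96), ("LOCNXB", 0x30B6), ("DTBLN", 0x31B5)]:
        print(f"{name}: ${addr:04X} -> ${to_48k(addr):04X}")
```

So RNXBLK at $2C96 in the listing corresponds to $AC96 in a running 48K DOS, and DTBLN at $31B5 corresponds to $B1B5.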
As with any non-trivial DOS patch, the fast load patch has to go in memory
somewhere, and for the sake of compatibility it needs to go somewhere where
it won't add to DOS's memory needs. The good news is that DOS 3.3 has three
areas of unused memory inside itself where such patches can go (provided
that they aren't too big). The bad news is that Apple released three
versions of DOS 3.3, and in each version the amount of memory available in
those three gaps is different. The first version has the most space
available for patches like this, but it's the third release that DOS 3.3
users should prefer, since it (almost) fixes the notorious
APPEND bug.
In order to have the benefits of both a working APPEND and
fast loading, this patch assumes that my own APPEND patch,
which is discussed in another article, has
already been applied. As explained there, the APPEND patch
leaves more free space in the three internal gaps than any of Apple's
official DOS 3.3 releases, including more than enough space for a fast load
patch. In particular, memory from $B692 through $B6FC is free, and the fast
load patch will go there.
Of course there are drawbacks to this choice, in the form of software that assumes it can put its own code into DOS's internal gaps with impunity. The language-card version of Global Program Line Editor (GPLE), for example, overwrites memory from $B6B3 to $B6FC, so using it with this fast load patch is likely to be fatal. (Fortunately the 48K version of GPLE doesn't have this problem.)
Here's the routine that's responsible for DOS's slowness. This is the DOS file manager's "read a range of bytes" routine, which gets control whenever a request is made to read more than just a single byte.
2C96:                   ;
2C96:                   * RNXBLK - READ NEXT BLOCK
2C96:                   ;
2C96:          2C96     RNXBLK   EQU *
2C96:20 B5 31                    JSR DTBLN        ; GO DECR LEN (NOT RTN IF=0)
2C99:20 A8 2C                    JSR GETBYT       ; GO GET BYTE
2C9C:48                          PHA
2C9D:20 A2 31                    JSR MIBDA        ; GO MOVE BLOCK ADR AND INCR
2CA0:A0 00                       LDY #0
2CA2:68                          PLA
2CA3:91 42                       STA (ZPGFCB),Y   ; SET DATA BYTE
2CA5:4C 96 2C                    JMP RNXBLK       ; GO FOR NEXT BYTE
2CA8:                   ;
2CA8:                   * GETBYT - GET A DATA BYTE
2CA8:                   ;
2CA8:          2CA8     GETBYT   EQU *
2CA8:20 B6 30                    JSR LOCNXB       ; LOCATE NEXT BYTE
2CAB:B0 0B     2CB8              BCS EOFIN        ; BR IF EOF
2CAD:B1 42                       LDA (ZPGFCB),Y   ; GET DAT BYTE
2CAF:48                          PHA              ; SAVE IT
2CB0:20 5B 31                    JSR INCRRB       ; INCR REC BYTE
2CB3:20 94 31                    JSR INCSCB       ; INCR SEC BYTE
2CB6:68                          PLA              ; GET SAVED BYTE
2CB7:60                          RTS              ; RETURN
2CB8:                   ;
2CB8:4C 6F 33           EOFIN    JMP ERROR5       ; GO TO EOF RTN
This just reads one byte, repeatedly, into the target memory, until all the requested bytes have been read. This is the routine that we need to patch, so that it recognizes when a whole sector can be read directly into the target memory.
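So under exactly what conditions can a whole sector be read directly into the target memory without all the one-byte-at-a-time copying? Two are needed: the current file position must be on a sector boundary (the byte-within-sector is 0), and at least 256 bytes must remain to be read.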
But that's not all. DOS maintains two current file position
counters, the record/byte-within-record counter (updated by the JSR
INCRRB above), and the sector/byte-within-sector counter (updated by
JSR INCSCB). For figuring out whether we're on a sector
boundary the sector/byte-within-sector counter is the only one needed, but
for maximum friendliness to programs the record/byte-within-record counter
should also be maintained accurately. But given an arbitrary record length,
finding the new record/byte-within-record position after reading a whole
sector requires a full long division, which would use up more memory than
we'd like.
So this patch adds a third condition: the record length must be 1.
Since both the LOAD and BLOAD commands set the
record length to 1, this accounts for all of the multi-sector loads
that DOS does—only file manager calls made by programs that explicitly
set the record length to something other than 1 will miss out, and they
won't miss out fatally—they'll just get the usual slow read.
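The division mentioned above can be seen in a short Python sketch. The function name `advance` and its parameters are hypothetical; it only illustrates the arithmetic of moving a record/byte-within-record position forward by one sector. The 6502 has no divide instruction, so on the Apple II the `divmod` step would need a software division routine.

```python
def advance(record, byte_in_record, record_length, count=256):
    """Return the (record, byte-within-record) position after `count` bytes."""
    absolute = record * record_length + byte_in_record + count
    return divmod(absolute, record_length)


if __name__ == "__main__":
    print(advance(3, 5, 100))   # arbitrary record length: needs a real division
    print(advance(10, 0, 1))    # record length 1: the record number just grows by 256
```

With a record length of 1 the byte-within-record is always 0 and the record number simply increases by 256, which on the 6502 is a single increment of the counter's high byte.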
The JSR DTBLN above tests whether or not the number of bytes
remaining to be loaded is zero; if so it exits the file manager, otherwise
it subtracts one from the number of bytes and returns to its caller. This
turns out to be a good call to replace with our patch, which will look like
this (my apologies for the GOTO's, but the crossed conditions make nice
structured indentation difficult):
    IF the record length is not 1, OR the byte-within-sector is not 0, THEN GOTO p2
p1: IF the number of bytes remaining to be read is at least 256, THEN GOTO p3
p2: JMP DTBLN (This is what would happen normally without the patch)
p3: Read the next sector directly into target memory
    Add 256 to the target address
    Add 1 to the sector-within-file counter
    Add 256 to the record-within-file counter
    Subtract 256 from the number of bytes remaining to be read
    GOTO p1
Then all the fast loading is handled by the patch, and if the conditions are not satisfied, the rest of the read loop takes over none the wiser.
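As a rough check of how the pseudocode divides up the work, the following Python sketch counts how many bytes go through the slow path and how many whole sectors take the fast path for a given read. It models only the bookkeeping, not the disk, and the name `plan_read` and its return format are made up for this illustration.

```python
def plan_read(byte_in_sector, length, record_length=1):
    """Split a read into (slow bytes before, fast sectors, slow bytes after).

    byte_in_sector -- starting byte-within-sector position (0..255)
    length         -- number of bytes requested
    record_length  -- the file's record length (the fast path needs 1)
    """
    slow_head = fast = slow_tail = 0
    pos = byte_in_sector
    while length > 0:
        if record_length == 1 and pos == 0 and length >= 256:
            fast += 1              # p3: a whole sector goes straight to the target
            length -= 256
        else:
            if fast == 0:          # p2: ordinary one-byte-at-a-time path
                slow_head += 1
            else:
                slow_tail += 1
            pos = (pos + 1) % 256
            length -= 1
    return slow_head, fast, slow_tail


if __name__ == "__main__":
    # Hypothetical example: 8192 bytes requested, starting 4 bytes into a sector.
    print(plan_read(4, 8192))
    print(plan_read(0, 512))
    print(plan_read(0, 8192, record_length=2))
```

In the first example, 252 bytes are copied slowly to reach the next sector boundary, 31 sectors are read directly, and the last 4 bytes are copied slowly again.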
The fun part is the "Read the next sector directly into target memory"
part. DOS has a routine that almost does what we need—the
JSR LOCNXB above. But it has two drawbacks, the first of which
is that it always loads the sector into the data sector buffer, not the
target memory. So when fast-loading, we need to patch it before calling it,
and unpatch it after calling it.
But this leads directly to its second drawback: If the previous
file access was a write, it first writes the current sector to the disk.
If chaos is not to ensue in that case, the sector that it writes must be
the data sector buffer, not the target memory sector. Thus we
must make sure the write happens before patching
LOCNXB.
Taking care of the write is easy—the routine WRSECT at
$AF1D tests whether the current data sector buffer needs to be written, and
writes it if so. It's normally called from within LOCNXB, but
if we call it ourselves beforehand then LOCNXB will see that
it's already been done and doesn't need to be done again.
So now the patch looks like this:
    IF the record length is not 1, OR the byte-within-sector is not 0, THEN GOTO p2
p1: IF the number of bytes remaining to be read is at least 256, THEN GOTO p3
p2: JMP DTBLN (This is what would happen normally without the patch)
p3: JSR WRSECT ($AF1D)
    Patch LOCNXB to read into target memory
    JSR LOCNXB ($B0B6)
    Unpatch LOCNXB
    Add 256 to the target address
    Add 1 to the sector-within-file counter
    Add 256 to the record-within-file counter
    Subtract 256 from the number of bytes remaining to be read
    GOTO p1
How about the patching and unpatching? There's a routine at $AFE4 that's
called by LOCNXB to select the data sector buffer as the
destination for the next sector read:
2FE4:                   ;
2FE4:                   ;MVSBA - MOVE SECTOR BUFFER ADR FOR I/O
2FE4:                   ;
2FE4:          2FE4     MVSBA    EQU *
2FE4:AC CB 35                    LDY CFCBSB       ; GET SECTOR BUFF ADR
2FE7:AD CC 35                    LDA CFCBSB+1
2FEA:8C F0 37           MSB1     STY IBBUFP       ; SET IOB SECTOR
2FED:8D F1 37                    STA IBBUFP+1     ; BUFF PTR
2FF0:AE D6 35                    LDX DCBTRK       ; GET TRACK
2FF3:AC D7 35                    LDY DCBSEC       ; GET SECTOR
2FF6:60                          RTS              ; RTN
If we alter this to select the target memory pointer at $B5C3 and $B5C4
instead of the data sector buffer pointer, then LOCNXB will do
exactly what we want. So here's what the patcher needs to look like:
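UNPATCH  LDX #$CB    ;Low byte of the data sector buffer pointer's address ($B5CB)
PATCH    STX $AFE5   ;Operand low byte of the LDY at $AFE4
         INX
         STX $AFE8   ;Operand low byte of the LDA at $AFE7
         RTS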
Then to install the patch we say,
LDX #$C3
JSR PATCH
And to remove the patch, just JSR UNPATCH.
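The effect of PATCH and UNPATCH on the two instructions at $AFE4 can be modeled in a few lines of Python. This is a sketch only: the memory image is a plain dictionary, the helper names are hypothetical, and it assumes that in a relocated 48K DOS the operand high bytes shown as $35 in the master listing have become $B5.

```python
MVSBA = 0xAFE4  # 48K address of the routine shown above ($2FE4 + $8000)


def make_memory():
    """Return a sparse memory image holding the first two MVSBA instructions."""
    code = [0xAC, 0xCB, 0xB5,   # LDY $B5CB  (CFCBSB)
            0xAD, 0xCC, 0xB5]   # LDA $B5CC  (CFCBSB+1)
    return {MVSBA + i: b for i, b in enumerate(code)}


def patch(mem, x):
    """Mirror PATCH: store X, then X+1, into the two operand low bytes."""
    mem[0xAFE5] = x
    mem[0xAFE8] = (x + 1) & 0xFF


def unpatch(mem):
    """Mirror UNPATCH: restore the data sector buffer pointer address."""
    patch(mem, 0xCB)


def operand(mem, opcode_addr):
    """Return the 16-bit absolute operand of the instruction at opcode_addr."""
    return mem[opcode_addr + 1] | (mem[opcode_addr + 2] << 8)


if __name__ == "__main__":
    mem = make_memory()
    patch(mem, 0xC3)
    print(hex(operand(mem, 0xAFE4)), hex(operand(mem, 0xAFE7)))
    unpatch(mem)
    print(hex(operand(mem, 0xAFE4)), hex(operand(mem, 0xAFE7)))
```

After patching, the two loads fetch from $B5C3 and $B5C4 (the target memory pointer); after unpatching they fetch from $B5CB and $B5CC again.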
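All we need now to convert the pseudocode to assembly language are a few memory locations:
$B5E8, $B5E9: record length (low, high)
$B5E6: byte-within-sector
$B5C2: number of bytes remaining to be read (high byte)
$B5C3, $B5C4: target address (low, high)
$B5E4, $B5E5: sector-within-file counter (low, high)
$B5EB: record-within-file counter (high byte)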
So here's the full fast read routine:
FASTREAD LDX $B5E8   ;Record len lo
         DEX
         TXA
         ORA $B5E9   ;Record len hi
         ORA $B5E6   ;Byte in sector
         BNE P2
         LDA $B5C2   ;Range length hi
P1       BNE P3
P2       JMP $B1B5   ;DTBLN
P3       JSR $AF1D   ;WRSECT
         LDX #$C3
         JSR PATCH
         JSR $B0B6   ;LOCNXB
         JSR UNPATCH
         BCS P2      ;Back to DTBLN if LOCNXB ran out of sectors
         INC $B5C4   ;Target address hi
         INC $B5E4   ;Sector within file lo
         BNE P4
         INC $B5E5   ;Sector within file hi
P4       INC $B5EB   ;Record number hi
         DEC $B5C2   ;Range length hi
         BCC P1      ;(Always taken)
;
UNPATCH  LDX #$CB
PATCH    STX $AFE5
         INX
         STX $AFE8
         RTS
To activate this, replace the JSR DTBLN at $AC96 with a
JSR FASTREAD.
But we're not quite done yet. If an I/O error occurs in
LOCNXB, it exits the file manager instead of returning. If
that happens during our routine, then LOCNXB will be left in
its patched state, causing chaos for anything else that calls it. So we
need to intercept the error exit from the file manager, and make sure that
LOCNXB always gets unpatched.
When LOCNXB gets a disk error, the flow of control ultimately
ends up here to translate the RWTS error code into a DOS error code:
30A1:AD F5 37           BADIO    LDA IBSTAT       ; GET STATUS
30A4:A0 07                       LDY #CREVMM
30A6:C9 20                       CMP #IBVMME      ; WAS IT VOLUME MISMATCH
30A8:F0 08     30B2              BEQ BD2          ; BR IF YES
30AA:A0 04                       LDY #CREPRO
30AC:C9 10                       CMP #IBWPER
30AE:F0 02     30B2              BEQ BD2
30B0:A0 08                       LDY #CREIOE
30B2:98                 BD2      TYA
30B3:4C 85 33                    JMP ERRORB       ; GO RTN
That last instruction is the one we need to intercept. To the above patch, add these two lines:
JSR UNPATCH
JMP $B385
And then replace the JMP ERRORB at $B0B3 with a
JMP to it.
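In higher-level terms, the error hook plays the role of a `finally` clause: whichever way LOCNXB exits, the unpatch step runs. The Python sketch below is only an analogy with made-up class and method names; it does not model the actual file manager.

```python
DATA_BUFFER_PTR = 0xB5CB   # pointer LOCNXB normally reads through
TARGET_PTR = 0xB5C3        # pointer used while the fast read is active


class DiskError(Exception):
    """Stands in for the file manager's error exit."""


class FileManagerModel:
    def __init__(self):
        self.read_ptr = DATA_BUFFER_PTR

    def locnxb(self, fail):
        """Pretend to read the next sector; optionally fail with an I/O error."""
        if fail:
            raise DiskError("I/O error")

    def fast_read_sector(self, fail=False):
        self.read_ptr = TARGET_PTR            # JSR PATCH
        try:
            self.locnxb(fail)                 # JSR LOCNXB
        finally:
            self.read_ptr = DATA_BUFFER_PTR   # JSR UNPATCH, on both exit paths


if __name__ == "__main__":
    fm = FileManagerModel()
    try:
        fm.fast_read_sector(fail=True)
    except DiskError:
        pass
    print(hex(fm.read_ptr))  # the pointer is restored despite the error
```

Without the hook, the equivalent of the `finally` block would be skipped on an error, and the next caller of LOCNXB would read through the wrong pointer.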
Here's the full patch, in hex, for a 48K DOS (remember, the
APPEND patch must be applied
first, to free up needed memory):
B692:AE E8 B5 CA 8A 0D E9 B5 0D E6 B5 D0 05 AD   (Fast read patch)
B6A0:C2 B5 D0 03 4C B5 B1 20 1D AF A2 C3 20 CC B6 20
B6B0:B6 B0 20 CA B6 B0 ED EE C4 B5 EE E4 B5 D0 03 EE
B6C0:E5 B5 EE EB B5 CE C2 B5 90 D8 A2 CB 8E E5 AF E8
B6D0:8E E8 AF 60 20 CA B6 4C 85 B3
AC97:92 B6   (Hook into DOS)
B0B4:D4 B6   (Hook into error exit)
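For anyone typing the patch in by hand, a quick sanity check of the hex can be done off the Apple. The Python sketch below parses the dump (with the parenthesized notes removed), confirms that the patch body fits inside the free area from $B692 through $B6FC, and decodes the two hook addresses, which are stored low byte first. The helper names are hypothetical.

```python
PATCH_DUMP = """\
B692:AE E8 B5 CA 8A 0D E9 B5 0D E6 B5 D0 05 AD
B6A0:C2 B5 D0 03 4C B5 B1 20 1D AF A2 C3 20 CC B6 20
B6B0:B6 B0 20 CA B6 B0 ED EE C4 B5 EE E4 B5 D0 03 EE
B6C0:E5 B5 EE EB B5 CE C2 B5 90 D8 A2 CB 8E E5 AF E8
B6D0:8E E8 AF 60 20 CA B6 4C 85 B3
AC97:92 B6
B0B4:D4 B6
"""


def parse_dump(text):
    """Parse 'ADDR:XX XX ...' lines into a {address: byte} dictionary."""
    mem = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        addr_text, _, data = line.partition(":")
        addr = int(addr_text, 16)
        for offset, byte in enumerate(data.split()):
            mem[addr + offset] = int(byte, 16)
    return mem


def word(mem, addr):
    """Read a 16-bit little-endian value (low byte first, as the 6502 stores it)."""
    return mem[addr] | (mem[addr + 1] << 8)


if __name__ == "__main__":
    mem = parse_dump(PATCH_DUMP)
    body = [a for a in mem if 0xB692 <= a <= 0xB6FC]
    print(f"patch body: {len(body)} bytes, ${min(body):04X}-${max(body):04X}")
    print(f"DOS hook   -> ${word(mem, 0xAC97):04X}")
    print(f"error hook -> ${word(mem, 0xB0B4):04X}")
```

The patch body occupies 72 bytes, $B692 through $B6D9, and the two hooks point at $B692 (the fast read routine) and $B6D4 (the unpatch-then-exit code).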
Here's a DOS 3.3 disk with software on it that applies the fast load
patch. The APPEND patch has already been applied to the DOS
on this disk.
The disk image contains two programs, one which patches the currently-running DOS 3.3 in memory, and one which patches the DOS image on a disk.
Remember, there are a few programs that aren't compatible with this patch, including the language-card version of GPLE. If you want to use GPLE with this patch, use the 48K version.
Original: December 27, 2022