LLX > Neil Parker > Apple II > Decimal Multiplication & Division
Many introductions to 6502 programming describe how to do BCD (binary-coded decimal) addition and subtraction by turning on the decimal flag, and may mention the original 6502's decimal mode bug (the N, V, and Z flags aren't reliable after a decimal operation). But that's usually all they have to say on the subject...multiplication and division in decimal mode are never described, even if binary multiplication and division are.
After a bit of experimenting, I now understand the reason for this oversight: Decimal operations other than simple addition and subtraction are a pain in the butt on the 6502, requiring substantially more code than their binary equivalents. The routines end up being just too long for the average introductory chapter on machine language arithmetic.
But adventurous souls that we are, we'll forge ahead undaunted by such concerns. Decimal multiplication and division on the 6502 are not as difficult as some might think, despite the previous paragraph's pessimism.
We'll need a couple of tools to get the job done. First, we'll need to be
able to multiply a decimal number by 2. We can't use a simple
ASL or ROL for this, since the 6502's shift
and rotate instructions always operate in binary mode regardless of the
D flag, but we can do it easily by adding the number to itself:
SED
LDA NUMBER
ADC NUMBER
STA NUMBER
CLD
Dividing a decimal number by 2 is a little harder, but we can do it
with a shift and some post-processing on the result: If any decimal
digit of the result is 8 or greater, subtract 3 from the digit. The
BIT N08 line could be considered a trick--we need to test
the $08 bit of the accumulator, and luckily we have a PHP
instruction nearby, whose opcode happens to be $08. (On the 65C02 and
65816, one can say BIT #8 instead, eliminating the need
for this trick, but for maximum portability this article uses only
instructions that work on all 6502s.)
ROR
N08 PHP
BPL HALF2
SEC
SBC #$30
HALF2 BIT N08
BEQ HALF3
SEC
SBC #3
HALF3 PLP
Note that this is done with the D flag off.
These two tools, together with decimal addition and subtraction, can be used to build routines for multiplying or dividing by any decimal number. Already, though, we can begin to see why the decimal routines are so long...both of the above tools do jobs that can be done with a single shift instruction in binary mode.
A reasonably obvious algorithm for decimal multiplication would be to implement the usual "grade school" method for multiplying with pencil and paper. On the plus side, this is reasonably straightforward for numbers of almost any size, but on the minus side, manipulating the individual digits of a BCD number is somewhat inconvenient, and we still need a subroutine to do decimal multiplication of single digits.
So instead, the routine below implements the so-called "Russian Peasant" algorithm, which goes like this:
For example, suppose we want to multiply 35 by 43:
35 43 *
17 86 *
8 172
4 344
2 688
1 1376 *
The starred numbers on the right are the ones opposite odd numbers on the left, so we add 43+86+1376, and get 1505, which is the correct answer.
This actually turns out to be identical to the grade school algorithm in base 2, but by doing it in terms of doublings and halvings instead of digits, we have a method that works without change in any base, and no auxiliary subroutine is needed to multiply single decimal digits.
By now it should be fairly obvious where we're going with this, so without further ado, let's talk about some actual code.
The routine below multiplies two one-byte (two-digit) decimal
decimal numbers in memory locations ARG1 and
ARG2, storing the result in memory locations
RESULTL (low byte) and RESULTH (high byte).
An additional memory location TEMP is also used
temporarily as the high byte of ARG2. For
maximum efficiency all these labels should be declared on page zero,
but this is not required.
First we zero out the result and temporary space, and start the loop. We need only seven passes instead of the usual eight, because the largest two-digit decimal number is 99, which can be halved down to 0 in only seven halvings.
LDA #0
STA RESULTL
STA RESULTH
STA TEMP
LDX #7
We see whether the first argument was odd by halving it. If a 1 bit
falls off the end into the carry flag, the number was odd. This is
actually a truncated version of our halving routine from above--we
only need to check whether the low digit needs adjustment, because the
high digit can never be greater than 4 after the LSR.
LOOP LDA ARG1
LSR
N08 PHP
BIT N08
BEQ PULLP
SEC
SBC #3
PULLP PLP
STA ARG1
SED
BCC DOUBLE
If the number was odd, we add the (shifted) second argument to the result. This block of code is skipped if the number was even.
CLC
LDA ARG2
ADC RESULTL
STA RESULTL
LDA TEMP
ADC RESULTH
STA RESULTH
Now we double the second argument. Remember that decimal mode is on here, regardless of which path we took getting here.
DOUBLE CLC
LDA ARG2
ADC ARG2
STA ARG2
LDA TEMP
ADC TEMP
STA TEMP
CLD
We're at the end of the loop. Go back if there's more, else we're done.
DEX
BNE LOOP
RTS
Again, we avoid the usual grade school method for long division, and rely on a Russian-Peasant-style algorithm. Fortunately the Russian Peasant multiplication algorithm is easily reversible:
Again, if you do this in base 2, it's the same as the grade school method. If you look very carefully, you can see that it works by going backwards up the doubling column in the Russian Peasant algorithm, figuring out which entries in the halving column need to be odd by figuring out which doublings were added into the dividend.
The routine below will divide the two-byte (four-digit) number in
NUML, NUMH by DENOML, DENOMH, leaving the
quotient in QUOTL, QUOTH and the remainder in REML,
REMH. All these labels should be declared in page zero for
efficiency, but this not necessary.
First we initialize the quotient and remainder:
LDA #0
STA QUOTL
STA QUOTH
LDA NUML
STA REML
LDA NUMH
STA REMH
Next we test for division by 0, and for the denominator greater than the numerator:
LDA DENOML
ORA DENOMH
BNE TESTND
RET RTS
TESTND LDA NUML
CMP DENOML
LDA NUMH
SBC DENOMH
BCC RET
Now count how many doublings are required to make the denominator greater than the numerator. If the denominator overflows, assume it's greater than the numerator:
LDX #0
COUNTD INX
SED
CLC
LDA DENOML
ADC DENOML
STA DENOML
LDA DEMONH
ADC DENOMH
STA DENOMH
CLD
BCS LOOP
LDA NUML
CMP DENOML
LDA NUMH
SBC DENOMH
BCS COUNTD
Halve the denominator. Watch out--the carry flag at the beginning of this code is important!
LOOP LDA DENOMH
JSR HALF
STA DENOMH
LDA DENOML
JSR HALF
STA DENOML
Double the quotient. Note that we can skip clearing the carry before
the adds because DENOML, DENOMH is guaranteed to be an
even number before the above halving step.
SED
LDA QUOTL
ADC QUOTL
STA QUOTL
LDA QUOTH
ADC QUOTH
STA QUOTH
Try subtracting the denominator from the remainder:
SEC
LDA REML
SBC DENOML
TAY
LDA REMH
SBC DENOMH
CLD
BCC NEXTX
If the subtraction succeeded, replace the remainder with the result:
STA REMH
STY REML
...and add 1 to the quotient (since the low bit of the quotient is guaranteed to be 0 at this point, we don't need to go into decimal mode, or worry about a carry into the high byte):
INC QUOTL
Count down the number of doublings, and go back if there were
more, otherwise we're done. Note that the extra CLC
instruction is needed to ensure that the carry flag is right at the
label LOOP.
NEXTX CLC
DEX
BNE LOOP
RTS
Finally, we need our halving tool as a subroutine:
HALF ROR
N08 PHP
BPL HALF2
SEC
SBC #$30
HALF2 BIT N08
BEQ PULLP
SEC
SBC #3
PULLP PLP
RTS
The above routines are certainly not the only way to do decimal multiplication and division. Multiplication can be implememted in a much smaller and easier-to-understand routine than the above simply by adding the multiplicand to itself repeatedly, as many times as the multiplier indicates. Likewise, a much simpler division routine would be to count how many times the denominator can be subtracted from the numerator. But multiplying that way is extremely inefficient if the multiplier is large, and dividing like that is slow if the denominator is much smaller than the numerator, and the problem becomes much worse if you need to work with numbers larger than one or two bytes. With the routines presented above, at least the worst case isn't much worse than the best case, and the methods scale much more nicely to wider operands.
Of course neither of the above routines will win any awards for
speed. For maximum speed the best approach is probably to avoid
algorithmic approaches like the above and rely on table lookups. For
example, a very efficient multiplication routine can be written using
the identity 4*A*B = (A+B)^2 - (A-B)^2, for which you only need
to sacrifice about 1K of memory for a table of squares, and the code
for it in decimal mode isn't much more complicated than the code in
binary mode. But this article is already too long, so I leave
implementing it as an exercise for the reader.
LLX > Neil Parker > Apple II > Decimal Multiplication & Division
Original: August 31, 2005
Modified: June 26, 2016--fixed a couple of typos