LLX > Neil Parker > Apple II > Decimal Multiplication & Division
Many introductions to 6502 programming describe how to do BCD (binary-coded decimal) addition and subtraction by turning on the decimal flag, and may mention the original 6502's decimal mode bug (the N, V, and Z flags aren't reliable after a decimal operation). But that's usually all they have to say on the subject...multiplication and division in decimal mode are never described, even if binary multiplication and division are.
After a bit of experimenting, I now understand the reason for this oversight: Decimal operations other than simple addition and subtraction are a pain in the butt on the 6502, requiring substantially more code than their binary equivalents. The routines end up being just too long for the average introductory chapter on machine language arithmetic.
But adventurous souls that we are, we'll forge ahead undaunted by such concerns. Decimal multiplication and division on the 6502 are not as difficult as some might think, despite the previous paragraph's pessimism.
We'll need a couple of tools to get the job done. First, we'll need to be
able to multiply a decimal number by 2. We can't use a simple
ROL for this, since the 6502's shift
and rotate instructions always operate in binary mode regardless of the
D flag, but we can do it easily by adding the number to itself:
SED LDA NUMBER ADC NUMBER STA NUMBER CLD
Dividing a decimal number by 2 is a little harder, but we can do it
with a shift and some post-processing on the result: If any decimal
digit of the result is 8 or greater, subtract 3 from the digit. The
BIT N08 line could be considered a trick--we need to test
the $08 bit of the accumulator, and luckily we have a
instruction nearby, whose opcode happens to be $08. (On the 65C02 and
65816, one can say
BIT #8 instead, eliminating the need
for this trick, but for maximum portability this article uses only
instructions that work on all 6502s.)
ROR N08 PHP BPL HALF2 SEC SBC #$30 HALF2 BIT N08 BEQ HALF3 SEC SBC #3 HALF3 PLP
Note that this is done with the D flag off.
These two tools, together with decimal addition and subtraction, can be used to build routines for multiplying or dividing by any decimal number. Already, though, we can begin to see why the decimal routines are so long...both of the above tools do jobs that can be done with a single shift instruction in binary mode.
A reasonably obvious algorithm for decimal multiplication would be to implement the usual "grade school" method for multiplying with pencil and paper. On the plus side, this is reasonably straightforward for numbers of almost any size, but on the minus side, manipulating the individual digits of a BCD number is somewhat inconvenient, and we still need a subroutine to do decimal multiplication of single digits.
So instead, the routine below implements the so-called "Russian Peasant" algorithm, which goes like this:
For example, suppose we want to multiply 35 by 43:
35 43 * 17 86 * 8 172 4 344 2 688 1 1376 *
The starred numbers on the right are the ones opposite odd numbers on the left, so we add 43+86+1376, and get 1505, which is the correct answer.
This actually turns out to be identical to the grade school algorithm in base 2, but by doing it in terms of doublings and halvings instead of digits, we have a method that works without change in any base, and no auxiliary subroutine is needed to multiply single decimal digits.
By now it should be fairly obvious where we're going with this, so without further ado, let's talk about some actual code.
The routine below multiplies two one-byte (two-digit) decimal
decimal numbers in memory locations
ARG2, storing the result in memory locations
RESULTL (low byte) and
RESULTH (high byte).
An additional memory location
TEMP is also used
temporarily as the high byte of
maximum efficiency all these labels should be declared on page zero,
but this is not required.
First we zero out the result and temporary space, and start the loop. We need only seven passes instead of the usual eight, because the largest two-digit decimal number is 99, which can be halved down to 0 in only seven halvings.
LDA #0 STA RESULTL STA RESULTH STA TEMP LDX #7
We see whether the first argument was odd by halving it. If a 1 bit
falls off the end into the carry flag, the number was odd. This is
actually a truncated version of our halving routine from above--we
only need to check whether the low digit needs adjustment, because the
high digit can never be greater than 4 after the
LOOP LDA ARG1 LSR N08 PHP BIT N08 BEQ PULLP SEC SBC #3 PULLP PLP STA ARG1 SED BCC DOUBLE
If the number was odd, we add the (shifted) second argument to the result. This block of code is skipped if the number was even.
CLC LDA ARG2 ADC RESULTL STA RESULTL LDA TEMP ADC RESULTH STA RESULTH
Now we double the second argument. Remember that decimal mode is on here, regardless of which path we took getting here.
DOUBLE CLC LDA ARG2 ADC ARG2 STA ARG2 LDA TEMP ADC TEMP STA TEMP CLD
We're at the end of the loop. Go back if there's more, else we're done.
DEX BNE LOOP RTS
Again, we avoid the usual grade school method for long division, and rely on a Russian-Peasant-style algorithm. Fortunately the Russian Peasant multiplication algorithm is easily reversible:
Again, if you do this in base 2, it's the same as the grade school method. If you look very carefully, you can see that it works by going backwards up the doubling column in the Russian Peasant algorithm, figuring out which entries in the halving column need to be odd by figuring out which doublings were added into the dividend.
The routine below will divide the two-byte (four-digit) number in
NUML, NUMH by
DENOML, DENOMH, leaving the
QUOTL, QUOTH and the remainder in
REMH. All these labels should be declared in page zero for
efficiency, but this not necessary.
First we initialize the quotient and remainder:
LDA #0 STA QUOTL STA QUOTH LDA NUML STA REML LDA NUMH STA REMH
Next we test for division by 0, and for the denominator greater than the numerator:
LDA DENOML ORA DENOMH BNE TESTND RET RTS TESTND LDA NUML CMP DENOML LDA NUMH SBC DENOMH BCC RET
Now count how many doublings are required to make the denominator greater than the numerator. If the denominator overflows, assume it's greater than the numerator:
LDX #0 COUNTD INX SED CLC LDA DENOML ADC DENOML STA DENOML LDA DEMONH ADC DENOMH STA DENOMH CLD BCS LOOP LDA NUML CMP DENOML LDA NUMH SBC DENOMH BCS COUNTD
Halve the denominator. Watch out--the carry flag at the beginning of this code is important!
LOOP LDA DENOMH JSR HALF STA DENOMH LDA DENOML JSR HALF STA DENOML
Double the quotient. Note that we can skip clearing the carry before
the adds because
DENOML, DENOMH is guaranteed to be an
even number before the above halving step.
SED LDA QUOTL ADC QUOTL STA QUOTL LDA QUOTH ADC QUOTH STA QUOTH
Try subtracting the denominator from the remainder:
SEC LDA REML SBC DENOML TAY LDA REMH SBC DENOMH CLD BCC NEXTX
If the subtraction succeeded, replace the remainder with the result:
STA REMH STY REML
...and add 1 to the quotient (since the low bit of the quotient is guaranteed to be 0 at this point, we don't need to go into decimal mode, or worry about a carry into the high byte):
Count down the number of doublings, and go back if there were
more, otherwise we're done. Note that the extra
instruction is needed to ensure that the carry flag is right at the
NEXTX CLC DEX BNE LOOP RTS
Finally, we need our halving tool as a subroutine:
HALF ROR N08 PHP BPL HALF2 SEC SBC #$30 HALF2 BIT N08 BEQ PULLP SEC SBC #3 PULLP PLP RTS
The above routines are certainly not the only way to do decimal multiplication and division. Multiplication can be implememted in a much smaller and easier-to-understand routine than the above simply by adding the multiplicand to itself repeatedly, as many times as the multiplier indicates. Likewise, a much simpler division routine would be to count how many times the denominator can be subtracted from the numerator. But multiplying that way is extremely inefficient if the multiplier is large, and dividing like that is slow if the denominator is much smaller than the numerator, and the problem becomes much worse if you need to work with numbers larger than one or two bytes. With the routines presented above, at least the worst case isn't much worse than the best case, and the methods scale much more nicely to wider operands.
Of course neither of the above routines will win any awards for
speed. For maximum speed the best approach is probably to avoid
algorithmic approaches like the above and rely on table lookups. For
example, a very efficient multiplication routine can be written using
4*A*B = (A+B)^2 - (A-B)^2, for which you only need
to sacrifice about 1K of memory for a table of squares, and the code
for it in decimal mode isn't much more complicated than the code in
binary mode. But this article is already too long, so I leave
implementing it as an exercise for the reader.
LLX > Neil Parker > Apple II > Decimal Multiplication & Division
Original: August 31, 2005
Modified: June 26, 2016--fixed a couple of typos