LLX > Neil Parker > Apple II > Decimal Multiplication & Division

Many introductions to 6502 programming describe how to do BCD (binary-coded decimal) addition and subtraction by turning on the decimal flag, and may mention the original 6502's decimal mode bug (the N, V, and Z flags aren't reliable after a decimal operation). But that's usually all they have to say on the subject...multiplication and division in decimal mode are never described, even if binary multiplication and division are.

After a bit of experimenting, I now understand the reason for this oversight: Decimal operations other than simple addition and subtraction are a pain in the butt on the 6502, requiring substantially more code than their binary equivalents. The routines end up being just too long for the average introductory chapter on machine language arithmetic.

But adventurous souls that we are, we'll forge ahead undaunted by such concerns. Decimal multiplication and division on the 6502 are not as difficult as some might think, despite the previous paragraph's pessimism.

We'll need a couple of tools to get the job done. First, we'll need to be
able to multiply a decimal number by 2. We can't use a simple
`ASL`

or `ROL`

for this, since the 6502's shift
and rotate instructions always operate in binary mode regardless of the
D flag, but we can do it easily by adding the number to itself:

SED LDA NUMBER ADC NUMBER STA NUMBER CLD

Dividing a decimal number by 2 is a little harder, but we can do it
with a shift and some post-processing on the result: If any decimal
digit of the result is 8 or greater, subtract 3 from the digit. The
`BIT N08`

line could be considered a trick--we need to test
the $08 bit of the accumulator, and luckily we have a `PHP`

instruction nearby, whose opcode happens to be $08. (On the 65C02 and
65816, one can say `BIT #8`

instead, eliminating the need
for this trick, but for maximum portability this article uses only
instructions that work on *all* 6502s.)

ROR N08 PHP BPL HALF2 SEC SBC #$30 HALF2 BIT N08 BEQ HALF3 SEC SBC #3 HALF3 PLP

Note that this is done with the D flag *off*.

These two tools, together with decimal addition and subtraction, can
be used to build routines for multiplying or dividing by any decimal
number. Already, though, we can begin to see why the decimal routines
are so long...both of the above tools do jobs that can be done with a
*single shift instruction* in binary mode.

A reasonably obvious algorithm for decimal multiplication would be to implement the usual "grade school" method for multiplying with pencil and paper. On the plus side, this is reasonably straightforward for numbers of almost any size, but on the minus side, manipulating the individual digits of a BCD number is somewhat inconvenient, and we still need a subroutine to do decimal multiplication of single digits.

So instead, the routine below implements the so-called "Russian Peasant" algorithm, which goes like this:

- Write the two numbers to be multiplied side by side.
- Divide the left-hand number by two, dropping fractions, and write the answer underneath it. Then divide the new number by two again (dropping fractions), and write the result underneath it again. Repeat until you get down to 1.
- Double the right-hand number, and write the result underneath it, next to the first halved number in the left-hand column. Then double this new number again, and write the result under the previous number, and repeat until there are as many doubled numbers in the right hand column as there are halved numbers in the left-hand column.
- Find the numbers in the right-hand (doubling) column that are across
from
*odd*numbers in the left-hand (halving) column. Add up these numbers. The total is the product of the original two numbers.

For example, suppose we want to multiply 35 by 43:

35 43 * 17 86 * 8 172 4 344 2 688 1 1376 *

The starred numbers on the right are the ones opposite odd numbers on the left, so we add 43+86+1376, and get 1505, which is the correct answer.

This actually turns out to be identical to the grade school
algorithm in base 2, but by doing it in terms of doublings and
halvings instead of digits, we have a method that works without change
in *any* base, and no auxiliary subroutine is needed to multiply
single decimal digits.

By now it should be fairly obvious where we're going with this, so without further ado, let's talk about some actual code.

The routine below multiplies two one-byte (two-digit) decimal
decimal numbers in memory locations `ARG1`

and
`ARG2`

, storing the result in memory locations
`RESULTL`

(low byte) and `RESULTH`

(high byte).
An additional memory location `TEMP`

is also used
temporarily as the high byte of `ARG2`

. For
maximum efficiency all these labels should be declared on page zero,
but this is not required.

First we zero out the result and temporary space, and start the loop. We need only seven passes instead of the usual eight, because the largest two-digit decimal number is 99, which can be halved down to 0 in only seven halvings.

LDA #0 STA RESULTL STA RESULTH STA TEMP LDX #7

We see whether the first argument was odd by halving it. If a 1 bit
falls off the end into the carry flag, the number was odd. This is
actually a truncated version of our halving routine from above--we
only need to check whether the low digit needs adjustment, because the
high digit can never be greater than 4 after the `LSR`

.

LOOP LDA ARG1 LSR N08 PHP BIT N08 BEQ PULLP SEC SBC #3 PULLP PLP STA ARG1 SED BCC DOUBLE

If the number was odd, we add the (shifted) second argument to the result. This block of code is skipped if the number was even.

CLC LDA ARG2 ADC RESULTL STA RESULTL LDA TEMP ADC RESULTH STA RESULTH

Now we double the second argument. Remember that decimal mode is on here, regardless of which path we took getting here.

DOUBLE CLC LDA ARG2 ADC ARG2 STA ARG2 LDA TEMP ADC TEMP STA TEMP CLD

We're at the end of the loop. Go back if there's more, else we're done.

DEX BNE LOOP RTS

Again, we avoid the usual grade school method for long division, and rely on a Russian-Peasant-style algorithm. Fortunately the Russian Peasant multiplication algorithm is easily reversible:

- Start by setting the quotient to 0, and the remainder to the dividend.
- If the divisor is 0, exit with an error. If the divisor is greater than the dividend, exit with no error.
- Repeatedly double the divisor until it's greater than the dividend.
Let
*X*be the number of doublings required. - Repeat
*X*times:- Halve the divisor.
- Double the quotient.
- If the divisor is less than or equal to the remainder, subtract the divisor from the remainder, and add 1 to the quotient.

Again, if you do this in base 2, it's the same as the grade school method. If you look very carefully, you can see that it works by going backwards up the doubling column in the Russian Peasant algorithm, figuring out which entries in the halving column need to be odd by figuring out which doublings were added into the dividend.

The routine below will divide the two-byte (four-digit) number in
`NUML, NUMH`

by `DENOML, DENOMH`

, leaving the
quotient in `QUOTL, QUOTH`

and the remainder in ```
REML,
REMH
```

. All these labels should be declared in page zero for
efficiency, but this not necessary.

First we initialize the quotient and remainder:

LDA #0 STA QUOTL STA QUOTH LDA NUML STA REML LDA NUMH STA REMH

Next we test for division by 0, and for the denominator greater than the numerator:

LDA DENOML ORA DENOMH BNE TESTND RET RTS TESTND LDA NUML CMP DENOML LDA NUMH SBC DENOMH BCC RET

Now count how many doublings are required to make the denominator greater than the numerator. If the denominator overflows, assume it's greater than the numerator:

LDX #0 COUNTD INX SED CLC LDA DENOML ADC DENOML STA DENOML LDA DEMONH ADC DENOMH STA DENOMH CLD BCS LOOP LDA NUML CMP DENOML LDA NUMH SBC DENOMH BCS COUNTD

Halve the denominator. Watch out--the carry flag at the beginning of this code is important!

LOOP LDA DENOMH JSR HALF STA DENOMH LDA DENOML JSR HALF STA DENOML

Double the quotient. Note that we can skip clearing the carry before
the adds because `DENOML, DENOMH`

is guaranteed to be an
even number before the above halving step.

SED LDA QUOTL ADC QUOTL STA QUOTL LDA QUOTH ADC QUOTH STA QUOTH

Try subtracting the denominator from the remainder:

SEC LDA REML SBC DENOML TAY LDA REMH SBC DENOMH CLD BCC NEXTX

If the subtraction succeeded, replace the remainder with the result:

STA REMH STY REML

...and add 1 to the quotient (since the low bit of the quotient is guaranteed to be 0 at this point, we don't need to go into decimal mode, or worry about a carry into the high byte):

INC QUOTL

Count down the number of doublings, and go back if there were
more, otherwise we're done. Note that the extra `CLC`

instruction is needed to ensure that the carry flag is right at the
label `LOOP`

.

NEXTX CLC DEX BNE LOOP RTS

Finally, we need our halving tool as a subroutine:

HALF ROR N08 PHP BPL HALF2 SEC SBC #$30 HALF2 BIT N08 BEQ PULLP SEC SBC #3 PULLP PLP RTS

The above routines are certainly not the only way to do decimal multiplication and division. Multiplication can be implememted in a much smaller and easier-to-understand routine than the above simply by adding the multiplicand to itself repeatedly, as many times as the multiplier indicates. Likewise, a much simpler division routine would be to count how many times the denominator can be subtracted from the numerator. But multiplying that way is extremely inefficient if the multiplier is large, and dividing like that is slow if the denominator is much smaller than the numerator, and the problem becomes much worse if you need to work with numbers larger than one or two bytes. With the routines presented above, at least the worst case isn't much worse than the best case, and the methods scale much more nicely to wider operands.

Of course neither of the above routines will win any awards for
speed. For maximum speed the best approach is probably to avoid
algorithmic approaches like the above and rely on table lookups. For
example, a very efficient multiplication routine can be written using
the identity `4*A*B = (A+B)^2 - (A-B)^2`

, for which you only need
to sacrifice about 1K of memory for a table of squares, and the code
for it in decimal mode isn't much more complicated than the code in
binary mode. But this article is already too long, so I leave
implementing it as an exercise for the reader.

LLX > Neil Parker > Apple II > Decimal Multiplication & Division

Original: August 31, 2005

Modified: June 26, 2016--fixed a couple of typos