LLX > Neil Parker > Apple II > Decimal Multiplication & Division

# Multiplying and Dividing in Decimal Mode on the 6502

## Introduction

Many introductions to 6502 programming describe how to do BCD (binary-coded decimal) addition and subtraction by turning on the decimal flag, and may mention the original 6502's decimal mode bug (the N, V, and Z flags aren't reliable after a decimal operation). But that's usually all they have to say on the subject...multiplication and division in decimal mode are never described, even if binary multiplication and division are.

After a bit of experimenting, I now understand the reason for this oversight: Decimal operations other than simple addition and subtraction are a pain in the butt on the 6502, requiring substantially more code than their binary equivalents. The routines end up being just too long for the average introductory chapter on machine language arithmetic.

But adventurous souls that we are, we'll forge ahead undaunted by such concerns. Decimal multiplication and division on the 6502 are not as difficult as some might think, despite the previous paragraph's pessimism.

## Our Toolkit

We'll need a couple of tools to get the job done. First, we'll need to be able to multiply a decimal number by 2. We can't use a simple `ASL` or `ROL` for this, since the 6502's shift and rotate instructions always operate in binary mode regardless of the D flag, but we can do it easily by adding the number to itself:

```        SED
LDA NUMBER
STA NUMBER
CLD```

Dividing a decimal number by 2 is a little harder, but we can do it with a shift and some post-processing on the result: If any decimal digit of the result is 8 or greater, subtract 3 from the digit. The `BIT N08` line could be considered a trick--we need to test the \$08 bit of the accumulator, and luckily we have a `PHP` instruction nearby, whose opcode happens to be \$08. (On the 65C02 and 65816, one can say `BIT #8` instead, eliminating the need for this trick, but for maximum portability this article uses only instructions that work on all 6502s.)

```        ROR
N08     PHP
BPL HALF2
SEC
SBC #\$30
HALF2   BIT N08
BEQ HALF3
SEC
SBC #3
HALF3   PLP```

Note that this is done with the D flag off.

These two tools, together with decimal addition and subtraction, can be used to build routines for multiplying or dividing by any decimal number. Already, though, we can begin to see why the decimal routines are so long...both of the above tools do jobs that can be done with a single shift instruction in binary mode.

## Multiplication

A reasonably obvious algorithm for decimal multiplication would be to implement the usual "grade school" method for multiplying with pencil and paper. On the plus side, this is reasonably straightforward for numbers of almost any size, but on the minus side, manipulating the individual digits of a BCD number is somewhat inconvenient, and we still need a subroutine to do decimal multiplication of single digits.

So instead, the routine below implements the so-called "Russian Peasant" algorithm, which goes like this:

1. Write the two numbers to be multiplied side by side.
2. Divide the left-hand number by two, dropping fractions, and write the answer underneath it. Then divide the new number by two again (dropping fractions), and write the result underneath it again. Repeat until you get down to 1.
3. Double the right-hand number, and write the result underneath it, next to the first halved number in the left-hand column. Then double this new number again, and write the result under the previous number, and repeat until there are as many doubled numbers in the right hand column as there are halved numbers in the left-hand column.
4. Find the numbers in the right-hand (doubling) column that are across from odd numbers in the left-hand (halving) column. Add up these numbers. The total is the product of the original two numbers.

For example, suppose we want to multiply 35 by 43:

```          35     43 *
17     86 *
8    172
4    344
2    688
1   1376 *```

The starred numbers on the right are the ones opposite odd numbers on the left, so we add 43+86+1376, and get 1505, which is the correct answer.

This actually turns out to be identical to the grade school algorithm in base 2, but by doing it in terms of doublings and halvings instead of digits, we have a method that works without change in any base, and no auxiliary subroutine is needed to multiply single decimal digits.

By now it should be fairly obvious where we're going with this, so without further ado, let's talk about some actual code.

The routine below multiplies two one-byte (two-digit) decimal decimal numbers in memory locations `ARG1` and `ARG2`, storing the result in memory locations `RESULTL` (low byte) and `RESULTH` (high byte). An additional memory location `TEMP` is also used temporarily as the high byte of `ARG2`. For maximum efficiency all these labels should be declared on page zero, but this is not required.

First we zero out the result and temporary space, and start the loop. We need only seven passes instead of the usual eight, because the largest two-digit decimal number is 99, which can be halved down to 0 in only seven halvings.

```        LDA #0
STA RESULTL
STA RESULTH
STA TEMP
LDX #7```

We see whether the first argument was odd by halving it. If a 1 bit falls off the end into the carry flag, the number was odd. This is actually a truncated version of our halving routine from above--we only need to check whether the low digit needs adjustment, because the high digit can never be greater than 4 after the `LSR`.

```LOOP    LDA ARG1
LSR
N08     PHP
BIT N08
BEQ PULLP
SEC
SBC #3
PULLP   PLP
STA ARG1
SED
BCC DOUBLE```

If the number was odd, we add the (shifted) second argument to the result. This block of code is skipped if the number was even.

```        CLC
LDA ARG2
STA RESULTL
LDA TEMP
STA RESULTH```

Now we double the second argument. Remember that decimal mode is on here, regardless of which path we took getting here.

```DOUBLE  CLC
LDA ARG2
STA ARG2
LDA TEMP
STA TEMP
CLD```

We're at the end of the loop. Go back if there's more, else we're done.

```        DEX
BNE LOOP
RTS```

## Division

Again, we avoid the usual grade school method for long division, and rely on a Russian-Peasant-style algorithm. Fortunately the Russian Peasant multiplication algorithm is easily reversible:

1. Start by setting the quotient to 0, and the remainder to the dividend.
2. If the divisor is 0, exit with an error. If the divisor is greater than the dividend, exit with no error.
3. Repeatedly double the divisor until it's greater than the dividend. Let X be the number of doublings required.
4. Repeat X times:
1. Halve the divisor.
2. Double the quotient.
3. If the divisor is less than or equal to the remainder, subtract the divisor from the remainder, and add 1 to the quotient.

Again, if you do this in base 2, it's the same as the grade school method. If you look very carefully, you can see that it works by going backwards up the doubling column in the Russian Peasant algorithm, figuring out which entries in the halving column need to be odd by figuring out which doublings were added into the dividend.

The routine below will divide the two-byte (four-digit) number in `NUML, NUMH` by `DENOML, DENOMH`, leaving the quotient in `QUOTL, QUOTH` and the remainder in ```REML, REMH```. All these labels should be declared in page zero for efficiency, but this not necessary.

First we initialize the quotient and remainder:

```        LDA #0
STA QUOTL
STA QUOTH
LDA NUML
STA REML
LDA NUMH
STA REMH```

Next we test for division by 0, and for the denominator greater than the numerator:

```        LDA DENOML
ORA DENOMH
BNE TESTND
RET     RTS
TESTND  LDA NUML
CMP DENOML
LDA NUMH
SBC DENOMH
BCC RET```

Now count how many doublings are required to make the denominator greater than the numerator. If the denominator overflows, assume it's greater than the numerator:

```        LDX #0
COUNTD  INX
SED
CLC
LDA DENOML
STA DENOML
LDA DEMONH
STA DENOMH
CLD
BCS LOOP
LDA NUML
CMP DENOML
LDA NUMH
SBC DENOMH
BCS COUNTD```

Halve the denominator. Watch out--the carry flag at the beginning of this code is important!

```LOOP    LDA DENOMH
JSR HALF
STA DENOMH
LDA DENOML
JSR HALF
STA DENOML```

Double the quotient. Note that we can skip clearing the carry before the adds because `DENOML, DENOMH` is guaranteed to be an even number before the above halving step.

```        SED
LDA QUOTL
STA QUOTL
LDA QUOTH
STA QUOTH```

Try subtracting the denominator from the remainder:

```        SEC
LDA REML
SBC DENOML
TAY
LDA REMH
SBC DENOMH
CLD
BCC NEXTX```

If the subtraction succeeded, replace the remainder with the result:

```        STA REMH
STY REML```

...and add 1 to the quotient (since the low bit of the quotient is guaranteed to be 0 at this point, we don't need to go into decimal mode, or worry about a carry into the high byte):

`        INC QUOTL`

Count down the number of doublings, and go back if there were more, otherwise we're done. Note that the extra `CLC` instruction is needed to ensure that the carry flag is right at the label `LOOP`.

```NEXTX   CLC
DEX
BNE LOOP
RTS```

Finally, we need our halving tool as a subroutine:

```HALF    ROR
N08     PHP
BPL HALF2
SEC
SBC #\$30
HALF2   BIT N08
BEQ PULLP
SEC
SBC #3
PULLP   PLP
RTS```

## Conclusion

The above routines are certainly not the only way to do decimal multiplication and division. Multiplication can be implememted in a much smaller and easier-to-understand routine than the above simply by adding the multiplicand to itself repeatedly, as many times as the multiplier indicates. Likewise, a much simpler division routine would be to count how many times the denominator can be subtracted from the numerator. But multiplying that way is extremely inefficient if the multiplier is large, and dividing like that is slow if the denominator is much smaller than the numerator, and the problem becomes much worse if you need to work with numbers larger than one or two bytes. With the routines presented above, at least the worst case isn't much worse than the best case, and the methods scale much more nicely to wider operands.

Of course neither of the above routines will win any awards for speed. For maximum speed the best approach is probably to avoid algorithmic approaches like the above and rely on table lookups. For example, a very efficient multiplication routine can be written using the identity `4*A*B = (A+B)^2 - (A-B)^2`, for which you only need to sacrifice about 1K of memory for a table of squares, and the code for it in decimal mode isn't much more complicated than the code in binary mode. But this article is already too long, so I leave implementing it as an exercise for the reader.

LLX > Neil Parker > Apple II > Decimal Multiplication & Division

Original: August 31, 2005
Modified: June 26, 2016--fixed a couple of typos