32-Bit Multiplication with a 64-Bit Result

Coded in Thumb (ARM) | July 15, 2011
I worked on a 32-bit microcontroller for a summer intern project. Due to the limitations of the ARM Thumb code, multiplying 32-bit numbers results in a loss of precision if the answer can only be represented as a 64-bit number. The multiplying instruction, MULS, simply truncates the top 32 bits of the answer. To solve this, I figured out a way to get a 64-bit number resulting from multiplying two 32-bit inputs, using 32-bit registers to represent the result.

Introduction

I first tried working this out on paper before getting it to work in Thumb. Say we want to multiply two numbers A and B. Both A and B will each take up one 32-bit register. Since A and B are represented in binary, they can be split into a top half and bottom half, like so.
        A          *          B          =      Result
 ---------------       ---------------       -----------
| A_top | A_bot |  *  | B_top | B_bot |  =  | Top | Bot |
 ---------------       ---------------       -----------
This can then be taken as a product of two sums.
(A_top<<16 + A_bot) * (B_top<<16 + B_bot)
    = (A_top<<16)(B_top<<16) + (A_top<<16)(B_bot) + (B_top<<16)(A_bot) + A_bot*B_bot
    = (A_top*B_top)<<32 + (A_top*B_bot + B_top*A_bot)<<16 + A_bot*B_bot
Top can be set as (A_top*B_top), since (A_top*B_top) << 32 moves it over by one register. Similarly, Bot can be set as (A_bot*B_bot).
LSRS R7, R6, #16   ; top half of Mid
ADD  R1, R7        ; add to Top
The hard part is the middle 32 bits, (A_top*B_bot + B_top*A_bot) << 16, since half of it falls in Top and half falls in Bot. This can be dealt with by first adding the top half of (A_top*B_bot + B_top*A_bot) << 16 (which we will call Mid from now on) to Top. To get this top half, shift Mid right by 16 places.
Next, add the bottom half of Mid to Bot. To get this bottom half, shift Mid left by 16 places. This will result with the bottom half of Mid followed by 16 zeros, which is what we want.

Checking for Overflow

The problem with only using 32-bit registers is that it is possible that overflow will occur. The overflow amount needs to be added to the right registers to get the correct result.
LSLS R7, R6, #16   ; bottom half of Mid
ADDS R0, R7        ; add to Bot
ADC  R1, R7        ; add the carry to Top
The first part to check is the result of adding the bottom half of Mid to Bot. Add 1 (the carry) to Top if there is an overflow. If (A_top*B_bot + B_top*A_bot) also causes an overflow, it also has to be added to Top.
ADDS R6, R7, R6    ; A_top*B_bot + B_top*A_bot stored in R6
ADC  R1, R7        ; Overflow added to Top
This code does not deal with negative numbers. Negation of 64-Bit Numbers with 32-Bit Registers has some information on this.