How can I perform 64-bit division with a 32-bit divide instruction?

math assembly cpu-architecture fixed-point integer-division

11,051

Solution 1

GCC has such a routine for many processors, named _divdi3 (usually implemented using a common divmod call). Here's one. Some Unix kernels have an implementation too, e.g. FreeBSD.

Solution 2

If your dividend is unsigned 64 bits, your divisor is unsigned 32 bits, the architecture is i386 (x86), the div assembly instruction can help you with some preparation:

#include <stdint.h>
/* Returns *a % b, and sets *a = *a_old / b; */
uint32_t UInt64DivAndGetMod(uint64_t *a, uint32_t b) {
#ifdef __i386__  /* u64 / u32 division with little i386 machine code. */
  uint32_t upper = ((uint32_t*)a)[1], r;
  ((uint32_t*)a)[1] = 0;
  if (upper >= b) {   
    ((uint32_t*)a)[1] = upper / b;
    upper %= b;
  }
  __asm__("divl %2" : "=a" (((uint32_t*)a)[0]), "=d" (r) :
      "rm" (b), "0" (((uint32_t*)a)[0]), "1" (upper));
  return r;
#else
  const uint64_t q = *a / b;  /* Calls __udivdi3 in libgcc. */
  const uint32_t r = *a - b * q;  /* `r = *a % b' would use __umoddi3. */
  *a = q;
  return r;
#endif
}

If the line above with __udivdi3 doesn't compile for you, use the __div64_32 function from the Linux kernel: https://github.com/torvalds/linux/blob/master/lib/div64.c

11,051

RunnerPack

Updated on May 01, 2022

Comments

RunnerPack about 2 years

This is (AFAIK) a specific question within this general topic.

Here's the situation:

I have an embedded system (a video game console) based on a 32-bit RISC microcontroller (a variant of NEC's V810). I want to write a fixed-point math library. I read this article, but the accompanying source code is written in 386 assembly, so it's neither directly usable nor easily modifiable.

The V810 has built-in integer multiply/divide, but I want to use the 18.14 format mentioned in the above article. This requires dividing a 64-bit int by a 32-bit int, and the V810 only does (signed or unsigned) 32-bit/32-bit division (which produces a 32-bit quotient and a 32-bit remainder).

So, my question is: how do I simulate a 64-bit/32-bit divide with a 32-bit/32-bit one (to allow for the pre-shifting of the dividend)? Or, to look at the problem from another way, what's the best way to divide an 18.14 fixed-point by another using standard 32-bit arithmetic/logic operations? ("best" meaning fastest, smallest, or both).

Algebra, (V810) assembly, and pseudo-code are all fine. I will be calling the code from C.

Thanks in advance!

EDIT: Somehow I missed this question... However, it will still need some modification to be super-efficient (it has to be faster than the floating-point div provided by the v810, though it may already be...), so feel free to do my work for me in exchange for reputation points ;) (and credit in my library documentation, of course).
RunnerPack almost 14 years

This seems to be exactly what I needed. Thanks for linking to the relevant code! BTW, I'm using GCC, but I'm using newlib, which doesn't include this stuff.
Peter Cordes almost 3 years

Those pointer-casts are unsafe; strict-aliasing violation. Use shifts to split up a uint64_t into uint32_t halves, or shift/OR to combine. Or use "=A" to make the compiler pick EDX:EAX for a 64-bit integer in 32-bit mode. (Beware that in 64-bit mode, the same constraint allows the compiler a choice of RAX or RDX).