Count leading zeroes in an Int32

Solution 1

NOTE: Using .NET Core 3.0 or later? See Solution 2 below.

Let's take the number 20 as an example. It can be stated in binary as follows:

    00000000000000000000000000010100

First we "smear" the most significant bit over the lower bit positions by right shifting and bitwise-ORing over itself.

    00000000000000000000000000010100
 or 00000000000000000000000000001010 (right-shifted by 1)
 is 00000000000000000000000000011110

then

    00000000000000000000000000011110
 or 00000000000000000000000000000111 (right-shifted by 2)
 is 00000000000000000000000000011111

Here, because it's a small number, we've already completed the job, but by continuing the process with shifts of 4, 8 and 16 bits, we can ensure that for any 32-bit number, we have set all of the bits from 0 to the MSB of the original number to 1.

Now, if we count the number of 1s in our "smeared" result, we can simply subtract it from 32, and we are left with the number of leading zeros in the original value.

How do we count the number of set bits in an integer? This page has a magical algorithm for doing just that ("a variable-precision SWAR algorithm to perform a tree reduction"... if you get it, you're cleverer than me!), which translates to C# as follows:

int PopulationCount(int x)
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return (x & 0x0000003f);
}

By inlining this method with our "smearing" method above, we can produce a very fast, loop-free and conditional-free method for counting the leading zeros of an integer.

int LeadingZeros(int x)
{
    const int numIntBits = sizeof(int) * 8; //compile time constant
    //do the smearing
    x |= x >> 1; 
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    //count the ones
    x -= x >> 1 & 0x55555555;
    x = (x >> 2 & 0x33333333) + (x & 0x33333333);
    x = (x >> 4) + x & 0x0f0f0f0f;
    x += x >> 8;
    x += x >> 16;
    return numIntBits - (x & 0x0000003f); //subtract # of 1s from 32
}
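
As a quick sanity check, the method above can be exercised on a few values whose answers are easy to work out by hand. The sketch below repeats LeadingZeros unchanged so that it compiles on its own; the class name LeadingZerosCheck and the chosen inputs are illustrative assumptions, not part of the original answer.

```csharp
using System;

public static class LeadingZerosCheck
{
    // Same smear-then-popcount method as above, repeated so this file compiles standalone.
    public static int LeadingZeros(int x)
    {
        const int numIntBits = sizeof(int) * 8;
        // Smear the most significant set bit over every lower position.
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        // Count the set bits of the smeared value.
        x -= x >> 1 & 0x55555555;
        x = (x >> 2 & 0x33333333) + (x & 0x33333333);
        x = (x >> 4) + x & 0x0f0f0f0f;
        x += x >> 8;
        x += x >> 16;
        return numIntBits - (x & 0x0000003f);
    }

    public static void Main()
    {
        Console.WriteLine(LeadingZeros(20)); // 27: 20 is 10100, five significant bits
        Console.WriteLine(LeadingZeros(2));  // 30: the example from the question
        Console.WriteLine(LeadingZeros(0));  // 32: no bits set at all
        Console.WriteLine(LeadingZeros(-1)); // 0: the sign bit is already set
    }
}
```

Negative inputs come out as 0 because >> on an int is an arithmetic shift, so the sign bit is smeared across the whole word and all 32 bits end up set.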

Solution 2

In .NET Core 3.0 and later, System.Numerics.BitOperations provides LeadingZeroCount() and TrailingZeroCount(), which map to hardware instructions such as x86's LZCNT/BSR and TZCNT/BSF where available. As a result, they are currently the most efficient solution.
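
A minimal usage sketch, assuming a project that targets .NET Core 3.0 or later. BitOperations.LeadingZeroCount takes an unsigned value, so the wrapper below (the names BitOperationsDemo and LeadingZeros are made up for illustration) reinterprets the int as a uint first.

```csharp
using System;
using System.Numerics;

public static class BitOperationsDemo
{
    // Hypothetical convenience wrapper: reinterpret the int's bits as uint,
    // because the built-in method is defined for unsigned operands.
    public static int LeadingZeros(int x) =>
        BitOperations.LeadingZeroCount(unchecked((uint)x));

    public static void Main()
    {
        Console.WriteLine(LeadingZeros(20)); // 27
        Console.WriteLine(LeadingZeros(2));  // 30
        Console.WriteLine(LeadingZeros(0));  // 32
        Console.WriteLine(LeadingZeros(-1)); // 0
    }
}
```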

Solution 3

If you'd like to mix in assembly code for peak performance, here's how you can do that in C#.

First the supporting code to make it possible:

using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;
using static System.Runtime.CompilerServices.MethodImplOptions;

/// <summary> Gets the position of the right most non-zero bit in a UInt32.  </summary>
[MethodImpl(AggressiveInlining)] public static int BitScanForward(UInt32 mask) => _BitScanForward32(mask);

/// <summary> Gets the position of the left most non-zero bit in a UInt32.  </summary>
[MethodImpl(AggressiveInlining)] public static int BitScanReverse(UInt32 mask) => _BitScanReverse32(mask);


[DllImport("kernel32.dll", SetLastError = true)]
private static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize, uint flAllocationType, uint flProtect); // dwSize is a pointer-sized SIZE_T

private static TDelegate GenerateX86Function<TDelegate>(byte[] x86AssemblyBytes) {
    const uint PAGE_EXECUTE_READWRITE = 0x40;
    const uint ALLOCATIONTYPE_MEM_COMMIT = 0x1000;
    const uint ALLOCATIONTYPE_RESERVE = 0x2000;
    const uint ALLOCATIONTYPE = ALLOCATIONTYPE_MEM_COMMIT | ALLOCATIONTYPE_RESERVE;
    // Allocate an executable page (Windows only) and copy the machine code into it.
    IntPtr buf = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)x86AssemblyBytes.Length, ALLOCATIONTYPE, PAGE_EXECUTE_READWRITE);
    if (buf == IntPtr.Zero) throw new System.ComponentModel.Win32Exception(Marshal.GetLastWin32Error());
    Marshal.Copy(x86AssemblyBytes, 0, buf, x86AssemblyBytes.Length);
    // Wrap the raw function pointer in a managed delegate of the requested type.
    return (TDelegate)(object)Marshal.GetDelegateForFunctionPointer(buf, typeof(TDelegate));
}

Then here's the assembly to generate the functions:

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate Int32 BitScan32Delegate(UInt32 inValue);

private static BitScan32Delegate _BitScanForward32 = (new Func<BitScan32Delegate>(() => { //IIFE   
   BitScan32Delegate del = null;
   if(IntPtr.Size == 4){
      del = GenerateX86Function<BitScan32Delegate>(
         x86AssemblyBytes: new byte[20] {
         //10: int32_t BitScanForward(uint32_t inValue) {
            0x51,                                       //51                   push        ecx  
            //11:    unsigned long i;
            //12:    return _BitScanForward(&i, inValue) ? i : -1;
            0x0F, 0xBC, 0x44, 0x24, 0x08,               //0F BC 44 24 08       bsf         eax,dword ptr [esp+8] 
            0x89, 0x04, 0x24,                           //89 04 24             mov         dword ptr [esp],eax 
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,               //B8 FF FF FF FF       mov         eax,-1               
            0x0F, 0x45, 0x04, 0x24,                     //0F 45 04 24          cmovne      eax,dword ptr [esp]
            0x59,                                       //59                   pop         ecx 
            //13: }
            0xC3,                                       //C3                   ret  
      });
   } else if(IntPtr.Size == 8){
      del = GenerateX86Function<BitScan32Delegate>( 
         //This code also will work for UInt64 bitscan.
         // But I have it limited to UInt32 via the delegate because UInt64 bitscan would fail in a 32bit dotnet process.  
            x86AssemblyBytes: new byte[13] {
            //15:    unsigned long i;
            //16:    return _BitScanForward64(&i, inValue) ? i : -1; 
            0x48, 0x0F, 0xBC, 0xD1,            //48 0F BC D1          bsf         rdx,rcx
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,      //B8 FF FF FF FF       mov         eax,-1 
            0x0F, 0x45, 0xC2,                  //0F 45 C2             cmovne      eax,edx  
            //17: }
            0xC3                              //C3                   ret 
         });
   }
   return del;
}))();


private static BitScan32Delegate _BitScanReverse32 = (new Func<BitScan32Delegate>(() => { //IIFE   
   BitScan32Delegate del = null;
   if(IntPtr.Size == 4){
      del = GenerateX86Function<BitScan32Delegate>(
         x86AssemblyBytes: new byte[20] {
            //18: int BitScanReverse(unsigned int inValue) {
            0x51,                                       //51                   push        ecx  
            //19:    unsigned long i;
            //20:    return _BitScanReverse(&i, inValue) ? i : -1;
            0x0F, 0xBD, 0x44, 0x24, 0x08,               //0F BD 44 24 08       bsr         eax,dword ptr [esp+8] 
            0x89, 0x04, 0x24,                           //89 04 24             mov         dword ptr [esp],eax 
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,               //B8 FF FF FF FF       mov         eax,-1  
            0x0F, 0x45, 0x04, 0x24,                     //0F 45 04 24          cmovne      eax,dword ptr [esp]  
            0x59,                                       //59                   pop         ecx 
            //21: }
            0xC3,                                       //C3                   ret  
      });
   } else if(IntPtr.Size == 8){
      del = GenerateX86Function<BitScan32Delegate>( 
         //This code also will work for UInt64 bitscan.
         // But I have it limited to UInt32 via the delegate because UInt64 bitscan would fail in a 32bit dotnet process. 
            x86AssemblyBytes: new byte[13] {
            //23:    unsigned long i;
            //24:    return _BitScanReverse64(&i, inValue) ? i : -1; 
            0x48, 0x0F, 0xBD, 0xD1,            //48 0F BD D1          bsr         rdx,rcx 
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,      //B8 FF FF FF FF       mov         eax,-1
            0x0F, 0x45, 0xC2,                  //0F 45 C2             cmovne      eax,edx  
            //25: }
            0xC3                              //C3                   ret 
         });
   }
   return del;
}))();

In order to generate the assembly I started a new VC++ project, created the functions I wanted, then went to Debug-->Windows-->Disassembly. For compiler options I disabled inlining, enabled intrinsics, favored fast code, omitted frame pointers, disabled security checks and SDL checks. The code for that is:

#include "stdafx.h"
#include <intrin.h>  

#pragma intrinsic(_BitScanForward)  
#pragma intrinsic(_BitScanReverse) 
#pragma intrinsic(_BitScanForward64)  
#pragma intrinsic(_BitScanReverse64) 


__declspec(noinline) int _cdecl BitScanForward(unsigned int inValue) {
    unsigned long i;
    return _BitScanForward(&i, inValue) ? i : -1; 
}
__declspec(noinline) int _cdecl BitScanForward64(unsigned long long inValue) {
    unsigned long i;
    return _BitScanForward64(&i, inValue) ? i : -1;
}
__declspec(noinline) int _cdecl BitScanReverse(unsigned int inValue) {
    unsigned long i;
    return _BitScanReverse(&i, inValue) ? i : -1; 
}
__declspec(noinline) int _cdecl BitScanReverse64(unsigned long long inValue) {
    unsigned long i;
    return _BitScanReverse64(&i, inValue) ? i : -1;
}
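
Note that BitScanReverse above returns the index of the highest set bit (or -1 for zero), not the leading zero count the question asks for. The conversion is a single subtraction, sketched below. BitScanReverseReference is a portable stand-in written for illustration only, assumed to follow the same -1 convention as the machine-code version; it is not the original answer's code.

```csharp
using System;

public static class BitScanToLeadingZeros
{
    // Illustrative software stand-in for the machine-code BitScanReverse:
    // index (0..31) of the highest set bit, or -1 when no bit is set.
    public static int BitScanReverseReference(uint mask)
    {
        if (mask == 0) return -1;
        int index = 31;
        while ((mask & 0x80000000u) == 0)
        {
            mask <<= 1;
            index--;
        }
        return index;
    }

    // With the -1 convention, a zero input naturally yields 31 - (-1) = 32.
    public static int LeadingZeros(int x) =>
        31 - BitScanReverseReference(unchecked((uint)x));

    public static void Main()
    {
        Console.WriteLine(LeadingZeros(20)); // 27: highest set bit of 20 is bit 4
        Console.WriteLine(LeadingZeros(0));  // 32
        Console.WriteLine(LeadingZeros(-1)); // 0
    }
}
```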

Solution 4

Look at https://chessprogramming.wikispaces.com/BitScan for good info on bitscanning.

If you're able to mix in assembly code, use the modern LZCNT, TZCNT and POPCNT processor instructions.

Otherwise, take a look at Java's implementation of Integer.numberOfLeadingZeros:

/**
 * Returns the number of zero bits preceding the highest-order
 * ("leftmost") one-bit in the two's complement binary representation
 * of the specified {@code int} value.  Returns 32 if the
 * specified value has no one-bits in its two's complement representation,
 * in other words if it is equal to zero.
 *
 * <p>Note that this method is closely related to the logarithm base 2.
 * For all positive {@code int} values x:
 * <ul>
 * <li>floor(log<sub>2</sub>(x)) = {@code 31 - numberOfLeadingZeros(x)}
 * <li>ceil(log<sub>2</sub>(x)) = {@code 32 - numberOfLeadingZeros(x - 1)}
 * </ul>
 *
 * @param i the value whose number of leading zeros is to be computed
 * @return the number of zero bits preceding the highest-order
 *     ("leftmost") one-bit in the two's complement binary representation
 *     of the specified {@code int} value, or 32 if the value
 *     is equal to zero.
 * @since 1.5
 */
public static int numberOfLeadingZeros(int i) {
    // HD, Figure 5-6
    if (i == 0)
        return 32;
    int n = 1;
    if (i >>> 16 == 0) { n += 16; i <<= 16; }
    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
    n -= i >>> 31;
    return n;
}
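
Since the question is about C#, here is a rough translation of the Java method above. It is a sketch rather than a tested library routine: Java's unsigned shift (>>>) is mirrored by working on a uint, and the class and method names are chosen for illustration.

```csharp
using System;

public static class JavaStylePort
{
    // Binary search for the highest set bit, following the Java code above.
    public static int NumberOfLeadingZeros(int value)
    {
        uint i = unchecked((uint)value); // shifts on uint are logical, like Java's >>>
        if (i == 0)
            return 32;
        int n = 1;
        if ((i >> 16) == 0) { n += 16; i <<= 16; }
        if ((i >> 24) == 0) { n += 8; i <<= 8; }
        if ((i >> 28) == 0) { n += 4; i <<= 4; }
        if ((i >> 30) == 0) { n += 2; i <<= 2; }
        n -= (int)(i >> 31); // top bit set means one fewer leading zero
        return n;
    }

    public static void Main()
    {
        Console.WriteLine(NumberOfLeadingZeros(20)); // 27
        Console.WriteLine(NumberOfLeadingZeros(1));  // 31
        Console.WriteLine(NumberOfLeadingZeros(-1)); // 0
    }
}
```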

Solution 5

Try this:

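static int LeadingZeros(int value)
{
   // Reinterpret as unsigned so the right shift is logical; with a signed
   // (arithmetic) shift a negative value would never reach zero.
   var uValue = (uint) value;
   int significantBits = 0;
   while(uValue != 0)
   {
      uValue = uValue >> 1;
      significantBits++;
   }

   // Whatever remains of the 32-bit width is the leading zero count.
   return (32 - significantBits);
}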
Author by

Admin

Updated on June 05, 2022

Comments

  • Admin
    Admin almost 2 years

    How do I count the leading zeroes in an Int32? So what I want to do is write a function which returns 30 if my input is 2, because in binary I have 000...0000000000010.

  • Tim
    Tim almost 12 years
    I think he wants it converted to binary first
  • Henk Holterman
    Henk Holterman almost 12 years
    I think this just moves the main part of the work to ones(x)
  • BrokenGlass
    BrokenGlass almost 12 years
    what is WORDBITS and what is ones(x)? - the answer is very incomplete currently at best
  • BrokenGlass
    BrokenGlass almost 12 years
    wrong answer- max length of a 32 bit integer in base 10 is not 32
  • Admin
    Admin almost 12 years
    Hi, thank you very much. I was wondering if there was anything built into C# to do this though. I'll take this as a no, given that I didn't find anything either.
  • James Johnson
    James Johnson almost 12 years
    @BrokenGlass: I know what the max length of an integer is. OP specifically asked for the difference from 32. My comment about adjusting the upper bound was anticipating that.
  • Calmarius
    Calmarius over 10 years
    Your algorithm doesn't count leading zeros, but all zeros. Not what the OP is asking about.
  • Brent
    Brent over 10 years
    The Ones function computes the Hamming weight. There's a description of the algorithm on the wikipedia page of that name.
  • Brent
    Brent over 10 years
    Also want to throw out there that I benchmarked this against coercing to a double and then reading the exponent bits. This implementation is still faster. Hamming weight: 1550887 ticks, FPU: 1861061 ticks. That's DateTime ticks for 10000000 iterations, btw.
  • Calmarius
    Calmarius over 10 years
    @Brent Is it possible to improve the algorithm by exploiting the fact that x is always 2^n-1?
  • Brent
    Brent over 10 years
    Yep. According to Wikipedia's "Find first set" (the count leading zeros problem is also discussed) page: table[0..31] = {0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31} return table[(x × 0x07C4ACDD) >> 27] from the 4th code block in the algorithms section (citing Leiserson, Prokop, and Randall). Whether this implementation is faster (it may very well be) will be determined by someone who isn't slacking off at work. ;)
  • jorgebg
    jorgebg about 7 years
    I've modified the method to use unsigned integer shifting, otherwise with negative values it will never exit the loop.
  • Ken Lyon
    Ken Lyon almost 7 years
    I realize this is an old thread and the method is more the focus than the values, but 10 (decimal) is not 00000000000000000000000000010100 in binary, it's 00000000000000000000000000001010. I'd recommend you either change the binary values or just say we're using 20 for this example.
  • Robear
    Robear over 5 years
    What a cool way to inline assembly in C#! I was looking for a way to use BSR for this. Worth noting that this "should" be the fastest way to do this, but it's also architecture-specific (not that many of us are using .NET on anything but x86). This needs more upvotes.
  • Robear
    Robear over 5 years
    Lookup tables are generally faster in benchmarks only. In real-world operation, they can be slower because they require the lookup table to be in the cache. If it's not, it's a cache miss, and you lose cycles. The fastest (and most reliable) algorithms are usually pure mathematical operations without branches or lookup tables.
  • Derek Ziemba
    Derek Ziemba over 5 years
    Note that when I tested it, it was actually slower on dotnet 4.7.2 because of the ~50 instruction overhead of going from managed to unmanaged code. Assembly functions would have to be much fatter to offset that overhead.
  • Robear
    Robear over 5 years
    That's really interesting. I wonder if it's the same overhead branching out to C++/CLI? That's how I usually bring assembly into the managed space, mostly because I can control the marshalling. Unfortunately, Core isn't going to support C++/CLI, so hopefully the overhead has gotten better.
  • SunsetQuest
    SunsetQuest over 4 years
    I benchmarked about 25 different versions in C#, and besides a couple that extract a float's exponent, yours was 30% faster than the next. After spending some time looking at all of these, this function is actually pretty amazing and is the best, because extracting the exponent could be problematic on non-little-endian architectures or with some compiler configurations. Nice work!
  • phuclv
    phuclv about 4 years
    it's also very slow because of Math.Log
  • SunsetQuest
    SunsetQuest almost 4 years
    This is the best answer for .NET Core 3.0 and above. It is safe because it is built-in to the default .net core library and the performance is probably tuned with handcrafted asm instructions for each platform.
  • spender
    spender over 3 years
    I've linked to you from my answer. You're welcome :)
  • freakish
    freakish over 2 years
    The biggest issue with this solution is that it requires memory allocation plus is very very slow due to serialization.
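
Brent's comment above quotes a lookup table and the multiplier 0x07C4ACDD without showing surrounding code. The sketch below is one way it might be wired up in C#. Two assumptions are worth flagging: the input is smeared first (so x has the form 2^n-1, as Calmarius points out), and, working a few values by hand, the table appears to yield the index of the highest set bit rather than the zero count, so the result is subtracted from 31. The class and member names are invented for illustration.

```csharp
using System;

public static class DeBruijnLeadingZeros
{
    // Table as quoted in the comment; indexed as below it gives the position
    // of the highest set bit of a smeared value (assumption checked by hand).
    private static readonly byte[] Log2Table =
    {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    public static int LeadingZeros(int value)
    {
        uint x = unchecked((uint)value);
        if (x == 0) return 32; // the table cannot distinguish 0 from 1
        // Smear the highest set bit downwards so x becomes 2^n - 1.
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        // The multiplication is meant to wrap around; the top 5 bits pick the entry.
        uint index = unchecked(x * 0x07C4ACDDu) >> 27;
        return 31 - Log2Table[index];
    }

    public static void Main()
    {
        Console.WriteLine(LeadingZeros(20)); // 27
        Console.WriteLine(LeadingZeros(0));  // 32
        Console.WriteLine(LeadingZeros(-1)); // 0
    }
}
```

As Robear notes, whether a table lookup beats the branch-free popcount version in practice depends on cache behaviour, so it is worth benchmarking before adopting either.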