Fastest way to separate the digits of an int into an array in .NET?

16,966

Solution 1

How about:

public static int[] ConvertToArrayOfDigits(int value)
{
    int size = DetermineDigitCount(value);
    int[] digits = new int[size];
    for (int index = size - 1; index >= 0; index--)
    {
        digits[index] = value % 10;
        value = value / 10;
    }
    return digits;
}

private static int DetermineDigitCount(int x)
{
    // This bit could be optimised with a binary search
    return x < 10 ? 1
         : x < 100 ? 2
         : x < 1000 ? 3
         : x < 10000 ? 4
         : x < 100000 ? 5
         : x < 1000000 ? 6
         : x < 10000000 ? 7
         : x < 100000000 ? 8
         : x < 1000000000 ? 9
         : 10;
}

Note that this won't cope with negative numbers... do you need it to?

EDIT: Here's a version which memoizes the results for under 10000, as suggested by Eric. If you can absolutely guarantee that you won't change the contents of the returned array, you could remove the Clone call. It also has the handy property of reducing the number of checks to determine the length of "large" numbers - and small numbers will only go through that code once anyway :)

private static readonly int[][] memoizedResults = new int[10000][];

public static int[] ConvertToArrayOfDigits(int value)
{
    if (value < 10000)
    {
        int[] memoized = memoizedResults[value];
        if (memoized == null) {
            memoized = ConvertSmall(value);
            memoizedResults[value] = memoized;
        }
        return (int[]) memoized.Clone();
    }
    // We know that value >= 10000
    int size = value < 100000 ? 5
         : value < 1000000 ? 6
         : value < 10000000 ? 7
         : value < 100000000 ? 8
         : value < 1000000000 ? 9
         : 10;

    return ConvertWithSize(value, size);
}

private static int[] ConvertSmall(int value)
{
    // We know that value < 10000
    int size = value < 10 ? 1
             : value < 100 ? 2
             : value < 1000 ? 3 : 4;
    return ConvertWithSize(value, size);
}

private static int[] ConvertWithSize(int value, int size)
{
    int[] digits = new int[size];
    for (int index = size - 1; index >= 0; index--)
    {
        digits[index] = value % 10;
        value = value / 10;
    }
    return digits;
}

Note that this doesn't try to be thread-safe at the moment. You may need to add a memory barrier to make sure that the write to the memoized results isn't visible until the writes within the individual result are visible. I've given up trying to reason about these things unless I absolutely have to. I'm sure you can make it lock-free with effort, but you should really get someone very smart to do so if you really need to.

EDIT: I've just realised that the "large" case can make use of the "small" case - split the large number into two small ones and use the memoised results. I'll give that a go after dinner and write a benchmark...

EDIT: Okay, ready for a giant amount of code? I realised that for uniformly random numbers at least, you'll get "big" numbers much more often than small ones - so you need to optimise for that. Of course, that might not be the case for real data, but anyway... it means I now do my size tests in the opposite order, hoping for big numbers first.

I've got a benchmark for the original code, the simple memoization, and then the extremely-unrolled code.

Results (in ms):

Simple: 3168
SimpleMemo: 3061
UnrolledMemo: 1204

Code:

using System;
using System.Diagnostics;

class DigitSplitting
{
    static void Main()        
    {
        Test(Simple);
        Test(SimpleMemo);
        Test(UnrolledMemo);
    }

    const int Iterations = 10000000;

    static void Test(Func<int, int[]> candidate)
    {
        Random rng = new Random(0);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            candidate(rng.Next());
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}",
            candidate.Method.Name, (int) sw.ElapsedMilliseconds);            
    }

    #region Simple
    static int[] Simple(int value)
    {
        int size = DetermineDigitCount(value);
        int[] digits = new int[size];
        for (int index = size - 1; index >= 0; index--)
        {
            digits[index] = value % 10;
            value = value / 10;
        }
        return digits;
    }

    private static int DetermineDigitCount(int x)
    {
        // This bit could be optimised with a binary search
        return x < 10 ? 1
             : x < 100 ? 2
             : x < 1000 ? 3
             : x < 10000 ? 4
             : x < 100000 ? 5
             : x < 1000000 ? 6
             : x < 10000000 ? 7
             : x < 100000000 ? 8
             : x < 1000000000 ? 9
             : 10;
    }
    #endregion Simple    

    #region SimpleMemo
    private static readonly int[][] memoizedResults = new int[10000][];

    public static int[] SimpleMemo(int value)
    {
        if (value < 10000)
        {
            int[] memoized = memoizedResults[value];
            if (memoized == null) {
                memoized = ConvertSmall(value);
                memoizedResults[value] = memoized;
            }
            return (int[]) memoized.Clone();
        }
        // We know that value >= 10000
        int size = value >= 1000000000 ? 10
                 : value >= 100000000 ? 9
                 : value >= 10000000 ? 8
                 : value >= 1000000 ? 7
                 : value >= 100000 ? 6
                 : 5;

        return ConvertWithSize(value, size);
    }

    private static int[] ConvertSmall(int value)
    {
        // We know that value < 10000
        return value >= 1000 ? new[] { value / 1000, (value / 100) % 10,
                                           (value / 10) % 10, value % 10 }
              : value >= 100 ? new[] { value / 100, (value / 10) % 10, 
                                         value % 10 }
              : value >= 10 ? new[] { value / 10, value % 10 }
              : new int[] { value };
    }

    private static int[] ConvertWithSize(int value, int size)
    {
        int[] digits = new int[size];
        for (int index = size - 1; index >= 0; index--)
        {
            digits[index] = value % 10;
            value = value / 10;
        }
        return digits;
    }
    #endregion

    #region UnrolledMemo
    private static readonly int[][] memoizedResults2 = new int[10000][];
    private static readonly int[][] memoizedResults3 = new int[10000][];
    static int[] UnrolledMemo(int value)
    {
        if (value < 10000)
        {
            return (int[]) UnclonedConvertSmall(value).Clone();
        }
        if (value >= 1000000000)
        {
            int[] ret = new int[10];
            int firstChunk = value / 100000000;
            ret[0] = firstChunk / 10;
            ret[1] = firstChunk % 10;
            value -= firstChunk * 100000000;
            int[] secondChunk = ConvertSize4(value / 10000);
            int[] thirdChunk = ConvertSize4(value % 10000);
            ret[2] = secondChunk[0];
            ret[3] = secondChunk[1];
            ret[4] = secondChunk[2];
            ret[5] = secondChunk[3];
            ret[6] = thirdChunk[0];
            ret[7] = thirdChunk[1];
            ret[8] = thirdChunk[2];
            ret[9] = thirdChunk[3];
            return ret;
        } 
        else if (value >= 100000000)
        {
            int[] ret = new int[9];
            int firstChunk = value / 100000000;
            ret[0] = firstChunk;
            value -= firstChunk * 100000000;
            int[] secondChunk = ConvertSize4(value / 10000);
            int[] thirdChunk = ConvertSize4(value % 10000);
            ret[1] = secondChunk[0];
            ret[2] = secondChunk[1];
            ret[3] = secondChunk[2];
            ret[4] = secondChunk[3];
            ret[5] = thirdChunk[0];
            ret[6] = thirdChunk[1];
            ret[7] = thirdChunk[2];
            ret[8] = thirdChunk[3];
            return ret;
        }
        else if (value >= 10000000)
        {
            int[] ret = new int[8];
            int[] firstChunk = ConvertSize4(value / 10000);
            int[] secondChunk = ConvertSize4(value % 10000);
            ret[0] = firstChunk[0];
            ret[1] = firstChunk[0];
            ret[2] = firstChunk[0];
            ret[3] = firstChunk[0];
            ret[4] = secondChunk[0];
            ret[5] = secondChunk[1];
            ret[6] = secondChunk[2];
            ret[7] = secondChunk[3];
            return ret;
        }
        else if (value >= 1000000)
        {
            int[] ret = new int[7];
            int[] firstChunk = ConvertSize4(value / 10000);
            int[] secondChunk = ConvertSize4(value % 10000);
            ret[0] = firstChunk[1];
            ret[1] = firstChunk[2];
            ret[2] = firstChunk[3];
            ret[3] = secondChunk[0];
            ret[4] = secondChunk[1];
            ret[5] = secondChunk[2];
            ret[6] = secondChunk[3];
            return ret;
        }
        else if (value >= 100000)
        {
            int[] ret = new int[6];
            int[] firstChunk = ConvertSize4(value / 10000);
            int[] secondChunk = ConvertSize4(value % 10000);
            ret[0] = firstChunk[2];
            ret[1] = firstChunk[3];
            ret[2] = secondChunk[0];
            ret[3] = secondChunk[1];
            ret[4] = secondChunk[2];
            ret[5] = secondChunk[3];
            return ret;
        }
        else
        {
            int[] ret = new int[5];
            int[] chunk = ConvertSize4(value % 10000);
            ret[0] = value / 10000;
            ret[1] = chunk[0];
            ret[2] = chunk[1];
            ret[3] = chunk[2];
            ret[4] = chunk[3];
            return ret;
        }
    }

    private static int[] UnclonedConvertSmall(int value)
    {
        int[] ret = memoizedResults2[value];
        if (ret == null)
        {
            ret = value >= 1000 ? new[] { value / 1000, (value / 100) % 10,
                                           (value / 10) % 10, value % 10 }
              : value >= 100 ? new[] { value / 100, (value / 10) % 10, 
                                         value % 10 }
              : value >= 10 ? new[] { value / 10, value % 10 }
              : new int[] { value };
            memoizedResults2[value] = ret;
        }
        return ret;
    }

    private static int[] ConvertSize4(int value)
    {
        int[] ret = memoizedResults3[value];
        if (ret == null)
        {
            ret = new[] { value / 1000, (value / 100) % 10,
                         (value / 10) % 10, value % 10 };
            memoizedResults3[value] = ret;
        }
        return ret;
    }
    #endregion UnrolledMemo
}

Solution 2

1 + Math.Log10(num) will give the number of digits without any searching/looping:

public static byte[] Digits(int num)
{
    int nDigits = 1 + Convert.ToInt32(Math.Floor(Math.Log10(num)));
    byte[] digits = new byte[nDigits];
    int index = nDigits - 1;
    while (num > 0) {
        byte digit = (byte) (num % 10);
        digits[index] = digit;
        num = num / 10;
        index = index - 1;
    }
    return digits;
}

Edit: Possibly prettier:

public static byte[] Digits(int num)
{
    int nDigits = 1 + Convert.ToInt32(Math.Floor(Math.Log10(num)));
    byte[] digits = new byte[nDigits];

    for(int i = nDigits - 1; i != 0; i--)
    {
        digits[i] = (byte)(num % 10);
        num = num / 10;
    }
    return digits;
} 

Solution 3

convert integer to string and then use String.Chars[]

Solution 4

Millions of times isn't that much.

// input: int num >= 0
List<byte> digits = new List<byte>();
while (num > 0)
{
   byte digit = (byte) (num % 10);
   digits.Insert(0, digit);  // Insert to preserve order
   num = num / 10;
}

// if you really want it as an array
byte[] bytedata = digits.ToArray();

Note that this could be changed to cope with negative numbers if you change byte to sbyte and test for num != 0.

Solution 5

'Will' vs 'Does'? I'm a huge fan of optimizing code after it's wrote, profiled, and it's been determined to be the bottleneck.

Share:
16,966

Related videos on Youtube

Moayad Mardini
Author by

Moayad Mardini

Working with startups in three continents. Backend developer using Python/Django. Microsoft Developer Network moderator. SOreadytohelp

Updated on April 15, 2022

Comments

  • Moayad Mardini
    Moayad Mardini about 2 years

    I want to separate the digits of an integer, say 12345, into an array of bytes {1,2,3,4,5}, but I want the most performance effective way to do that, because my program does that millions of times.

    Any suggestions? Thanks.

    • Russell Steen
      Russell Steen over 14 years
      If it's really critical, then try to get ALL of the ways, and benchmark them yourself, using cycles that will simulate your real environment.
    • Moayad Mardini
      Moayad Mardini over 14 years
      OK, I don't know all the possible ways, would you suggest some of?
    • plinth
      plinth over 14 years
      do you really mean to separate out the base 10 digits or you mean byte by byte?
    • Eric Lippert
      Eric Lippert over 14 years
      Do you know anything about the input? For example, do you know that the integer will always be positive? Do you know that it will always be less than a million? And so on. You can get good optimizations if you can reduce the scope of the problem.
    • Chris
      Chris over 14 years
      This is really interesting. I am curious. Why are you doing this?
  • Daniel Elliott
    Daniel Elliott over 14 years
    Why the downvote? +! premature optimisation is the root of all evil
  • Jon Skeet
    Jon Skeet over 14 years
    @Thorarin: Good idea, but there's a better one :)
  • cdiggins
    cdiggins over 14 years
    Your answer is a non-answer. How do you know he didn't write and profile the code, and that he does know this to be the the bottleneck.
  • cdiggins
    cdiggins over 14 years
    Several expensive operations here: allocate a list, insert into a list, convert to an array. It could be sped up by an order of at least a hundred.
  • Moayad Mardini
    Moayad Mardini over 14 years
    @cdiggins: Yes, thanks, this is the bottleneck of my program's logic.
  • Thorarin
    Thorarin over 14 years
    Actually, I'm not certain it's "better". In theory it should be, but the for (int i = 0; i < size; i++) is optimized by the JIT compiler to be faster than for (int i = size - 1; i >= 0; i--). Kinda stupid, I know :( I was mostly going for readability.
  • Joren
    Joren over 14 years
    It's of course easy to get rid of the zeroes by copying the relevant digits to a new array before returning. Okay, it iterates over the array again, but for an array that's 10 entries long I really don't see a problem with that.
  • Joren
    Joren over 14 years
    It'd be way better just to start filling up an array of 10 items in reverse order and then copying the digits to a new array later.
  • Joren
    Joren over 14 years
    A kitten dies every time someone uses strings for doing simple arithmetic.
  • cdiggins
    cdiggins over 14 years
    The extra iteration is exactly corret. The only probably we would run into is millions of array allocations. Faster to pre-allocate 10 static arrays, and choose the correct one afterwards.
  • Eric Lippert
    Eric Lippert over 14 years
    You could potentially make this faster on average by memoizing the result.
  • Joren
    Joren over 14 years
    I had thought of splitting the number in two halves and implemented it, and it is indeed faster than your current solution.
  • John K
    John K over 14 years
    The nice thing about a regular expression being present is you can modify the pattern to exclude special numeric-related symbols like . $ E, etc if you wanted only the digits extracted before being split.
  • DK.
    DK. over 14 years
    First solution had "hard to beat" performance (get rid of function call to improve it). 2nd solution... too many typos and Clone() will kill most of performance gain.
  • sentry07
    sentry07 over 14 years
    I'm returning from 400k to 500k random integers (from 1 to 9 digits) a second.
  • Nick
    Nick over 14 years
    "Hardware is Cheap, Programmers are Expensive."
  • Obi
    Obi over 14 years
    He said my code will, not does. He inferred future tense, and yes as stated. Premature optimization is the root of all evil.