Checksum for an integer array?

10,024

Solution 1

Let's take the case of your array of 25 integers. You explain that it can contains any permutations of the unique integers 0 to 24. According to this page, there is 25! (25 factorial) possible permutations, that is 15511210043330985984000000. Far more than a 32bit integer can contains.

The conclusion is that you will have collision, no matter how hard you try.

Now, here is a simple algorithm that account for position:

int checksum(int[] array, int size) {
  int c = 0;
  for(int i = 0; i < size; i++) {
    c += array[i];
    c = c << 3 | c >> (32 - 3); // rotate a little
    c ^= 0xFFFFFFFF; // invert just for fun
  }
  return c;
}

Solution 2

I think what you want is in the answer of the following thread:

Fast permutation -> number -> permutation mapping algorithms

You just take the number your permutation is mapped to and take that as your Checksum. As there is exactly one Checksum per permutation there can't be a smaller Checksum that is collision free.

Solution 3

How about the checksum of weighted sum? Let's take an example for [0,1,2,3]. First pick a seed and limit, let's pick a seed as 7 and limit as 10000007.

a[4] = {0, 1, 2, 3}
limit = 10000007, seed = 7
result = 0
result = ((result + a[0]) * seed) % limit = ((0 + 0) * 7)) % 10000007 = 0
result = ((result + a[1]) * seed) % limit = ((0 + 1) * 7)) % 10000007 = 7
result = ((result + a[2]) * seed) % limit = ((7 + 2) * 7)) % 10000007 = 63
result = ((result + a[3]) * seed) % limit = ((63 + 3) * 7)) % 10000007 = 462

Your checksum is 462 for that [0, 1, 2, 3]. The reference is http://www.codeabbey.com/index/wiki/checksum

Share:
10,024
MinaHany
Author by

MinaHany

Updated on June 04, 2022

Comments

  • MinaHany
    MinaHany almost 2 years

    I have an array that is of size 4,9,16 or 25 (according to the input) and the numbers in the array are the same but less by one (if the array size is 9 then the biggest element in the array would be 8) the numbers start with 0 and I would like to do some algorithm to generate some sort of a checksum for the array so that I can compare that 2 arrays are equal without looping through the whole array and checking each element one by one.

    Where can I get this sort of information? I need something that is as simple as possible. Thank you.

    edit: just to be clear on what I want:

    -All the numbers in the array are distinct, so [0,1,1,2] is not valid because there is a repeated element (1)

    -The position of the numbers matter, so [0,1,2,3] is not the same as [3,2,1,0]

    -The array will contain the number 0, so this should also be taken into consideration.

    EDIT:

    Okay I tried to implement the Fletcher's algorithm here: http://en.wikipedia.org/wiki/Fletcher%27s_checksum#Straightforward

    int fletcher(int array[], int size){
      int i;
      int sum1=0;
      int sum2=0;
      for(i=0;i<size;i++){
        sum1=(sum1+array[i])%255;
        sum2=(sum2+sum1)%255;
      }
      return (sum2 << 8) | sum1;
    }
    

    to be honest I have no idea what does the return line do but unfortunately, the algorithm does not work. For arrays [2,1,3,0] and [1,3,2,0] I get the same checksum.

    EDIT2:

    okay here's another one, the Adler checksum http://en.wikipedia.org/wiki/Adler-32#Example_implementation

    #define MOD 65521;
    
    unsigned long adler(int array[], int size){
      int i;
      unsigned long a=1;
      unsigned long b=0;
      for(i=0;i<size;i++){
        a=(a+array[i])%MOD;
        b=(b+a)%MOD;
      }
      return (b <<16) | a;
    }
    

    This also does not work. Arrays [2,0,3,1] and [1,3,0,2] generate same checksum. I'm losing hope here, any ideas?

    • ur.
      ur. almost 12 years
      The numbers in an array are not unique, I guess?! So { 1,2,2,4 } is valid?
    • Nick
      Nick almost 12 years
      > the numbers in the array are the same Can you elaborate on that?
    • MinaHany
      MinaHany almost 12 years
      oh sorry I did not mention! Yes the numbers are unique, so [1,2,2,4] is NOT valid.
    • Pavan Manjunath
      Pavan Manjunath almost 12 years
      {1,2,3,4} and {4,3,2,1}. Are these 2 arrays equal according to your definition?
    • MinaHany
      MinaHany almost 12 years
      no, these are different.
    • Jay
      Jay almost 12 years
      So, basically, if your array size is 4, then some examples of possible elements are: {1,2,3,4}, {1,3,4,2}, {1,4,3,2} {4,3,2,1}.. etc But, the always the elements will be between 1 and 4. Is this right?
    • MinaHany
      MinaHany almost 12 years
      Yes, that's right! But I just realised that the array will contain the number 0 so possiblme elements are : [0,1,2,3], [0,2,3,1], [0,3,2,1] and so on
    • Lundin
      Lundin almost 12 years
      Sounds very much like some sort of Sudoku generator or solver. There is like plenty of code for that to be found on the web.
    • Lundin
      Lundin almost 12 years
      Also, your spec seems incorrect. If the array of size 9 contains distinct numbers between 0 and 9, then 10 different numbers cannot fit in 9 array positions. One number between 0 and 9 will be left out in each array. Is that the case?
    • Lundin
      Lundin almost 12 years
      If you are looking for a simple solution, then that is to compare the arrays one element at a time. For an array of size 9, with ten possible values per item, the chance of finishing the comparison at the first attempt is 9 out of 10. If the first items are identical, the chance that the next comparison will be the last one, is 8 out of 9. And so on. This would be quite an effective algorithm in terms of comparisons (if not in terms of branch prediction).
    • MinaHany
      MinaHany almost 12 years
      @Lundin no, I meant the biggest element would be 8 not 9 (elements are from 0 to size-1
    • Lundin
      Lundin almost 12 years
      @MinaHany Still, I doubt you will be able to write more effective code than the simple for loop. A checksum calculation with modulo, division etc like those you posted probably takes 100 times the execution time than the worst case for loop (24 identical elements). So the real question would be if this checksum makes any sense in the particular application.
  • Pavan Manjunath
    Pavan Manjunath almost 12 years
    Nice and simple logic but it breaks easily- {4,2,3,1} also yields a checksum of -3
  • Jay
    Jay almost 12 years
    {3,2,1,4} is 1+1+3 = 5 & {3,1,2,4} is 2+1+2 = 5. Both turns out same. So, how will this work?
  • MinaHany
    MinaHany almost 12 years
    Sorry, I forgot to mention that the number 0 will also be in the array.
  • Yusuf X
    Yusuf X almost 12 years
    Right, that's why I mention "collision". Checksums normally have them. Good checksums have few collisions and are computationally inexpensive. But if you want to guarantee no collisions, then you have to go through each element in the array.
  • Pavan Manjunath
    Pavan Manjunath almost 12 years
    @YusufX agreed that collisions are inevitable when we are looking out for time optimization and simple algorithms, but collision rate shouldn't be too high even for simple cases as is in your logic
  • Yusuf X
    Yusuf X almost 12 years
    No requirements were given for collision rates or computational complexity, but if my answer doesn't suit, then I recommend a position-dependent checksum, as described here: en.wikipedia.org/wiki/Checksum#Position-dependent_checksums
  • Pavan Manjunath
    Pavan Manjunath almost 12 years
    @YusufX I think you should edit your answer and include the above link. It seems to be on target and useful
  • MinaHany
    MinaHany almost 12 years
    Thanks for the answer! Would you please explain a bit more on what the calculation is? I can't seem to get what's on the other thread.
  • Pavan Manjunath
    Pavan Manjunath almost 12 years
    Though the idea of using the index as a checksum is a pretty good one ( 0% collision ) but the complexity of calculating the checksum, the computations of factorials etc cant beat the straightforward array1[i] == array2[i] O(n) comparison.
  • Pavan Manjunath
    Pavan Manjunath almost 12 years
    I disagree with the claim that collision is unavoidable. @UmNyobe's solution, though not efficient enough for this particular case, guarantees 0% collision. Saying something's very difficult is quite different from saying that something's not possible at all :)
  • MinaHany
    MinaHany almost 12 years
    I'm sorry but this doesn't work either; 7 out of 12 arrays I am testing generate the same checksum.
  • MinaHany
    MinaHany almost 12 years
    I'm doing something wrong or this doesn't work: arrays [0,1,2,3] and [2,1,0,3] both generate a checksum of 18.
  • Nicolas Repiquet
    Nicolas Repiquet almost 12 years
    @PavanManjunath I'm not saying that it's difficult, I say it's impossible. You can't have 15511210043330985984000000 different 32bit integers, so you can't have a unique checksum per possible array, so you will have collision.
  • Pavan Manjunath
    Pavan Manjunath almost 12 years
    According to @UmNyobe's solution which uses index as the checksum, you need not have 15511210043330985984000000 different 32 bit integers in your program. You just need to calculate the indices for the 2 arrays that the user wishes to compare. These indices can go upto 15511210043330985984000000 , I agree. So we need to have only 2 variables that can hold this big a value. Also note that initially OP started out without any language preference. So Java's BigInteger class can help to hold this big an integer. In short, its not impossible!
  • Nicolas Repiquet
    Nicolas Repiquet almost 12 years
    @PavanManjunath It's not a checksum anymore though, it's a kind of unique signature. Btw, BigInteger data is stored in an array. So to compare two "checksums", you end up comparing two arrays. Oh wait! Isn't it what we're trying to avoid in the first place ? ;D
  • greybeard
    greybeard almost 9 years
    Replacing, e.g., comparisons of 25-element arrays with comparisons of about 12 bytes - a win, once you've amortised the cost of computing the numbers (space being no issue, as the arrays are equivalent and bigger). Depending on the probability of equal checksum values for a considerable number of checks, a fast&small checksum may be slower. E.g., with each permutation equally probable, expect it to be much faster.