Subtracting Arrays

15,768

Solution 1

If your arrays aren't sorted, the worst case time complexity for an array exclusion using a intuitive solution is O(n2) (although you can boost this if you sort the arrays first), since you need to check the whole array whether an element is existent or not.

Example of worst case scenario:

array a1 = [1, 3, 4, 5, 8];
array a2 = [8, 5, 4, 3, 1];

If your arrays are ordered, then the worst case time complexity is O(n+m) (pseudo-code):

int i = 0;
for(int j = 0; i < a1.size && j < a2.size;){
    if(a1[i] == a2[j])
        ++i, ++j;  // exclude this element
    if(a1[i] < a2[j]){
         a3.push(a1[i]); // include this element
         ++i;
    }
    if(a1[i] > a2[j])
         ++j; // ignore lesser elements
}
while(i < a1.size)
     a3.push(a1[i]);

UPDATE -Wall -Wextra -pedantic C code:

#include <stdio.h>
#include <malloc.h>

/**
* The following function excludes values from an array using another arrays values.
* Note that this version won't exclude multiple values, for this you have to drop
* '++j' in line 25.
*
* \param[in] from Original sorted array
* \param[in] from_length Original array length
* \param[in] what Sorted array including the excluding values
* \param[in] what_length self describing
* \param[out] result_length the lenght of the new array - a value lesser 0 indicates an error.
*/

int* exclude(int* from, int from_length, int* what, int what_length, int* result_length){
    int i,j,k;
    int* result = (int*) malloc(sizeof(int)*from_length);
    if(result == NULL){
        *result_length = -1;
        return NULL;
    }
    for(i = j = k = 0; i < from_length && j < what_length;){
        if(from[i] == what[j])
            ++i, ++j;  /* exclude this element - to enable multiple exclusion drop '++j' 
                        4,4,5,6 /4 --> 5,6 */
        if(from[i] < what[j])
            result[k++] = from[i++];
        if(from[i] > what[j])
             ++j; /* ignore lesser elements */
    }
    while(i < from_length)
        result[k++] = from[i++];

    if( k < from_length){
        int* tmp = (int*) realloc(result,sizeof(int)*k);
        if(tmp == NULL){
            /* either error handling or returning result */
        }else{
            result = tmp;
        }
    }
    *result_length = k;
    return result;
}

int main(){
    int a[6] = {1,2,3,4,5,6};
    int b[3] = {2,4,5};
    int result_length;
    int i;
    int *c = exclude(a,6,b,3,&result_length);
    for(i = 0; i < result_length; ++i)
        printf("%i ",c[i]);
    free(c);
    return 0;
}

This will result in a worst time complexity of O(n+m) for sorted arrays and O(n log n + m log m) for non-sorted arrays (sort both, use the function provided above).

Solution 2

it can be done in O(nlogm + m) where m is the array you are subtracting from, using binary search (*)If the array is not sorted, a sort will be needed first which will result in O(mlogm + nlogm + m)
Pseudo code:

remove(a1,a2): //a1-a2
   for each element x in a2:
      i <- binarySearch(a1,x)
      if x is in a1:
         a1[x] <- NOT_VALID
   remove all elements in a1 marked NOT_VALID

(*) You will have to give NOT_VALID a special value for binary search to keep working, or even simpler: maintain a new array of elements marked as NOT_VALID instead of actually marking elements.

Solution 3

If a1 does not contain duplicates then you could use a hash set data structure, e.g. from pblSet. Something like this:

PblSet* pSet = pblSetNewHashSet();

pblSetAddAll(pSet, a1);
pblSetRemoveAll(pSet, a2);

int** ppResult = (int**) pblSetToArray(pSet);

// use *ppResult
...

free(ppResult);
pblSetFree(pSet);

The performance should be O(n + m) and the arrays don't need to be sorted.

Solution 4

Because, you asked for the fastest and simplest I'm going to introduce some assumptions:

  • integers
  • finite
  • positive
  • unique
  • small
  • order isn't important.

e.g. you have no more than 10 numbers. Then let's treat them as sets for an O(n) solution (where n represents the maximum finite size of the set):

// Initialize array1 to [1, 3, 4, 5, 8].
unsigned char array1[10];
memset(array1, 0, 10);
array1[1] = 1;
array1[3] = 1;
array1[4] = 1;
array1[5] = 1;
array1[8] = 1;

// Initialize array2 to [2,4,5].
unsigned char array2[10];
memset(array2, 0, 10);
array2[2] = 1;
array2[4] = 1;
array2[5] = 1;

// Implement array3 = array1 - array2.
unsigned char array3[10];
memset(array3, 0, 10);
for (int i = 0; i < 10; i++)
    array3[i] = array1[i] & ~array2[i];

For an even more cheekier answer, if the numbers in your array do not exceed 0-31, you can just simplify the above using unsigned int:

    // array1 = 1, 3, 4, 5, 8
    unsigned int array1 = (1 << 1) | (1 << 3) | (1 << 4) | (1 << 5) | (1 << 8);
    // array2 = 2, 4, 5
    unsigned int array2 = (1 << 2) | (1 << 4) | (1 << 5);
    // array3 = array1 - array2;
    unsigned int array3 = array1 &~ array2;
Share:
15,768
winck
Author by

winck

Updated on June 11, 2022

Comments

  • winck
    winck almost 2 years

    What's a fastest way to implement array subtraction? For example:

    array a1 = [1, 3, 4, 5, 8];
    array a2 = [2, 4, 5];
    
    array a3 = a1 - a2; /* [1, 3, 8] */
    

    Here array would be the type my program uses to represent a struct which is used as a container. The rest of it is pseudo code, of course I'm not creating the arrays like that nor subtracting.

    The simplest solution I can think of involves nested loops:

    /* a1 - a2 */
    for (i = 0; i < a1.size; ++i) {
        int is_the_same = 0;
        for (j = 0; i < a2.size; ++j)
            if (a1[i] == a2[j]) {
                is_the_same = 1;
                break;
            }
        }
        if (!is_the_same)
           a3.push a1[i];
    }
    

    But this does not look very efficient. What would be another approach?