Fast way to replace elements in array - C

64,253

Solution 1

For your specific case where you initially have 0 and 1, the following might be faster. You'll have to bench mark it. You probably can't do much better with plain C though; you may need to dive into assembly if you want to take advantage of "x86 trickery" that may exist.

for(int i = 0; i < size ; i++){
  array[i] *= 123456;
}

EDIT:

Benchmark code:

#include <time.h>
#include <stdlib.h>
#include <stdio.h>

size_t diff(struct timespec *start, struct timespec *end)
{
  return (end->tv_sec - start->tv_sec)*1000000000 + end->tv_nsec - start->tv_nsec;
}

int main(void)
{
  const size_t size = 1000000;
  int array[size];

  for(size_t i=0; i<size; ++i) {
    array[i] = rand() & 1;
  }

  struct timespec start, stop;

  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
  for(size_t i=0; i<size; ++i) {
    array[i] *= 123456;
    //if(array[i]) array[i] = 123456;
  }
  clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop);

  printf("size: %zu\t nsec: %09zu\n", size, diff(&start, &stop));
}

my results:

Computer: quad core AMD Phenom @2.5GHz, Linux, GCC 4.7, compiled with

$ gcc arr.c -std=gnu99 -lrt -O3 -march=native
  • if version: ~5-10ms
  • *= version: ~1.3ms

Solution 2

You might want to benchmark this as well:

for(int i = 0; i < size ; i++){
  array[i] = (~(array[i]-1) & 123456);
}

I run it through same benchmark as SchighSchagh, with little or no difference on my set up. It may differ on yours, however.

EDIT: Stop the presses!

I just remembered that x86 can "unbranch" ternary operators if arguments between ":" are constants. Consider following code:

for(size_t i=0; i<size; ++i) {
    array[i] = array[i] ? 123456 : 0;
}

Looks almost like your original code, doesn't it? Well, disassembly shows that it has been compiled without any branches:

  for(size_t i=0; i<size; ++i) {
00E3104C  xor         eax,eax  
00E3104E  mov         edi,edi  
        array[i] = array[i] ? 123456 : 0;
00E31050  mov         edx,dword ptr [esi+eax*4]  
00E31053  neg         edx  
00E31055  sbb         edx,edx  
00E31057  and         edx,1E240h  
00E3105D  mov         dword ptr [esi+eax*4],edx  
00E31060  inc         eax  
00E31061  cmp         eax,5F5E100h  
00E31066  jb          wmain+50h (0E31050h)  
    }

Performance-wise it seems on par or little better than my original and SchighSchagh solution. It is more readable and more flexible, though. For instance, it can work with array[i] having values different than 0 and 1.

Bottom line, benchmark AND take a peek in the disassembly.

Solution 3

The array is small enough that it fits in cache, so it should be worthwhile to use SIMD: (not tested)

  mov ecx, size
  lea esi, [array + ecx * 4]
  neg ecx
  pxor xmm0, xmm0
  movdqa xmm1, [_vec4_123456]  ; value of { 123456, 123456, 123456, 123456 }
_replaceloop:
  movdqa xmm2, [esi + ecx * 4] ; assumes the array is 16 aligned, make that true
  add ecx, 4
  pcmpeqd xmm2, xmm0
  pandn xmm2, xmm1
  movdqa [esi + ecx * 4 - 16], xmm2
  jnz _replaceloop

Unrolling by 2 might help.

If you have SSE4.1, you can use SchighSchagh's multiplication trick with pmulld.

Solution 4

one more way to speed up the array assignment you can use the c inline assembly. Like below,

#include<stdio.h>
#include<string.h>
#include<stdlib.h>

const int size = 100000; 
void main(void) {
  int array[size];
  int value = 1000;

  __asm__ __volatile__("cld\n\t"
          "rep\n\t"
          "stosl\n\t"
          :
          :"c"(size*4), "a"(value), "D"(array)
          :
         );

  printf("Array[0] : %d \n", array[0]);
}

This should be speed when we compared to plain c program to assign the array values. And also the stosl instruction take 4 clock cycle.

Share:
64,253
Axarydax
Author by

Axarydax

Programming enthusiast, enjoying coding at home and at work too. I also like computer and console games. But I also have a real life too :D ! I like going to pub with friends, metal music shows, photography, driving, motorbiking and cycling. P.S. I'm keeping the April Fool's avatar!

Updated on March 25, 2020

Comments

  • Axarydax
    Axarydax about 4 years

    Let's say we have an array of ints like this:

    const int size = 100000;
    int array[size];
    //set some items to 0 and other items to 1
    

    I'd like to replace all items that have value of 1 with another value, for example 123456. This can be trivially implemented with:

    for(int i = 0; i < size ; i++){
        if(array[i] != 0) 
            array[i] = 123456;
    }
    

    Out of curiosity, is there a faster way to do this, by some kind of x86 trickery, or is this the best code for the processor?