Fast way to replace elements in array - C
Solution 1
For your specific case where you initially have 0 and 1, the following might be faster. You'll have to bench mark it. You probably can't do much better with plain C though; you may need to dive into assembly if you want to take advantage of "x86 trickery" that may exist.
for(int i = 0; i < size ; i++){
array[i] *= 123456;
}
EDIT:
Benchmark code:
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
size_t diff(struct timespec *start, struct timespec *end)
{
return (end->tv_sec - start->tv_sec)*1000000000 + end->tv_nsec - start->tv_nsec;
}
int main(void)
{
const size_t size = 1000000;
int array[size];
for(size_t i=0; i<size; ++i) {
array[i] = rand() & 1;
}
struct timespec start, stop;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
for(size_t i=0; i<size; ++i) {
array[i] *= 123456;
//if(array[i]) array[i] = 123456;
}
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop);
printf("size: %zu\t nsec: %09zu\n", size, diff(&start, &stop));
}
my results:
Computer: quad core AMD Phenom @2.5GHz, Linux, GCC 4.7, compiled with
$ gcc arr.c -std=gnu99 -lrt -O3 -march=native
if
version: ~5-10ms*=
version: ~1.3ms
Solution 2
You might want to benchmark this as well:
for(int i = 0; i < size ; i++){
array[i] = (~(array[i]-1) & 123456);
}
I run it through same benchmark as SchighSchagh, with little or no difference on my set up. It may differ on yours, however.
EDIT: Stop the presses!
I just remembered that x86 can "unbranch" ternary operators if arguments between ":" are constants. Consider following code:
for(size_t i=0; i<size; ++i) {
array[i] = array[i] ? 123456 : 0;
}
Looks almost like your original code, doesn't it? Well, disassembly shows that it has been compiled without any branches:
for(size_t i=0; i<size; ++i) {
00E3104C xor eax,eax
00E3104E mov edi,edi
array[i] = array[i] ? 123456 : 0;
00E31050 mov edx,dword ptr [esi+eax*4]
00E31053 neg edx
00E31055 sbb edx,edx
00E31057 and edx,1E240h
00E3105D mov dword ptr [esi+eax*4],edx
00E31060 inc eax
00E31061 cmp eax,5F5E100h
00E31066 jb wmain+50h (0E31050h)
}
Performance-wise it seems on par or little better than my original and SchighSchagh solution. It is more readable and more flexible, though. For instance, it can work with array[i] having values different than 0 and 1.
Bottom line, benchmark AND take a peek in the disassembly.
Solution 3
The array is small enough that it fits in cache, so it should be worthwhile to use SIMD: (not tested)
mov ecx, size
lea esi, [array + ecx * 4]
neg ecx
pxor xmm0, xmm0
movdqa xmm1, [_vec4_123456] ; value of { 123456, 123456, 123456, 123456 }
_replaceloop:
movdqa xmm2, [esi + ecx * 4] ; assumes the array is 16 aligned, make that true
add ecx, 4
pcmpeqd xmm2, xmm0
pandn xmm2, xmm1
movdqa [esi + ecx * 4 - 16], xmm2
jnz _replaceloop
Unrolling by 2 might help.
If you have SSE4.1, you can use SchighSchagh's multiplication trick with pmulld
.
Solution 4
one more way to speed up the array assignment you can use the c inline assembly. Like below,
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
const int size = 100000;
void main(void) {
int array[size];
int value = 1000;
__asm__ __volatile__("cld\n\t"
"rep\n\t"
"stosl\n\t"
:
:"c"(size*4), "a"(value), "D"(array)
:
);
printf("Array[0] : %d \n", array[0]);
}
This should be speed when we compared to plain c program to assign the array values. And also the stosl instruction take 4 clock cycle.
Axarydax
Programming enthusiast, enjoying coding at home and at work too. I also like computer and console games. But I also have a real life too :D ! I like going to pub with friends, metal music shows, photography, driving, motorbiking and cycling. P.S. I'm keeping the April Fool's avatar!
Updated on March 25, 2020Comments
-
Axarydax about 4 years
Let's say we have an array of ints like this:
const int size = 100000; int array[size]; //set some items to 0 and other items to 1
I'd like to replace all items that have value of 1 with another value, for example 123456. This can be trivially implemented with:
for(int i = 0; i < size ; i++){ if(array[i] != 0) array[i] = 123456; }
Out of curiosity, is there a faster way to do this, by some kind of x86 trickery, or is this the best code for the processor?