improving C circular buffer efficiency
Solution 1
As "Oli Charlesworth" suggested - you'd be able to simplify things if your buffer size is a power of 2. I'd like to write the read/write function bodies, so that the intent is more clear.
#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)
struct buffer {
float buff[BUFF_SIZE];
unsigned writeIndex;
};
void write(struct buffer *buffer, float value) {
buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}
float readn(struct buffer *buffer, unsigned Xn){
return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}
Some explanations. Note that there's no branching (if
) at all. We don't limit the array index to the array bounds, instead we're AND-ing it with the mask.
Solution 2
If you can make your buffer size a power-of-2, then the check against zero can be replaced with unconditional bit-masking. On most processors, this should be faster.
Solution 3
This may not be seem elegant but is efficient. Accessing structure elements through the pointer is taking up a lot of instructions. Why not remove the structure altogether and make buffer
and writeIndex
as global variables? This will considerably decrease the size of your readn
and write
functions.
I tried in gcc and here is the output with and without the structure
With Structure
_write:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %ecx
movl 8(%ebp), %eax
movl 16(%eax), %edx
movl 12(%ebp), %eax
movl %eax, (%ecx,%edx,4)
movl 8(%ebp), %eax
incl 16(%eax)
movl 8(%ebp), %eax
cmpl $3, 16(%eax)
jne L1
movl 8(%ebp), %eax
movl $0, 16(%eax)
L1:
popl %ebp
ret
Without Structure. ie Making buffer
and writeIndex
as global
_write:
pushl %ebp
movl %esp, %ebp
movl _writeIndex, %edx
movl 8(%ebp), %eax
movl %eax, _buff(,%edx,4)
incl _writeIndex
cmpl $3, _writeIndex
jne L1
movl $0, _writeIndex
L1:
popl %ebp
ret
Solution 4
Keeping track of the start and the end of the circular buffer with pointers, is likely a bit faster than array indexing, since the address will computed in runtime in case of the latter. Try to replace readIndex and writeIndex with float*
instead. The code would then be
*buffer->writeIndex = value;
buffer->writeIndex++;
if(buffer->writeIndex == buffer + BUFF_SIZE)
buffer->writeIndex=buffer->buff;
buffer + BUFF_SIZE
will still be a constant expression and the compiler will translate that to a fixed address at compile time.
Solution 5
The accepted answer contains code that is incorrect and will invoke undefined behavior. Correction below:
#define BUFF_SIZE (4U)
#define BUFF_SIZE_MASK (BUFF_SIZE-1U)
struct buffer {
float buff[BUFF_SIZE];
unsigned writeIndex;
};
void write(struct buffer *buffer, float value) {
buffer->buff[(++buffer->writeIndex) & BUFF_SIZE_MASK] = value;
}
float readn(struct buffer *buffer, unsigned Xn){
return buffer->buff[(buffer->writeIndex - Xn) & BUFF_SIZE_MASK];
}
The error in the original answer was to assume that 'int' will wrap around. Using binary masks with int is also unwise.
Gurba
Updated on June 10, 2022Comments
-
Gurba almost 2 years
I'd like some help improving the efficiency of my circular buffer code.
I had a look around stackoverflow and found that (nearly) all of the topics on circular buffers are about the uses of such a buffer or the basic implementation of a circular buffer. I really need information about how to make it super efficient.
The plan is to use this buffer with the STM32F4 microcontroller which has a single precicion FPU. I plan to make heavy use of especially the write() and readn() functions. We're literally talking a few million calls a second here so shaving of a few clock cycles here and there is really going to make a difference.
I'll put the most important bits of code here, the full buffer code is available via http://dl.dropbox.com/u/39710897/circular%20buffer.rar
Can anyone provide me with a few pointers on how to improve the efficiency of this buffer?
#define BUFF_SIZE 3 // buffer size set at compile time typedef struct buffer{ float buff[BUFF_SIZE]; int readIndex; int writeIndex; }buffer; /********************************\ * void write(buffer* buffer, float value) * writes value into the buffer * @param buffer* buffer * pointer to buffer to be used * @param float value * valueto be written in buffer \********************************/ void write(buffer* buffer,float value){ buffer->buff[buffer->writeIndex]=value; buffer->writeIndex++; if(buffer->writeIndex==BUFF_SIZE) buffer->writeIndex=0; } /********************************\ * float readn(buffer* buffer, int Xn) * reads specified value from buffer * @param buffer* buffer * pointer to buffer to be read from * @param int Xn * specifies the value to be read from buffer counting backwards from the most recently written value * i.e. the most recently writen value can be read with readn(buffer, 0), the value written before that with readn(buffer, 1) \********************************/ float readn(buffer* buffer, int Xn){ int tempIndex; tempIndex=buffer->writeIndex-(Xn+1); while(tempIndex<0){ tempIndex+=BUFF_SIZE; } return buffer->buff[tempIndex]; }
-
valdo about 12 yearsAgree. Also there'll be no need to check for index over/under-flow at all. Just increment and decrement it without any checking, just AND-mask it for the actual element access
-
Gurba about 12 yearsI'll be investigating this, making it a power of 2 is easy enough (just set BUFF_SIZE to 4). I'll have to dig a bit for the bit-masking though
-
valdo about 12 yearsWith optimizations enabled there shouldn't be such a big difference. One may resolve once the address of
buff
andwriteIndex
, all the rest should be the same. OTOH without optimizations any access such asbuff->...
would be long -
Lundin about 12 years@Gurba I assume the final application will have a much larger buffer size? Otherwise I would strongly question the use of the ring buffer ADT in the first place.
-
Oliver Charlesworth about 12 yearsThis won't work if the OP needs more than one buffer instance.
-
Pavan Manjunath about 12 years@valdo Hmmm.. But the compiler seems to be doing something else. I tried with
-O2
and the resultant code seems to be much bigger than the above ones :O -
Pavan Manjunath about 12 years@OliCharlesworth You are right. He would then have to create more global buffers/indices which would go on becoming clumsier. That's why I added the
not so elegant
phrase in my answer :) -
Gurba about 12 yearsI'll be keeping this one in mind, but I doubt that I'll use it. I do indeed need multiple instances and this just seems unmaintanable to me.. I'll just get confused too often
-
Jens Gustedt about 12 yearsAs soon as you use bitmasks you should simply switch to
unsigned
index types. Then you don't have to argue about 2-complement or stuff like that. -
Jens Gustedt about 12 yearsThe assembler really looks ineffient. I am sure you don't use the correct options for a modern platform. (-O3 -march=native should do for gcc). Also, the lengthy preamble is in that assembler because this is seen as a function. when declared as
inline
and then inlined into the call side, most of this noise should go away. -
Gurba about 12 yearsI thank you a LOT, this has made my code roughly 6.5 times faster! I'd call that a rather dramatic improvement
-
Gurba about 12 yearssince I can set only 1 answer as accepted, I'll to that with this one because of the examples even though Oli Charlesworth came up with a similar answer.
-
Jens Gustedt about 12 years@Stacker, I meant "assembler" as a short hand for "assembly listing".
-
Mtr about 11 yearsIt might make sense to try to make ReadArray and WriteArray functions that should work with arrays instead of single values. Or to make write and readn as macroses or inline procedures (in case of C++ compiler). In that case you avoid some stack workload, that may give you cpu ticks.
-
4pie0 about 8 years@valdo writeIndex will eventually overflow
-
Evan Benn about 5 yearsAs @4pie0 noted, this answer is incorrect C code, and will invoke undefined behaviour. I tried to edit the answer: stackoverflow.com/review/suggested-edits/22381560 but was rejected. Please do not use this code.
-
valdo about 5 years@EvanBenn: why do you think it invokes undefined behavior? There is no problem with
writeIndex
overflow. -
Evan Benn about 5 yearsWhen you say there is no problem, what do you mean? If signed
writeIndex
overflows that is undefined behaviour. unsigned integers wrap around.