How to count cycles?

Solution 1

http://icl.cs.utk.edu/papi/

PAPI_get_real_cyc(3) - return the total number of cycles since some arbitrary starting point

Solution 2

The assembler instruction rdtsc (Read Time-Stamp Counter) returns the current CPU tick count, counted since CPU reset, in the EDX:EAX registers. If your CPU is running at 3 GHz, one tick lasts 1/(3 GHz), which is about 0.33 ns.

EDIT: Under MS Windows, the API call QueryPerformanceFrequency returns the number of ticks per second of the performance counter (the one read by QueryPerformanceCounter).
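As a rough illustration of the rdtsc approach, here is a minimal sketch in C. It assumes GCC or Clang on x86, where the __rdtsc() intrinsic from <x86intrin.h> reads the counter. The names read_tsc, sum_loop and ticks_for_sum are hypothetical and only serve the example. Note that neither the compiler nor the CPU is forced to keep the measured code strictly between the two reads, so treat the result as an estimate.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang on x86: declares __rdtsc() */

/* Hypothetical helper: read the 64-bit time-stamp counter (EDX:EAX combined). */
static inline uint64_t read_tsc(void) {
    return __rdtsc();
}

/* Hypothetical function under test: adds n values in a loop. */
uint64_t sum_loop(const uint64_t *values, int n) {
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}

/* Returns the tick delta observed around one call of sum_loop.
 * The sum itself is stored through result so the call is not discarded. */
uint64_t ticks_for_sum(const uint64_t *values, int n, uint64_t *result) {
    uint64_t start = read_tsc();
    *result = sum_loop(values, n);
    uint64_t end = read_tsc();
    return end - start;
}
```

Calling ticks_for_sum several times and looking at the spread of the returned values gives a feel for how noisy a single measurement is.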

Solution 3

Unfortunately, timing the code is as error-prone as visually counting instructions and clock cycles. Whether you use a debugger or another tool, or recompile the code to run it 10,000,000 times and time that, you change where things land in the cache lines, the frequency of cache hits and misses, and so on. You can mitigate some of this by adding or removing some code upstream of the module being tested (adding or removing a few instructions changes the alignment of your program and sometimes of your data).

With experience you can develop an eye for performance by looking at the disassembly (as well as the high-level code). There is no substitute for timing the code; the problem is that timing the code is error-prone. The experience comes from many experiments and from trying to fully understand why adding or removing one instruction made no difference or a dramatic one, and why code added or removed in a completely unrelated area made a huge performance difference in the module under test.

Solution 4

As GJ has written in another answer, I also recommend using the "rdtsc" instruction (rather than calling some operating system function that merely looks right).

I've written quite a few answers on this topic. Rdtsc lets you calculate the elapsed clock cycles in the code's "natural" execution environment, rather than having to resort to calling the code ten million times, which may not be feasible because not all functions are black boxes.

If you want to calculate elapsed time you might want to shut off energy-saving on the CPUs. If it's only a matter of clock cycles this is not necessary.
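To make the distinction between cycles and elapsed time concrete, here is a small sketch in C of the conversion. It assumes you already know the counter frequency and that it stays constant during the measurement, which is exactly what energy-saving features can break. The name ticks_to_ns is hypothetical.

```c
#include <stdint.h>

/* Convert a tick delta to nanoseconds.
 * Assumption: ticks_per_second is known and did not change while measuring. */
double ticks_to_ns(uint64_t ticks, double ticks_per_second) {
    return (double)ticks * 1e9 / ticks_per_second;
}
```

For example, with an illustrative 3 GHz counter, ticks_to_ns(3000, 3e9) evaluates to 1000 ns. If the frequency drifts during the run, the tick count is still meaningful but this conversion is not.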

Solution 5

If you are trying to compare the performance, the easiest way is to put your algorithm in a loop and run it 1000 or 1000000 times.

Once you are running it enough times that the small differences can be seen, run "time ./my_program", which reports the amount of processor time the program used.

Do this a few times to get a sampling and compare the results.
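If you would rather measure inside the program than with the shell's time command, the same loop-and-time idea might look like the sketch below. It uses the standard C clock() function, which reports processor time. The names add_by_loop, sink and cpu_seconds_for_runs are hypothetical, and the volatile sink is only there so the compiler does not discard the calls.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical function under test: adds the integers 0..n-1 in a loop. */
uint64_t add_by_loop(uint64_t n) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Results are written here so the repeated calls are not optimized away. */
volatile uint64_t sink;

/* Calls fn(arg) the given number of times and returns the processor time
 * consumed, in seconds, as reported by clock(). */
double cpu_seconds_for_runs(uint64_t (*fn)(uint64_t), uint64_t arg, long runs) {
    clock_t start = clock();
    for (long r = 0; r < runs; r++) {
        sink = fn(arg);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned value by the number of runs gives an average per call. Repeating the whole measurement a few times, as suggested above, shows how much the numbers vary.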

Trying to count instructions won't help you on x86 architecture. This is because different instructions can take significantly different amounts of time to execute.

Author by

Dervin Thunk

Updated on June 27, 2022

Comments

  • Dervin Thunk
    Dervin Thunk almost 2 years

    I'm trying to find the relative merits of two small functions in C: one that adds by loop, and one that adds by explicit variables. The functions themselves are irrelevant, but I'd like someone to teach me how to count cycles so that I can compare the algorithms. For example, f1 will take 10 cycles, while f2 will take 8. That's the kind of reasoning I would like to do. No performance measurements (e.g. gprof experiments) at this point, just good old instruction counting.

    Is there a good way to do this? Are there tools? Documentation? I'm writing C, compiling with gcc on an x86 architecture.