Cycle counter on ARM Cortex M4 (or M3)?

35,926

Solution 1

Take a look at the DWT_CYCCNT register defined here. Note that this register is implementation-dependent. Who is the chip vendor? I know the STM32 implementation offers this set of registers.

This post provides instructions for using the DWT Cycle Counter Register for timing. (See the post from 11 December 2009 - 06:29 PM)

This Stack Overflow post also gives an example of how to use DWT_CYCCNT.
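
In case the linked posts become unreachable, here is a minimal sketch of the sequence they describe: set TRCENA in DEMCR, check the NOCYCCNT bit in DWT_CTRL (set when the implementation has no cycle counter), then zero and enable CYCCNT. The register addresses in the comment are the ones used in Solution 3; the `dwt_regs_t` struct and the function names are hypothetical, and the registers are passed as pointers only so that the logic can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMCR_TRCENA        (1u << 24)  /* DEMCR: enable the DWT/ITM blocks */
#define DWT_CTRL_CYCCNTENA  (1u << 0)   /* DWT_CTRL: cycle counter enable */
#define DWT_CTRL_NOCYCCNT   (1u << 25)  /* DWT_CTRL: set if CYCCNT is absent */

/* Hypothetical grouping of the three registers involved. On target:
 *   demcr  = (volatile uint32_t *)0xE000EDFC
 *   ctrl   = (volatile uint32_t *)0xE0001000
 *   cyccnt = (volatile uint32_t *)0xE0001004 */
typedef struct {
    volatile uint32_t *demcr;
    volatile uint32_t *ctrl;
    volatile uint32_t *cyccnt;
} dwt_regs_t;

/* Returns false if this implementation has no cycle counter. */
static inline bool cyccnt_enable(const dwt_regs_t *r)
{
    *r->demcr |= DEMCR_TRCENA;          /* DWT is only usable with TRCENA set */
    if (*r->ctrl & DWT_CTRL_NOCYCCNT)
        return false;
    *r->cyccnt = 0;                     /* start counting from zero */
    *r->ctrl |= DWT_CTRL_CYCCNTENA;
    return true;
}

/* Returns the current cycle count. */
static inline uint32_t cyccnt_read(const dwt_regs_t *r)
{
    return *r->cyccnt;
}
```

Checking the return value of `cyccnt_enable` once at start-up is enough to tell whether a given part offers the counter at all.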

Solution 2

If your part incorporates the CoreSight Embedded Trace Macrocell and you have appropriate trace capable debugger hardware and software then you can profile the code directly. Trace capable debug hardware is of course more expensive, and your board needs to be designed to make the trace port pins available on the debug header. Since these pins are often multiplexed to other functions, that may not always be possible or practical.

Otherwise, if your tool-chain includes a cycle-accurate simulator (such as that available in Keil uVision), you can use that to analyse the code timing. The simulator provides debug, trace and profiling features that are generally more powerful and flexible than those available on chip, so even if you do have trace hardware, the simulator may still be the easier solution.

Solution 3

Expanding previous answers with a DWT_CYCCNT example (STM32) in main (similar to my other post).

Note: I added a delay method as well. You can verify stopwatch_delay by calling STOPWATCH_START, running stopwatch_delay(ticks), then calling STOPWATCH_STOP and checking the result with CalcNanosecondsFromStopwatch(m_nStart, m_nStop). Adjust ticks as needed.

#include <stdint.h>              // uint32_t, uint64_t
#include <stdio.h>               // printf

uint32_t m_nStart;               //DEBUG Stopwatch start cycle counter value
uint32_t m_nStop;                //DEBUG Stopwatch stop cycle counter value

#define DEMCR_TRCENA    0x01000000

/* Core Debug registers */
#define DEMCR           (*((volatile uint32_t *)0xE000EDFC))
#define DWT_CTRL        (*(volatile uint32_t *)0xe0001000)
#define CYCCNTENA       (1<<0)
#define DWT_CYCCNT      ((volatile uint32_t *)0xE0001004)
#define CPU_CYCLES      (*DWT_CYCCNT)
#define CLK_SPEED         168000000 // EXAMPLE for CortexM4, EDIT as needed

#define STOPWATCH_START { m_nStart = *((volatile unsigned int *)0xE0001004);}
#define STOPWATCH_STOP  { m_nStop = *((volatile unsigned int *)0xE0001004);}


static inline void stopwatch_reset(void)
{
    /* Enable DWT */
    DEMCR |= DEMCR_TRCENA; 
    *DWT_CYCCNT = 0;             
    /* Enable CPU cycle counter */
    DWT_CTRL |= CYCCNTENA;
}

static inline uint32_t stopwatch_getticks(void)
{
    return CPU_CYCLES;
}

static inline void stopwatch_delay(uint32_t ticks)
{
    uint32_t start_ticks = stopwatch_getticks();

    /* Unsigned subtraction keeps the comparison valid across counter wraparound */
    while ((stopwatch_getticks() - start_ticks) < ticks)
    {
    }
}

uint32_t CalcNanosecondsFromStopwatch(uint32_t nStart, uint32_t nStop)
{
    uint32_t nDiffTicks;
    uint32_t nSystemCoreTicksPerMicrosec;

    // Convert (clk speed per sec) to (clk speed per microsec)
    nSystemCoreTicksPerMicrosec = CLK_SPEED / 1000000;

    // Elapsed ticks
    nDiffTicks = nStop - nStart;

    // Elapsed nanosec = 1000 * ticks-elapsed / clock-ticks in a microsec
    // (64-bit intermediate so that 1000 * ticks cannot overflow)
    return (uint32_t)((uint64_t)nDiffTicks * 1000u / nSystemCoreTicksPerMicrosec);
}

int main(void)
{
    int timeDiff = 0;
    stopwatch_reset();

    // =============================================
    // Example: use a delay, and measure how long it took
    STOPWATCH_START;
    stopwatch_delay(168000); // 168k ticks is 1ms for 168MHz core
    STOPWATCH_STOP;

    timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
    printf("My delay measured to be %d nanoseconds\n", timeDiff);

    // =============================================
    // Example: measure function duration in nanosec
    STOPWATCH_START;
    // run_my_function() => do something here
    STOPWATCH_STOP;

    timeDiff = CalcNanosecondsFromStopwatch(m_nStart, m_nStop);
    printf("My function took %d nanoseconds\n", timeDiff);
}
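
CYCCNT is a 32-bit counter, so at the 168 MHz example clock it wraps roughly every 25.6 s (2^32 / 168,000,000). The following is a small sketch of wrap-tolerant elapsed-time arithmetic. The function names are illustrative and not part of the answer above. Unsigned subtraction yields the right difference across a single wrap, and a 64-bit intermediate keeps the nanosecond conversion from overflowing.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT samples. Modulo-2^32 subtraction
 * stays correct if the counter wrapped at most once in between. */
static inline uint32_t elapsed_cycles(uint32_t start, uint32_t stop)
{
    return stop - start;
}

/* Convert a cycle count to nanoseconds for a core clock given in Hz.
 * The product is formed in 64 bits, so it cannot overflow. */
static inline uint64_t cycles_to_ns(uint32_t cycles, uint32_t clk_hz)
{
    return ((uint64_t)cycles * 1000000000u) / clk_hz;
}
```

With the globals from the example, this would be called as `cycles_to_ns(elapsed_cycles(m_nStart, m_nStop), CLK_SPEED)`. Intervals longer than one full wrap cannot be recovered from two samples alone.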

Solution 4

This is just easier:

[code]

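// Note: TRCENA in DEMCR must already be set (see stopwatch_reset in Solution 3) for the DWT registers to work
#define start_timer()  (*((volatile uint32_t*)0xE0001000) |= 1u)    // Set CYCCNTENA: enable the CYCCNT register
#define stop_timer()   (*((volatile uint32_t*)0xE0001000) &= ~1u)   // Clear CYCCNTENA: disable the CYCCNT register
#define get_timer()    (*((volatile uint32_t*)0xE0001004))          // Get value from CYCCNT register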

/***********
 * How to use:
 *
 *     uint32_t it1, it2;          // start value and elapsed cycle count
 *
 *     start_timer();              // start the timer
 *     it1 = get_timer();          // store the current cycle count in a local
 *
 *     // do something
 *
 *     it2 = get_timer() - it1;    // derive the cycle-count difference
 *     stop_timer();               // if the timer is no longer needed, stop it
 *
 *     print_int(it2);             // display the difference
 ***********/

[/code]

Tested on a Cortex-M4 (STM32F407VGT on a CJMCU board); it simply counts the elapsed cycles.
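
A single reading can vary with flash access patterns and bus contention (for example from DMA), so it can help to collect the minimum, maximum and mean over many runs. The sketch below assumes a hypothetical `cycle_stats_t` accumulator fed with differences such as `it2` from the snippet above. None of these names come from the answer.

```c
#include <stdint.h>

/* Hypothetical accumulator for repeated cycle-count measurements. */
typedef struct {
    uint32_t min;
    uint32_t max;
    uint64_t sum;    /* 64-bit so that many samples cannot overflow it */
    uint32_t count;
} cycle_stats_t;

static inline void cycle_stats_init(cycle_stats_t *s)
{
    s->min = UINT32_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Record one measurement, e.g. get_timer() - it1. */
static inline void cycle_stats_add(cycle_stats_t *s, uint32_t cycles)
{
    if (cycles < s->min) s->min = cycles;
    if (cycles > s->max) s->max = cycles;
    s->sum += cycles;
    s->count++;
}

/* Mean in cycles, rounded down; 0 if nothing was recorded. */
static inline uint32_t cycle_stats_mean(const cycle_stats_t *s)
{
    return s->count ? (uint32_t)(s->sum / s->count) : 0;
}
```

Calling `cycle_stats_add` after each run of the profiled function and reading `min`, `max` and `cycle_stats_mean` afterwards gives a rough picture of typical versus worst observed timing.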

Author by

makapuf

Updated on October 30, 2020

Comments

  • makapuf over 3 years

    I'm trying to profile a C function (which is called from an interrupt, but I can extract it and profile it elsewhere) on a Cortex M4.

    What are the options for counting the number of cycles this function typically uses? The function should run in ~4000 cycles at most, so the RTC isn't an option I guess, and manually counting cycles from the disassembly can be painful, and is only useful if averaged, because I'd like to profile on a typical stream with a typical flash/memory usage pattern.

    I have heard about cycle counter registers and MRC instructions, but they seem to be available for A8/11. I haven't seen such instructions in cortex-Mx micros.

    • old_timer almost 12 years
      most microcontrollers have timers, the cortex-m3 has one in the core (m4 doesnt if I remember right or m0 doesnt one of the two). github.com/dwelch67 I have many examples and all start with blinking leds progressively working towards using different timers, etc. mbed and stm32f4d are cortex-m examples (there are others).
  • makapuf almost 12 years
    I've looked at it quickly, but thought it was only a comparator on a counter, only able to generate an interrupt each time a given value is reached. So I would either have only an imprecise count (interrupting every 500 cycles), or a big impact on performance from constantly interrupting the code? How do I get access to its value or use it? (It is indeed an STM32F4 chip)
  • Throwback1986 almost 12 years
    @makapuf: See edited post. You should be able to obtain precise timing using this register.
  • makapuf almost 12 years
    I'm using gnu tool chain on Linux, so gcc/gdb
  • Clifford almost 12 years
    One slightly convoluted solution, perhaps, is to use a Windows machine or a Windows VM running in VirtualBox, for example, and then use the evaluation version of Keil uVision with Codesourcery's GNU ARM Toolchain. The evaluation restrictions are on the ARM RealView compiler/linker, not the IDE, and I am not sure about the debugger/simulator, but even if they are restricted the code size limit is 32k, so you can probably test this function if not the entire application. Details: keil.com/appnotes/docs/apnt_199.asp. Probably too much trouble though.
  • makapuf almost 12 years
    Thanks, sounds like it ! I'll try it ASAP. I'll post the results.
  • makapuf almost 12 years
    Thanks, but this will only be a simulation based on a perfect memory model. It could be great as a first approximation, but I'd trust the real deal more in case of memory bus contention (I also use DMA transfers heavily...)
  • Clifford almost 12 years
    @makapuf: True, but equally you might never know whether your "real" measurements represent worst case conditions either in that case. The real measurements will be variable, while the simulation will give you a base-line constant from which to calculate worst-case conditions (perhaps). It would be interesting to do both, but you may not have the time or the equipment. I suggest Throwback1986's solution.
  • makapuf almost 12 years
    I also think I'll start with it. Thanks again for your answer. Besides, talking about simulations, it seems ARMulator is a cycle-perfect ARM simulator. Do you have any experience with it?
  • Clifford almost 12 years
    @makapuf: I have not used ARMulator; however, I would suggest that it might be useful if your code has no peripheral hardware dependencies. Does it support Cortex-M, however? One advantage of Keil's simulator, for example, is that it emulates on-chip peripherals for all the ARM-based chips it directly supports. Since before Cortex not even interrupt controllers were part of the core, this is very useful. Consider also OVPsim, which does support Cortex-M I believe, and runs on Linux.
  • makapuf almost 12 years
    I will try to use keil as soon
  • BenCr about 10 years
    Include the content from the links in the answer in case they die again
  • Throwback1986 almost 10 years
    As a follow-up for posterity, this link is quite good: stackoverflow.com/questions/13379220/…
  • Marc about 6 years
    Works on MK22FN512xxx12
  • bunkerdive over 3 years
    "This post" link is dead