What are "Instructions per Cycle"?

Solution 1

The keywords you should probably look up are CISC, RISC and superscalar architecture.

CISC

In a CISC architecture (x86, 68000, VAX) one instruction is powerful, but it takes multiple cycles to process. In older architectures the number of cycles was fixed; nowadays the number of cycles per instruction usually depends on various factors (cache hit/miss, branch prediction, etc.). There are published tables where you can look these numbers up. Often there are also facilities to actually measure how many cycles a certain instruction takes under certain circumstances (see performance counters).

If you are interested in the details for Intel, the Intel 64 and IA-32 Optimization Reference Manual is a very good read.
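As a rough illustration of what performance counters give you: most of them report a count of completed instructions and a count of elapsed clock cycles, and IPC is simply the ratio of the two. The sketch below assumes you already have those two numbers; the function name and the counter values are made up for the example and are not readings from real hardware.

```python
def instructions_per_cycle(instructions_retired: int, cycles: int) -> float:
    """IPC = instructions completed / clock cycles elapsed."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


# Hypothetical counter readings (assumed values, not real measurements).
instructions = 12_000_000_000
cycles = 8_000_000_000

ipc = instructions_per_cycle(instructions, cycles)
# CPI (cycles per instruction) is just the reciprocal of IPC.
print(f"IPC = {ipc:.2f}, CPI = {1 / ipc:.2f}")
```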

RISC

In a RISC architecture (ARM, PowerPC, SPARC), one very simple instruction usually takes only a few cycles (often only one).

Superscalar

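But regardless of CISC or RISC there is the superscalar architecture. The CPU is not processing one instruction after another but is working on many instructions simultaneously. Pipelining overlaps the stages of consecutive instructions, very much like an assembly line, and a superscalar CPU goes further by starting more than one instruction in the same cycle on parallel execution units.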

The consequence is: if you simply look up the cycles for every instruction of your program and then add them all up, you will end up with a number that is way too high. Suppose you have a single-core RISC CPU. The time to process a single instruction can never be less than the time of one cycle, but the overall throughput may well be several instructions per cycle.
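To see why adding up per-instruction cycle counts overestimates, here is a minimal sketch of a very idealized model. It assumes every instruction is independent of the others, that the CPU can start `issue_width` instructions per cycle, and that the latencies are hypothetical numbers chosen for illustration. Real CPUs have dependencies and stalls that this ignores.

```python
def naive_cycles(latencies):
    """Add up the latency of every instruction, as if run one at a time."""
    return sum(latencies)


def overlapped_cycles(latencies, issue_width=1):
    """Idealized overlap: instruction i starts in cycle i // issue_width
    and finishes `latency` cycles later. The program is done when the
    last instruction finishes."""
    if not latencies:
        return 0
    return max(i // issue_width + lat for i, lat in enumerate(latencies))


# Hypothetical latencies (in cycles) for eight independent instructions.
latencies = [3, 1, 1, 4, 1, 1, 2, 1]

total = naive_cycles(latencies)
for width in (1, 2):
    cycles = overlapped_cycles(latencies, issue_width=width)
    ipc = len(latencies) / cycles
    print(f"issue width {width}: {cycles} cycles (naive sum: {total}), IPC = {ipc:.2f}")
```

Each instruction still takes at least one cycle, but the total is far below the naive sum because the instructions overlap.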

Solution 2

The way I like to think of it is with a laundry analogy. CPU instructions are like loads of laundry. You need to use both the washer and the dryer for each load. Let's say that each takes 30 minutes to run. That is the clock cycle. Old CPUs would run the washer, then run the dryer, taking 60 minutes (2 cycles) to finish each load of laundry, every time.

Pipelining: A pipeline is when you use both at the same time -- you wash a load, then while it is drying, you wash the next load. The first load takes 2 cycles to finish, but the second load is finished after 1 more cycle. So, most loads only need 1 cycle, except the first load.

Superscalar: Take all the laundry to the laundromat. Get 2 washers and load them both. When they are done, find 2 dryers and use them both. Now you can wash and dry 2 loads in 60 minutes. That is 2 loads in 2 cycles. Each load still takes 2 cycles, but you can do more of them now. Average time is now 1 load per cycle.

Superscalar with Pipelining: Wash the first 2 loads, then while these are drying, load up the washers with the next 2 loads. Now, the first 2 loads still take 2 cycles, and then the next 2 are finished after 1 more cycle. So, most of the time, you finish 2 loads in each cycle.

Multiple cores: Give half of your laundry to your mother, who also has 2 washers and 2 dryers. With both of you working together, you can get twice as much done. This is similar to superscalar, but slightly different. Instead of you having to move all laundry to and from each machine yourself, she can do that at the same time as you.

This is great, we can do eight times more laundry than before in the same amount of time, without having to create faster machines. (Double the clock speed: Washing machines that only need 15 minutes to run.)
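The laundry numbers above can be checked with a small sketch. It models only what the analogy describes: two stages (wash, dry) of one cycle each, an optional pipeline, a number of machines per stage, and a number of people ("cores") splitting the loads evenly. The function and parameter names are invented for this example.

```python
import math

STAGES = 2  # wash, then dry; each stage takes one cycle (30 minutes)


def cycles_needed(loads, machines_per_stage=1, pipelined=False, cores=1):
    """Cycles to finish `loads` loads of laundry in the analogy's model."""
    if loads <= 0:
        return 0
    loads_per_core = math.ceil(loads / cores)
    batches = math.ceil(loads_per_core / machines_per_stage)
    if pipelined:
        # The first batch needs every stage; each later batch adds one cycle.
        return batches + (STAGES - 1)
    # Without a pipeline, every batch occupies all stages back to back.
    return batches * STAGES


loads = 1000
baseline = cycles_needed(loads)
configs = {
    "old CPU": {},
    "pipelined": {"pipelined": True},
    "superscalar": {"machines_per_stage": 2},
    "superscalar + pipelined": {"machines_per_stage": 2, "pipelined": True},
    "two cores, superscalar + pipelined": {
        "machines_per_stage": 2, "pipelined": True, "cores": 2},
}
for name, kwargs in configs.items():
    cycles = cycles_needed(loads, **kwargs)
    print(f"{name}: {cycles} cycles, speedup {baseline / cycles:.2f}x")
```

With many loads the last configuration approaches the eight-fold speedup; it never quite reaches it because the first batch still needs both stages.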

Now, let's talk about how things go wrong:

Pipeline bubble: You have a stain that did not come out in the wash, so you decide to wash it again. Now the dryer is just sitting there, waiting for something to do.

Cache Miss: The truck that delivers the dirty laundry is stuck in traffic. Now you have 2 washers and 2 dryers, but you are getting no work done because you have to wait.

Depending on how often things go wrong, we will not be able to always get 4 loads done every cycle, so the actual amount of work done may vary.

Branch Prediction: Well, you start doing laundry on your clean clothes in case you stain them later so they will be clean already ... okay, this is where the analogy breaks down ...
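How much bubbles and cache misses cost can be estimated with a common back-of-the-envelope formula: cycles per instruction equals the ideal value plus the average number of stall cycles per instruction. The sketch below assumes a single kind of stall with a fixed penalty; the miss rate and penalty are hypothetical values picked for illustration.

```python
def effective_ipc(peak_ipc, stalls_per_instruction, stall_penalty_cycles):
    """Estimate IPC once stalls are included.

    CPI = ideal CPI + (stalls per instruction * cycles lost per stall)
    """
    cpi = 1 / peak_ipc + stalls_per_instruction * stall_penalty_cycles
    return 1 / cpi


# Assumed numbers: a core that could do 2 instructions per cycle,
# where 2% of instructions miss the cache and each miss costs 100 cycles.
ipc = effective_ipc(peak_ipc=2, stalls_per_instruction=0.02, stall_penalty_cycles=100)
print(f"effective IPC = {ipc:.2f}")
```

Even a small fraction of slow events pulls the average well below the peak, which is why the delivery truck stuck in traffic matters so much.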

Solution 3

Not exactly. The cycle you're referring to is the clock cycle, and since most modern processors are pipelined, it takes several clock cycles for one instruction to execute. (This is a good thing because it allows other instructions to begin execution even before the first instruction finishes.) Assuming the most ideal circumstances, it would probably be around 8 billion instructions per second, but all sorts of things get in the way, like dependencies, bubbles in the pipeline, branches, etc., so it doesn't always work out.
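The 8 billion figure is just cores times clock rate times instructions per cycle. A minimal sketch of that arithmetic, using the question's four cores at 2 GHz; the IPC values are assumptions, with 1.0 standing in for the ideal case in the question and 0.4 as an arbitrary example of a stall-heavy workload:

```python
def instructions_per_second(cores, clock_hz, ipc_per_core):
    """Aggregate throughput, assuming every core sustains the same IPC."""
    return cores * clock_hz * ipc_per_core


CORES = 4
CLOCK_HZ = 2_000_000_000  # 2 GHz

for ipc in (1, 0.4):  # assumed IPC values, not measurements
    rate = instructions_per_second(CORES, CLOCK_HZ, ipc)
    print(f"IPC {ipc}: {rate / 1e9:.1f} billion instructions per second")
```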

Sorry, it's way too complicated for a straight answer. Jon Stokes does a good job of explaining it with this article.

Solution 4

The days when one could look up (or even memorize) the cycle time for each instruction and know how many clocks it would take for a certain bit of code to finish are long past for high-end chips (but are still with us in some microcontrollers). A modern, general-purpose CPU core may have multiple copies of several different execution units in multiple pipelines, accessing a multi-stage memory cache with its own logic, plus branch prediction and speculative execution capability. Having multiple cores on a single die drags in cache coherence logic and other complexities.

So the short answer is: more cores means more capacity to get things done, but not in a nice, predictable way.

Solution 5

Ludwig explained the difference between CISC and RISC, but forgot to mention that while RISC instructions are simple and quick, they do little individually, so you must string several together to do the same thing as a single instruction in a CISC processor. As a result, some RISC instruction sequences will be faster, others will not.
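A quick way to compare the two is to multiply the number of instructions by the cycles each one takes. The sketch below uses invented cycle counts for a hypothetical operation (one CISC instruction versus a three-instruction RISC sequence); none of the numbers come from a real processor.

```python
def execution_cycles(instruction_count, cycles_per_instruction):
    """Total cycles for a sequence, ignoring any overlap between instructions."""
    return instruction_count * cycles_per_instruction


# Hypothetical cases: (CISC instructions, CISC cycles each,
#                      RISC instructions, RISC cycles each)
cases = {
    "slow CISC instruction": (1, 4, 3, 1),
    "fast CISC instruction": (1, 2, 3, 1),
}

for name, (cisc_n, cisc_cpi, risc_n, risc_cpi) in cases.items():
    cisc = execution_cycles(cisc_n, cisc_cpi)
    risc = execution_cycles(risc_n, risc_cpi)
    winner = "RISC" if risc < cisc else "CISC"
    print(f"{name}: CISC {cisc} cycles, RISC {risc} cycles -> {winner} is faster")
```

Which side wins depends entirely on the instruction count and the cycles per instruction, not on the label.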

Author by

Matt Simmons

Updated on September 17, 2022

Comments

  • Matt Simmons
    Matt Simmons over 1 year

    I've been learning a little bit more about how processors work, but I haven't been able to find a straight answer about instructions per cycle.

    For instance, I was under the impression that a four-core CPU could execute four instructions per cycle, so a four-core CPU running at 2 GHz would execute 8 billion operations per second. Is this the case?

    I'm sure it's oversimplifying things, but if there's a guide or something else I can use to set myself straight, I'm definitely open to ideas.

  • sblair
    sblair almost 15 years
    To me, the "assembly line" analogy suggests just simple pipelining, not a superscalar architecture. Superscalar involves replicating parts of CPU hardware (e.g., a stage of the pipeline that is a bottleneck) to improve throughput.
  • trh88
    trh88 almost 15 years
    I'm adding for brevity: RISC = reduced instruction set; CISC = complex instruction set. Good explanation, Ludwig for pointing out cache hit/miss ratio and (ultimately) pointing out TLB. Explaining microprocessor architecture is not easy, especially to cram it all into one (fairly compact) post! :)
  • dmckee --- ex-moderator kitten
    dmckee --- ex-moderator kitten over 14 years
    Nice analogy. I'm going to steal it.
  • Ronald Pottol
    Ronald Pottol over 14 years
    And hyperthreading is like having several people doing their wash at the same laundromat.
  • Akash
    Akash about 12 years
    Branch Prediction: you start washing clothes which you think you will need in the next week
  • Florenz Kley
    Florenz Kley about 12 years
    Hyperthreading: you start accepting other people's laundry, and advertise the number of washing machines you have (1). Soon, you realize that your washing machine has room for more than the pair of pants you are washing, just not another pair of pants, but something smaller. So you stuff in some socks, too. Now you advertise 2 washing machines, and hope that people will drop off laundry diverse enough for you to always "fill the holes" with smaller items. But when the guy who only ever brings 10 pairs of dirty jeans and 1 pair of socks drops off his stuff, it is as slow as ever.
  • DrColossos
    DrColossos about 12 years
    @Akash You wash clothes that do not even have stains yet, just in case?
  • Akash
    Akash about 12 years
    @KevinPanko Assuming they are dirty, of course, and some other clothes are also dirty which you probably won't use the next week.
  • user6849803
    user6849803 over 10 years
    Does it mean that a machine cycle is different from a clock cycle? Does a single machine cycle consist of many clock cycles?
  • DrColossos
    DrColossos over 10 years
    No, they are the same thing here.