Why do we have CPUs with all the cores at the same speeds and not combinations of different speeds?

cpu multi-core cpu-architecture cpu-cores

39,011

Solution 1

This is known as heterogeneous multiprocessing (HMP) and is widely adopted by mobile devices. In ARM-based devices which implement big.LITTLE, the processor contains cores with different performance and power profiles, e.g. some cores run fast but draw lots of power (faster architecture and/or higher clocks) while others are energy-efficient but slow (slower architecture and/or lower clocks). This is useful because power usage tends to increase disproportionately as you increase performance once you get past a certain point. The idea here is to get performance when you need it and battery life when you don't.
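To make the "disproportionate" part concrete, here is a rough sketch using the textbook CMOS dynamic-power approximation P ≈ C · V² · f. The capacitance constant and the voltage/frequency operating points are made-up illustrative values, not measurements from any real core; the only point is that higher clocks usually need higher voltage, so power grows much faster than clock speed.

```python
# Illustrative only: classic dynamic-power approximation P ~ C * V^2 * f.
# All numbers below are hypothetical, chosen just to show the trend.

def dynamic_power(capacitance: float, voltage: float, freq_ghz: float) -> float:
    """Return relative dynamic power (arbitrary units)."""
    return capacitance * voltage ** 2 * freq_ghz

# Hypothetical (frequency in GHz, voltage in V) operating points:
# pushing the clock higher requires raising the voltage as well.
OPERATING_POINTS = [
    (1.0, 0.70),
    (2.0, 0.85),
    (3.0, 1.00),
    (4.0, 1.25),
]

def power_table(points, capacitance: float = 1.0):
    """Return a list of (freq_ghz, relative_power) pairs."""
    return [(f, dynamic_power(capacitance, v, f)) for f, v in points]

if __name__ == "__main__":
    table = power_table(OPERATING_POINTS)
    base_f, base_p = table[0]
    for f, p in table:
        print(f"{f:.1f} GHz: {f / base_f:.1f}x clock, {p / base_p:.1f}x power")
```

With these assumed numbers, the 4 GHz point costs far more than 4x the power of the 1 GHz point, which is the reason a slow, efficient core can be the better choice for light work.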

On desktop platforms, power consumption is much less of an issue so this is not truly necessary. Most applications expect each core to have similar performance characteristics, and scheduling processes for HMP systems is much more complex than scheduling for traditional SMP systems. (Windows 10 technically has support for HMP, but it's mainly intended for mobile devices that use ARM big.LITTLE.)

Also, most desktop and laptop processors today are not thermally or electrically limited to the point where some cores need to run faster than others even for short bursts. We've basically hit a wall on how fast we can make individual cores, so replacing some cores with slower ones won't allow the remaining cores to run faster.

While there are a few desktop processors that have one or two cores capable of running faster than the others, this capability is currently limited to certain very high-end Intel processors (as Turbo Boost Max Technology 3.0) and only involves a slight gain in performance for those cores that can run faster.

While it is certainly possible to design a traditional x86 processor with both large, fast cores and smaller, slower cores to optimize for heavily-threaded workloads, this would add considerable complexity to the processor design and applications are unlikely to properly support it.

Take a hypothetical processor with two fast Kaby Lake (7th-generation Core) cores and eight slow Goldmont (Atom) cores. You'd have a total of 10 cores, and heavily-threaded workloads optimized for this kind of processor may see a gain in performance and efficiency over a normal quad-core Kaby Lake processor. However, the different types of cores have wildly different performance levels, and the slow cores don't even support some of the instructions the fast cores support, like AVX. (ARM avoids this issue by requiring both the big and LITTLE cores to support the same instructions.)

Again, most Windows-based multithreaded applications assume that every core has the same or nearly the same level of performance and can execute the same instructions, so this kind of asymmetry is likely to result in less-than-ideal performance, perhaps even crashes if an application uses instructions not supported by the slow cores. While Intel could modify the slow cores to add advanced instruction support so that all cores can execute all instructions, this would not resolve issues with software support for heterogeneous processors.

A different approach to application design, closer to what you're probably thinking about in your question, would use the GPU for acceleration of highly parallel portions of applications. This can be done using APIs like OpenCL and CUDA. As for a single-chip solution, AMD promotes hardware support for GPU acceleration in its APUs, which combine a traditional CPU and a high-performance integrated GPU onto the same chip, as Heterogeneous System Architecture, though this has not seen much industry uptake outside of a few specialized applications.

Solution 2

What you're asking is why current systems use symmetric multiprocessing (SMP) rather than asymmetric multiprocessing (AMP).

Asymmetric multiprocessing was used in the old days, when a computer was enormous and housed over several units.

Modern CPUs are cast as one unit, in one die, where it is much simpler not to mix CPUs of different types, since they all share the same bus and RAM.

There is also the constraint of the clock that governs the CPU cycles and RAM access. This will become impossible when mixing CPUs of different speeds. Clock-less experimental computers did exist and were even pretty fast, but the complexities of modern hardware imposed a simpler architecture.

For example, Sandy Bridge and Ivy Bridge cores can't be running at different speeds at the same time since the L3 cache bus runs at the same clock speed as the cores, so to prevent synchronization problems they all have to either run at that speed or be parked/off (link: Intel's Sandy Bridge Architecture Exposed). (Also verified in the comments below for Skylake.)

[EDIT] Some people have mistaken my answer to mean that mixing CPUs is impossible. For their benefit I state: mixing differing CPUs is not beyond today's technology, but it is not done; "why not" is the question. As answered above, this would be technically complicated, and therefore costlier, for too little or no financial gain, so it does not interest the manufacturers.

Here are answers to some comments below :

Turbo boost changes CPU speeds so they can be changed

Turbo boost is done by speeding up the clock and changing some multipliers, which is exactly what people do when overclocking, except that the hardware does it for us. The clock is shared between cores on the same CPU, so this uniformly speeds up the entire CPU and all its cores.

Some phones have more than one CPU of different speeds

Such phones typically have a custom firmware and software stack associated with each CPU, more like two separate CPUs (or like CPU and GPU), and they lack a single view of system memory. This complexity is hard to program, so asymmetric multiprocessing was left in the mobile realm, since it requires low-level, close-to-the-hardware software development, which is shunned by general-purpose desktop operating systems. This is the reason that such configurations aren't found in the PC (except for CPU/GPU, if we stretch the definition enough).

My server with 2x Xeon E5-2670 v3 (12 cores with HT) currently has cores at 1.3 GHz, 1.5 GHz, 1.6 GHz, 2.2 GHz, 2.5 GHz, 2.7 GHz, 2.8 GHz, 2.9 GHz, and many other speeds.

A core is either active or idle. All cores that are active at the same time run at the same frequency. What you are seeing is just an artifact of either timing or averaging. I have also noticed that Windows does not park a core for long, but rather parks and unparks cores individually, far faster than Resource Monitor refreshes. I don't know the reason for this behavior, but it is probably behind the above remark.
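For anyone who wants to look at the kind of per-core readings being argued about here, this is a minimal sketch for Linux that parses the `cpu MHz` lines from /proc/cpuinfo and counts how many logical CPUs report each (rounded) speed. The function names and the 100 MHz bucket size are my own choices for illustration. Keep in mind that these values are snapshots taken at slightly different moments, so on their own they cannot settle whether cores really differ at the same instant.

```python
import re
from collections import Counter

def parse_core_mhz(cpuinfo_text: str) -> list:
    """Extract the 'cpu MHz' value reported for each logical CPU."""
    pattern = r"^cpu MHz\s*:\s*([\d.]+)"
    return [float(m) for m in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]

def frequency_histogram(mhz_values, bucket_mhz: int = 100) -> Counter:
    """Count logical CPUs per frequency bucket (rounded to bucket_mhz)."""
    return Counter(int(round(v / bucket_mhz)) * bucket_mhz for v in mhz_values)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is not readable

    speeds = parse_core_mhz(text)
    if not speeds:
        print("No 'cpu MHz' lines found (this only works on some Linux systems).")
    for mhz, count in sorted(frequency_histogram(speeds).items()):
        print(f"{count:3d} logical CPU(s) at about {mhz} MHz")
```

Running it several times in a row shows how quickly the reported values change under load.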

Intel Haswell processors have integrated voltage regulators that enable individual voltages and frequencies for every core

Individual voltage regulators differ from clock speed. Not all cores are identical - some are faster. Faster cores are given slightly less power, creating the headroom to boost the power given to weaker cores. Core voltage regulators will be set as low as possible in order to maintain the current clock speed. The Power Control Unit on the CPU regulates voltages and will override OS requests where necessary for cores that differ in quality. Summary: Individual regulators are for making all cores operate economically at the same clock speed, not for setting individual core speeds.

Solution 3

Why do we not have variants with differing clock speeds? ie. 2 'big' cores and lots of small cores.

It's possible that the phone in your pocket sports exactly that arrangement - the ARM big.LITTLE works exactly as you described. There it's not even just a clock speed difference, they can be entirely different core types - typically, the slower clocked ones are even "dumber" (no out-of-order execution and other CPU optimizations).

It's a nice idea essentially to save battery, but has its own shortcomings; the bookkeeping to move stuff between different CPUs is more complicated, the communication with the rest of the peripherals is more complicated and, most importantly, to use such cores effectively the task scheduler has to be extremely smart (and often to "guess right").

The ideal arrangement is to run non-time-critical background tasks or relatively small interactive tasks on the "little" cores and wake the "big" ones only for big, long computations (where the extra time spent on the little cores ends up eating more battery) or for medium-sized interactive tasks, where the user feels sluggishness on the little cores.

However, the scheduler has limited information about the kind of work each task may be running, and has to resort to some heuristic (or external information, such as forcing some affinity mask on a given task) to decide where to schedule them. If it gets this wrong, you may end up wasting a lot of time/power to run a task on a slow core, and give a bad user experience, or using the "big" cores for low priority tasks, and thus wasting power/stealing them away from tasks that would need them.

Also, on an asymmetric multiprocessing system it's usually more costly to migrate tasks to a different core than it would be on an SMP system, so the scheduler generally has to make a good initial guess instead of trying to run on a random free core and moving it around later.
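To give a feel for what such a heuristic might look like, here is a deliberately naive sketch. It is not how any real scheduler works; the `TaskStats` fields, the thresholds, and the "big"/"little" labels are all assumptions made up for illustration. It promotes tasks that keep using their full time slice, demotes tasks that are mostly idle, and otherwise leaves a task where it is to avoid paying the migration cost.

```python
from dataclasses import dataclass

@dataclass
class TaskStats:
    """Hypothetical per-task statistics a scheduler might track."""
    name: str
    utilization: float         # fraction of its time slices the task used (0..1)
    involuntary_switches: int  # times it was preempted after a full time slice
    interactive: bool          # whether the user is waiting on this task

def pick_cluster(task: TaskStats, current: str = "little",
                 high_util: float = 0.8, low_util: float = 0.2) -> str:
    """Return 'big' or 'little' for the task (naive heuristic, assumed thresholds)."""
    # CPU-bound: keeps exhausting its time slice, so the slow core is a bad fit.
    if task.involuntary_switches > 0 and task.utilization >= high_util:
        return "big"
    # Medium-sized interactive work: the user would feel the sluggishness.
    if task.interactive and task.utilization >= 0.5:
        return "big"
    # Mostly idle and always yields voluntarily: cheap to run on a slow core.
    if task.utilization <= low_util and task.involuntary_switches == 0:
        return "little"
    # Unclear case: stay put, because migrating between clusters is costly.
    return current

if __name__ == "__main__":
    tasks = [
        TaskStats("video-encode", 0.95, 40, False),
        TaskStats("mail-sync", 0.05, 0, False),
        TaskStats("ui-scroll", 0.60, 2, True),
        TaskStats("indexer", 0.40, 1, False),
    ]
    for t in tasks:
        print(f"{t.name}: {pick_cluster(t)}")
```

Even this toy version shows the problem described above: the "indexer" case is a guess either way, and a wrong guess costs either battery or responsiveness.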

The Intel choice here instead is to have a lower number of identical intelligent and fast cores, but with very aggressive frequency scaling. When the CPU gets busy it quickly ramps up to the maximum clock speed, does the work as fast as it can, and then scales back down to its lowest-power mode. This doesn't place any particular burden on the scheduler, and avoids the bad scenarios described above. Of course, even in low-clock mode, these cores are "smart" ones, so they'll probably consume more than the low-clock "stupid" big.LITTLE cores.

Solution 4

Performance in games tends to be determined by single core speed,

In the past (DOS era games): Correct.
These days, it is no longer true. Many modern games are threaded and benefit from multiple cores. Some games are already quite happy with 4 cores and that number seems to rise over time.

whereas applications like video editing are determined by number of cores.

Sort of true.

Number of cores × speed of each core × efficiency.
If you compare a single identical core to a set of identical cores, then you are mostly correct.

In terms of what is available on the market - all the CPUs seem to have roughly the same speed with the main differences being more threads or more cores. For example:

- Intel Core i5 7600k, Base Freq 3.80 GHz, 4 Cores
- Intel Core i7 7700k, Base Freq 4.20 GHz, 4 Cores, 8 Threads
- AMD Ryzen 1600x, Base Freq 3.60 GHz, 6 Cores, 12 Threads
- AMD Ryzen 1800x, Base Freq 3.60 GHz, 8 Cores, 16 Threads

Comparing different architectures is dangerous, but ok...

So why do we see this pattern of increasing cores with all cores having the same clock speed?

Partially because we ran into a barrier. Increasing clock speed further means more power needed and more heat generated. More heat meant even more power needed. We have tried that way; the result was the horrible Pentium 4: hot, power-hungry, and hard to cool. And not even faster than the smartly designed Pentium M (a P4 at 3.0 GHz was roughly as fast as a Pentium M at 1.7 GHz).

Since then, we mostly gave up on pushing clock speed and instead we build smarter solutions. Part of that was to use multiple cores over raw clock speed.

E.g. a single 4GHz core might draw as much power and generate as much heat as three 2GHz cores. If your software can use multiple cores, it will be much faster.

Not all software could do that, but modern software typically can.
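How much faster depends on how much of the work can actually run in parallel. As a rough sketch, Amdahl's law can be combined with the (simplifying) assumption that per-core performance scales linearly with clock speed. The function names and the 4 GHz baseline are just illustrative choices, and real chips will not follow these numbers exactly.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def relative_throughput(parallel_fraction: float, cores: int,
                        clock_ghz: float, baseline_ghz: float = 4.0) -> float:
    """Throughput of `cores` identical cores at `clock_ghz`, relative to a
    single core at `baseline_ghz`. Assumes performance scales linearly with
    clock speed, which is only a rough approximation."""
    return (clock_ghz / baseline_ghz) * amdahl_speedup(parallel_fraction, cores)

if __name__ == "__main__":
    # Three 2 GHz cores versus one 4 GHz core, for different workloads.
    for p in (0.0, 0.5, 0.75, 0.9, 1.0):
        ratio = relative_throughput(p, cores=3, clock_ghz=2.0)
        print(f"parallel fraction {p:.2f}: {ratio:.2f}x the single 4 GHz core")
```

Under these assumptions the three slower cores only break even once about 75% of the work is parallel; below that, the single fast core wins, which is why "if your software can use multiple cores" is the important condition.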

Which partially answers why we have chips with multiple cores, and why we sell chips with different numbers of cores.

As to clock speed, I think I can identify three points:

- Low-power CPUs make sense for quite a few cases in which raw speed is not needed, e.g. domain controllers, NAS setups, ... For these, we do have lower-frequency CPUs, sometimes even with more cores (e.g. an 8-core low-speed CPU makes sense for a web server).
- For the rest, we are usually near the maximum frequency we can reach without our current designs getting too hot (say 3 to 4 GHz with current designs).
- On top of that, we do binning. Not all CPUs are created equal. Some chips test badly, or test badly in part of the die; those parts are disabled and the chip is sold as a different product.

The classic example of this was a 4 core AMD chip. If one core was broken, it was disabled and sold as a 3 core chip. When demand for these 3 cores was high, even some 4 cores were sold as the 3 core version, and with the right software hack, you could re-enable the 4th core.

And this is not only done with the number of cores; it also affects speed. Some chips run hotter than others. A chip that runs too hot is sold as a lower-speed CPU (where a lower frequency also means less heat generated).
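As a toy illustration of binning, the sketch below sorts tested dies into a product ladder by working core count and highest stable clock. The product names, thresholds, and the `Die` fields are entirely hypothetical; real binning involves many more criteria (voltage, leakage, cache defects, market demand).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Die:
    """Hypothetical test results for one manufactured die."""
    working_cores: int
    max_stable_ghz: float

# Made-up product ladder, best first: (name, minimum cores, minimum clock in GHz).
PRODUCT_BINS = [
    ("quad-core 3.6 GHz", 4, 3.6),
    ("quad-core 3.2 GHz", 4, 3.2),
    ("triple-core 3.2 GHz", 3, 3.2),
    ("triple-core 2.8 GHz", 3, 2.8),
]

def assign_bin(die: Die) -> Optional[str]:
    """Return the best product this die qualifies for, or None if it is scrap."""
    for name, min_cores, min_ghz in PRODUCT_BINS:
        if die.working_cores >= min_cores and die.max_stable_ghz >= min_ghz:
            return name
    return None

if __name__ == "__main__":
    for die in (Die(4, 3.7), Die(3, 3.5), Die(4, 3.0), Die(2, 4.0)):
        print(die, "->", assign_bin(die))
```

Note the third example: a die with four working cores that cannot hold the higher clocks ends up in a triple-core bin, which matches the kind of chip where a disabled but functional core could sometimes be re-enabled.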

And then there is production and marketing and that messes it up even further.

Why do we not have variants with differing clock speeds? ie. 2 'big' cores and lots of small cores.

We do. In places where it makes sense (e.g. mobile phones), we often have an SoC with slow, low-power cores and a few faster cores. However, in the typical desktop PC, this is not done. It would make the setup much more complex and more expensive, and there is no battery to drain.

Solution 5

Why do we not have variants with differing clock speeds? For example, two 'big' cores and lots of small cores.

Unless we were extremely concerned about power consumption, it would make no sense to accept all the cost associated with an additional core and not get as much performance out of that core as possible. The maximum clock speed is determined largely by the fabrication process, and the entire chip is made by the same process. So what would the advantage be to making some of the cores slower than the fabrication process supported?

We already have cores that can slow down to save power. What would be the point to limiting their peak performance?



Jamie

Updated on September 18, 2022

Comments

Jamie over 1 year
In general if you are buying a new computer you would determine which processor to buy by what your expected workload will be. Performance in games tends to be determined by single core speed, whereas applications like video editing are determined by number of cores.

In terms of what is available on the market - all the CPUs seem to have roughly the same speed with the main differences being more threads or more cores.

For example:
- Intel Core i5-7600K, base frequency 3.80 GHz, 4 cores, 4 threads
- Intel Core i7-7700K, base frequency 4.20 GHz, 4 cores, 8 threads
- AMD Ryzen 5 1600X, base frequency 3.60 GHz, 6 cores, 12 threads
- AMD Ryzen 7 1800X, base frequency 3.60 GHz, 8 cores, 16 threads
So why do we see this pattern of increasing cores with all cores having the same clock speed?

Why do we not have variants with differing clock speeds? For example, two 'big' cores and lots of small cores.

For example's sake, instead of, say, four cores at 4.0 GHz (i.e. 4x4 GHz ~ 16 GHz maximum), what about a CPU with two cores running at, say, 4.0 GHz and four cores running at 2 GHz (i.e. 2x4.0 GHz + 4x2.0 GHz ~ 16 GHz maximum)? Wouldn't the second option be equally good at single-threaded workloads, but potentially better at multi-threaded workloads?

I ask this question as a general point - not specifically about those CPUs I listed above, or about any one specific workload. I am just curious as to why the pattern is as it is.
- Mathew Lionnet almost 7 years
  
  There are many mobiles with fast and slow cores, and on nearly all modern multi-core servers the CPU cores are clocked independently depending on the load; some even switch off cores when not used. On a general-purpose computer where you do not design for saving energy, however, having only two types of cores (CPU and GPU) just makes the platform more flexible.
- LMiller7 almost 7 years
  
  Before the thread scheduler could make an intelligent choice about which core to use it would have to determine if a process can take advantage of multiple cores. Doing that reliably would be highly problematic and prone to error. Particularly when this can change dynamically according to the needs of the application. In many cases the scheduler would have to make a sub optimal choice when the best core was in use. Identical cores makes things simpler, provides maximum flexibility, and generally has the best performance.
- Bob Jarvis - Слава Україні almost 7 years
  
  Clock speeds cannot reasonably be said to be additive in the manner you described. Having four cores running at 4 GHz does not mean you have a "total" of 16 GHz, nor does it mean that this 16 GHz could be partitioned up into 8 processors running at 2 GHz or 16 processors running at 1 GHz.
- Bob Jarvis - Слава Україні almost 7 years
  
  In a similar manner - consider how dreadnought battleships, which had a uniform main battery, replaced the pre-dreadnought battleships which had a main battery of the largest guns, an intermediate battery that was smaller, and an anti-torpedo-boat battery which was smaller still.
- phuclv almost 7 years
  
  4 cores@4GHz doesn't mean that it's running at 16GHz. Parallel processing doesn't work that way. And AFAIK AMD has supported different clock speeds for different cores for a very long time
- phuclv almost 7 years
  
  The premise of the question is simply wrong. Modern CPUs are perfectly capable of running cores at different speeds
- allquixotic almost 7 years
  
  Voted to reopen. Also, big.LITTLE designs in ARM SoCs are common, where the smaller cores are an entirely different design (sometimes different architecture), lower clocked and much more power efficient, while the big ones are used while the screen is on for apps in the foreground.
- phuclv almost 7 years
  
  Multi-core CPU: can I say I have a 3x2.1GHz=6.3GHz CPU?, How do I calculate clock speed in multi-core processors?,
- phuclv almost 7 years
  
  see the discussions here big.LITTLE x86: Why not?, Intel and the big.LITTLE concept
- Jamie almost 7 years
  
  @LưuVĩnhPhúc Of course the calculation doesn't work like that - if it did, the question would be comparing two equals; it is literally the entire point of the question. The example is simply a means of comparison. CPUs being capable of running different cores at different speeds would apply to any combination of cores. Thanks for the links nonetheless.
- SGR almost 7 years
  
  Another point to make is that most modern CPUs from Intel and AMD can dynamically scale clock speed based on the task they're doing. My 4790K usually sits at around 2GHz when I'm just browsing the web, but then kicks up to 4GHz+ when I'm gaming.
- Baldrickk almost 7 years
  
  @LưuVĩnhPhúc intel have also been able to run cores at different clock speeds for a long time as well.
- mckenzm almost 7 years
  
  @Baldrickk AMD are more blatant, especially with FX and very especially with unlocked "latent" cores, these were locked for a reason and generally need to be hobbled.
- user541686 almost 7 years
  
  @BobJarvis: 16 GHz can't exactly be partitioned up into 8 processors of 2 GHz, of course, but can't it come pretty close? in contrast with the opposite direction?
- Christopher Schultz almost 7 years
  
  These days, people have such problems interpreting what "Intel Core i5-7600K, base frequency 3.80 GHz, 4 cores, 4 threads" means; can you imagine if you had a list of tech jargon about each individual core in the package? It would be marketing insanity, and everyone except for True Nerds would be confused. Intel has spent 30 years trying to make its chip designations accessible to consumers, which is why they (somewhat) recently moved to the i3/i5/i7 labeling, because otherwise people had no idea if a particular processor was "fast" or "slow".
ganesh almost 7 years

Ah, much shorter and to the point. +1
Jamie almost 7 years

My understanding is that if a core has a speed of 4.0 GHz, that might break down as 40*100 MHz. So if you had a core at 4.0 GHz and another core at 2.0 GHz, could they not break down as 40*100 MHz and 20*100 MHz? Is that what you mean by the 'clock'? So is that an issue? The argument of it being simpler to cast one die is only an argument if there is not a sufficient benefit to casting two different sized cores.
Jamie almost 7 years

As I pointed out - "I ask this question as a general point - not specifically about those cpus I listed above", and there was a reason I gave two examples from each architecture. If we treat the two scenarios as 1. all big cores, and 2. two big & two small - then i think all the points you mention apply to both cases - ie. a theoretical max single core speed, binning of chips, downclocking when not in use.
harrymc almost 7 years

The clock pulses govern everything the CPU does, since data flows in it in steps that are governed by the clock. The clock is not here for telling the time, but for marking the time between data entering and exiting sub-circuits, so for computations to pass from one step to another, as well as RAM access stages. The clock is used for synchronization, and it would be hard to synchronize two CPUs that don't have the same timing between steps, or even the same steps.
ganesh almost 7 years

A single max-speed core is not all that interesting when it does not get chosen, though. Schedulers will need to be updated to actually prefer the high-speed core(s).
Jamie almost 7 years

I don't, hence my question. Comparing an Intel i5 7600 to an i5 7600k, we see that the base clock is 100mhz for both and the difference is the core ratio. So you could have two cores with the same base clock of 100mhz but with different core ratios - does this scenario violate the synchronicity requirement?
Wayne Jhukie almost 7 years

Yeah, this is oversimplifying too much; it's not really true that all operations must be tied to the same clock. There are lots of clock domains, and it's perfectly possible to run different cores at different speeds. Bus clock is not the same as internal clock, etc.
Michael almost 7 years

Modern chips already have multiple clock domains (even the RTC of a cheap&dumb microcontroller usually runs on a separate 32.7kHz domain). You just have to synchronize between clock domains. Even with a common clock you could divide it by 2, 4, 8 and so on.
Wayne Jhukie almost 7 years

@harrymc there are synchroniser blocks that manage it perfectly well; DRAM runs slower than core speed, and you can have Intel cores running at different speeds dynamically on the same chip.
Wayne Jhukie almost 7 years

@Jamie the clock multiplication (see "PLL") is usually "multiply by X divide by Y", where X is limited to a few choices and Y can be varied more widely. You can have one core at 4GHz and another at 2GHz or even 3.9GHz if you want, but there's a penalty of a few cycles for crossing clock domains.
Samin yeasir almost 7 years

Heuristics should be pretty simple. Any involuntary task switch (use of full timeslice) is an indication that the slow cpu is inappropriate for the task. Very low utilization and all voluntary task switches is indication that the task could be moved to the slow cpu.
phuclv almost 7 years

Another problem is that 4 stupid 2 GHz cores may take more die area than 2 smart 4 GHz cores, or they may be smaller and take much less power than the 4 GHz cores but also run much, much slower.
harrymc almost 7 years

@pjc50: Synchroniser blocks etc. between CPUs will make an architecture that is much too complicated and costly. Any price advantage that is gained in creating such a "middle-class" CPU will be lost that way, so there is no point. In addition, most OS today are uniquely oriented toward Symmetric multiprocessing.
Jamie almost 7 years

You are answering a different question. The question is about lots of big cores vs a couple of big cores and lots of small cores - the merits of the two scenarios. In both situations you can clock up and down dependent on demand, or boost a core.
Jamie almost 7 years

Windows already has a notion of 'Apps', 'Background Processes' and 'Windows Processes'. So this doesn't extend to a hardware level?
Nick T almost 7 years

Intel Core-series processors run at different speeds on the same die all the time.
harrymc almost 7 years

@NickT: All at the same time.
tvdo almost 7 years

@Jamie A "background" process gets smaller time slices and is more likely to be interrupted. Windows 10 does, to some extent, account for HMP systems, though there isn't much information yet on how.
harrymc almost 7 years

@Bob: The question is why are the processors all the same. It's well known that modern OS can vary power consumption and even park cores.
Agent_L almost 7 years

The sole existence of big.LITTLE architectures and core-independent clock boosting proves you wrong. Heterogeneous multiprocessing is mainstream. It can be done, it is done in phones, but for some reason not in desktops.
harrymc almost 7 years

@Agent_L: The reason is the complexity. Desktop CPUs are costly enough already. So I repeat: Everything is possible, but the actual question is why it is not done, not whether it can be done. Do not attack me as if I have claimed this is impossible - all I say is that it's too complicated and costly and for too little gain to interest the manufacturers.
MPW almost 7 years

This is what I was thinking. Why intentionally use some inferior components when they could all be elite? +1.
Agent_L almost 7 years

It's better now, but IMHO you should dive more into the details of why it's done in phones and less so in PCs. I believe that is the root of the question, and you've merely mentioned it for now, without any real explanation. Mentioning clockless designs is just a distraction; I'd drop it. You've literally written "impossible", and it's still there on RAM clock access, when it is clearly possible and done on desktops: single-core turbo boost introduces a clock difference. Nobody attacks you, only the obviously false statements you've made. Or back them up better; maybe it's me who gets turbo boost wrong.
harrymc almost 7 years

@Agent_L: I don't know exactly how turbo boost is done, but guess that it speeds up the clock and some multipliers, same as overclocking. The clock is shared, so this speeds up the entire CPU and all its cores. For phones: They typically have a custom firmware and software stack associated with every CPU, more like two separate CPUs (or like CPU and GPU), and lacking a single view of system memory. This complexity is hard to program, and so AMP was left in the mobile realm, as it requires low-level, close-to-the-hardware software development, which is shunned by general-purpose desktop OS.
Grant Wu almost 7 years

That's not how I read the question. The question does not mention architecturally different cores, despite using the words "big" and "small". It focuses exclusively on clock speed.
Grant Wu almost 7 years

"The clock is shared between cores on the same CPU, so this speeds uniformly up the entire CPU and all its cores." Wrong. Plenty of us have given plenty of evidence that different cores run at different clocks on the same die at the same time. Pretty much every large modern processor does this.
Jamie almost 7 years

@MPW The choice isn't between creating a big core and then neutering it, it is between all big vs a few big and lots of small cores. Because you have two competing scenarios - single thread performance and multi thread performance - why not maximise both? Do we know that you can't fabricate a chip with a few big and lots of small cores?
Jamie almost 7 years

So I think that after the edit @bwDraco has pretty much answered it for me. If there was a 'mixed' processor it could easily support the same instruction set if it was built that way, so then we would need some sort of scheduler to pick the right core. I'm thinking that really the applications which benefit from going to lots of small cores would probably benefit even more from going to lots and lots of really small cores. Thus we have GPU acceleration.
Nick T almost 7 years

My server with 2x Xeon E5-2670 v3 (12 cores with HT) currently has cores at 1.3 GHz, 1.5 GHz, 1.6 GHz, 2.2 GHz, 2.5 GHz, 2.7 GHz, 2.8 GHz, 2.9 GHz, and many other speeds. In fact, it's rare that cat /proc/cpuinfo | grep MHz | uniq -c ever shows duplicates.
David Schwartz almost 7 years

@Jamie You could fabricate a chip with a few big and lots of small cores. But the smaller cores wouldn't run at a lower clock speed.
Jamie almost 7 years

They would if they were designed that way... The question is why aren't they designed that way from scratch, not taking an existing fabrication process and neutering it.
David Schwartz almost 7 years

@Jamie I don't understand what you're saying. The whole CPU has to be made with the same fabrication process, and the maximum clock speed is largely a characteristic of the fabrication processes. Cores that require a lower clock speed at the same fabrication level would generally be more complex and take more space, otherwise why would they require a lower clock speed?
Jamie almost 7 years

Maybe I don't know enough about the fabrication process to understand. Could you not create two different cores on the same cpu within the same fabrication process? - ie. a 4.0GHz (40*100mhz) core & a 2.0GHz (20*100mhz) core. Some cpus have on-chip gpus, is this part of the fabrication process or is it added later? There is clearly currency in adding complexity - if the end result is worth it.
David Schwartz almost 7 years

@Jamie Sure, you could do that. But likely the 2.0GHz core would be larger and more complex, requiring it to run at a lower frequency. (Why else would it need to run at a lower frequency even though it's built with the same fabrication process?)
harrymc almost 7 years

@NickT: A core is either active or idle. All cores that are active at the same time run at the same frequency. What you are seeing is just an artifact of either timing or averaging. For example, Sandy Bridge and Ivy Bridge cores can't be running at different speeds at the same time since the L3 cache bus runs at the same clock speed as the cores, so to prevent synchronization problems they all have to either run at that speed or shut off (link).
Matteo Italia almost 7 years

@R.: in principle I agree with you, but even after enabling some basic scheduler support for this I saw ridiculous core jostling on an ARM board I used, so there must be something else to it. Besides, most "regular" multithreaded software is written with SMP in mind, so it's not unusual to see thread pools as big as the total number of cores, with jobs dragging on the slow cores.
Agent_L almost 7 years

@harrymc Thanks, I have learned something new today.
RyRoUK almost 7 years

All true. But it still reduces efficiency of operation. And that is always the goal in regards to performance. That was my point. Sure, you can do it. But you'll take a hit on performance.
Grant Wu almost 7 years

Please remove the incorrect information about the E5-2670 v3. To quote ieeexplore.ieee.org/document/7284406 :"The recently introduced Intel Xeon E5-1600 v3 and E5-2600 v3 series processors–codenamed Haswell-EP–implement major changes compared to their predecessors. Among these changes are integrated voltage regulators that enable individual voltages and frequencies for every core."
Yakk almost 7 years

Note that the GPU case isn't trading 2 big cores for 10 small and slow cores, but rather the (very rough) equivalent of trading 2 big cores for 1024 small and slow cores. Massively parallel, not just a little bit more parallel.
harrymc almost 7 years

@GrantWu: Individual voltage regulators differ from clock speed. Not all cores are identical - some are faster. Faster cores are given slightly less power, creating the headroom to boost the power given to weaker cores. Core voltage regulators will be set as low as possible in order to maintain the current clock speed. The Power Control Unit on the CPU regulates voltages and will override OS requests where necessary for cores that differ in quality. Summary: Individual regulators are for making all cores operate economically at the same clock speed, not for setting individual core speeds.
JFA almost 7 years

This question is about CPUs, but I think for the implied question, it's important to note that computers actually already kind of do this across the motherboard. While it doesn't make sense to run the CPU at different speeds rather than just the fastest available, different chips and buses on the motherboard already run at slower clock speeds, designed for a trade-off between cost of materials and development vs performance.
Grant Wu almost 7 years

"that enable individual voltages and frequencies for every core" "This enables per-core pstates (PCPS) [14] instead of one p-state for all cores as in previous products. The finer granularity of voltage and frequency domains enables energy-aware runtimes and operating systems to lower the power consumption of single cores while keeping the performance of other cores at a high level." "Previous Intel processor generations used either a fixed uncore frequency (Nehalem-EP and Westmere-EP) or a common frequency for cores and uncore (Sandy Bridge-EP and Ivy Bridge-EP)."
harrymc almost 7 years

@GrantWu: That does not contradict what I said, just gives more hardware details.
Grant Wu almost 7 years

Yes it does. It says "individual... frequencies" for every core. Or look at stackoverflow.com/questions/2619745/…
Grant Wu almost 7 years

Or look at the abstract of aspire.eecs.berkeley.edu/wp/wp-content/uploads/2014/07/… which says: "it is highly desirable to independently control the supply and the clock frequency for each core".
harrymc almost 7 years

@GrantWu: That does not replace the CPU clock - it is only used to adjust the speed to follow the clock. This is probably the mechanism used for implementing turbo boost and for homogenizing the cores (core performance might differ, as not all cores are identical when manufactured).
tvdo almost 7 years

On closer look, I think @harrymc is correct. As of Skylake, all cores still share a clock domain. The publicly available literature is a little vague on whether it is merely referring to the base clock or whether the cores also share a multiplier, though the latter is implied.
Peter Cordes almost 7 years

@Ramhound: A 120W 10-core part has a power budget of 12W per core (except in single-core turbo mode). This is why the highest single-core clocks are found in the quad-core parts, where e.g. Intel's i7-6700k has a power budget of 91W for 4 cores: 22.75W per core sustained with all cores active (at 4.0GHz even with an AVX2+FMA workload like Prime95). This is also why the single-core Turbo headroom is only an extra 0.2GHz, vs. a 22-core Broadwell E5-2699v4 with 2.2GHz base@145W, 3.6GHz turbo.
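The per-core figures above are plain division of the package power budget by the core count. A minimal sketch of that arithmetic, using only the numbers quoted in this comment (the function name is made up for illustration, and an even split across cores is a simplification):

```python
def per_core_budget_watts(package_tdp_watts, core_count):
    """Evenly split a package power budget across all cores (a simplification)."""
    return package_tdp_watts / core_count

# (label, package budget in watts, core count), as quoted in the comment above
parts = [
    ("120W 10-core part", 120, 10),
    ("i7-6700k", 91, 4),
    ("E5-2699v4", 145, 22),
]

for label, tdp, cores in parts:
    print(f"{label}: {per_core_budget_watts(tdp, cores):.2f} W per core")
```

By the same division, the 22-core part has roughly 6.6 W per core with all cores active, which fits its low 2.2GHz base clock and its much larger single-core turbo headroom.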
Peter Cordes almost 7 years

@Ramhound: added an answer that expands on this. A many-core Xeon seems to be exactly what the OP is looking for: operate as many low-power cores, or spend a lot of power running a single-thread fast when possible (turbo).
Peter Cordes almost 7 years

Intel could probably get a Goldmont core to run AVX2 instructions without much extra silicon (slowly, by decoding to pairs of 128b ops). Knights Landing (Xeon Phi) has Silvermont-based cores with AVX512, so it's not like it's impossible to modify Silvermont. But KNL adds out-of-order execution for vector instructions, while normal Silver/Goldmont only does OOO for integer, so they'd probably want to design it closer to Goldmont than KNL. Anyway, instruction sets are not a real problem. It's OS support and small benefit that are the real obstacles to spending die area on a low-power core.
hmijail mourns resignees almost 7 years

"Reduces performance" - compared to what? You are assuming a base state where you have n processors running with the same clock. That doesn't have to be the case. Processor X + processor Y is a more powerful/flexible solution than processor X alone, no matter what exactly processor Y is.
RyRoUK almost 7 years

Compared to its own max voltage + frequency. If all cores are maxed out in both V & f, then scaling down any core would result in lower potential performance.
Suici Doga almost 7 years

If I look at the individual core speeds, I can see some cores running faster than others, but the max speed is the same for all cores.
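For anyone who wants to reproduce this kind of observation, here is a minimal sketch, assuming a Linux system whose cpufreq driver exposes `scaling_cur_freq` under `/sys/devices/system/cpu` (the commenter's own tool and OS are not stated, and the function name is made up for illustration). The files are read one after another, so the values are not a simultaneous snapshot, which is one way the timing artifact mentioned earlier in the thread can arise.

```python
import glob
import os

def read_core_freqs_khz(sysfs_root="/sys/devices/system/cpu"):
    """Return {core index: current frequency in kHz} for cores exposing cpufreq."""
    freqs = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        # .../cpu7/cpufreq/scaling_cur_freq -> "cpu7"
        core_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))
        suffix = core_dir[3:]
        if not suffix.isdigit():
            continue
        try:
            with open(path) as f:
                freqs[int(suffix)] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # offline core or unreadable value: skip it
    return freqs

if __name__ == "__main__":
    for core, khz in sorted(read_core_freqs_khz().items()):
        print(f"cpu{core}: {khz / 1000:.0f} MHz")
```

On systems without cpufreq support the function simply returns an empty dictionary.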