Why are we still using CPUs instead of GPUs?


Solution 1

TL;DR answer: GPUs have far more processor cores than CPUs, but because each GPU core runs significantly slower than a CPU core and lacks the features needed for modern operating systems, GPUs are not appropriate for performing most of the processing in everyday computing. They are best suited to compute-intensive operations such as video processing and physics simulations.


GPGPU is still a relatively new concept. GPUs were initially used for rendering graphics only; as the technology advanced, the large number of cores in GPUs relative to CPUs was exploited by adding general computational capabilities, so that GPUs can process many parallel streams of data simultaneously, whatever that data may be. While GPUs can have hundreds or even thousands of stream processors, each of them runs slower than a CPU core and has fewer features (even though they are Turing complete and can be programmed to run any program a CPU can run). Features missing from GPUs include interrupts and virtual memory, which are required to implement a modern operating system.

In other words, CPUs and GPUs have significantly different architectures that make them better suited to different tasks. A GPU can handle large amounts of data in many streams, performing relatively simple operations on them, but is ill-suited to heavy or complex processing on a single or few streams of data. A CPU is much faster on a per-core basis (in terms of instructions per second) and can perform complex operations on a single or few streams of data more easily, but cannot efficiently handle many streams simultaneously.

As a result, GPUs are not suited to tasks that cannot be parallelized or do not benefit significantly from parallelization, which includes many common consumer applications such as word processors. Furthermore, because GPUs use a fundamentally different architecture, an application must be programmed specifically for the GPU, and doing so requires significantly different techniques: new programming languages, modifications to existing languages, and programming paradigms that are better suited to expressing a computation as a parallel operation performed by many stream processors. For more information on these techniques, see the Wikipedia articles on stream processing and parallel computing.

Modern GPUs are capable of performing vector operations and floating-point arithmetic, with the latest cards able to manipulate double-precision floating-point numbers. Frameworks such as CUDA and OpenCL enable programs to be written for GPUs, and the nature of GPUs makes them best suited to highly parallelizable operations, such as in scientific computing, where a set of specialized GPU compute cards can be a viable replacement for a small compute cluster, as in NVIDIA Tesla Personal Supercomputers. Consumers with modern GPUs who are experienced with Folding@home can use the GPU clients to run protein folding simulations at very high speeds and contribute more work to the project (be sure to read the FAQs first, especially those related to GPUs). GPUs can also enable better physics simulation in video games using PhysX, accelerate video encoding and decoding, and perform other compute-intensive tasks. It is these types of tasks that GPUs are best suited to performing.
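
To give a concrete sense of how frameworks like CUDA express this kind of highly parallelizable work, here is a minimal sketch (the array sizes and names are arbitrary illustrations, not taken from any particular application): every GPU thread applies the same simple operation to one element of a large array, which is exactly the shape of problem GPUs handle well.

    // Minimal CUDA sketch: add two large arrays element-wise.
    // Each GPU thread handles exactly one element; the same simple
    // operation is applied across a huge block of data.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;                 // one million elements
        size_t bytes = n * sizeof(float);

        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);          // unified memory keeps the sketch short
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vectorAdd<<<blocks, threads>>>(a, b, c, n);  // thousands of threads run at once
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);           // expect 3.000000
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

The kernel body is trivial on purpose: the speedup comes entirely from running it across many elements at once, not from any single thread being fast.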

AMD is pioneering a processor design called the Accelerated Processing Unit (APU) which combines conventional x86 CPU cores with GPUs. This approach enables graphical performance vastly superior to motherboard-integrated graphics solutions (though no match for more expensive discrete GPUs), and allows for a compact, low-cost system with good multimedia performance without the need for a separate GPU. The latest Intel processors also offer on-chip integrated graphics, although competitive integrated GPU performance is currently limited to the few chips with Intel Iris Pro Graphics. As technology continues to advance, we will see an increasing degree of convergence of these once-separate parts. AMD envisions a future where the CPU and GPU are one, capable of seamlessly working together on the same task.

Nonetheless, many tasks performed by PC operating systems and applications are still better suited to CPUs, and much work is needed to accelerate a program using a GPU. Since so much existing software uses the x86 architecture, and because GPUs require different programming techniques and are missing several important features needed for operating systems, a general transition from CPU to GPU for everyday computing is very difficult.

Solution 2

What makes the GPU so much faster than the CPU?

The GPU is not faster than the CPU. The CPU and the GPU are designed with two different goals and different trade-offs, so they have different performance characteristics. Certain tasks are faster on a CPU, while other tasks are faster on a GPU. The CPU excels at doing complex manipulations to a small set of data; the GPU excels at doing simple manipulations to a large set of data.

The GPU is a special-purpose processor, designed so that a single instruction works over a large block of data (SIMD, Single Instruction Multiple Data), with the same operation applied to every element. Working on blocks of data is more efficient than working on a single element at a time because the overhead of decoding instructions is greatly reduced. However, working on large blocks means more parallel execution units, so implementing a single GPU instruction takes many more transistors (leading to physical size constraints, higher energy use, and more heat).

The CPU is designed to execute a single instruction on a single datum as quickly as possible. Since it only needs to work with a single datum, the number of transistors required to implement a single instruction is much smaller, so a CPU can afford to have a larger instruction set, a more complex ALU, better branch prediction, better virtualization support, and more sophisticated caching and pipelining schemes. Its instruction cycles are also faster.

The reason we are still using CPUs is not that x86 is the king of CPU architectures and Windows is written for x86; the reason is that the kind of work an OS needs to do, i.e. making decisions, runs more efficiently on a CPU architecture. An OS needs to look at hundreds of different types of data and make various decisions which all depend on each other; this kind of job does not parallelize easily, at least not onto an SIMD architecture.
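
To make the "decision-making does not map onto SIMD" point concrete, here is a small illustrative CUDA sketch (the kernel and data are made up for illustration): threads in the same warp share one instruction stream, so when they take different branches the hardware runs each branch path in turn with part of the warp masked off, and most of the parallel advantage is lost on heavily branchy, interdependent code.

    // Illustrative sketch of branch divergence (hypothetical kernel).
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void decide(const int *input, int *output, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        if (input[i] % 2 == 0) {
            // Path A: only the "even" threads of the warp are active here...
            output[i] = input[i] * 3;
        } else {
            // ...then Path B runs with only the "odd" threads active. The warp
            // pays for both paths, which is why decision-heavy, OS-style code
            // does not fit this execution model.
            output[i] = input[i] - 7;
        }
    }

    int main() {
        const int n = 1024;
        int *in, *out;
        cudaMallocManaged(&in, n * sizeof(int));
        cudaMallocManaged(&out, n * sizeof(int));
        for (int i = 0; i < n; ++i) in[i] = i;

        decide<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();

        printf("out[2] = %d, out[3] = %d\n", out[2], out[3]);  // 6 and -4
        cudaFree(in); cudaFree(out);
        return 0;
    }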

In the future, what we will see is a convergence between the CPU and GPU architectures, as CPUs acquire the capability to work over blocks of data, e.g. SSE. Also, as manufacturing technology improves and chips get smaller, GPUs can afford to implement more complex instructions.

Solution 3

GPUs lack:

  1. Virtual memory (!!!)
  2. Means of addressing devices other than memory (e.g. keyboards, printers, secondary storage, etc)
  3. Interrupts

You need these to be able to implement anything like a modern operating system.

They are also (relatively) slow at double-precision arithmetic (compared with their single-precision performance)*, and are much larger (in terms of silicon area). Older GPU architectures don't support the indirect calls (through function pointers) needed for most general-purpose programming, and more recent architectures that do support them do so slowly. Finally, (as other answers have noted), for tasks which cannot be parallelized, GPUs lose out to CPUs given the same workload.

EDIT: Please note that this response was written in 2011 -- GPU tech is an area changing constantly. Things could be very different depending on when you're reading this :P

* Some GPUs aren't slow at double precision arithmetic, such as NVidia's Quadro or Tesla lines (Fermi generation or newer), or AMD's FirePro line (GCN generation or newer). But these aren't in most consumers' machines.

Solution 4

A CPU is like a worker that goes super fast. A GPU is like a group of clone workers that go fast, but which all have to do exactly the same thing in unison (with the exception that you can have some clones sit idle if you want).

Which would you rather have as your fellow developer, one super fast guy, or 100 fast clones that are not actually as fast, but all have to perform the same actions simultaneously?

For some actions, the clones are pretty good e.g. sweep the floor - they can each sweep a part of it.

For some actions, the clones stink, e.g. write the weekly report - all the clones but one sit idle while one clone writes the report (otherwise you just get 100 copies of the same report).

Solution 5

Because GPUs are designed to do a lot of small things at once, and CPUs are designed to do one thing at a time. If your process can be made massively parallel, like hashing, GPUs are orders of magnitude faster; otherwise, they won't be.

Your CPU can compute a single hash much, much faster than your GPU can, but in the time it takes your CPU to do it, your GPU could be part way through several hundred hashes. GPUs are designed to do a lot of things at the same time, and CPUs are designed to do one thing at a time, but very fast.
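
As a rough sketch of that hashing picture (the 32-bit FNV-1a hash and the per-thread inputs here are arbitrary choices for illustration, not a real mining workload): one GPU thread computes one hash, so tens of thousands of independent hashes are in flight at once, even though any single thread is slower than a CPU core doing the same hash.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdint>

    // One thread = one (toy) hash job. Any single thread is slower than a
    // CPU core, but thousands of them run at the same time.
    __device__ uint32_t fnv1a(const uint8_t *data, int len) {
        uint32_t h = 2166136261u;            // FNV offset basis
        for (int i = 0; i < len; ++i) {
            h ^= data[i];
            h *= 16777619u;                  // FNV prime
        }
        return h;
    }

    __global__ void hashMany(uint32_t *out, int count) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= count) return;
        uint8_t msg[4] = { (uint8_t)(i), (uint8_t)(i >> 8),
                           (uint8_t)(i >> 16), (uint8_t)(i >> 24) };
        out[i] = fnv1a(msg, 4);              // each thread hashes its own input
    }

    int main() {
        const int count = 1 << 16;           // 65,536 independent hash jobs
        uint32_t *out;
        cudaMallocManaged(&out, count * sizeof(uint32_t));

        hashMany<<<(count + 255) / 256, 256>>>(out, count);
        cudaDeviceSynchronize();

        printf("hash of job 0: %08x\n", out[0]);
        cudaFree(out);
        return 0;
    }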

The problem is that CPUs and GPUs are very different solutions to very different problems. There is a little overlap, but generally what's in their domain stays in their domain. We can't replace the CPU with a GPU because the CPU is sitting there doing its job much better than a GPU ever could, simply because a GPU isn't designed to do that job, and a CPU is.

A minor side note, though, if it were possible to scrap the CPU and only have a GPU, don't you think we'd rename it? :)


Comments

  • ell
    ell over 1 year

    It seems to me that these days lots of calculations are done on the GPU. Obviously graphics are done there, but using CUDA and the like, AI, hashing algorithms (think bitcoins) and others are also done on the GPU. Why can't we just get rid of the CPU and use the GPU on its own? What makes the GPU so much faster than the CPU?

    • Ellie Kesselman
      Ellie Kesselman almost 13 years
      There are some recent answers @ell now, which do not contain "misinformation". They are gradually rising to the top with up votes due to efficient market mechanism of the wonderfully designed StackExchange ;-) I'd suggest waiting a little longer before accepting an answer. Looks like you very prudently are doing just that. This is a good question, by the way. Might seem obvious, but it isn't at all. Thank you for asking it!
    • Daniel R Hicks
      Daniel R Hicks almost 13 years
      There's no reason why one couldn't, eg, create a Java JITC for a GPU, from a code-generation point of view. And most OS code is now written in C/C++ which can be easily retargeted. So one is not tied to the x86 heritage in any really significant way (unless you're running Windoze). The problem is that few (if any) GPUs are at all good at general-purpose processing.
    • Soren
      Soren almost 13 years
      @DanH Except that Java is a bad language, specifically for creating programs which have a high level of parallelism. We need mainstream languages, like those for functional programming, where parallelism is the natural way of expressing any program. Furthermore, the programming languages have to be well suited to operating on a very small amount of memory for each unit of computation, as that is when the GPU operates efficiently. As mentioned in the question, there are only a few problems, such as AI and the like, which do this naturally without a new programming language.
    • Daniel R Hicks
      Daniel R Hicks almost 13 years
      But you don't need to run Java. The point is that you're not chained to a processor architecture. As to a new language for parallel processing, people have been trying to invent one for maybe 30 years now, and not made significant progress. Whereas after 30 years of developing sequential programming languages we had Fortran, COBOL, Modula-2, C, Pascal, Ada, PL/I, C++, and a host of others.
    • vartec
      vartec almost 13 years
      Kind of like asking "If Boeing 747 is faster and more fuel efficient, why do we still drive cars"?
    • Aki
      Aki almost 13 years
      Does this sound familiar (RISC vs. CISC) ?
    • JdeBP
      JdeBP almost 13 years
      No, because it's not RISC versus CISC. It's one of the other computer science fundamentals, slightly disguised. It's "Why do we offload work from the central processor onto I/O processors?".
    • Breakthrough
      Breakthrough almost 10 years
      @vartec as a seasoned CUDA developer, I think that may be the most accurate analogy I have ever seen, hands down. I'm saving that one :)
    • supercat
      supercat over 9 years
      @vartec: I think a slightly better analogy might be between buses and taxicabs. If there are forty people who all want to go from the same place to the same place, a bus will be much more efficient. If there are forty people whose desired origins and destinations are widely scattered, even a single taxicab may be just as good as a bus, and for the cost of the bus one could have multiple taxicabs.
    • Basic
      Basic almost 9 years
      As with all important technical questions, Mythbusters have addressed this (and it's not a bad analogy).
  • Soren
    Soren almost 13 years
    Like this answer, I think the main reason is that we don't have good mainstream programming languages to deal with parallel architectures like this. We have struggled for decades to advance multi-threaded programming, and people are still calling multi-threading "evil". Despite that, multi-core CPUs and GPUs are a reality, and we will have to come up with new programming paradigms to deal with this.
  • jkj
    jkj almost 13 years
    Why a downvote?
  • Chris S
    Chris S almost 13 years
    Worth noting that Intel has been working on Larrabee architecture (for way too long) which is essentially a chip with a massive number of x86 cores on it.
  • Nich Del
    Nich Del almost 13 years
    Great answer for discussing the hardware reasons and discussing APUs and how they will change this. However, @Soren gives a very good point on the software side. In reality, it's the combination of the hardware issues, the software issues, and the fact that CPUs work and when something is known to work, it's hard to get people to replace it.
  • CenterOrbit
    CenterOrbit almost 13 years
    All very good points. I would like to add that almost all of this is focused on computer-based solutions. I would like to point out that cellphone processor manufacturers are more or less building a merged product where the graphics and CPU, among many other things, are all contained on one chip. My EVO 3D has a dual-core and quite impressive graphics support. You can bet that as soon as rooting is available I will have a desktop-grade OS (like Ubuntu) dual boot installed. So I'm arguing that instead of one or the other, it is more just blurring the line of difference.
  • Javier
    Javier almost 13 years
    I guess it's the last line, as it's simply false. In fact, I can only think of one x86-only mainstream OS; and even that one has been ported to Alpha and ARM processors, just not commercially offered at the moment.
  • jkj
    jkj almost 13 years
    Ok. Removed the last section that was my opinion about mainstream operating system support hindering change to new architectures. Might not be in the scope of the answer.
  • Vinko Vrsalovic
    Vinko Vrsalovic almost 13 years
    I would incorporate here the answer by Billy ONeal; it adds a very relevant aspect (relevant for today; as the convergence occurs, it will disappear).
  • surfasb
    surfasb almost 13 years
    This is probably the best answer here. It is important to understand the fundamental differences between the two paradigms. For GPUs to overtake CPUs, considering today's workloads, essentially means a GPU must turn into a CPU. And thus the question is the answer.
  • bwDraco
    bwDraco almost 13 years
    I've expanded my answer based on your comments. Thanks for your feedback!
  • Billy ONeal
    Billy ONeal almost 13 years
    @Cicada: Do you have a reference for that? In any case, even if that is true, even recent hardware is not going to perform well in that case. (e.g. would not have too much a perf advantage over a CPU -- and a power consumption DISadvantage)
  • Angry Lettuce
    Angry Lettuce almost 13 years
    Yes, the Fermi devices as you said (with CUDA 4.0 and sm_20), support indirect jumps (and therefore C++ virtual methods, inheritance etc).
  • JdeBP
    JdeBP almost 13 years
    I see no-one has yet mentioned the position of the two processors relative to video RAM as a contributory factor in which is the "faster".
  • Admin
    Admin almost 13 years
    +1 for this being the best answer. Both this and the accepted answer are correct, but this one explains it much more clearly.
  • Silverfire
    Silverfire almost 13 years
    Please excuse the poor grammar and generally sub-par writing style used in the above; I have not had my coffee. It's a fairly complicated concept, and the included link is where you should go if you want to understand more, not my bad explanation.
  • bwDraco
    bwDraco almost 13 years
    I've fixed it for you, and added a link as well.
  • DrColossos
    DrColossos almost 13 years
    Could I even have ... both?
  • Joachim Sauer
    Joachim Sauer almost 13 years
    @Kevin: Yes, but you'd need a computer with both a CPU and a GPU! If only there were such a thing!
  • Soren
    Soren almost 13 years
    @BlueRaja -- we are aware of these languages, your definition of main stream must be different than mine :-)
  • Randolf Richardson
    Randolf Richardson almost 13 years
    Yup, one step forward, two steps back.
  • Ben Voigt
    Ben Voigt over 12 years
    544 GigaFLOPS from a $300 2 year old GPU is slow?
  • Billy ONeal
    Billy ONeal over 12 years
    @Ben: You only get that kind of performance in data-parallel applications. General sequential operations are a whole different ballgame. (That's only with all 1600 cores on that chip running in parallel, running essentially the same instruction over and over again... and even that's theoretical rather than actual perf)
  • Ben Voigt
    Ben Voigt over 12 years
    @Billy: But that's slowness on a particular class of algorithms, not slowness on double precision arithmetic (which is what you claimed). (And CPUs usually don't achieve benchmark throughputs either)
  • Billy ONeal
    Billy ONeal over 12 years
    @Ben: Your linked article doesn't say anything about double precision. The advertised FLOPs are for single precision. Double precision operations on most GPUs nowadays are at least one sixth the speed of single precision (the notable exception being the Quadro and Tesla Fermi devices) I never said CPUs achieved benchmark throughput, so I'm not sure what your point is there.
  • Billy ONeal
    Billy ONeal over 12 years
    @Ben: Sorry, I see the double precision comment now. In any case, when I said "slow with double precision" I was talking about in comparison to single precision, not in comparison to CPUs.
  • Ben Voigt
    Ben Voigt over 12 years
    @Billy: But this whole question is about GPU vs CPU, not double precision vs single precision (also, CPUs see similar speed differences between single and double precision on many operations).
  • bwDraco
    bwDraco about 12 years
    @dlikhten: What you suggested is highly technical as it involves the microarchitecture of the chips involved. I do not want to bog the reader down in these details.
  • Mikhail
    Mikhail over 11 years
    I'm down-voting you because you forgot to mention the transfer time and cost difference. You can make a Tesla run as fast as a single i3 thread, but it's not worth the money. Copy times mean that jobs need to be fairly large, and many jobs of that size simply don't exist in regular computing.
  • Ajay
    Ajay over 11 years
    I think most modern CPUs are designed to do 2, 4, or 8 things at once.
  • UNK
    UNK over 11 years
    @danielcg25: And most modern GPUs are designed to do 256, 512, 1024 things at once (The GTX 680 has 1536 CUDA cores). Each individual CPU core is a distinct entity conceptually, but this is not true of a GPU.
  • UNK
    UNK over 11 years
    @danielcg25: I'm aware, but a comment with a fundamental (albeit intentional) misunderstanding of the answer could be harmful if anybody was reading it without already knowing the topic. "Being an ass" in that sense isn't really appreciated on SE as it lowers the signal:noise ratio.
  • Ajay
    Ajay over 11 years
    I was just providing some information. Most computers nowadays actually are capable of processing 2-8 things at once. Some processors can do even more than that. It still doesn't come close to GPUs which do 100s of things at once.
  • UNK
    UNK over 11 years
    @danielcg25: It's a different kind of processing, though, which is what the question is about. Each CPU core is effectively separate, working with its own chunks of data and its own processes. Each CPU core performs a different, separate task to every other, and they do not scale upwards linearly--an octo-core is not twice as useful as a quad core is not twice as useful as a dual core. GPU cores, on the other hand, perform the same task across different pieces of data, and do scale linearly. It is obvious that multi-core CPUs exist, but this is not the same thing.
  • Jimmy Breck-McKye
    Jimmy Breck-McKye over 10 years
    Good point about the branch prediction optimization - I would have never considered that, but you're right.
  • Dr. ABT
    Dr. ABT about 9 years
    I'm surprised no-one in this thread has mentioned the overhead of sending data to the GPU - limited bandwidth over PCI-Express buses makes some parallel operations on a GPU vastly slower than were they performed on the CPU. One simple case can be seen where varying the size of an FFT made a significant difference in performance on GPU vs. CPU due to the overhead of sending data, setting up a context, reading back results: stackoverflow.com/a/8687732/303612 Smaller operations can be performed in-cache on CPUs, and the memory bandwidth is vastly superior to the current PCI-E architecture
  • Lie Ryan
    Lie Ryan about 9 years
    @Dr.AndrewBurnett-Thompson: that's because that is irrelevant to the question. Currently, GPU is considered an auxiliary processing unit, that's why moving data from/to a GPU is necessary and expensive. If we treat GPU as the first class processing unit, there won't be any need to marshal data between the main memory and the GPU memory.
  • Dr. ABT
    Dr. ABT about 9 years
    Oh ok, so a GPU on-board a CPU will have zero bandwidth overhead moving data between the two. That's optimistic :)
  • Lie Ryan
    Lie Ryan about 9 years
    Not optimistic; it's not zero bandwidth overhead. If a processor with a GPU architecture runs the entire show, there is nothing that needs to be moved; the GPU memory is the main memory. There is no transfer overhead to be talked about in the first place because there are no transfers. This is not hypothetical, by the way: AMD's APUs use HSA (heterogeneous system architecture) with unified main memory, which allows zero-copying between the CPU and GPU.
  • Steve
    Steve about 9 years
    I think pipeline overhead is a fundamental detail the higher ranked answers are missing.
  • Mayo
    Mayo almost 9 years
    Great analogy. Will remember this.
  • j riv
    j riv almost 6 years
    It's generally off topic to talk about VRAM routing tech; if a standard decided to use a GPU instead of a CPU, its memory would not be passing through PCI-E. The same is true for the reverse.
  • MathCubes
    MathCubes almost 5 years
    Instruction sets too! One is SISD and the other is SIMD, each of which has its pros and cons.