Why do people use GPUs for high-performance computation instead of a more specialized chip?


Solution 1

It's really a combination of all your explanations: GPUs are cheaper and easier to get, they already exist, and their design has shifted away from pure graphics.


A modern GPU can be viewed as primarily stream processors with some additional graphics hardware (and some fixed-function accelerators, e.g. for encoding and decoding video). GPGPU programming these days uses APIs specifically designed for this purpose (OpenCL, Nvidia CUDA, AMD APP).

Over the last decade or two, GPUs have evolved from a fixed-function pipeline (pretty much graphics only) to a programmable pipeline (shaders let you run custom code) to more modern APIs like OpenCL that provide direct access to the shader cores without the accompanying graphics pipeline.
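
As a rough illustration of what "direct access to the shader cores" means in practice, here is a minimal CUDA sketch (assuming an Nvidia card and the CUDA toolkit; error handling omitted for brevity). Nothing in it touches a graphics API: it copies data to the card, launches a kernel across roughly a million threads, and copies the result back. The equivalent OpenCL code is more verbose but follows the same pattern.

    // Minimal compute-only use of a GPU: no graphics API anywhere, just
    // "copy in, run a kernel over many threads, copy out".
    #include <cstdio>
    #include <vector>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        // Each thread handles one array element.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;                      // ~1M elements
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

        float *dx, *dy;
        cudaMalloc((void **)&dx, n * sizeof(float));
        cudaMalloc((void **)&dy, n * sizeof(float));
        cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        // 256 threads per block, enough blocks to cover all n elements.
        saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);

        cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);               // expect 3*1 + 2 = 5
        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }

Compiled with nvcc, this runs the same way on a gaming card or a compute-oriented one, which is part of why the code is so portable between the two.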

The remaining graphics bits are minor. They're such a small part of the cost of the card that it isn't significantly cheaper to leave them out, and you incur the cost of an additional design. So this is usually not done — there is no compute-oriented equivalent of most GPUs — except at the highest tiers, and those are quite expensive.

Normal "gaming" GPUs are very commonly used because economies of scale and relative simplicity make them cheap and easy to get started with. It's a fairly easy path from graphics programming to accelerating other programs with GPGPU. It's also easy to upgrade the hardware as newer and faster products are available, unlike the other options.


Basically, the choices come down to:

  • General-purpose CPU, great for branching and sequential code
  • Normal "gaming" GPU
  • Compute-oriented GPU, e.g. Nvidia Tesla or Radeon Instinct. These often do not support graphics output at all, so "GPU" is a bit of a misnomer. However, they use GPU cores similar to those in normal GPUs, and OpenCL/CUDA/APP code is more or less directly portable.
  • FPGAs, which use a very different programming model and tend to be very costly. This is where a significant barrier to entry exists. They're also not necessarily faster than a GPU, depending on the workload.
  • ASICs, custom-designed circuits (hardware). These are very, very expensive and only become worth it at extreme scale (we're talking many thousands of units, at the very least), and only when you're sure the program will never need to change. They are rarely feasible in the real world. You'll also have to redesign and test the entire thing every time technology advances - you can't just swap in a new processor like you can with CPUs and GPUs.

Solution 2

My favorite analogy:

  • CPU: A polymath genius. Can only do one or two things at a time, but those things can be very complex.
  • GPU: A ton of low-skilled workers. None of them can handle very big problems individually, but en masse you can get a lot done. To your question: yes, there is some graphics overhead, but I believe it's marginal.
  • ASIC/FPGA: A company. You can hire a ton of low-skilled workers, a couple of geniuses, or any combination of the two.

What you use depends on cost sensitivity, the degree to which a task is parallelizable, and other factors. Because of how the market has played out, GPUs are the best choice for most highly parallel applications and CPUs are the best choice when power and unit cost are the primary concerns.

Directly to your question: why a GPU over an ASIC/FPGA? Generally, cost. Even with today's inflated GPU prices, it is still (generally) cheaper to use a GPU than to design an ASIC to meet your needs. As @user912264 points out, there are specific tasks for which ASICs/FPGAs can be useful. If you have a unique task and you will benefit from scale, then it can be worth it to design an ASIC/FPGA. In fact, you can design/buy/license FPGA designs specifically for this purpose. This is done to power the pixels in high-definition TVs, for example.

Solution 3

Your analogy is bad. In the analogy, when you're buying equipment for a large lawn care business, you assume there are good lawn mowers available. This is not the case in the computing world - GPUs are the best tool readily available.

The R&D costs for a specialized chip are likely too high, and the possible performance gains too small, to justify making one.

That said, I'm aware of Nvidia putting out some GPUs specifically for general-purpose computing - they had no video outputs - a bit like selling box fans with the cages already removed.

Solution 4

Of course, you can use specialized chips, either for energy efficiency or for calculation speed. Let me tell you the history of Bitcoin mining:

  • Bitcoin is new, geeks mine with their CPUs.
  • Bitcoin is somewhat new, smart geeks mine with their GPUs.
  • Bitcoin is now (kinda) famous, people buy FPGAs.
  • Bitcoin is now famous (2013), even newbies buy ASICs ("Application Specific Integrated Circuits") to mine efficiently.
  • The block reward drops (periodically), and even old ASICs are not profitable anymore.

So no, there is no fundamental reason to use a GPU instead of a specialized "giant calculator": the bigger the economic incentives, the more specialized the hardware becomes. However, specialized chips are quite hard to design and infeasible to manufacture if you're not producing thousands at once. If designing chips isn't viable, you can buy a GPU from the nearest Walmart.
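
To make the shape of that workload concrete, here is a toy CUDA sketch of the nonce search at the heart of mining. The mixing function below is a made-up stand-in for Bitcoin's double SHA-256 (so this is purely illustrative, not real mining code), but the structure is faithful: a tiny, fixed computation repeated across billions of independent trials, which is exactly what first mapped well onto GPUs and was later worth freezing into an ASIC.

    #include <cstdio>

    // Toy stand-in for Bitcoin's double SHA-256: NOT a real hash, just an
    // integer mixer so the sketch stays self-contained.
    __device__ unsigned int toy_hash(unsigned int header, unsigned int nonce) {
        unsigned int h = header ^ (nonce * 2654435761u);
        h ^= h >> 16; h *= 0x45d9f3bu; h ^= h >> 16;
        return h;
    }

    // Each thread tries one nonce. Every trial is independent, which is why
    // the job spreads across thousands of GPU threads so well, and why a
    // fixed-function ASIC that hard-wires the hash wins once the algorithm
    // is guaranteed never to change.
    __global__ void search(unsigned int header, unsigned int target,
                           unsigned int base, unsigned int *winner) {
        unsigned int nonce = base + blockIdx.x * blockDim.x + threadIdx.x;
        if (toy_hash(header, nonce) < target)
            atomicMin(winner, nonce);   // keep the smallest winning nonce
    }

    int main() {
        unsigned int *winner;
        cudaMallocManaged(&winner, sizeof(unsigned int));
        *winner = 0xFFFFFFFFu;

        // One launch scans 64M candidate nonces; a real miner loops forever,
        // updating the header as new blocks arrive.
        search<<<65536, 1024>>>(0xDEADBEEFu, 0x00000FFFu, 0u, winner);
        cudaDeviceSynchronize();

        if (*winner != 0xFFFFFFFFu)
            printf("found nonce 0x%08x\n", *winner);
        cudaFree(winner);
        return 0;
    }

Because the inner function never changes and every trial is independent, hard-wiring it into silicon is straightforward, which is why ASICs took over once the economics justified the design cost.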

TL;DR Of course you can use more specialized chips.

Solution 5

What you describe in your analogy is exactly what happened. Just as you grabbed your fan and sharpened the blades to try to use it as a mower, a group of researchers realized "Hey, we have a pretty nice multi-core processing unit here, let's try to use it for general-purpose computations!".

The result was good, and the ball started rolling. The GPU went from a graphics-only device to one that supports general-purpose computation and can assist in the most demanding situations.

After all, the most computationally demanding operation most of us expect from our computers is graphics. It's enough to take a look at the stunning advances in how games look today compared to how they did just a few years ago. This means that a lot of effort and money has gone into the development of GPUs, and the fact that they could also be used to speed up a certain class of general-purpose computation (i.e. extremely parallel work) just added to their popularity.

So in conclusion, the first explanation that you offer is the most accurate:

  • Such an alternative would be too expensive to develop when the GPU is already a fine option.

GPUs were already there, they were readily available to everyone, and they worked.


Comments

  • Alex S
    Alex S over 1 year

    From my understanding, people began using GPUs for general computing because they are an extra source of computing power. And though they are not as fast as a CPU for each operation, they have many cores, so they can be better adapted for parallel processing than a CPU. This makes sense if you already own a computer that happens to have a GPU for graphics processing, but you don't need the graphics, and would like some more computational power. But I also understand that people buy GPUs specifically to add computing power, with no intention of using them to process graphics. To me, this seems similar to the following analogy:

    I need to cut my grass, but my lawn mower is wimpy. So I remove the cage from the box fan I keep in my bedroom and sharpen the blades. I duct tape it to my mower, and I find that it works reasonably well. Years later, I am the purchasing officer for a large lawn-care business. I have a sizable budget to spend on grass-cutting implements. Instead of buying lawn mowers, I buy a bunch of box fans. Again, they work fine, but I have to pay for extra parts (like the cage) that I won't end up using. (for the purposes of this analogy, we must assume that lawn mowers and box fans cost about the same)

    So why is there not a market for a chip or a device that has the processing power of a GPU, but not the graphics overhead? I can think of a few possible explanations. Which of them, if any, is correct?

    • Such an alternative would be too expensive to develop when the GPU is already a fine option (lawn mowers don't exist, why not use this perfectly good box fan?).
    • The fact that 'G' stands for graphics denotes only an intended use, and does not really mean that any effort goes into making the chip better adapted to graphics processing than any other sort of work (lawn mowers and box fans are the same thing when you get right down to it; no modifications are necessary to get one to function like the other).
    • Modern GPUs carry the same name as their ancient predecessors, but these days the high-end ones are not designed to specifically process graphics (modern box fans are designed to function mostly as lawn mowers, even if older ones weren't).
    • It is easy to translate pretty much any problem into the language of graphics processing (grass can be cut by blowing air over it really fast).

    EDIT:

    My question has been answered, but based on some of the comments and answers, I feel that I should clarify it. I'm not asking why everyone doesn't have custom hardware built for their own computations. Clearly that would be too expensive most of the time.

    I simply observed that there seems to be a demand for devices that can quickly perform parallel computations. I was wondering why it seems that the optimal such device is the Graphics Processing Unit, as opposed to a device designed for this purpose.

    • Admin
      Admin almost 6 years
      Because they are specialized for this type of thing; it's basically the same type of math. And nVidia has built and sold GPU-only boards for people to do this type of massively parallel number crunching.
    • Admin
      Admin almost 6 years
      Keep in mind that we do have specialised "units" added to chips. AES is done in hardware (I think) on CPUs. AVX is implemented in hardware too. However, where do you stop? The chipmaker does not know what you need, and most people do not have the capabilities (technological or financial) to have their own chips designed for very specific tasks. Graphics cards are - as others have said - one type of specialised architecture, which lends itself well to certain tasks. They aren't good for everything - but for certain specific tasks, and thus they are used there.
    • Admin
      Admin almost 6 years
      A more accurate analogy would replace the box fans with 100-meter wide farming combines.
    • Admin
      Admin almost 6 years
      My PC already has a ready to use GPU, designing and producing a dedicated chip would set me back a couple of millions.
    • Admin
      Admin almost 6 years
      Try another analogy. Suppose we have box fans, and we have helicopter rotors. In our hypothetical world, applications for box fans needed progressively bigger fans running at higher speeds, until we ended up with 20m carbon-fibre-blade box fans, and mass-production made them cheap. Then someone realised that a 20m box fan is essentially just a helicopter rotor with a cage around it. It really is that similar.
    • Admin
      Admin almost 6 years
      As yet another bad analogy (because all analogies are bad), compare the use of GPUs for non-graphics computational tasks to the way hard drive consumers — even massive ones like cloud-storage provider BackBlaze — replenish their stock during production crises by buying over-the-counter external drives at sale prices and "shucking" the cases. It's not that there's no market for internal drives. It's that sometimes, even when the more specialized thing exists, it's cheaper to buy the mass-produced, non-specialized thing and adapt it. backblaze.com/blog/backblaze_drive_farming
    • Admin
      Admin almost 6 years
      I think the primary answer is the first point - it would be too expensive to have something more specialised. Chip fab plants are some of the most advanced and expensive facilities on earth and cost billions of dollars. Interesting article: bloomberg.com/news/articles/2016-06-09/how-intel-makes-a-chip
    • Admin
      Admin almost 6 years
      @DetlevCM : You are correct. The AES-NI instruction set was added to Intel CPUs around 2010, and has appeared in practically all mainstream CPUs since then. I guess it's fair to say that if you're running on a 5-year-old or newer CPU, there's a high chance your AES is hardware accelerated unless you explicitly turn it off in your application.
    • Admin
      Admin almost 6 years
      It seems you misspelled coprocessor with three uppercase letters ;)
    • Admin
      Admin almost 6 years
      Your business only needs 50 mower blades - go get a quote to have an engineer design your perfectly optimized mower blades and for a metal shop to do a short run of 50 blades for you, then you'll see why everyone doesn't do custom silicon for their specific computations.
    • Admin
      Admin almost 6 years
      There is market for specialized chips, e.g., low-cost Intel Movidius or Intel Nervana (which is AFAIK still in research and not for retail)
    • Admin
      Admin almost 6 years
      Your analogy regarding the box fan and the lawnmower is off. Imagine you started with a box fan for cooling. Then you realized you could also cut grass (badly) with the box fan and then started doing that. Then someone invented a purpose-built grass cutting device, and lots of people started using that to cut grass. Then one day, someone realized if they took the grass cutter and removed a few bits, it makes a kind of fan that cools better than box fans in some situations. Now a bunch of people who want that kind of cooling are buying grass cutters instead of bigger box fans.
    • Admin
      Admin almost 6 years
      I lack the rep to answer, so upvote this please and I'll post a legit answer when I get 10 rep :). I'm a software engineer and use CPU/GPU extensively. I also have EE expertise. "So why is there not a market for a chip or a device that has the processing power of a GPU?" That's called an ASIC and they're expensive to manufacture. An FPGA is essentially an ASIC that can be programmed "in the field," but they're a lot slower - too slow to be competitive with ASICs. Intel has teamed up with Stratix to change that, though (google: AWS EC2 F1). Upvote this comment for a more robust answer :)
    • Admin
      Admin almost 6 years
      Because there are lots of costs in designing a whole new chip. If you buy something that already exists, you don't have to gamble on those costs.
    • Admin
      Admin almost 6 years
      @Robear Upvotes on comments don't get you any rep.
    • Admin
      Admin almost 6 years
      @Robear Suggest good edits to five different posts that get approved and you'll be able to write answers.
  • Peter Cordes
    Peter Cordes almost 6 years
    High-end GPGPUs have good throughput for 64-bit double-precision, not just single-precision 32-bit float. (Some regular GPUs skimp on HW for double). The major vendors all support IEEE FP math (I think even with denormals). So there's no precision loss unless you want to trade precision for performance, e.g. with 16-bit half-precision FP which has even better throughput on some hardware (and of course half the memory bandwidth). High-performance code on CPUs often uses 32-bit float as well, to get twice as many elements per SIMD vector and half the memory bandwidth.
  • JAB
    JAB almost 6 years
    @PeterCordes I've seen some work in approximate computing that even goes down to eight-bit floating point, though I don't think many GPUs support that in hardware.
  • ratchet freak
    ratchet freak almost 6 years
    ASICs also make sense when the computing literally pays for itself (crypto mining)
  • MSalters
    MSalters almost 6 years
    Actually, FPGAs are often worse than GPUs. The problem is that FPGAs are very flexible; they can implement many different operations. However, computation is generally a form of math, and in fact the bulk is just two operations: addition and multiplication (subtraction and division are variants of the above). GPUs are very, very good at those two operations, much more so than FPGAs.
  • Yakk
    Yakk almost 6 years
    You need to clarify more about FPGAs. The idea that they are a "step up" is a bit misleading. They are more of a step sideways.
  • James_pic
    James_pic almost 6 years
    @ratchetfreak Even in the case of crypto mining, some of Bob's points still apply. They're viable for Bitcoin, because the program is unlikely to change (the Bitcoin community is committed to SHA256). This isn't true of all currencies - for example Ethereum plans to change in the near future, and the Monero community recently demonstrated their commitment to changing the hash if ASIC miners were developed.
  • jamesqf
    jamesqf almost 6 years
    I have to disagree about "the most computationally demanding operation" being graphics, depending of course on exactly who "we" is. For general users, yes, but in the science & engineering community, there are many things more demanding than graphics. After all, acceptable graphics (as for games) can be done with a single mid-range PC and GPU combo. Significant problems often combine hundreds or thousands of such units to get performance in the petaflop range - and then problems still may take days or weeks of compute time.
  • mbrig
    mbrig almost 6 years
    As an example of the last one, Google has their own "Tensor processing units" for machine learning. To what degree they're customized is unclear, but they are described as being ASICs.
  • Mark
    Mark almost 6 years
    The most computationally demanding operation I expect from my computer is technically graphics, but structure-from-motion computations are not what most people (or GPU designers) think of when they hear the word "graphics".
  • manav m-n
    manav m-n almost 6 years
    By "accelerators" are you referring to custom made hardware or super clusters of low power computing nodes? Can you elaborate by providing reference to some example accelerator hardware.
  • Raimund Krämer
    Raimund Krämer almost 6 years
    Comments are not for answering anyway, and this seems like a reasonable answer to me.
  • NerdPirate
    NerdPirate almost 6 years
    Sorry, I thought I made that clear from context. Accelerator is just an umbrella term for a coprocessor or offload card. Floating point was originally in a coprocessor and not the main CPU, and it would have been considered an accelerator. GPUs, DSPs, the Xeon Phi, FPGAs when they’re on a PCIe card or something similar, the analog differential equation thing I mentioned, there are devices that aid in virtualization, there is current research in neural network accelerators. Those are all examples of accelerators.
  • Dan Is Fiddling By Firelight
    Dan Is Fiddling By Firelight almost 6 years
    While compute cores, memory controllers, and internal communication features make up a large majority of the transistor count in high-end cards, fixed-function video encode/decode blocks for the newest codecs are a major fraction of the total size for low-end models. I can't find an example now, but when Nvidia's Pascal came out I remember people using the stated core and transistor counts to try to estimate the size of the fixed-function hardware, concluding that the GPU cores made up barely half the area of the GP107 die (1050/1050 Ti), and speculating whether a smaller model would be worth the work needed.
  • doneal24
    doneal24 almost 6 years
    @BobtheMogicMoose But it might be orders of magnitude faster to use a custom FPGA designed for genomic analysis than to have the equivalent code in a GPU. When you're paying scientists to sit around waiting for the results, the faster FPGA pays for itself very quickly.
  • wilcroft
    wilcroft almost 6 years
    @MSalters One of the main selling points of FPGAs over GPUs is performance/Watt, which is getting more important as data centres start to hit the power wall (FPGAs are generally more power efficient). As far as math, FPGAs are comparable to GPUs in fixed-point and integer arithmetic, and only lag in floating-point math.
  • Agent_L
    Agent_L almost 6 years
    "Of course you can use more specialized chips" - but there are specialized chips for bitcoin ( SHA-256), then for litecoin(scrypt) and that is pretty much it. High-performance computing hardware for other problems doesn't exist. (That is, with performance higher than current high-end GPUs)
  • MSalters
    MSalters almost 6 years
    @wilcroft: GPUs will beat FPGAs on performance/watt for their typical workloads. No wonder - GPUs are essentially ASICs, Application-Specific ICs where the application is graphics display. And as it happens, that involves a lot of matrix multiplications.
  • wilcroft
    wilcroft almost 6 years
    @MSalters citation? For General Purpose Compute (which was the question asked), all the research I've seen shows FPGAs have a significant edge on power. (Sources: ieeexplore.ieee.org/document/5325422, ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7032542, bertendsp.com/pdf/whitepaper/…, ann.ece.ufl.edu/courses/eel6686_15spr/papers/paper1a.pdf, etc.)
  • Jon K
    Jon K almost 6 years
    FPGAs are getting a lot more accessible to the common developer too - Microsoft for instance has a cloud AI solution using FPGAs (Project BrainWave). AWS has some offerings as well. Anyone can rent custom FPGAs for specialized tasks without having to build them themselves, which wasn't feasible for many use cases even a few years ago.
  • Michael
    Michael almost 6 years
    My understanding is that the major strength of an FPGA is the fact that it can be easily reprogrammed. As such, it doesn't have quite the speed of an ASIC (of which CPUs and GPUs are types), but it also has a much lower cost (due to the lack of tooling costs) for custom functions, which it can run faster than a pure software solution.
  • BobtheMagicMoose
    BobtheMagicMoose almost 6 years
    Yeah, I think there are even FPGA hobby kits that are comparable to an Arduino or Raspberry Pi. I still think programming FPGAs is far more costly than for more developed architectures.
  • Peter Cordes
    Peter Cordes almost 6 years
    Many of these extensions made sense at the time (like MMX), but are largely just dead weight in the processor now. 3D rendering is far from the only use case for SIMD. Most of the "weight" of MMX is the execution units, and those can be shared with wider vectors like SSE2, AVX2, and AVX-512. Those are heavily used for high-quality video encoding on CPUs and many, many other tasks, including high-performance computing, but also library implementations of memchr, strlen, and lots of other stuff, e.g. filtering an array more than one element at a time.