What are the advantages of using indirect rendering in OpenGL?

opengl opengl-es shader

13,695

Solution 1

The performance gain often comes not so much from avoiding the transfer of some small variable like "count" or "instance count", but from not having to know these values on the CPU. To know them, you must do a round trip to the CPU, which is only possible after the result is available, i.e. after a server sync (and it adds the latency of the bus).

Say you are using transform feedback with a geometry shader. No matter what you feed in, you don't really know what comes out the other end, at least not before the batch has finished and you've queried the counts.
Indirect rendering addresses this: you don't need to know, and actually you don't want to know. The information goes into a buffer object, and the GPU can access it without your intervention.
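
To make "the information goes into a buffer object" concrete, here is a minimal sketch in C of the record that glDrawArraysIndirect reads from the buffer bound to GL_DRAW_INDIRECT_BUFFER. The struct layout follows the OpenGL specification; the function gpu_side_write is a hypothetical CPU stand-in for whatever GPU pass would fill in the values (for example a compute shader writing to the buffer), so treat it as an illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Record read by glDrawArraysIndirect from the buffer bound to
   GL_DRAW_INDIRECT_BUFFER: four tightly packed 32-bit unsigned integers. */
typedef struct {
    uint32_t count;         /* number of vertices to draw */
    uint32_t instanceCount; /* number of instances to draw */
    uint32_t first;         /* index of the first vertex */
    uint32_t baseInstance;  /* first instance; must be 0 before GL 4.2 */
} DrawArraysIndirectCommand;

static_assert(sizeof(DrawArraysIndirectCommand) == 4 * sizeof(uint32_t),
              "indirect command must be tightly packed");

/* Hypothetical stand-in for the GPU pass that decides how much to draw.
   On real hardware this write happens on the GPU, and the CPU never has
   to read the value back before issuing the draw. */
void gpu_side_write(DrawArraysIndirectCommand *cmd, uint32_t produced_vertices)
{
    cmd->count = produced_vertices;
    cmd->instanceCount = 1;
    cmd->first = 0;
    cmd->baseInstance = 0;
}
```

With such a record in the indirect buffer, the CPU side would only issue glDrawArraysIndirect(GL_TRIANGLES, 0), where the second argument is a byte offset into the bound buffer rather than a vertex count.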

That's analogous to conditional rendering. Actually, you could skip conditional rendering entirely, couldn't you? Instead of submitting commands to the command queue that will maybe not get executed (how inefficient!), you could run your occlusion query, see whether it passes or not, and then decide whether to submit those objects that you want to draw.
Except this means you must wait until the query (and thus the previous batch) is finished, sync, and do a PCIe transfer before making this decision. During this time, the GPU likely stalls, and then you've still not set up the right buffers/textures and submitted commands. In reality, it is therefore much more efficient to speculatively submit commands and let the driver/GPU decide whether to discard them or whether to draw them.

That's also the idea behind ARB_query_buffer_object, which lets you read a query result into a buffer object.

EDIT:
Also, indirect rendering allows much more efficient submission of batches of render commands (especially in combination with persistent mappings). This may avoid much or all of the server/client and CPU/GPU synchronization normally present, the commands may be written by another processor core, and it saves the fixed per-draw-call overhead. See pages 62 onward in Cass Everitt's talk.
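
As a rough illustration of such a batch, the following C sketch fills an array of records in the layout that glDrawElementsIndirect and glMultiDrawElementsIndirect read. The struct layout follows the OpenGL specification; MeshRange and build_batch are hypothetical names, and the sketch assumes all meshes sit back to back in one shared vertex buffer and one shared index buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record read by glDrawElementsIndirect / glMultiDrawElementsIndirect:
   five tightly packed 32-bit values. */
typedef struct {
    uint32_t count;         /* number of indices to draw */
    uint32_t instanceCount; /* number of instances to draw */
    uint32_t firstIndex;    /* offset into the index buffer, in indices */
    int32_t  baseVertex;    /* value added to every index */
    uint32_t baseInstance;  /* first instance */
} DrawElementsIndirectCommand;

/* Hypothetical description of one mesh inside the shared buffers. */
typedef struct {
    uint32_t index_count;
    uint32_t vertex_count;
} MeshRange;

/* Write one command per mesh into `out`, assuming the meshes are stored
   back to back. Returns the number of commands written. */
size_t build_batch(const MeshRange *meshes, size_t n,
                   DrawElementsIndirectCommand *out)
{
    uint32_t first_index = 0;
    int32_t base_vertex = 0;
    for (size_t i = 0; i < n; ++i) {
        out[i].count = meshes[i].index_count;
        out[i].instanceCount = 1;
        out[i].firstIndex = first_index;
        out[i].baseVertex = base_vertex;
        out[i].baseInstance = 0;
        /* The next mesh starts right after this one in both buffers. */
        first_index += meshes[i].index_count;
        base_vertex += (int32_t)meshes[i].vertex_count;
    }
    return n;
}
```

The resulting array would be placed in the buffer bound to GL_DRAW_INDIRECT_BUFFER (possibly written straight into a persistently mapped range), after which a single glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, n, 0) call issues all n draws; a stride of 0 means the records are tightly packed. This call requires OpenGL 4.3 or ARB_multi_draw_indirect.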

Solution 2

In direct rendering the CPU is occupied with preparing and streaming the index data out of its own memory, over a bus with limited bandwidth to the GPU. It must check for GPU state and synchronize with it. Each of those steps is time consuming.

With indirect rendering, all the CPU does is send one short command that kicks off a large batch of drawing operations. This saves bus bandwidth. And because the GPU stays busy for a longer time span, there are fewer interruptions that force the CPU to stop whatever it is doing right now (a context switch), which means that complex numeric tasks such as physics simulations run faster.

viktorzeid

Updated on October 15, 2022

Comments

viktorzeid over 1 year

I read that APIs like glDrawElementsIndirect and glDrawArraysIndirect help us with indirect rendering. Indirect rendering differs from direct rendering in that the rendering parameters, such as "number of vertex attributes", "number of instances to draw", "starting vertex attribute from buffer object" etc., are provided in a buffer object by the GPU itself rather than by the CPU in the draw call.

I understood that. It also explained that the advantage is that it gets rendered faster because there is no CPU interaction involved. But wait, wasn't it the CPU that actually made the render call? It still specified the rendering mode (GL_TRIANGLES etc). It also possibly loaded the vertex attributes.

So is all the performance gain of indirect rendering accounted for by just not having to pass these tiny variables: "count", "primitive count", "first vertex attribute", "instance count"? This doesn't make much sense to me. (It does not change any state either.)

Bartek Banachewicz over 10 years

My guess would be that reducing the number of CPU-GPU sync points is much more important than command transfer speed.

datenwolf over 10 years

@BartekBanachewicz: That's indeed the case. Each sync point causes an interrupt, which forces the CPU into an (expensive) context switch.

rickster over 10 years

Good answer. Adding to it to reflect the [opengl-es] tag on the question: there's no draw indirect in OpenGL ES up through 3.0, so for now at least you can only do this on the desktop. (There are no geometry shaders on ES either, so at least you know how many vertices come out of your transform feedback.)