Monday, June 13, 2011

Particle systems optimizations

I've recently run into a few problems while integrating particle systems into several kinds of renderers, so here are a few notes and tricky solutions for the things I've met. (Consider this article an uncontrolled brain dump on particles. It grew out of discussions with other developers and is more or less just a bit of text about our thoughts on particle systems. By the way, all these optimisations are running in practice and they actually work, although it depends...)

Integrating particle systems raises the same problems in rasterizers and in ray tracers. The impact on performance is smaller in rasterizers than in ray tracers, though, because rasterizers are a better fit for particle systems.

As I said, the problems are similar. I'll try to show a few of them and explain how to solve or at least reduce them.

Fillrate problem

The biggest problem with particles is rendering them; a whole particle system can gain or lose a lot in this phase. Only one thing really matters: how many times each pixel gets redrawn before we arrive at its final color.

The worst particles are additive ones (alpha-blended ones can be optimised quite easily with an early out once the accumulated alpha reaches 1.0). Let's look at a simple fire like this:
Fire computed using ray tracing algorithm and additive "blending" of particles

Suppose we simply rendered these particles in additive blending mode (usually back-to-front, although with pure additive blending the order doesn't change the result, so front-to-back works too; but let's leave that, as we aren't thinking only about rasterizers), or used additive ray tracing (after hitting a particle, we spawn a new ray at the hit position in the same direction, trace it, and add its result to the value). Either way we end up with a very high fillrate: lots and lots of overdraw, and that's not good, because the result can be very slow. Let's see how much fillrate is used in different regions of the viewport:
Fillrate - the brighter, the higher

We can define a maximum number of particles that may be added together (e.g. 16 additive operations), which lowers the fillrate a bit. On rasterizers it can be applied with texkill or alpha test and gives some speedup, but there the technique can't be used with additive blending, only with alpha blending. With ray tracers the gain seems to be higher: one can stop tracing rays for a pixel even with additive blending.
Fillrate - defining max value of possible additions (to 16) 
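
To make the two early-out ideas a bit more concrete, here is a minimal sketch for the ray tracing case. It assumes the particle hits along one ray are already collected and sorted front to back; the function names, the `max_additions` parameter and the `opaque_threshold` value are made up for illustration and are not taken from any real renderer.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]


def shade_additive(hits: List[Color], max_additions: int = 16) -> Tuple[Color, int]:
    """Sum particle colors along one ray, giving up after max_additions hits."""
    r = g = b = 0.0
    used = 0
    for cr, cg, cb in hits:
        if used >= max_additions:
            break  # cap reached: stop tracing this pixel
        r, g, b = r + cr, g + cg, b + cb
        used += 1
    # Clamp to the displayable range (assumes an LDR target).
    return (min(r, 1.0), min(g, 1.0), min(b, 1.0)), used


def shade_alpha_front_to_back(
    hits: List[Tuple[Color, float]], opaque_threshold: float = 0.999
) -> Tuple[Color, float, int]:
    """Composite (color, alpha) hits front to back, with an early out."""
    r = g = b = 0.0
    alpha = 0.0
    used = 0
    for (cr, cg, cb), a in hits:
        weight = (1.0 - alpha) * a  # how much of this hit is still visible
        r, g, b = r + weight * cr, g + weight * cg, b + weight * cb
        alpha += weight
        used += 1
        if alpha >= opaque_threshold:
            break  # pixel is effectively opaque: later hits cannot show through
    return (r, g, b), alpha, used
```

With a hundred dim particles behind one pixel, the additive version touches only the first 16 of them, and the alpha version stops at the first fully opaque hit.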

What helps most with fillrate is so-called particle trimming. Instead of a quad we choose a shape that fits the particle much more tightly. We can even stay with four-vertex shapes, although it is much better to use, e.g., a seven-vertex shape. The extra vertex processing costs far less performance than heavy overdraw of the same pixels.
Particle trimming - green represents shape of particle 
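
Here is a rough sketch of how one could estimate what trimming saves for a given particle texture. It is only one possible approach, not the tool I used: it takes the convex hull of all non-transparent texels (a monotone chain hull) and compares its area with the full quad. Reducing the hull to a fixed vertex count such as 7 is not covered here. The alpha mask is assumed to be a plain list of rows, and all function names are made up.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def cross(o: Point, a: Point, b: Point) -> int:
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula."""
    total = 0
    for i, (x0, y0) in enumerate(poly):
        x1, y1 = poly[(i + 1) % len(poly)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def trim_particle(alpha: List[List[float]], threshold: float = 0.0):
    """Return (hull, hull_area, quad_area) for an alpha mask given as rows."""
    corners: List[Point] = []
    for y, row in enumerate(alpha):
        for x, a in enumerate(row):
            if a > threshold:
                # Use texel corners so the hull covers whole texels.
                corners += [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    hull = convex_hull(corners)
    quad_area = float(len(alpha) * len(alpha[0]))
    return hull, polygon_area(hull), quad_area
```

The ratio `hull_area / quad_area` is roughly the fraction of pixels that still get shaded; real particle textures with lots of empty corners come out much lower than the tiny mask used for testing.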

Particle trimming gives the biggest fps gain. We can gain even more by compositing several particles into a single one (LODing for particles).

Bandwidth problems


The second problem can even outgrow the fillrate problem. A huge CPU-generated particle system with 1M particles at 64 bytes per particle produces a 64 MiB stream every frame. At 30 frames per second that is 1920 MiB (almost 1.9 GiB) going to the card every second!

If we generate the geometry in a geometry shader at run time, we can achieve a 2/3 speedup on the bandwidth side. We can also compute the whole particle system on the GPU (this gives even better performance, provided the GPU isn't already too busy).

For ray tracing, real-time geometry generation (as in a geometry shader) makes sense too, although we won't save as much as with rasterizers and GPUs. We have to rebuild the whole scene hierarchy, and that takes time. If we don't generate particle geometry during rendering, we can die on CPU<->RAM latency; if we do, we die on the same thing, but we will probably make better use of the cache.

We can, however, accelerate the computation by parallelizing it across multiple cores or by using SIMD extensions, be it AVX or SSE.
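
The arithmetic behind these numbers is easy to check. The small sketch below assumes "1M" means 2^20 particles and a frame rate of 30 fps (the frame rate is my assumption here, it is the one that matches the per-second figure); the function name is made up.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def stream_bytes_per_second(particles: int, bytes_per_particle: int, fps: int) -> int:
    """Bytes uploaded per second when every particle is re-sent every frame."""
    return particles * bytes_per_particle * fps


per_frame = 2 ** 20 * 64                               # one frame's worth of data
per_second = stream_bytes_per_second(2 ** 20, 64, 30)  # assumed 30 fps

print(per_frame / MIB, "MiB per frame")
print(per_second / GIB, "GiB per second")
```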

State changes problem

State changes are a problem for both rasterizers and ray tracers. I strongly recommend using texture atlases, as one can gain a lot of speed with them (and you won't need to bind a texture for every particle system). On the other hand, there are situations where texture atlases are impossible to build or absolutely useless.
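
As a small illustration of the atlas idea: if all particle textures live in one regular grid, each particle system only needs a tile index, and the texture itself is bound once. The sketch assumes equally sized tiles laid out row by row and ignores padding between tiles; `atlas_uv_rect` is a made-up helper name.

```python
from typing import Tuple


def atlas_uv_rect(tile_index: int, columns: int, rows: int) -> Tuple[float, float, float, float]:
    """Return (u0, v0, u1, v1) of one tile in a regular grid atlas."""
    if not 0 <= tile_index < columns * rows:
        raise ValueError("tile_index outside of atlas")
    col = tile_index % columns    # tiles are numbered row by row
    row = tile_index // columns
    return (col / columns, row / rows, (col + 1) / columns, (row + 1) / rows)
```

For example, tile 5 in a 4x4 atlas sits in the second column of the second row, so `atlas_uv_rect(5, 4, 4)` covers the UV range from (0.25, 0.25) to (0.5, 0.5).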

Matter of time...

The last problem is when to render the particles. With ray tracers it is simple: insert them before the hierarchy update, be it a KD-tree or a BVH (or any other favourite of yours), and the renderer takes care of them. With rasterizers it is not that simple.

For forward rendering it would be best to render them front-to-back (together with the other models), but that's not possible; they have to be rendered after all models and mostly back-to-front. And that rewrites Z-buffer ... slow, slow ... :( There is probably no way around it for additive particles; for alpha-blended ones there is.
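
The back-to-front ordering itself is just a sort by distance along the view direction. A minimal sketch, assuming particles are plain position tuples and the camera is given by a position and a normalized forward vector (all names are made up):

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def view_depth(position: Vec3, camera_position: Vec3, camera_forward: Vec3) -> float:
    """Distance of a point along the camera's forward axis."""
    return sum((p - c) * f for p, c, f in zip(position, camera_position, camera_forward))


def sort_back_to_front(particles: List[Vec3], camera_position: Vec3, camera_forward: Vec3) -> List[Vec3]:
    """Return particles ordered from farthest to nearest."""
    return sorted(
        particles,
        key=lambda p: view_depth(p, camera_position, camera_forward),
        reverse=True,
    )
```

In practice one would sort indices or whole particle systems rather than copying positions around, but the ordering criterion stays the same.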

For deferred rendering it is best to use a separate pass for particles after the G-Buffer has been generated, and to mix them into the buffer after the light computation (lighting phase). That way we give up all the deferred advantages for particles, but at least we get them on screen. A second possibility is an MSAA G-Buffer with stippling. Another solution is depth peeling, but then we're getting too far from real-time performance.

The most common approach is a separate pass, possibly even half-sized. This is used mainly on consoles, as fillrate is a bigger problem there than on PCs.

So that's all?

Not at all! But this is just a short article on a few of the problems. If you wish to implement some of these things, here is a summary of the tips:
- use particle trimming, it is simple and you will gain a lot
- use geometry shaders on newer cards, R2VB on older ones if possible
- use texture atlases if possible and if suitable for the renderer
- use fewer, larger particles rather than more, smaller ones if you're limited by particle computation; NEVER do this if you're limited by fillrate
- don't compute particle systems that you can't see (some variation of frustum or occlusion culling is good); just keep the time variable running for them so they won't "jump"
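
The last tip can be sketched like this. It assumes the particle motion can be evaluated directly from the system's time (here a single particle looping on a simple ballistic curve), so a culled system only has to keep its clock running. The class and its parameters are made up for illustration.

```python
from typing import Optional


class ParticleSystem:
    """Toy system whose state is a pure function of its own clock."""

    def __init__(self, lifetime: float = 2.0, speed: float = 1.5, gravity: float = -9.81):
        self.time = 0.0
        self.lifetime = lifetime
        self.speed = speed
        self.gravity = gravity
        self.evaluations = 0  # how many times the real work was done

    def tick(self, dt: float, visible: bool) -> Optional[float]:
        self.time += dt  # the clock advances even when the system is culled
        if not visible:
            return None  # skip all particle computation
        self.evaluations += 1
        age = self.time % self.lifetime  # the particle respawns every `lifetime`
        return self.speed * age + 0.5 * self.gravity * age * age
```

A system that was culled for a few frames picks up exactly where an always-visible one would be, because both share the same clock; only the skipped frames cost nothing.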

Well, those were a few points on particles. If you wish to implement them, feel free to do so. You'll probably gain one or two fps on most GPUs, and that counts!

Note: I hope you liked the article, feel free to comment. It had been lying on my disk for a while, so I released it to the public (there is also a Czech version on www.ceske-hry.cz).
