this post was submitted on 10 Jul 2023
Programming
If you want your code to run on the GPU, the viability of your code depends on it. But if you just want to run it on the CPU, it is only one of many micro-optimization techniques you can use to shave a few nanoseconds off an inner loop.
The thing to keep in mind is that there is no such thing as "average developer". Computing is way too diverse for it.
And the branchless version may end up being slower on the CPU, because the compiler does a better job optimizing the branching version.
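To make the comparison concrete, here is a minimal sketch of the same function written both ways. The names `max_branching` and `max_branchless` are made up for illustration, and the mask trick assumes two's-complement integers.

```c
#include <stdint.h>

/* Branching version: the obvious way to write it. */
int32_t max_branching(int32_t a, int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Branchless version: turn the comparison into an all-ones or all-zeros
   mask and use it to select one operand without a jump.
   Assumes two's-complement representation, so -1 is all ones. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a > b);   /* -1 if a > b, otherwise 0 */
    return (a & mask) | (b & ~mask);
}
```

Optimizing compilers will often turn the branching version into a conditional move on their own, so the hand-written mask version is not guaranteed to win. Measure before keeping it.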
Because of the performance improvements from vectorization, and the fact that GPUs are particularly well suited to that? Or are GPUs particularly bad at branches?
How often do a few nanoseconds in the inner loop matter?
Looking at all the software out there, the vast majority of it is games, apps, and websites. Applications where performance is critical, such as control systems, operating systems, databases, numerical analysis, etc, are relatively rare compared to apps/etc. So statistically speaking the majority of developers must be working on the latter (which is what I mean by an "average developer"). In my experience working on apps there are exceedingly few times where micro-optimizations matter (as in things like assembly and/or branchless programming as opposed to macro-optimizations such as avoiding unnecessary looping/nesting/etc).
Edit: I can imagine it might matter a lot more for games, such as in shaders or physics calculations. I've never worked on a game so my knowledge of that kind of work is rather lacking.
Yes. GPUs don't have per-core branching; they have dozens of cores running the same instructions. So if some cores should run the if branch and some should run the else branch, all cores in the group will execute both branches and mask out the one they shouldn't have run. I also think they don't have the advanced branch prediction CPUs have.
https://en.wikipedia.org/wiki/Single_instruction,_multiple_threads
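A rough way to picture the masking described above is to simulate one group of lanes on the CPU. This is only a toy model in plain C, not GPU code: `LANES`, `run_group`, and the if/else bodies are all invented for the example.

```c
#include <stddef.h>

#define LANES 8   /* hypothetical group size, chosen small for the example */

/* Toy model of one group of GPU lanes running
       if (x[i] < 0) y[i] = -x[i]; else y[i] = 2 * x[i];
   in lockstep. Returns how many branch bodies the whole group
   had to step through (1 if all lanes agree, 2 if they diverge). */
int run_group(const int x[LANES], int y[LANES]) {
    int take_if[LANES];
    int any_if = 0, any_else = 0;
    int passes = 0;

    /* Every lane evaluates the condition and records it in a mask. */
    for (size_t i = 0; i < LANES; i++) {
        take_if[i] = x[i] < 0;
        any_if |= take_if[i];
        any_else |= !take_if[i];
    }

    /* Pass 1: the "if" body is issued to the group; masked-off lanes idle. */
    if (any_if) {
        for (size_t i = 0; i < LANES; i++)
            if (take_if[i]) y[i] = -x[i];
        passes++;
    }

    /* Pass 2: the "else" body is issued; the other lanes idle this time. */
    if (any_else) {
        for (size_t i = 0; i < LANES; i++)
            if (!take_if[i]) y[i] = 2 * x[i];
        passes++;
    }
    return passes;
}
```

If every lane takes the same side, the group only pays for one body. As soon as a single lane disagrees, the group steps through both, which is the cost the comment above is describing.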
Makes sense. The most programming I've ever done for a GPU was a few simple shaders for a toy project.
It doesn't matter until you need it. And when you need it, it's the difference between life and death.
Fintech. Stock exchanges will go to extreme lengths to appease their wolves of Wall Street.
Also if you branch on a GPU, the compiler has to reserve enough registers to walk through both branches (handwavey), which means lower occupancy.
Often you have no choice, or removing the branch leaves you with just as much code, so it's irrelevant. But sometimes it matters. If you know that a particular draw call will always use one side of the branch but not the other, a typical optimization is to compile a separate version of the shader that removes the unused branch and saves on registers.
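The "separate version of the shader" idea can be sketched in plain C as an analogy. Real shader pipelines usually do this with preprocessor defines or similar compile-time switches, and every name here (`shade_generic`, `use_fog`, the fog formula) is hypothetical.

```c
/* Generic routine with a per-draw-call flag. Compiled as-is, both sides
   of the branch have to be kept around. */
static inline float shade_generic(float base, float fog, int use_fog) {
    if (use_fog) {
        return base * (1.0f - fog) + fog;   /* blend toward the fog value */
    }
    return base;
}

/* Specialized variants: the flag is a compile-time constant in each one,
   so an optimizing compiler can drop the side that is never used. */
float shade_with_fog(float base, float fog) {
    return shade_generic(base, fog, 1);
}

float shade_no_fog(float base, float fog) {
    return shade_generic(base, fog, 0);
}
```

The caller then picks `shade_with_fog` or `shade_no_fog` once per draw call instead of testing the flag for every pixel.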
Dunno how true that is, but I've heard that branches cannot be parallelized. While one side of the branch is being done, the cores which haven't branched just remain idle until it completes.
Yes, GPUs are bad at branching. But my ray tracer, which is about 90% branches, still runs faster on the GPU than on the CPU.
In general you are still correct.