this post was submitted on 15 Jun 2023

GPU_programming

Programming Lemmy instance focused on GPUs. CUDA, OpenCL, ROCm, DirectX, Vulkan are all on subject here.

A more recent GPU paper that deep-dives into NVidia's Volta architecture through microbenchmarks. It digs into what is normally a black box, reverse-engineering the SASS assembly that the machine itself executes (the layer below PTX, the assembly most CUDA programmers would be familiar with).
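
For anyone who has not written a microbenchmark before, here is a minimal sketch of the general idea, assuming a pointer-chasing latency test. This is a common technique and not necessarily the paper's own code. The kernel name `chase`, the array size, the stride and the iteration count are all arbitrary choices for illustration, and which level of the memory hierarchy you end up measuring depends on them.

```cuda
// Build for Volta:        nvcc -arch=sm_70 -o chase chase.cu
// View the PTX layer:     nvcc -arch=sm_70 -ptx chase.cu
// View the SASS layer:    cuobjdump -sass chase
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each load depends on the result of the previous one, so the loads cannot
// overlap. Elapsed cycles / number of loads approximates the load latency.
__global__ void chase(const unsigned *next, int iters,
                      unsigned *sink, long long *cycles)
{
    unsigned idx = 0;
    long long start = clock64();
    for (int i = 0; i < iters; ++i) {
        idx = next[idx];          // dependent load
    }
    long long stop = clock64();
    *sink = idx;                  // keep the loop from being optimized away
    *cycles = stop - start;
}

int main()
{
    const int n = 1 << 20;        // number of elements (illustrative)
    const int stride = 32;        // 32 * 4 bytes = 128 bytes per hop (illustrative)
    const int iters = 4096;       // number of dependent loads (illustrative)

    // Build the chain on the host: element i points to element i + stride.
    std::vector<unsigned> host(n);
    for (int i = 0; i < n; ++i) {
        host[i] = static_cast<unsigned>((i + stride) % n);
    }

    unsigned *d_next = nullptr, *d_sink = nullptr;
    long long *d_cycles = nullptr;
    cudaMalloc(&d_next, n * sizeof(unsigned));
    cudaMalloc(&d_sink, sizeof(unsigned));
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMemcpy(d_next, host.data(), n * sizeof(unsigned),
               cudaMemcpyHostToDevice);

    // A single thread, so nothing else competes for the memory pipeline.
    chase<<<1, 1>>>(d_next, iters, d_sink, d_cycles);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    long long cycles = 0;
    cudaMemcpy(&cycles, d_cycles, sizeof(cycles), cudaMemcpyDeviceToHost);
    std::printf("~%.1f cycles per dependent load\n",
                static_cast<double>(cycles) / iters);

    cudaFree(d_next);
    cudaFree(d_sink);
    cudaFree(d_cycles);
    return 0;
}
```

The measured figure includes loop and address-arithmetic overhead, so treat it as a rough number. Dumping the SASS with `cuobjdump -sass` shows what the loop actually compiled to, which is the layer the paper works at.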

It is clear that the NVidia Volta GPU architecture relies on dependency barriers that are worked out by the PTX->SASS compiler, and after reading I feel like I have a sense of what Volta's instruction scheduler (in the chip itself) is doing.

Though this is an advanced read, it is one of the few online resources I'm aware of that goes into the nitty-gritty of how NVidia's popular GPUs actually work and execute code. Maybe not a "must read" for most GPU programmers, but a useful one if you're trying to squeeze every ounce of performance out of an NVidia GPU.

qwerty 2 points 1 year ago

This (along with the subsequent Turing paper) is an excellent resource!

Regarding reverse-engineering of SASS, a (possibly?) complete instruction "reference" (from Pascal to Hopper) can be found using methods here. I put "reference" in quotes because there is enough info to disassemble/assemble the code section in cubin binaries, since we have a detailed description of instruction encoding, operand types, modifiers, etc. However, instruction behavior, miscellaneous requirements, side effects, etc. are not documented.

AlmightySnoo 2 points 1 year ago

> subsequent Turing paper

linking it here for anyone interested: https://arxiv.org/pdf/1903.07486.pdf