

Programming Lemmy instance focused on GPUs. CUDA, OpenCL, ROCm, DirectX, and Vulkan are all on topic here.

founded 2 years ago

Currently getting the Hacker News hug of death, but hopefully the traffic subsides in a few days. From what I could load, it looks like a good article. does have a mirror.

Learning DirectX 12 (
submitted 9 months ago by dragontamer to c/gpu_programming

Sorry for the spam. I just did a bit of research and decided I needed to save at least these DirectX 12 links so I can read up on them later.


Just searching for stuff, and this came up. I figure it was worth saving here.

HLSL Shader Model 6.8 (
submitted 9 months ago by dragontamer to c/gpu_programming

Looks like the update's big feature is "Work Graphs".


Someone wanted a more portable printf implementation, so they created one for themselves.

I'll be giving this article a good look over for sure.


Saving this .pdf here.

The relational join operator is a very memory-intensive and even computationally-intensive operation. Though real-life databases can be in the TB range, there are a number of applications of smaller, memory-only databases that could feasibly fit in the 4GB or 8GB of smaller GPUs.

It's a well-known fact that relational joins (and joins-of-joins) can be parallelized. Database programmers meticulously run planning algorithms to optimize this important operation and parallelize it across cores or even systems. Seeing research into a natural GPU application warms my heart at least!

GPUs are well known to parallelize and improve upon sorting algorithms (see embarrassingly parallel solutions like Bitonic Sort... but also GPU-specific / SIMD-designed sorting algorithms like MergePath). One of the most common ways to perform a relational join is to sort both sets of data on the join key, and then linearly scan through both relations, matching up rows where (left.blah == right.blah). This paper seems to take this approach and measures how good GPUs are at it. (At least, for data that does fit in the GPU RAM).
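
For anyone who hasn't seen it written out, the sort-then-scan join described above looks roughly like this on a plain CPU. This is just a minimal single-threaded sketch, not the paper's GPU implementation; the function name `sort_merge_join` and the choice of the first column as the join key are my own assumptions for illustration.

```python
def sort_merge_join(left, right, key=lambda row: row[0]):
    """Inner equi-join of two lists of rows: sort both, then scan linearly.

    Hypothetical helper for illustration; `key` extracts the join column.
    """
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Emit every pairing within the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append((l_row, r_row))
            i, j = i_end, j_end
    return out


left = [(2, "b"), (1, "a"), (2, "c")]
right = [(2, "x"), (3, "y"), (1, "z")]
joined = sort_merge_join(left, right)
```

The two sorts are the part a GPU would be expected to speed up; the merge scan afterwards is linear in the input plus output size.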

There's also "Hash-Join", which is investigated in this paper as well.
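
For comparison, a hash join in its simplest textbook form: build a hash table on one relation, then probe it with the other. Again this is only a CPU sketch under my own assumptions (the name `hash_join`, building on the left side, first column as the key), not what the paper does on the GPU.

```python
from collections import defaultdict


def hash_join(left, right, key=lambda row: row[0]):
    """Inner equi-join: build a hash table on `left`, probe with `right`.

    Hypothetical helper for illustration only.
    """
    # Build phase: group left rows by join key.
    table = defaultdict(list)
    for l_row in left:
        table[key(l_row)].append(l_row)

    # Probe phase: look up each right row's key in the table.
    out = []
    for r_row in right:
        for l_row in table.get(key(r_row), []):
            out.append((l_row, r_row))
    return out


left = [(2, "b"), (1, "a"), (2, "c")]
right = [(2, "x"), (3, "y"), (1, "z")]
hash_joined = hash_join(left, right)
```

No sorting is needed here, but the build and probe phases do a lot of random memory access, which is presumably why it is worth measuring separately.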


The abstract stuck out to me, and I like dabbling in the 3SAT stuff on a hobby level.

The gist is that these researchers have utilized the TensorCores / FP16 Matrix Multiplication routines found in neural-network chips/instructions to start searching for MaxSAT (which seems to be related to 3SAT somehow, I'll be reading more about this...)

The Book of Shaders (
submitted 1 year ago by dragontamer to c/gpu_programming

Found this reference online, figured I'd save it here. Looks like an excellent introduction to fragment/pixel shaders.


Seems like a personal project that is basically a personal version of the Intel "ispc" tool. Still, a 2nd or 3rd programming language of this nature isn't a bad thing. If anything, we need more ideas and more implementations to figure out how best to map GPU-like programming to AVX512 or other CPU SIMD instruction sets.


A solid example of how to perform performance analysis on modern GPUs and video games.

DirectX programmers (and GPU programmers of all kinds) probably should use this article as a template for thinking about GPU performance in relation to a larger task.

You search to find the "slowest shader", then analyze it and its parallelism to see if it's adequate. You home in and perform deeper analysis. I guess a programmer goes one step further and needs to think of a solution / improvement though (rather than "just" benchmarking).

But this style of analysis is very useful and helpful in the optimization process.

WebGL shader examples (
submitted 1 year ago by dragontamer to c/gpu_programming

StreamHPC here goes over some concepts to solve N-Queens on a GPU.

N-Queens is a classic "homework problem" for traditional AI courses (back when search algorithms were considered AI at least). As usual, the GPU version needs a couple of changes if you want it to run as fast as possible.
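
As a reference point, here is the classic CPU backtracking version of the problem, counting solutions with bitmasks. To be clear, this is not StreamHPC's GPU approach, just a minimal sketch of the "homework" baseline; the function name `count_n_queens` is my own.

```python
def count_n_queens(n):
    """Count N-Queens solutions with bitmask backtracking (CPU baseline)."""
    full = (1 << n) - 1  # one bit per column, all columns occupied

    def place(cols, diag1, diag2):
        # cols: columns already used; diag1/diag2: squares attacked
        # along the two diagonal directions in the current row.
        if cols == full:
            return 1
        count = 0
        free = full & ~(cols | diag1 | diag2)
        while free:
            bit = free & -free   # lowest free square in this row
            free -= bit
            count += place(
                cols | bit,
                ((diag1 | bit) << 1) & full,  # shift diagonals for next row
                (diag2 | bit) >> 1,
            )
        return count

    return place(0, 0, 0)


solutions_8 = count_n_queens(8)
```

Each recursive call handles one row, so the search tree splits naturally into independent subtrees, which is the sort of structure a parallel version can hand out to different threads.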
