this post was submitted on 03 Nov 2024
273 points (98.9% liked)
Yes, there was the Xeon Phi, Knights Landing, with up to 72 cores, and 4 threads per core!
Knights Landing did go into production, though it was more a compute unit than a GPU.
I'm not aware that they tried to sell it as a GPU too, although if I recall correctly they made some real-time ray tracing demos.
So, trying not to dox myself, I worked with the architect twice.
Knights Ferry was derived directly from Larrabee (GPU): P54Cs with pre-AVX-512.
KNC was a die shrink with more cores. Both of these were PCIe accelerators only.
KNL had full Airmont Atom cores with SMT4, basically meaningful cores with proper AVX-512. You could also boot them directly into Linux, or use them as a PCIe accelerator.
KNM added ML instructions, basically 8/16-bit float and faster SIMD.
They cancelled KNH.
I interviewed some of the actual Larrabee guys, and they were wild. There was a lot of talk about dynamic translation, and they were trying to do really complex things. When people talk like that, though, it makes me think they were floundering on the software and just looking for tech-magic solutions to fundamental problems.
Intel always dies when the software gets more complex than really simple drivers; it's their Achilles' heel.
KNL also had the whole MCDRAM on package for basically HBM bandwidth, but that didn't actually work very well in practice, again due to software issues (you had to pick where you allocated, and using it as an L4 cache was not always effective).
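To make the "pick where you allocate" problem concrete, here is a rough sketch of the decision the programmer was left with in flat mode, where MCDRAM showed up as a separate, small memory pool next to DDR. This is only a toy model, not how any Intel tool worked: the `Buffer` class, the `place` function, the buffer names and the traffic numbers are all made up for illustration. The 16 GB capacity is the only real figure. On actual hardware the placement was done with things like `numactl` or memkind's `hbw_malloc`, not with Python.

```python
from dataclasses import dataclass

GIB = 2**30
MCDRAM_BYTES = 16 * GIB  # KNL shipped with up to 16 GB of on-package MCDRAM


@dataclass
class Buffer:
    name: str                    # hypothetical label for an application array
    size_bytes: int              # must be positive
    bytes_touched_per_step: int  # made-up estimate of memory traffic per step


def place(buffers, fast_capacity=MCDRAM_BYTES):
    """Greedy placement: buffers with the most traffic per byte get MCDRAM first.

    Anything that no longer fits in the remaining fast capacity falls back to DDR.
    """
    placement = {}
    free = fast_capacity
    ranked = sorted(
        buffers,
        key=lambda b: b.bytes_touched_per_step / b.size_bytes,
        reverse=True,
    )
    for b in ranked:
        if b.size_bytes <= free:
            placement[b.name] = "mcdram"
            free -= b.size_bytes
        else:
            placement[b.name] = "ddr"
    return placement


# Illustrative working set (all numbers invented): 59 GiB total, so it cannot all fit.
buffers = [
    Buffer("grid", 12 * GIB, 24 * GIB),
    Buffer("halo", 1 * GIB, 8 * GIB),
    Buffer("checkpoint", 40 * GIB, 1 * GIB),
    Buffer("lookup", 6 * GIB, 6 * GIB),
]

for name, tier in place(buffers).items():
    print(f"{name}: {tier}")
```

The point is that somebody has to know the traffic estimates up front and get the ranking right, per application. Cache mode was supposed to take that burden away by letting the hardware decide, which is where the "not always effective" part comes in.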