this post was submitted on 30 Sep 2023

So maybe you've seen something like this:

Well, here are some screenshots from a YT video I found interesting on the subject (video linked at the end). In a nutshell, AVX-512 is a family of x86 SIMD instruction set extensions that operate on 512-bit wide vector registers. This rabbit hole may help you understand why CPUs are so much slower than GPUs for this kind of workload, or, if you already know the basics of computer architecture, it will show where the real bottleneck is located.

[5 screenshots from the video]

https://www.youtube.com/watch?v=bskEGP0r3hE
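To put a number on "512 bits wide", here is a minimal sketch of how many values fit in a single vector register at each width. It only divides register width by element width; it does not claim that every chip supports every data type at every width, and the function and table names are made up for illustration.

```python
# Rough sketch: how many values fit in one SIMD register of a given width.
# The names below are illustrative, not taken from any library.
REGISTER_BITS = {"SSE": 128, "AVX2": 256, "AVX-512": 512}
ELEMENT_BITS = {"fp32": 32, "fp16": 16, "int8": 8}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one register (capacity only)."""
    return register_bits // element_bits


def lane_table() -> dict:
    """Lanes per register for each (instruction set, data type) pair."""
    return {
        isa: {dtype: lanes(rb, eb) for dtype, eb in ELEMENT_BITS.items()}
        for isa, rb in REGISTER_BITS.items()
    }


if __name__ == "__main__":
    for isa, row in lane_table().items():
        print(isa, row)
```

So one 512-bit register holds 16 fp32 values or 64 int8 values, which is part of why quantized models pair well with wide vector instructions.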

I'm really curious how much the various AVX-512 implementations improve CPU performance. Obviously someone felt it was important enough to implement a lot of these instructions in llama.cpp. In my experience there is no replacement for running a large model. I use a Llama 2 70B daily. I can't add more system memory beyond the 64 GB I have. I need to look into using a swap partition for even larger models, but I haven't tried this yet.

Looking at accessible hardware to use with AI, the lowest-cost path to even larger models seems to be a second-hand server/workstation with 256-512 GB of system memory, as many cores as possible, and the best AVX-512 implementation available at a good price, plus a consumer GPU with 24 GB of VRAM. That could still be less than $3K, and on paper it might run a 180B model while still being cheaper than a single enterprise GPU with 48 GB of VRAM. Maybe someone here has actual experience with this and how various chipsets handle the load in practice. It is just a curiosity I've been thinking about.
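For the "on paper" part, here is a back-of-the-envelope sketch of weight size versus available memory. It assumes a nominal number of bits per weight (real quantization formats usually carry some extra per-block overhead) and ignores the KV cache; the 8 GB headroom figure and the function names are arbitrary placeholders, not measurements.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes).

    Ignores KV cache, activations, and quantization block overhead.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         ram_gb: float, headroom_gb: float = 8.0) -> bool:
    """True if the weights plus an assumed headroom fit in ram_gb.

    headroom_gb is a guess for OS and runtime overhead, not a measured value.
    """
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    for params, bits, ram in [(70, 4, 64), (180, 4, 256), (180, 8, 256), (180, 16, 256)]:
        size = weight_gb(params, bits)
        print(f"{params}B @ {bits}-bit: ~{size:.0f} GB, fits in {ram} GB: {fits(params, bits, ram)}")
```

By this estimate a 180B model fits in 256 GB at 4-bit or 8-bit but not at FP16, which would need roughly 360 GB for the weights alone.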

top 2 comments
Blaed 3 points 1 year ago* (last edited 1 year ago)

I have come to believe Moore's law is finite, and that we're starting to see the end of its exponential run. This leads me to believe (or want to believe) there are other looming breakthroughs for compute, optimization, and/or hardware on the horizon. That, or crazy powerful GPUs are about to be a common household investment.

I keep thinking about what George Hotz is doing with regard to this. He explained on his podcast with Lex Fridman that there is much to be explored in optimization, both with quantization of software and acceleration of hardware.

His idea of 'commoditize the petabyte' is really cool. I think it's worth bringing up here, especially since one of his biggest goals right now appears to be solving the at-home compute problem, in a way that would let you actually run something like a 180B model in-house, no problem.

George Hotz' tinybox

($15,000)

  • 738 FP16 TFLOPS
  • 144 GB GPU RAM
  • 5.76 TB/s RAM bandwidth
  • 30 GB/s model load bandwidth (big llama loads in around 4 seconds)
  • AMD EPYC CPU
  • 1600W (one 120V outlet)
  • Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
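The listed numbers can be sanity-checked against each other with some quick arithmetic. This is only a sketch: it assumes FP16 means 2 bytes per parameter, that generating one token reads every weight once, and that the aggregate RAM bandwidth is fully usable, so the tokens-per-second figure is a theoretical ceiling rather than an expected result.

```python
# Figures from the spec list above (decimal GB).
PARAMS_BILLION = 65          # 65B LLaMA
BYTES_PER_PARAM = 2          # FP16
GPU_RAM_GB = 144
RAM_BANDWIDTH_GB_S = 5760    # 5.76 TB/s
LOAD_BANDWIDTH_GB_S = 30

# Weight size: billions of parameters times bytes per parameter gives GB.
weights_gb = PARAMS_BILLION * BYTES_PER_PARAM

# Do the weights fit in GPU RAM at all?
fits_in_gpu_ram = weights_gb <= GPU_RAM_GB

# Time to stream the weights in at the stated load bandwidth.
load_seconds = weights_gb / LOAD_BANDWIDTH_GB_S

# Upper bound on single-stream decoding speed if it is purely
# bandwidth-bound (assumption: each token reads all weights once).
max_tokens_per_s = RAM_BANDWIDTH_GB_S / weights_gb

if __name__ == "__main__":
    print(f"weights: {weights_gb} GB, fits: {fits_in_gpu_ram}")
    print(f"load time: {load_seconds:.1f} s")
    print(f"bandwidth-bound ceiling: {max_tokens_per_s:.0f} tokens/s")
```

The roughly 130 GB of weights fits in 144 GB, and loading them at 30 GB/s takes a bit over 4 seconds, which lines up with the "around 4 seconds" claim.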

You can pre-order one now. You have $15k lying around, right? Lol.

It's definitely not easy (or cheap) now, but I think it's going to get significantly easier to build and deploy large models for all kinds of personal use cases in the near and distant future.

If you're serving/hosting models, it's also worth checking out vLLM if you haven't already: https://github.com/vllm-project/vllm

j4k3 3 points 1 year ago

Hardware moves notoriously slowly, so I imagine we still have several years before a good solution manifests in the market.

Somebody needs to build a good Asimov character roleplay and coax the secret for the positronic brain out of him. I'd like to buy the new AMD R-Daneel Olivaw 5000 please. Hell, I'll settle for an RB-34 Herbie model right now.