LocalLLaMA

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

Do I need industry-grade GPUs, or can I scrape by getting decent tokens/sec with a consumer-level GPU?

[–] tpWinthropeIII 4 points 1 week ago (3 children)

The new $3,000 NVIDIA Digits reportedly has 128 GB of fast RAM in an Apple-M4-like unified-memory configuration. NVIDIA claims it is twice as fast as an Apple stack, at least for inference. Four of these stacked can run a 405B model, again according to NVIDIA.

In my case I want the graphics power of a GPU, and its VRAM, for other purposes as well, so I'd rather buy a graphics card. But for a 90B model, I do wonder whether it is possible with two A6000s at 64 GB and a 3-bit quant.
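As a rough sanity check (my own back-of-the-envelope numbers, not NVIDIA's): the weight footprint is roughly parameter count × bits per weight ÷ 8, ignoring KV cache and runtime overhead. A quick Python sketch, using the 64 GB budget mentioned above:

    # Back-of-the-envelope weight-memory estimate; illustrative numbers only.
    def weight_gb(params_billion, bits_per_weight):
        # Size of the weights alone, in GB. Ignores KV cache, activations,
        # and framework overhead, which add several more GB in practice.
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    print(f"405B @ 8-bit: {weight_gb(405, 8):.0f} GB")  # ~405 GB -> four 128 GB boxes at this precision
    print(f" 90B @ 3-bit: {weight_gb(90, 3):.0f} GB")   # ~34 GB  -> inside a 64 GB budget

So on weights alone, a 3-bit 90B model should sit well inside 64 GB; it's the KV cache at long contexts that eats the remaining headroom.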

[–] [email protected] 1 points 1 week ago (2 children)

Huh, so basically sidestepping the GPU issue entirely and essentially just using some other special piece of silicon with fast (but conventional) RAM. I still don't understand why you can't distribute a large LLM over many different processors, each holding a section of the parameters in memory.

[–] breakingcups 2 points 1 week ago

I still don't understand why you can't distribute a large LLM over many different processors, each holding a section of the parameters in memory.

Because the output of every layer feeds the next one, and within a layer each weight touches the full activation vector, splitting a model across machines means exchanging activations for every single token. The bandwidth (and latency) requirements are enormous, and regular networking solutions are insufficient for that.
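To put rough numbers on that (my own illustrative assumptions, roughly a 70B-class shape, split with tensor parallelism): every generated token forces a couple of activation exchanges per layer between devices, which is trivial over NVLink but painful over ordinary Ethernet:

    # Rough sketch of per-token activation traffic under tensor parallelism.
    # Assumed model shape, for illustration only: 80 layers, hidden size 8192.
    layers = 80
    hidden = 8192
    bytes_per_act = 2      # fp16 activations
    syncs_per_layer = 2    # one all-reduce after attention, one after the MLP

    per_token = layers * syncs_per_layer * hidden * bytes_per_act
    print(f"activation traffic per token: {per_token / 1e6:.1f} MB")  # ~2.6 MB

    # Transfer time alone, ignoring per-hop latency:
    for name, bw in [("1 GbE", 125e6), ("NVLink", 900e9)]:  # bytes/sec
        print(f"{name}: {per_token / bw * 1e3:.3f} ms per token")
    # ~21 ms/token over gigabit Ethernet vs ~0.003 ms over NVLink -- and the
    # 160 round trips per token add network latency on top of that.

The bytes themselves are modest; it's paying the interconnect cost 160 times per token, every token, that makes ordinary networking a non-starter.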
