LocalLLaMA

A community for discussing LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago

How do I get LLaMA going on a GPU? (sh.itjust.works)

submitted 1 year ago by [email protected] to c/[email protected]

Everyone is so thrilled with llama.cpp, but I want to do GPU-accelerated text generation and interactive writing. What's the state of the art here? Will KoboldAI now download LLaMA for me?

[–] [email protected] 2 points 1 year ago

There's a bit more setup involved, but I would look into https://github.com/oobabooga/text-generation-webui

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

Hi, I'm happy to see you are willing to give LLaMA a try! What you can do for GPU-accelerated processing depends on your OS and hardware. If you have an NVIDIA card, you can use cuBLAS; instructions are here: https://github.com/ggerganov/llama.cpp#cublas . I don't have experience with other cards, but I'll try to help if issues arise!

Also, for more ease of use, try text-generation-webui (https://github.com/oobabooga/text-generation-webui). Well, ease of use until you want GPU acceleration, because then you'll need to look at https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration if you want to do that with LLaMA.

33B and 65B models seem to be the best for storytelling and writing.
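
To make the cuBLAS route a bit more concrete, here is a rough sketch of how a GPU-offloaded llama.cpp run could be assembled from Python. It assumes you have already built llama.cpp with cuBLAS enabled by following the README linked above, and that the build produced a `./main` binary accepting `-m`, `-p`, `-n`, and `-ngl` (number of layers to offload to the GPU). Those names match the README at the time of this thread but may differ in newer versions, so check `--help` on your own build. The helper `build_llama_cpp_command`, the model path, and the layer count are all made up for illustration.

```python
import shlex


def build_llama_cpp_command(model_path, prompt, n_gpu_layers,
                            binary="./main", n_predict=256):
    """Build the argument list for a llama.cpp run with GPU offloading.

    Hypothetical helper: the binary name and flags are assumptions based on
    the cuBLAS-era llama.cpp README, not a guaranteed interface.
    """
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must be non-negative")
    return [
        binary,
        "-m", model_path,           # path to the model file
        "-p", prompt,               # prompt to start generating from
        "-n", str(n_predict),       # number of tokens to generate
        "-ngl", str(n_gpu_layers),  # layers offloaded to the GPU
    ]


if __name__ == "__main__":
    # Placeholder path and layer count; pick values that fit your VRAM.
    cmd = build_llama_cpp_command("models/your-model.bin",
                                  "Once upon a time", n_gpu_layers=40)
    # Print a shell-safe command line you can copy into a terminal.
    print(shlex.join(cmd))
```

If the model does not fit in VRAM, lowering the `n_gpu_layers` value keeps the remaining layers on the CPU, which is slower but should still run.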