this post was submitted on 23 Jul 2023

Code Monkeys

A place for people to post code and other techy stuff


Some helpful links on training your own LLM, since I've been playing with it:

https://github.com/geekylink/PicoGPT

Hacker News post: https://news.ycombinator.com/item?id=36832572

llama.cpp (LLaMA inference in C/C++): https://github.com/ggerganov/llama.cpp

Dead simple LLaMA (Dalai): https://cocktailpeanut.github.io/dalai/#/

[–] [email protected] 2 points 1 year ago

Training data:

OpenWebText2 is an enhanced version of the original OpenWebTextCorpus covering all Reddit submissions from 2005 up until April 2020, with further months becoming available after the corresponding PushShift dump files are released.

https://openwebtext2.readthedocs.io/en/latest/
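
If you want to poke at the data before training on it, here's a rough sketch of streaming documents out of a JSON Lines file. It assumes you've already decompressed the dump to plain JSON Lines and that each record keeps its body under a "text" key. Both of those are my assumptions, so check the docs above for the actual layout. `iter_documents` and `min_chars` are just illustrative names, not part of any OpenWebText2 tooling.

```python
import io
import json
from typing import Iterator, TextIO


def iter_documents(stream: TextIO, min_chars: int = 1) -> Iterator[str]:
    """Yield the text of each JSON Lines record with at least min_chars characters."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # ASSUMPTION: the document body lives under a "text" key.
        text = record.get("text", "")
        if len(text) >= min_chars:
            yield text


# Tiny in-memory stand-in for a decompressed dump file.
sample = io.StringIO(
    '{"text": "First document."}\n'
    "\n"
    '{"text": ""}\n'
    '{"text": "Second document."}\n'
)

docs = list(iter_documents(sample))
for doc in docs:
    print(doc)
```

For a real file you'd pass an open file handle instead of the `io.StringIO` sample. Streaming line by line keeps memory flat even when the dump is large.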