Not directly, no.
It may be able to write the code for one (the code is relatively short and well known) and give you a training program, and then you would need to spend a few trillion tokens to make it generate data.
Where can we see this well-known code? I'd like to see how it works.
Here is an implementation in PyTorch:
https://github.com/lyeoni/gpt-pytorch/blob/master/model.py
Here is one in pure C that karpathy started:
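Roughly, the core of implementations like those looks something like this. This is just a minimal sketch using PyTorch's built-in MultiheadAttention, not the actual code from either repo; all class and parameter names here are made up for illustration:

```python
# Minimal GPT-style decoder sketch (illustrative only, not from the repos above).
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        # Causal mask: each position may only attend to earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

class TinyGPT(nn.Module):
    def __init__(self, vocab_size, d_model=256, n_layers=4, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList([Block(d_model) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids -> (batch, seq_len, vocab_size) next-token logits
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))
```

The linked repos differ in details (tokenization, weight init, attention written out by hand), but the token embedding + stacked causal-attention blocks + linear head structure is the common core.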
Thanks!
You can generate synthetic data matching the distribution your transformer learned. You can use this dataset to train another model. As of now, that's about it.
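Very roughly, that looks like this. A minimal sketch of sequence-level distillation, assuming you already have a trained autoregressive "teacher" whose forward pass returns next-token logits; `sample` and `distill` are made-up names for illustration, not from any particular library:

```python
# Sketch: sample synthetic sequences from a trained teacher, then fit a
# smaller student on them. Purely illustrative; shapes/hyperparams are guesses.
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, n_samples=1000, seq_len=128, device="cpu"):
    data = []
    for _ in range(n_samples):
        idx = torch.zeros(1, 1, dtype=torch.long, device=device)  # start token (assumed id 0)
        for _ in range(seq_len - 1):
            logits = model(idx)[:, -1, :]            # logits for the next token
            probs = F.softmax(logits, dim=-1)
            next_tok = torch.multinomial(probs, 1)   # sample, don't argmax
            idx = torch.cat([idx, next_tok], dim=1)
        data.append(idx.squeeze(0))
    return torch.stack(data)                         # (n_samples, seq_len)

def distill(teacher, student, vocab_size, steps=1000, lr=3e-4, device="cpu"):
    synthetic = sample(teacher, device=device)       # the "dataset" drawn from the teacher
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(steps):
        batch = synthetic[torch.randint(0, len(synthetic), (32,))]
        inputs, targets = batch[:, :-1], batch[:, 1:]
        logits = student(inputs)
        loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

You'd call it as `distill(teacher, student, vocab_size)`; fancier variants train the student on the teacher's full probability distributions (soft targets) rather than just sampled tokens.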
Yep, this is called model distillation.
Don't give them any ideas.. 😂
ok... LOL