this post was submitted on 17 Jun 2023
8 points (90.0% liked)
Free Open-Source Artificial Intelligence
I'm actually playing around with ExLlama. IIRC it works with pretty much every model, and it can be a real game changer, especially for long conversations, code, or stories.
Unfortunately, there is still the unavoidable problem of context length burning VRAM like there's no tomorrow. You either get a decent AI with the attention span of a goldfish, or an idiot AI that can remember 3 times as much stuff as before.
It's handy, and it's progress, but ultimately there is still ground to cover.
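To put rough numbers on the VRAM point, here is a minimal back-of-the-envelope sketch. It assumes a LLaMA-style transformer that caches one key and one value vector per layer for every token, stored in fp16. The function name `kv_cache_bytes` and the 7B-like shape (32 layers, hidden size 4096) are my own illustrative choices, not anything taken from ExLlama.

```python
def kv_cache_bytes(n_layers, hidden_size, context_len, bytes_per_value=2):
    """Rough KV-cache size: one key and one value vector per layer per token.

    Assumes standard multi-head attention and an fp16 cache (2 bytes per value).
    """
    return 2 * n_layers * hidden_size * context_len * bytes_per_value


# Assumed LLaMA-7B-like shape: 32 layers, hidden size 4096.
for context_len in (2048, 4096, 6144):
    gib = kv_cache_bytes(32, 4096, context_len) / 2**30
    print(f"{context_len:>5} tokens -> {gib:.1f} GiB of KV cache")
```

Under those assumptions the cache grows linearly with context: about 1 GiB at 2048 tokens and 3 GiB at 6144, on top of the model weights and activations. That's why tripling the context tends to push you toward a smaller or more heavily quantized model.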
I keep hearing about this ExLlama! I've really got to try it. Glad to hear it's going well for you.
I think it's only a matter of time until context length is no longer an issue. I'm curious to see how RWKV develops; its infinite context length is interesting.
I hope they make some major breakthroughs. I like the idea of a super massive RNN, but a transformer with infinite context length could be a game changer for both architectures.
It would be absolutely awesome. Infinite context length would make models much easier to work with. I could be lazy and, instead of creating a LoRA, just use an entire book's style as a reference right there in the prompt.
Programmers could just dump in the entire codebase or the documentation.
Of course, all this is only possible if VRAM becomes less of a bottleneck than it currently is, and if the model can reliably reference information across an arbitrarily large context. (There's not much use in having a huge context if performance degrades, or if the model loses its marbles or forgets key pieces of information along the way.)
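For a quick sanity check on whether a book or codebase would even fit, here is a rough sketch. It leans on an assumed rule of thumb of about 4 characters per token for English text; the real figure depends on the tokenizer, so treat the result as an estimate only. The helper names and the 512-token reply reserve are hypothetical.

```python
def estimated_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; chars_per_token is an assumed rule of thumb."""
    return int(len(text) / chars_per_token)


def fits_in_context(text, context_len, reserve_for_reply=512):
    """True if the estimated prompt plus room for a reply fits in the window."""
    return estimated_tokens(text) + reserve_for_reply <= context_len


# Stand-in for a novel-length text: 50,000 five-character "words".
book = "word " * 50_000
print(estimated_tokens(book))        # 62500
print(fits_in_context(book, 2048))   # False
print(fits_in_context(book, 65536))  # True
```

For anything that matters, count tokens with the model's actual tokenizer instead of the character heuristic.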
I'm with you there. I love how Mosaic just fed the entire Great Gatsby to StoryWriter. This is the sort of context length I need in my life. Would make my projects so much easier. I don't think we're too far from having it on consumer hardware.
You should check out my latest post, which ironically addresses parts of your first comment. You still need a lot of VRAM, but a 6000+ token context is now possible with ExLlama.
It's crazy to see how fast these developments are happening!