brucethemoose

joined 9 months ago
[–] brucethemoose 4 points 2 hours ago* (last edited 2 hours ago)

The mega rich that run them are the fundamental issue.

As always.

[–] brucethemoose 3 points 1 day ago* (last edited 1 day ago)

This is true. Some elements of JC Avatar were super interesting and detailed. The spaceship, for instance, was *ridiculously* well-thought-out for such a brief appearance, and there's no telling how much time was spent on the fauna.

But... they made the overarching story and characters so unremarkable.

I felt something similar watching Black Panther: Wakanda Forever. Talokan (the underwater city) was breathtaking and incredible, and there's no telling how much labor was put into it... only for that gorgeous setting to be used for a brief swim-through and never seen again.

[–] brucethemoose 1 points 1 day ago* (last edited 1 day ago) (1 children)

This is true, I remember it being lauded in theaters.

I mean, I guess not everyone is media savvy and the story could have felt "new" to some, but there wasn't a character that stood out or anything, nobody stealing the show...

[–] brucethemoose 28 points 1 day ago* (last edited 1 day ago)

Sadly, it will not really register with most voters, and not just because they're largely ignoring political news:

https://www.axios.com/2024/12/26/politics-cable-news-ratings-trump-ap-poll

But because everyone is stuck in their own little filter bubble of podcasts, algorithmic feeds, influencers and such that color everything.

[–] brucethemoose 11 points 1 day ago (1 children)

Indeed.

"Why are these candidates terrible!?" everyone cries, after basically ignoring primaries and other party races.

[–] brucethemoose 108 points 1 day ago* (last edited 1 day ago) (25 children)

It's actually very worrying.

He's already killed a budget bill and defied a bread-and-butter MAGA policy position, for his own obvious benefit, back-to-back, with zero real pushback. There seem to be basically no limits on what he can do.

[–] brucethemoose 49 points 1 day ago* (last edited 1 day ago) (1 children)

Followers aren't abandoning Trump over one issue. TBH Elon will probably tamp this down by manipulating Twitter.

But this is a preview of things to come, as Axios notes.

[–] brucethemoose 30 points 1 day ago (1 children)

Oh, so technically this is a repost of https://lemmy.world/post/23634989

The URL has already been posted, but Axios ninja-edited the article as events unfolded, as they tend to do. I'm treating this as a new development, but mods, please delete if you think that's prudent.

367
submitted 1 day ago* (last edited 1 day ago) by brucethemoose to c/politics
 

Reality check: Trump pledged to end the program in 2016.

Called it. When push comes to shove, Trump is always going to side with the ultra-rich.

[–] brucethemoose 2 points 1 day ago (1 children)

Yep. I didn't mean to process shame you or anything, just trying to point out obscure but potentially useful projects most don't know about :P

[–] brucethemoose 1 points 1 day ago* (last edited 1 day ago)

https://en.m.wikipedia.org/wiki/External_memory_algorithm

Unfortunately that's not really relevant to LLMs beyond inserting things into the text you feed them. For every single word they predict, they make a pass through the multi-gigabyte weights. It's largely memory-bound, and not integrated with any kind of sane external-memory algorithm.

There are some techniques that muddy this a bit, like MoE and dynamic LoRA loading, but the principle is the same.
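
To make that concrete, here's a toy greedy decode loop with transformers (the tiny Qwen model and prompt are just examples). Real engines cache K/V between steps, but the weights still get streamed for every single new token, which is why it's memory-bound:

```python
# Minimal greedy decode loop: every new token is another full forward pass
# over all the model weights. (Small model used purely for illustration.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

ids = tok("The capital of France is", return_tensors="pt").input_ids
for _ in range(20):
    logits = model(ids).logits           # full pass over the weights
    next_id = logits[0, -1].argmax()     # greedy pick of the next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
print(tok.decode(ids[0]))
```

So the only "memory" the model has is whatever fits in that token sequence (plus the KV cache), nothing resembling out-of-core storage.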

[–] brucethemoose 2 points 1 day ago (3 children)

A1111

Eh, this is a problem because the "engine" is messy and unoptimized. You could at least try to switch to the "reforged" version, which might preserve extension compatibility and let you run features like torch.compile.

[–] brucethemoose 1 points 1 day ago (5 children)

Oh you should be able to batch the heck out of that on a 4080. Are you not using HF diffusers or something?

I'd check out stable-fast if you haven't already:

https://github.com/chengzeyi/stable-fast

VoltaML is also old at this point, but it has a really fast AITemplate implementation for SD 1.5: https://github.com/VoltaML/voltaML-fast-stable-diffusion
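
For reference, batching with diffusers is basically just passing a list of prompts (a rough sketch; the model ID, batch size, and step count are only examples to tune for your VRAM):

```python
# Rough sketch of batched SD 1.5 generation with HF diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = ["a watercolor fox in a forest"] * 8  # 8 images in one batched call
images = pipe(prompts, num_inference_steps=25).images
for i, img in enumerate(images):
    img.save(f"out_{i}.png")
```

stable-fast (or torch.compile) then gets layered on top of a pipeline like that.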

 

Trump, who has remained silent thus far on the schism, faces a quickly deepening conflict between his richest and most powerful advisors on one hand, and the people who swept him to office on the other.

All this is stupid. But I know one thing:

Trump is a billionaire.

And I predict his followers are going to learn who he’ll side with when push comes to shove.

Also, Bannon’s take is interesting:

Bannon tells Axios he helped kick off the debate with a now-viral Gettr post earlier this month calling out a lack of support for the Black and Hispanic communities in Big Tech.

 

I think the title explains it all… Even right-wing influencers can have their faces eaten. And Twitter views are literally their livelihood.

Trump's conspiracy-minded ally Laura Loomer, New York Young Republican Club president Gavin Wax and InfoWars host Owen Shroyer all said their verification badges disappeared after they criticized Musk's support for H1B visas, railed against Indian culture and attacked Ramaswamy, Musk's DOGE co-chair.

 

I have no idea if anyone on Lemmy is into Avatar lore/fanfiction, but in the spirit of posting content here instead of Reddit... here goes.


A new Avatar series featuring 'twin' Avatars has been leaked, in case you missed it:

https://knightedgemedia.com/2024/12/avatar-seven-havens-twin-earth-avatar-series-will-initially-be-26-episodes-long/

https://lemmy.world/post/23427458

In a nutshell, it's allegedly set in a cataclysmic world overrun by spirit vines, with twin 'Avatars' who have diametrically opposed personalities. Not much is known beyond that, but I've been brainstorming some post-LoK ideas forever.

And now I kinda feel like writing them out. Here's my thought dump for a story:


In the (largely undepicted) three years of canon Korra Alone, Korra (traveling anonymously) makes a stop on Kyoshi Island hoping to reconnect with her spiritual side. Instead, she meets a humble blacksmith with a bird spirit on his shoulder, and connects with him. They both wrestle with the demons haunting them, and discover secrets on the island dating back to Kyoshi's era.

Korra dies in 190 AG (at 37), already weakened from her metal poisoning, saving the world from a cataclysm that leaves much of it overgrown.

Initially, the story jumps between this period in Korra Alone (174 AG) and 206 AG, where Asami Sato struggles to steer Future Industries in a world dominated by megacorps in the safe 'havens' dotted across the world. While Kyoshi Island has barely changed at all, the 'future' thread has a more cyberpunk feel. Chi-based cybernetics are commonplace, but the more augmented someone is, the more their bending is compromised, and bender-vs-nonbender tensions flare up once more. The world outside the safe havens is a dangerous wasteland. Tech derived from studying spirits has led to the proliferation of holograms, BCIs, and even primitive assistants and virtual environments, and the advances in power storage/generation already seen in LoK mean everything is largely electric. Yet the world is still "analog," with tube radios and TVs, no digital electronics, and 'dumb' virtual assistants that are error-prone and incapable of math, giving it a retro feel. There are no guns, of course, but personal weapons like arc casters, flamethrowers, cryo blasters and such all mimic bending.

The Sei'naka clan has risen to power in the Fire Nation, taking advantage of the aftermath of the Hundred Year War, the Red Lotus insurrection, Future Industries' relative benevolence, and even the recent calamity. Now a ruthless corporation bigger than Future Industries, they dominate business and politics wherever they expand.

The White Lotus's search for the Avatar has failed. Asami rather infamously misidentified the Avatar... until one day, she finds them.


Once this background is established, the story jumps back to our inseparable twin Avatars, Priya and Nikki, born deep in the Foggy Swamp. Thanks to their predecessor, they live a harmonious, largely isolated life as members of the Foggy Swamp Tribe on the back of a water Lion Turtle. At nine, they manage to manifest Korra, to her utter shock, with past-life Korra appearing as a nine-year-old herself. Dumbfounded, not even sure who the 'real' Avatar is, the girls assume she is just another spirit in the swamp. So Korra decides to go along with it, letting them have the childhood she never had under the White Lotus while she figures out just what's going on with the twin Avatars.

Ultimately, the real world comes crashing into the new Avatars' isolated life, and they react poorly when Korra tells them the truth at 16. Through more disasters and tragedies, they end up on the streets of Republic City, separated for a time, before finding friends. The rest of the story revolves around corporate and personal greed (very much like the real world), conflict (and synergy) between the environment/spirits and technology, rivalries, family, friends across lifetimes, the nature of consciousness, reincarnation and the soul, and a conspiracy going all the way back to Kuruk threading through everything.


Some character profiles I'm working on:

  • Priya: Independent, kind, and resourceful. Priya is a reluctant hero who avoids altercations and fighting, but still believes in helping people with her creativity and wits. She's a talented musician and loves to make up songs on her taanbur (a guitar-like instrument). Priya lost most of her leg in the accident with a falling tree that killed her parents in the Foggy Swamp, but she bends roots and muddy water as if they were her own limb. It's eventually revealed that she carries Raava. Once she finds out, Priya in particular is reluctant to accept her role as the Avatar, until a tragedy forces her hand.

  • Nikki is the snarkier one. She loves her powers and the attention they afford her, but her biggest fear is being forgotten or not accepted by others. Nikki is awkward but puts on a face of supreme confidence to hide the fact that she feels like a fish out of water. Despite her cocky attitude, Nikki is also an innovator, and her wild side is useful at times. Like her sister, she's highly attuned to the swamp, able to connect to and even manifest the collective memories of lost loved ones by touching spirit vines. Both are apparently waterbenders with a proclivity for mud. It's eventually revealed that she carries Vaatu inside her. Nikki is missing part of her arm, but bends water as a replacement, much like Ming-Hua.

  • Korra: Largely as she is in LoK. Hot-blooded, quick to fight, passionate, empathetic, and not very spiritual. In a reversal of roles, Priya and Nikki keep her manifested constantly, and Korra becomes their best friend, learning about their life in the Foggy Swamp. Later in the story, Korra's almost like a Johnny Silverhand to the new Avatars: manifested at will, a voice constantly in their heads offering commentary, occasionally butting heads with them in a complex but close and encouraging relationship.

  • Asami Sato: Largely as she was in LoK: driven, collected, strong, smart, loyal. Now she's fifty, with a cybernetic leg from an accident. Asami is still altruistic, and has retained control of Future Industries through the years, but she struggles with pushback from a corporate world driven by expansion and greed, and ultimately has to grapple with some of what her own company has done under her nose.

  • Mako: Largely as he was: brooding, cool, a noir-style detective. Recently retired as police chief, he has been secretly piecing together the conspiracy running through the plot.

  • Ren: The blacksmith Korra meets on Kyoshi Island. Soft-spoken, painfully shy, airheaded and distractible, stocky and green-eyed, Ren nonetheless has a dry wit. He's self-deprecating to a fault, but has a soft heart. To Korra's utter shock, Ren is both a metalbender and a lavabender, using the combination to effortlessly sculpt armor and weapons and to tinker with delicate electronics. He's terrified of lightning, with a massive scar covering his back that flares up in storms or when he's anxious. Almost as broken as Korra is at the start, Ren reveals that his father's ancestors were lavabending miners and blacksmiths in the Hundred Year War. His past is initially shrouded in mystery, but it's slowly revealed that his mother was the scientist who originally conceived of spirit vine technology, and that Varrick only replicated some of her work. Ren's mom had an 'Oppenheimer moment' and defected from Kuvira's proto Earth Empire. Ren ended up the only survivor, deeply scarred by a spirit vine "detonation" similar to the one in the LoK finale, which fused his soul to his body, and he's hiding from warlords hunting him for what he knows. Through the story, he grows particularly close to Asami and Korra, and grapples with some of the technology he pioneers.

  • Kaida: CTO of Future Industries, Kaida is the biological daughter of Korra and Ren, who both died when she was 11. Utterly tenacious, hot-blooded, fearless, and a fierce fighter like her mom, Kaida barges into the story literally melting the metal floor in front of reporters harassing her 'mom,' Asami. Fiercely intelligent and impulsive, but with some of her dad's airheadedness, introversion, and love of tinkering with technology, Kaida is almost constantly clad in meteor-metal alloy plate armor she wears like a second skin. She favors a jian, like the one Korra learned to use on Kyoshi Island. Kaida is a talented engineer, but struggles with the tremendous legacy she's been thrust into.

  • Yuri Sei'naka: One of many vying for supremacy in the Sei'naka family, Yuri resembles Azula: a charismatic leader with a ruthless streak, an obsession with perfection, and fantastic talent as a firebender, with the same sharp yellow eyes and features. Like her twin brother, Yoru, Yuri chose the 'hard' path of bending over the advanced cybernetics the wealthy have access to. Nevertheless, she has a good moral compass, and is unconditionally loyal to her brother. The siblings have an intense rivalry with Kaida, just as their company rivals Future Industries.

  • Yoru Sei'naka: A firebending and lightningbending prodigy and a cunning strategist, Yoru is mute, having lost his ability to speak in a sparring accident as a kid. Yoru and Yuri are practically inseparable, with Yuri serving as his voice. Tasked with tracking down the unknown Avatar by the matriarch of the clan, and always beholden to his intense sense of honor, Yoru suffers through a tragic 'Zuko' arc over the course of the story.

spoiler

  • Father Glowworm: The ancient spirit survived the death of Yun, and is an ever-present invisible hand throughout the story, albeit with a newfound distaste for humans. The swamp, taboo spirit vine technology, and just how he tunnels between worlds will all tie into crises Priya and Nikki must navigate.

  • I'm still working on other antagonists, but there will be a warlord who tries to capture Ren on Kyoshi Island, a ruthless corporate matriarch of the Sei'naka dynasty (Natsu?), a charismatic rebel somewhere between Amon and Zaheer, and more. I'm also thinking about a blind airbending thief who rejected his rich family, a loud, warm Sun Warrior whose people have resettled in Republic City, and an introverted netrunner-like hacker as companions for the Avatars.


  • Other thoughts:

    • I don't like some 'leaked' aspects of the upcoming show, like the twin Avatars being nine and the White Lotus being so involved and 'problematic.' I'd much rather have the twin Avatars be lost, ignorant of their own nature in the Foggy Swamp because they appear to be waterbenders with a proclivity for mud.

    • On that note, stealing the idea from here, maybe Priya can only bend air and water, while Nikki can only bend earth and fire, reflecting the split of their spirits and personalities.

    • Remnants of the Northern and Southern Water Tribes have drifted to political extremes.

    • The 'wasteland' is populated by spirits, and human opportunists looking to brave it.

    • The Avatars' monkey cat companion is a spirit they befriended in the forest.

    • Spirit Vine technology is taboo and effectively 'lost' after the calamity.

    • The Avatars' Tribe lives atop a Lion Turtle the swamp hid for millennia.

    • The 'nature' of the Foggy Swamp is expanded. For instance, in one chapter, Priya and Nikki manifest and talk to representations of their parents, built from the collective memory of everyone who ever knew them, all connected through the vines. It raises existential questions in Korra's head, and parallels with some of the spirit-based technology the rest of the world has developed.


    ...So, those are my scattered thoughts so far.

    Does that sound like a sane, plausible base for a post-LoK story? Do you think any of it would fit into canon? I particularly like the idea of a 'metal lavabending' canon companion, and maybe some more futuristic elements in the havens that do exist.

    36
    submitted 1 week ago* (last edited 1 week ago) by brucethemoose to c/avatar
     

    Most details are in the article ^

    Reddit source of images: https://www.reddit.com/r/TheLastAirbender/comments/1hi2tte/more_confirmation_on_the_leaks_this_was_using_the/

    I find this interesting! Post apocalyptic is a good way to "reset" the world, and the idea of twin Avatars has been batted around the fandom for some time.

    56
    submitted 1 week ago* (last edited 1 week ago) by brucethemoose to c/technology
     

    Maybe even 32GB if they use newer ICs.

    More explanation (and my source of the tip): https://www.pcgamer.com/hardware/graphics-cards/shipping-document-suggests-that-a-24-gb-version-of-intels-arc-b580-graphics-card-could-be-heading-to-market-though-not-for-gaming/

    Would be awesome if true, and if it's affordable. Screw Nvidia (and, inexplicably, AMD) for their VRAM gouging.

    324
    submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/selfhosted
     

    I see a lot of talk of Ollama here, which I personally don't like because:

    • The quantizations they use tend to be suboptimal

    • It abstracts away llama.cpp in a way that, frankly, leaves a lot of performance and quality on the table.

    • It abstracts away things that you should really know for hosting LLMs.

    • I don't like some things about the devs. I won't rant, but I especially don't like the hints that they're cooking up something commercial.

    So, here's a quick guide to get away from Ollama.

    • First step is to pick your OS. Windows is fine, but if you're setting up something new, Linux is best. I favor CachyOS in particular, for its great Python performance. If you use Windows, be sure to enable hardware-accelerated scheduling and disable the shared memory fallback.

    • Ensure the latest version of CUDA (or ROCm, if using AMD) is installed. Linux is great for this, as many distros package them for you.

    • Install Python 3.11.x or 3.12.x (or at least whatever your distro supports), along with git. If on Linux, also install your distro's "build tools" package.

    Now for actually installing the runtime. There are a great number of inference engines supporting different quantizations, forgive the Reddit link but see: https://old.reddit.com/r/LocalLLaMA/comments/1fg3jgr/a_large_table_of_inference_engines_and_supported/

    As far as I am concerned, 3 matter to "home" hosters on consumer GPUs:

    • Exllama (and by extension TabbyAPI): a very fast, very memory-efficient "GPU only" runtime that supports AMD via ROCm and Nvidia via CUDA: https://github.com/theroyallab/tabbyAPI

    • Aphrodite Engine: while not strictly as VRAM-efficient, it's much faster with parallel API calls, reasonably efficient at very short context, and supports just about every quantization under the sun and more exotic models than exllama. AMD/Nvidia only: https://github.com/PygmalionAI/Aphrodite-engine

    • This fork of kobold.cpp, which supports more fine-grained KV cache quantization (we will get to that). It supports CPU offloading and, I think, Apple Metal: https://github.com/Nexesenex/croco.cpp

    Now, there are also reasons I don't like llama.cpp, but one of the big ones is that its model implementations sometimes have... quality-degrading issues, or odd bugs. Hence I would generally recommend TabbyAPI if you have enough VRAM to avoid offloading to CPU and can figure out how to set it up. So:

    Install TabbyAPI following the documentation in its repo. This can go wrong; if anyone gets stuck, I can help with that.

    • Next, figure out how much VRAM you have.

    • Figure out how much "context" you want, aka how much text the LLM can ingest. If a model has a context length of, say, "8K," that means it can take 8K tokens as input, which is somewhat less than 8K words. Not all tokenizers are the same: some, like Qwen 2.5's, fit nearly a word per token, while others are more in the ballpark of half a word per token or less (a quick way to count tokens is sketched a bit further down).

    • Keep in mind that the actual context length of many models is an outright lie, see: https://github.com/hsiehjackson/RULER

    • Exllama has a feature called "kv cache quantization" that can dramatically shrink the VRAM the "context" of an LLM takes up. Unlike llama.cpp's, its Q4 cache is basically lossless, and on a model like Command-R, an 80K+ context can take up less than 4GB! It's essential to enable Q4 or Q6 cache to squeeze as much LLM as you can into your GPU.

    • With that in mind, you can search huggingface for your desired model. Since we are using tabbyAPI, we want to search for "exl2" quantizations: https://huggingface.co/models?sort=modified&search=exl2

    • There are all sorts of finetunes... and a lot of straight-up garbage. But I will post some general recommendations based on total VRAM (there's a quick back-of-envelope size check right after this list):

    • 4GB: A very small quantization of Qwen 2.5 7B. Or maybe Llama 3B.

    • 6GB: IMO Llama 3.1 8B is best here. There are many finetunes of this depending on what you want (horny chat, tool usage, math, whatever). For coding, I would recommend Qwen 7B Coder instead: https://huggingface.co/models?sort=trending&search=qwen+7b+exl2

    • 8GB-12GB: Qwen 2.5 14B is king! Unlike its 7B counterpart, I find the 14B version of the model incredible for its size, and it will squeeze into this VRAM pool (albeit with very short context/tight quantization on the 8GB cards). I would recommend trying Arcee's new distillation in particular: https://huggingface.co/bartowski/SuperNova-Medius-exl2

    • 16GB: Mistral Small 22B, Codestral 22B, and very tight quantizations of Qwen 2.5 32B are possible. Honorable mention goes to InternLM 2.5 20B, which is alright even at 128K context.

    • 20GB-24GB: Command-R 2024 35B is excellent for "in context" work, like asking questions about long documents, continuing long stories, anything involving working "with" the text you feed to an LLM rather than pulling from its internal knowledge pool. It's also quite good at longer contexts, out to 64K-80K more or less, all of which fits in 24GB. Otherwise, stick to Qwen 2.5 32B, which still has a very respectable 32K native context and a rather mediocre 64K "extended" context via YaRN: https://huggingface.co/DrNicefellow/Qwen2.5-32B-Instruct-4.25bpw-exl2

    • 32GB: same as 24GB, just with a higher-bpw quantization. But this is also the threshold where lower-bpw quantizations of Qwen 2.5 72B (at short context) start to make sense.

    • 48GB: Llama 3.1 70B (for longer context) or Qwen 2.5 72B (for 32K context or less)
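
    For sanity-checking those pairings, my rough back-of-envelope for quantized weight size is parameters (in billions) times bits-per-weight divided by 8 (this ignores activation and KV cache overhead, so leave headroom):

```python
# Rough weight-size estimate for a quantized model: params (B) * bpw / 8.
# Ignores KV cache and activation overhead, so treat it as a lower bound.
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * bits_per_weight / 8

print(weight_gb(32, 4.25))  # ~17 GB: fits a 20GB-24GB card with room for context
print(weight_gb(72, 3.0))   # ~27 GB: why 72B wants 32GB+ even at low bpw
```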

    Again, browse huggingface and pick an exl2 quantization that will cleanly fill your VRAM pool plus the amount of context you want to specify in TabbyAPI. Many quantizers, such as bartowski, will list how much space they take up, but you can also just look at the file sizes.
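
    If you want to estimate how many tokens your typical inputs actually are (so you can pick a sensible context length), the model's own tokenizer will tell you. A quick sketch with transformers (the model ID and file name are just examples):

```python
# Count tokens with the tokenizer of the model you plan to run.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B-Instruct")
text = open("my_document.txt").read()        # whatever you plan to feed the LLM
print(len(tokenizer.encode(text)), "tokens")
```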

    • Now... you have to download the model. Bartowski has instructions here, but I prefer to use this nifty standalone tool instead: https://github.com/bodaay/HuggingFaceModelDownloader

    • Put it in your TabbyAPI models folder, and follow the documentation on the wiki.

    • There are a lot of options. Some to keep in mind are chunk_size (higher than 2048 will process long contexts faster but take up lots of VRAM; lower will save a little VRAM), cache_mode (use Q4 for long context, Q6/Q8 for short context if you have room), max_seq_len (this is your context length), tensor_parallel (for faster inference with 2 identical GPUs), and max_batch_size (parallel processing if you have multiple users hitting the TabbyAPI server, at the cost of more VRAM).

    • Now... pick your frontend. The TabbyAPI wiki has a good compilation of community projects, but Open Web UI is very popular right now: https://github.com/open-webui/open-webui I personally use exui: https://github.com/turboderp/exui

    • And be careful with your sampling settings when using LLMs. Different models behave differently, but one of the most common mistakes people make is using "old" sampling parameters for new models. In general, keep temperature very low (<0.1, or even zero) and repetition penalty low (1.01?) unless you need long, creative responses. If available in your UI, enable DRY sampling to tamp down repetition without "dumbing down" the model with too much temperature or repetition penalty. Always use a MinP of 0.05 or higher and disable other samplers. This is especially important for Chinese models like Qwen, as MinP cuts "wrong language" answers out of the response (there's a sketch of what these settings look like as a raw API request just below).

    • Now, once this is all setup and running, I'd recommend throttling your GPU, as it simply doesn't need its full core speed to maximize its inference speed while generating. For my 3090, I use something like sudo nvidia-smi -pl 290, which throttles it down from 420W to 290W.
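
    For what it's worth, here's roughly what those conservative sampling settings look like as a raw request to the OpenAI-compatible endpoint TabbyAPI exposes (a sketch: the port, auth header, loaded model name, and extended fields like min_p and repetition_penalty are assumptions that vary by backend, so check your server's API docs for the exact names it accepts):

```python
# Sketch of a completion request with conservative sampling settings.
# Port, auth header, model name, and extra sampler fields are assumptions;
# consult your backend's API documentation for the exact parameter names.
import requests

resp = requests.post(
    "http://localhost:5000/v1/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "SuperNova-Medius-exl2",   # whatever model you loaded
        "prompt": "Summarize the following document:\n...",
        "max_tokens": 512,
        "temperature": 0.1,                 # keep low for factual tasks
        "min_p": 0.05,                      # cuts off junk / wrong-language tokens
        "repetition_penalty": 1.01,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["text"])
```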

    Sorry for the wall of text! I can keep going, discussing kobold.cpp/llama.cpp, Aphrodite, exotic quantization and other niches like that if anyone is interested.

     

    cross-posted from: https://lemmy.world/post/19925986

    https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e

    Qwen 2.5 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B just came out, with some variants in some sizes just for math or coding, and base models too.

    All Apache licensed, all 128K context, and the 128K seems legit (unlike Mistral).

    And it's pretty sick, with a tokenizer that's more efficient than Mistral's or Cohere's, and benchmark scores better than Llama 3.1 or Mistral at similar sizes, especially on newer metrics like MMLU-Pro and GPQA.

    I am running the 32B locally, and it seems super smart!

    As long as the benchmarks aren't straight-up lies (or the result of training on them), this is massive, and it just made a whole bunch of models obsolete.

    Get usable quants here:

    GGUF: https://huggingface.co/bartowski?search_models=qwen2.5

    EXL2: https://huggingface.co/models?sort=modified&search=exl2+qwen2.5

    15
    submitted 3 months ago* (last edited 3 months ago) by brucethemoose to c/[email protected]
     

    https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e

    Qwen 2.5 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B just came out, with some variants in some sizes just for math or coding, and base models too.

    All Apache licensed, all 128K context, and the 128K seems legit (unlike Mistral).

    And it's pretty sick, with a tokenizer that's more efficient than Mistral's or Cohere's, and benchmark scores better than Llama 3.1 or Mistral at similar sizes, especially on newer metrics like MMLU-Pro and GPQA.

    I am running the 32B locally, and it seems super smart!

    As long as the benchmarks aren't straight-up lies (or the result of training on them), this is massive, and it just made a whole bunch of models obsolete.

    Get usable quants here:

    GGUF: https://huggingface.co/bartowski?search_models=qwen2.5

    EXL2: https://huggingface.co/models?sort=modified&search=exl2+qwen2.5

     

    Obviously there's not a lot of love for OpenAI and other corporate API generative AI here, but how does the community feel about self hosted models? Especially stuff like the Linux Foundation's Open Model Initiative?

    I feel like a lot of people just don't know there are Apache/CC-BY-NC licensed "AI" they can run on sane desktops, right now, that are incredible. I'm thinking of the most recent Command-R, specifically. I can run it on one GPU, and it blows expensive API models away, and it's mine to use.

    And there are efforts to kill the power cost of inference and training with stuff like matrix-multiplication-free models, open source and legally licensed datasets, cheap training... and OpenAI and such want to shut down all of this because it breaks their monopoly, where they can just outspend everyone scaling, stealing data, and destroying the planet. And it's actually a threat to them.

    Again, I feel like corporate social media vs. the fediverse is a good analogy, where one is kinda destroying the planet and the other, while still niche, problematic, and a WIP, avoids a lot of the downsides.

     

    cross-posted from: https://lemmy.world/post/19242887

    I can run the full 131K context with a 3.75bpw quantization, and still a very long one at 4bpw. And it should barely be fine-tunable in Unsloth as well.

    It's pretty much perfect! Unlike the last iteration, they're using very aggressive GQA, which makes the context take up far less memory, and it feels really smart at long-context stuff like storytelling, RAG, document analysis and things like that (whereas Gemma 27B and Codestral 22B are probably better suited to short chats/code).
