overview for brucethemoose

Trump might have to ask foes like Trudeau for emergency eggs in c/news

[–] brucethemoose 4 points 5 hours ago

I might just move to Canada if he does that.

Trump might have to ask foes like Trudeau for emergency eggs in c/news

[–] brucethemoose 2 points 5 hours ago

That’s what allies do.

…Ugh.

This is why the USA supports Ukraine, typical USA imperialism in c/comicstrips

[–] brucethemoose 16 points 9 hours ago* (last edited 9 hours ago) (1 children)

The dream is more important than reality.

This is how some of Trump's real estate endeavors worked before he got into politics: hype, and sell to the public markets at an inflated price. For all people criticize him, he’s very good at encouraging the masses to buy things they really shouldn’t buy.

And later? Doesn’t matter, someone else is holding the bag after that, and it’s onto the next thing. Sounds a lot like crypto TBH.

Macron: EU needs ‘hundreds of billions’ in defense spending as US pivots away in c/[email protected]

[–] brucethemoose 10 points 14 hours ago* (last edited 14 hours ago) (2 children)

I wonder what US military suppliers think of all this?

Isn’t the EU a huge buyer? They sure as heck won’t be anymore.

People on Socials - 2025 Edition in c/[email protected]

[–] brucethemoose 5 points 1 day ago

Just skip it and go straight to the source, lol.

People on Socials - 2025 Edition in c/[email protected]

[–] brucethemoose 18 points 1 day ago (1 children)

But the caretaker is a slop bot, right?

People on Socials - 2025 Edition in c/[email protected]

[–] brucethemoose 20 points 1 day ago (7 children)

What’s Facebook?

Diplomacy dies on live TV as Trump and Vance gang up to bully Ukraine leader in c/world

[–] brucethemoose 6 points 1 day ago

If you are talking about Europe, I’d check your own house and the growing far right parties.

uh yeah how much more evidence do you people need in c/politicalmemes

[–] brucethemoose 33 points 2 days ago* (last edited 2 days ago) (9 children)

Known as Mike from PA, Mike is a progressive Political Activist from Pennsylvania and is known as the Steelman of the left, Mike advocates for union labor and social democracy.

So… False flag Tweet?

Alibaba Releases Advanced Open Video Model, Immediately Becomes AI Porn Machine in c/technology

[–] brucethemoose 3 points 2 days ago (1 children)

Anime/semi-real style is already shockingly good.

Alibaba Releases Advanced Open Video Model, Immediately Becomes AI Porn Machine in c/technology

[–] brucethemoose 6 points 2 days ago* (last edited 2 days ago)

This is Lemmy. Why not self host the generation? :)

The reaction of the Ukrainian ambassador to the US during the argument between Zelensky and Trump. in c/[email protected]

[–] brucethemoose 4 points 2 days ago

Because there’s a drastic disparity between Republicans and Democrats, and (not illustrated in this PDF alone) the level of support among Republicans dropped massively within a year.

-4

Trump 2.0 initial approval ratings higher than in first term (www.axios.com)

submitted 3 weeks ago by brucethemoose to c/politics

10 comments fedilink

53% of Americans approve of Trump so far, according to a newly released CBS News/YouGov poll conducted Feb. 5 to 7, while 47% disapproved.

A large majority, 70%, said he was doing what he promised in the campaign, per the poll that was released on Sunday.

Yes, but: 66% said he was not focusing enough on lowering prices, a key campaign trail promise that propelled Trump to the White House.

44% of Republicans said Musk and DOGE should have "some" influence, while just 13% of Democrats agreed.

5

CachyOS Is the Distro to End All Distros (lemmy.world)

submitted 1 month ago* (last edited 1 month ago) by brucethemoose to c/cachyos

1 comments fedilink

Hey, I have nothing to do with CachyOS or this Lemmy community, but just wanna say I love this distro.

It's everything annoying about Arch Linux (to me) fixed, more convenient, and objectively fast as heck. I distro hopped for a long time, but have zero inclination to switch after finding CachyOS. I hardly need to tweak anything. It's all optimal out of the box! And how many other distros offer their own AVX2/AVX512 packages by default?

...I haven't even reinstalled CachyOS on my main PC for almost two years. I can't say that for Ubuntu, Fedora, PopOS, or (heaven forbid) Manjaro, all of which are ostensibly more stable yet always seem to break, or get behind on fixes I need. Kinoite was too finicky with the whole immutable thing. Garuda Linux was OK, but more bloated, and not nearly as "optimally preconfigured" as CachyOS.

40

Behind the Curtain: Meta's make-up-with-MAGA map (www.axios.com)

submitted 1 month ago* (last edited 1 month ago) by brucethemoose to c/politics

4 comments fedilink

Here's the Meta formula:

Put a Trump friend on your board (Ultimate Fighting Championship CEO Dana White).

Promote a prominent Republican as your chief global affairs officer (Joel Kaplan, succeeding liberal-friendly Nick Clegg, president of global affairs).

Align your philosophy with Trump's on a big-ticket public issue (free speech over fact-checking).

Announce your philosophical change on Fox News, hoping Trump is watching. In this case, he was. "Meta, Facebook, I think they've come a long way," Trump said at a Mar-a-Lago news conference, adding of Kaplan's appearance on the "Fox and Friends" curvy couch: "The man was very impressive."

Take a big public stand on a favorite issue for Trump and MAGA (rolling back DEI programs).

Amplify that stand in an interview with Fox News Digital. (Kaplan again!)

Go on Joe Rogan's podcast and blast President Biden for censorship.

16

Elon Musk's headline dominance squeezes other CEOs (www.axios.com)

submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/enoughmuskspam

6 comments fedilink

Taboola's data, shared exclusively with Axios, shows Musk has outpaced his closest peers — Jeff Bezos and Mark Zuckerberg — for years, but the gap widened dramatically in 2024.

The spam is already exponential. :(

372

Trump sides with Musk in H-1B fight (www.axios.com)

submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/politics

73 comments fedilink

Reality check: Trump pledged to end the program in 2016.

Called it. When push comes to shove, Trump is always going to side with the ultra-rich.

215

Elon Musk pledges "war" over H-1B visa program, calls opponents racists (www.axios.com)

submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/politics

59 comments fedilink

Trump, who has remained silent thus far on the schism, faces a quickly deepening conflict between his richest and most powerful advisors on one hand, and the people who swept him to office on the other.

All this is stupid. But I know one thing:

Trump is a billionaire.

And I predict his followers are going to learn who he’ll side with when push comes to shove.

Also, Bannon’s take is interesting:

Bannon tells Axios he helped kick off the debate with a now-viral Gettr post earlier this month calling out a lack of support for the Black and Hispanic communities in Big Tech.

249

Musk calls MAGA element "contemptible fools" as virtual civil war brews (www.axios.com)

submitted 2 months ago by brucethemoose to c/politics

54 comments fedilink

141

MAGA vs. Musk: Right-wing critics allege censorship, loss of X badges (www.axios.com)

submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/leopardsatemyface

11 comments fedilink

I think the title explains it all… Even right wing influencers can have their faces eaten. And Twitter views are literally their livelihood.

Trump's conspiracy-minded ally Laura Loomer, New York Young Republican Club president Gavin Wax and InfoWars host Owen Shroyer all said their verification badges disappeared after they criticized Musk's support for H1B visas, railed against Indian culture and attacked Ramaswamy, Musk's DOGE co-chair.

10

Brainstorming Post LoK/Avatar Seven Havens Story Ideas (self.avatar)

submitted 2 months ago by brucethemoose to c/avatar

0 comments fedilink

I have no idea if anyone on Lemmy is into Avatar lore/fanfiction, but in the spirit of posting content here instead of Reddit... here goes.

A new Avatar series featuring 'twin' Avatars has been leaked, in case you missed it:

https://knightedgemedia.com/2024/12/avatar-seven-havens-twin-earth-avatar-series-will-initially-be-26-episodes-long/

https://lemmy.world/post/23427458

In a nutshell, it's allegedly set in a cataclysmic world overrun by spirit vines, and two twins are the 'Avatars' with diametric personalities. Not much is known beyond that, but I've been brainstorming some post-LoK ideas forever.

And now I kinda feel like writing them out. Here's my thought dump for a story:

In the (largely undepicted) three years of canon Korra Alone, Korra (traveling anonymously) makes a stop on Kyoshi Island hoping to reconnect with her spirit. Instead, she meets a humble blacksmith with a bird spirit on his shoulder, and connects with him. They both wrestle with the demons haunting them, and they discover secrets on the island from Kyoshi's era.

Korra dies in 190 AG (at 37), already weakened from her metal poisoning, saving the world from a cataclysm that leaves much of the world overgrown.

Initially, the story jumps between this period in Korra Alone (174 AG) and 206 AG, where Asami Sato struggles to steer Future Industries in a world dominated by megacorps in the safe 'havens' dotted through the world. While Kyoshi Island has barely changed at all, the 'future' thread has a more cyberpunk feel. Chi based cybernetics are commonplace, but the more augmented someone is, the more their bending is compromised, and bender vs nonbender tensions flare up once more. The world outside the safe havens is a dangerous wasteland. Tech derived from studying spirits has led to the proliferation of holograms, BCIs, and even primitive assistants and virtual environments, and advances in power storage/generation already seen in LoK mean everything is largely electric. Yet the world is still "analog," with tube radios and TVs, no digital electronics, and 'dumb' virtual assistants that are error-prone and incapable of math, giving it a retro feel. There are no guns, of course, but personal weapons like arc casters, flamethrowers, cryo blasters and such all mimic bending.

The Sei'naka clan has risen to power in the Fire Nation, taking advantage of the aftermath of the 100 Years War, the Red Lotus Insurrection, Future Industries' relative benevolence, and even the recent calamity. Now a ruthless corporation bigger than Future Industries, they dominate business and politics wherever they expand.

The White Lotus's search for the Avatar has failed. Asami rather infamously misidentified the Avatar... until one day, she finds them.

Once this background is established, the story jumps back to our inseparable twin Avatars, Priya and Nikki, born deep in the Foggy Swamp. Thanks to their predecessor, they live a harmonious, largely isolated life as members of the Foggy Swamp Tribe on the back of a water Lion Turtle. To Korra's utter shock, they manage to manifest her at nine, with Past Life Korra appearing as a nine-year-old. Dumbfounded, not even sure who the 'real' Avatar is, the girls assume she is just another spirit in the swamp. So Korra makes the decision to go along with this, and let them have a childhood she never had under the White Lotus as she figures out just what's going on with the twin Avatars.

Ultimately, the real world comes crashing into the new Avatars' isolated life, and they react poorly to Korra telling them the truth at 16. Through some more disasters and tragedies, they end up on the streets of Republic City, separated for a time, before meeting friends. The rest of the story revolves around corporate and personal greed (very much like the real world), conflict (and synergy) between the environment/spirits and technology, rivalries, family, friends across lifetimes, the nature of consciousness, reincarnation and the soul, and a conspiracy going all the way back to Kuruk threading through everything.

Some character profiles I'm working on:

Priya: Independent, kind, and resourceful. Priya is a reluctant hero who avoids altercations or fighting, but still believes in helping people using her creativity and wits. She's a talented musician and loves to make up songs with her taanbur (guitar-like instrument). Priya lost most of her leg in an accident from a fallen tree that killed her parents in the Foggy Swamp, but bends roots and muddy water as if they were her own limb. It's eventually revealed that she carries Raava. Once she finds out, Priya in particular is reluctant to accept her role as the Avatar, until a tragedy forces her hand.
Nikki is more snarky. She loves her powers and the attention they afford her, but her biggest fear is to be forgotten or not accepted by others. Nikki is awkward but puts on a face of superior confidence to hide the fact that she feels like a fish out of water. Despite her cocky attitude, Nikki is also an innovator, and her wild side is useful at times. Like her sister, she's highly attuned to the swamp, able to connect to and even manifest the collective memories of lost loved ones by touching spirit vines in the swamp. Both are apparently waterbenders with a proclivity for mud. It's eventually revealed that she carries Vaatu inside her. Nikki is missing part of her arm, but bends as a replacement, much like Ming-Hua.
Korra: Largely as she is in LoK. Hot blooded, quick to fight, passionate, empathetic, and not very spiritual. In a reversal of roles, Priya and Nikki keep her manifested constantly, and Korra becomes their best friend, learning about their life in the Foggy Swamp. Later in the story, Korra's almost like a Johnny Silverhand to the new Avatars: manifested at will, a voice constantly in their heads offering commentary, occasionally butting heads with them in a complex but close and encouraging relationship.
Asami Sato: Largely as she was in LoK, driven, collected, strong, smart, loyal. Now she's fifty, with a cybernetic leg from an accident. Asami is still altruistic, and has retained control of Future Industries through the years, but struggles with pushback from a corporate world driven by expansion and greed, and ultimately has to grapple with some of what her own company has done under her nose.
Mako: Largely as he was, brooding, cool, a noir-like detective. Recently retired as police chief, and has been secretly piecing together the conspiracy running through the plot.
Ren: The blacksmith Korra meets on Kyoshi Island. Softspoken, painfully shy, air headed and ADD, stocky and green-eyed, Ren nonetheless has a dry wit. He's self-deprecating to a fault, but has a soft heart. To Korra's utter shock, Ren is a metalbender and a lavabender, using the combination to effortlessly sculpt armor and weapons, and tinker with delicate electronics. He's terrified of lightning, with a massive scar covering his back that flares up in storms or when anxious. Almost as broken as Korra is at the start, Ren reveals that his father's ancestors were lavabending miners and blacksmiths in the Hundred Years War. His past is initially shrouded in mystery, but it's slowly revealed that his mother is the scientist who originally conceived of spirit vine technology, and that Varrick only replicated some of her work. Ren's mom has an 'Oppenheimer moment' and defects from Kuvira's proto Earth Empire. Ren ends up as the only survivor, deeply scarred from a spirit vine "detonation" similar to the LoK finale, that fused his soul to his body, and he's hiding from warlords hunting him for what he knows. Through the story, he grows particularly close to Asami and Korra, and grapples with some of the technology he pioneers.
Kaida: CTO of Future Industries, Kaida is the biological daughter of Korra and Ren, who both died when she was 11. Utterly tenacious, hot-blooded, fearless, and a fierce fighter like her mom, Kaida barges into the story literally melting the metal floor in front of reporters harassing her 'mom,' Asami. Fiercely intelligent, impulsive, but with some of her dad's air-headedness, introversion, and love of tinkering with technology, Kaida is almost constantly clad in meteor-metal alloy plate armor she wears as a second skin. She favors a jian, like Korra learned to use on Kyoshi Island. Kaida is a talented engineer, but struggles with the tremendous legacy she's been thrust into.
Yuri Sei'naka: One of many vying for supremacy in the Sei'naka family, Yuri resembles Azula. A charismatic leader with a ruthless streak and an obsession with perfection, and a fantastically talented firebender, she has the same sharp yellow eyes and features. Like her twin brother, Yoru, Yuri chose the 'hard' path of bending over the advanced cybernetics the wealthy have access to. Nevertheless, she has a good moral compass, and is unconditionally loyal to her brother. The siblings have an intense rivalry with Kaida, just as their company rivals Future Industries.
Yoru Sei'naka: A firebending and lightning bending prodigy and a cunning strategist, Yoru is mute, having lost his ability to speak in a sparring accident as a kid. Yoru and Yuri are practically inseparable, with Yuri serving as his voice. Tasked with tracking down the unknown Avatar by the matriarch of the clan, and always beholden to his intense sense of honor, Yoru suffers through a tragic 'Zuko' arc through the story.

spoiler

Father Glowworm: The ancient spirit survived the death of Yun, and is an ever-present invisible hand through the story, albeit with a newfound distaste for humans. The swamp, taboo spirit vine technology, and just how he tunnels between worlds will all tie into crises Priya and Nikki must navigate.

I'm still working on other antagonists, but there will be a warlord who tries to capture Ren on Kyoshi Island, a ruthless corporate matriarch of the Sei'naka dynasty (Natsu?), a charismatic rebel like something between Amon and Zaheer, and more. I'm also thinking on a blind airbending thief who rejected his rich family, a loud, warm Sun Warrior whose people have resettled in Republic City, and an introverted netrunner-like hacker as companions for the Avatar.

Other thoughts:

I don't like some 'leaked' aspects of the upcoming show, like the twin Avatars being nine and the White Lotus being so involved and 'problematic.' I'd much rather have the twin Avatars be lost, ignorant of their own nature in the Foggy Swamp because they appear to be waterbenders with a proclivity for mud.
On that note, stealing the idea from here, maybe Priya can only bend air and water, while Nikki can only bend earth and fire, reflecting the split of their spirits and personalities.
Remnants of the Northern and Southern Water Tribes have drifted to political extremes.
The 'wasteland' is populated by spirits, and human opportunists looking to brave it.
The Avatars' monkey cat companion is a spirit they befriended in the forest.
Spirit Vine technology is taboo and effectively 'lost' after the calamity.
The Avatars' Tribe lives atop a Lion Turtle the swamp hid for millennia.
The 'nature' of the Foggy Swamp is expanded. For instance, in one chapter, Priya and Nikki manifest and talk to representations of their parents, built from the collective memory of everyone who ever knew them, all connected through vines. It brings up existential questions in Korra's head, and parallels with some of the spirit-based technology the rest of the world has developed.

...So, those are my scattered thoughts so far.

Does that sound like a sane, plausible base for a post-LoK story? Do you think any of it would fit into canon? I particularly like the idea of a 'metal lavabending' canon companion, and maybe some more futuristic elements in the havens that do exist.

38

'Avatar: Seven Havens' Rumors Emerge (knightedgemedia.com)

submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/avatar

13 comments fedilink

Most details are in the article ^

Reddit source of images: https://www.reddit.com/r/TheLastAirbender/comments/1hi2tte/more_confirmation_on_the_leaks_this_was_using_the/

I find this interesting! Post apocalyptic is a good way to "reset" the world, and the idea of twin Avatars has been batted around the fandom for some time.

56

[Rumor] Shipping Listing Suggests 24GB+ Intel Arc B580 (lemmy.world)

submitted 2 months ago* (last edited 2 months ago) by brucethemoose to c/technology

12 comments fedilink

Maybe even 32GB if they use newer ICs.

More explanation (and my source of the tip): https://www.pcgamer.com/hardware/graphics-cards/shipping-document-suggests-that-a-24-gb-version-of-intels-arc-b580-graphics-card-could-be-heading-to-market-though-not-for-gaming/

Would be awesome if true, and if it's affordable. Screw Nvidia (and, inexplicably, AMD) for their VRAM gouging.

326

Guide to Self Hosting LLMs Faster/Better than Ollama (self.selfhosted)

submitted 4 months ago* (last edited 4 months ago) by brucethemoose to c/selfhosted

83 comments fedilink

I see a lot of talk of Ollama here, which I personally don't like because:

The quantizations they use tend to be suboptimal
It abstracts away llama.cpp in a way that, frankly, leaves a lot of performance and quality on the table.
It abstracts away things that you should really know for hosting LLMs.
I don't like some things about the devs. I won't rant, but I especially don't like the hint they're cooking up something commercial.

So, here's a quick guide to get away from Ollama.

First step is to pick your OS. Windows is fine, but if setting up something new, Linux is best. I favor CachyOS in particular, for its great Python performance. If you use Windows, be sure to enable hardware accelerated scheduling and disable shared memory.
Ensure the latest version of CUDA (or ROCm, if using AMD) is installed. Linux is great for this, as many distros package them for you.
Install Python 3.11.x, 3.12.x, or at least whatever your distro supports, and git. If on Linux, also install your distro's "build tools" package.
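If you want a quick sanity check of the prerequisites above, something like the sketch below might help. It only checks the Python version against the two releases named here and looks for a few tools on your PATH. The helper names are made up for illustration, and `rocm-smi` is just my assumption of what an AMD setup would expose.

```python
import shutil
import sys

# The two Python releases named in the guide; other versions may also work.
RECOMMENDED_MINORS = {(3, 11), (3, 12)}


def python_is_recommended(version=sys.version_info):
    """Return True if (major, minor) is one of the versions the guide names."""
    return (version[0], version[1]) in RECOMMENDED_MINORS


def find_tools(names=("git", "nvidia-smi", "rocm-smi")):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "recommended:", python_is_recommended())
    for tool, path in find_tools().items():
        print(f"{tool}: {path or 'not found'}")
```

A missing `nvidia-smi` or `rocm-smi` does not prove the driver stack is broken, it just means the tool is not on PATH.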

Now for actually installing the runtime. There are a great number of inference engines supporting different quantizations, forgive the Reddit link but see: https://old.reddit.com/r/LocalLLaMA/comments/1fg3jgr/a_large_table_of_inference_engines_and_supported/

As far as I am concerned, 3 matter to "home" hosters on consumer GPUs:

Exllama (and by extension TabbyAPI), as a very fast, very memory efficient "GPU only" runtime, supports AMD via ROCm and Nvidia via CUDA: https://github.com/theroyallab/tabbyAPI
Aphrodite Engine. While not strictly as vram efficient, it's much faster with parallel API calls, reasonably efficient at very short context, and supports just about every quantization under the sun and more exotic models than exllama. AMD/Nvidia only: https://github.com/PygmalionAI/Aphrodite-engine
This fork of kobold.cpp, which supports more fine grained kv cache quantization (we will get to that). It supports CPU offloading and I think Apple Metal: https://github.com/Nexesenex/croco.cpp

Now, there are also reasons I don't like llama.cpp, but one of the big ones is that sometimes its model implementations have... quality degrading issues, or odd bugs. Hence I would generally recommend TabbyAPI if you have enough vram to avoid offloading to CPU, and can figure out how to set it up. So:

Open a terminal, run git clone https://github.com/theroyallab/tabbyAPI.git
cd tabbyAPI
Follow this guide for setting up a python venv and installing pytorch and tabbyAPI: https://github.com/theroyallab/tabbyAPI/wiki/01.-Getting-Started#installing

This can go wrong; if anyone gets stuck, I can help with that.

Next, figure out how much VRAM you have.
Figure out how much "context" you want, aka how much text the llm can ingest. If a model has a context length of, say, "8K" that means it can support 8K tokens as input, or less than 8K words. Not all tokenizers are the same, some like Qwen 2.5's can fit nearly a word per token, while others are more in the ballpark of half a word per token or less.
Keep in mind that the actual context length of many models is an outright lie, see: https://github.com/hsiehjackson/RULER
Exllama has a feature called "kv cache quantization" that can dramatically shrink the VRAM the "context" of an LLM takes up. Unlike llama.cpp, its Q4 cache is basically lossless, and on a model like Command-R, an 80K+ context can take up less than 4GB! It's essential to enable Q4 or Q6 cache to squeeze in as much LLM as you can into your GPU.
With that in mind, you can search huggingface for your desired model. Since we are using tabbyAPI, we want to search for "exl2" quantizations: https://huggingface.co/models?sort=modified&search=exl2
There are all sorts of finetunes... and a lot of straight-up garbage. But I will post some general recommendations based on total vram:
4GB: A very small quantization of Qwen 2.5 7B. Or maybe Llama 3B.
6GB: IMO llama 3.1 8B is best here. There are many finetunes of this depending on what you want (horny chat, tool usage, math, whatever). For coding, I would recommend Qwen 7B coder instead: https://huggingface.co/models?sort=trending&search=qwen+7b+exl2
8GB-12GB: Qwen 2.5 14B is king! Unlike its 7B counterpart, I find the 14B version of the model incredible for its size, and it will squeeze into this vram pool (albeit with very short context/tight quantization for the 8GB cards). I would recommend trying Arcee's new distillation in particular: https://huggingface.co/bartowski/SuperNova-Medius-exl2
16GB: Mistral 22B, Mistral Coder 22B, and very tight quantizations of Qwen 2.5 32B are possible. Honorable mention goes to InternLM 2.5 20B, which is alright even at 128K context.
20GB-24GB: Command-R 2024 35B is excellent for "in context" work, like asking questions about long documents, continuing long stories, anything involving working "with" the text you feed to an LLM rather than pulling from its internal knowledge pool. It's also quite good at longer contexts, out to 64K-80K more-or-less, all of which fits in 24GB. Otherwise, stick to Qwen 2.5 32B, which still has a very respectable 32K native context, and a rather mediocre 64K "extended" context via YaRN: https://huggingface.co/DrNicefellow/Qwen2.5-32B-Instruct-4.25bpw-exl2
32GB: Same as 24GB, just with a higher bpw quantization. But this is also the threshold where lower bpw quantizations of Qwen 2.5 72B (at short context) start to make sense.
48GB: Llama 3.1 70B (for longer context) or Qwen 2.5 72B (for 32K context or less)

Again, browse huggingface and pick an exl2 quantization that will cleanly fill your vram pool + the amount of context you want to specify in TabbyAPI. Many quantizers such as bartowski will list how much space they take up, but you can also just look at the available filesize.
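To make "will it fit" a bit more concrete, here is a rough back-of-the-envelope sketch. It estimates weight size from parameter count and bpw, and KV cache size from layer count, KV heads, head dimension and context length. The model dimensions in the example are illustrative assumptions, not values read from any real config, and it ignores quantization scale overhead, activations and runtime buffers, so treat the result as a lower bound.

```python
GIB = 2**30


def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(layers, kv_heads, head_dim, context_tokens, bits_per_element):
    """Approximate KV cache size in GiB.

    One key and one value vector (hence the factor of 2) is stored
    per layer, per KV head, per token.
    """
    elements = 2 * layers * kv_heads * head_dim * context_tokens
    return elements * bits_per_element / 8 / GIB


def words_that_fit(context_tokens, words_per_token):
    """Rough number of words a context holds for a given tokenizer ratio."""
    return int(context_tokens * words_per_token)


if __name__ == "__main__":
    # Hypothetical 32B model at 4.25 bpw with assumed dimensions.
    w = weights_gib(32, 4.25)
    for label, bits in (("FP16", 16), ("Q4", 4)):
        kv = kv_cache_gib(layers=64, kv_heads=8, head_dim=128,
                          context_tokens=32768, bits_per_element=bits)
        print(f"{label} cache: {kv:.2f} GiB, weights + cache: {w + kv:.2f} GiB")
    print("Words in 8K tokens at half a word per token:", words_that_fit(8192, 0.5))
```

With these assumed numbers, the cache shrinks by 4x going from FP16 to Q4, which is the whole reason to turn cache quantization on before you pick a bpw.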

Now... you have to download the model. Bartowski has instructions here, but I prefer to use this nifty standalone tool instead: https://github.com/bodaay/HuggingFaceModelDownloader
Put it in your TabbyAPI models folder, and follow the documentation on the wiki.
There are a lot of options. Some to keep in mind are chunk_size (higher than 2048 will process long contexts faster but take up lots of vram, less will save a little vram), cache_mode (use Q4 for long context, Q6/Q8 for short context if you have room), max_seq_len (this is your context length), tensor_parallel (for faster inference with 2 identical GPUs), and max_batch_size (parallel processing if you have multiple users hitting the tabbyAPI server, but more vram usage)
Now... pick your frontend. The tabbyAPI wiki has a good compilation of community projects, but Open Web UI is very popular right now: https://github.com/open-webui/open-webui I personally use exui: https://github.com/turboderp/exui
And be careful with your sampling settings when using LLMs. Different models behave differently, but one of the most common mistakes people make is using "old" sampling parameters for new models. In general, keep temperature very low (<0.1, or even zero) and rep penalty low (1.01?) unless you need long, creative responses. If available in your UI, enable DRY sampling to tamp down repetition without "dumbing down" the model with too much temperature or repetition penalty. Always use a MinP of 0.05 or higher and disable other samplers. This is especially important for Chinese models like Qwen, as MinP cuts out "wrong language" answers from the response.
Now, once this is all set up and running, I'd recommend throttling your GPU, as it simply doesn't need its full core speed to maximize its inference speed while generating. For my 3090, I use something like sudo nvidia-smi -pl 290, which throttles it down from 420W to 290W.
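For anyone wondering what MinP actually does, here is a toy sketch of the idea as I understand it: drop every token whose probability is below min_p times the probability of the most likely token, then renormalize what is left. This is not exllama's or tabbyAPI's code, and the token names and probabilities are made up.

```python
def min_p_filter(probs, min_p=0.05):
    """Keep tokens with probability >= min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


if __name__ == "__main__":
    # Made-up next-token distribution with one low-probability stray token.
    toy = {"the": 0.60, "a": 0.30, "some": 0.08, "token_in_other_language": 0.02}
    for tok, p in min_p_filter(toy, min_p=0.05).items():
        print(f"{tok}: {p:.3f}")
```

The cutoff scales with the model's confidence: when the top token is very likely, more of the tail gets cut, and when the distribution is flat, more candidates survive.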

Sorry for the wall of text! I can keep going, discussing kobold.cpp/llama.cpp, Aphrodite, exotic quantization and other niches like that if anyone is interested.