fhein

joined 2 years ago
[–] fhein 20 points 2 months ago

Seems like my "fuck EA I'm not giving them money ever again" policy is beginning to pay off :)

[–] fhein 2 points 2 months ago

I bought a Razer Basilisk 3 because it was the only mouse where I could reach both thumb buttons with the fingertip-ish grip I use. It wasn't fully supported by Linux software at first, but worst case I could program it on Windows, which I had on a dual boot at the time. Now that I can use it with Polychromatic and OpenRazer it actually works better on Linux: on Windows the Razer software won't let me save individual LED colours to the mouse, and it needs to be running all the time to apply them.

[–] fhein 3 points 2 months ago* (last edited 2 months ago)

My guess is that Microsoft will provide their own kernel-level anticheat to game developers, using a secure API that will be impossible to emulate with Wine etc.

[–] fhein 1 points 2 months ago

We just had Windows Update brick itself due to a faulty update. The fix required updating the machines manually while connected to the office network, making them unusable for 2-3 hours. Another issue we've had is that Windows appears to be monopolizing virtualization HW acceleration for some memory integrity protection, which made our VMs slow and laggy. Fixing it required a combination of shell commands, settings changes and IT support remotely changing some permissions, and the issue comes back after some updates.

Though I've also had quite a lot of Windows problems at home, back when I was still using it regularly. I'm not saying Linux usage has been problem-free, but there I can at least fix things. Windows has a tendency to give unusable error messages and make troubleshooting difficult, and even when you figure out what's wrong, you're at the mercy of Microsoft as to whether you're allowed to change things on your own computer, due to the operating system's proprietary nature.

[–] fhein 6 points 2 months ago

Already? I'm still using Fedora 39 since that's the only version supported by CUDA Toolkit :S

[–] fhein 2 points 2 months ago

The article is written in a slightly confusing way, but you'll most likely want to turn off Nvidia's automatic VRAM swapping if you're on Windows, so it doesn't happen by accident. Partial offloading with llama.cpp is much faster AFAIK if you want to split the model between GPU and CPU, and it's easier to find out how many layers you can offload, since it simply fails to load when you set the number too high.

Also, if you want to experiment with partial offloading, maybe a 12B around Q4 would be more interesting than the same 7B model at higher precision? I haven't checked whether anything new has come out in the last couple of months, but Mistral Nemo is fairly good IMO, though you might need to limit context to 4k or so.

[–] fhein 2 points 2 months ago

S has gradually traded social democracy for neoliberalism.. Is it because people who are only after money and power have managed to work their way up within the party, and now do as they please?

[–] fhein 3 points 2 months ago (2 children)

Isn't that a fairly central part of right-wing politics, spending as little money as possible on public investment and the public sector? And of course cuts were needed to finance the jobbskatteavdrag, RUT and ROT tax deductions. Maybe a contributing reason is that we've had such ambivalent politics in this country over the last few decades: people vote with their wallets and then get surprised and angry when society no longer works the way it should.

[–] fhein 1 points 2 months ago (1 children)

Mixtral in particular runs great with partial offloading; I used a Q4_K_M quant while only having 12GB of VRAM.

To answer your original question, I think it depends on the model and use case. Complex logic such as programming seems to suffer the most from quantization, while RP/chat can take much heavier quantization while staying coherent. Most people seem to think quantization around 4-5 bpw gives the best value, and you get real diminishing returns above 6 bpw, so I know few who think it's worth using 8 bpw.

Personally I always use the largest models I can. With Q2 quantization the 70B models I've used occasionally give bad results, but they often feel smarter than a 35B at Q4. Though it's of course difficult to compare models from completely different families, e.g. command-r vs llama, and there aren't that many options in the 30B range. I'd take a 35B Q4 over a 12B Q8 any day though, and a 12B Q4 over a 7B Q8, etc. In the end I think you'll have to test for yourself and see which model and quant combination gives the best results at an inference speed you consider usable.
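As a rough sanity check on why a heavily quantized large model can compete with a lightly quantized smaller one, weight-file size scales with parameter count times bits per weight. A minimal sketch (the bpw values are approximations for Q2_K- and Q4_K_M-style quants, and it ignores per-format overhead):

```python
def model_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB: parameters x bits-per-weight / 8 bits-per-byte."""
    return params_billion * 1e9 * bpw / 8 / 1e9

# A 70B model at ~2.6 bpw vs a 35B model at ~4.8 bpw land in a similar memory budget:
print(round(model_size_gb(70, 2.6), 2))  # 22.75 (GB)
print(round(model_size_gb(35, 4.8), 2))  # 21.0 (GB)
```

So for roughly the same VRAM+RAM footprint you can choose between more parameters at lower precision or fewer parameters at higher precision, which is exactly the trade-off being discussed.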

[–] fhein 3 points 2 months ago

On Linux, AMD GPUs work significantly better than Nvidia ones. If you have a choice, choose an AMD

Unless you're interested in AI stuff, in which case Nvidia is still the best choice. Some libraries are HW-accelerated on AMD, and hopefully more will work in the future.

[–] fhein 2 points 2 months ago

Ofc I know it's not meant to be taken literally, but talking about whether or not to kill black people is too direct. The subjects people like this usually want to discuss tend to be more layered, e.g. "what should we do about the Jew problem", so that if you take the bait you implicitly accept that "the Jew problem" exists in the first place.

[–] fhein 4 points 3 months ago

Of course it's always only about serious crimes when laws are being pushed through. One day the national police commissioner might start to "believe it would make a difference" if they also used their fancy new system to keep track of where every citizen is at all times, in case someone might decide to commit a crime in the future.
