What's ironic is that the local llm/diffusion communities will not touch these. They're just too slow, and impossibly finicky to set up with models big enough for people to actually want to use.
AMD's next gen could change that, but they've already poisoned the branding. Good job.
Bitnet is still theoretical at this point, and NPUs don't support it anyway.
Basically they are useless for large models :P
The newest AMD/Intel iGPUs are OK for hosting models up to ~14B, though. Maybe 32B with the right BIOS settings, if you don't mind very slow output.
If I were you, on a 3080, and keeping desktop VRAM usage VERY minimal, I would run TabbyAPI with a 4bpw exl2 quantization of Qwen 2.5 14B (coder, instruct, or an RP finetune... pick your flavor). I'd recommend this one in particular:
https://huggingface.co/bartowski/SuperNova-Medius-exl2/tree/4_25
Run it with Q6 cache and set the context to around 16K, or whatever you can fit in your VRAM.
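Rough idea of what that looks like in TabbyAPI's config.yml (key names and values from memory, so treat this as a sketch and double-check against the sample config in the TabbyAPI repo; the model folder name is whatever you named your download):

```yaml
# Relevant bits of TabbyAPI's config.yml -- a sketch, not a drop-in file.
model:
  model_dir: models
  # folder containing the 4.25bpw exl2 quant you downloaded
  model_name: SuperNova-Medius-exl2-4_25
  max_seq_len: 16384   # ~16K context; lower it if you run out of VRAM
  cache_mode: Q6       # quantized KV cache, much smaller than FP16
```

Then point whatever OpenAI-compatible frontend you like at the TabbyAPI endpoint.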
I guarantee this will blow away whatever Llama 8B setup you have.