BitNet is still theoretical right now, and NPUs don't support it anyway.
Basically, NPUs are useless for large models :P
The IGPs on the newest AMD/Intel chips are OK for hosting models up to around 14B though. Maybe 32B with the right BIOS settings, if you don't mind very slow output.
If I were you, on a 3080, and if you keep desktop VRAM usage VERY minimal, I would run TabbyAPI with a 4bpw exl2 quantization of Qwen 2.5 14B: coder, instruct, or an RP finetune... pick your flavor. I'd recommend this one in particular.
https://huggingface.co/bartowski/SuperNova-Medius-exl2/tree/4_25
Run it with Q6 cache and set the context to like 16K, or whatever you can fit in your vram.
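For a rough sense of why 16K is about the limit, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not something measured: the ~14.7B parameter count and the 48 layers / 8 KV heads / 128 head dim are what I believe the Qwen 2.5 14B config uses, 4.25 bpw comes from the linked branch name, and the function names are made up. It also ignores activations, the quantized cache's scale overhead, and whatever your desktop is using, so treat the result as a floor.

```python
# Rough VRAM estimate for a quantized model plus a quantized KV cache.
# All model numbers below are assumptions; check the model's config.json.

GIB = 1024 ** 3


def weight_bytes(n_params, bits_per_weight):
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_bits):
    """Approximate KV cache size in bytes.

    The factor of 2 covers one K and one V tensor per layer.
    Ignores the per-block scale overhead of quantized caches.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * cache_bits / 8


if __name__ == "__main__":
    weights = weight_bytes(14.7e9, 4.25)             # assumed param count, 4.25 bpw
    cache = kv_cache_bytes(48, 8, 128, 16 * 1024, 6)  # assumed config, 16K ctx, Q6
    print(f"weights: {weights / GIB:.2f} GiB")
    print(f"kv cache: {cache / GIB:.2f} GiB")
    print(f"total:   {(weights + cache) / GIB:.2f} GiB")
```

If those assumptions hold, the total lands somewhere over 8 GiB, which is why there isn't much headroom left on a 10GB card. Bump `context_len` up or down to see what your VRAM can take.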
I guarantee this will blow away whatever Llama (8B) setup you have.
I don't think anyone wants a hot war in NK, and I'm not sure what good it would do.
Europe needs to get off its butt (and should have already) and send every piece of hardware it has to Ukraine though, even the cutting-edge stuff. Maybe even enforce a no-fly zone. As I keep asking, what are they waiting for... Spain to invade France? No, they built all this stuff to deter Soviet aggression, and it's just sitting there, rotting instead of doing its job. If Ukraine had stayed secure, they basically would never have had to worry about this again.
Now they have no excuse. Russia clearly has no shame. And it's almost (but not quite) too late.