If you have a bunch of nodes, what do you need redundant NICs for? The other nodes should pick up the slack.
It's unlikely for the NIC or cable to suddenly go bad. If you only have one switch, you're not protected against its failure, either.
I plan to have 2 switches.
Of course, if a switch fails, client devices connected to the switch would drop out, but any computer connected to both switches should have link redundancy.
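To make that failure case concrete, here is a rough Python sketch that checks which hosts can still talk to each other when one switch dies. The host names, the switch names, and the link between the two switches are all made up for illustration; this is not the actual setup from the thread.

```python
# Hypothetical topology: which switches each host is cabled to.
LINKS = {
    "node1": {"sw1", "sw2"},   # dual-homed cluster nodes
    "node2": {"sw1", "sw2"},
    "node3": {"sw1", "sw2"},
    "laptop": {"sw1"},         # single-homed client
}
# Assumption: the two switches are also cabled to each other.
SWITCH_LINKS = [("sw1", "sw2")]


def reachable_hosts(start, failed_switch=None):
    """Return the hosts `start` can still reach when `failed_switch` is down.

    Only switches forward traffic here; hosts are treated as endpoints.
    """
    def live(sw):
        return sw != failed_switch

    # Start from the live switches the host is plugged into.
    frontier = [sw for sw in LINKS[start] if live(sw)]
    seen = set(frontier)
    while frontier:
        sw = frontier.pop()
        for a, b in SWITCH_LINKS:
            if sw in (a, b):
                other = b if sw == a else a
                if live(other) and other not in seen:
                    seen.add(other)
                    frontier.append(other)
    # A host is reachable if it shares at least one live, reachable switch.
    return {h for h, sws in LINKS.items() if h == start or sws & seen}


if __name__ == "__main__":
    for failed in (None, "sw1", "sw2"):
        print(failed, sorted(reachable_hosts("node1", failed)))
```

With this made-up topology, losing `sw1` cuts off only the single-homed laptop, while the dual-homed nodes keep seeing each other through `sw2`.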
There are still tons of reasons to have redundant data paths down to the switch level.
At the enterprise level, we assume even the switch can fail. As an additional note, only some smart/managed switches (typically the ones with removable modules that cost five to six figures USD per chassis) can run a firmware upgrade without blocking network traffic.
So both for failure cases and for keeping traffic flowing during an upgrade, you absolutely want two switches if that's your jam.
On my home system, I actually have four core switches: a Catalyst 3750X stack of two nodes for L3 and 1Gb/s switching, and then all my "fast stuff" is connected to a pair of ES-16-XG, each of which has a port channel of two 10G DACs back to the Catalyst stack, with one leg to each stack member.
To the point about NICs going bad: you're right, it's infrequent, but it can happen, especially with consumer rather than enterprise hardware. Also, at the 10G fiber level, though infrequent, you still see SFPs and DACs go bad at a higher rate than NICs.
That product will never exist, as there are only a handful of customers who would want it and even fewer who would pay for it.
Also, look up the MTBF reports. It's more likely that all your client systems will fail before a switch does.
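For anyone who wants to turn an MTBF figure into something comparable, here is a small Python sketch. It assumes a constant failure rate (the usual exponential model), and the MTBF numbers at the bottom are placeholders, not values from any datasheet.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annual_failure_probability(mtbf_hours):
    """Chance of at least one failure within a year of 24/7 operation.

    Assumes a constant failure rate, i.e. P = 1 - exp(-t / MTBF).
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def both_fail_probability(p_a, p_b):
    """Chance that two independent devices each fail within the same year.

    This is an upper bound on a simultaneous outage, since the two
    failures would also have to overlap in time.
    """
    return p_a * p_b


if __name__ == "__main__":
    # Placeholder MTBF values, purely for illustration.
    switch = annual_failure_probability(200_000)
    usb_nic = annual_failure_probability(50_000)
    print(f"switch:  {switch:.1%}")
    print(f"usb nic: {usb_nic:.1%}")
    print(f"both switches in one year: {both_fail_probability(switch, switch):.2%}")
```

Plug in the figures from whatever MTBF reports you find; the point is only that the comparison between a switch and a client NIC falls out of the same formula.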
I'm going to go a different route than your question. If you have a spare M.2 slot and room in your PC, you can install an M.2 network adapter. I recently installed an M.2 to 2.5GbE adapter in a Dell 3060 SFF as a proof of concept at home for getting a Proxmox Ceph cluster working over 2.5GbE.
This is the way to do it for mini PCs in my experience. Unless for some reason the box you're using only allows a whitelist of WLAN cards, but I haven't run into any that does that yet.
Oh that's nice. I have a second m2 slot in my TH80. Putting a stronger network card there could be a cool future upgrade
I have been trying to do bonds with USB adapters and while it usually seems to work fine at first, they just seem to randomly drop out when run 24/7 so I stopped doing that. In theory it seems like a good idea though.
You just saved me a headache.
> I have been trying to do bonds with USB adapters
-- If you’re doing it for performance, you should compare a low-end 2.5GbE switch and cards to all that complexity. Higher performance, simpler, more reliable.
-- if it’s to learn about bonding, consider how many you need and whether doing the same thing multiple times is a benefit
-- if it’s for redundancy/reliability, I don’t think this is going to work. My plan is to build a cluster of single board computers and do everything in containers. Keep the apps portable and the hardware replaceable
Sure, but you forgot about reusing perfectly good older 1gbit equipment with sufficient ports to do nice 4gbit bonds. I have been doing that with 4 port Intel NIC PCIe expansion cards for a while on those servers with free slots, but on those thin clients re-purposed as servers that is usually not the case.
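One thing worth keeping in mind with 4x1Gbit bonds: with hash-based modes, each flow is pinned to one physical link, so a single transfer still tops out at 1Gbit/s and only many flows together approach 4Gbit/s. The Python sketch below is loosely modeled on the `layer2` transmit hash policy described in the Linux bonding documentation (XOR of the last MAC bytes and the packet type, modulo the link count). It is a simplification for illustration, not the kernel's actual code, and the MAC addresses are made up.

```python
def pick_link(src_mac, dst_mac, link_count, ether_type=0x0800):
    """Pick an outgoing link index for a frame (simplified layer2-style hash).

    Uses only the last byte of each MAC address plus the EtherType,
    so every frame between the same two hosts lands on the same link.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last ^ ether_type) % link_count


if __name__ == "__main__":
    server = "aa:bb:cc:00:00:01"  # hypothetical addresses
    for client in ("aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03", "aa:bb:cc:00:00:05"):
        print(client, "-> link", pick_link(server, client, link_count=4))
```

Different clients spread across the four links, but any one client always gets the same link, which is why the comparison with a single 2.5GbE link is not as lopsided as 4 versus 2.5 suggests.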
I just have that happen in general with USB NICs. Random drops for seemingly no reason.
They're not meant for infrastructural use, just as travel adapters.
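If you do run USB adapters in a bond anyway, it helps to at least notice when one drops out. On Linux the bonding driver reports per-interface state in `/proc/net/bonding/<bond>`; the Python sketch below parses that text and lists the members whose link is not up. The bond name `bond0`, the interface names, and the sample text are assumptions for illustration.

```python
# Sample text in the style of /proc/net/bonding/bond0 (interface names are made up).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Link Failure Count: 0

Slave Interface: usb1
MII Status: down
Speed: Unknown
Link Failure Count: 3
"""


def parse_bond_status(text):
    """Return {interface: {"mii_status": str, "link_failures": int}}."""
    slaves = {}
    current = None  # stays None while we are still in the bond-wide header
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = {"mii_status": None, "link_failures": 0}
        elif current is not None and key == "MII Status":
            slaves[current]["mii_status"] = value
        elif current is not None and key == "Link Failure Count":
            slaves[current]["link_failures"] = int(value)
    return slaves


def down_slaves(text):
    """Names of bond members whose MII status is anything other than 'up'."""
    status = parse_bond_status(text)
    return sorted(name for name, info in status.items() if info["mii_status"] != "up")


if __name__ == "__main__":
    print(down_slaves(SAMPLE))
```

On a real box you would feed it `open("/proc/net/bonding/bond0").read()` from a cron job or timer and alert when the list is non-empty; the link failure counter is also a decent way to see which adapter is the flaky one.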
Such an abomination :-)
I can't speak to that brand specifically, but the USB Ethernet adapters I have used are super unreliable. I have had 2 out of 3 burn out randomly. So it might not be as redundant as you would like.
I've used the tplink ones that they're using and they've been pretty solid. I can't say how they'd fare in a 24/7 setup though since they're not really intended for that.
Why do you need a redundant network is basically what I'm wondering? It seems like an odd match up with small consumer boxes.
There would be some quality-of-life improvements, like being able to replace a switch without pulling down the entire cluster, but it is mostly for a challenge.
A good answer to a "why?" question is "why not?" This can be a great learning or practice opportunity for redundant network links and other interface challenges.
Absolutely this. I learned so much on my homelab that at this point has more resiliency than some medium businesses (n+1 switching, vSAN for critical VMs, n+0.5 UPS for 15 minutes)
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:
Fewer Letters | More Letters |
---|---|
AP | WiFi Access Point |
PCIe | Peripheral Component Interconnect Express |
RAID | Redundant Array of Independent Disks for mass storage |
SSD | Solid State Drive mass storage |
4 acronyms in this thread; the most compressed thread commented on today has 17 acronyms.
[Thread #199 for this sub, first seen 8th Oct 2023, 16:15]
If those are gigabit, I think I have that exact adapter. I have never used it in production, but I have not run into any issues using it with a laptop when diagnosing. Theoretically you can connect hosts directly to each other via USB3 à la Level1 and have really fast throughput, but I have not even started investigating this.
The Level1 video shows Thunderbolt networking though. It is an interesting concept, but it requires nodes with at least 2 Thunderbolt ports in order to have more than 2 nodes.
You are right, I typed the wrong thing.
Adding to this: I have those adapters too, and FYI they don't support jumbo frames.
Nice idea if you actually have the rest of the redundant network, uplink and all that jazz (otherwise you're wasting time and money).
The reason this won't ever be a product is that if you're serious about your redundancy, you're installing extra NICs inside the servers, which are ideally not second-hand. The only target market for such a product would be you.
Also: do these servers not have PCIe slots inside? Is there truly no way of adding NICs inside?
Yes, the entire network is supposed to be redundant and load-balanced, except for some clients that can only connect to one switch (but if a switch fails it should be trivial to just move them to another switch.)
I am choosing Dell OptiPlex boxes because they are the smallest x86 nodes I can get my hands on. There is no PCIe slot in them other than the M.2 slot, which will be used for the SSD.
Get a 10GbE nic and OpenVswitch
Why not just use a separate switch and wireless AP for redundancy? Wi-Fi can be your backup if your wired switch goes down. Assuming your Dells have Wi-Fi cards, that is.
Personal Private Cloud then all the money spent on a truly redundant network backbone is far better spent on just about anything else. Especially if your solution relies on notoriously unreliable USB NICs.
If redundant everything is important then you need to change your planning toward proper rack servers and switches, only way to get a cost effective reliable redundant setup. Use CEPH rather than RAID and ideally, if your workloads are runnable as containers, use a solution that allows you to abstract away the whole server concept as well. Hyper-converged infrastructure is excellent for cost effective setups.
> If redundant everything is important then you need to change your planning toward proper rack servers and switches
I ain't got that budget man.
Then skip redundant networking and use that cash on the router hardware such that you get enough ports that you won't need a switch
It is for a challenge, the goal is to build a cloud with workload decoupled from servers decoupled from users who'd deploy the workload, with redundant network and storage, no single choke point for network traffic, and I am trying to achieve this with a small budget.
Ah okay. If you don't have any PCI slots available, then USB is going to be the only option. Though if you have WiFi you could use that as your failover solution.
Basically, from a hardware standpoint, I'd say a minimum viable solution would look like (no wifi):
2 cheap 4-6 port switches. Preferably with an uplink port, and then you can mock a redundant ISP connection by putting a splitter on the WAN connection and running it into each of the two switches.
3-4 worker nodes, each connected to both switches. Preferably a PCI NIC, but if that is a non-starter then USB. Though here is where pretty much every cent should be spent. It's important that each has a minimum of 200 GB storage, because that enables you to abstract storage into a CEPH pool and still have enough space to run some decent example workloads.
All other redundancy can and should be handled in software. OpenStack is one approach, Proxmox can probably also work.
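To get a feel for what 200 GB per node actually buys you once the storage is pooled, here is a back-of-the-envelope Python sketch. It assumes a replicated pool that keeps 3 copies of everything and that you stop filling the cluster at about 85% of raw capacity; both numbers are assumptions you should check against your own pool settings, and the function name is made up.

```python
def usable_capacity_gb(node_count, gb_per_node, replicas=3, fill_ratio=0.85):
    """Rough usable space of a replicated pool.

    replicas:   copies kept of each object (assumed 3 here)
    fill_ratio: fraction of raw space you are willing to fill (assumed 0.85)
    """
    raw_gb = node_count * gb_per_node
    return raw_gb * fill_ratio / replicas


if __name__ == "__main__":
    for nodes in (3, 4):
        print(f"{nodes} nodes x 200 GB -> ~{usable_capacity_gb(nodes, 200):.0f} GB usable")
```

So under these assumptions the 3-4 node setup lands somewhere in the low hundreds of GB of usable space, which is still plenty for example workloads but a lot less than the raw total suggests.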