this post was submitted on 19 Jul 2024

632 points (98.5% liked)

An angry admin shares the CrowdStrike outage experience (www.theregister.com)

submitted 6 months ago by [email protected] to c/technology

136 comments

IT administrators are struggling to deal with the ongoing fallout from the faulty CrowdStrike update. One spoke to The Register to share what it is like at the coalface.

Speaking on condition of anonymity, the administrator, who is responsible for a fleet of devices, many of which are used within warehouses, told us: "It is very disturbing that a single AV update can take down more machines than a global denial of service attack. I know some businesses that have hundreds of machines down. For me, it was about 25 percent of our PCs and 10 percent of servers."

He isn't alone. An administrator on Reddit said 40 percent of servers were affected, along with 70 percent of client computers stuck in a bootloop, or approximately 1,000 endpoints.

Sadly, for our administrator, things are less than ideal.

Another Redditor posted: "They sent us a patch but it required we boot into safe mode.

"We can't boot into safe mode because our BitLocker keys are stored inside of a service that we can't login to because our AD is down.

[–] [email protected] 201 points 6 months ago (2 children)

Pity the administrators who dutifully kept a list of those keys on a secure server share, only to find that the server is also now showing a screen of baleful blue.

Lol, can you imagine? It empathetically hurts me even thinking of this situation. Enter that brave hero who kept the fileshare decryption key in a local keepass :D

[–] [email protected] 129 points 6 months ago* (last edited 6 months ago) (1 children)

That's why the 3-2-1 rule exists:

3 copies of everything on
2 different forms of media with
1 copy off site

For something like keys, that means:

secure server share
server share backup at a different site
physical copy (a USB drive, a printout kept in a safe, etc.)

Any IT pro should be aware of this "rule." Oh, and periodically test restoring from a backup to make sure the backup actually works.
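As a rough illustration of the rule above, here is a minimal Python sketch that checks an inventory of copies against 3-2-1. The `BackupCopy` type, the `check_3_2_1` function, and the media labels are made up for this example and are not taken from any real tool; the inventory simply mirrors the three key locations listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One stored copy of the data (hypothetical record type)."""
    name: str
    media: str      # free-form label, e.g. "disk", "paper", "usb"
    offsite: bool   # True if stored away from the primary site


def check_3_2_1(copies):
    """Return a list of 3-2-1 violations; an empty list means compliant."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need at least 3 copies, found {len(copies)}")
    media_types = {copy.media for copy in copies}
    if len(media_types) < 2:
        problems.append(f"need at least 2 media types, found {len(media_types)}")
    if not any(copy.offsite for copy in copies):
        problems.append("need at least 1 copy off site")
    return problems


# Assumed inventory for the key example in the comment above.
KEY_COPIES = [
    BackupCopy("secure server share", media="disk", offsite=False),
    BackupCopy("server share backup at a different site", media="disk", offsite=True),
    BackupCopy("printout in a safe", media="paper", offsite=False),
]

if __name__ == "__main__":
    issues = check_3_2_1(KEY_COPIES)
    print("3-2-1 compliant" if not issues else "\n".join(issues))
```

A check like this only confirms that the copies are listed, not that they can be restored, so it complements a periodic restore test rather than replacing it.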

[–] IphtashuFitz 38 points 6 months ago (1 children)

We have a cron job that once a quarter files a ticket with whoever is on-call that week to test all our documented emergency access procedures to ensure they’re all working, accessible, up-to-date etc.
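A quarterly drill like the one described above could be sketched roughly as follows. This is an assumption-laden illustration, not the commenter's actual setup: the procedure list, the script path in the crontab comment, and the `build_drill_ticket` function are all hypothetical, and actually submitting the ticket is left out because it depends on whichever ticketing system is in use.

```python
from datetime import date

# Example crontab entry (the script path is hypothetical):
#   0 9 1 */3 * /usr/local/bin/emergency_access_drill.py
# This fires at 09:00 on the 1st of January, April, July and October.

# Hypothetical list of documented emergency access procedures.
PROCEDURES = [
    "Retrieve BitLocker recovery keys from the off-site backup",
    "Open the physical safe and verify the printed key list",
    "Log in with the break-glass admin account",
]


def quarter_label(day: date) -> str:
    """Return a label such as '2024-Q3' for the given date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def build_drill_ticket(day: date, on_call: str, procedures) -> dict:
    """Build the ticket payload; filing it is left to the ticketing system."""
    checklist = "\n".join(f"- [ ] {step}" for step in procedures)
    return {
        "title": f"Emergency access drill {quarter_label(day)}",
        "assignee": on_call,
        "body": "Test each procedure and confirm it is working, "
                "accessible and up to date:\n" + checklist,
    }


if __name__ == "__main__":
    # "alice" stands in for whoever the on-call lookup returns.
    ticket = build_drill_ticket(date.today(), "alice", PROCEDURES)
    print(ticket["title"])
    print(ticket["body"])
```

Keeping the payload builder separate from the scheduler makes the drill easy to test without waiting for the next quarter.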

[–] [email protected] 8 points 6 months ago

Are you hiring!?

[–] kescusay 64 points 6 months ago (1 children)

Seems like an argument for a heterogeneous environment, perhaps a solid and secure Linux server to host important keys like that.

[–] [email protected] 55 points 6 months ago (4 children)

Linux can shit the bed too. You need to maintain a physical copy.

[–] [email protected] 52 points 6 months ago

Their point is not that Linux can't fail, it's that a mix of Windows and Linux is better than just one. That's what "heterogeneous environment" means.

You should think of your network environment like an ecosystem; monocultures are vulnerable to systemic failure. Diverse ecosystems are more resilient.

[–] [email protected] 28 points 6 months ago (1 children)

Sure, but the chance of your Windows and Linux machines shitting the bed at the same time is lower than if everything is running Windows. It's exactly the same reason you keep a physical copy (which, after all, can break, burn down, etc.): more baskets to spread your eggs across.

[–] [email protected] 9 points 6 months ago (1 children)

Very few businesses are going to spend the money running redundant infrastructure on two different operating systems. Most of them won't even spend the money on a proper DR plan.

[–] [email protected] 26 points 6 months ago (1 children)

Then they get to suffer the consequences when shit like this happens

[–] [email protected] 6 points 6 months ago

Then they get to suffer the consequences when shit like this happens

Oh, they are.

[–] noobface 12 points 6 months ago

Hey Ralph can you get that post-it from the bottom of your keyboard?

[–] StaySquared 4 points 6 months ago (2 children)

CS did take down Linux a few years back. I forget the exact details.

[–] Avatar_of_Self 4 points 6 months ago

Yes, but has it taken both OSes out at the same time? It hasn't. It could happen, but the chances are even lower. There's obvious risk mitigation in mixing vendors in infrastructure for both hardware and software in the enterprise.

If some critical services were lost in your enterprise last time until RH updated their kernel then you could have benefitted from running that service from Windows as well. Now the reverse is true. You could have another DC via Samba on Linux in your forest if you wanted to, in order to have an AD still for example. Same goes for file share servers, intermediary certificate servers (hopefully your Root CA is not always on the network) and pretty much most critical services.

Most enterprises run a lot of services on a hypervisor and have spare capacity to scale (or they are already on a sinking ship), so you can just spin up VMs to do that. It isn't unreasonably labor-intensive compared to other, similar risk mitigations. Any sane CCB (obviously there are edge cases, but we are talking in general here) will even let you get away without a vendor support contract for those, since they are only for emergency redundancy and nowhere near critical unless the critical services have already shit the bed.

[–] [email protected] 1 points 6 months ago

Sounds like we may have an easier conclusion to draw here