These stories were originally posted over the past decade on Reddit's TalesfromTechSupport, so I am copying them over to Lemmy to help bring some life into this /c/
Some of you know I work for an ISP in a land down under. This incident took place a few ~~months~~ years ago when Apple iOS 7.1 came out.
Just got back from lunch one day when one of our layer 2 wholesalers calls up to log a "fault".
Me: G'day slazer speaking
ResellerIT: Hi mate, I am wanting to log a speed fault with one of our private schools.
Me: no worries mate. What school?
ResellerIT: RegionalPrivateSchool. Your favourite one. They are getting really high latency and only between 5 and 10mb/s.
damn it not those guys again
Back story. When this school went live their hardware firewall had a bug where after x amount of data was pushed, it could only do about 20mb/s in either direction.
Me: Considering previous problems with that school have they rebooted their firewall?
ResellerIT: Yes, odd thing happened though, when the firewall came up it ran at the 100mb/s for about 10 - 15 min before dropping back again.
Me: Odd, let me check it out.
I log onto the radio and see the school usage is bouncing between 80 to 100mb/s.
Me: Mate, have you looked at their current usage?
ResellerIT: No, why would I?
Me: Just look. You will work it out.
ResellerIT: Bugger me, that's quite a bit of usage. I'll take it from here, sorry to call you mate.
/call
I kept the radio screen open in the background in case he called back and went back to my "active internet monitoring" AKA Reddit while listening to LRRLive on Twitch.
A few hours later I get an email from my boss asking what is happening at RegionalPrivateSchool, he got a call from the account manager. The only time the account manager gets involved is when he isn't getting any info out of his IT team (ResellerIT).
I flicked him an email back recapping my chat with ResellerIT and looked at the radio. It was still flatlining at 80mb/s both ways.
I decided to take a look as to why a school with no students in it is still using 80% of their bandwidth in both directions. So I run the SuperSecretSexySpecial command on the radio that shows the top 20 source and destination IPs along with packets per second in real time.
When looking at the SuperSecretSexySpecial output I do some reverse lookups on the addresses. The school seemed to be pulling an arse tonne of traffic from the local Akamai cache and pushing just as much up to addresses that map back to DSL services.
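For anyone wondering what a "top talkers" view boils down to, here is a minimal Python sketch. It is not the radio's actual command or output: the `FLOWS` samples, the documentation-range addresses and the `top_talkers` helper are all made up for illustration. It simply totals bytes per source and per destination IP and ranks them.

```python
from collections import Counter

# Hypothetical flow samples: (source IP, destination IP, bytes).
# Addresses come from documentation ranges, not from the real incident.
FLOWS = [
    ("203.0.113.10", "198.51.100.7", 1_200_000),
    ("198.51.100.7", "203.0.113.10", 9_800_000),
    ("203.0.113.10", "192.0.2.44", 7_500_000),
    ("203.0.113.11", "192.0.2.45", 6_900_000),
    ("198.51.100.7", "203.0.113.11", 8_100_000),
]


def top_talkers(flows, n=20):
    """Return the n busiest source IPs and destination IPs by total bytes."""
    by_src, by_dst = Counter(), Counter()
    for src, dst, nbytes in flows:
        by_src[src] += nbytes
        by_dst[dst] += nbytes
    return by_src.most_common(n), by_dst.most_common(n)


if __name__ == "__main__":
    sources, destinations = top_talkers(FLOWS)
    for label, rows in (("source", sources), ("destination", destinations)):
        for ip, nbytes in rows:
            print(f"{label:12} {ip:15} {nbytes:>12,} bytes")
```

The next step in the story, reverse lookups on the busiest addresses, could be done with something like `socket.gethostbyaddr`, left out here so the sketch runs without network access.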
I start thinking, why is the school doing so much data? First thought, second Wednesday of the month Windows updates. But then I thought surely a school should run WSUS in case a bad patch comes out. As for the upload maybe some of the staff have discovered torrents aren't blocked on the firewall and let them run overnight.
I shoot my findings through to my boss, the account manager and ResellerIT. I include in the email that this is all speculation, as well as some pointers for fixing it that they can pass on to the school's IT guys. I get an email back from the account manager with some comments from the school's IT people saying they don't run Windows, it is an Apple school and they are already running the Apple version of WSUS. They also boasted that their school was one of the iPad trial schools. 1,300 students all with iPads, my second worst nightmare.
Then I remembered what my work iPhone did this morning and an article I was reading at lunch: iOS 7.1 for iPhone, iPad and iPod came out a few days ago, and we all know what happens next. The flood of app updates.
I decided to call the school and talk with their IT guys about running some tests for me. First step was to remove the Apple update server's network cable. When he did, the traffic dropped back from 80mb/s both ways to about 15mb/s. I asked them to plug the server back in and sure enough, when it came back online the usage started again.
At that point I speculated that the student devices were calling back to the school to get the iOS 7.1 update and any apps that also required updates.
The following Friday I get an email from the account manager, thanking me for helping with the issue at the school. It turns out I was spot on with the student devices calling back to the school for app updates. After the school's IT guy reconfigured the Apple server their speed tests were back up to 100mb/s both ways with sub 15ms response times.
The boss was so happy with my work he let me off early on Friday with a bottle of something special.
Another tale from the land down under. This time for all you RF geeks. I apologise in advance if I use dB, dBm, and dBi incorrectly, I tend to use them interchangeably at work.
One of those random things I have to do is support wireless gear that our ISP sells on the side to system integrators for point to point wireless between buildings.
It is fairly easy work, we over engineer the links to perform better than the system integrators expect. This is a story about how the original engineer over engineered the link too much.
The link was installed about 6 years ago and from what I understand hasn't performed as expected.
In the office, at my desk working on how one of our transit providers fudged up their route map and was advertising our address space back to us, a story for another time maybe.
phone rings
Me: G'day slazer speaking.
Customer: Hi, it's Customer from [redacted]. We bought a wireless link from your firm a few years ago and it has been working mostly well till last week when it fell over and we haven't been able to get it back.
Me: Ooooookk, let me grab your details and I will give it a crack.
Customer: The box in the rack says Redline AN50E and the link light is off.
Me: all right, do you still have management access to the radio?
Customer: I do on this side, not the remote site obviously.
Me: Makes sense. On the status page, what are the RSSI and SNR values?
Customer: RSSI says -86 dBm for all 3 values and SNR is 0 dB.
Me: Is the other end powered on?
Customer: Yes, the guys in the other office can login to the management as well.
Me: That's good, can they tell you the values on that side too?
hold music starts
Customer: They are seeing the same values.
damn
Me: Do you mind if we come down and have a look?
Customer: No worries mate, just ask for me at reception.
I make a list of kit we will need for the job and "delegate" it to my minion to load into the van and we head out.
We get to site and Customer shows us around the master end of the link. I spot the first of many problems. The ethernet is running in half duplex mode (which may account for their poor performance) and the radio is running at 20dBm transmit power.
I turn to Customer.
Me: have you played with any of these settings?
Customer: When it was originally installed the tech said if we have any problems with the link we should turn the transmit power up to 20.
I stare blankly at him for a few seconds before double checking I'm not going insane. I make note of the usual misconfiguration suspects (frequency, channel size, encryption enabled, correct encryption key) and drop the transmit power down to 1dBm. We head over to the slave end.
Most of the settings are correct, with the exception of transmit power, again it is running at 20dBm. I drop it back to 1dBm and see the SNR come up above zero for a few seconds before disappearing.
We do a test on the indoor coax cable going to the roof and see no RF coming back down the cable. Damn, a faulty outdoor unit. So we head up to the roof and see what we can do about the outdoor unit.
I let my minion and Customer go up the ladder first and as I pop my head out of the roof access hole I see a disaster.
The original tech installed a 60cm panel for an RF link which is no more than 50m. RF geeks will know why this is a disaster. 20dBm of transmit power along with a 28dBi antenna, no way that is legal in Australia.
We swap out the outdoor unit on the slave site, because we were on that side, and as soon as we plugged in the new outdoor unit it started chirping away with its alignment buzzer saying it has the maximum modulation.
Me: That's not good.
Minion: What do you mean? The link is working with this new outdoor unit, so we found the faulty part.
Me: Yea, but where is the antenna connected at the moment?
Minion: In the faulty unit.
Me: Yes, so with 1dBm transmit power on both ends and only one 30cm panel on the master side we are forming a link.
Minion: So?
Me: What do you think will happen when we attach the 60cm panel and put the transmit power back to 20dBm?
Minion: It will get saturated and the link will fail.
Me: Yes, so all the dropouts they are talking about are because the link was over engineered too much.
We reattached the panel and looked at the management page: RSSI -36dBm, SNR 30dB.
Me: That has sorted it.
phone rings, Customer comes up on caller ID
Me: Hi mate, we got it back up, how is it looking?
Customer: The link light is on, but I can't ping across the link.
damn it, RF is up and talking but no traffic is passing. The encryption key must be wrong. I get him to correct the encryption key and his traffic starts flowing again.
I confirm the modulation and transmit power are ok and head back over to the master end to talk with Customer.
Me: The outdoor unit is most likely to have burnt out because the RF levels were too strong.
Customer: I notice now when I put the transmit power to 20dBm the link goes offline.
Me: Never change that value to above 1, ever.
Customer: Ok then. The speeds are better, before it was running between 6 and 12 Mb/s now it is saying 54Mb/s
Me: Yes, because of RF magic we turned the signal power down to get a better signal.
Customer: I'll accept that.
And with that done, Minion and I went back to the office.
Context for those who aren't in the RF world. Imagine having a conversation with someone across an alleyway with one person shouting at the top of their lungs and the other using a megaphone. At some point hearing damage kicks in.
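For the RF geeks who want numbers, a rough free-space link budget shows how hot this link was. This is only a sketch under stated assumptions: the story never gives the operating frequency, so 5.8GHz is assumed here, both ends are assumed to have the 28dBi panel, and cable, connector and alignment losses are ignored (which is partly why a real RSSI reading comes out lower than these ideal figures).

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB between two isotropic antennas."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def eirp_dbm(tx_dbm, antenna_gain_dbi):
    """Effective isotropic radiated power: transmit power plus antenna gain."""
    return tx_dbm + antenna_gain_dbi


def rx_power_dbm(tx_dbm, tx_gain_dbi, rx_gain_dbi, distance_m, freq_hz):
    """Ideal received power, ignoring cable, connector and alignment losses."""
    return eirp_dbm(tx_dbm, tx_gain_dbi) + rx_gain_dbi - fspl_db(distance_m, freq_hz)


if __name__ == "__main__":
    FREQ_HZ = 5.8e9     # assumption: the story does not state the frequency
    DISTANCE_M = 50     # "no more than 50m" in the story
    PANEL_DBI = 28      # assumption: same 28dBi panel at both ends

    print(f"Path loss:   {fspl_db(DISTANCE_M, FREQ_HZ):.1f} dB")
    for tx in (20, 1):
        rx = rx_power_dbm(tx, PANEL_DBI, PANEL_DBI, DISTANCE_M, FREQ_HZ)
        print(f"TX {tx:>2} dBm -> EIRP {eirp_dbm(tx, PANEL_DBI)} dBm, RX about {rx:.1f} dBm")
```

Under those assumptions the path loss is only about 82 dB, so at 20dBm the far end would see roughly -6dBm, far stronger than a receiver front end is normally built to handle. Dropping to 1dBm takes 19 dB off that and still leaves plenty of signal.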
When using new wireless kit, never assume the vendor knows what they are doing, most of the time they do not know what the local laws regarding wireless equipment even are. We have some vendors ignore standards while others follow the standard so closely the kit becomes unusable.
We installed a new 900MHz radio for a customer who was in a particularly bad spot. All seemed well, the customer was getting the expected speed over the wireless and the latency was rather good.
A few weeks after install I get a call from the customer.
ring ring
Me: G'day slazer speaking.
Cus: Hi, this is [manager] calling from [customer] we have a guy here saying the radio on our roof is interfering with [national mobile carrier] in the area.
Me: Ooook, that doesn't sound good. Can I talk with him?
Cus: Sure. I'll shoot the call down to reception where he is.
call transfer
Me: G'day this is Slazer, we run the kit on the roof, what is the issue?
CarrierTech: This is CarrierTech from [contracting firm] we have been sent out by [national carrier] to find out why their customers are experiencing call problems in this area.
Me: I see, is [Cus] still hanging around?
CarrierTech: Yes,
Me: Sweet, I need to have a quick word with him and we can sort this out.
Phone passed back to Cus
Me: Hi mate, thanks for calling us. We will handle everything from here and you won't have to do anything.
Cus: Ok, sounds good, I will pass you back to CarrierTech
Phone ping pong finishes.
Me: Right mate, let's get this sorted. What are you seeing and how can we resolve it?
CarrierTech: I noticed the radio on this roof and our kit is saying it is running in the 900MHz band. What brand and model is the radio?
Me: It is a Ubiquiti Nanobridge M900.
CarrierTech: Is the firmware up to date and you are running in the Australian country code?
Me: Yes.
CarrierTech: Ok, so it looks like it currently isn't complying with Aussie rules because it is sitting in the middle of the 900MHz band assigned to [national carrier].
Me: Not good. What is their band?
CarrierTech: [freq band]
Me: Yea, we are sitting in the middle of that, luckily this is a backup link so I can mess with it during business hours. Let me lock out those frequencies and reboot the unit.
few min later
Me: Ok, I have gone as far away as I can from their band, how is it looking?
CarrierTech: I will have to check from outside. Can I have a number I can call you back on?
Me: sure, [insert company number]
CarrierTech: OK, I will call back a little later.
About 20 min later he calls back.
CarrierTech: It looks like that has cleared up the problem. Where does this link go back to?
Me: [insert address from city 10km away]
CarrierTech: sigh I spent the entire day there yesterday chasing down the same problem and narrowed it down to that street. I should have started at this end.
Me: Well, my apologies mate, I will have to get in touch with the vendor and get this fixed for the next firmware release.
CarrierTech: Yes. I am sure [National Carrier] will also push them and the ACMA about it.
Me: On that note. I assume because the problem is fixed we won't be getting a call from them?
CarrierTech: No, if they complained to the ACMA it would be 6 months before they could do anything about it.
Me: Sounds about right for a government department. Just out of curiosity, how many sites were affected by this?
CarrierTech: About 20 to 30 sites.
Me: wow, now I am really glad you called us first.
insert ending formalities
/End call
I let the boss know what happened and he was glad how it worked out.
Last time we had a run in with the ACMA it ended badly for them, but that is another tale for another time.
This story is a few years old, but I'll try to remember all the fun parts.
Back then I was working with a company that among other stuff also outsourced telephone services to customers. So they would get their phones from us, all the infrastructure, we did all the technical stuff with the ISP, got everyone their extension and call groups etc.
We had only a handful of customers who used this service from us, but our company itself of course also relied on it.
Most parts of the infrastructure were customer specific except one. The main entrance/exit server (+backup) into/out of our datacenter. But for our purposes, they were so oversized that no amount of traffic would even come close to bringing them down. (Or would it?)
On usual days we would handle maybe 50-100 external calls simultaneously. Cause remember, those servers were to the outside. All other traffic would not touch them. The servers were (according to the specs) able to do 4000 simultaneous calls.
On to the day of the incident. It began around 8 in the morning. We would get a few incidents reporting calls not being established, which we brushed off at first, cause it was more probable that the other side was at fault.
Later one of our customers also opened up incidents reporting this en masse. At this point, we were getting a little worried and looked into the logs. What we found was not fun. Much to our dismay, we saw that we had around 7000 simultaneous calls trying to bomb our system. Most of which were trying to reach one specific customer's call center.
After a while we found out that this customer had a countrywide mandatory survey they didn't tell us about. For this survey an external call center was hired to handle all the calls.
We hopped into a call with them and found out a few things: they were expecting about 15-20k calls a day, and their contract said something about "up to 2k". When questioned how that would work, they told us about a specific rule in their contract with their ISP. This rule meant that all calls above the 2k limit would get a "number is busy" kinda answer and had to wait or hang up.
We called the ISP. They just told us (and the customer in the same call): "Yeah, we sell that feature, but that doesn't really work and mostly isn't even used..."
So the ISP broke their contract but were too big to fail, and the customer didn't tell us enough but was angry our stuff didn't work.
End of the story was, that we rerouted all the calls directly to the call center and then the call numbers dropped back to a few hundred.
Edit: Survey was mandatory.
My incident over ~~2~~ 9 years ago involves the federal regulator making impossible claims.
Working in the wonderful world of Wireless Internet Service Providers (WISPs), you get those calls once in a blue moon that make you question everything.
phone rings
Me: G'day, this is slazer.
Caller: Hi, this is Fred calling from the ACMA (the Aussie version of the FCC). Can I talk to your senior radio engineer please?
Me: We don't have one, but I am the senior network engineer. I will do what I can to help.
Fred: Ok, I am at [site] and we are detecting some interference on the local council 80MHz band and we believe your equipment is responsible.
Me: I am sorry, run that by me again.
Fred: We believe the equipment operated by your company on [site] is interfering with the local council's 80MHz emergency push to talk system.
Me: Ooook. That sounds impossible, our equipment is running at 5GHz. How did you get to that conclusion?
Fred: Well, we have shut down all the other wireless operators on the tower but the interference is still there. In your cabinet there is what looks like an amp which takes up about the bottom 6RU. Would you be able to turn that off?
Me: We don't have an amp in our cabinet. That is our UPS in case there is a power outage.
Fred: A UPS? That explains why your equipment didn't go down when we turned off your breaker.
Me: It also kept beeping at you till you turned the power back on didn't it?
Fred: Yes. So is there a way we can turn your kit off so we can finish our tests?
Me: Not at this time of the day. We have clients actively using the service.
Fred: Ok, I will run some more tests and get back to you.
/call
I take down his number in case he calls back and let my minions know that if he calls, put him directly through to me. I call our vendor rep, just to make sure I am correct.
Vendor: Hello this is (dude) from (vendor)
Me: G'day, it is slazer from (WISP). Do you have some time to chat? I just got off the phone with the ACMA.
Vendor: Oh boy, what's up?
Me: Well one of the ACMA "engineers" has said the kit we have installed is interfering with an 80MHz push to talk system.
Vendor: That doesn't sound possible. If it were possible, we would have people all over the world complaining.
Me: I know, just doing a sanity check. I will let you know if it turns out to be your stuff, which I doubt.
Vendor: No worries mate, thanks.
/call
I also call the boss and let him know what is going on. He has the same mindset as the vendor: impossible for us to interfere with an 80MHz system.
A couple of hours pass and Fred calls back.
Me: G'day mate, how did you go?
Fred: You have a radio pointed between 50 and 60 degrees off the tower, I think that is responsible for the problem.
I look up the radio in question and it is a 5.4GHz radio.
Me: That can't be. It is a 5GHz radio.
Fred: Can you turn it off to see if the interference goes away?
Me: Like I said before I can't turn off any of our radios unexpectedly during the day, that particular radio goes to the school in [suburb].
Fred: Hmm, when can we turn it off to test?
Me: provided the school is OK with the outage, 2 weeks from now at 3AM.
Fred: You're shitting me?
Me: No, part of the contract we have with the school says we have to give 2 weeks notice for any planned maintenance that could impact their service.
Fred: But why 3AM?
Me: Because that is the time when it will disrupt the school's service the least.
Fred: There has to be a better time than 3AM.
Me: Not really, the school's nightly backup goes from 8PM till 2AM.
Fred: Seriously?
Me: Yes. I will call the school now and organise the outage. I will give you a call back when I have confirmed everything.
/call
I organised the outage with the customer and kept everyone in the loop.
Outage window came along and I got a call from Fred.
Fred: How far off are you?
Me: I am ready to go.
Fred: Eh? Aren't you meeting us here?
Me: No, why spend 2 hours travelling up there at night when I can do it from the comfort of my home?
Fred: OK, well let's get started.
I turn off all the radios except the one I am using to log into the site.
Me: They are all off except one, how is it looking?
Fred: Still seeing the interference. When you say they are off, I am still seeing the same amount of lights on your gear in the hut.
Me: I have turned off the radio unit on the outdoor unit. So at the moment all our radios bar one are not transmitting.
Fred: Which one is on?
Me: Our backhaul, if I turn it off I won't be able to turn it back on remotely. What I can do is bounce it. Are you looking at your kit?
Fred: Yes.
I reboot the final backhaul radio.
Me: OK, you have about 2 min before it comes back online. How is it looking?
Fred: No different...... What in the world is causing this interference?
Me: No clue mate, we operate in the 5GHz band. Seeing as you haven't found anything I am going to turn our kit back on now.
Fred: but we haven't finished testing yet.
Me: Yes we have, all our kit was off and you said there was no difference in the interference.
Fred: It must be your kit. It is the only unlicensed kit in the area. Everyone else is using licensed spectrum.
Me: ............. I would ask how you came to the conclusion of they don't use licensed spectrum so they must be the problem, but it is 3AM and I would like to go back to bed.
Fred: But we aren't done yet.
Me: Yes, we are. Good night.
/call
I turn on our equipment again and write up a report for the boss, then return to bed.
A couple days later, we received a warning notice from the ACMA about the events that transpired. Sadly, this is where my part in the story ends and the boss picks it up.
After several rounds of back and forth between the boss, our lawyers, and the ACMA rep, the warning is withdrawn and the 80MHz kit gets moved to another tower a couple hundred meters down the road, only to run into the same interference problem.
I don't know if they ever fixed the problem, it has been a few years and it doesn't bother me.
Sigh, I had one of those Mondays. As per the rules all names are replaced to protect the identity of the stupid and ill-informed.
Some Aussie slang/humour may come off as offensive, I apologise, it's just how we roll in the land down under.
Back story, I work for a fixed wireless ISP. I deal with anyone from integration firms to the onsite IT bloke. This particular incident took place at the HQ of a multi site medical center group.
Get a call at 6:30AM
Me: G'day slazer speaking.
Customer IT guy (Let's call him Steve): Hi mate, it's Steve from Medical Group, our head office is offline at the moment. We had a really bad storm go through last night, it may just be power but can you guys be on stand by just in case?
Me: nyaaa, all right. I'll do my usual morning stuff and get into the office asap. Can you check out HQ and let me know?
Steve: no worries mate.
2 min later
imessage from the boss: slazer, the HQ of Medical Group is down. What's going on?
imessage to the boss: Just got off the phone with their IT bloke and he is going in to check the power. I'll get to the office early and prep our spare radios.
no reply from the boss.
[insert usual morning stuff of shower, shave, and shi....]
While driving to the office I get another call from Steve
Me: G'day Steve, how is it looking on your end?
Steve: Well, we have lost a UPS and a switch to last night's storm. You may have lost your radio though, there is no up-link light on your Cisco NTU.
Me: bugger, I guess you have tried power cycling it?
Steve: Yea, the light is on the power injector but no light on the NTU. Our sparkie (aussie slang for an electrician) is coming in to check everything else is OK, I'll get him to check your cable too.
Me: Cheers mate, I'll get a spare radio configured and head straight up to you.
3 accidents on the motorway D: a normal 45 min trip takes 2 hours but I get there eventually.
Steve: You took your time mate.
Me: Traffic's a female dog.
Steve: Fair call, the sparkie had a look at the run from the server room to the radio on the roof, he said everything is fine. Where do you want to start?
Me: Well let's make sure the POE injector is OK first.
We head to the server room and I notice there is no light on the POE injector. I do the usual troubleshooting and the light on the POE injector will only stay on while the cable to the radio is not plugged in. I check the injector by plugging in the replacement radio: the light stays on and the radio turns on and starts squawking while it searches for a base station to connect to. The port on the NTU also comes on, ruling out the POE injector and NTU as the cause of the fault.
Me: Well the problem is not down here. Let's go for a sticky beak on the roof.
Just as I finish saying the sentence, the sparkie appears out of nowhere.
Sparkie: Everything is fine on the roof, I have checked the cable and the radio is powered up.
Me: ......... it's not that I don't believe you, it's just that..... no bugger it, I don't believe you.
Sparkie: hmmf
the sparkie walks off.
Steve: Little rude there mate?
Me: Only because he lied.
Steve looks confused
Me: By how the light was behaving on the injector, there is no way everything is fine.
Steve: Fair enough mate, let me know what you find.
He goes back to checking the servers and I head up to the roof alone. Once I get onto the roof I notice there is no light on the bottom of the radio...
I remove the waterproof bung and see the RJ45 head had been...... I don't have a word that will get past the profanity filter for how the head looked.
Now, I have seen RJ45 heads shorted before from either over voltage (doing 54V to a 24V device) or water getting into the bung, but nothing this bad.
It takes me a moment to collect myself and I begin repairing the cable. YAY for service loops!!! I install the replacement radio and get off the roof to make sure the customer is back online.
Warning: PUT YOUR DRINKS DOWN BEFORE OPENING THE PICTURE
I find Steve in his "office" (read cubby hole)
Steve: Back online are we? Good. What was the problem?
Me: May wanna get the sparkie in for this.
Steve looks confused, but pages him to his "office".
Sparkie: What's up?
Me: When you said you checked the cable, what did you do?
Sparkie: I put a RJ45 tester on both ends and it tested OK.
Me: Again, I do not believe you. Tell me, how did you "test" this?
I gave both Steve and the sparkie a moment to collect their jaws from the table.
Steve: You can go slazer, thanks for getting the connection working. May I keep that head?
Me: Sure mate, I have a pic, that is all we require.
I am not sure if I will find out what happens with that sparkie, but I doubt I want to. On the bright side, because I had to travel before 7AM the company paid for my breakfast :D
To those of you who saw the pic before my warning of putting your drinks down, I am sorry. For those of you who blatantly ignored it.... well, I am still sorry, but you were warned.
Update Time
So it turns out the sparkie's vocabulary is smaller than both Steve and I expected of a sparkie. When he was told to check the cable going to the radio on the roof, he thought they were talking about the WiFi Access Point on the 3rd floor.
His reasoning: Because ground, 1st and 2nd have floors above them they have ceilings. 3rd floor is the top floor so it is not a ceiling, it is a roof...... I'll let that logic sink in for the rest of you too.
Some background, I work full stack while we also man the support email from users. I'm manning the support email this week, but today I was also tech support for a fellow developer.
We use HP docks to connect everything from screens to keyboards. But today a dock would not do anything when my colleague attempted to use it.
Being the nosy kind, I went and asked the usual
- Did you reboot?
- Did you remove the power to the dock?
- Try messing with the drivers?
- lock the screen before unplugging?
- Tried another dock?
All yes, none worked. Our IT support hadn't opened for the day yet and he was looking into updating the specific dock driver.
So I asked, did you try the other USB-C port? And what do you know, that worked. Then he just plugged right back into the first USB-C port and everything was back to normal. I don't know who made the drivers, but it's pretty damning when they can brick a specific USB port until it's forced to redo whatever config messed it up, by using another USB port...
If anyone wonders, the docks have a magnetically joined charging and USB plug, so it's fairly natural to plug them in together side by side. It's also almost uniquely a dock issue and not a dead USB port, so it's funny that the entire thing unclogs from just using another port for a second. But a reboot does not...
This is a more recent story while working for an MSP in Europe compared to my time working for an ISP in Australia
the cast:
Me: Slazer
OT: Other Tech
I get a message on slack
OT: Hey, I am seeing something weird in the French office for customer, can you help me look into it?
Me: Sure
Cue the Teams call.
OT: So all the Access Points in that office are reported as offline in cloud.vendor.com portal but the customer is not reporting an issue.
Me: Ok, that is odd. What is the monitoring system saying?
OT: Monitoring says everything is OK, I can ping them and do SNMP calls to all the AP, they are just reporting as offline in the portal.
OT: The other thing is the firewall says the AP are trying to access cloud.vendor.com but the local in policy is denying the traffic.
Me: That is rather strange.
I log into the firewall and check the logs and see the APs are in fact trying to access cloud.vendor.com but the destination is 255.255.255.255. Not the expected IP from the vendor's documentation.
Me: Well I want to say it's a DNS issue. What happens when you reboot the AP?
OT: Rebooting from the portal doesn't work, but I rebooted one from the switchport and the same thing happens.
Me: Is the on prem DNS server working?
OT: Yea, the domain controller is the DHCP/DNS server and it has no issue with access, the customer hasn't reported connection issues. It looks to be just the APs.
Me: Ok then, are they being allocated the right DNS servers?
OT logs into the domain controller and everything is looking good.
Me: dafuq?..... Wait, do these even use the DNS server from DHCP or do we set one via the device template?
OT: Not sure, never had this happen before. When we provision these they are plug and play.
I log into the vendor portal and start poking around and notice all the APs have the same DNS server of 208.67.222.222 (OpenDNS).
Me: Ok, well the APs aren't using the local DNS server, they are using OpenDNS. Let's start a packet capture to see what is going on.
I setup a packet capture on the firewall and limit it to the IP of the AP we are looking at and let it run for a bit and crack open the capture in Wireshark.
I just start laughing at the error
OT: I know that laugh, what did you find?
Me: what do you make of this error?
Every single DNS query had this as the response.
The OpenDNS service is currently unavailable in France and some French territories due to a court order under Article L.333-10 of the French Sport code. See https://support.opendns.com/hc/en-us/
OT: Wha????
Me: Yea.... Now for the hard part.
OT: Hard part?
Me: How do we fix this? There are no SSH logins to the APs, we can't push config because the devices are offline according to the portal, and there is no way we are getting console to each of those units.
OT: I see.
Then the dumb idea occurred to me.
Me: I have a dumb idea. We DNAT any traffic destined for OpenDNS to Google's DNS so we can reconfigure the units to use the local DNS servers.
OT: Would that work?
Me: It should.... I hope.
We then set up DNAT for the APs specifically to rewrite the DNS requests destined for OpenDNS and forward them to Google's DNS.
After activating the config we start seeing the devices come online in the portal as if nothing happened to them.
OT: Hey, it worked.
Me: omg, it actually worked...
I am still somewhat shocked it worked.
At some point I will get some time to clean up that DNAT and finish reconfiguring the APs.
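For anyone wondering what that DNAT rule boils down to, here is a minimal Python sketch of the matching logic. It is only a model, not the firewall config we used: the AP subnet `10.0.50.0/24` is made up, and `8.8.8.8` is assumed for "Google's DNS" since the story doesn't name the exact address.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network

OPENDNS = ip_address("208.67.222.222")   # resolver hardcoded on the APs (from the story)
GOOGLE_DNS = ip_address("8.8.8.8")       # assumption: the story only says "Google's DNS"
AP_SUBNET = ip_network("10.0.50.0/24")   # hypothetical AP management subnet


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int


def dnat(pkt: Packet) -> Packet:
    """Rewrite the destination of DNS queries that APs send to OpenDNS."""
    if (ip_address(pkt.src) in AP_SUBNET
            and ip_address(pkt.dst) == OPENDNS
            and pkt.dport == 53):
        return replace(pkt, dst=str(GOOGLE_DNS))
    return pkt  # everything else passes through untouched


# An AP query to OpenDNS gets redirected; other traffic is left alone.
redirected = dnat(Packet("10.0.50.12", "208.67.222.222", 53))
untouched = dnat(Packet("192.168.1.20", "208.67.222.222", 53))
```

On a stateful firewall the reply is translated back on the way in, so the AP still believes it is talking to OpenDNS.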
This is a repost of a story I posted on Reddit a few years ago.
Story participants
Me: Slazer
Boss: the boss
T1: Tech 1
T2: Tech 2
Backstory
The boss is all about redundancy and backup. If he finds a single point of failure that I have missed, he lets us know and sets a time frame for when he wants it resolved, along with when the failover testing should be done. Because an untested backup is worse than no backup.
To spare the boring BGP details
We have 2 data centres in our closest state capital, with transit multihomed through a single level 2 carrier (while not true multihoming, we have transit of last resort through one of our layer 2 customers).
One day the boss arrives in the office around 10:30 AM in a huff after hearing of a major outage in a competitor's network.
Boss: Slazer, did you get our traffic balanced over our 2 transit paths like we discussed a while ago?
Me: Yes, DC1 advertises prefix 1,3,5 and the aggregate. DC2 advertises prefixes 2,4,6 and the aggregate.
Boss: What happens when one of the transit fails?
Me: I am advertising the DC2 prefixes out DC1 with the backup BGP community. Then doing the same thing for DC1 prefixes over DC2. In the event of a transit failure the upstream has a backup path ready to go.
Boss: And it works?
Me: Yes, last time I tested it was about 2 or 3 months ago and it failed over correctly.
Boss: Why haven't you tested it sooner?
Me: RANCID hasn't reported a configuration change since the last test. I only test it if there has been a config change on any of those routers.
Boss: But how can you be sure it still works?
Me: Shall I force a failover now to show it works?
Boss: Sure. (which I assume he said with sarcasm)
Me: Starts logging in to the DC1 core router
T1 seeing me do my configuration change face.
T1: If you are doing that I am going for a break.
I shutdown our transit interface for DC1 and wait for BGP to time out.
After about 10 min with no calls the boss turns around and continues the conversation.
Boss: So when will you be testing the failover?
Me: We are, right now.
Boss: What??!! as his face drops.
Me: You agreed. Plus this way now you know for sure it works because the phones haven't started ringing.
T2: Slazer is right. The graphs show an increase in traffic on DC2 transit.
Boss slides over to T2's desk. Sure enough, the graph for DC1 transit is reading zero traffic and the graph for DC2 is showing all the transit traffic for the state.
Boss: That doesn't look like much traffic.
Me: Only about 20-30% of our traffic goes via Transit, the rest goes via the various IXs we are on.
Boss: Who don't we get via the IX?
Me: Customers of our transit provider who aren't on any IX, Telstra and Optus as they aren't on any IX, and any international site that doesn't use a CDN.
We continue discussing for a good 20 - 30 min about where we get various traffic from and further redundancy in the core networks. During which time T1 returns from his break.
T1: Phones are quiet?
Me: Yes.
Boss: Can you turn the DC1 transit back on?
I walk back to my desk, turn the transit interface back on, and see the BGP peer come back up. While T2 and the boss are watching the graph for DC2 transit, it drops about 2/3 of its traffic, which reappears on DC1 transit.
And from that day the Boss hasn't asked about the transit failover because now he knows it works.
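For the curious, the primary/backup advertisement scheme from the conversation can be modelled in a few lines of Python. This is a toy sketch rather than router config: the prefix names are placeholders, the aggregate is left out, and it assumes the upstream simply prefers a primary advertisement over one tagged with the backup community (commonly done by lowering local preference, though the story doesn't say how our carrier does it).

```python
# Hypothetical model: each DC advertises its own prefixes as primary and the
# other DC's prefixes tagged with the backup community.
ADVERTISEMENTS = {
    "DC1": {"primary": {"p1", "p3", "p5"}, "backup": {"p2", "p4", "p6"}},
    "DC2": {"primary": {"p2", "p4", "p6"}, "backup": {"p1", "p3", "p5"}},
}


def best_path(prefix, transit_up):
    """Return the DC the upstream would use to reach `prefix`, or None."""
    candidates = []
    for dc, adv in ADVERTISEMENTS.items():
        if not transit_up.get(dc, False):
            continue  # transit down: this DC's routes are withdrawn
        if prefix in adv["primary"]:
            candidates.append((0, dc))  # lower number = more preferred
        elif prefix in adv["backup"]:
            candidates.append((1, dc))
    return min(candidates)[1] if candidates else None


normal = best_path("p1", {"DC1": True, "DC2": True})
failed_over = best_path("p1", {"DC1": False, "DC2": True})
```

With both transits up, `p1` is reached via DC1. With the DC1 transit interface shut, the only remaining path is the backup one via DC2, which is what the graphs showed.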
User: I need a new ssd to expand the storage on my laptop
C: Sure, for that laptop you need to submit an M.2 installation ticket.
User: No I need an ssd.
C: Yes sir, that is an ssd, it's just the form factor. Go to [company's internal website] and submit a request for an M.2 ssd installation.
User: They have 2 models to pick from, which do I choose?
C: The M.2.
User: ...Is that an ssd?
C: ...Let me just show you.
Dealing (primarily) with other techs
Rule T1 - CYA
Rule T1A - Always have someone else to blame it on.
Rule T2 - Never lie to another tech.
Rule T2A - Unless that tech is the person you're about to blame. See Rule T1A.
Rule T2B - Sometimes you will need to lie in order to deal with things like warranty repairs or getting ISPs to do the right thing.
Rule T3 - Never assume anything.
Rule T3A - Does the issue even exist?
Rule T3B - Is it even plugged in?
Rule T3C - Is it turned on?
Rule T4 - Don't expect your boss or coworkers or users to understand just what it is that you do.
Rule T4A - Even if they are a tech.
Rule T5 - Sometimes, you will be the one who is wrong.
Rule T6 - Don't try to do work over the Internet while in a moving airplane.
Rule T7 - Never call support with your cellphone if you can help it. Otherwise, you won't be able to drop the problem in someone else's lap.
Rule T8 - You will really screw up eventually and it is going to be a doozy.
Rule T9 - Backup following the Rule of Three. A backup, a copy of the backup, and a copy of the copy. Test them.
Rule T9A - Consider using other backup strategies. See Link TL1 (https://www.unitrends.com/blog/3-2-1-backup-sucks).
Rule T9B - There is no backup. If there is a backup, it is either corrupt or years out of date.
Rule T9C - If you can't restore from it, you don't have a backup.
Rule T9D - If you haven't tested your backup recently, you don't have a backup.
Rule T9E - A year ago is not "recently".
Rule T10 - Assume that there are also inside threats, even inside IT. It's not paranoia if they really are after you (or your stuff).
Rule T10A - Don't trust your coworkers. They might be using Rule T2A.
Rule T10B - Don't even trust yourself. One error and you might cause serious damage or become a security leak.
Rule T10C - The new member on your team will send critical sensitive information to anyone who asks without trying to do any verification.
Rule T11 - When you need tech support, the tech support person is likely to be clueless.
Rule T11A - Whenever you have a problem, you will be unable to find a solution until just before the tech you called for help arrives.
Rule T11B - If the tech you called in isn't clueless, then you were and your problem has an obvious solution that you completely missed that they will point out seconds after they arrive.
Rule T11C - If none of these apply, the solution will be something random that will make no sense whatsoever to you or the technician.
Rule T12 - Every tech has their own set of Rules, even if they don't know it.
Rule T13 - Every tech is also a user.
Rule T13A - Techs will treat you like you are a user.
Rule T14 - Make sure your coworkers don't make changes before going on vacation.
Rule T15 - No technical person reads all of the rules. They will act like they know them until the place catches fire, then complain about incomplete documentation.
Rule T15A - Especially if it was the documentation that went up in flames first.
Rule T16 - Womprats aren't much larger than two meters.
Rule T17 - Third-Party IT will make configuration overhauls without notifying your company's IT department, and then blame your company for problems caused by their configuration mishap.
Rule T18 - You are incompetent. You just don't know it. At least, that's what your replacement will think.
Rule T18A - You will have to deal with techs who are incompetent.
Rule T18B - Sometimes, you really are incompetent.
Rule T19 - You might find people who support you. Reciprocate.
Rule T20 - Always verify who you are corresponding with. This includes not using Reply All.
Rule T21 - Use your inner laziness to do the most elegant solution possible.
Rule T21A - Know the difference between "truly lazy" and "plain laziness".
Rule T22 - If nothing seems to work, reboot.
Rule T23 - Cables can and will be used as ropes.
Rule T24 - Other techs will never read the manual.
Rule T24A - Neither will you.
Rule T25 - Your fellow techs will expect you to be their tech support.
Rule T26 - A tech will install equipment in dangerous environments.
Rule T27 - Third-party IT will remove equipment and not tell you or the user.
Rule T28 - The biggest enemy of good IT is that they are outnumbered by lazy IT.
Rule T29 - Grow a beard so that people don't recognize you.
Rule T568A - white green, green, white orange, blue, white blue, orange, white brown, brown
Rule T568B - white orange, orange, white green, blue, white blue, green, white brown, brown
Rule T1000 - Buy stock in Boston Dynamics but sell all of it before 2029.
GitHub: https://github.com/r/morriscox
Rules of Tech Support
Rule 1 - Users lie.
Rule 1A - It may not be malicious or willful, but Rule 1 is always in effect.
Rule 1B - Users assume you don't know they are lying.
Rule 1C - Users continue to lie as a result.
Rule 1D - When caught in a lie, users get angry.
Rule 1E - Users lie even when they aren't users.
Rule 1F - If they are not lying, then they are wrong.
Rule 1G - Accept that you will eventually have to lie to get the user to do what you need them to do.
Rule 2 - Explain everything as simply as possible.
Rule 2A - There is no language simple enough to make a user understand anything.
Rule 2B - Emojis are NEVER an answer.
Rule 3 - User-caused problems are caused by tech support.
Rule 3A - As it's your fault, they don't want to be billed.
Rule 3B - All issues are user issues. If there are no users, no issues get reported, no tickets get created. Ergo, it must be users who are responsible.
Rule 4 - If it doesn't work, it is your fault.
Rule 4A - If it does, you had nothing to do with it.
Rule 5 - If you take the time to visit the user's desk, the problem will magically have fixed itself.
Rule 5A - Or the solution is bound to be really simple.
Rule 5B - Or the user left the office moments after entering the ticket, and won't be back for days. How long is uncertain as these users never use their calendar.
Rule 5C - Or when they do, they won't have shared it with you or they entered an all-day event as taking an hour.
Rule 5D - The problem will be solved by doing something you already asked them to but they said it didn't work. - /u/Responsible-Slide-95
Rule 6 - All users consider their situation to be more important than others, even if they know you are helping someone else.
Rule 6A - All users want VIP treatment.
Rule 6B - But they don't ever want to pay for VIP treatment.
Rule 7 - It doesn't matter how much time the user claims something will take. See Rule 1.
Rule 8 - Users never read error messages, if they read anything at all.
Rule 8A - If a user reads an alert or error message, they don't know what to do even if they can only do one thing.
Rule 8B - The more advanced degree a user has, the less likely they are to read anything.
Rule 8C - They will give the wrong error message.
Rule 8D - If a user receives an error, when asked what it says, the user will reply: "I don't know, just an error. I closed it."
Rule 8E - "Isn't it YOUR JOB to know that?"
Rule 8F - Users will not read you the entire error code or message or will read everything else.
Rule 8G - If the user reads you the error message in its entirety, it will be irrelevant to the issue.
Rule 9 - Expect any and all jargon and technical terms (such as wireless) to be misunderstood.
Rule 9A - Expect everything to be misinterpreted.
Rule 9B - All jargon is the same to users.
Rule 9C - All jargon will be used incorrectly.
Rule 10 - About half of tech support is solving issues that are only partially related to what is supposed to be fixed.
Rule 11 - No system is idiot-proof enough to best all users.
Rule 11A - If you haven't found a user able to best your system, it's because they haven't found you yet.
Rule 11B - Nature will take as a challenge any attempt to create an idiot-proof system.
Rule 12 - There is nothing so stupid that no one will do it.
Rule 12A - Stupid questions do exist.
Rule 12B - There is no such thing as a stupid question, just stupid people. Asking a stupid question identifies a stupid user and therefore the question itself is not stupid.
Rule 13 - Never believe a user who claims that there is nothing that needs to be saved. See Rule W10 and Rule W10A.
Rule 14 - Sometimes you need to trick users in order to get the job done.
Rule 14A - Sometimes you have to make people, not just users, terrified to get them to do what they are supposed to.
Rule 15 - Users care more about things working than in how you pulled it off.
Rule 16 - A user's appreciation for your work is inversely proportional to how difficult it was.
Rule 17 - If you have an accent, then you will be perceived to be in a foreign country.
Rule 18 - Never trust a user.
Rule 18A - Everyone is a user. Even you.
Rule 19 - The most intelligent person you know will be defeated by a mere computer.
Rule 19A - Even if it's you.
Rule 20 - The quickest way to find out who is responsible for something is to do the scream test. Remove that something and see who complains.
Rule 20A - If nobody screamed instantly, users may wait until it has been long enough that the thing has been thrown away and can't be recovered any more. Then you will learn that said thing was critical for some task that absolutely has to be done right now, just like every X years.
Rule 21 - Never underestimate the power of the end user to complicate things.
Rule 22 - If it looks different, then it's broken.
Rule 23 - Never give a user options.
Rule 24 - When you receive a ticket and call the user immediately they definitely won't be at their desk.
Rule 24A - If you email them they will already be on vacation.
Rule 24B - The less time that they're in the office, the more urgent their issue is.
Rule 25 - Watch out for Finagle's Law which states that 'Anything that can go wrong, will — at the worst possible moment.'
Rule 26 - Always have a small list of phrases to get users to do what you are trying to get them to do.
Rule 26A - Only share these with other techs.
Rule 27 - Don't let people know you are a tech. They are likely to ask for free tech support.
Rule 27A - Never, EVER, give out personal contact information.
Rule 28 - Sometimes, you will be the one who is wrong.
Rule 29 - Expect equipment to be placed in bad locations.
Rule 30 - It's always the printer|DNS|server|browser|connection. It's never the printer|DNS|server|browser|connection.
Rule 30A - It's always the printer. Printers are evil.
Rule 30B - Printers are evil because of users.
Rule 30C - If a document fails to print, users will keep trying just to make sure it prints.
Rule 30D - The true importance of the documents they are trying to print will be inversely proportional to the fit they are throwing.
Rule 30E - Users will mash buttons and go through random menus and do random actions until errors go away or the printer is messed up. See also Rule W84.
Rule 30F - Did you check DNS? Check again.
Rule 31 - All user provided information must be verified.
Rule 32 - If you are a female tech, users will ask to speak to a man.
Rule 32A - You will be the only one who can actually help the user even though they will not believe a girl really knows anything.
Rule 32B - You actually know twice as much as the male techs but get only half the respect.
Rule 32C - Guys will pay more attention to your looks/voice than your mind.
Rule 32D - You'll get tons of calls from men (especially if you are attractive) who will even disconnect stuff to get you to go to them.
Rule 32DD - Women will cause IT problems to keep you away from men.
Rule 33 - Just because it worked yesterday does not mean that it will today.
Rule 33A - Just because it didn't work yesterday does not mean that it won't today.
Rule 33B - Things only work when you are paying attention to them.
Rule 34 - Never refer to this Rule by its name.
Rule 35 - Updates will be both solutions and banes, usually at the same time.
Rule 36 - Sometimes, you have to nuke everything.
Rule 37 - Focus on getting things working, then on getting them done right.
Rule 37A - By hook or by crook.
Rule 37B - When things are working right, leave them alone.
Rule 37C - If something starts working, even if you KNOW what you just did shouldn't have fixed it, raise your hands in the air unthreatening-like and slowly back out of the room.
Rule 37D - You only think it's working. The real cause will wait a while and then break everything in a spectacular fashion a few months down the line. Luckily, by then it's usually no longer your problem.
Rule 37E - It will still be your problem.
Rule 38 - There's always a relevant xkcd.
Rule 38A - If you can't find a relevant xkcd, it's because you haven't looked hard enough.
Rule 38B - If there is no relevant xkcd, there is always a relevant Dilbert strip.
Rule 38C - If there is no relevant xkcd or Dilbert strip, there's a relevant entry in The Seventy Maxims of Maximally Effective Mercenaries. Link L1
Rule 38D - If you can't find a relevant xkcd, Dilbert, or Maxim, your problem does not exist.
Rule 39 - You and your work will never be appreciated since if you did your job right, none of these problems would have happened.
Rule 40 - All IT urban legends are true.
Rule 41 - If it takes TFTS to turn you paranoid, you likely haven't been in tech support for very long.
Rule 41A - You aren't paranoid. They really are out to get you.
Rule 42 - You already know the answer.
Rule 43 - Every tech is also a user.
Rule 44 - Never make changes before going on vacation.
Rule 45 - The more you specialize, the less you will remember about basic desktop functions.
Rule 46 - No technical person reads all of the rules. They will act like they know them until the place catches fire, then complain about incomplete documentation.
Rule 46A - Especially if it was the documentation that went up in flames first.
Rule 47 - Don't help anyone who is not paying you in some way as they won't take your advice seriously.
Rule 48 - Vendors will tell you that you need to upgrade to the newest version in order to fix things. If you are on the latest version, they will tell you to wait till the next version.
Rule 48A - If the problem remains reproducible on the latest version, they may tell you to downgrade. Even if you just upgraded per Rule 48.
Rule 48B - It's not a bug, it's an undocumented feature.
Rule 49 - Never assume anyone else is smarter than you.
Rule 49A - Never assume you are smarter than anyone else.
Rule 49B - A user's intelligence will always be precisely what is needed for maximum damage.
Rule 50 - Scheduled updates won't.
Rule 50A - Anything scheduled will break things, especially if you are not available.
Rule 51 - Drivers will drive you bonkers, if you can even find them. Even if you can find them they may not be compatible.
Rule 51A - Drivers are the real threat, not hardware.
Rule 51B - Drivers using hardware [heavy machinery] are also a real threat. Backhoes/diggers have a magnetic attraction to fiber optics and the drivers have an innate ability to find optical fiber. Link L2.
Rule 52 - No is the answer for every request as long as it's plausible.
Rule 53 - Treat your job like a role playing game. Link L3.
Rule 54 - Don't run stuff that you are not supposed to unless Rule 37 and Rule 37A apply.
Rule 55 - The Seventy Maxims of Maximally Effective Mercenaries are always applicable. Link L1
Rule 55A - Sometimes the applicability of the Maxims is not immediately obvious.
Rule 56 - Get to know the Dunning-Kruger effect. Link L4.
Rule 57 - You might want to consider starting the day with coffee or tea and ending with whiskey or scotch or bourbon or beer...
Rule 58 - Vendors might not follow standards.
Rule 59 - You might find people who support you. Reciprocate.
Rule 60 - When a user activates the Swedish Fish rule, they get preferential treatment.
Rule 61 - Like the military says, never volunteer.
Rule 62 - Some bugs are Heisenbugs; they can only occur if they are not being observed. Users do not count as observers.
Rule 63 - Something will be needed right after you get rid of it.
Rule 63A - Once you replace it, you will no longer need it.
Rule 63B - You will buy something and then find out that what you currently have already has what you needed.
Rule 64 - User managed projects will always fail.
Rule 64A - And they will blame you.
Rule 65 - You will complain about something and then realize that you are the one that is guilty.
Rule 66 - You will find yourself putting out fire after fire without any chance to document anything.
Rule 66A - Then get blamed for not documenting everything.
Rule 67 - Try using metaphors and analogies in addition to or instead of technical terms.
Rule 68 - The higher rank an employee is, the more problems you will have with them.
Rule 69 - Refer to Rule 34.
Rule 70 - Anything that will show up as a link should be a link.
Rule 71 - Never take actions that assume a system is a certain way.
Rule 71A - Especially if not assuming makes little or no difference to the troubleshooting process.
Rule 71B - And never if the incorrect assumption will be recognizable to the user.
Rule 72 - Always give users the least amount of access/permissions that you can realistically get away with.
Rule 73 - It's always Dave or Steve or Kevin. Unless it's a Karen.
Rule 74 - Try to phrase things in a way that helps users save face.
Rule 75 - Maintenance, and sometimes coworkers or users, will unplug things and plug them back in wrong or not at all.
Rule 75A - If anything goes wrong they won't tell anyone. You will get to handle the "website down!" or "the internet stopped working!" tickets.
Rule 76 - Only have the minimal required equipment needed for users.
Rule 77 - Your company will be in a very old very shoddy building.
Rule 78 - If someone is acting odd, it might be a social engineering attack. Verify everything.
Rule 78A - VIPs within the company that actually do have the power to have you fired at whim will be the most angered by attempts to verify and will be the hardest to verify.
Rule 78B - Social engineering attackers know Rule 78A.
Rule 79 - Users think they can connect to anywhere from anywhere.
Rule 80 - If this port is taken, port 443 will be as well.
Rule 81 - Most of your job is figuring out what users are talking about.
Rule 81N6 - The GoogleBing awaits.
Rule 82 - Temporary solutions aren't.
Rule 83 - Every company has a Production environment and a Testing environment. If you're lucky, they are separate environments.
Rule 84 - Users already have a certificate of proficiency in computering.
Rule 85 - Always let someone know that you are there to fix a problem.
Rule 86 - You might encounter a user who is nice, doesn't need everything explained, takes you seriously, reads you complete error messages, and does what you tell them to do with no drama. *neigh* Seriously, they do exist.
Rule 87 - Users who always demand the latest hardware never work in a position that requires the latest hardware.
Rule 88 - Sometimes you need a user to fix your problem.
Rule 88A - Only a user will find the real problem.
Rule 89 - You will be expected to be your own tech support.
Rule 90 - You will have to support software older than you are.
Rule 91 - The OSI model has layer 8 (user) and layer 9 (management).
Rule 92 - It's always a bad sign if someone is happy to see you.
Rule 93 - "Only one thing" never is.
Rule 94 - Hypothetical questions aren't.
Rule 95 - Every mail from the helpdesk or system administration will be too much to handle if it is longer than two lines.
Rule 96 - Business will demand more experience for their job postings than exists.
Rule 97 - Always keep copies of drivers you download.
Rule 98 - Don't ask users if something is on the screen. Have them read the screen.
Rule 99 - A fix will only work until you fall asleep.
Rule 100 - A theme, especially a system theme, will make it difficult to read anything.
Rule 101 - Urgent isn't.
Rule 102 - Someday you will forget to use the mute button. Double mute.
Rule 404 - You will never find it. See https://www.explainxkcd.com/404
Rule 404A - If a page is not found, then the entire site|Internet is down.
Rule 404B - Online manuals will disappear without warning. Download a copy for yourself.
Rule 600613 - Used to go to websites instead of going directly.
Rule Ferengi - Users want their problem fixed quickly. Bribes will ensure they will be.
This doesn't exactly fit but it's close enough.
This was in 2016 or 2017.
A good friend of mine was in need of a new PC and wanted to buy a prebuilt. I gently (maybe not so gently) told him that was a waste of money and I could build a better one for cheaper. So I put together a decent budget oriented mid tier build for him. i5 and a GTX 1060 6gb. All was well at this point. We ordered the parts to his house and I got on a train to his city to put it together.
At this point in time, I wasn't an expert on building PCs. I had built two PCs in total, both very budget oriented.
So we (I) start building the PC. All goes well. Each part in its place. Each cable connected (supposedly). Press the power button. Doesn't POST. Reseat everything. Doesn't POST. My friend is getting very anxious at this point, which is slightly annoying to me... saying he should have just bought a prebuilt. This in turn is making me uneasy. We decide to take it to a PC shop so that they can test whether a component is dead. And mostly to ease his anxiety.
We take the whole computer, which is basically 'ready', in a cardboard box to the PC shop. Taking public transit with a huge and decently heavy box isn't that comfortable. Arriving at the PC shop, we explain the problem to the worker. He takes one look at it and goes: 'You haven't connected the 6-pin power cable for the CPU.'
Cue embarrassment. But also relief that it was just a stupid oversight, fitting my inexperience.
Back in his home, we connect the cable. It POSTs. All is good. The mainboard has no Wi-Fi built in, so we get an Ethernet cable. No connection. Ethernet doesn't work. I tried everything that I could and it didn't work. He bought a cheap USB Wi-Fi adapter, which works. To this day the Ethernet doesn't work. He is still using that PC. I'm telling him he should upgrade it, or more like replace it completely.
He wants to buy a prebuilt. I tell him I can build him a better PC for cheaper. I feel like I'm experiencing déjà vu. Hope the second round goes smoother.
I work in the IT department of a mid-sized company. One of my responsibilities is encoding newly-entered tickets (we use pretty old software for this, so auto-encoding isn't an option). Because of this, I get to personally greet every new ticket that comes knocking at the front door of our Service Desk, from the frustratingly complex to the yawn-inducingly mundane. This ticket fell in the latter category, and came across as follows.
NEW TICKET: Desk phone is not working
[Head of HR] has opened a ticket
Comments: My desk phone does not work. There is no dial tone, but there is a message for me that I cannot retrieve.
The head of our HR department isn't exactly the most tech-savvy, but she's polite and (usually) puts in tickets instead of calling us directly. That, plus her elite status in the company, means we usually give her tickets a higher priority.
Phones aren't normally my area, but I figure that something like this shouldn't be too hard. The first few thoughts that cross my mind include some imaginations of the HR lead somehow managing to unplug her desk phone completely and some speculations that she may have swapped out the typical receiver with a faulty headset. To test these theories, I call her desk phone from mine. It rings all the way through, and then goes to voicemail. From this, I conclude that the phone must at least be connected.
Since I'm a fairly organized (read: obsessive) person, I quickly type a note of what I've tried in the ticket, and submit a comment that I'll stop over sometime that morning. After maybe two minutes, I get a reply back that I can stop over anytime. Not wanting to put it off until after lunch—as I've been known to do with "easy" tickets on occasion—I let my teammates know what I'm doing and scale the mountain leading to the executives' lairs. As I reach the top of the stairs, I immediately notice that the HR lead is in a meeting. She spots me, she gives me a look that says, "Sorry!", and I slink back to the dungeon where we IT people dwell.
As I go to update the ticket to mention that she was busy when I went to investigate the issue, the lady herself ventures into our cavernous halls. "Sorry about that, GrammarPanda. You should be able to look at the phone now!" I delete the update I was typing on the computer, smile, nod, and hike back to the office to take a look.
Walking into her office, I find her desk phone and immediately see that it's functional. The display is reading normally, and all the right lights are on. Okay, so it's definitely a receiver thing, I reason. I pick up the receiver, and find that the dial tone is, in fact, missing. But, in pulling the receiver to my ear, I notice two things:
- I've never in my life seen anyone tangle their desk phone's receiver cord so badly; and,
- The receiver is connected to some sort of sound amplification device.
This is when I recall that our HR lead is very, very hard of hearing. It was actually what made me suspect a receiver issue in the first place. But, I didn't realize that she installed this gizmo to help. Oh, well; it's pretty neat, and it makes sense. In any case, the device has several LEDs on its front, and none of them are on. I notice that the receiver is plugged into the device, and the device is plugged into the base of the phone. All the wires are securely connected, so either a cable is bad or the device isn't powered by the phone.
To check on this thought, I flip the device over. Right away, I see a battery compartment that (thankfully) has a pull tab and isn't screwed into place. I pop the panel off and see a puffy 9V battery. If someone had told me that this battery was 30 years old, I would have believed them. I promptly remove the battery from the unit for proper disposal, retrieve a new one from the supply area, install the new battery into the device, and put everything back in place. Now, when I pick up the receiver, I hear a dial tone that could be used to call whales living in the ocean on the other side of the country.
Satisfied with this solution, I walk down the stairs and run into the HR lead on her way back up. When I explain the solution, she smiles gratefully, saying, "See, now I never would have thought to look for a battery!" "That's what I'm here for," I reply while returning her smile.
With that, I sneak back to my stalagmite-encased desk, type up the resolution, and close the ticket. Sometimes, solving a simple problem can be extremely satisfying—especially when users show gratitude like that.