Sorry about the similar heading, but it just fit perfectly.
A couple of weeks ago, one of my servers (a nice Dell R815 box with dual 12-core CPUs and 128 GB of RAM) suddenly complained that it couldn’t detect its memory any more. Dell’s usual approach when something like this happens (which, for me, has never improved anything) is to power off the machine, unplug the power cables, remove the memory modules, put them back where they were (“reseat” the modules) and then power on the machine again. Afterwards, support usually wants you to also clear the system event log and see if that changes anything.
Well, just like all the times something like this had happened before, it did not change a thing. The machine kept saying
Memory not detected
and didn’t even make it to the POST. So after support made me cross-test each CPU in every possible memory combination and this also did not change anything, they decided that a “certified technician” (or “pro”, as I’m going to call him from here) would have to come out and have a look.
The next day. It was raining and rather cold for a day in June. The pro was expected to be there between 10 and 11 AM. So he calls at about 9:30 and says he has “a shitload of work to do” but will try to make it in time. Okay … so his language is a little sloppy … if his work is professional, I’m fine with that.
He arrived about 2 hours late and the first impression was – is this a construction worker or a computer technician? Shirt partly hanging out of his pants, chest hair sticking out and every time he leaned over to pick something up, you couldn’t help but see a good part of his bare butt because his pants were loose. Well, if at least his work is professional, I’d be fine with that.
Before you start working on a computer system’s hardware, you usually ground yourself so that built-up static electricity doesn’t damage anything. The pro didn’t do this. “Things aren’t this sensitive any more these days,” he said when I asked him whether this was no longer good practice (as I had learned during my apprenticeship). Well, the machine is dead anyway and he is the one responsible for this, so I’ll let him do it the way he is used to doing it. At least … he is the pro, isn’t he?
So I tell him what I have already done in consultation with official Dell support, and show him the documentation I had written down while doing the cross-testing of CPUs and memory modules. But of course, obviously … , he is the pro and I don’t know squat about computers, so he has to repeat these tests. While he is removing the machine’s cover plate I start wondering about another thing. Where is he going to put the CPUs and memory modules that he is going to take out of the system? Usually, you have a small antistatic mat around to lay the removed components on. The pro does not …
So he literally starts playing jackstraws with 6 of the 8 pairs of memory modules on the bare desk, leaving no chance to remember which module was in which slot. But once again, he is the pro – he knows what he is doing … So there are only 4 modules left in the system (each CPU needs at least one pair) and he powers on the machine. Surprise! It doesn’t do anything, just like before.
“Hum, let’s remove one CPU.” “Erm, I had already done all this before, would you like to see my documentation again?” “No.” After unscrewing the cooler, it still sits firmly on the CPU since the thermal paste more or less glues the two together. This particular bond was so strong, you could almost lift the entire box (about 35 kilos) from the desk by just pulling on the unscrewed cooler. So what does the pro do? He braces himself against the pull by pressing his other hand down on the computer’s mainboard! My colleague and I couldn’t believe our eyes when we saw this, but it actually happened.
Anyway, he managed to get the cooler off and remove the CPU. But without the antistatic mat – where was he going to put the CPU? With thermal paste on one side and the electrical contacts on the other, well, obviously the pro just puts it contacts-down on the bare, uncleaned desk!
It took the pro well over an hour to perform all the cross-tests I had already done and documented the other day and, I’m leaning towards saying “obviously”, his work did not make a difference … well, apart from maybe adding some dust to every component since he just put them on the desk. So what’s the next step? Replace the mainboard. Hurray – that’s what you came for!
Other than having all the CPUs and RAM on the desk, the actual replacement of the mainboard went smoothly and I wouldn’t have done it differently. Unfortunately, the new mainboard still did not get the machine to the POST. So the problem needed further investigation. Since we had already cross-tested all the CPU and RAM combinations (twice!) and he did not have any other spare parts, he said he had to postpone the repair to the next business day … and then started packing. 1 CPU and 14 RAM modules were still lying around on the desk, unprotected, and he just wanted to leave. “Na, that’s fine. The next guy has to take apart everything again anyway”, so he’d just leave it like that.
I asked a colleague to come over and have a look at this while the pro was still there, so that I would have a second witness that it was not me who had produced this mess. And then the pro just left.
Maybe the pro thought he was Chuck Norris – he wouldn’t need an antistatic mat or a ground connection for sure!