I know it seems unreal to wax nostalgic for head crashes, but at least then you knew beyond a shadow of a doubt the drive was bad. Today’s drives seem to half-fail. I was reminded of this story for two reasons: I listed this machine on craigslist, and it appears I have a drive from another machine failing the same way. Someone emailed me wanting to communicate off-list about the machine. That isn’t going to happen, but I did send a response back through craigslist for them. It reminded me of the story.
When I first started writing about Manjaro Linux this was one of the machines I was using. I pulled it off my BOINC rack and used it for the experiment. There was a time this was my main machine and I had it beefed up with disk storage. It’s still beefed up with RAM but only has 1TB for storage now.
I would kick off the 3000+ module CopperSpice builds and go in for the night. Come out in the morning and the machine would be shut down. Cuss and holler about Manjaro power saving settings. Spend hours surfing the Internet looking for how to turn every last one of them off. Machine would work fine all day while I was using it then shut down overnight when running a long build.
Yes, I should have read my own blog before trying to diagnose the problem. Back in the MFM days we used to have “sticking hard drives” due to some grease cooking inside the drive motors and becoming more like half dried soda pop than lube. I was never in the room when the shutdowns happened, so the hard drive wasn’t even under consideration.
Work from Most Likely to Least Likely
The power supply was old. In went a brand new 500W power supply. Damned if the same thing didn’t happen if I left a big build running at night. If it was idle all night it stayed alive. I had a UPS plugged in so a minor hiccup in the power grid wasn’t the culprit.
My original motherboard was a 970Extreme, the original version from before the ATX case standard was a solid standard. The one corner where I had to connect the SATA cables didn’t have any support: the board had a hole for a stand-off there, but the case had no matching slot. I always wondered if the wiggle from connecting a cable would ever catch up with the mobo.
Hoping against hope I replaced the SATA cables. Nah, same problem, but only if I had a build running when I left. It didn’t die if I kicked off the build during the day and went to work on another machine. This was one of those annoying things that would only happen when I wasn’t around.
Next I booted from SystemRescueCD and let the memory tests run all afternoon, all night, and until about noon the next day. Yeah, no such luck. Couldn’t be something simple to fix like a memory module.
Of course I wasn’t smart enough to read my own blog.
I started poking around on eBay and found someone selling a 970Extreme4 board. It had some extra features and had to be many years newer given the manufacturing history of the 970Extreme. Got a good deal. Board arrived. Went through the pain of taking the old mobo out but put it in a safe place “just in case.”
Same ^)(*&)(*ing thing happened!
A New Release of Manjaro
As luck would have it a shiny new release of Manjaro came out. I held out faint hope that this spastic system shutdown would be a bug they fixed. I install the new release, start it up, get all the software I need installed and the thing dies shortly after reboot. No head crashes can be heard. Sometimes it just hangs with the last screen displayed, other times it shuts all the way down.
Sadly this is an improvement. Now I’m actually in the room when it happens. I try many different things and get to a stage where I can’t complete the install. All this time the drive doesn’t show any errors. I use all kinds of diagnostics and no errors are recorded.
When drives used to have head crashes there was no failure to diagnose. It sounded like Hell either briefly or for a while, then it just stopped. You couldn’t get it to spin no matter what. This was not that.
In desperation I pulled a 2TB drive off the shelf and stuck it in. Manjaro installed. Everything built and ran.
You know I didn’t just leave it at that. I stuck the drive in a USB cradle and kicked off a surface format.
Barely got 120GB in before it got a non-response or some type of error like that. No bad sectors or error codes logged. The drive just stopped responding. It appears some drives have a problem with the “wiring” to the arm motors. The wiring has a little bit of play and wears a spot of insulation off. When the arm gets to a certain point the bare spot grounds out. The power supply was turning the computer off because it recognized an overdraw situation.
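For anyone who wants to try the same kind of test without wiping a drive, here is a minimal Python sketch of a read-only pass. It is not the surface format I ran; it is a gentler stand-in that reads the device from start to finish and reports how far it got. The function name `read_scan`, the chunk size, and the "slow read" threshold are all made up for illustration. One caveat: if the drive truly stops answering, the read may simply hang, which is why the sketch prints progress as it goes so the last offset is still on the screen.

```python
import sys
import time


def read_scan(path, chunk_size=1024 * 1024, slow_seconds=5.0,
              progress_every=1024 ** 3):
    """Read `path` sequentially and report where (and why) reading stopped.

    Returns a dict with the number of bytes read, the error text (or None
    if the end was reached cleanly), and a list of (offset, seconds) for
    reads that took longer than `slow_seconds`.
    """
    offset = 0
    next_report = progress_every
    slow_reads = []
    # buffering=0 turns off Python's own buffering; the OS may still cache.
    with open(path, "rb", buffering=0) as dev:
        while True:
            started = time.monotonic()
            try:
                chunk = dev.read(chunk_size)
            except OSError as exc:
                # The drive (or the USB bridge) reported a failure.
                return {"bytes_read": offset, "error": str(exc),
                        "slow_reads": slow_reads}
            elapsed = time.monotonic() - started
            if not chunk:
                # Clean end of device or file.
                return {"bytes_read": offset, "error": None,
                        "slow_reads": slow_reads}
            if elapsed > slow_seconds:
                slow_reads.append((offset, elapsed))
            offset += len(chunk)
            if offset >= next_report:
                # Progress goes to stderr so a hang still leaves a trail.
                print(f"read {offset} bytes so far", file=sys.stderr, flush=True)
                next_report += progress_every
```

On Linux you would point it at the whole device with root privileges, something like `read_scan("/dev/sdX")`, where `/dev/sdX` is a placeholder for whatever name the cradle shows up as. A result whose `bytes_read` stops well short of the drive size, with or without an error message, is the same kind of half-failure described above.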
I really miss good old fashioned head crashes.
Someone got a sweet deal on a 970Extreme motherboard on eBay.
Now that I need to make space on the shelf for other newer and higher end machines, someone on craigslist is going to get a sweet deal on this machine. I am going to miss it, but I gotta stop building machines I fall in love with.