I booked a week off work last week – no plans to go anywhere, I just wanted a bit of a break. It was a glorious week, with lots of sun but not too hot, and I managed to catch up on some outstanding jobs at home, such as painting the windows. I also had the chance to sit around and just relax with a glass of wine or two…
So, back to work on Monday this week. I thought I would get an early start, as there are a number of projects on the go and I wanted to get a few things out of the way. When I arrived, there was a note on the door – the inventory clerk had had problems getting on the system, so had left a note for us to investigate.
When I checked the server room, everything was off and the room was absolutely boiling. We normally run at around 22-24 degrees C: we find that’s a nice temperature to work in, the servers are OK with it, and it takes less power than cooling the room any further. I quickly checked and everything had shut down, including the air conditioner, which wouldn’t even restart.
I looked at the UPS, which was showing power going in but nothing coming out. I couldn’t see an obvious problem, so I grabbed a couple of power extension leads from our office and ran them around so that we could get a couple of systems running. Priority number 1 was the DHCP / DNS server, so that we could get network services, and that was the first one running. Next was email – no problem there, it started up fairly quickly. But with the room so hot, I had to find a way to get some air movement. Even with all the windows and doors open, the room was still close to 40 degrees.
I pinched some fans from the HR office as a quick fix, and after about 20 minutes the maintenance manager came in. He did a quick check on the air con unit and discovered that the power breaker on the mains supply in the factory had tripped. He reset it, but when the unit started up it wasn’t cooling anything down. He contacted the service company, who sent an engineer down later.
With the rest of my staff in, we started moving a couple of the servers – we have a small backup room at the other end of the building, so we were able to put a couple of them down there as a temporary measure. By about 9:00 am we had most of the system running so that people could get on with their daily work.
When the engineer from the aircon company turned up, he identified that the compressor had failed and needed to be replaced. It took a couple of days to get the replacement, only for him to then discover that another part had failed, causing all the refrigerant gas to leak out. This is what caused the aircon to fail – and as a result everything overheated.
We checked the UPS settings as it is supposed to send an alert for various events, and it turned out that every event was ticked except the one for temperature. Doh! Basically the device had gone up to 60 degrees C and then just shut everything down. In addition, a switch on the device had tripped preventing any outgoing power.
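To show what that one missing tick box amounts to, here is a rough Python sketch of threshold-based temperature alerting. It is not our UPS vendor's software or configuration: the function name, the event strings and the 30 degrees C warning level are all made up for illustration, and only the 60 degrees C shutdown figure comes from what actually happened.

```python
from typing import List

# Hypothetical thresholds. Only the 60 C shutdown figure comes from the
# incident; the 30 C warning level is an assumed value for illustration.
WARN_AT_C = 30.0
SHUTDOWN_AT_C = 60.0


def events_for(temp_c: float, temperature_alert_enabled: bool) -> List[str]:
    """Return the events a monitoring device would raise at this temperature."""
    events = []
    # The warning is only sent if the temperature event is ticked.
    if temp_c >= WARN_AT_C and temperature_alert_enabled:
        events.append("alert: high temperature")
    # The protective shutdown happens whether or not anyone was warned.
    if temp_c >= SHUTDOWN_AT_C:
        events.append("shutdown")
    return events


if __name__ == "__main__":
    # Compare the same readings with the tick box off and on.
    for reading in (23.0, 40.0, 60.0):
        print(reading,
              events_for(reading, temperature_alert_enabled=False),
              events_for(reading, temperature_alert_enabled=True))
```

With the alert disabled, nothing is reported on the way up and the first event anyone sees is the shutdown itself, which is pretty much what happened to us.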
So now we are almost back to what passes for normal – we have to make time to come in one Saturday to put everything back in place as it takes longer to build a rack up than it does to strip it down. But the aircon is cooling away nicely and hopefully, now we’ve ticked the box, it will warn us of any similar event in future.