Thursday, 26 August 2010

Some years ago, we undertook a small experiment with our server room. We had heard that other people were reducing the amount of A/C cooling they used, and we wanted to see if it was appropriate for us.
Like a lot of other places, our small server room was kept cool to keep the servers cool; if we were going to spend any length of time in there, we would need to put on a jumper or even a fleece to stay warm, as the room was around 10 degrees centigrade. The A/C units were running non-stop, and we wanted to see if we could reduce the electricity we used.
Essentially, we made a set of measurements to get a baseline. These included the core temperature of the UPS, readings from the servers themselves, and temperatures at various places within the server room. We were fortunate that our engineering manager had a device we could borrow for this, as he was conducting a number of tests to help the company work towards ISO 14001.
He also had a device that allowed us to measure the amount of power drawn by various equipment – we got a couple of slightly odd readings, but when we discounted those, the average values matched what we expected. We therefore put the odd readings down to incorrect use on our part.
Having got our baseline values, we then started to increase the ambient temperature of the room and examined what effect this had. Each time, we left the changed settings in place for a couple of weeks to see what would happen; in each case, there was no sign of distress on the servers, so we were able to increase the temperature again.
After some time, we found that the “sweet spot” was between 20 and 24 degrees centigrade. Above 24 degrees, the fans in the servers started to work much harder and draw more power. Below 20 degrees, the A/C was still running almost all the time. In that range, however, the A/C unit ran with its lowest power draw whilst the servers ran at a comfortable level.
We found that in the racks, we had a few “hot spots”; places where the temperature was quite a bit higher than the ambient temperature of the room. We were told that this is normal and generally considered a good thing; these create a thermal current that allows the cooling to happen naturally. The interesting thing was that although the ambient temperature increased by 12 degrees, the hot spots only increased by 3-4 degrees.
Part of the work meant that we had to make sure that the racks were properly positioned in the room to allow for adequate air flow, and the direction of air from the A/C also had to be optimised to prevent “air curtains” forming at various places. We also had to make sure that things such as blanking plates were used to ensure a properly controlled air current within the racks.
Although this all sounds very grand, the room is quite small and most of the work was done in between our normal activities. We were able to make use of some additional advice from the A/C supplier, but that was relatively minor. The total amount of work required was actually quite small, but the results have been very good. We have seen a reduction in power consumption of just under 50% for the server room as a whole – which translates into significant cost savings.
I’ve added a link to a resource that I would recommend to anyone wanting to do work on their server room facilities. It is primarily aimed at North America, but there are some bits that are specifically for the European market. It will take some time to go through all of it, but I consider that it would be time well spent.
http://www.schneider-electric.com/sites/corporate/en/products-services/training/energy-university/energy-university.page?tsk=77518T&pc=26947T&keycode=77518T&promocode=26947T&promo_key=26947T
The really good thing: we now have a server room that we can work in, in reasonable comfort, all the time!
Friday, 29 January 2010
Hard driving SQL
We have been working on installing an SAP ERP system for some time now. It went live in the latter part of 2009, and almost immediately we started to get some performance issues. After some discussions, we were advised that we should move a number of components from the SQL server to separate disks.
The server had originally been set up to the specific instructions of the system integrators, and they had carried out the installation of their software. We had two logical drives: the operating system on the C: drive and the rest of the product on the D: drive.
Essentially, they now advise that we should put the paging file, the tempdb files, and the transaction log files all on separate logical drives. This does make sense; with separate drives, the different types of I/O are no longer all competing for the same set of disks. However, the server we have is an HP ProLiant DL380 with space for just 6 drives. As all the slots are full, we can’t physically add any more to the existing device.
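As an aside, if you want to see where everything currently lives before planning a move like this, a query along these lines lists the data and log file locations for every database (this assumes SQL Server 2005 or later; the Windows paging file obviously won’t show up here):

SELECT DB_NAME(database_id) AS database_name,
       name AS logical_name,
       physical_name,
       type_desc          -- ROWS (data) or LOG
FROM sys.master_files
ORDER BY database_id, type_desc;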
However, there is a way around the lack of drive bays; HP sell external disk arrays which can be added to an existing server. In our case, we obtained an MSA 20 unit, which holds 12 SATA drives and connects to an HP Smart Array 6400 controller card. We ordered all of the required equipment back before Christmas, but unfortunately we had a series of problems getting the hardware. The bad weather didn’t help, as we are a bit out on a limb, but the various bits were also coming from different depots, so they weren’t despatched together.
Last week, after all of the equipment had finally turned up, we set out to do a test run of the process of adding the hardware, and this went through fairly well. It took us about 5 hours, as we wanted to double-check everything at each stage to make sure it worked; we had not had the chance to do something like this before and wanted to be certain it would work. We made notes of the steps and waited for the Sunday so that we could make a start on adding the new hardware to the main system.
The controller card was very easy to add: pop open the cover, lift out the holder, insert the card and replace the cover. I also connected the cable to the disk array at the same time, as I found that easier than fiddling about in the back of the rack trying to make the connection. The slot that the cable uses on the back of the card is quite small and difficult to reach once the server is back in place.
When we fired up the server, it ran through the normal POST routine and quickly identified the new Smart Array device. It took a while for the disks to initialise; about 12-15 minutes for them all. However, we then hit our first snag: when it reached the end of the initialisation, it suddenly crashed and rebooted. Funnily enough, when the server restarted, it went back to the initialisation routine and then completed perfectly.
It was then necessary to set up the logical drives, and this is really easy to do. Within the configuration utility, you just select the physical drives and the type of RAID, and away you go. We chose to put 3 drives at a time into a RAID 5 configuration; each three-drive set gives us two drives’ worth of usable space (the equivalent of one drive goes on parity), which should give the space we need and the protection we wanted, and we end up with 4 logical drives (12 HDDs divided by 3 = 4). With all 4 done, we could then reboot the server and see the new drives in the disk manager – we created a new partition on each logical drive and formatted it appropriately.
All of this took us about an hour, perhaps just a bit over. We then moved the paging file and set it to a slightly larger size than before – a quick reboot and everything was still going well. We copied the tempdb folder to a new drive and then used a SQL script that we had found for dropping the files and re-attaching them in the new location (the standard commands are sketched below). It took literally only a few seconds to do and we were starting to get really cocky.
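I won’t reproduce the exact script we found, but the usual way of moving tempdb is something along these lines – the logical names below are the SQL Server defaults, T: is just an example drive letter, and the change only takes effect once the SQL Server service has been restarted:

ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'T:\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'T:\templog.ldf');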
Then it all went wrong. We stopped the SQL service to copy the transaction log over – all 38 GB of it! We started the copy process and it took ages. It seemed to copy about 8 GB and then pause for ages (almost 20 minutes) before carrying on. We got to a point where it had reached around 12-14 GB and the damn server blue screened (one of the few occasions that we have seen Windows Server 2003 do a BSOD).
It turned out to be a paging fault error – once the server was back up, we modified the paging file to put it back to the same minimum size it had been before, although we left the maximum size as it was. I restarted the copy process and we waited… and waited… and waited…
Eventually, after about another two and a half hours, the copy process finished. We then ran the SQL commands to change the database to point to the new transaction log location and, once done, we verified that this was correct. We then fired up the ERP to make sure that it worked, and all was well.
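For anyone wanting to do the same move, the commands to repoint a transaction log follow the usual pattern – the database name, logical file name and drive letter below are placeholders rather than our real ones:

-- tell SQL Server where the log file will live the next time the database comes online
ALTER DATABASE [ERP_PROD]
MODIFY FILE (NAME = ERP_PROD_log, FILENAME = 'L:\ERP_PROD_log.ldf');

-- take the database offline, copy the .ldf file to the new location at the OS level,
-- then bring the database back online
ALTER DATABASE [ERP_PROD] SET OFFLINE WITH ROLLBACK IMMEDIATE;
ALTER DATABASE [ERP_PROD] SET ONLINE;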
By this time, it was well after 1:00 pm – we quickly finished everything off and locked up, then headed off to a local watering hole for Sunday lunch on the company. And just to finish the story off, the technician’s wife works at that hotel. Whilst we were eating, she sent a note through from the back room, demanding to know where her dinner was. So a small piece was cut off the dinner and put on a small plate to be sent out to her – 5 minutes later a message came back demanding to know where the ketchup was!
Wednesday, 5 August 2009
Terminal headaches
We have been trying to implement some new software for the CRM – the product has been used by one of our sites for some time, but not by the others. The other sites had tried to use it before, but it’s not designed to be used across a WAN, so it had been set up as multiple databases, and when they started getting issues, they just stopped using the product.
The company concerned have issued a new version and our sales people have seen it and really like it. The vendor has produced a modified client GUI to run in a web browser – the idea is that those users on the remote sites would make use of that and so we could run a single database for all sites.
Well, that WAS the idea – the software runs fine locally, but running through the browser it was noticeably slower. Although it was usable, there was a definite speed issue, and we were worried that the users on the other sites might not be convinced enough to use the product if the speed was poor.
It then occurred to me that the database was installed on the server, and we also had a copy of the client software installed on the server so that we could test it as it was being set up. I did a quick RDP to a server on the other site, then from there did another RDP back to the server on our site. The speed of operation was good – as far as I could tell, it was the same as if we were running it directly at this site.
So I set up some shortcuts and emailed them to the users at the remote site, then talked them through how to save and use them. They agreed that this worked well, and they were really happy with the speed of operation. But then we hit a snag – only 2 users at a time could connect. As we are talking about having some 20 remote users, there is clearly a bit of a problem.
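Incidentally, the shortcuts themselves were just ordinary .rdp files, something like the following saved with a .rdp extension (the server and domain names here are made up for illustration – substitute your own):

screen mode id:i:2
full address:s:crm-server.ourdomain.local
username:s:OURDOMAIN\jbloggs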
Now, my predecessor had bought volume licences for a lot of software, which included some terminal server licences, but unfortunately none of the paperwork specified what was what. I found the paperwork ages ago and set up a profile on eOpen to manage all of the various items. https://eopen.microsoft.com/EN/default.asp - this is a great resource and I suggest that you check it out if you don’t already use it. It allows you to see what the various bits of paper refer to, and it gives you details on date of purchase, vendor, type of licence, quantity, etc.
However, when I double checked, the Terminal Services licence server had been set up and the licences applied – so that wasn’t the problem. I then searched through the various bits and pieces and subsequently realised where it was all going wrong. The server that the software was installed on was set to Remote Desktop mode, not the correct Terminal Services mode – which is why it was limited to two remote connections. A quick couple of clicks and problem solved.
So now the staff at the remote site can all connect to the server and all use the CRM software. It seems to run just as quickly when half a dozen of them are using it – so they are all happy!
Friday, 3 July 2009
Hot, hot, hot...
I booked to take a week off of work last week – no plans to go anywhere, but just wanted a bit of a break. It was a glorious week, with lots of sun, but not too hot, and I managed to catch up on some outstanding jobs at home, such as painting the windows. I also had the chance to sit around and just relax with a glass of wine or two….
So, back to work on Monday this week. I thought that I would get an early start, as there are a number of projects on the go and I wanted to get a few things out of the way. When I arrived, there was a note on the door – the inventory clerk had had problems getting on the system, so had left a note for us to investigate.
When I checked the server room, everything was off and the room was absolutely boiling – we normally run at around 22-24 degrees C, as we find that’s a nice temperature to work in, the servers are OK with it, and it uses less power to cool the place down to that level. I quickly checked, and everything had shut down, including the air conditioner, which wouldn’t even restart.
I looked at the UPS and it was showing power going in, but nothing coming out. I couldn’t see an obvious problem, so I grabbed a couple of power extension leads from our office and ran them round so that we could get a couple of systems running. Priority number one was the DHCP / DNS server so that we could get network services back, and that was the first one running. Next was email – no problem there, it started up fairly quickly. But with the room so hot, I had to find a way to get some air moving. Even with all the windows and doors open, the room was still close to 40 degrees.
I pinched some fans from the HR office as a quick fix, and after about 20 minutes the maintenance manager came in. He did a quick check on the aircon unit and discovered that the power breaker in the mains supply in the factory had tripped – he reset it, but when the unit started up, it wasn’t cooling anything down. He contacted the service company, who sent an engineer down later.
With the rest of my staff in, we started moving a couple of the servers – we have a small backup room at the other end of the building, so we were able to put a couple of them down there as a temporary measure. By about 9:00 am we had most of the systems running so that people could get on with their daily work.
When the engineer from the aircon company turned up, he identified that the compressor had failed and needed to be replaced. It took a couple of days to get one, only for him to then discover that another part had failed, causing all the refrigerant gas to leak out. This is what caused the aircon to fail in the first place – and as a result everything overheated.
We checked the UPS settings as it is supposed to send an alert for various events, and it turned out that every event was ticked except the one for temperature. Doh! Basically the device had gone up to 60 degrees C and then just shut everything down. In addition, a switch on the device had tripped preventing any outgoing power.
So now we are almost back to what passes for normal – we have to make time to come in one Saturday to put everything back in place as it takes longer to build a rack up than it does to strip it down. But the aircon is cooling away nicely and hopefully, now we’ve ticked the box, it will warn us of any similar event in future.