Friday 29 January 2010

Hard driving SQL

We have been working on installing an SAP ERP system for some time now. It went live in the latter part of 2009, and almost immediately we started to get some performance issues. After some discussions, we were advised that we should move a number of components form the SQL server to separate disks.

The server had originally been set-up to the specific instructions of the system integrators, and they had carried out the installation of their software. We had 2 logical drives; the operating system on the C: drive and the rest of the product on drive D:.

Essentially, they now advise that we should put the paging file, tempdb files, and transaction log files all on separate logical drives. This does make sense; with the extra drives, there will be less data being processed at the same time by the same equipment. However, the server we have is an HP Proliant DL380 with space for just 6 drives. As all the slots are full, we can’t physically add any more to the existing device.

However, there is a way around this; HP sell external disk arrays which can be added to an existing server. In our case, we obtained the MSA 20 unit which hold 12 SATA drives and this is connected to an HP 6400 SmartArray controller card. We ordered all of the required equipment back before Christmas, but unfortunately we had a series of problems getting the hardware. The bad weather didn’t help as we are a bit out on a limb, but the various bits were coming from different depots, so weren’t despatched together.

Laste week after all of the equipment had finally turned up, ee set-out to do a test of the process of adding the hardware and this went through fairly well. It toook us about 5 hours as we wanted to double check everything at each stage to make sure it worked; we had not had the chance to do something like this before and wanted to be certain it would work. We made notes of the steps and waited for the Sunday so that we could make a start on adding the new hardware to the main system.

The controller card was very easy to add. Pop open the cover, lift out the holder, insert the card and replace the cover. I also connected the cable to the disk array at the same time as I found that easier than trying to fiddle about in the back of the rack trying to make the connection. The slot that the cable uses on the back of the card is quite small and difficult to reach when the server is back in place.

When we fired up the server, it ran through the normal POST routine, and it quickly identified the new Smart Array device. It took a while for the disks to initialise; about 12-15 minutes for them all. However, we then hit our first snag; when it reached the end of the initialisation, it suddenly crashed and re-booted. Funny thing though, when the server restarted, it went back to the initialisation routine and then completed perfectly.

It was then necessary to set-up the logical drives and this is really easy to do. Within the configuration utility, just select the physical drives, the type of RAID and away you go. We chose to put 3 drives at a time in a RAID 5 configuration. It should give the space we need, the protection that it wanted and we get 4 logical drives (12 HDD divided by 3 = 4). With all 4 done, we could then re-boot the server, and see the new drives in the disk manager – we set it to create a new partition on the logical drives and format appropriately.

All of this took us about an hour, perhaps just a bit over. We then moved the paging file and set it to a slightly larger size than before – a quick reboot and still everything was going well. We copied the tmpdb folder to a new drive and then used a SQL script that we had found for dropping it and then re-attaching to the new location. It took literally only a few seconds to do and we were starting to get really cocky. Then it all went wrong.

We stopped the SQL service to copy the transaction log over – all 38 GB of it! We then started the copy process and it took ages. It seemed to copy about 8GB and then it would pause for ages (almost 20 minutes), before then carrying on. We got a point where it had reached around 12-14 GB and the damn server blue screened (one of the few occasions that we have seen Windows Server 2003 do a BSOD).

It turned out to be a paging fault error – once started we modified the paging file to put it back to the same minimum size that it had been, although we left it on the same max size. I restarted the copy process and we waited.. and waited… and waited…. and waited…..

Evetually after about another two and half hours, the copy process finished. We then ran the SQL commands to change the database to point to the new trans log location and once done, we verified that this was correct. We then ran up the ERP to make sure that it worked and it was good. By this time, it was well after 1:00 pm – we quickly finished everything off and locked up, then headed off to a local watering hole for Sunday lunch on the company.

And just to finish the story off, the technician’s wife works at that hotel. Whilst we were eating, she sent a note through from the back room, demanding to know where her dinner was. So a small piece was cut off of the dinner and put on a small plate to be sent out to her – 5 minutes later a message came back demanding to know where the ketchup was!

No comments:

Post a Comment