Friday, 29 January 2010

Hard driving SQL

We have been working on installing an SAP ERP system for some time now. It went live in the latter part of 2009, and almost immediately we started to get some performance issues. After some discussions, we were advised that we should move a number of components from the SQL server to separate disks.

The server had originally been set up to the specific instructions of the system integrators, and they had carried out the installation of their software. We had 2 logical drives; the operating system on the C: drive and the rest of the product on drive D:.

Essentially, they now advise that we should put the paging file, tempdb files, and transaction log files all on separate logical drives. This does make sense; with the extra drives, there will be less data being processed at the same time by the same equipment. However, the server we have is an HP Proliant DL380 with space for just 6 drives. As all the slots are full, we can’t physically add any more to the existing device.

However, there is a way around this; HP sell external disk arrays which can be added to an existing server. In our case, we obtained the MSA 20 unit which holds 12 SATA drives and this is connected to an HP 6400 SmartArray controller card. We ordered all of the required equipment back before Christmas, but unfortunately we had a series of problems getting the hardware. The bad weather didn’t help as we are a bit out on a limb, but the various bits were coming from different depots, so weren’t despatched together.

Last week, after all of the equipment had finally turned up, we set out to do a test of the process of adding the hardware and this went through fairly well. It took us about 5 hours as we wanted to double check everything at each stage to make sure it worked; we had not had the chance to do something like this before and wanted to be certain it would work. We made notes of the steps and waited for the Sunday so that we could make a start on adding the new hardware to the main system.

The controller card was very easy to add. Pop open the cover, lift out the holder, insert the card and replace the cover. I also connected the cable to the disk array at the same time as I found that easier than trying to fiddle about in the back of the rack trying to make the connection. The slot that the cable uses on the back of the card is quite small and difficult to reach when the server is back in place.

When we fired up the server, it ran through the normal POST routine, and it quickly identified the new Smart Array device. It took a while for the disks to initialise; about 12-15 minutes for them all. However, we then hit our first snag; when it reached the end of the initialisation, it suddenly crashed and re-booted. Funny thing though, when the server restarted, it went back to the initialisation routine and then completed perfectly.

It was then necessary to set up the logical drives and this is really easy to do. Within the configuration utility, just select the physical drives, the type of RAID and away you go. We chose to put 3 drives at a time in a RAID 5 configuration. It should give the space we need, the protection that we wanted and we get 4 logical drives (12 HDD divided by 3 = 4). With all 4 done, we could then re-boot the server, and see the new drives in the disk manager – we set it to create a new partition on the logical drives and format appropriately.
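For anyone wanting to check the sums on a layout like this, here is a minimal sketch in Python. RAID 5 gives up the equivalent of one drive per array to parity, so a 3-drive array has the usable space of 2 drives. The drive size below is purely an assumption for illustration, as the post doesn't give the actual size of the SATA drives.

```python
def raid5_usable_gb(drives_per_array: int, drive_size_gb: float) -> float:
    """Usable capacity of one RAID 5 array: one drive's worth goes to parity."""
    if drives_per_array < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives_per_array - 1) * drive_size_gb


TOTAL_DRIVES = 12       # the MSA 20 holds 12 SATA drives
DRIVES_PER_ARRAY = 3    # 3 drives at a time in RAID 5
DRIVE_SIZE_GB = 250     # ASSUMPTION: illustrative size only, not from the post

logical_drives = TOTAL_DRIVES // DRIVES_PER_ARRAY
usable_per_array = raid5_usable_gb(DRIVES_PER_ARRAY, DRIVE_SIZE_GB)
total_usable = logical_drives * usable_per_array

print(f"{logical_drives} logical drives, "
      f"{usable_per_array:.0f} GB usable each, "
      f"{total_usable:.0f} GB usable in total")
```

With 3-drive arrays, a third of the raw capacity goes on parity; larger arrays waste less but give fewer separate logical drives, which was the whole point of the exercise.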

All of this took us about an hour, perhaps just a bit over. We then moved the paging file and set it to a slightly larger size than before – a quick reboot and still everything was going well. We copied the tempdb folder to a new drive and then used a SQL script that we had found for dropping it and then re-attaching to the new location. It took literally only a few seconds to do and we were starting to get really cocky. Then it all went wrong.

We stopped the SQL service to copy the transaction log over – all 38 GB of it! We then started the copy process and it took ages. It seemed to copy about 8GB and then it would pause for ages (almost 20 minutes), before then carrying on. We got to a point where it had reached around 12-14 GB and the damn server blue screened (one of the few occasions that we have seen Windows Server 2003 do a BSOD).
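The script we used isn't reproduced here, so what follows is not it. It is a minimal sketch, in Python, of one commonly documented way to relocate tempdb: issue an ALTER DATABASE ... MODIFY FILE statement for each file, then restart the SQL Server service so that tempdb is recreated in the new location. The logical names tempdev and templog are the usual defaults on a standard install; the drive letter and folder are hypothetical, and the function name is my own.

```python
from pathlib import PureWindowsPath


def tempdb_move_statements(new_dir: str, files: dict[str, str]) -> list[str]:
    """Build one ALTER DATABASE statement per tempdb file.

    `files` maps each logical file name to its physical file name.
    """
    statements = []
    for logical_name, file_name in files.items():
        target = PureWindowsPath(new_dir) / file_name
        statements.append(
            "ALTER DATABASE tempdb MODIFY FILE "
            f"(NAME = {logical_name}, FILENAME = '{target}');"
        )
    return statements


# Default logical/physical names on a standard install (check yours first).
DEFAULT_TEMPDB_FILES = {"tempdev": "tempdb.mdf", "templog": "templog.ldf"}

# ASSUMPTION: T:\SQLData is a hypothetical folder on one of the new drives.
statements = tempdb_move_statements(r"T:\SQLData", DEFAULT_TEMPDB_FILES)
for stmt in statements:
    print(stmt)
```

The sketch only prints the statements; they would still need to be run against the server by someone with the right permissions, and the change does not take effect until the service is restarted.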

It turned out to be a paging fault error – once started we modified the paging file to put it back to the same minimum size that it had been, although we left it on the same max size. I restarted the copy process and we waited.. and waited… and waited…. and waited…..

Eventually, after about another two and a half hours, the copy process finished. We then ran the SQL commands to change the database to point to the new trans log location and once done, we verified that this was correct. We then ran up the ERP to make sure that it worked and it was good. By this time, it was well after 1:00 pm – we quickly finished everything off and locked up, then headed off to a local watering hole for Sunday lunch on the company.

And just to finish the story off, the technician’s wife works at that hotel. Whilst we were eating, she sent a note through from the back room, demanding to know where her dinner was. So a small piece was cut off of the dinner and put on a small plate to be sent out to her – 5 minutes later a message came back demanding to know where the ketchup was!

Wednesday, 6 January 2010

New year plans

So the holidays are over and we are all back to work – well almost. Unfortunately, the bad weather has caused some disruption, as a number of staff can’t get into work. Although that hasn’t affected IT staff, we are having to do a few things to help others out. Bet we don’t get any help from them when we need it later in the year!

I like to plan out what work we have to do – preferably at least a few months in advance. As such, I have a list of jobs and priorities against them and this gets updated throughout the year. At the moment, there are a large number of items for the next 3 months and quite a few for the second half of the year.

We are planning to go on a couple of specific training courses, there are some hardware and software upgrades, a couple of events that I feel would be appropriate for myself or my staff to attend and there are a number of jobs that need to be done as part of rolling maintenance programmes. We also have several projects under way and the various steps need to be arranged in the correct sequence and fitted in amongst the other work – plus of course we have the occasional problem that needs to be supported.

Unfortunately, there are several jobs that we cannot yet schedule – we are waiting for information from other people. One of our sites is proving to be a bit too small to handle the work load, so the company are looking at alternative locations. However, the senior managers can’t decide which of the newer sites would be most appropriate, so we can’t yet arrange for any work to be done that is required. Of course we know full well that when they do finally decide, they will expect all of the work to be complete within a few days!

In fact that move is going to be a much bigger task than they anticipate – once the decision is made they will then argue over the layout of the place and almost certainly, will change what they want on a daily basis. We will be cabling up the site for a network ourselves – it saves the company quite a bit of money although it does take up a bit of time. I’ve designed a particular method of network architecture that really works for us, and provides a great deal more flexibility and scalability than the way that these installations normally get done. Most of the people doing cabling appear to be electrical installers, and they think CAT 5e can be treated like standard 2 core and earth and they seem to have a real problem if you ask for work to be done in a particular way.

On top of that, we have to get the telephone lines moved, get an ADSL connection and move all the IT equipment ourselves – the last time we had a move, we also ended up moving all the desks and cabinets as well. The staff seemed to think that they could just close down the PCs, put on their coats, pick up their handbags and walk to the new site to find the desks all set up, the PC installed and turned on for them! They were rather upset to find that they were expected to do some of the work themselves!

So January is looking to be quite a busy month, what with one thing and another. Happy new year!

Monday, 14 December 2009

Iiiittttsss Chriiiiissssttttmmmaaaaaassss!

Someone mentioned the old Slade hit from the 70's and I haven't been able to get the damn tune out of my head all day! I think that it's going to drive me crazy! (Mamaa, weer allll crazeeee now!)

Many years ago, on 24th December, I would stay right to the end of the day, and last thing would shut down all of the servers. No-one would be back into work until the first week of January, so it seemed pointless to burn all that power for no reason. Plus it gave the equipment a chance to be shutdown properly and restart. This doesn't always hurt as it can clear out any rubbish in memory.

The trouble was that the CEO felt lost without his email - after we gave him VPN access, he wanted to be able to check his email on Boxing Day, just because he could. Then of course, he wanted to be able to check the sales figure - why? There have been no sales and won't be for 2 weeks - but he wants it, so he gets it. And of course, that means all of the ERP systems have to be running. By the time that you work out which systems he might possibly want, it's easier just to leave them all running. (And of course, you know that he is going to phone up to check if the figures have been updated!)

So we don't shut things down anymore - and that means we have to keep an eye on systems to make sure that nothing untoward is happening. As you can imagine, the WAGS take a dim view of this - it only takes a few minutes to logon and make sure that each of the servers is up and running, but the amount of time is not the issue. We have automated alerts to let us know if specific events occur, but it's not quite the same and there is always a possibility that the relevant alert doesn't get through.

So the laptop is going to be hidden away somewhere, and an excuse made to either "take a nap" or "pop down the pub" - then a quick logon just to make sure it's still all OK.

Whatever; we are fast approaching the holidays and the end of yet another year (where does the time go?) From my staff and me, the very best wishes to all the readers of this blog and to all the hardworking IT staff wherever you are. Have a great Christmas and try to enjoy whatever time you are allowed to take off. See you all in 2010!

Tuesday, 1 December 2009

Up in the clouds

One of the hot topics in IT at the moment is “cloud” computing. Effectively, outsourcing your hardware to a dedicated data centre. A lot of people try to convince me that this is the way forward, that everything should be put “on the cloud” and that this will save astonishing amounts of money. I’ve seen some of the calculations and I am not sure that they always stand up to scrutiny.

For example, I looked at a Dell PowerEdge unit – the cost to buy outright (£1,200) was a bit higher than the cost to rent in a data centre for a year (£700), but obviously over a longer period such as 4 years, buying would work out cheaper. There is an advantage to the cloud offer in that they would replace the equipment (probably with newer equipment) at a set point, but then it doesn’t appear on the asset ledger in the company accounts, which upsets the beancounters.

Of course the purchase price doesn’t include the Operating System, whereas the cloud offer usually does (but not always); and there is the cost of electric to run the item and to provide cooling which have to be factored into the equation. There is also a need to provide anti-virus protection, patch updates, data backups etc. Again, that is not always included in the price of the hosting contract and so might need to be added to their quoted price – something that is not always clear.

In addition, there is the cost of managing the unit – and they don’t always provide all of the management services that might be needed. In most cost comparisons, they show a figure for on-site management (and I sometimes feel that these figures are inflated a bit) - but then they don’t include similar values in the cloud offer even though it would be appropriate to do so, making the comparisons meaningless.

Suppose the 4 year basic cost of renting the server in a data centre would be £2,800 – reading the small print of some hosts, adding in the other items could take it to as much as £4,500. My calculations show the internal cost of the device for keeping it on site could be about the same, perhaps just a little more. Certainly the outsourced system might still be cheaper, but not by that much.
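To make the arithmetic explicit, here is a minimal sketch in Python using only the generic figures quoted above. It works out the basic rental over 4 years, how much the small print adds, and how much on-site running cost (OS licence, power, cooling, anti-virus, backups, management) it would take for the in-house server to come out level with the fully loaded hosted price. The variable names are my own, and the break-even figure is simply derived from the numbers above rather than being a measured cost.

```python
YEARS = 4
PURCHASE_PRICE = 1200        # GBP, buy the server outright
RENT_PER_YEAR = 700          # GBP, basic data centre rental
HOSTED_WITH_EXTRAS = 4500    # GBP over 4 years, after reading the small print

# Headline hosted cost: what the sales comparison usually shows.
basic_rental = RENT_PER_YEAR * YEARS

# What the "other items" add on top of the headline hosted cost.
hosted_extras = HOSTED_WITH_EXTRAS - basic_rental

# On-site running costs at which in-house and hosted come out level.
onsite_break_even = HOSTED_WITH_EXTRAS - PURCHASE_PRICE
onsite_break_even_per_year = onsite_break_even / YEARS

print(f"Basic rental over {YEARS} years: £{basic_rental:,}")
print(f"Extras hidden in the small print: £{hosted_extras:,}")
print(f"On-site running costs to break even: £{onsite_break_even:,} "
      f"(£{onsite_break_even_per_year:,.0f} a year)")
```

The useful habit is to put both columns through the same list of extras; if a cost appears on one side of the comparison, it should appear on the other.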

Then there is another point – what happens when things go wrong. It doesn’t happen that often, but when it does, the PTB want to know that someone is working on the problem. They like to be able to go into the server room, and for staff to point out flashing lights, explain what is happening – it gives them enormous comfort to see that someone is on the job and that the problem will be resolved eventually. This can’t happen with an outsourced system – even with numerous phone calls, they just don’t get the same level of reassurance, and you cannot put a price on that.

Now I will accept that I have used very generic figures – and to be blunt, most numbers can be manipulated to show pretty much anything that you want. Ultimately, it should be down to each individual case to be decided on its own merits. If it makes sense to keep it in house, then do so; if it is cheaper to host outside then that has to be the right decision.

For example, we have our company websites hosted externally – the cost is far cheaper than we could do it for as we don’t pay for a whole server box, and in addition, we don’t have to provide 24 x 7 support which would really rack up the support cost. However, we maintain our own CRM system – we checked it against SalesForce.com and our internal system works out at half the cost over 2 years. We also maintain our own ERP system – we were offered the chance to have it outsourced, and the cost of the management fees per year alone was more than the wages of our entire IT department.

So I suppose my advice would be to look at the numbers very carefully – make sure that you are really comparing like for like. Then think about the importance of the systems to the business and what would happen if the external system failed and how much of an issue it would be. If the risk is acceptable and the figures check out, then by all means outsource it. But I would strongly suggest that for many people, cloud computing is not the great panacea that it is made out to be, and that it would be appropriate to think carefully before rushing headlong into a situation just because it is the latest, greatest thing.

Thursday, 26 November 2009

Temporarily offline - working from home

I went up to London to a training session on Monday of last week. It was a really good session (better than I had hoped for) and I thought it well worth while. Unfortunately on the Tuesday afternoon, I started to feel a little unwell - shivering, shaking and sweating. By the end of the session, I was feeling really bad, and the trip home was a real struggle. I eventually got home very late (almost midnight) and I literally collapsed into bed.

It was a rough night - hot & cold sweats. The next day I felt more ill than I have done in a very long time. I had thought about grabbing my laptop to do some work, but I couldn't get up the strength to go downstairs to get the bag. It wasn't until the Thursday that I actually felt well enough to do more than stagger a few steps. When I did get back online, I quickly cleared a small backlog of emails, dealt with some enquiries over access permissions, and processed some internal items.

For most people in IT, this is actually quite a straight forward situation - there really is nothing particularly unusual about it. Within our company, most senior managers, department heads and the sales people are more than capable of working from home for several days, perhaps even a week or two. We have also started putting together some processes to allow some of the other staff the option to be able to work from home - driven partly by a need to ensure business continuity, but also to allow a more flexible working pattern.

However, when you look at a lot of companies they just don't have the facilities for this. There is still a real antipathy towards the idea of remote working, and it is seen as less than desirable. Yet there are so many benefits - reduced travel costs / environmental impact, better work / life balance, the opportunity for staff to cover a longer working day, more productivity and the option for some people to hold a job when otherwise it might not be possible due to family commitments or health issues.

Will this situation change? I think it will as many of these companies will start to find that they have to adapt to these new patterns of working. But I suspect that it may still take many years before everyone gets the option. A real shame - but I suppose that is just a reality of life.

In the meantime, I'm now back to work and it's almost as if I hadn't been off.

Friday, 13 November 2009

Microsoft Data Protection Manager Server 2007

I have written about this software before, but my staff and I think that it is such an awesome utility, I’m going to post some more comments about it. Quite simply, it is the best product that Microsoft have produced in quite a while, but for some strange reason, they just don’t promote it. As we are using it and it works so well, I thought that it would make sense to share some of our experiences.

So what is DPM Server 2007 and what does it do? Essentially, it allows you to backup servers and workstations using a disk-to-disk process, then a disk-to-tape process for longer term storage. In days gone by, almost everyone used a tape backup process as standard – but there are some serious issues with this.

Tapes stretch, or suffer degradation which makes them less reliable. Add to that, people have to change the tapes over (and sometimes they get tapes mixed up) and if you have to rely on non-technical staff on remote sites to change tapes (as we do), then you’ll know that they often forget to do it. Regularly, the backup software throws a wobbly so nothing gets backed up; and they don’t know how to check this or correct it, so they change tapes without anything being written to them.

Even if all has gone well, the recovery process can be awkward. First you have to make sure that you have the tapes (or even the right ones), someone has to change them back over, and sometimes you have to then inventory the tape to find the relevant file before you can recover that. Add to that, if it is a database, then you have to try to work out which bit you are going to recover – the actual file, the transaction log; it can get quite complicated.

The problem is of course that people do delete or overwrite files – this happened this morning with one of the design office guys at one of our remote sites deleting some drawings that another person had worked on yesterday. To recreate them would have taken probably the best part of a full day, and they are actually needed for a meeting with a customer, so they were keen to get them back as quickly as possible.

The recovery process is so simple with DPM Server, that it is almost embarrassing. In the recovery console, point to the relevant server, open the drive and navigate to the file / folder. Click on it and select recover – then choose the options, such as restore to new location or overwrite, original permissions or new permissions, etc. Click start and wait for about 1 minute while it starts the recovery process, then watch as the files are recovered. In our case, about 18 MB of data restored in just over 2 minutes to a remote site. No need to panic, no swapping of tapes, no need for staff to run around like headless chickens.

As you may imagine the staff at our remote site were pretty grateful – we’ve told them that they owe us a few drinks the next time that we are up there (and you better believe we intend to collect!). But in all seriousness, the DPM Server makes the backup and recovery process so straight forward that our lives are considerably less stressed as a result. Anyone responsible for the data integrity of a business should really consider looking at using this product – you will make your life a lot easier.

Wednesday, 28 October 2009

BCS South West

A couple of weeks ago, I attended an event organised by the BCS South West (but forgot to post this write up!) – it was a presentation entitled “Towards Online Safety” given by Ken Corish, an Education Advisor. Although primarily intended for parents of school children, I felt it had a lot to offer those of us working in other areas such as Commerce or Industry, and thought that many of the points made by Ken were highly relevant.

The presentation notes can be downloaded from the BCS website: http://www.bcssouthwest.org.uk/server.asp?page=pastevents (Towards Online Safety). These give a really good overview of the current situation and how the issues are being tackled. However, watch out for a couple of the pages as they contain some really bad language – just bear in mind that the screenshots are of real pages created by children on social networking sites and you might be a bit surprised.

In addition there were a couple of short videos shown that were created by CEOP as part of the process of educating young people about the potential problems – these have been shown in many schools and I would suggest that if you are a parent, you might want to see these for yourself. They can be downloaded from:

http://www.youtube.com/watch?v=-IOOn2wR8bU (Where’s Klaus?)

http://www.youtube.com/watch?v=vp5nScG6C5g (Think U Know: Girls)

http://www.youtube.com/watch?v=q4vyRBMjEv8&feature=related (Think U Know: Tom’s story)

http://www.youtube.com/watch?v=4w4_Hrwh2XI&feature=channel (Think before you post)

http://www.youtube.com/watch?v=CE2Ru-jqyrY&feature=related (Once posted, you lose it.)

Ken made the point that many adults don’t understand some of these issues, and so how can we expect children to. However, it’s also clear that many adults know little about online safety or think that it is someone else’s responsibility. Whatever your view, it is important that the message does actually get around to everyone.

The Internet can be a great place – there is a lot of really good information available, you can achieve a lot and make great friends particularly if you are reasonably savvy. But it has its darker side and sadly, there are some really nasty people out there. However, that reflects real life and we should make sure that the more vulnerable people (and not just the youngsters) are properly educated to make sure that they stay safe.