Monday 29 September 2008

You are the weakest link (r)

You know, one thing about blogs is that it is hard to make plans about what you will write. I was all set to talk about mechanisms used by bad people to get software to run on your machine. Maybe that software would be a bot or maybe it would be a remote admin tool or maybe a bot that has remote admin facilities… but all bets are off once malicious software is running on the system. The 10 immutable laws still apply. Which 10? These 10.

To save you looking, here they are in full. They are things of beauty to me.

Law #1: If a bad guy can persuade you to run his program on your computer, it's not your computer anymore
Law #2: If a bad guy can alter the operating system on your computer, it's not your computer anymore
Law #3: If a bad guy has unrestricted physical access to your computer, it's not your computer anymore
Law #4: If you allow a bad guy to upload programs to your website, it's not your website any more
Law #5: Weak passwords trump strong security
Law #6: A computer is only as secure as the administrator is trustworthy
Law #7: Encrypted data is only as secure as the decryption key
Law #8: An out of date virus scanner is only marginally better than no virus scanner at all
Law #9: Absolute anonymity isn't practical, in real life or on the Web
Law #10: Technology is not a panacea

Notice that the word “Windows” does not appear in this list. There is nothing specific to any operating system here. These apply to Linux and Unix and any other system that you care to name.

Clearly law #1 applies. If there is someone else’s program running on your system then you have less control over it than you should and someone else has more than they should. This isn’t quite as true as it used to be. If a program is running in a sandbox or with reduced rights and there are no elevation of privilege vulnerabilities then it may be that it can’t do any real harm without a little social engineering. Of course, a lot of installers work by using a little bit of social engineering.

Law #2 is really an extension of law #1. Pretty much all of the operating system that people see is user mode programs so there is no practical difference between a hacked OS component running with your rights and a wholly new component. Of course, in kernel mode, a malicious component completely owns the box.

Law #3 is a good one. I had cause this weekend to bypass the security on a network of Vista machines. I did it for perfectly legal reasons, with the owner’s permission and without using any trade secrets. It was an extortion situation. How did I do it? There are a number of ways if you have unlimited physical access. Of course, it is a heck of a lot harder if the system is secured with BitLocker. Some of the techniques involve a screwdriver and some just need some fiddling about. Why was it needed? The owners were the victims of social engineering. I refer you to law #6.

Law #4 relates to letting people upload programs to your website. Well, the website is just a computer when all is said and done so this is something of a rehash of law #1 – except with a twist. If he makes it content that can be downloaded by others, you have potentially allowed hundreds or thousands of other systems to be infected. These are likely to be the systems of your customers, the nice people who give you money for things. They are not good people to upset. Hackers are big fans of this approach, not least because people are more likely to trust components downloaded from your site than some previously unknown site. Social engineering is a big factor in this too.

Law #5 is perfect. Weak passwords trump strong security. Amen, brother. If you are like me, you will look away when people type their passwords but I bet that you know a few that belong to friends or family or colleagues. Spouse’s name and the year of marriage? Eldest son? Youngest daughter? Pet’s name? Would the information be on their MySpace or Facebook page? The easiest of all were the 4 digit passwords that travel agents used for the old teletext services that they used way back in the day. There was one 4 digit number that every ABTA travel agent knew – their ABTA number. It was displayed on the wall. Oops!
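To put some rough numbers on that, here is a minimal Python sketch. The function names, the guess patterns and the sample details are all made up for illustration; a real attacker would try far more patterns than these three.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Count the possible passwords of exactly `length` symbols."""
    return alphabet_size ** length


def personal_guesses(names, years):
    """Build the guesses an attacker could derive from public details.

    The patterns (name, name+year, name+two-digit year) are illustrative,
    not an exhaustive list of what a real attacker would try.
    """
    guesses = set()
    for name in names:
        base = name.lower()
        guesses.add(base)
        for year in years:
            guesses.add(f"{base}{year}")
            guesses.add(f"{base}{year % 100:02d}")
    return guesses


def is_guessable(password, names, years):
    """True if the password matches one of the personal-detail guesses."""
    return password.lower() in personal_guesses(names, years)


# A 4 digit code has only 10 ** 4 possibilities, even before anyone
# pins the right one to the wall.
print(keyspace(10, 4))

# Hypothetical details of the kind found on a social networking page.
print(is_guessable("Rex2001", names=["Rex", "Alice"], years=[2001]))
```

Two names and one year give the attacker a list of just six candidates, which is a lot less work than guessing at random.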

Law #6: A computer is only as secure as the administrator is trustworthy – ah yes. Who watches the watchers? QUIS CUSTODIET IPSOS CUSTODES is the original from the Roman poet Decimus Iunius Iuvenalis (Juvenal) who died in the second century AD. If your administrator feels aggrieved then passwords and biometrics will not serve you. Of course, he might be a wonderful person who loves the organisation but there are ways of turning a man. Better to have two, one to watch the other. Of course, that reminds me of the old Russian joke. Why do KGB officers go around in threes? One who can read, one who can write and one to keep an eye on the two intellectuals. Is this really one for social engineering rather than a technical point? Yes, seems to be.

Law #7: Encrypted data is only as secure as the decryption key. This is true. There is a technique known as rubber hose cryptanalysis. It is a simple technique. You beat the person who knows the key until they tell you. A variant much loved by a certain section of society is to kidnap the family of the person that you want to control. Security is not always a field that shows you the best that people have to offer. A simpler and more common vulnerability is to simply have the key written down. This is a good sensible thing to do. It is important to store it somewhere safe though. A post-it note under the keyboard is not a safe place unless it is a very secure facility and even then… Anyhow, a pure social engineering point again.

Law #8: An out of date virus scanner is only marginally better than no virus scanner at all. Ah, a technical point at last. Older viruses are not much of a risk these days. You won't get them on email because the server will filter them out. You are unlikely to find them on a website because it makes more sense to put a recent one up there instead. Even if you somehow got Sasser onto a modern PC, it couldn't spread because it relies on vulnerabilities in products that have long since been replaced as obsolete. Also, most malware is fairly new because the rate at which variants are written is ever increasing. Of course, you do need to check for the older ones as well but they are a minority case.

Law #9: Absolute anonymity isn't practical, in real life or on the Web. This is the one that has weathered the storms of time least well. It is still true but the key word here is “absolute”. You can use an anonymous proxy if you would like. There may be records kept by the proxy provider though and there are forensics to be examined on the local PC – though there are ways around that. Some proxy providers claim not to keep records. Some promise that all logs get wiped. Of course, there may be a record that your system connected to that proxy. Personally, I don’t worry too much about this since I am pretty open. People know where to find me. My phone number and address are not hard to locate. The scary people would find out anyway. I only conceal information that is not mine to share.

Law #10: Technology is not a panacea. Ah, how true. You can not make a system fool proof because fools are so ingenious! The better the security of the technology, the more you target the user. Social engineering is such a great tool.

Of course, that doesn’t mean that a buffer overrun is not going to allow a worm to spread across the world in hours or days. We need to guard the doors and the windows of the house. However, it does occur to me that it is harder to apply a service pack to people than servers. We need to educate people but we also need to make it easier to do the right thing and harder to do the wrong thing.

These are interesting times, my friends.

Signing off,
Mark Long, Digital Looking Glass

Thursday 25 September 2008

Ways to attack your users

So, hacking. What is it? Well, let’s try a dictionary definition.

The Merriam-Webster online dictionary says:

a: to write computer programs for enjoyment b: to gain access to a computer illegally

Ok, I would argue that a program can be hacked together for a number of reasons, not just for enjoyment and that a program written for enjoyment may be created with properly structured methods. The second definition holds more water though maybe you could argue that penetration testing is hacking and not illegal except perhaps in Germany. The German government passed a law making the possession of certain software tools illegal in the same way that it is illegal in many places to wander around the streets with a set of lock picks and a crowbar. Well, I can see some good points in this law but some of the tools that hackers use and that security professionals use are the same bit of binary. This is a tricky one. If you pass a law designed to control criminals and honest men alike, the criminals will break the law (it is a job requirement) and the honest men will either obey it to no good purpose or become criminals and break the law.

Anyway, hacking is gaining access to a computer system against the user’s wishes. Perhaps that is a better definition. However, a new study casts some doubts on that definition too. When is access to the computer granted? Well, from the point of view of the computer, when the access is requested or initiated from a legitimate and authorised user account. That makes sense; how else could it decide? What does this mean in practice? That code is running in the context of the user. This could be a cross-site scripting attack or a buffer overrun but let us consider the most common case, the case that is occurring on your computer right now. This page has a tiny and quite harmless bit of JScript running which is part of the navigation bar logic. If you are running on an older operating system then it will be running under your user account. If you are running Vista or Server 2003 or Server 2008 and you haven’t monkeyed around with the security settings for the browser then it will be running in a more limited context but my code (well, OK, technically Google’s code) is running on your computer. Of course, even if the script were malicious, there are only certain things that script on a page can do. The really powerful things require you to do them from a binary format executable, an EXE or a DLL or something else like that. A script can use some of these that are marked as safe for scripting and a lot of security updates in the past have simply been to mark some component as not being safe for scripting. A script can’t add a new binary. That requires user action.

What exactly is “user action”? It is clicking on a button saying that the user trusts the component that wants to install or following a link or just clicking OK on a dialog. A script can open a window and display HTML. That is a perfectly legitimate thing for a script to do and a lot of the web wouldn’t work if it couldn’t – you have to allow popups from some sites. However, what happens if it creates a window that looks for all the world like a legitimate dialog? Oooh… well, you have to rely on the user spotting that it isn’t the real deal.

How likely is the user to spot that a dialog is not the real deal rather than click on it anyway? According to a study at the North Carolina State University, users who had been specifically warned and who were being careful successfully spotted the fakes 37% of the time. Yes, just under two thirds of the spoofs were accepted as real. There are some more details here at the NCSU site.

Most malware (and remote access tools, the holy grail of hacking, are just another type of malware) is installed by the user inadvertently.

How can these windows be used? Well, there are a number of ways. One of the most common is to have a dialog with a bitmap on it that looks like a dialog but the whole thing (including the window borders, the close button and all) is just a big button that takes you somewhere that you didn’t want to go. A popular use is to display what looks like a system warning that you have malware. Follow the helpful link and it will try to download an application. Most users approve the download because “Windows” asked them to install it so it must be safe. “Windows antivirus 2008” and the 2009 version do exactly that. Pop onto Yahoo Answers some time to see how many people clicked “yes”. There are multiple sites offering removal instructions but the ones at Bleeping Computer seem pretty good to me.

Antivirus 2009 doesn’t just use that technique though. It also uses the good old codec download trick. This is very much the same principle. A video is created which pops up a dialog or displays in the video window that it needs a codec and handily gives you a link to the codec. The first malware to do this was our old friend Zlob. There is no honour among thieves and the idea has been copied widely. Does the link actually take you to a codec? ‘fraid not.

Next blog or within a few entries at worst, I will be talking about how downloadable components are spoofed.

Now, if you are a developer and you have read this far, you may be wondering what was of value to you in this blog. Well, that is a decision that you have to make but consider that the users who fall for these basic tricks are probably very like the people who run your application. Scary thought, eh? Someone has to have a healthy level of paranoia and it seems clear that it had best be you.

Signing off

Mark Long, Digital Looking Glass Ltd

Wednesday 24 September 2008

What is your identity worth?

What would make you tell someone your user name and password?

A new study by Symantec suggests that £5 (around $10) is enough to convince most people. From Sky News:

“In a survey, almost 60% of people were prepared to divulge their computer password when asked by a stranger in the street.

Forty-five percent revealed they used either (sic) their birthday, their mother's maiden name or the name of their pet as a password.

The survey was an experiment by internet security firm Symantec to test just how much personal data people would give up.”

I can’t say that I am much surprised. In the past, people have been known to give up their passwords for as little as a chocolate bar and we are not even talking about good chocolate here. These are the users of the systems that we design or install. Is anyone else thinking that two factor authentication doesn’t seem all that expensive anymore?

All that said, there is a dark and cynical part of me that wonders. If someone asked me for my password in exchange for a reward, I would gladly tell them. It wouldn’t be my real user name or my real password though. I would be very, very interested in finding out who was asking the questions. I would also be reluctant to eat candy from strangers.

Nobody ever said that working in security made you a nicer person.

Oh, I know that this is not about the anatomy of hacks. I caught a (biological) virus over the weekend and that has rather thrown off my plans.

Signing off,

Mark Long, Digital Looking Glass

Friday 19 September 2008

Questions, all your many questions

I like questions. I like ones that can’t be immediately answered quite a lot because research is always interesting. It seems that a lot of people struggle to find things out. Sometimes there is too little information available and you have to dig and delve and extrapolate. Sometimes the key facts are buried in a blizzard of information that makes finding a needle in a haystack seem like a trivial operation. I like research all the same.

However, the sort of questions that I have had recently have been a bit different. They have been very business focussed. They have come from some places that I wouldn’t have expected them to come from as well. One from New Zealand, another from Hungary, a couple from the US and so on. I have answered each of the people who asked individually but I will also answer here because if one person has a question, it is likely that there are others who want to know but who haven’t asked.

So, to the questions:

Q1. Can you teach me to hack and do you know of any vulnerabilities in X software?
A1. Can I? Yes. Am I going to? Uh, no. I can point you to resources such as CEH (Certified Ethical Hacker) training and I am happy to explain any points that are unclear but I don’t have a stock training program for this and I would have to tread a little carefully there because of ethical and legal considerations. If I did know of any vulnerabilities, I certainly wouldn’t be mentioning them to anyone until they were public and preferably fixed.

Q2. Can you make my system totally secure?
A2. Absolutely. Just remove the power cable and weld bars across the door. If it has to be online and doing something then I can certainly make it a good deal safer for you. The risk will never be zero but I can and have in the past made systems much less vulnerable to attack. If your system is not an easy target, it is likely that attackers will move on to an easier target.

Q3: Can you teach me to debug?
A3: I don’t have specific training although given the number of requests, I may consider creating some. I can certainly show you the tricks that I know.

Q4: Will you break into such and such a system?
A4: What an interesting request. If you give me your name and address and a time when you will be home, some friends of mine will be happy to call and discuss this with you. Pay no attention to the flashing blue lights on their cars.

Q5: My system has an intermittent problem. Can you help us to troubleshoot it?
A5: Sure can. It might take a while but there is no charge for waiting for something to happen, only for when I have to do stuff.

Q6: Why is onsite work more expensive?
A6: Because it is harder to juggle other commitments around work on your site. Work done remotely can be done at odd times of the day and night. However, I know that it is desirable to have someone onsite for political reasons and for face to face discussions. Typically, a short onsite visit to gather data and discuss a plan of action is useful and the rest of the work can be done remotely saving you money.

Q7: What geographic area do you cover?
A7: If planes fly there or there is a network link of some description and we have a language in common, I can help. I am happy to do remote work to anywhere in the world. If you want me to book the travel, it will be business class. If you book the travel, you get to choose.

Q8. Can I hire you or another consultant to help us find a particular bug?
A8: If it is legal and ethical, you can hire us to do pretty much anything you want. As for finding a specific problem, it often turns out that a symptom has multiple causes. A classic example of this is performance issues where removing one bottleneck means that you hit another one. In this sort of case, fixing the problem is an iterative process. That is why we quote some problems just with an hourly rate.

Q9: What is the limitation on what we can do with the free 2 hours?
A9: You can use them just like paid for time. Each new client gets 2 hours gratis. That doesn’t mean that you get 2 hours free when you buy 10 hours. It is 2 free hours and there are no conditions on that. You can even have them onsite if you are willing to pay travel costs and the flight times are not silly. If the job takes less than 2 hours, you get it for nothing. Think of it as a try before you buy. The only possible drawback is that free work doesn’t get priority over other paid work so you might have to wait a bit.

Finally

Q10. I want something that isn’t listed on the site. Can you do that?
A10: Like it says, if it is legal and ethical and we can do it for you, yes, sure, anything that you want.

Next blog, back to technical stuff. I might talk about the anatomy of some of the more interesting hacks that I have seen in the past few months.

Signing off,

Mark Long, Digital Looking Glass Ltd

Monday 15 September 2008

Who should fight the botnets and rogue sites?

I know a very clever chap in another security business and he posed an interesting question the other day. Who is responsible for protecting companies and individuals against online crime, specifically the threat from botnets? My view is that the answer is rather more complex than it might seem.

The Police? Well yes, clearly they have an important role to play. Which police though? British? American? Belgian? Policing in the real world hasn't fully caught up with the international nature of the internet.

Some have called for more government intervention, though that leaves open the question of which government. Leaving that aside, I can see a role for national action against botnets. Organisations such as the Russian Business Network are able to successfully run phishing sites for months despite all legal attempts to get the operation shut down. Botnet control channels have also been hosted there. Clearly, commercial and civil law are not enough here and political pressure is needed to get action from a government that appears to support such operations. However, imagine a situation where large scale botnets were being run from a failed state that had no government to speak of. Who would you address such concerns toward? In such instances, there is a case for border controls on the internet, cutting links with certain ranges of IP addresses or certain types of traffic at the borders. A scary thought in the free world of the web but there are as many spiders as there are butterflies.

Some have suggested that the operating system and the ISPs should offer more of a solution – block the installation of malware and stop people going to websites that could harm them. The argument is that people don’t have the skills to protect themselves online and someone else has to make the decisions. Well, yes, I see the argument. However, anything that limits the ability of people to use the web should be considered most carefully, not least because people loathe external control and would move to products that offered less protection. The price of liberty is eternal vigilance if I may quote John Philpot Curran. The controls described above would need to be applied with a light hand.

A senior policeman has suggested that the solution requires a collaboration between industry and the police and quotes the capture of Al Capone as being largely down to the action of business. There is certainly a lot to be said for this – some of the larger corporates have at least as much power as some of the smaller governments. Of course, companies can’t do this by themselves and have to work with law enforcement and some other... well, governmental agencies. Of course, there are carefully defined if not terribly commonly discussed links between the larger vendors and law-enforcement and there is open co-operation between major vendors to combat the botnets – the Virus Information Alliance for one. The Storm Botnet was heavily trimmed by the Microsoft malicious software removal tool but you can never kill a decentralised botnet by killing the bots individually. It can only be a population control method. There is also the question of expertise. The police have some very savvy folks but it is difficult to keep up with what is happening in the industry and the police are always going to be overstretched. Days in court are days when the industry (White hat and Black hat alike) moves on. Collaboration with specialists and organisations that cross borders will always be needed, I think. Since I am in that field, I certainly hope that this will always be the case :-)

Some say that user education is the key. People must defend themselves against attack and fraud much as they would in the offline world. Well, yes, again I agree that better security and better user education would help a great deal – after all, what is a company but a lot of people and some buildings? People and organisations have a major role to play in protecting themselves by not clicking on that link, not giving their bank details to the lawyer of the late Mr JOHN ADEMOLA and not buying from SPAM emails. However, the only way to do this is by user education and that is quite the trick with home users. If you are reading this, you are almost certainly pretty computer savvy. Your friends often come to you because their PC has broken again and you fix whatever they have done this time. Have they read any instructions? Nope. Did they read the online help? Nope. Will they resist any attempt to educate them? Yup. A couple of weeks back, I was removing yet another fine crop of malware from a PC – friend of a friend deal. They had got it from a file sharing solution that gave them access to free (if illegal) music, pornographic videos and “free” applications. I explained *again* these things were plague pits of malware and should be treated in much the same way as the free hypodermic syringes found in inner city alleyways. It was clear that I was speaking to deaf ears.

As for business, a lot of businesses still seem to think that putting in a firewall and an antivirus solution means that they have solved the problem. Well, those things help but when more than 3/4s of malware is installed by the legitimate users… well, you haven’t solved the problem yet. Against SPAM and phishing, you have done nothing at all.

I believe so strongly in user education that I will be speaking at some schools on basic self protection online.

So, if all these pieces are in place, will we have won? No more than we have won against conventional crime. Each part of the solution will reduce the impact of online crime but we are stuck with some level of crime. All we can do is choose how much – because the more protection we have, the more it costs and the more limiting it is.

I personally think that the costs of defending ourselves against crime will go up as more and more of the third world has access to the web because the disparity in living standards and the cost of living will make us such attractive targets. If you steal $300 from me, I will be very annoyed. That is a good chunk of a day’s work after taxes. If you live in Chad, that is 4 months’ income. People will go to a lot more trouble to steal 4 months’ income than you would be willing to expend to protect so little money. It is nearly a month’s income in the Ukraine – still well worth the effort. Oh, and if you were wondering, $300 is 6 months’ income in rural China. Conventional industrial espionage, fraud and extortion haven’t worked over such a distance. With the world wide web… well, distance isn’t a factor any more.

To quote the old Chinese curse, we live in interesting times.

Signing off,

Mark Long, Digital Looking Glass

Thursday 11 September 2008

Sharing too much with the world

There is a lot to be said for good error messages. The more diagnostic information that I have, the easier it is to diagnose what went wrong. A full stack would be nice. The names of any resources used would be helpful. I love rich debugging information.

Of course, so do hackers.

There are a number of ways of attacking a website. You can try for directory traversal where you try to get the web server to serve up things that are not what the admin intended. In an extreme case, you could possibly persuade it to serve up the results of running an arbitrary command in a command shell. However, such vulnerabilities are rare though tools such as the Goolag scanner make it easier to find them. The easiest ways are normally cross-site scripting attacks or XSS for short. SQL injection attacks are a close relative of cross-site scripting: both rely on unvalidated input being treated as code.

Of course, if you were attacking a website, the more that you knew about the target, the better you will like it. I am not advocating security by obscurity because we all know that this doesn’t really work but I am all in favour of not making it any easier for the bad people than it needs to be.

So, imagine that we give really good error information in our error messages. It is great for your developers to know the exact SQL code that resulted when a malformed text field was appended into a SQL statement and what function failed but it also tells an attacker what is happening to the text that he entered and what tables he is working with and what database and data access technology you are using. It is great to have this level of detail when debugging but better to write this to the application log and not to the user’s screen. Even if you assume that every user is a nice person who has no interest in hurting you (and wouldn’t that be nice?), is the average website visitor going to find it helpful to know that his request caused SQL Server to throw a specific exception because there was no valid SQL after the semicolon and that this occurred in MyOrg.Bibliophile.Inventory.GetBookListEx() and the first part of the SQL was accessing the Titles table? A generic “Oops – sorry, I couldn’t do that. Try asking for something else” type error would be better and look more professional.
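The idea of logging the detail and showing the visitor something bland can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it uses an in-memory SQLite database rather than SQL Server, and the logger name, function names and sample data are all invented for the example.

```python
import logging
import sqlite3

log = logging.getLogger("bookshop")  # hypothetical application log name

GENERIC_ERROR = "Oops - sorry, I couldn't do that. Try asking for something else."


def make_demo_db():
    """Create a throwaway in-memory database with a Titles table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Titles (title TEXT, author TEXT)")
    conn.execute("INSERT INTO Titles VALUES ('Hamlet', 'Shakespeare')")
    conn.commit()
    return conn


def find_titles_unsafe(conn, author):
    """Anti-pattern: the user's text is appended straight into the SQL."""
    sql = "SELECT title FROM Titles WHERE author = '" + author + "'"
    return [row[0] for row in conn.execute(sql)]


def find_titles(conn, author):
    """Parameterised query: the user's text is only ever treated as data."""
    rows = conn.execute("SELECT title FROM Titles WHERE author = ?", (author,))
    return [row[0] for row in rows]


def handle_request(conn, author, lookup=find_titles):
    """Run a lookup; full detail goes to the log, a bland message to the user."""
    try:
        return {"ok": True, "titles": lookup(conn, author)}
    except sqlite3.Error:
        # The exception text and traceback land in the application log only.
        log.exception("Title lookup failed for author=%r", author)
        return {"ok": False, "message": GENERIC_ERROR}


conn = make_demo_db()
print(handle_request(conn, "Shakespeare"))
# A stray apostrophe breaks the concatenated SQL; the visitor sees no detail.
print(handle_request(conn, "O'Brien", lookup=find_titles_unsafe))
```

The parameterised version copes with the apostrophe without complaint, so in practice the generic error path is the safety net rather than the first line of defence.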

Of course, you might give no error information at all to the user. What does that mean in practice? Well, they might just get a message saying that there was an Error 500 – Internal Server Error. Ok, tatty but serviceable. When you see an application give a response like this, it probably doesn’t have a great deal of error handling and just failed the request. Hopefully it survived the error and the webserver didn’t restart. If it did then you have a vulnerability to a denial of service since it is trivial to fire off multiple requests a second and you can’t normally cycle the server that quickly. You probably need to dig deeper if someone has been able to break the server that way.

What happens if you do nothing at all with regard to unexpected exceptions? Well, that depends on what you are using for your web server and what is happening under the covers but let's look at ASP.NET since that is a pretty popular solution these days. If you don’t have anything specific in place, the web.config will determine what will happen. There is an overview of the file here but the critical setting for us is “debug=true” or “debug=false”. If debug is set to true, it will spit out lovely rich diagnostic information to our dear friends in far away countries. If it is set to false then the end user gets a generic “Oops – that didn’t work” type error. Oh, there are also some other very good reasons not to ever want debug=true in a production environment and the incredibly clever Tess Ferrandez discusses them here and I will not steal her thunder other than to say that it will kill your performance and scalability.
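As a rough illustration, the relevant part of a production web.config might look something like the fragment below. The compilation and customErrors elements are standard ASP.NET settings; the ~/Oops.aspx page name is made up for the example. It is worth knowing that what a remote visitor actually sees is also governed by customErrors: with mode="RemoteOnly" the rich error page is shown only to requests made from the server itself, while mode="Off" shows it to everyone.

```xml
<configuration>
  <system.web>
    <!-- Never ship debug="true": it hurts performance and adds
         source-level detail to error output. -->
    <compilation debug="false" />

    <!-- Remote visitors get the friendly page; detailed errors are
         visible only when browsing from the server itself.
         ~/Oops.aspx is a hypothetical page name. -->
    <customErrors mode="RemoteOnly" defaultRedirect="~/Oops.aspx" />
  </system.web>
</configuration>
```

Keeping both settings explicit in the file makes it harder for a verbatim copy from the development server to go unnoticed.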

Oh, and why would a production server ever be set to debug? Typically because the application was copied verbatim from the development server and no-one noticed the scalability issues when there was insufficient load testing. I have seen that one a few times.

Anyway, I shall leave you with the wonderful words of Will Durant:
"Nothing is often a good thing to do and always a good thing to say."
Signing off

Mark Long, Digital Looking Glass

Tuesday 9 September 2008

Shooting trouble and the breeze

I have been thinking about problem solving and how to teach it in the last few days. It seems to be a difficult subject, not least because there is no general right way with all other ways being wrong. It seems that we all find our own right way according to how our minds work but there are some common traits and I would like to discuss those for a while.

You might be wondering why I am pontificating on these in a technical blog. The reason is that so many of the things that I do are essentially problem solving. When you sit down to write a program, you are trying to solve a problem or there wouldn’t be much point in writing it. When you debug a program, you are trying to solve a problem, namely that the program doesn’t work the way that you want. Even reverse engineering is a series of problems although this is subtler. Engineers tend to be problem solvers though I could name quite a few managers who would argue that engineers were actually problems in themselves :-)

Anyway, problem solving is very much an engineer thing and this is a mixed blessing. We tend to try to solve all problems even when no solution is requested. If you go to an engineer to complain that your relationship with your spouse is not going well, they will try to offer advice even though most are far from expert at relationships themselves… and all you wanted was an audience and not advice. Nevertheless, a lot of time and effort goes into turning ordinary well-balanced people into engineers because someone needs to make sure that the lights work and all that other technical stuff.

A technique that is often taught is logical decomposition: breaking a problem down into smaller and smaller parts until each part is trivial and therefore solvable. There is a lot to be said for this and it has been a mainstay of programming for many years. It is perhaps the perfect technique for ISTJs if you are familiar with Myers-Briggs personality types. It works very well on problems that divide cleanly and is useless for the rest, but programming is generally an area well suited to this approach. It fails miserably with problems such as “How do I travel faster than light?” or “How do I travel in time?” because those problems don’t break down neatly into smaller problems. A lot of the skill in using decomposition for programming problems comes in knowing where to put the boundaries when looking at the problem. One major weakness of the approach is that there is no “big picture” analysis, so anything that spans several pieces can be missed. There also tend to be a great many pieces at the end of the day and managing those can be a problem in itself. However, most programmers/software engineers/code monkeys et al tend to be most skilled with this approach as they have been taught to see it as *the* problem solving technique.

How might we solve a performance problem? Well, there are a few things that we would do initially. The first is to find out what sort of performance problem it is. Are we CPU bound? If so, we probably have a poor choice of algorithm and need to improve it or (more often in my experience) do less unnecessary work. Are we disk bound? Better hardware or better caching can help there. Maybe there is a lot of contention for resources and the system is blocked on that – common when you have a multiprocessor beast of a box and a highly contended mutex or something like that. Record locking on a database is another common scenario that looks like this. Whatever the cause, the solution seems to decompose into two steps:

1. Find out what it is doing.
2. Find a way of making it not be a problem.
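
For step 1, a very rough first cut is to compare the CPU time a piece of work consumes with the wall-clock time it takes. The Python sketch below is just one way of doing that, with made-up names (cpu_fraction, busy_loop and waiting are purely illustrative): a fraction near 1 suggests the work is CPU bound, while a fraction near 0 suggests it is mostly waiting on something else such as disk, a lock or the network. It does not tell you which of those it is waiting on.

```python
import time


def cpu_fraction(work) -> float:
    """Run work() and return CPU time used as a fraction of wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_loop():
    # Stand-in for CPU-bound work: pure arithmetic, no waiting.
    total = 0
    for i in range(2_000_000):
        total += i * i


def waiting():
    # Stand-in for blocked work: sleeping uses almost no CPU time.
    time.sleep(0.2)


if __name__ == "__main__":
    print(f"busy loop: {cpu_fraction(busy_loop):.2f}")
    print(f"waiting:   {cpu_fraction(waiting):.2f}")
```

The exact numbers will vary from machine to machine and with whatever else is running, so treat them as a hint about where to look next and not as a measurement.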

Ok, there are quite a few ways of trying to work out what it is doing. You can step through it (in big or small lumps depending on how well you understand it), repeatedly dump the process state and examine it that way, or add tracing logic or instrument the code in some way, with Perfmon often being a good start. None of these decomposes terribly well into a set of repeatable steps and maybe that is why it is hard to get a programmer to be a good debugger. Fixing it often involves coming up with a better solution to the problem than the original implementation. This generally relies on knowledge of how things work under the covers. Again, decomposing the problem is of limited value here.

Ah, but wait, I hear you say… this is only one sort of problem solving and there are many others. This is of course true. The approach taken for different problems is different again but it seems that flexibility is the key to so many of them. Even in as limited a field as IT, there seem to be too many approaches for most organisations to be able to cover all the required bases well. That means a lot to teach and learn, even for a subset of the skill of troubleshooting which is itself a subset of problem solving.

As well as trying to understand what is required to troubleshoot so that we can work out what skills need to be passed on, there have been a lot of very clever people trying to work out how to get troubleshooting done by people without the skill. So many internal help desks or technical support lines are staffed by nice, intelligent and reasonable people in low cost labour markets who have been given minimal training and a script. In fairness, if the script is well done then it solves most problems. However, when it fails to solve the problem, its failure is absolute. There is always a residual need for flexibility and deep technical skills and those are expensive to maintain. It is hard to justify the cost until you need those skills and then any price seems much more reasonable.

Maybe it makes sense to hire in those skills as needed. I certainly hope so as that is a fundamental part of the business model for Digital Looking Glass. So far, so good but your views, as always, are welcomed.

Signing off,

Mark Long, Digital Looking Glass Ltd

P.S. The site has been redesigned if you would like a look.