Practical Computer Science Theory

There are two useful pieces of computer science that are fairly easy to understand. These are the halting problem and P versus NP.

Simply stated, the halting problem says that we can’t (in general) tell whether or not a given program halts. This is important because the two steps in formally proving that a program is correct are to verify that it produces the correct output if it exits (halts) and to show that the program halts for all inputs.

A basic form of the proof of the halting problem starts by imagining that we have a program, called H, which can determine whether or not an input program P halts for all inputs. Then, imagine that we construct a new program H2. H2 takes an input program P and runs H on P to see if P halts. If P halts, then H2 loops forever. If P doesn’t halt, then H2 exits immediately. Now, feed the source for H2 into H2. That creates a contradiction–if H2 halts, then H2 will loop forever. If H2 doesn’t exit, then H2 will quit immediately. Therefore, there is no program H, which means we can’t tell (in general) whether or not a given program halts.

So this means that we will have problems trying to prove programs correct, but what are the practical implications? It turns out that a lot of checking that we would like to have the compiler do is equivalent to solving the halting problem. For example, it would be nice if a C / C++ compiler could detect any dereferencing of a NULL pointer. However, this is equivalent to solving the halting problem. Imagine that you have a new compiler, called N, which can determine whether or not a given program P dereferences a NULL pointer. Run N on P. If P does not dereference a NULL pointer, then create P2, which is P plus a NULL pointer dereference right before P exits. Now run P2 through N. If P2 dereferences a NULL pointer, then P2 halts. Otherwise, P2 does not halt. Thus, checking for dereferencing NULL pointers is equivalent to solving the halting problem, which means that it can’t be done (in general). The same goes for all sorts of other useful checks.

The other part of computer science theory I want to talk about is P versus NP. Some people call these “Possible” and “Not-Possible”, but the letters actually stand for Polynomial and Non-Polynomial. This refers to the complexity of the best known solution.

There is an entire class of problems called NP-complete. These problems are isomorphic to each other. If you can solve one of them, you can solve them all. For these problems, then best known solution is non-polynomial, which means that they become computationally intractable at very small sizes. One of the classic problems is the travelling salesperson. A salesperson wants to minimize the cost of travel between various cities (cost can be the cost of plane tickets or the time it takes to travel between the cities). Finding the optimal itinery to visit each city is NP-complete. Basically, the only completely correct algorithm is to check every possible path to see which is the cheapest.

If you find that the problem you are working on is NP-complete (equivalent to one of the known NP-complete problems), then you have two choices. You can stop looking for a good solution, or you can get ready to receive your Turing award and the appreciation of many programmers.

– Scott

Leave a Reply

You must be logged in to post a comment.