This article contains the following executables: ALLEY.ARC
Want to start an argument? Ask a roomful of programmers to define algorithm, then stand back and get ready to duck. Everybody, it seems, has their favorite interpretation. Robert Sedgewick defines algorithms as "methods for solving problems that are suited for computer implementation." Donald E. Knuth identifies the word's origin as the name of a Persian textbook author, "Abu Ja far Mohammed ibn Musa al-Khowarizmi." After suggesting that "an algorithm must be seen to be believed," Knuth describes an algorithm as "a finite set of rules which gives a sequence of operations for solving a specific type of problem." The Encyclopedia of Computer Science and Engineering wins the prize for brevity, calling an algorithm simply a "method of solution."
I like that one best--it's as neat as a paper clip. No definition, however, will convince some programmers that algorithms are more than just academic exercises. Real programmers don't study algorithms, critics might say; real programmers just wing it. That's the view I aim to counter in this column, "Algorithm Alley." Studying algorithms is a great way to improve your programming skills while building a personal library of problem-solving tools. Implementing them can also be challenging and fun. Each month I'll present one or more algorithms, showing their inner workings in pseudocode, focusing on practical applications and problem solving, and listing example programs in Pascal.
Why Pascal? Several reasons. Pascal is widely available--most programmers have a Pascal compiler hanging around. Pascal is algorithmic in nature--algorithms and their implementations resemble one another. Pascal programs are easily ported to other languages--whether you program in C, C++, Basic, assembly language, or object-oriented Awk (just kidding), you can use the algorithms in this column. This is not a column about programming in Pascal. It's a column about programming with algorithms, illustrated using Pascal.
You might think of me as DDJ's algorithm gourmet. I dish out the recipes; it's up to you to cook the turkey and season the sauce. If you can do it better, let me know. If you want to provide detailed analysis of an algorithm or suggest a new one to cover, be my guest. Go ahead, show me your code. Together, perhaps we'll both learn some savory techniques.
Before jumping into this month's algorithm, it will help to define a few ground rules. Borrowing from a variety of sources, I came up with four elements of a proper algorithm. To be an algorithm, a "method of solution" must have:
Side Streets
A simple example illustrates these characteristics, and also introduces the format I'll use for this column's algorithms. Example 1, Algorithm #0, shows the pseudocode of an algorithm for finding the square root of a value, using a formula known as the "Newton-Raphson technique." Algorithm #0 isn't necessarily the best way to find a square root, but it has all the requirements of a complete algorithm. (By the way, I'll number algorithms sequentially for reference.)
begin
y <-- 1; e <-- 5E-5;
z <-- y * y;
while |(z - x)| >= e do
begin
y <-- ((x / y) + y) / 2;
z <-- y* y;
end;
Write ("The square root of ",x,"=",y);
end.
Example 1 follows several commonly used pseudocode conventions. Language structures such as repeat-until and while-do are the same as in Pascal. Semicolons terminate statements. A left-facing arrow assigns the value at right to the symbol at left. An asterisk stands for multiplication, as it does in most programming languages. Vertical bars mean "absolute value." I'll use scientific notation for floating point, unless a decimal representation makes better sense. In the example, 5E-5 equals 5x10{-5}. Character strings are double quoted.
Algorithm #0 is simple to understand. To find the square root of x, the method sets a test value y to 1, then tests whether y{2} minus x is greater or equal to some small value e. If not, y holds the square root of x, accurate to within e. Otherwise, the method sets y to the next test value, using the algorithm's formula to move y inevitably toward a solution.
Can you spot the flaw in Algorithm #0? The method works only for positive nonzero values of x. A more complete algorithm might use an If statement to test whether x is less than or equal to zero. Refinements like these are what make the study of algorithms interesting--and, at times, frustrating.
Listing One (page 144) implements Algorithm #0 in Pascal. The program closely matches the pseudocode. Run the sample, then rewrite it in your favorite symbolic tongue, Use the Pascal implementation to check your results. Try any improvements that come to mind. For example, consider how you might do away with the duplicated statement that squares y.
Of course, if your language has a square-root function, there's little reason to use Algorithm #0. The method might come in handy, however, in an assembly language program with no access to a math library. That suggests another benefit of collecting algorithms: The methods you have little use for now might be invaluable somewhere down the road.
Next, let's try an algorithm with more potential. You can even use this one to help your compiler run more efficiently.
Many sorting algorithms operate on independent data elements. The QuickSort algorithm, for example, can alphabetize an array of strings regardless of their original order.
Not all sorting techniques, however, are designed to put all of your ducks in a row. Some methods rely on a data set's partial ordering, preserving existing relationships between elements. One such method is called topological sorting.
Partially ordered data sets are commonplace. A university's courses, for instance, typically list others as prerequisites. Topological sorting can arrange a student's course selections so that all prerequisites are taken in the proper sequence.
Topological sorting can also help a compiler run more efficiently. Listing dependencies between a program's modules, and then sorting that list topologically gives the most-efficient module order, so the compiler opens the fewest number of header files (or units in Turbo and UCSD Pascal) at a time. More on this later.
Figure 1 shows a graph of a partially ordered data set. The circled letters might represent university courses or program module names. Read arrows as "precedes." Thus B precedes C, F is a prerequisite of G and E, and so on.
The goal is to rearrange the jumbled graph in Figure 1 into a linear sequence, so it resembles Figure 2. Reading from left to right, the letters are topologically sorted so that prerequisites always come first.
A linked list is convenient for representing partially ordered sets in a computer's memory (see Figure 3). Head and tail pointers identify the list's beginning and end. Items appear along the top row. A secondary list of "follower records" shows each item's successors. In the figure, item C has two successors, G and D. Compare this figure with the graphs in Figure 1 and Figure 2. Note that item E precedes nothing.
The topological sorting algorithm first locates all "leader" items with no predecessors. There must be at least one; otherwise, the list is not partially ordered, and it can't be sorted. Deleting and outputting the leaders results in a new, partially ordered list, again having at least one item with no predecessors. The algorithm repeats, deleting and outputting items, until none remains.
Example 2 shows the pseudocode for Algorithm #1, Topological Sort. Comments are in braces. First, the method initializes an empty list. Then it reads elements, designating pitem as the predecessor of item. The items are linked into a list by an unspecified Search function, and a follower record is created to keep track of pitem's successor.
{Initialize}
New(head);
tail <-- head;
{Input data}
Read(pitem);
while not Eof(input) do
begin
Read(item);
a <-- Search(pitem);
b <-- Search(item);
Create new follower record f;
Identify f as item at b
Link f into chain at a
Read(pitem);
end;
{Find leaders}
a <-- head;
head <-- nil;
while a <> tail do
begin { Search list }
b <-- a; a <-- a^.next;
if b has no predecessors then
begin { Link b into list at head }
b^.next <-- head;
head <-- b;
end;
end;
{Sort and output}
b <-- head;
while b <> nil do
begin { Search list }
c <-- b^.chain;
b <-- b^.next;
while c <> nil do
begin { Output item }
Write(b);
a <-- c^.id;
if a has no predecessors then
begin { Link a into leader list
at b }
a^.next <-- b;
b <-- a;
end;
c <-- c^.next;
end;
end;
After the algorithm reads all elements, the list resembles Figure 3. The algorithm then finds the leaders--those items having no predecessors--and links them into a new list at the head pointer. The final stage outputs and deletes the leaders, then searches for new ones (sort of like what we do in this country at election time). Eventually, the list is depleted, and the algorithm ends.
Listing Two (page 144) implements Algorithm #1 and includes a Search function that builds an element list in memory. I designed the sample program to read string elements in pairs, separated by blank lines and stored in a file. To test the program, I extracted a list of module names from a medium-size program: a game that I wrote named "Mancala" (not shown here). With each word on a separate line, the list looks something like this: UGlobals UEval <blank line> WinTypes UGlobals <blank line> WinProcs UGlobals <blank line> Idents UGlobals <cr>.
From this list, module UEval depends on (uses) UGlobals. The UGlobals module uses WinTypes, WinProcs, and Idents. Of course, the actual list is much longer. Sorting the list topologically gives the most-efficient module ordering: Idents, WinTypes, WinProcs, UGlobals, and Eval.
Using this output, I can reorder each source file's include or uses statements to help the compiler use fewer files and make better use of RAM. (I assume that either the compiler is smart enough not to reread headers that it has already seen, or you have used the common C programmer's trick of defining symbols to prevent including the same headers more than once.)
Listing Two uses two records to store items on a linked list, and to keep track of item relationships. The Leader record holds a key (representing data stored in this element), a count of its predecessors, a pointer to the next Leader record, and a chain pointer to the Follower records that keep track of an item's successors.
Follower records address their Leader with a pointer named id. Another pointer addresses the next Follower. Traversing this secondary list locates all of an item's successors--duplicating in code the arrows from the graphs in Figure 1 and Figure 2.
The sample program also keeps a global itemCount. If this count is nonzero after the program deletes all items from the list, the data set was not partially ordered, and the program displays an error message.
To run the program, create a text file of dependent items with a blank line between each pair. Follow the last line with a single carriage return. Feed the input file to Listing Two. For example, under MS-DOS enter a command such as topsort<sample.dat.
It's my pleasure to introduce this new column, and I'm looking forward to many months ahead. Drop me a line in care of DDJ. I welcome all comments, and if you have an algorithm to share, send it in. That'll be right up my alley.
_ALGORITHM ALLEY_ by Tom Swan[LISTING ONE]
{ sqrroot.pas -- Algorithm #0: Square Root by Tom Swan }
program SquareRoot;
var
e, x, y, z: Real;
begin
Write('Value? ');
Readln(x);
if (x <= 0) then
begin
Writeln('Error: Value <= 0');
Exit
end;
y := 1.0;
e := 5E-5;
z := y * y;
while abs(z - x) >= e do
begin
y := ((x / y) + y) / 2;
z := y * y
end;
Write('The square root of ', x, ' = ', y)
end.
[LISTING TWO]
{ topsort.pas -- Algorithm #1: Topological Sort by Tom Swan }
program TopSort;
type
PLeader = ^Leader; { Pointer to Leader records }
PFollower = ^Follower; { Pointer to Follower records }
TKey = String[40]; { Item keys read from file }
Leader = record { Leader record }
key: TKey; { Data in this record }
count: Integer; { Count of key's predecessors }
next: PLeader; { Next Leader record in list }
chain: PFollower; { Pointer to first Follower in chain }
end;
Follower = record { Follower record }
id: PLeader; { Pointer to following Leader }
next: PFollower; { Pointer to next Follower in chain }
end;
var
head, tail: PLeader; { Leader list head and tail pointers }
itemCount: Integer; { Total number of items in list }
{ Search list for item. Return pointer to its Leader record }
function Search(item: TKey): PLeader;
var
h: PLeader; { Pointer to head of list }
begin
h := head;
tail^.key := item; { Create sentinel at dummy record }
while h^.key <> item do
h := h^.next;
if h = tail then
begin { Insert new item at head of list }
new(tail);
itemCount := itemCount + 1;
h^.count := 0; { No predecessors for new item yet }
h^.chain := nil; { No follower chain }
h^.next := tail { Link new record into list }
end;
Search := h { Return pointer to item's record }
end; { Search }
{ Read data into list and construct follower chains }
procedure InputData;
var
pitem, item: TKey; { Predecessor and item }
a, b: PLeader; { Leader record pointers}
f: PFollower; { Pointer to Follower record }
begin
Writeln('Input data:');
Readln(pitem);
while not eof(input) do
begin
Readln(item);
if not eof(input) then
begin
Writeln(pitem, ' << ', item);
{ Find or insert predecessor and item into list }
a := Search(pitem);
b := Search(item);
{ Construct follower and link into chain }
New(f);
f^.id := b; { Address following item in chain }
f^.next := a^.chain; { Link new follower into chain }
a^.chain := f;
b^.count := b^.count + 1; { Increment predecessor count }
Readln; { Read blank line between item pairs }
Readln(pitem) { Read next item if any }
end
end
end; { InputData }
{ Find leader records with no predecessors }
procedure FindLeaders;
var
a, b: PLeader; { Leader record pointers}
begin
a := head;
head := nil;
while a <> tail do
begin
b := a;
a := a^.next;
if b^.count = 0 then
begin
b^.next := head;
head := b
end
end
end;
{ Sort and output records }
procedure OutputData;
var
a, b : PLeader;
c : PFollower;
begin
Writeln; Writeln('Output data:');
b := head;
while b <> nil do
begin
Writeln(b^.key);
itemCount := itemCount - 1;
c := b^.chain;
b := b^.next;
while c <> nil do
begin
a := c^.id;
a^.count := a^.count - 1;
if a^.count = 0 then
begin { Insert a^ in b-list }
a^.next := b;
b := a
end;
c := c^.next
end
end
end; { OutputData }
begin { TopSort }
New(head);
tail := head;
itemCount := 0;
InputData;
FindLeaders;
OutputData;
if itemCount <> 0 then
Writeln('Error in data set: not partially ordered')
end.
Example 1
begin
y <- 1; e <- 5E-5;
z <- y * y;
while |(z - x)| >= e do
begin
y <- ((x / y) + y) / 2;
z <- y * y;
end;
write("The square root of ", x, " = ", y);
end.
Example 2
{ Initialize }
New(head);
tail <- head;
{ Input data }
Read(pitem);
while not Eof(input) do
begin
Read(item);
a <- Search(pitem);
b <- Search(item);
Create new follower record f;
Identify f as item at b
Link f into chain at a
Read(pitem);
end;
{ Find leaders }
a <- head;
head <- nil;
while a <> tail do
begin { Search list }
b <- a; a <- a^.next;
if b has no predecessors then
begin { Link b into list at head }
b^.next <- head;
head <- b;
end;
end;
{ Sort and output }
b <- head;
while b <> nil do
begin { Search list }
c <- b^.chain;
b <- b^.next;
while c <> nil do
begin { Output item }
Write(b);
a <- c^.id;
if a has no predecessors then
begin { Link a into leader list at b }
a^.next <- b;
b <- a;
end;
c <- c^.next;
end;
end;
Copyright © 1993, Dr. Dobb's Journal