STRUCTURED PROGRAMMING

Parts isn't Parts

Having given it a lot of thought, I've drawn out a map of how I see the software-design equation, as shown in Figure 1. Keep in mind that this is just my view of things, based on my own experiences. You may see it differently -- but it works for me, and may help people who still don't have a clue about this stuff.

Design problems have sorted themselves out for me over the years in terms of the level of coupling of the components being used. The vertical axis of Figure 1 relates to this level of coupling, with the greatest level of coupling at the bottom of the map, and the least at the top.

Coupling can be tough to define if you aren't steeped in the lore of software design. Coupling is the degree to which the individual components share assumptions. The coupling between two adjacent statements in a program is 100 percent, because they share assumptions about scope, local and global variables and the general mission of the code sequence that they're part of. At the other end of the spectrum (and at the top of Figure 1), the coupling between two applications in an information system is probably closer to 5 percent, maybe less. The two applications share only a handful of very high-level assumptions about how they work together, and perhaps some additional assumptions about how data passes between them. Aside from that, they're highly independent entities, and don't even have to be running on the same machine or even the same kind of machine.

The Other Meaning of "System"

There's a source of confusion here. The word "system" has two very different meanings in programming parlance. In the PC world, "system-level programming" means working right down at the metal, hacking things like drivers, BIOS layers, and so on. Don't confuse this with what a lot of people call an information system; that is, a coordinated, ongoing process that includes multiple applications running sequentially or concurrently, on one or perhaps many different machines, with manual operations, data entry, output reporting, and perhaps several different levels of connectedness (does anyone aside from me abhor that awful nonword, "connectivity?") through different technologies among the several host processors.

People who have worked in UNIX or mainframe shops know what an information system is; many people who work solely on PCs do not. Much of UNIX programming, even on a much more modest scale, is done on the information system model; UNIX utilities can be strung together with one utility piping data into another with very little coupling between the utilities. PC platforms have lacked this level of operating-system intermediation until very recently. However, Microsoft's object linking and embedding (OLE) API, introduced with Windows 3.1, will allow serious application integration on the information-system model, right there on your PC. But that's another column or six; we'll get to it.

At the information-system level, I've seen nothing to match Ed Yourdon's method of structured design. The Yourdon scheme focuses overwhelmingly on the flows of data through an information system, and assumes an extremely low level of coupling between a system's components.

Keeping coupling to a minimum is a good goal to have, as long as you know when it simply i n't possible. Yourdon's structured design method breaks down when you start working on a single application whose components, for efficiency's sake or for other reasons (like the unavoidable internal coupling level of Turbo Vision), are tightly coupled.

I'm not going to recap structured design, Yourdon-style, here. It works best in massive systems running on several machines, and I don't think most of you walk that path. For something like a modest vertical-market application, I think the Yourdon scheme, while usable, quickly gets to be more trouble than it's worth.

Procedure-level Design

Down at the other end of things is procedure-level design, which is quite simply the design of program elements that do Just One Thing. This encompasses typical Pascal procedures and functions, object methods, and some simple filter-style utility programs.

In my experience, most people design a procedure in the following way: They define in a paragraph or two what the procedure must do (often without ever writing that definition down), then define the nature of the inputs and outputs, and finally draw a flowchart that steps through the statements and branches that implement the procedure's mission. When the time comes to actually write the code, they write it right from the flowchart.

This works. I did it a lot while I was writing Cobol, Basic, and some of the experimental in-house languages in use at Xerox in the late '70s. The flowchart is the bulk of the design, and love 'em or hate 'em, flowcharts have the advantage that they can be implemented in nearly any language, no matter how primitive.

Flowcharts have the massive disadvantage that they come to us from the dawn of time, and don't express the control-flow structures that define structured programming today. You can fake a for loop in a flowchart with some care, but there's no single symbol that represents a for loop, or a while loop, or anything more than steps-and-branches. Flowcharts are assembly language tools, and they have this nasty habit of making your Pascal code come out looking like some weird variant of assembly language.

Successive Refinement

I stuck with flowcharts for procedure-level design for a long time because they were what I had. Then I read a remarkable book called Programming Proverbs, by Henry Ledgard (Hayden Books, 1975). It described a method for designing procedures called stepwise refinement, a term I later learned was coined by Niklaus Wirth himself, the man who designed Pascal, Modula-2, and Oberon.

You may not be able to find this book anymore, but if you spot a copy down at Just Used Books 'N Things Etc., grab it. It's not product specific because it comes from a time when there were no products, and the perspective is certainly refreshing.

Successive refinement substitutes pseudocode for flowcharts as the end-product of a design task. Pseudocode is English-like verbiage that describes statement-level program action in structured, language-independent fashion. (At least for languages that implement the standard suite of control-flow structures.) There's no standard definition for pseudocode, otherwise somebody would write a compiler that compiled it directly to .EXE, and it wouldn't be pseudo anymore. What matters is that it be both logically correct and understandable.

As with flowcharts, pseudocode can be implemented in any structured language. It's a much shorter trip to real code than from flowcharts, since all the control-flow structures are there in the pseudocode in English-like form. In fact, the biggest problem in writing pseudocode is resisting the temptation to sprinkle it with actual program statements. If you really need genuine, language-independent pseudocode (and if you ever in any possible world would have to switch languages, it's a damned good thing to have in a drawer somewhere), you'd better watch yourself pretty hard. On the other hand, if you simply work in one language and that's all, you can make the transformation from pseudocode to real code a gradual one, and drop in the actual code statements at any point where they occur to you.

The Process

Successive refinement begins with a single, precise statement of what the procedure must do, preferably written in one sentence. Why one sentence? It's a trick I use to enforce a proper narrowness to the mission of the procedure. A procedure should not try to do too much. A single procedure that is, in truth, two or more procedures tightly coupled to one another inside a phony single-procedure shell will cause you no end of trouble later on.

Let's pull a simple example together here. Suppose in your struggles you unearth a need to determine how long the longest line in a given text file is -- and say you're still green enough so that you can't just code it all directly in the back of your head. Start with a concise statement of what the procedure must do:

Return the length of the longest line in the opened text file passed as a parameter.

They won't all be this crisp, and when they're not, I suggest suffering over that initial statement a little. Mistakes made early in the process can't always be corrected later. More often than not, a bad initial statement will cause you to paint yourself into a corner later on and force you to start from scratch.

Once you have an initial statement you can live with, begin to refine it. You refine it by breaking it down into its major component actions. Work in levels; that is, don't try to go from initial statement to finished pseudocode in one swell foop unless the proc is totally trivial. The understanding of the problem you gain in defining the pseudocode at each level will help you more crisply define the next level. In other words, work it through. Like it says on every paint can ever made, several thin coats are better than one thick coat.

To continue, take a stab at refining our initial statement:

Position the file to its first record.
Scan the file, replacing a maximum-length value with a new length value each time a longer one is found.
After the last record is read and tested, return the maximum-length value found.

Our initial statement had at least three statements inside it. Examine each of the new statements individually, to see if they make sense. If they do, refine again:

Initialize the maximum-length variable to 0.
Position the file to its first record.
While records are in the file, repeat this:
Read a record.
If its length is greater than the value of the maximum-length variable, replace the maximum-length variable's value with the new length.
After the last record has been read and tested, return the value of the maximum-length variable.

Notice that during this refinement we've implicitly defined a variable. Some purists have challenged me to define all my variables before I begin refining the initial problem statement, since everybody knows that data drives good design. Well ... not quite. At the procedure-design level, code and data are peers. We're not fussing with Big Picture stuff here. We're zeroing in on individual code statements. The refinement of the nature of the procedure's data is as much a part of the process as the refinement of the nature of its code. You should write down variables in some sort of a separate list as you determine that you need them. "MaxValue; an integer" is all you need to say.

Pseudocode Tools

Demented writer/editor that I am, I do now and have always written my pseudocode in my favorite word processor. (Heck, I used to write my Pascal/MT+ code itself in WordStar's nondocument mode.) My friend Chris Nelson pointed out something that I am genuinely amazed not to have hit upon before now: Outline processors are naturals for successive refinement.

If you write your pseudocode in an outline processor, you have the ability to refine and retain each level of detail rather than simply expanding a level and thereby losing the prior level. This gives you an intelligent way to accomplish the "artful hiding of detail" that Niklaus Wirth says is the main purpose of structured programming.

If you have an outline processor lying around or can find one, give it a try. It still feels a little strange to me, but I can sense myself gradually becoming addicted.

Where to Stop

Knowing how far to take pseudocode is a bit of an art, and again it depends on what the pseudocode will eventually be used for. What I watch for is the point when all ambiguity has left the pseudocode. That, too, is a judgment call.

Once you have your pseudocode, look it over with a critical eye. Most importantly, see if you've left anything out. Real-world procedures that deal with files should have some sort of error handling, and I haven't yet added this to the pseudocode described earlier. There may be other things, too -- does the procedure have to set a help context somehow? Are all variables that need initializing initialized? Does some aspect of the pseudocode's action imply a variable that I haven't explicitly described and initialized? (I've been stung on this one more than once....)

The overwhelming tendency among programmers would be to immediately take the pseudocode for a procedure to real code once that procedure's pseudocode was declared finished. It can be helpful to hold back and at least design any related procedures before beginning coding. Defining the procs that work with a proc you've already defined can spotlight conceptual errors in the first design. You may think of some new task that has to be done somewhere, and the best somewhere (after you've designed a half-dozen somewheres) may well be within the first procedure you designed.

It works both ways. I'm not a purist but a realist, and I take some heat for that occasionally. One of the heretical points I have made is that coding one subsystem can shed certain kinds of light on the design of another subsystem that no amount of analysis or deep thought can. This is especially true if your tools are evolving faster than you can climb their learning curve to genuine mastery. (This has been a growing problem in the last few years, as machine performance and tool sophistication continue spiraling out of sight.) This is another consequence of the Parts is Parts Fallacy: Like it or not, your tools occasionally dictate to you, and sometimes you can do nothing but bow and nod. You may not have time to get really good at a tool before beginning work on a project. The project may be the only way to learn the tool.

In a perfect world, where all programmers are full masters of their tools and the tools sit still for years on end, you design fully before you begin coding. In our world, you do whatcha gotta do to make things work.

Aiming for the Middle

I've recapped procedure-level design here because I can; it's well defined and just about everybody can learn to do it well following the guidelines in this column.

You'll notice, however, that we haven't yet touched on the middle of the diagram in Figure 1. This is where all the neat stuff happens, and it is also the toughest area in which to design. Designing at the information-system level is messy simply because the system tends to be big. Making it work at all is the realistic goal -- few teams that implement such big systems ever bother to try to make them work efficiently or quickly. If the sole value for a system is that it work, formal methods can serve you well, in that they can guide you to a piece of code that produces the set of logical outputs for a given set of logical inputs.

At the information-system level, flexibility is also an important value, because when a system is spread out over a WAN or crosses the boundaries between mainframe, mini, and PC, chunks of the system tend to be ripped out and replaced regularly. Minimal coupling is thus essential, and performance can only be tuned with minimal coupling enforced. Since information systems are almost always custom software without competition on the open market, the users are stuck with what they get and performance or usability is less of an issue than with commercial applications.

The middle of the chart is the area where you're squeezed between the rocks and the sky. It's all well and good to be a design purist and do things "by the book" -- only to discover that the application works so badly that no one will buy it. I've seen this happen -- almost always to innocents who are right out of school and green enough to believe everything their design textbooks tell them.

The thing to understand about application-level design, if you understand absolutely nothing else, is this: You cannot substitute formal methods for a thorough understanding of the problem and a creative enthusiasm for the task.

Parts is not parts.

Understanding is everything.

That's it for this issue. Slap down Figure 1 on your copier and tape the copy to your wall. We'll come back to it next month.