LETTERS

Catching A Few Rays

Dear DDJ,

Your recent article, "Ray Tracing," by Daniel Lyke (September 1990) has touched on a subject close to my heart. We in the optics business are always tracing rays, not to make nice pictures, but to design lenses. My own specialty is stray light analysis, in which even more rays are traced. The equations used in the article to describe reflection in terms of a unit normal vector and an incident ray unit vector are among the equations I use in writing stray light analysis programs. In stray light analysis, however, we also compute scattered and diffracted "rays" with good radiometric accuracy.
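For readers who want to see it written out, the relation in question is the familiar one: with I a unit incident direction and N a unit surface normal, the reflected direction is R = I - 2(N.I)N. A small illustrative fragment (my own, not taken from either stray light program) might look like this:

  typedef struct { double x, y, z; } Vec3;

  static double dot (Vec3 a, Vec3 b)
  {
      return a.x*b.x + a.y*b.y + a.z*b.z;
  }

  /* Reflect the unit incident direction i about the unit normal n:
     r = i - 2(n.i)n                                                 */
  static Vec3 reflect (Vec3 i, Vec3 n)
  {
      double k = 2.0 * dot (n, i);
      Vec3 r;

      r.x = i.x - k*n.x;
      r.y = i.y - k*n.y;
      r.z = i.z - k*n.z;
      return r;
  }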

Let me back up for a moment and briefly describe what stray light analysis is. I'm sure you've heard the advice given to amateur photographers to avoid shooting into the sun. The reason is that the sun will reflect off of lens surfaces and scatter off the inside of the lens barrel, causing much stray light and spoiling the picture. The job of the stray light analyst is to quantify this stray light versus the position of the sun, or of any other bright source of light. Several computer programs have been written to accurately analyze stray light. The accuracy required is extreme: An optical system may reduce stray light by a factor of 10^-10 with an elaborate system of baffles, and the answer must still be known to within about a factor of two.

For the past ten years I have maintained a stray light analysis program (written in Fortran around 1970 for a mainframe computer), and I have recently completed a new stray light analysis program (under contract) written in C for the IBM PC. Both use Monte Carlo methods to sample the stray light. I have always viewed the computer graphics industry from afar, with envy, but long ago realized that stray light programs would make wonderful engines for creating photo-realistic images. Alas, I have never had the time (i.e. funding) to find out just how good they would be.

The programs I work with can currently model any second-order surface. By this I mean any surface that can be represented by a second-order equation in Cartesian coordinates. This includes spheres, cones, any conic section of revolution (paraboloids, ellipsoids, hyperboloids, oblate spheroids), and all combinations such as parabolic cylinders, elliptic cones, hyperbolic paraboloids, etc. They can also model planes, as well as tori, which are fourth-order surfaces. Many objects can be built up from these basic surface types. Not only that, but stray light analysis programs can also accurately model light emitted by hot objects.
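To make that concrete, every one of these surfaces satisfies a single implicit equation of the form A*x^2 + B*y^2 + C*z^2 + D*x*y + E*x*z + F*y*z + G*x + H*y + I*z + J = 0 in global coordinates, so one record of coefficients can describe a sphere, a cone, a paraboloid, and so on. A sketch of such a record (illustrative only; the names here are not from my programs):

  /* General second-order (quadric) surface:
     A*x^2 + B*y^2 + C*z^2 + D*x*y + E*x*z + F*y*z + G*x + H*y + I*z + J = 0 */
  typedef struct {
      double A, B, C, D, E, F, G, H, I, J;
  } Quadric;

  /* Evaluate the implicit function; a result of zero (within tolerance)
     means the point lies on the surface.  A unit sphere at the origin is
     A = B = C = 1, J = -1, with all other coefficients zero.             */
  double quadric_eval (const Quadric *q, double x, double y, double z)
  {
      return q->A*x*x + q->B*y*y + q->C*z*z
           + q->D*x*y + q->E*x*z + q->F*y*z
           + q->G*x   + q->H*y   + q->I*z + q->J;
  }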

The aspect of your article that struck home is that the same techniques you used for tracing rays are also used in stray light analysis programs, with surfaces being represented in a global coordinate system and second-order ray-surface intercepts being solved using the good old quadratic equation. The rest of the optics industry represents surfaces relative to one another with no global coordinate system and uses specialized ray tracing equations. A difference between stray light and computer graphics is that with my programs, I often trace rays starting from the source and going toward the detector, just like the real photons. (The detector may be an eye, photographic film, vidicon, or electro-optic detector.) The computer graphics industry seems to have revived the ideas of the ancient Greeks, who believed that light emanates from the eyes to allow the observer to see.
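Written out, substituting the parametric ray point = origin + t*direction into any such second-order equation collapses it to a*t^2 + b*t + c = 0, and the roots are the intercept distances. A minimal sphere-only sketch of the idea (again illustrative, not lifted from either program):

  #include <math.h>

  /* Ray: point = org + t*dir.  Sphere: |point - ctr|^2 = radius^2.
     Substituting the ray into the sphere equation gives a*t^2 + b*t + c = 0.
     Returns 1 and the nearest positive root in *t_hit, or 0 on a miss.      */
  int ray_sphere (const double org[3], const double dir[3],
                  const double ctr[3], double radius, double *t_hit)
  {
      double oc[3], a, b, c, disc, root;
      int i;

      for (i = 0; i < 3; i++)
          oc[i] = org[i] - ctr[i];

      a = dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2];
      b = 2.0 * (oc[0]*dir[0] + oc[1]*dir[1] + oc[2]*dir[2]);
      c = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - radius*radius;

      disc = b*b - 4.0*a*c;
      if (disc < 0.0)
          return 0;                          /* the ray misses the surface */

      root = (-b - sqrt (disc)) / (2.0*a);   /* nearer intercept first */
      if (root < 0.0)
          root = (-b + sqrt (disc)) / (2.0*a);
      if (root < 0.0)
          return 0;                          /* the surface is behind the ray */

      *t_hit = root;
      return 1;
  }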

In conclusion, I have nothing to report other than my own enthusiasm and hope that I may have time to experiment with computer graphics before it's too late. I am soon to have a 486 computer with 1024 VGA display on my desk, however, so I may find the temptation to experiment irresistible.

Edward R. Freniere,

Vice President

Telic Optics, Inc.

Marlborough, Massachusetts

Optimizing Super VGA

Dear DDJ,

Christopher Howard's "Super VGA Programming" (DDJ, July 1990) is well written and informative. I have a problem, however, with his inference (at the top of page 22) that checking for a video bank switch must be done on a pixel-by-pixel rather than a line-by-line basis.

It is entirely possible to load not only a line, but multiple lines, without worrying about running into a bank boundary. The following code fragment (taken directly from Blackhawk's "Database Graphics Toolkit") illustrates how to do this; the numbers chosen let us load 80 columns of previously formatted 16-line graphics-mode text with a single move (80 x 8 = 640 bytes per scan line; 640 x 16 = 10240 bytes; 65536 - 10240 = 55296). See Example 1.

Example 1

  bnkchk: push    dx              ; AH:DI contains the computed offset.
          cmp     ah,BNK          ; Still in the same bank?
          je      sambnk          ; If so, no hardware change is needed.
          mov     BNK,ah          ; Save the new bank number.
          call    [BNKAD]         ; Do the hardware bank switch.
  sambnk: mov     dx,0a000h       ; Load the standard VidRam segment.
          cmp     di,55296        ; Memory wraparound possible?
          jb      bnkext          ; If not, use the standard VidRam segment.
          add     dx,640          ; Move the segment forward 640 paragraphs.
          sub     di,10240        ; Move the index back the same amount.
  bnkext: mov     es,dx           ; Set the ES register.
          pop     dx              ; DX preserved, AX destroyed.
          ret                     ; ES,DI altered on return.

The substance of this method is to divide the bank-switching problem into a video hardware component and a memory access component; it works because a memory reference falling within video memory will always be directed to video memory, whether or not the segment register points to its start.
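To put illustrative numbers on that: the 8086 forms the linear address as segment x 16 + offset, so 0A000:2800H and 0A280:0000H both resolve to linear address 0A2800H, and a write through either pair lands on the same byte of the VGA buffer.

  /* Two segment:offset pairs that alias the same byte:
     linear (0xA000, 0x2800) == linear (0xA280, 0x0000) == 0xA2800UL */
  unsigned long linear (unsigned segment, unsigned offset)
  {
      return (unsigned long) segment * 16UL + (unsigned long) offset;
  }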

A possible objection to the method is that, on a dual-monitor system, one might run into the monochrome video buffer at segment 0B000H, but, by shifting the DI register down, the method itself prevents this.

John D. Brink,

President

Blackhawk Data Corporation

Chicago, Illinois

Chris responds: My quick comment about optimizations only stated that some optimizations were limited. To expand a little further, I know of some software packages that optimize by storing an array of pointers to each scanline. This gets around interleaving problems, etc., but it does not work if the scanline crosses a 64K bank boundary.

What you refer to in your letter is similar to what we do in our GX Development Series of programming tools. We call it Thresholding, which defines the point at which we can move a given amount of data the fastest way possible. If we are below the threshold, we use fast rep movsw instructions. If we are above the threshold, we need to move more slowly and wait for the point at which the bank needs changing.
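A rough sketch of the idea in C (this is not GX source; set_bank() and copy_to_window() here stand in for whatever board-specific routines actually program the bank register and move the data):

  #define BANK_SIZE 65536UL

  extern void set_bank (unsigned bank);          /* hypothetical hardware hook  */
  extern void copy_to_window (unsigned offset,   /* hypothetical fast copy into */
                              const void *src,   /* the A000H window, e.g. with */
                              unsigned len);     /* rep movsw                   */

  /* Copy len bytes to linear video address addr, switching banks only
     when a transfer would otherwise cross a 64K boundary.              */
  void blit (unsigned long addr, const unsigned char *src, unsigned long len)
  {
      while (len > 0) {
          unsigned bank       = (unsigned) (addr / BANK_SIZE);
          unsigned offset     = (unsigned) (addr % BANK_SIZE);
          unsigned long room  = BANK_SIZE - offset;     /* the threshold */
          unsigned long chunk = (len < room) ? len : room;

          set_bank (bank);
          copy_to_window (offset, src, (unsigned) chunk);

          addr += chunk;
          src  += chunk;
          len  -= chunk;
      }
  }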

However, your programming example is incorrect as written. When you check for a memory wraparound, you are in effect checking for what we call the threshold. But if a wraparound is possible, you are simply calculating a new segment:offset for the same memory location. This does not prevent the wraparound; it merely ensures that DI does not overflow. In fact, it converts a memory wraparound problem into a memory overwrite problem. Instead of wrapping back up to A000H, it makes sure that data is written past the VGA video address space into B000H. You bring up this point yourself, but you mention that "the method itself prevents this." By your example, the method does not prevent this. You are correct that the segment does not have to point to the top of video memory (an address is an address -- you can point to the same place with many segment:offset pairs), but that still does not help here.
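To put illustrative numbers on this: suppose a 10,240-byte move is to start at DI = 60,000 with ES = 0A000H. Since 60,000 is above the 55,296 threshold, the example loads ES with 0A280H and DI with 49,760 -- the same linear address, 0A0000H + 60,000 -- and the move then runs up through 0A0000H + 70,240 = 0B1260H, past the end of the 64K VGA window and into the 0B000H region, instead of wrapping DI back to the start of the A000H segment.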

Perhaps in illustrating your point, some code was inadvertently left out of your example. In any case, you are correct in that it is possible to optimize around the 64K bank switching without testing every single pixel. The point I was making in the article was simply that some optimizations are not possible or are more difficult to implement if the bank ends in the middle of a scanline. Your example serves to illustrate this perfectly.

Throwing A Slow Curve

Dear DDJ,

Thanks for the interesting graphics issue (DDJ, July 1990). I really appreciate learning about algorithms that I can actually use. When I first read Todd King's article on Bezier curves, I was amazed at the performance difference he found between the deCasteljau and literal rendering methods of drawing Bezier curves. But after a careful look at the listing, I realized that most of that difference is due to a difference in the number of lines drawn per Bezier curve by the two methods, rather than to differences in algorithmic efficiency. In King's program, the literal rendering method draws 101 lines per curve; the deCasteljau method draws either 24 (for cut-off 3) or 48 (for cut-off 4). By changing the increment in the for loop for the literal method to 0.04 or 0.02, 25 or 51 lines per curve are drawn for the literal method with little or no loss in appearance. (Note: The article says that the variable DPU is used to determine the number of lines drawn for the literal method, but DPU is defined in the listing, then never used.) With this improvement, the deCasteljau method is only 2.8 times as fast, instead of five to ten times as fast.

The literal method is also unnecessarily slow in King's implementation because of inefficient coding. I replaced the routine draw_bezier1 with the code in Example 2 to avoid recalculating reused factors. With this alteration, the deCasteljau method is only about twice as fast as the literal method. The differences are smaller if a floating-point coprocessor is available: deCasteljau is only about 1.2 times as fast as the improved literal method with an 80x87.

Example 2

  draw_bezier1 (bcurve)
  BEZIER_BOX *bcurve;
  {
   float x, y;
   float t, t2, t3, a, b, c;

   moveto ((int)bcurve->a.x, (int)bcurve->a.y);

   for (t = 0.0; t <= 1.0; t += 0.02) {
    t2 = t * t;
    t3 = t2 * t;
    a = 1 - 3*t + 3*t2 - t3;
    b = 3*(t - 2*t2 + t3);
    c = 3*(t2 - t3);
    x = a * bcurve->a.x
        + b * bcurve->b.x
        + c * bcurve->c.x
        + t3 * bcurve->d.x;
    y = a * bcurve->a.y
        + b * bcurve->b.y
        + c * bcurve->c.y
        + t3 * bcurve->d.y;
    lineto((int)x, (int)y);
   }
  }

Further improvements can be made to both methods: 1) If a curve is really a straight line, it should be drawn as a line, not a Bezier curve; 2) When a character is scaled down, many of the lines in each curve have zero length on-screen, and don't need to be drawn; 3) General efficiency clean-ups.

The biggest improvement possible with no 80x87 is to use all integer and long calculations. I have used carefully scaled integers and the literal method in my shareware PostScript drawing program PictureThis, and have reduced the drawing time dramatically. To draw King's six a's (with no fills) on an XT-clone with no 80x87 takes 20.5 seconds for the deCasteljau method (24 lines/curve), 34 seconds for the improved literal method (21 lines/curve), 47 seconds for the unimproved literal method (21 lines/curve), but only three seconds for PictureThis with an integer literal method (19 lines/curve). An integer deCasteljau method should make PictureThis slightly faster.
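The exact scaling I use won't fit in a letter, but a rough sketch of the idea (not the PictureThis source; the SCALE factor and names are only for illustration, and moveto/lineto are the same primitives used in Example 2) looks like this:

  /* Cubic Bezier by the literal method in scaled integers.  The curve
     parameter t runs 0..SCALE, and the four blending weights are kept
     scaled by SCALE so that they always sum to SCALE.                  */
  #define SCALE 256L

  typedef struct { long x, y; } IPoint;       /* pixel coordinates */

  void bezier_int (IPoint p0, IPoint p1, IPoint p2, IPoint p3, int nseg)
  {
      long t, u, a, b, c, d;
      long dt = SCALE / nseg;

      moveto ((int) p0.x, (int) p0.y);
      for (t = dt; t <= SCALE; t += dt) {
          u = SCALE - t;
          a = u * u / SCALE * u / SCALE;        /* (1-t)^3                    */
          b = 3L * u * u / SCALE * t / SCALE;   /* 3(1-t)^2 t                 */
          c = 3L * t * t / SCALE * u / SCALE;   /* 3(1-t) t^2                 */
          d = SCALE - a - b - c;                /* ~t^3; weights sum to SCALE */
          lineto ((int) ((a*p0.x + b*p1.x + c*p2.x + d*p3.x) / SCALE),
                  (int) ((a*p0.y + b*p1.y + c*p2.y + d*p3.y) / SCALE));
      }
      lineto ((int) p3.x, (int) p3.y);          /* land exactly on the endpoint */
  }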

PictureThis provides kerning, accented characters, subscripts and superscripts, and font modifications, yet it runs quite well even on a 4.77 MHz PC with 512K memory, CGA (including Hercules with CGA emulation), no expanded memory, no mouse, and no hard disk. PictureThis produces standard Encapsulated PostScript (EPS) files (with "showpage"!).

Pat Williams

Hot Ideas Publishing

Rt. 1, Box 302

Gravel Switch, KY 40328

Todd responds: I had also wondered about the speed difference between the literal and deCasteljau methods of calculating a Bezier curve. Admittedly, I should have taken a closer look at the code and I should have done the same type of analysis you did in order to make a more fair comparison. One other observation is that with either method, a curve is drawn as a series of line segments. The only difference is how the end points of the line segments are calculated. In the deCasteljau method the calculation involves 12 multiplication/division operations and 25 addition/subtractions, whereas the literal method presented in the article uses 28 multiplication/divisions and 18 addition/subtractions. Your improved literal method uses 15 multiplication/divisions and 11 addition/subtractions. If a curve is drawn with the same number of line segments, regardless of the method, then you should expect that the deCasteljau and your improved literal method would calculate in approximately the same amount of time, while the original literal method is still the slowest puppy in the litter.
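For readers who don't have the July issue handy, the midpoint arithmetic behind those deCasteljau counts looks roughly like the following sketch (my own rendering of the standard recursive-subdivision technique, not the listing from the article; lineto is the same primitive used in Example 2, and a moveto to p0 is assumed before the first call):

  typedef struct { float x, y; } Point;

  static Point mid (Point a, Point b)         /* midpoint of two points */
  {
      Point m;

      m.x = (a.x + b.x) / 2;
      m.y = (a.y + b.y) / 2;
      return m;
  }

  /* Split the cubic Bezier p0..p3 in half at t = 1/2 by repeated midpoint
     averaging and recurse; when the cut-off depth is reached, draw the
     piece as a single line segment.                                      */
  void decasteljau (Point p0, Point p1, Point p2, Point p3, int depth)
  {
      Point p01, p12, p23, p012, p123, p0123;

      if (depth == 0) {
          lineto ((int) p3.x, (int) p3.y);
          return;
      }
      p01   = mid (p0,  p1);                  /* first level of midpoints */
      p12   = mid (p1,  p2);
      p23   = mid (p2,  p3);
      p012  = mid (p01, p12);                 /* second level             */
      p123  = mid (p12, p23);
      p0123 = mid (p012, p123);               /* the point on the curve   */

      decasteljau (p0,    p01,  p012, p0123, depth - 1);   /* left half  */
      decasteljau (p0123, p123, p23,  p3,    depth - 1);   /* right half */
  }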

Encapsulating Memory Allocation

Dear DDJ,

The article "Encapsulating C Memory Allocation" by Jim Schimandle (DDJ, August 1990) describes a code construct which leads to "memory leakage," where a failure in a sequence of memory allocations results in losing those pieces successfully allocated (see Example 3). I agree that memory leakage is a real problem area; its effect is usually only made noticeable after the passage of time and is not made obvious in one's program by simply testing large cases.

Example 3

       if (   (fooptr = (FOO *) malloc (sizeof (FOO))) == NULL
           || (fooptr->string = strdup (name)) == NULL) {
         return (NULL);
       }

If the strdup() call fails, the memory allocated for the structure is lost.

The brevity of constructs such as in Example 3 is quite desirable. In an allocation sequence, explicitly checking success and freeing unusable memory areas after each step may severely contort the code. What seems to be called for is a routine which will correctly clean up after an aborted allocation sequence. I devised a routine, vfree( ), which takes a (NULL-terminated) list of pointers to allocated regions and frees them (Listing One). This allows program constructs such as that in Example 4.

Example 4

       if (   (fooptr = (FOO *) calloc (1, sizeof (FOO))) == NULL
           || (fooptr->string = strdup (name)) == NULL
           || (fooptr->field2 = (F1 *) malloc (sizeof (F1))) == NULL
           || (fooptr->field3 = (F2 *) malloc (sizeof (F2))) == NULL) {
         vfree (fooptr, fooptr->string, fooptr->field2, NULL);
         return (NULL);
       }

The operation of vfree( ) is quite simple -- free each non-NULL pointer in the list. Argument pointers are presented in the same order that allocations were attempted, so all pointers to valid regions are contiguous in the beginning of the list, and no pointers past the first NULL need be considered. When allocating a structure and its fields, the fields must be initialized to NULL (through calloc( ) or a custom allocator). The last attempt's result does not need to be included in the arguments to vfree( ); if it succeeds, the call to vfree( ) will not be made. If all attempts before it succeed and it is the only one to fail, there is no need to free its space.

This function is not only useful in recovering memory after an allocation error. Using vfree( ) makes the coding of multiple successive free( ) operations cleaner. There may be a slight runtime benefit as well: Whenever you must call free( ) several times in succession, the compiler must generate code to push each pointer on the stack and then code to call free( ). Using vfree( ), all of the argument-pushing is retained, but only one function call is made.

The idea is quite expandable. Freeing of nested structures can be accomplished by creating a deallocation function for each level of structure, and extending the concept of vfree( ) to take deallocator/region pairs as arguments. If you need to free a grab bag of pointers, some of which may be NULL (such as in the cleanup stage of a function that could have failed or been cancelled at several places), a form of vfree( ) can be written which takes the number of pointers as its first argument; it would simply count through the argument list and skip the NULLs.
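A sketch of that last form, assuming the count comes first and NULL entries are simply skipped (the name vfree_n is arbitrary):

  #include <stdarg.h>
  #include <stdlib.h>

  /* Free count pointers passed as variable arguments; NULL entries are
     skipped, so the list may contain holes.                            */
  void vfree_n (int count, ...)
  {
      va_list ap;
      void *p;

      va_start (ap, count);
      while (count-- > 0) {
          p = va_arg (ap, void *);
          if (p != NULL)
              free (p);
      }
      va_end (ap);
  }

A cleanup stage could then say, for example, vfree_n (3, buf1, buf2, buf3), where any of the three pointers may be NULL.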

If the allocation for fooptr->string fails, fooptr is freed. If the allocation for fooptr->field2 fails, fooptr and fooptr->string will both be freed. If the allocation for fooptr->field3 fails, fooptr, fooptr->string, and fooptr->field2 will all be freed.

George Spofford

Northampton, Massachusetts

Jim responds: The vfree( ) routine proposed by George is a good example of using a single routine to handle a failure. However, I do not agree that the brevity of the construct in his Example 3 is desirable. It is only desirable if malloc( )/free( ) is your only concern. Most data structures require more cleanup than a simple free( ). Often there are files to close, other data structures that must be updated, and state variables that need to change. For the example presented, vfree( ) is a perfectly good solution. However, it is not generally applicable. What you really need is a constructor/destructor facility such as is found in C++.

The only benefit to be derived is in code size. vfree( ) actually imposes extra execution-time overhead, because each pointer must be pushed on the stack twice: once for the call to vfree( ) and once for the call to free( ). The copy loop in the vfree( ) routine also takes time. This is not an issue if failure of a malloc( ) call is relatively infrequent.

_LETTERS TO THE EDITOR_ DDJ, November 1990

Listing One

#include <stdarg.h>	/* for variable-argument macro def's */
#include <stdlib.h>	/* for free() and NULL */

#define MAX_VFREE 16	/* most pointers handled per call; pick any reasonable limit */

/*---  VFREE  -------  Author: George Spofford  ----*/
void vfree (void *first, ...)
{
	static void	*ptrs[MAX_VFREE];	/* reordering area */
	int		i;
	void		*p;
	va_list		argMarker;		/* variable-args marker */

	if (first == NULL)			/* ensure something to do */
		return;
/*** First, accumulate all pointers into the array */
	ptrs[0] = first;
	va_start (argMarker, first);
	for (i = 1, p = va_arg (argMarker, void *);
	     p != NULL && i < MAX_VFREE;
	     ++i, p = va_arg (argMarker, void *))
		ptrs[i] = p;
	va_end (argMarker);
/*** Now, free them all, in reverse order */
	while (--i >= 0)
		free (ptrs[i]);
}