Algorithms for High-Precision Finite Differences

Dr. Dobb's Journal May 1998

Improving the accuracy of numerical methods

By Michael Herstine

Michael is a C++ developer at Northern Research and Engineering Corp. He can be reached at mgh@world.std.com.

In developing applications for science and engineering, code reuse is just as desirable as in other areas of software development, but can often have obstacles peculiar to applied mathematics. In particular, it is common to use the results of legacy code as input to new, higher-level algorithms. In practice, this amounts to coding numerical methods where a single function evaluation (from the standpoint of the algorithm) is, in fact, a run of large amounts of code. Often, that code is single-precision legacy Fortran 77, 66, or even Fortran IV.

In such cases, analytic derivatives are not available, and must be estimated. However, the level of error incurred must be considered. Unfortunately, the subject of finite-difference estimates of derivatives is largely absent from the applied mathematics literature. Researchers are typically interested in issues such as convergence rates, radii of convergence, and breadth of applicability of methods. Little attention is paid to the accuracy of results compared to available precision. Numerical experiments are typically performed in double precision, and results reported to between five and seven significant figures. When derivatives are needed, they are usually obtained via forward differencing (to keep the total number of function evaluations low) with some fixed step size. The availability of 16 figures of accuracy makes the error associated with this method acceptable.

In this industry, however, we often don't have the luxury of throwing away half of our significant digits, especially when forced to work with single-precision code. In this article, I'll implement techniques for dealing with this situation, and present the results of numerical experiments.

I encountered the particular development problem that prompted this article while working on a software package that predicts the performance of compressors. Typically, users adjust various parameters (such as RPM, amount of fluid moving through the compressor, or the physical dimensions of the part) in an attempt to achieve peak performance while still meeting design criteria for weight, overall size, and so on.

In an attempt to automate some of the simpler problems that the designer was solving by hand, we hooked the package up to a constrained optimizer. The idea was to have the optimizer vary selected inputs to the package to maximize performance while still satisfying various constraints. One of the challenges we needed to overcome was the lack of analytic derivatives. The application was already fairly complex and solved many of its subproblems iteratively. The code would need to be treated as a black box, forcing us to approximate the derivatives numerically. Further complicating things, the core analysis routines were made up of tens of thousands of lines of Fortran 77 written in single precision. That is, we were beginning with only about seven digits of accuracy, some of which were lost during the analysis. The additional loss of accuracy incurred by naive use of forward differences essentially prevented the optimizer from converging to any real accuracy.

Theoretical Basis

There are two primary sources of error in all finite-difference estimates of derivatives: truncation error, from cutting off the Taylor series, and condition error, from evaluating f in finite-precision arithmetic.

I'll turn first to the truncation error in forward differences. (For more information on this topic, see Practical Optimization, by Philip Gill et al., Academic Press, 1981, ISBN 0-12-283952-8.)

For more information on Taylor's theorem (Example 1(a)), turn to any standard work on calculus. Solving for df/dx leads immediately to the forward difference formula in Example 1(b) for estimating the first derivative. The second term on the right, which is dropped in making the estimate, is referred to as the "truncation error" (because we are truncating the Taylor series for f). If the calculations were being performed to infinite precision, that would be the end of the story -- smaller step sizes give smaller errors. Of course, that's not the case. Since f is being evaluated on a computer to a fixed precision, there is some approximation error. Let g(x) be the computed value of f(x). That is, g(x)=f(x)+eps(x) (so that eps(x) is the floating-point error in computing f at x on a computer). If f is a simple function, eps would typically be bounded by |f(x)|eps(M), where eps(M) is the machine precision (2^-23 [equals approx.] 10^-7 for floats, 2^-53 [equals approx.] 10^-16 for doubles). However, for more complex functions, eps(x) may be considerably larger. Taking this into account, Example 1(c) is the actual estimate. The last term, the error term, is referred to as "condition error."
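The interplay between the two errors is easy to see in code. Here is a minimal single-precision sketch; the sample function (sin) and the step sizes are my own illustrative choices, not from the original application:

```cpp
#include <cmath>

// Forward-difference estimate of df/dx; computed in single precision so
// that the condition error eps(x) is on the order of 1e-7*|f(x)|.
float forward_diff(float (*f)(float), float x, float h) {
    return (f(x + h) - f(x)) / h;
}

// Example function with a known derivative: f = sin, df/dx = cos.
float f_sin(float x) { return std::sin(x); }
```

With h near the square root of single-precision machine epsilon (about 3.45e-4), forward_diff(f_sin, 1.0f, 3.45e-4f) agrees with cos(1) to several digits; with h = 1e-8, the step is below the least-significant change in x, x + h rounds back to x, and the estimate collapses to zero.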

You can now see how things get interesting. Truncation error gets smaller as the step size decreases, but the condition error grows. Suppose you can bound

|d2f/dx2(x)|

by some constant, say M (at least near x). Suppose too, that you can bound |eps(x)| by another constant, E, near x. I'll denote (g(x+h)-g(x))/h by [phi](x) (in other words, [phi](x) will be the computed estimate of the derivative); then |[phi](x)-df/dx(x)|<=Mh/2+2E/h.

Clearly then, there exists an optimal value for h; the first term on the right increases with h, while the second term decreases with it. To find that optimum, you differentiate the right side with respect to h to obtain Example 1(d). If you set this equal to zero and solve for h, you get Example 1(e).
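The minimization can be checked numerically. A minimal sketch in C++ (the values of E and M are illustrative constants):

```cpp
#include <cmath>

// The upper bound on the forward-difference error as a function of h.
double error_bound(double h, double M, double E) {
    return M * h / 2.0 + 2.0 * E / h;
}

// Setting the derivative of the bound with respect to h to zero gives
// the optimal step h = 2*sqrt(E/M); substituting back, the bound at
// that step collapses to 2*sqrt(E*M).
double optimal_step(double E, double M) {
    return 2.0 * std::sqrt(E / M);
}
```

For E = 1e-16 and M = 1, the optimal step is 2e-8, the bound there is 2e-8, and the bound is larger for steps a decade away in either direction.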

When the step size is to be fixed (this is usually done for performance reasons), the square root of the machine precision is often used for forward differences; an identical analysis yields analogous step sizes for the other difference formulas (for example, on the order of the cube root of the machine precision for central differences).

If you do have analytic derivatives available, the aforementioned observations can be turned around to estimate E, the maximal error in evaluating f. Recall the bound on the error in forward differencing: |[phi](x)-df/dx(x)|<=Mh/2+2E/h. Notice that if h is small enough, the error bound will be dominated by the second term on the right side. Under this assumption, you have |[phi](x)-df/dx(x)|<=2E/h, or E=O(|[phi](x)-df/dx(x)|h).

In other words, assume you know the correct value of df/dx(x). You can intentionally induce error in the forward difference approximation to df/dx(x) by intentionally choosing a very small step size. This will cause the condition error to balloon up, and the truncation error to die off, so you can drop the first term in our upper bound on |[phi](x)-df/dx(x)|. Then, since df/dx(x) is known, you can solve for E, and so estimate how much floating-point error is creeping into your function evaluations.
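A minimal sketch of this back-solving, assuming the bound |[phi](x)-df/dx(x)|<=2E/h with truncation neglected (the sample function, its derivative, and the step size are illustrative choices):

```cpp
#include <cmath>

// Example function with a known analytic derivative.
double f_ex(double x)  { return std::sin(x); }
double df_ex(double x) { return std::cos(x); }

// Deliberately small h makes condition error dominate; then
// |phi(x) - df/dx(x)| <= 2E/h can be solved for E.
double estimate_eval_error(double (*f)(double), double (*dfdx)(double),
                           double x, double h) {
    double phi = (f(x + h) - f(x)) / h;       // forward difference
    return h * std::fabs(phi - dfdx(x)) / 2.0; // back out E
}
```

For sin evaluated in double precision at x = 1 with h = 1e-11, the recovered E comes out on the order of the double-precision machine epsilon, as expected for a well-behaved library function.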

For this method to give realistic estimates in practice, a few subtle points need to be considered. The first is that the quantity 2E/h is only an upper bound on the finite-difference error. There is no guarantee that the actual error for a given step size will hit this upper bound. In fact, it is often the case that the actual error in evaluating a function varies widely at adjacent points. The second concerns the choice of step size. If it is taken to be too small, the forward difference will be zero, and any information on the error will have been lost. If it is taken to be too large, the assumption that the error bound is dominated by condition error will no longer be valid. Lastly, what if analytic derivatives are not available? These concerns are addressed in algorithms that implement these ideas.

Algorithms

I will now present algorithms for high-precision estimates of first derivatives through forward differencing (see the aforementioned Practical Optimization). The idea is to get a reasonable estimate of M (the upper bound on f's second derivative) through straightforward central difference estimates. The step size for forward differencing is then chosen to be Example 3(a).

The bulk of the work, then, is to determine a step size giving a suitable estimate of M. I assume that the second derivative of f is changing slowly enough that its value at x is an acceptable approximation to M. A sequence of trial values for that step size is generated, and at each point, the relative condition error is calculated as in Example 3(b). Here, E is our bound on the evaluation error in f, h is the trial step size, and [capital phi] is the estimate of f's second derivative. condErr([capital phi]) is a measure of the size of the condition error only (that is, neglecting truncation error) in estimating the second derivative, relative to the size of the second derivative. Likewise, the relative condition errors of the forward and backward difference estimates of f's first derivative are found from:

condErr([phi]f) = 2E/(h|[phi]f|), condErr([phi]b) = 2E/(h|[phi]b|)
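These relative condition errors are simple to compute. A sketch follows; the constants (a roundoff bound of 2E/h for the first differences and 4E/h^2 for the second difference) are my reading of the bounds in Practical Optimization, and the guard against a zero estimate is my own addition:

```cpp
#include <cmath>

// Relative condition error of a forward or backward first-difference
// estimate phi: roundoff of at most 2E/h, divided by |phi|.
double cond_err_first(double E, double h, double phi) {
    return 2.0 * E / (h * std::fabs(phi));   // inf if phi == 0, which
}                                            // simply reads as "too large"

// Relative condition error of the central second-difference estimate Phi:
// roundoff of at most 4E/h^2, divided by |Phi|.
double cond_err_second(double E, double h, double Phi) {
    return 4.0 * E / (h * h * std::fabs(Phi));
}
```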

For a value of h to be considered acceptable, condErr([capital phi]) must be less than 0.1 (guaranteeing at least one correct significant digit, at least with respect to the condition error). To assess the accuracy of [capital phi] carefully, some estimate of the truncation error would also be needed. That is inconvenient to compute, so instead the relative condition error is required to be at least 0.001, since the truncation error generally rises as the condition error falls. To obtain an initial estimate of h, the assumption shown in Example 3(c) is made.

In other words, it is assumed that f's second derivative is proportional to f, and inversely proportional to the square of its argument (note that this assumption is satisfied by any monomial of degree greater than 1). The "1s" in both numerator and denominator guard against the effects of small values for either f or x. If this assumption holds, then, from Example 1(e), the optimal interval is Example 3(d).

However, at a step size of [h with bar over], condErr([capital phi]) will be of order one, so we choose as our initial estimate of h the quantity 10[h with bar over] to produce a condErr([capital phi]) of 0.01. Of course, it is entirely possible that the assumption on the behavior of f's second derivative is incorrect. If the initial estimate of [capital phi] produces a relative condition error that is too small, you simply decrease h, and if condErr([capital phi]) is too large, h is increased. This is repeated until an acceptable value is found.
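The initial-step rule just described amounts to a couple of lines of code (this sketch assumes f(x) and the error bound E are already in hand):

```cpp
#include <cmath>

// Initial trial step for the second-derivative estimate:
// hbar = 2*(1+|x|)*sqrt(E/(1+|f(x)|)), then start at h = 10*hbar so
// that condErr(Phi) comes out near 0.01 when the Example 3(c)
// assumption on f's second derivative holds.
double initial_step(double x, double fx, double E) {
    double hbar = 2.0 * (1.0 + std::fabs(x)) *
                  std::sqrt(E / (1.0 + std::fabs(fx)));
    return 10.0 * hbar;
}
```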

Notice that this iteration could fail to ever produce an acceptable value of h if, say, f is constant. For this reason, we set an upper limit on the number of iterations, kmax. For diagnostic purposes on failure, we also record whether or not we ever found a value of h that was acceptable for estimating the first derivative. This information is encoded in the variable hs, which is initialized to -1, and set to the current step size the first time we find an h for which both [phi]f and [phi]b have relative condition errors of less than 0.1. (In the following algorithms, [phi]f denotes the forward difference approximation to the derivative, [phi]b the backward difference approximation, and [phi]c the central difference approximation.)

Algorithm #1

  1. Initialization. Set h=10[h with bar over], where [h with bar over]=2(1+|x|)[sqrt](E/(1+|f(x)|)). Set k=0. Evaluate [phi]f(h), [phi]b(h), [capital phi](h), and their relative condition errors. Set hs=-1.
  2. Evaluate the initial estimate. If max(condErr([phi]f), condErr([phi]b))<=0.1, set hs=h. If 0.001<=condErr([capital phi])<=0.1, go to step 5. If condErr([capital phi])<0.001, go to step 4. Otherwise, continue at step 3.
  3. Relative condition error too large.
    1. Set k=k+1, and h=10h.
    2. Recompute the finite differences and their relative condition errors.
    3. If hs=-1, and max(condErr([phi]f), condErr([phi]b))<=0.1, set hs=h.
    4. If condErr([capital phi])<=0.1, go to step 5.
    5. If k=kmax, go to step 6.
    6. Go back to step 3a.
  4. Relative condition error too small.
    1. Set k=k+1, and h=h/10.
    2. Recompute the finite differences and their relative condition errors.
    3. If hs=-1, and max(condErr([phi]f), condErr([phi]b))<=0.1, set hs=h.
    4. If condErr([capital phi])>=0.001, go to step 5.
    5. If k=kmax, go to step 6.
    6. Go back to step 4a.
  5. Success. Set hOpt=2[sqrt](E/|[capital phi]|); the error estimate is hOpt|[capital phi]|/2+2E/hOpt.
  6. Failure.
    1. If hs=-1, then f appears to be nearly constant. Set hOpt=h, and set [phi]f=0; the error estimate is zero.
    2. If hs[not equal to]-1 and condErr([capital phi])>0.1, then f appears to be linear or odd. Set hOpt=hs, [capital phi]=0, and set the error estimate as in step 5.
    3. Otherwise, the second derivative appears to be changing too rapidly. Set hOpt=h and set the error estimate as in step 5.
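Algorithm #1 can be condensed into a sketch like the following. The condition-error constants (roundoff bounds of 2E/h for the first differences and 4E/h^2 for the second difference) are assumptions based on Practical Optimization, and the three failure diagnostics are collapsed into a single flag; divisions by a zero difference simply produce an infinite condition error, which reads as "too large":

```cpp
#include <cmath>
#include <functional>

struct Deriv1Result {
    double hOpt;    // recommended forward-difference step
    double phiF;    // forward-difference derivative estimate
    double errEst;  // estimated error bound in phiF
    bool ok;        // false if no acceptable h was ever found
};

Deriv1Result algorithm1(const std::function<double(double)>& f,
                        double x, double E, int kmax = 20) {
    double fx = f(x);
    double hbar = 2.0 * (1.0 + std::fabs(x)) *
                  std::sqrt(E / (1.0 + std::fabs(fx)));
    double h = 10.0 * hbar;              // initial trial step
    double hs = -1.0;                    // best h seen for phiF/phiB, or -1

    double phiF = 0, phiB = 0, Phi = 0, cF = 0, cB = 0, cP = 0;
    auto eval = [&](double hh) {
        double fp = f(x + hh), fm = f(x - hh);
        phiF = (fp - fx) / hh;                    // forward difference
        phiB = (fx - fm) / hh;                    // backward difference
        Phi  = (fp - 2.0 * fx + fm) / (hh * hh);  // second-difference
        cF = 2.0 * E / (hh * std::fabs(phiF));    // relative condition errors
        cB = 2.0 * E / (hh * std::fabs(phiB));
        cP = 4.0 * E / (hh * hh * std::fabs(Phi));
    };
    eval(h);
    if (std::fmax(cF, cB) <= 0.1) hs = h;

    int k = 0;
    while ((cP < 0.001 || cP > 0.1) && k < kmax) {
        h = (cP > 0.1) ? 10.0 * h : h / 10.0;  // too large: grow h;
        ++k;                                   // too small: shrink h
        eval(h);
        if (hs < 0 && std::fmax(cF, cB) <= 0.1) hs = h;
    }

    Deriv1Result r;
    bool accepted = (cP >= 0.001 && cP <= 0.1);
    r.ok = accepted || hs >= 0;
    r.hOpt = accepted ? 2.0 * std::sqrt(E / std::fabs(Phi))
                      : (hs >= 0 ? hs : h);
    r.phiF = (f(x + r.hOpt) - fx) / r.hOpt;
    r.errEst = r.hOpt * std::fabs(Phi) / 2.0 + 2.0 * E / r.hOpt;
    return r;
}
```

For f(x) = x^2 at x = 1 with E = 1e-16, the initial trial step already lands in the acceptable window, and the returned estimate agrees with the exact derivative (2) to better than six digits.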

Algorithm #1 assumes the availability of E. Often this is not a problem; E may be estimated by E~|f|(machine precision). In other cases, however, this may not be sufficient. If analytic derivatives are available, then the method outlined here could be used to compute eps(x). Of course, if you have analytic derivatives, then you don't care about Algorithm #1 (although I suppose that if derivative evaluation were expensive enough, you might do it once to get eps, and then use Algorithm #1). In general, though, the problem is that initially, you have neither a reasonable estimate of the derivative nor of E. Luckily, I've found that, in practice, an order of magnitude estimate of E is sufficient for Algorithm #1. Consequently, Algorithm #2 may be used to perform a high-precision estimate of the derivative when only function evaluations are possible.

Algorithm #2

  1. Use some fixed step length to estimate df/dx through forward differences.
  2. Use this result to estimate E (see Algorithm #3).
  3. Use the result of step 2 in Algorithm #1.

Previously, I mentioned that there were a few subtle points to keep in mind when trying to back out E. Algorithm #3 takes them into account. The idea is to find the smallest step size that produces a change in the function value, and use that for a forward difference estimate. Any step size less than that gives no useful information, since the resulting finite-difference estimate will always be zero, independent of the function. On the other hand, the smaller the step becomes, the more the condition error term dominates the error bound. This leads you to use the smallest step size possible.

The algorithm proceeds in three distinct phases. The first attempts to bracket the least significant step; A denotes the lower bound and B the upper. The second phase uses a simple bisection algorithm to find the least-significant step within a user-specified relative tolerance, relTol (this does not have to be unduly tight; in practice, I've found that a value of 0.1 works well). The last phase takes this approximation to the least-significant step and uses it to compute the forward, backward, and central difference approximations to the first derivative. These are then compared to the "known" derivative value, and (for each approximation) E is computed. The maximum value is returned.

Algorithm #3

  1. Bracketing Phase. Break the procedure into cases: If both f(x) and df/dx(x) are nonzero, go to step 2. If df/dx(x) is nonzero, but f(x)=0, go to step 3, and if both f(x) and df/dx(x) are zero, go to step 4.
  2. Bracketing Phase, Case 1. A reasonable guess for h might seem to be |x|eps(M), which would be the least-significant change you could possibly make in x. However, when x is small, this can lead to absurdly small guesses for h, which slow the algorithm considerably. So, when x is small, we instead reason that f(x)+eps(M)|f(x)|~f(x+h)~f(x)+df/dx(x)h when h is the smallest significant step. So, if |x|>=1, let A=|x|eps(M); else let A=eps(M)|f(x)|/|df/dx(x)|. Let B=10A.
    1. Ensure that A is a Lower Bound. While f(x+A)[not equal to]f(x), let B=A, A=A/10.
    2. Ensure that B is an Upper Bound. While f(x+B)=f(x), let A=B, B=10B.
    3. Go to step 5.
  3. Bracketing Phase, Case 2. Since f(x)=0, function values near x are very small (near underflow), and the smallest-significant step size may be hard to find. Instead, we simply set A to 0.
    1. Ensure that B is an Upper Bound. While f(x+B)=f(x), let A=B, B=10B.
    2. Go to step 5.
  4. Bracketing Phase, Case 3. Since df/dx(x)=0, the function may even be constant, so we must set some maximum number of guesses, say kmax, let A=0, k=0.
    1. Ensure that B is an Upper Bound. While f(x+B)=0, and k<kmax, let A=B, B=10B, k=k+1.
    2. If k=kmax, then the function appears to be constant -- return an error code.
    3. Go to step 5.
  5. Search Phase. Carry out a binary search until A and B agree to relTol.
  6. Estimation Phase. Compute [phi]f(B), [phi]b(B), [phi]c(B). Compute err(f)=|[phi]f(B)-df/dx(x)|, err(b)=|[phi]b(B)-df/dx(x)|, and err(c)=2|[phi]c(B)-df/dx(x)|.
  7. Return max(err(f), err(b), err(c)).
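A sketch of Algorithm #3 for the common case (both f(x) and df/dx(x) nonzero) follows; the degenerate cases of steps 3 and 4 and their iteration cap are omitted, double-precision machine epsilon is hard-coded, and the error quantities follow steps 6 and 7 exactly:

```cpp
#include <cmath>
#include <functional>

// Bracket the least-significant step [A, B], bisect to relTol, then
// compare the finite differences at B against the known derivative.
// Assumes f(x) != 0 and dfdx != 0 (Case 1 of the bracketing phase).
double estimate_E(const std::function<double(double)>& f,
                  double x, double dfdx, double relTol = 0.1) {
    const double epsM = 2.22e-16;     // double machine epsilon (assumed)
    double fx = f(x);
    double A = (std::fabs(x) >= 1.0)
                   ? std::fabs(x) * epsM
                   : epsM * std::fabs(fx) / std::fabs(dfdx);
    double B = 10.0 * A;
    while (f(x + A) != fx) { B = A; A /= 10.0; }  // ensure A is a lower bound
    while (f(x + B) == fx) { A = B; B *= 10.0; }  // ensure B is an upper bound
    while ((B - A) / B > relTol) {                // search phase: bisection
        double m = 0.5 * (A + B);
        if (f(x + m) == fx) A = m; else B = m;
    }
    double phiF = (f(x + B) - fx) / B;            // estimation phase
    double phiB = (fx - f(x - B)) / B;
    double phiC = (f(x + B) - f(x - B)) / (2.0 * B);
    double errF = std::fabs(phiF - dfdx);
    double errB = std::fabs(phiB - dfdx);
    double errC = 2.0 * std::fabs(phiC - dfdx);
    return std::fmax(errF, std::fmax(errB, errC));
}
```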

Numerical Experiments

I've implemented these algorithms in C++ (available electronically; see "Resource Center," page 3), and applied them to two test functions that show the algorithms in their best and worst light. In each case, the derivative was first estimated cheaply (comparatively) in the following way:

  1. Estimate E by (1+|f(x)|)eps(M).
  2. Set h=|x|[sqrt](E).
  3. Use h to compute [phi]f.
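The three steps above can be sketched as a single routine. Note that the text's step 2 gives h = |x|[sqrt](E), which vanishes at x = 0; the fallback there is my own safeguard, not part of the original procedure:

```cpp
#include <cmath>
#include <functional>

// Quick-and-dirty first estimate of df/dx, used to seed Algorithm #2.
double quick_forward_diff(const std::function<double(double)>& f, double x) {
    const double epsM = 2.22e-16;             // double machine eps (assumed)
    double fx = f(x);
    double E = (1.0 + std::fabs(fx)) * epsM;  // step 1: crude bound on E
    double h = std::fabs(x) * std::sqrt(E);   // step 2: scale-based step
    if (h == 0.0) h = std::sqrt(E);           // safeguard for x = 0 (assumption)
    return (f(x + h) - fx) / h;               // step 3: forward difference
}
```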

That estimate was then used in Algorithm #2 to produce a more careful approximation.

Figure 1 presents the first test function. Table 1 lists the errors incurred by forward difference approximations to the first partial derivative with respect to x at the point (0,0) (the exact value is -2). Figure 2 shows a log-log plot of the step sizes against the errors.

Clearly, the optimal step length is near 3.45266e-4, which happens to be the square root of machine precision. So, in this case, the standard step length does quite well, giving an estimate of about 1.99979. Algorithm #1 yields a step size of 6.40672e-4, and an estimate of about 1.99964, comparable in accuracy to the standard method.

Figure 3 presents the second test function (based on Example 8.5 in Practical Optimization). Table 2 lists the errors incurred in estimating the derivative at x=100.001 (the exact value is about .1219977). Figure 4 displays the same log-log plot as in Figure 2. In this case, the quick-and-dirty approach yields a step length of about .10358, and an estimated derivative of .225505 (or no correct significant digits). Algorithm #1, on the other hand, yields a step length of about 2.44427e-3, and an estimated derivative of about .124268 (or two correct significant digits).

Conclusion

The algorithms presented here can certainly improve the accuracy of numerical methods, though at the cost of significantly more function evaluations. They can principally be applied to practical situations where obtaining greater accuracy from the function evaluations is difficult or impossible. Another application of estimates of function evaluation error lies in appropriately tolerancing iterative schemes. Again, this generally is not covered in the academic literature, where convergence criteria can be adjusted as needed. However, in engineering software development, solvers must work reliably over a wide range of problems, and a tolerancing scheme that adapts to the available accuracy can help.

DDJ


Copyright © 1998, Dr. Dobb's Journal