LETTERS

FFTs: Fast, Faster, and Wow!

Dear DDJ,

In reference to "Faster FFTs" ("Algorithm Alley," February 1995). I would like to offer a slightly faster version of the ShuffleIndex() function.

The new function rbi (reverse binary index) takes the fbi (forward binary index) as it first parameter and pTable (a pointer to a table of bit masks) as its second parameter. It operates by:

The table of bit masks is specific to each length of the binary index; see the tables for length of 8 and 9 in Example 1(a). I would expect that in any particular application, the length of the index would be fixed. If not, then an array of pointers to bit-mask tables (indexed by the length of the ftt binary index) could easily be built. The function rbi in Example 1(b) is about 20 percent faster than ShuffleIndex().

Ken Allen

pgmkra@ibi.com

Dear DDJ,

I enjoyed Iwan Dobbe's article on FFTs. I like to really squeeze cycles out of algorithms, too. Example 1(c) is a faster way to do the bit-reverse shuffle.

The basic idea is that the inner loop (where most of the time is spent) simply toggles the 512 bit, the next-to-inner loop toggles the 256 bit, and so on. Some more time could be saved by unrolling the inner loop and using pointers instead of index arithmetic. I have never seen this method published.

Mike Dunlavey

70363.27@compuserve.com

Dear DDJ,

In "Faster FFTs," D.G.G. Dobbe gives a good overview of the factors affecting numeric performance. However, there are two cheap tricks for improving FFT speed even more, without resorting to assembly language. These tricks are most useful for large FFTs (more than 16,384 points) and fast CPUs coupled to slow RAM.

The first trick is to improve the "locality of reference" so the L1 cache gets used more efficiently. To do so, you can replace the separate Re[], Im[] arrays with an array of structures; see Example 1(d). The inner loop of the butterfly then becomes Example 1(e).

For pure compiled code (Watcom C 9.5), this change increases the speed of the forward FFT by a factor of 1.46 on a 90-MHz Pentium. The main reason for the improvement: When the Re and Im components are packed into a structure, a read from slow DRAM is very likely to put both components on the same L1 cache line, where they will be available for subsequent (fast) fetches.

The second trick is to use less math. The "split radix" FFT (Duhamel and Hollman, Electronics magazine, "Letters," January 1984) modifies the butterfly to use fewer multiplications and additions. Combined with trick number 1, the split radix is about 1.88 times faster than the classic butterfly for single precision, and 1.68 times faster in double precision, even when sine and cosine are explicitly calculated. A nice implementation of the split radix method can be found in the file tfftdp.c at ftp.nosc.mil, in the /pub/aburto directory.

Harlan W. Stockman

hwstock@abq-ros.com

VC++ and NT

Dear DDJ,

I would like to comment on John LaPlante's article, "Building an OLE Server Using Visual C++ 2.0" (DDJ, February 1995). First, I would like to say that Visual C++ 2.0 is a good product, but I am sure we will see as many bug fixes with it as with the Visual C++ 1.x series.

John's statements, "NT's responsiveness and robustness_" and "NT's crash resistance_", don't give 2.0 the credit it deserves. Since installing NT 3.5 and VC++ 2.0, I've written three separate apps. In doing so, I've had eight crashes in two weeks due to VC++. One crash was from trying to open a source file. These crashes would not be so bad if they didn't take the source files out with them. NT leaves them with size 0 bytes. A chkdsk will turn them up at the root (This is very helpful!).

As for the ClassWizard, it sometimes chokes up if the source files are not saved prior to invoking it from the resource editor. This is especially disheartening after you have added several members and functions, only to get the pop-up error!

Anyway, for those who struggle through it, I have a few helpful hints:

Generally, it takes two reboots in a row to get the system fully back; however, my hints usually help the first time.

Jon Friedline

76557.1643@compuserve.com

The Win95 Logo

Dear DDJ,

Isn't it true that IBM doesn't have rights to any of the Win32 source? If so, then I think this might be the strongest reason yet for Microsoft's new logo rules described by Jonathan Erickson in his March 1995 "Editorial." If Microsoft can convince all the major players to release only Win32 versions of their apps using OLE, long filenames, and other Windows 95- and NT-specific features, then suddenly IBM and OS/2 are in trouble. IBM will have to choose between clean-room engineering Win32 support (no mean feat) or living with fewer new and supported apps that run under OS/2.

Microsoft would clearly like to see more NT apps, but I'd bet it's even more interested in cutting OS/2's supply lines. If it wanted to push NT, it could announce a 50 percent price cut, and sales would surge overnight.

Lou Grinzo

71055.1240@compuserve.com

Dear DDJ,

I read with much interest Jonathan Erickson's editorial on Microsoft's Win95 logo issue. I am a small developer of vertical-market database applications. All of my clients are 20 to 50 workstations on Novell networks. My clients understand the demand of workgroup applications, and I have a good track record of delivering applications that fill this need.

Although many of these clients use Windows for word processing or spreadsheets, none have their workgroup apps under Windows. The applications I have developed are under Clipper, Clipper with Btrieve, C with Btrieve, C with c-tree plus, and Foxpro for DOS.

The area Erickson is talking about in his article is the rarefied world of mass-market applications--those spreadsheets, word processors, or e-mail packages that can sell tens of thousands of packages.

In a nutshell, his beef has little or nothing to do with me and my client base. The fact that Microsoft is prodding the marketplace to be more streamlined just makes it easier for my clients to make a thousand-dollar decision. This represents the everyday level of pain for the small businessman, myself included.

I would like to get a Windows database project. Visual Basic, Foxpro for Windows, dBase for Windows, the new CA Visual- Objects would be great. But my clients don't see this as a cost-effective option for their workgroup needs. Oracle, Sybase, Powerbuilder, are you kidding?

We are talking about workgroups with 30 386/16s with 4 Mbytes of RAM. We are talking about a learning curve of months (with the associated downtime). I am helping my clients move over to Windows one step at a time, replacing a few machines with 486/33s and 8 Mbytes of RAM as budgets and time permit.

If Microsoft can give the warm and fuzzy that a move to a new machine will be reasonably painless, just add some money and plug-n-play, all the better. Better for my clients and better for me.

I'm paying for my own learning curve with my time and energy (and missed opportunity). My clients expect me to have answers when they need the question answered. Windows 95 is not yet a question they need answered. If Microsoft can make the answer easier to live with (read "pay for"), my clients and I are all the better served.

Just remember a few years ago; the landscape had a lot of mainframe and mini iron, all incompatible with each other. They are being slowly squeezed out of existence because of the leveling of the playing field with the PC revolution.

I can only see the same thing happening with software. And I am one application developer that likes what I see.

Name Withheld

Dear DDJ,

Upon reading the "Editorial" in the March issue, I was struck by the similarity of Microsoft's logo requirements with Apple's Macintosh development-partner requirements of February 1984.

Apple said that to qualify, your company had to have x amount of dollars in the bank and y number of existing apps sitting on store shelves to "qualify" to become a certified Macintosh development partner. This policy closed the door on many early would-be Mac developers and, in my opinion, is a major reason why the Mac never became the platform of choice.

Gee, I sure would have thought that Bill Gates would be smart enough to learn from Steve Job's mistakes.

Andy Bentley

71055.3060@compuserve.com

Dear DDJ,

I just got my copy of DDJ (March 1995), and already I'm smiling. First off, I commend you in including Linus Torvalds in your "Excellence in Programming Award." I am calling my provider with Minicom, a Telix-like communications program running under Linux 1.1.59. I believe this operating system to be a wonderful thing, and join you in congratulating the winners.

Also, Jonathan Erickson's "Editorial" made me smile. Why? Because finally someone is talking about Windows and Microsoft without being afraid. One magazine, which I will not name, called OS/2 better Windows than Windows, rated it "Good," and went on to rate Windows better than OS/2. Weird. Erickson notes that Microsoft should "start treating third-party developers as partners, rather than serfs." I think that Windows is dead, and Bill doesn't seem to realize that he's killing it. No matter, I develop only for DOS, OS/2, and Linux. I commend you on your honest and great magazine!

Martin Brown

martin@nezumi.demon.co.uk

Example 1: FFTs: Fast, faster, and wow!

(a)
static unsigned int MaskTable08[] =
{
   0x81,  /*  10000001  */
   0x42,  /*  01000010  */
   0x24,  /*  00100100  */
   0x18,  /*  00011000  */
   0x00   /*end of table*/
};
static unsigned int MaskTable09[] =
{
   0x101, /* 100000001  */
   0x082, /* 010000010  */
   0x044, /* 001000100  */
   0x028, /* 000101000  */
   0x00   /*end of table*/
};

(b)
unsigned int rbi(unsigned int fbi, unsigned int * pTable)
{
   unsigned int result, mask, temp;
   result = fbi;
   for(mask = *pTable; mask; mask = *(++pTable))
   {
     temp = fbi & mask;             /* isolate symetric bits pair     */
     if (temp == 0 || temp == mask) /* are the symetric bits the same ?*/
         continue;                /* if so, then don't need to change */
      result ^= mask;              /* bits differ, so flip them       */
   }
   return(result);
}

(c)
float real[1024], imag[1024], temp;
#define SWAP(a,i,j)(temp=a[i], a[i]=a[j], a[j]=temp)
int i=0, j=0, i0,i1,i2,i3,i4,i5,i6,i7,i8,i9;
#define TWICE(k)
 for(i##k = 2; -- i##k >= 0; j ^= (1<<k) )
TWICE(0) TWICE(1) TWICE(2) TWICE(3) TWICE(4)
TWICE(5) TWICE(6) TWICE(7) TWICE(8) TWICE(9)
{       if (j > i){
                      SWAP(real,i,j);
                      SWAP(imag,i,j);
                  }
       i++;
       }

(d)
struct cmplx {float Re; float Im;} xc[SIZE], *cp;

(e)
tempr =  Qr * (cp+index2)->Re  -  Qi * (cp+index2)->Im;
tempi =  Qr * (cp+index2)->Im  +  Qi * (cp+index2)->Re;

(cp+index2)->Re = (cp+index1)->Re - tempr;   /* For Re-part */
(cp+index1)->Re = (cp+index1)->Re + tempr;
(cp+index2)->Im = (cp+index1)->Im - tempi;   /* For Im-part */
(cp+index1)->Im = (cp+index1)->Im + tempi;


Copyright © 1995, Dr. Dobb's Journal