October 1998

Packing database tables

by Mark G. Wiseman

You can download sample files from our Web site as part of the file oct98.zip. Visit www.zdjournals.com/cpb and click the Source Code hyperlink.

Packing database tables…. Do you have a mental picture of going on a trip and deciding to take your favorite database table? First, you stuff the table into your suitcase and then you sit on the suitcase's lid until you can get the latches fastened. Well, that is not exactly what we mean, but the image isn't too far off. In this article, we'll explain what it means to pack a table, then we'll describe how to accomplish table packing (without a suitcase).

Inside the black box

When you add a record to a database table, the database creates a slot for the record in the correct location, inserts the record, and updates any indices the table may have. When the database creates the slot for the record, it may have to enlarge the disk file that holds the table.

Later, when that record is deleted from the table, the database marks the record as deleted-but it doesn't reduce the size of the table's disk file. The two main reasons for this behavior both involve speed. Rather than deleting a record from a table and then shrinking the table's disk file, it's much faster for the database to just set a flag for the newly deleted record, indicating that the record has been deleted. Later, when a new record is added, it's again faster to reuse the slot of a deleted record than it is to expand the table's file on disk and then insert the new record.

Marking records as deleted without physically deleting them from the file is very similar to the way DOS deletes files from a disk. The file system sets a flag in the file-allocation table indicating that the file has been deleted-but the file actually remains on the disk until it's overwritten by a new file. DOS also does this to improve the speed of the file system. When you pack a table, you force the database to remove the deleted slots from the table's file. As a result, you shrink the size of that file on disk.

So why pack a table?

If marking records as deleted instead of physically altering the table file is faster, why bother packing a table? The answer is simple: size. As a programmer, you aren't concerned only with the speed of your program: You must also keep the program and its associated files to a reasonable size. Doing so will often involve tradeoffs between speed and size in the program's design and implementation. If a program adds many records to a table and then deletes most of those records, the table can take up a lot of unnecessary disk space. By packing the table, you may slow down the operation of the database, but you'll free disk space that another program may need. Let's consider an example that's similar to packing a table into a suitcase and taking it on a trip. We have a database table that we want to put into a ZIP file and send over the Internet. In this case, reducing the size of the table also increases its speed of transmission-no tradeoffs here. Can the disk-space savings be that dramatic? Yes, they can. Figure A shows our example program (available from our Web site).

Figure A: Our example program demonstrates the dramatic difference in table size after packing a table.
[ Figure A ]

The table in the example program starts with a single record; the table and its index use 8,192 bytes of file space. The program then adds 10,000 records; the table and its index now use 4,106,240 bytes of disk space. Next, the program deletes all but one of the records. This brings the table back to having a single record-but the table and index still occupy 4,106,240 bytes! Finally, the example program packs the table and-whew!-the total size returns to 8,192. Listings A and B contain the code that adds and then deletes records.

Listing A: Code snippet to add 10,000 records

const int RECNUM = 10000;

void TMainForm::AddRecords()
   {
   Table->Active = true;

   for (int i = 0; i < RECNUM; i++)
      {
      Table->Insert();
      TableName->Value = "PackTable Example";
   Table->Post();
   }

   Table->Active = false;
   }
Listing B: Code snippet to delete all but one record
void TMainForm::DeleteRecords()
   {
   Table->Active = true;

   int delNum = Table->RecordCount - 1;

   Table->First();
   for (int i = 0; i < delNum; i++)
      Table->Delete();

   Table->Active = false;
   }

How do we do it?

OK, you're convinced that packing a database table is something you might want to do. You've looked in the online help for the VCL components TTable and TDatabase and found nothing about packing tables. So, how do you pack a table? Listings C and D contain the code for a PackTable() function that will pack Paradox or dBASE tables.

Listing C: The PackTable function header

#ifndef PackTableH
#define PackTableH

#include 

bool PackTable(TTable *table);
#endif

Listing D: The PackTable function

#include 
#pragma hdrstop

#include "PackTable.h"

bool PackTable(TTable *table)
   {
   bool active = table->Active;
   bool exclusive = table->Exclusive;
   bool retval = true;

   try
      {
      if (!exclusive)
         {
         table->Active = false;
         table->Exclusive = true;
         }
      table->Active = true;

      CURProps props;
      Check(DbiGetCursorProps(table->Handle, props));

      String tableType = props.szTableType;
      if (tableType == szPARADOX)
         {
         CRTblDesc tableDesc;
         memset(&tableDesc, 0, sizeof(tableDesc));
         lstrcpy(tableDesc.szTblName, 
           table->TableName.c_str());
         lstrcpy(tableDesc.szTblType, szPARADOX);
         tableDesc.bPack = true;

         hDBIDb hDb = table->DBHandle;
         table->Close();
         Check(DbiDoRestructure(hDb, 1, &tableDesc, 
           0, 0, 0, false));
         table->Open();
         }
      else if (tableType == szDBASE)
         Check(DbiPackTable(table->DBHandle, 
           table->Handle, 0, szDBASE, true));
      else
         retval = false;
      }
   catch(...)
      {
      retval = false;
      }

   table->Active = false;
   table->Exclusive = exclusive;
   table->Active = active;

   return(retval);
   }
PackTable() makes some calls into the Borland Database Engine (BDE) API. The BDE API includes a function, DbiPackTable(), that will pack a dBASE table-unfortunately, it won't pack a Paradox table. Packing a Paradox table requires the use of another function: DbiDoRestructure(). Using DbiDoRestructure() is more complicated, but it's not too bad. First things first, though.

 
Tip: Look up BDE API
The online help reference for the BDE API is included with C++Builder; you can find it at \Program Files\Borland\BDE\bde32.hlp. While you're exploring the help file for the BDE API, be sure to look up the function DbiUndeleteRecord()-especially if you're using dBASE tables.

PackTable() takes a single argument, a pointer to a TTable object. If PackTable() successfully packs the table, it returns true. If it can't pack the table, it returns False. Both DbiPackTable() and DbiDoRestructure() require that the table to be packed be open in exclusive mode, meaning that no other programs can have the table open while you pack it. PackTable() first saves the current table settings for the Active and Exclusive properties. If the table isn't in exclusive mode, PackTable() tries to set the Exclusive property to True. If the table is in use by another program, attempting to set Exclusive to True will throw an exception. In this case, PackTable() will catch the exception and return False. Once the table is in exclusive mode, PackTable() calls the BDE API function DbiGetCursorProps(). PackTable() uses information returned by this function to determine the type of the table: Paradox, dBASE, or something else.

If it's a Paradox table, PackTable() sets up a CRTblDesc structure used in the call to DbiDoRestructure(). Most importantly for us, PackTable() sets the bPack member of CRTblDesc to true. The table must be closed before you call DbiDoRestructure(), so PackTable() closes the table, calls DbiDoRestructure(), and then reopens the table. If it's a dBASE table, PackTable() simply calls DbiPackTable(). If the table is neither a Paradox or dBASE table, PackTable() will return False. Finally, PackTable() resets the Active and Exclusive properties of the table and returns.

As we're packing to go…

We have a few final thoughts. Packing a table will frequently slow down the operation of the table. Also, packing a table that has many deleted records may take several seconds. So, you'll probably want to use PackTable() sparingly. You could call PackTable() every few days or only when you're going to transfer a table by disk or online. Or, you could allow your users to pack the table when they think it's necessary. Don't overuse it, but do use it. As an added bonus, PackTable() also regenerates all the maintained indices of the table. It does so as a by-product of calling either DbiDoRestructure() or DbiPackTable(). If you're using a table format other than Paradox or dBASE, you may be able to enhance PackTable() to work with that format also. For example, it should be possible to pack Microsoft Access tables using COM.

Finally, we may be packing, but we'll be back. In our next article on this subject, we'll build a component that incorporates PackTable() and will pack all the tables in a database.