C++ Builder Developer's Journal / DB->dilemmas

July 1998

DB->dilemmas

by Jim Bailey

Text Tables II

Thank you for the excellent article on importing text files. I found the schema information helpful, but I think I was missing the TDataBase object's Params property settings. I spent considerable time attempting to import text tables and just couldn't get the procedure to work.

Also, the article on text tables alone was worth the annual subscription. I'm considering using the TBatchMove object to populate my tables. I'd like to see a related article.

Fred C. Mitchell
Princeton NJ.

Thanks, Fred. I received several responses like yours. Importing text files is a definitely a trouble spot, primarily because the documentation is so weak. I've recently worked with the TBatchMove object and found it to be a superb tool. In this column, I'll show you how to build a utility to import text files using the TBatchMove component. Along the way, I'll discuss an easier way to write schema files.

My motivation

Recently, I was retained for a project which involved porting data from an old PC-based Xenix system to an AS 400. Retrieving data from the old system was apparently impossible. It was a DB-Dilemma I'd never before witnessed. Fortunately, I eventually stumbled onto the Xenix print queue and managed to export the files to reports. The reports weren't organized in predictable rows and columns, but in complex, irregular sections. Even so, I finally managed to parse the report files into text tables and began the process of exporting text files into Paradox.

On my first attempt to port the data, I opened the text file as a table and used a TQuery object with a parameterized INSERT query to populate the Paradox table. Much to my surprise, the process took hours. And there were only 10,000 records.

To get the data into a Paradox database, I decided to use the TBatchMove object, hoping it would offer reasonable performance. I was astonished at the results. The entire text table was copied in eight or nine seconds. Initially, I thought the command had failed.

Scheming to save time and effort

In the process of porting this data, I got sick of writing schema files. It's definitely a boring, error-prone task. I decided to develop a text-porting tool to enter the structure of the text table and let the porting tool create the schema file and database table. I set up the following simple C structure named fld to define text table fields:

struct fld {
  char FieldName[25];
  int FieldLen;
  char type;	//Field type
  };

Completing this structure is much simpler than working with schema files. For instance, if the text table's structure changes, every line in a schema file will probably be impacted. Using the declaration shown in Listing A, I only need to know the width of each field. If it changes, I just update the width of that one field. Figure A is a listing of open invoices. The data structure in Listing A defines every field in Figure A.

Listing A: The declaration to set up a field structure

#define CHAR_TYPE	'c' //code value
#define DATE_TYPE   	'd'
#define SHORT_TYPE	'i' //16 
	bit integer
#define CURRENCY_TYPE	'm'
#define PAD_TYPE	'p'
#define LAST_INSTRUX	'z'

fld f[]=
{  //fName   	len	Type
  {"p1",         3,     PAD_TYPE},
  {"InvoiceNo",  6,     CHAR_TYPE},
  {"p2",         1,     PAD_TYPE},
  {"BranchID",   2,     SHORT_TYPE},
  {"p3",         1,     PAD_TYPE},
  {"Description",15,    CHAR_TYPE},
  {"SoldBy",     2,     SHORT_TYPE},
  {"InvoiceCode",1,     CHAR_TYPE},
  {"InvoiceDate",12,    DATE_TYPE},
  {"Reference",  7,     CHAR_TYPE},
  {"InvoiceAmt", 8,     CURRENCY_TYPE},
  {"eof",        0,     LAST_INSTRUX}
};
String TextTableName("Aging");
String DBTableName(TextTableName);

Figure A:
R02502 01 INVOICE 1 R 10/09/97 L03453 2418.75

R02502 01 W/O INSURANCE 1 R 10/13/97 W/O INS 180.98-

R02502 01 LOCKBOX BRANCH 1 R 11/05/97 CK15246 2413.12-

R02503 01 INVOICE 1 R 10/09/97 L03454 1666.25

This text file lists invoices for an aging statement.

In Listing A, each text table field is defined with three columns: the field name, the field width, and the field type. These parameters apply only to the text table. I just count the number of characters in each field. If I don't want to keep the field, I mark it as a PAD_TYPE; otherwise, I select the appropriate type, and I'm home free.

What is PAD_TYPE? Remember, Figure A is a printed report file. The first three characters on every line are spaces. Creating a PAD_TYPE field allows the text-porting tool to skip over certain data in fixed field width text tables. Compare Listing A to Figure A to see how to effectively use the pad fields to strip empty columns.

About the application

The text-porting tool creates the schema and table based on the information in the fld structure shown in Listing A. Then, the porting tool can populate the table it creates with data from the text file. If you do much work with Text tables, you'll want to download the sample text table tool application from The Cobb Group's Web site at www.cobb.com/cpb. The application relies on four key functions to do most of the work. Those functions are as follows:

void BuildSchema(fld f[], String &
	TextTableName, const char SCHEMA_
		TYPE, bool Debug);

void BuildTable(fld f[], String &
	TableName, bool Debug);

void CopyTextToTable(String &Text
	TableName, String &DBTableName);

void ViewTable(String NewFile);

Using only the information in the fld structure, these four functions create a schema, set up a database table, copy the text table to the database table, and link the data to a TDBGrid for immediate viewing. Using the fld structure is remarkably fast and much easier to maintain than working directly on the schema files and SQL commands.

Figure B shows the control panel for the text-porting tool.

Figure B: You select your options from the text-porting application main screen.
[ Figure B ]

The application consists of a radio group, rich edit text box, grid, three check boxes, and a command button. The grid and rich edit box occupy the same space on the screen.

The radio group object labeled Command specifies what action is taken after you click the Execute Command button. In this application, the button seems a bit of overkill; but in real life, I put several buttons on the screen, each processing its own text table.

The three check boxes are extremely helpful. When Debug is checked, the commands don't actually create a schema file or a table. However, the commands, which would be written to a file if Debug wasn't checked, are visible in the rich edit component.

The All Char check box forces every field to type CHAR (text). This process is often helpful when debugging a data port. The View Grid operates to show or hide the grid, based on the value of the check box. I didn't include too much code in this article because of space limitations, but the source is available in the sample application.

Logic Anomalies

I ran across some interesting behavior developing the BuildSchema and BuildTable functions. The schema field types are mentally incompatible (at least in my mental resources). For example, in a schema file, field type NUMBER is a short integer. SQL typically defines NUMBER as a double value. I set up a system to cross-reference field types between the schema and SQL. The results are shown in Table A.

Table A: Field-type specification cross-references

Flg

Macro Name

Schema

SQL

'c'

CHAR_TYPE

CHAR

CHAR

'd'

DATE_TYPE

DBIDATE

DATE

't'

TIME_TYPE

TIME

TIME

'D'

DATETIME_TYPE

TIMESTAMP

TIMESTAMP

'm'

CURRENCY_TYPE

FLOAT

MONEY

'f'

FLOAT_TYPE

FLOAT

FLOAT

'i'

SHORT_TYPE

NUMBER

SMALLINT

'l'

LONG_TYPE

LONGINT

INTEGER

'b'

BOOL_TYPE

BOOL

BOOLEAN

I tested the SQL types using Paradox and Access 97. All the types are supported in both environments except BOOLEAN. The MS Access Boolean specifier is LOGICAL and doesn't accept BOOLEAN as a field type in a CREATE TABLE statement.

Using a table similar to the one shown in Table A adds considerable flexibility to your application. Modifications to the program's macros accommodate requirements across multiple databases. Simply changing BOOLEAN to LOGICAL in the source code allows me to create an Access table instead of Paradox.

Double mapping also provides flexibility. For example, the Schema specification doesn't support a currency field type. I wanted to create a database table using currency fields. So, I simply mapped the schema's float type twice--once as float and once as money. When creating the schema file, the program uses FLOAT for both. However, the SQL command uses either FLOAT or MONEY, as specified.

TBatchMove

The TBatchMove component is terribly simple to use. It requires two tables--a text table and a database table. You should set up the TDatabase and/or any aliases before trying to execute this code. Since we've already covered all these topics in detail in previous issues, we won't repeat them now. Once you've set up all the databases, the code shown in Listing B executes the copy from the text table to the database table.

Listing B: Code required for porting data from a text table to a database table

  String dbTable(DBTableName);
  dbTable +=	DB_FILE_EXT;
  tblPdox->TableName = dbTable;
  tblTxt->TableName = TextTableName;
  dbTxt->Connected = true;
  dbPdox->Connected = true;
  tblTxt->Active = true;
  tblPdox->Active = true;
  BatchMove1->Execute();
  tblPdox->Active = false;
  tblTxt->Active = false;
  dbTxt->Connected = false;
  dbPdox->Connected = false;

The objects with tbl name prefixes are TTable objects. The db prefixed objects are TDatabase objects. Honestly, there isn't much more we can say about using the TBatchMove object other than it works. It works fast. It makes my clients happy. And it makes me happy.

Conclusion

I've seen numerous questions concerning performance while appending new records using C++Builder. Indeed, at times I've experienced terrible performance myself. However, this text table-porting tool, (including the TmoveBatch), provides astonishing performance and is also easy to use. Keep in mind that the text-porting tool isn't designed for end users, but it has been a welcome edition to this developer's tool kit.

Flg	Macro Name	Schema	SQL
'c'	CHAR_TYPE	CHAR	CHAR
'd'	DATE_TYPE	DBIDATE	DATE
't'	TIME_TYPE	TIME	TIME
'D'	DATETIME_TYPE	TIMESTAMP	TIMESTAMP
'm'	CURRENCY_TYPE	FLOAT	MONEY
'f'	FLOAT_TYPE	FLOAT	FLOAT
'i'	SHORT_TYPE	NUMBER	SMALLINT
'l'	LONG_TYPE	LONGINT	INTEGER
'b'	BOOL_TYPE	BOOL	BOOLEAN