April 1998
by Jim Bailey

 

Text files as tables

I've heard I can open a text file as a table in C++ Builder. But try as I might, I can't get anything to work. Is it a bug or is it me?

I don't think it's a bug or you. The documentation is weak, and there are some strange quirks involved. Frankly, I've struggled with this question until my wits are completely frazzled. I never found a working example, and there's little documentation. However, trial and error finally led me to a technique that works. If you have ideas for improvements, it won't hurt my feelings one bit to hear from you. My quest led me to some technical information on a Delphi Web site, and another clue in the BDE API help file added to my understanding. (You'll find the locations of these documents at the end of this article.) A little experimentation finally yielded positive results.

So, why would anybody want to use text files as tables? Why not just use a table to begin with? Good questions. Not all data originates in databases. Some data starts out in spreadsheets or document files. Other data is captured from sophisticated instruments that create flat text files. If the text file is organized into rows and columns, it can be converted into a database table quickly and easily using the text file as a table. Borland includes powerful capabilities for developers to append massive quantities of data from a flat file into a database table with a minimum of time and effort.

There are, of course, requirements for organizing the information in a text file for use as a table. Each line in a text file is a complete record, the end of which is identified by a carriage return and line feed. Each row must be correctly divided into fields or columns, implying consistent, predictable organization.

The Borland Database Engine (BDE) supports two types of column organization in text file tables: varying and fixed. Varying files are delimited with quotes and commas (or other characters, but more on that later). Fields in fixed files are delimited by field widths. Figures A, B, and C show sample text files.

Figure A: This varying-format text file is named vCustomer.txt.

123.00,"James & Sullivan","281-281-5151",04/12/95
124.00,"George Wainrite","887-788-8787",5/5/1996
125.00,"Smith Blarney","221-212-2121",12/31/97
Figure B: This fixed-format text file is named fCustomer.txt.
123.00James & Sullivan281-281-515104/12/1995
124.00George Wainrite 887-788-878705/05/1996
125.00Smith Blarney   221-212-212112/31/1997
Figure C: This tab-delimited text file is named tCustomer.txt. The character represents a tab (ASCII 9).
123.00James & Sullivan281-281-515104/12/1995
124.00George Wainrite887-788-87875/5/1996
125.00Smith Blarney221-212-212112/31/97
Each example contains identical information. Each has three records, and each record has four fields: Customer Number, Customer Name, Phone, and Last Purchase. A description of each field is obvious to the casual observer, but a CPU needs more help to intelligently parse the file.

Know thy schema

Borland uses a schema file to communicate a text file's table structure to the BDE. The structure of a schema file is similar to that of a Windows INI file. The schema files associated with Figures A, B, and C are shown in Figures D, E, and F, respectively. A schema file must have an SCH file extension. Except for the extension, the text and schema files must share the same filename.

Figure D: This is the vCustomer.sch schema file for the vCustomer.txt flat text file.

[vCUSTOMER]
Filetype=VARYING
Delimiter="
Separator=,
CharSet=ascii
Field1=CustomerNumber,Float,20,04,00
Field2=CustomerName,Char,30,00,20
Field3=Phone,Char,15,00,50
Field4=LastSale,Date,11,00,65
Figure E: This is the fCustomer.sch schema file for the fCustomer.txt flat text file.
[fCUSTOMER]
Filetype=FIXED
CharSet=ascii
Field1=CustomerNumber,Float,6,04,00
Field2=CustomerName,Char,16,00,6
Field3=Phone,Char,12,00,22
Field4=LastSale,Date,10,00,34
Figure F: This is the tCustomer.sch schema file named for the tCustomer.txt flat text file. The character represents a tab (ASCII 9).
[tCUSTOMER]
Filetype=VARYING
Separator=
CharSet=ascii
Field1=CustomerNumber,Float,20,04,00
Field2=CustomerName,Char,30,00,20
Field3=Phone,Char,15,00,50
Field4=LastSale,Date,11,00,65
The first line in the schema file is the name of the text file (without the extension), enclosed in square brackets. The documentation indicates the file extension can't be used here. In my tests, I found that including the extension resulted in data corruption--I guess it really is important to follow this rule.

The second parameter is Filetype. You must set it to VARYING or FIXED. The delimiter and separator settings depend on the file type. There are at least two kinds of varying text files: quote/comma and tab delimited. The example in Figure A shows the correct settings for quote/comma files, while Figure C shows a tab-delimited text file. Their respective schema files are shown in Figures D and F.

A tab-delimited schema doesn't specify a delimiter, in spite of its name. It uses only a separator. The separator is the tab character, which generally isn't visible as a printable character. Enter a tab character into the schema file by pressing the [Tab] key on the keyboard; sometimes [Ctrl][Tab] may be required. The matter is further complicated by Borland's statement in Technical Information Paper 2844 that "[the separator] must be a visible character." I found the tab character works as a separator, even though it isn't a printable character. Tab-delimited files are an ideal solution, and I think they offer the easiest port.

CharSet specifies the language driver. Use ASCII for English, as shown in the examples.

The remainder of the schema file occupies itself with field definition. The definition consists of the following properties: ordinal position, field name, field type (see Table A for more information), maximum field width, digits to the right of a decimal, and record offset to the first character in the field.

Table A: Field types supported by text table schema files
Field typeDescription
CHARCharacter
FLOAT 64-bit floating point
NUMBER16-bit integer
BOOLBoolean (T or F)
LONGINT32-bit long integer
DBIDATE Date field
TIME Time field
TIMESTAMP Date/Time field

All parameters for a field are included on one line. Field-definition lines begin with the word Field, followed by a number specifying the ordinal position of the field in the table.

The offset parameter is important for both fixed and varying text tables. Varying fields use the information to construct virtual fields in memory. Fixed type files use the value to locate the first character in a field.

Files of the varying type rely on separator and delimiter characters to parse the file. However, be careful with the fixed-field files. Here's the pseudocode used to parse a single record in a fixed-type text file, as far as I could determine:

Read a line from the text file
Loop for each field
  Move to the offset character
  Read n characters as set in 
    maximum width
This can actually result in some field overlap, since the parsing utility doesn't observe field boundaries when reading field data. It just moves to a particular location and reads a specified number of characters. If the field width for one field specified in the schema file overlaps into the next field, the data in the second field will appear in the import twice--once as a suffix to the first field and again in its own field. Consider the following text line:
123.00James & Sullivan281-281-515104/12/1995
The customer name obviously starts at character six (base zero). The length of the name is 16 characters. However, if you specify the company name's field width as 27, it will include the customer name and phone number in the customer name field. If the phone number field is set to begin at character 22, the phone number will also be shown as a separate field.

Text tables in C++Builder

Even with this comprehensive information described, I couldn't get the blasted text tables to work in C++Builder. How frustrating! But it was actually quite simple, once I figured it out. (Note that I haven't found the following information in any documentation.) Here's how I finally made it work. I first added a TDatabase object and set the Driver Name property to Standard. I added the following text to the Params property:

 

PATH=C:\TxtFiles
DEFAULT DRIVER=ASCIIDRV
ENABLE BCD=FALSE
I didn't set the AliasName property. The Path specification in the Params property is the location of both the flat text and schema files. I set the DatabaseName property to TextTables, which effectively creates a local, temporary alias of the same name. I set the Connected property to false; more on this later

Next, I added a TTable object, setting the DatabaseName to TextTables and the TableType property to ttASCII. After completing this setting, the dropdown box in the TableName property contains a list of TXT files in the path specified in the TDatabase object's Params property.

The text file table can have any extension. However, files without a TXT extension won't appear in the dropdown list of the TableName property.

If you select a text file with an associated schema, you're almost home free. The rest is simple.

Text table "gotchas"

There are a couple of limitations and "gotchas" with text tables. They're only nuisances, but they can add up to big trouble and lost time. For example, be certain the TTable's Active property is set to false during program development. This is important. If the active property is true, the program will compile and link, but it won't run. You see, an active design-time TTable object locks the text file. The runtime TTable object can't open the text file, and the program crashes. Very ugly. That's why I set the TDatabase object's Connected property to false. Doing so assures me the Active property will be false at runtime on all related TTable objects. I activate the data objects in the FormCreate event.

Another annoying problem involves the text file itself. The last record must terminate with a carriage return and line feed. (Either character alone may also work, but I didn't test it that way.) Without a proper terminator, the test programs crashed, throwing a rather nasty error message: Data Structure Corruption. Wow. What does that mean? The situation isn't nearly as serious as the message indicates. Actually, it just means that the last record isn't accessible without proper termination. Add a terminator by pressing the [Enter] key in most text editors.

You can't access text tables using TQuery objects. I wasn't able to delete records in any text table, and I couldn't edit or append data in varying-type text tables using VCL data objects. However, I could edit and append fixed-type text tables. The data objects seem to maintain correct field without supporting code.

There were problems with the date field, too. Notice the fixed-width text file shown in Figure B. The dates are entered in the format of mm/dd/yyyy. The system converts these dates to the Windows default date format, which is m/d/yy on my system. When the date is written to the text file, it isn't properly formatted, and it throws off the fixed-width format. A little supporting software could take care of this problem.

Interestingly enough, the system can parse a fixed text table with lines longer than the specified maximum record length; the extra bytes are ignored. This is a handy bit of information. I've had situations in which I needed to parse a text file with several fields. All but the last field complied with fixed-field requirements. The last field did not comply, because of its varying width. A fixed-type text table can handle such irregular fields if all required fields are located at the beginning of each record, and they comply with other fixed-width requirements. In such cases, it appears the text table parsing software simply ignores the extra characters.

It's possible to view all the text tables shown in this article using TDBGrid and TDBCtrlGrid data objects. 

If you're interested, Technical Information Paper 2844 is available at www.inprise.com/devsupport/delphi/ti_list/TI2844.html

Some help on this topic is also available in the BDE API help file by searching on schema.