Back in the good old days of C++ programming, you had two choices when it came to strings and string manipulation: You could use the C-style character array or the C++ string class. The situation is more complicated in C++Builder, because there are at least five different string-handling mechanisms:
The C-style character array
The C++ cstring class
The Standard Template Library (STL) basic_string class
The VCL AnsiString class
The VCL SmallString template
This whole string thing can be confusing, whether you come from a C++ or Delphi background or you're a newcomer to the sport. This article will help clear up the string issue. We'll take a look at each of the string-handling mechanisms and discuss the situations in which you might use each one.
In C, there's no such thing as a true string data type--C handles strings as an array of characters (an array of the char data type). A character array declaration looks like this:
char buffer[50];
This statement creates a character buffer with a length of 50.
Character arrays are null-terminated--which means the end of the string is marked by the terminating null character. The terminating null character is 0, or in C parlance, \0. For example, let's say you store the word John in a character array using the strcpy() function, as follows:
char buffer[8];
strcpy(buffer, "John");
In memory, the character array looks like this:
J o h n \0
The terminating null character tells the string functions, "This is the end of the string." Since the character array is eight characters long, the characters following the terminating null will contain random values--garbage, if you will. These random characters don't matter, though, because as far as the string functions are concerned, the string ends at the terminating null.
This issue becomes important when you consider the declared length of a string. In the previous example, we declared a character array with an eight-character length. However, you can store only seven characters in the array, because C++ will add the terminating null character to any string you store.
So, if you try to store the eight-character string BillyBob in the array, you'll overrun the array by one character when the terminating null is added. The overrun corrupts whatever follows the array in memory, and your code may cause an access violation, either immediately or at some future point in the program.
The interesting factor is that C++ won't complain if you try to store 100 characters in an array that you declared to hold only 20 characters. It's up to you to keep track of how large your array is and to ensure that you don't overrun the end of the array.
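One way to keep track of the array size is to pass it to a function that refuses to write past it. The following is a minimal sketch using the standard snprintf() function. The helper name copy_bounded is hypothetical and isn't part of any library discussed in this article.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies src into dest without writing more than
// dest_size bytes. Returns true only if the whole string fit.
bool copy_bounded(char* dest, std::size_t dest_size, const char* src)
{
    if (dest_size == 0)
        return false;
    // snprintf() always writes a terminating null when dest_size > 0
    // and returns the length the complete string would have needed.
    int needed = std::snprintf(dest, dest_size, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < dest_size;
}
```

With an eight-character array, copy_bounded() accepts John but reports failure for BillyBob, leaving a truncated, still null-terminated string in the buffer instead of overrunning it.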
You manipulate character arrays using a host of C++ functions. Table A lists a few of the most common string manipulation functions.
Table A: String manipulation functions
| Function | Description |
| sprintf() | Builds a string with formatting |
| strcpy() | Copies one string to another |
| strcmp() | Compares one string to another |
| strcmpi() | Performs a case-insensitive comparison |
| strcat() | Concatenates one string to another |
| strlen() | Returns the length of the string up to, and excluding, the terminating null |
| strstr() | Searches a string for an occurrence of another string |
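To show how several of the functions in Table A fit together, here is a short sketch that checks the required length with strlen() before building a string with strcpy() and strcat(). The function name build_greeting and its parameters are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: builds "Hello, <name>!" in dest.
// Returns false (and writes nothing) if dest is too small.
bool build_greeting(char* dest, std::size_t dest_size, const char* name)
{
    const char* prefix = "Hello, ";
    // strlen() excludes the terminating null, so add 1 for the '!'
    // and 1 for the null itself.
    std::size_t needed = std::strlen(prefix) + std::strlen(name) + 2;
    if (needed > dest_size)
        return false;
    std::strcpy(dest, prefix);  // copy the first piece
    std::strcat(dest, name);    // append the name
    std::strcat(dest, "!");     // append the punctuation
    return true;
}
```

Once the buffer is built, strcmp() and strstr() can be used on it like on any other null-terminated character array.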
While the character array is as old as dust, we shouldn't discard it as an antiquated tool. When you want fast string manipulation, character arrays are the way to go. In some cases, in fact, you have no choice but to use character arrays. For instance, many Windows API functions require pointers to character arrays as parameters. So, the character array won't disappear any time soon.
Why does C++ have a string class? Primarily because the C-style character arrays aren't very convenient when you need to manipulate many strings. Classes let you more easily truncate strings, concatenate strings, delete portions of a string, search a string for certain character combinations, and so on.
For example, the following code demonstrates two ways to add four strings together, using the string class and then again using a C-style character array:
string s = string("Now is the time ") + "for all good men " + "to come to the aid " + "of their country.";
char buff[80]; // be sure it's big enough
strcpy(buff, "Now is the time ");
strcat(buff, "for all good men ");
strcat(buff, "to come to the aid ");
strcat(buff, "of their country.");
This code illustrates two important points about string classes. First, notice that you don't have to worry about specifying the size of the string. The cstring class will allocate memory as needed to accommodate an operation such as concatenation. And, speaking of concatenation, the string class overloads the + operator and lets you use + to add to the end of the string. (The first operand is wrapped in string() because C++ can't add two string literals directly.) With character arrays, you must use the strcat() function to add strings.
Testing for equality is another operation that's much easier with a string class than with character arrays. For instance, consider the following lines of code:
if (!strcmp(buff, "test")) // strings match
if (s == "test") // strings match
The cstring version is easier to read and more intuitive than the character array version, particularly because the strcmp() function returns 0 if the strings match and a non-zero value if the strings don't match. The first line above seems to read "if not buff, compare to test," when in fact, the line compares for equality. Again, the string class version is more readable and understandable.
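The same two operations can be sketched with the standard std::string class, which is assumed here in place of the cstring class because it's available with any C++ compiler. The function names build_sentence and is_test are hypothetical.

```cpp
#include <string>

// Joins the four phrases with +. At least one operand of the first +
// must be a std::string, because two string literals can't be added.
std::string build_sentence()
{
    std::string s = std::string("Now is the time ") + "for all good men "
                  + "to come to the aid " + "of their country.";
    return s;
}

// Compares with ==, which yields true when the contents match.
bool is_test(const std::string& s)
{
    return s == "test";
}
```

Note that the comparison is case sensitive, so is_test("Test") yields false.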
Another big plus about string classes is that you can call cstring class functions to perform operations on the string. Table B describes several cstring functions. This list isn't complete, but it provides an idea of the kinds of operations you can perform on a string object.
Table B: cstring class functions
| Name | Description |
| append() | Adds text to the end of a string |
| contains() | Determines whether a string contains another string |
| c_str() | Returns the character array buffer used by the string class to store a string's data |
| find() | Finds a sequence of characters within the string and returns its position |
| find_first_of() | Returns an index to the first occurrence of specified characters within a string |
| find_last_of() | Returns an index to the last occurrence of specified characters within a string |
| insert() | Inserts text into a string at the specified location |
| length() | Returns the length of the text in a string |
| prepend() | Adds text to the beginning of a string |
| remove() | Removes characters from a string |
| strip() | Removes trailing or leading characters, such as trailing blanks, from a string |
| substring() | Creates a new string from characters in a string |
| to_lower() | Converts a string to lowercase |
| to_upper() | Converts a string to uppercase |
For example, you may need to strip the path and extension off a filename, using code like the following:
string s = "c:\\myprog\\readme.txt";
// find last backslash
int pos = s.find_last_of("\\");
// remove all characters from beginning of
// string to character following backslash
s.remove(0, pos + 1);
// chop extension off the end of string
s.remove(s.length() - 4, 4);
s.prepend("MyProgram - ");
// string now contains 'MyProgram - readme'
SetWindowText(Handle, s.c_str());
This code illustrates three points. First, you can easily manipulate the text in a cstring class object (the code might not seem easy, until you contrast it to the equivalent code using char arrays). Second, to specify the backslash in a C++ string literal, you must use a double backslash. Since a single backslash is an escape character used to enter special codes, to indicate an actual backslash in the string you must use a double backslash.
Finally, this code illustrates the c_str() function, which yields a char* representation of your string. In this case, the Windows API function SetWindowText() wouldn't understand if you tried to pass it the cstring object itself. The c_str() function isn't pretty, and it's a pain to type, but you'll need it if you're going to use string classes with any functions requiring a char*.
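For comparison, here is a rough equivalent of the filename example written against the standard std::string class. This is a sketch under assumptions: std::string has no remove() or prepend(), so erase() and insert() stand in for them, and the function name make_title is hypothetical. Unlike the original, it looks for the last period rather than assuming a three-letter extension.

```cpp
#include <string>

// Hypothetical helper: returns "MyProgram - <base name>" for a
// Windows-style path.
std::string make_title(const std::string& path)
{
    std::string s = path;
    // find last backslash; npos means there is no directory part
    std::string::size_type pos = s.find_last_of('\\');
    if (pos != std::string::npos)
        s.erase(0, pos + 1);
    // chop extension off the end of string, if there is one
    std::string::size_type dot = s.find_last_of('.');
    if (dot != std::string::npos)
        s.erase(dot);
    s.insert(0, "MyProgram - ");
    return s;
}
```

The result can be handed to an API function that expects a char* in the same way, by calling c_str() on the returned string.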
One problem with string classes is that, depending on how they're written, they can be fairly slow when you perform a lot of string manipulation. For instance, each time the string length changes, memory must be reallocated to account for the new string length. Depending on how the string class was originally written, this reallocation can cost you many clock cycles.
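Some string classes let you reduce this cost by asking for the memory up front. As a hedged illustration, the standard std::string class offers reserve() for this purpose. Whether it helps in practice depends on the implementation, and the function name repeat is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds `count` copies of `piece`. Reserving the
// final length first makes repeated reallocation during the loop
// unlikely.
std::string repeat(const std::string& piece, std::size_t count)
{
    std::string result;
    result.reserve(piece.length() * count);
    for (std::size_t i = 0; i < count; ++i)
        result += piece;
    return result;
}
```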
The STL string class has many functions in common with cstring, and the class also provides most of the same overloaded operators as cstring. Although STL is considered the new wave in terms of general C++ programming classes, I don't anticipate using STL strings extensively in C++Builder. For one thing, you can't place the STL headers in the C++Builder pre-compiled headers. Compiling a unit that contains the lines
#include <vcl\vcl.h>
#include <string>
#pragma hdrstop
will generate the compiler warning "Could not create pre-compiled header: code in header." In order to eliminate the compiler warning, you must write the code this way:
#include <vcl\vcl.h>
#pragma hdrstop
#include <string>
The bottom line is that a program using STL's basic_string class will take longer to compile because the STL header has to compile each and every time the unit compiles.
Another problem comes into play when you use cstring and basic_string, because both use the common name string. When you use the STL version of string, you must declare the std namespace. To do so, you can place the following declaration at the beginning of the unit that uses STL string:
using namespace std;
This line tells the compiler, "Use the string class found in the std namespace." You can also select the STL class by explicitly qualifying the string class with the std namespace, as follows:
std::string s = "This is an STL string.";
It doesn't matter which method you use. This requirement, along with the header issue, makes me avoid using the STL string class altogether.
SmallString<30> s;
s = "Test";
VCL makes heavy use of the string data type (long string). Nearly all text-based VCL properties are of the Pascal string data type. For example, the Text, Name, and Caption properties are string properties. VCL also uses the string data type in various methods and event-handling functions.
You should understand two things about this data type. First, it's an actual language data type, not just a character array. Second, C++ has no built-in equivalent for the Pascal string data type. Since string is used so heavily in VCL, and since C++Builder uses the Pascal VCL, Borland created a C++ class called AnsiString to approximate the Pascal string data type. You can use this C++ class wherever you require a Pascal string data type.
Let's face it--the name AnsiString isn't particularly appealing. Somewhere in SYSDEFS.H, you'll find the following line:
typedef AnsiString String;
This line lets you use the name String (uppercase S) to declare an instance of the AnsiString class. To illustrate, you can write a line like this:
String s = "This is a test";
Since String is the recommended alias for the AnsiString class, there's no reason to use the name AnsiString in your C++Builder programs.
The String class, like the other C++ classes, has several functions to make string manipulation easier. The class's constructors allow you to create a String object from a char*, an int, or a double. In addition, the String class has several overloaded operators to ease tasks like concatenation (+ and +=), assignment (=), and testing for equality (==). Finally, the conversion operators make mixing and matching String and other object types invisible to you.
Consider the following code, for example:
const char* buff = "Test";
String s = "Test";
if (s == buff) DoSomething();
This code works because the String class has a char* conversion operator (as well as the overloaded == operator). The char* conversion operator performs an implicit conversion in this case, which allows you to test the contents of the String object and the contents of the character array for equality. The whole process happens automatically--you don't have to worry about how it works, just understand that it does work.
The String class, like cstring, has a c_str() function, which is required when you want to get the character buffer of the String object. For example, if you use the Windows API function DrawText(), you have to write code something like this:
String s = "This is a test";
DrawText(Canvas->Handle, s.c_str(), s.Length(), &rect, DT_SINGLELINE);
Since the second parameter of the DrawText() function requires a pointer to a character buffer, you must use the c_str() function.
At this point, we need to mention an oddity of the String class: The index operator ([ ]) can reference a particular element of a string. For example, the lines
String s = "Hello World!";
Label->Caption = s[7];
assign the character W to the Caption property of a label component. Note that the first element of the string is at array index 1--not array index 0, as with other C++ arrays.
While the 1-based index is required for technical reasons, I suspect this feature will cause C++Builder programmers some grief. For instance, the following code will fail silently:
String s = "c:\\myprog\\myprog.exe";
int index = s.LastDelimiter("\\");
s.Delete(0, index);
The code will fail because 0 isn't a valid index number for a string. The following line of code
s.Delete(1, index);
on the other hand, is correct.
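The effect of 1-based indexing can be modeled outside VCL with a small sketch. The helper below is hypothetical and isn't part of the String class. It mimics the behavior described above on a standard std::string, treating any index below 1 as invalid and silently doing nothing in that case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of VCL: deletes `count` characters
// starting at 1-based position `index`. An index below 1 is treated as
// invalid and leaves the string unchanged.
void delete_one_based(std::string& s, int index, int count)
{
    if (index < 1 || count <= 0)
        return;
    // convert the 1-based position to std::string's 0-based position
    std::size_t start = static_cast<std::size_t>(index) - 1;
    if (start >= s.length())
        return;
    // erase() stops at the end of the string if count runs past it
    s.erase(start, static_cast<std::size_t>(count));
}
```

Called with an index of 0, the helper leaves the path untouched. Called with an index of 1 and a count equal to the 1-based position of the last backslash, it strips the directory and leaves only the filename.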
I've found that the String class works well for almost all of my string needs. (This endorsement is based primarily on the fact that the String class was designed to be used with VCL properties and methods.) For day-to-day programming using VCL, the String class is a clear choice. It works seamlessly with VCL and includes enough functions to handle most string-manipulation chores. But, as usual, the choice of string types depends on the task at hand in your code. It's always best, though, to know your options. In this article, we've attempted to shed some light on the subject of handling strings in C++Builder.
Kent Reisdorph is an editor of the C++Builder Developer's Journal as well as director of systems and services at TurboPower Software Company, and a member of TeamB, Borland's volunteer online support group. He's the author of Teach Yourself C++Builder in 21 Days and Teach Yourself C++Builder in 14 Days. You can contact Kent at editor@bridgespublishing.com.