Understanding AnsiString's c_str() function

by Kent Reisdorph

As you probably know, the VCL is written in Object Pascal. Object Pascal contains a dynamic string data type called AnsiString. The Object Pascal AnsiString type is dynamic in that memory is allocated and deallocated by the compiler as needed. The VCL makes heavy use of the AnsiString data type.

Unlike Object Pascal, C++ does not have a native data type for strings. Instead, character arrays are used to hold string data (often called null-terminated strings). C++Builder’s AnsiString class was designed to interface with Object Pascal’s AnsiString type.

The AnsiString class has a wide variety of methods that make string manipulation easy. Perhaps the most widely misunderstood (and abused) AnsiString function is the c_str() function. It is helpful to know when to use c_str() and which pitfalls to avoid when using it.

 

The advantages of a string class

A string class makes working with strings much easier than is possible using character arrays and the C runtime library functions. As an example, consider the case where you want to concatenate two strings. The C code to concatenate two strings looks like this:

char buff[20];
strcpy(buff, "Hello");
strcat(buff, " World!");
Label1->Caption = buff;

This is fairly straightforward, but careful attention must be paid to ensure that you don’t overwrite the end of the character array. Using AnsiString, you can accomplish the same thing with this code:

String S = "Hello";
S += " World!";
Label1->Caption = S;

Obviously, this code is more readable and more intuitive than the C code presented in the first example. Concatenation of strings is a very simple example. Deleting part of a character array, for example, is much more work using the C functions than it is using the AnsiString class.
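
To make the deletion comparison concrete, here is a minimal sketch of removing characters from the middle of a string both ways. AnsiString is only available in C++Builder, so the sketch uses the standard std::string class as a stand-in; AnsiString provides a comparable Delete() method, although its index is 1-based. The function names delete_chars_c and delete_chars_str are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// C approach: close the gap by sliding the tail of the array to the left.
// The caller must guarantee that pos + count does not exceed strlen(s).
void delete_chars_c(char* s, std::size_t pos, std::size_t count)
{
  std::size_t len = std::strlen(s);
  assert(pos + count <= len);
  // Move the tail, including the terminating null, over the deleted span.
  std::memmove(s + pos, s + pos + count, len - pos - count + 1);
}

// String-class approach: one call, and the class tracks the length itself.
std::string delete_chars_str(std::string s, std::size_t pos, std::size_t count)
{
  s.erase(pos, count);
  return s;
}
```

Calling delete_chars_c(buff, 5, 6) on a character array holding "Hello World!" leaves "Hello!" in the array, and delete_chars_str("Hello World!", 5, 6) returns the same text. The C version gets there only with manual length arithmetic that is easy to get wrong.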

Ultimately, the string data for any C++ string class is still stored within the class as a null-terminated string. In the case of AnsiString, a private data member called Data holds the string data. Data is declared in DSTRING.H as follows:

private:
  char *Data;

The memory needed to hold the string data is dynamically allocated as needed. For example, concatenating two strings together will result in the memory associated with the Data variable being deleted and then reallocated to account for the new size.

 

The c_str() function

The c_str() function returns a pointer to AnsiString’s private Data variable. It is an inline function whose declaration is ridiculously simple:

char* __fastcall c_str() const
  { return (Data)? Data: "";}

If Data is non-null, c_str() returns a pointer to the data. If Data is null, c_str() returns an empty string.
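
The following sketch may help you picture how a string class manages its Data pointer. TinyString is a hypothetical stand-in written for this article, not the real AnsiString (which also handles reference counting and a stored length). It shows only the two behaviors discussed so far: c_str() returning either Data or an empty string, and concatenation replacing the memory that Data points to.

```cpp
#include <cstring>

// Hypothetical stand-in for a string class; this is not AnsiString's source.
class TinyString
{
public:
  TinyString() : Data(0) {}
  ~TinyString() { delete[] Data; }

  // Append text: allocate a larger block, copy both pieces, free the old block.
  TinyString& operator+=(const char* text)
  {
    std::size_t oldLen = Data ? std::strlen(Data) : 0;
    char* bigger = new char[oldLen + std::strlen(text) + 1];
    if (Data)
      std::strcpy(bigger, Data);
    std::strcpy(bigger + oldLen, text);
    delete[] Data;   // a pointer obtained earlier from c_str() is now invalid
    Data = bigger;
    return *this;
  }

  // Same logic as the AnsiString declaration shown above.
  const char* c_str() const { return Data ? Data : ""; }

private:
  char* Data;
  // Copying is disabled to keep the sketch short.
  TinyString(const TinyString&);
  TinyString& operator=(const TinyString&);
};
```

After a call to operator+=, the block that c_str() pointed to before the call has been deleted. Any pointer you saved from an earlier c_str() call must be treated as invalid once the string changes.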

When do you use the c_str() function? Basically, you use c_str() when passing the string represented by an AnsiString to any function requiring a char*. The TCanvas class, for example, does not have a DrawText() method. In order to use the Windows API function DrawText() you need to pass a char* for the second parameter. For example:

String S = "Hello World!";
TRect R(20, 20, 100, 40);
DrawText(Canvas->Handle, 
  S.c_str(), -1, &R, DT_SINGLELINE);

The most common use of c_str() is when calling API functions, or when passing data from an AnsiString to other string classes (such as the STL’s basic_string class).

 

Common misuses of c_str()

The c_str() function is necessary, but C++Builder programmers often misuse it. This section addresses the most common misuses of c_str().

 

Using c_str() out of scope

It is important to understand that the pointer returned by c_str() is temporary. That is, it is not guaranteed to exist past the line of code in which it is used. For example, the following code, while syntactically correct, will likely fail:

char* buffer = S.c_str();
strcat(buffer, " more text");

In some cases this code may work, but if it does, it is more luck than science. Often the buffer variable will point to random data by the second line of this example. Even when the pointer is still valid, strcat() writes past the end of the storage that AnsiString allocated for the string.

If you need to store the string data contained in an AnsiString, you need to allocate memory and copy the data pointed to by c_str() as shown here:

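const char* extra = " more text";
// Room for both strings plus the terminating null.
size_t reqLen = S.Length() + strlen(extra) + 1;
char* buffer = new char[reqLen];
strcpy(buffer, S.c_str());
strcat(buffer, extra);
// ... use buffer here ...
delete[] buffer;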

This is the only way to ensure that you are working with data that is, in fact, valid.
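
If a standard container is available to you, the same allocate-and-copy idea can be written so that the buffer releases itself. The sketch below is one possible approach and is not from the original article; the helper name copy_with_suffix is hypothetical. It takes plain const char* arguments, so the result of S.c_str() can be passed to it directly.

```cpp
#include <cstring>
#include <vector>

// Copy src followed by suffix into a buffer that the caller owns. The vector
// frees its storage automatically, so there is no delete[] to forget.
std::vector<char> copy_with_suffix(const char* src, const char* suffix)
{
  std::size_t srcLen = std::strlen(src);
  std::size_t sufLen = std::strlen(suffix);
  std::vector<char> buffer(srcLen + sufLen + 1);   // +1 for the terminating null
  std::memcpy(&buffer[0], src, srcLen);
  // Copy the suffix together with its terminating null.
  std::memcpy(&buffer[0] + srcLen, suffix, sufLen + 1);
  return buffer;
}
```

A call such as copy_with_suffix(S.c_str(), " more text") returns a buffer whose first element can be passed, as &buffer[0], to any function that requires a char*. The copy remains valid no matter what later happens to S.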

 

Writing to c_str()

Simply put, you should never attempt to write directly to c_str(). For example, the GetWindowsDirectory() API function is used to obtain the directory where Windows is installed. The first parameter of GetWindowsDirectory() is used to specify a char* that will contain the Windows path when the function returns. You might be tempted to use code like this:

String S;
GetWindowsDirectory(S.c_str(), MAX_PATH);
Label1->Caption = S;

This code compiles and runs without error, but will produce erroneous results. In this example, AnsiString has not allocated any storage for the Data variable. If you look back and examine the code for c_str() you will see that the code above evaluates to something like this:

GetWindowsDirectory("", MAX_PATH);

Remember that c_str() will return an empty string if no memory has been allocated for the string data. For this reason, the previous code won’t produce the expected results, but it also won’t generate any errors. In some cases, though, writing to c_str() may result in an access violation. Here’s an example:

String S = "hello";
GetWindowsDirectory(S.c_str(), MAX_PATH);
Label1->Caption = S;

In this case AnsiString allocates six bytes of storage to hold the assigned string. When GetWindowsDirectory() executes, the internal AnsiString buffer is overwritten by several bytes (the exact number depends on where Windows is installed on your system). The result is an access violation, either immediately or sometime later in the program’s execution.

You can force AnsiString to allocate memory by calling the SetLength() method. For example:

String S;
S.SetLength(MAX_PATH);
GetWindowsDirectory(S.c_str(), MAX_PATH);
Label1->Caption = S;

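This code produces the expected result, but it is still best to avoid writing to c_str() at all. Doing so is not good practice, and there is always an alternative. Note, too, that S still reports a length of MAX_PATH after the call, because AnsiString has no way of knowing that its buffer was changed behind its back. The proper way to write the code in the preceding examples is like this: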

char buff[MAX_PATH];
GetWindowsDirectory(buff, MAX_PATH);
String S = buff;
Label1->Caption = S;

AnsiString will make a copy of the data contained in the buff variable and will allocate the proper amount of storage.
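
The local-array pattern becomes more robust when you also check the function's return value. GetWindowsDirectory() returns the number of characters copied (not counting the terminating null), returns the required buffer size if the buffer is too small, and returns zero on failure. The sketch below is an illustration under stated assumptions: fake_get_directory is a hypothetical stand-in that mimics this contract so the example can run without the Windows headers, read_directory is a hypothetical helper, and std::string stands in for AnsiString.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in that mimics the GetWindowsDirectory() contract.
// The path it reports is made up for the example.
unsigned fake_get_directory(char* buffer, unsigned size)
{
  static const char path[] = "C:\\WINDOWS";
  unsigned needed = sizeof(path);          // includes the terminating null
  if (size < needed)
    return needed;                         // buffer too small: report size needed
  std::memcpy(buffer, path, needed);
  return needed - 1;                       // characters copied, excluding the null
}

// Fill a local array first, then hand the result to the string class.
std::string read_directory()
{
  char buff[260];                          // 260 is the value of MAX_PATH
  unsigned len = fake_get_directory(buff, sizeof(buff));
  if (len == 0 || len >= sizeof(buff))
    return std::string();                  // failure, or the buffer was too small
  return std::string(buff, len);           // the string class copies the data
}
```

In real C++Builder code you would call GetWindowsDirectory() in place of the stand-in and assign buff to a String as shown above. The point of the sketch is that the string class never has its internal buffer written to, and that an undersized buffer is detected instead of silently used.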

 

Conclusion

The AnsiString class is a powerful way to manipulate string data. Its c_str() function is a necessary and valuable tool, but must be used with care. Understanding how to properly use c_str() is vital to writing error-free applications.