C++ STRING CLASSES

An exercise in class design

Scott Robert Ladd

Scott is a computer programming addict with 15 years of experience in a wide variety of languages. You can contact him via MCI Mail (369-4376), or at 705 W. Virginia, Gunnison CO 81230. Scott also maintains a BBS system dedicated to computer programming and scientific subjects; its phone number is 303-641-5125 (300/1200/2400 bps, 8 bits, no parity, 1 stop bit).

Object-oriented programming is the art of breaking down a program into its fundamental data types and their associated operations. A class defines a specific data type and encapsulates both the data definition and the operations for objects of that type. The fundamental problem facing the beginning C++ programmer is learning how to create classes. Almost a language unto itself, class definition in C++ requires an understanding of the fundamental philosophies behind object-oriented programming. Once you've grasped the process of class definition, you are well on the road to becoming an object-oriented programmer.

Object-oriented programming is not a task to be undertaken lightly; it requires forethought and planning. Therefore, the first action to be undertaken when designing a class is to determine exactly what its purpose is and how it will be used. In these days of extensive programming environments and high-pressure schedules, it might seem unrealistic to expect programmers to spend time designing their programs before writing them. In the case of developing classes, however, planning is of the utmost importance. The programmer must have a clear idea of the goal of a class before writing the first line of code.

A class in C++ is a definition of a new data type. For all intents and purposes, this new data type is an analog to those which C++ predefines. Similar to the int, float, and char types, the types defined by classes have their own built-in rules, operations, and attributes. This is known as "encapsulation," where form and function are linked. Encapsulation defines and controls exactly what a type can and cannot do.

Furthermore, a class also offers the capability to do "data abstraction." When we use a predefined type, such as a float, we are not concerned with exactly how floating-point numbers are stored and manipulated. The actual implementation of the float type will vary from architecture to architecture. If we had to rewrite our floating-point code every time we ported it between computers, there would be very little porting of code going on. Instead, floating-point values are abstracted. We know that we can do assignments and mathematical operations on them, and we write our programs using the high-level definition of these activities. We don't need to understand the complexities of binary multiplication to multiply floats; we assume the compiler understands this already, and will do it for us automatically. Classes give us the capability to "hide" the actual implementation of our own data types so that the users of those classes can make assumptions about how they work. Data abstraction frees the programmer from becoming involved in the details.

A String Class

The first project many budding C++ programmers undertake is the development of a character string class. One of C's primary faults is that it lacks the sophisticated string handling available in languages such as Pascal and Basic. Strings are quite useful: Nearly every program manipulates text data of one type or another. A string class was one of my first projects, and during the two years of its existence the class has undergone substantial changes. As my understanding of C++ has grown, so has my ability to build a better class. I am quite happy with the current incarnation, which works well in applications ranging from data bases to text editors. The entire class is shown in Listings One, string.hpp, (page 68) and Two, string.cpp (page 68). Listing Three, strtst.cpp, (page 69) shows a program to exercise the String class.

My goal was to create a dynamically allocated string class that would provide all the functionality of standard, NULL-terminated C character arrays (which I call "C-strings"). However, I wanted to avoid the pitfalls of C-strings; for example, errors often occur when working with C-strings because they fail to do any sort of range or validity checking. In addition, the start library functions defined in string.h are missing important features. In order for strings to be useful in a wide variety of applications, they needed manipulation routines not normally found in C function libraries, such as those for inserting and deleting data.

The private Section

Listing One shows the file string.hpp, which contains the definition of the String class. A String is defined as having three private instance variables: Siz, Len, and Txt. Siz contains the currently allocated length of the Txt pointer; Len holds the actual number of characters stored in String. The char pointer Txt points to the location on the heap of the buffer containing the String's text data. Every instance of String will have its own, unique variables with these names.

AllocIncr is not an instance variable; rather, it is a private "static class member." Any class data item defined as static has only one occurrence, shared by the entire class. In this case, there is only one copy of AllocIncr, which is common to all String objects. The purpose of private static class members is to eliminate global variables; they should be used whenever there is a data item that is accessed only from within a class scope. There's no need for anything external to the String class to "see" AllocIncr, so it is safely locked away from outside manipulation.

The public Section

In general, class methods are made public to facilitate their use by user-defined objects. In the case the String class, however, the method Shrink is used only internally by the class. Shrink adjusts the buffer space allocation for a string in order to eliminate wasted space. It is not meant to be called from outside the class scope, and thus it is declared in the private section of the String class definition.

The public section of the String class begins by defining two enumerated types: StrCompVal and StrCompMode. StrCompVal is the return value of the Compare method; StrCompMode is used to indicate whether or not String comparisons are case-sensitive. I use enumerations for these types so that I can control the validity of values being passed to and returned from methods. The types must be public so that the enumeration constants are available to the user of the class.

The remainder of the public section defines all of the other methods associated with the String class. The first four of these are constructors, which are specialized methods used to initialize (that is, construct) new String objects. Objects tend to be complex, and constructors allow the programmer complete control over how the instance variables of an object are loaded with values when an object is instantiated (comes into scope).

The constructor String() is used to create an empty, uninitialized string. String(String & Str) is known as the "copy constructor," it copies one String object into another. String(char * Cstr) allows a newly created String to be initialized with the value of a C-string. The last constructor, String(charFillCh, unsigned int Count), creates a new string, which contains Count FillCh characters.

Copy constructors require a bit of explanation. The C++ compiler will generate a copy constructor for you if you don't design one yourself. The generated copy constructor simply assigns the instance variables of one object to another. This default copy constructor won't work for most classes, including String. Each String contains a pointer to a buffer. When we create a copy of a String, we want it to have its own, unique buffer. The default copy constructor will merely assign the address of the existing buffer to the new Strings Txt instance variable, giving us two objects that are using the same memory space. Therefore, we need to define a copy constructor that duplicates the buffer from the original String for the new String.

The next method, ~String(), is a destructor. Destructors are called whenever an object is deleted or goes out of scope. As with constructors, most complex classes will require an explicitly defined destructor. A default constructor is created, but it merely frees the space being used by the instance variables of an object. This will not work for Strings; the Txt pointer locates spaces allocated on the stack, and this space must be deallocated to avoid wasting memory. The ~String() destructor handles this for us.

Remember that constructors and destructors are called automatically by the compiler. Every time an object is created, a constructor is called for it; every time an object is destroyed, a destructor is called. As we shall see shortly, constructors tend to get called far more often than is immediately apparent, and the programmer must be aware of these hidden method function calls.

The Length() and Size() methods simply return the current length and allocation size, respectively, of a string. Length corresponds to the string.h function strlen(). Size was originally created to help in testing the class; because it is so simple, I just left it in for future use. Another simple method is Empty(), which clears a string to the blank -- or empty -- value, with a length of 0 and an allocation of 8 bytes.

The remaining methods manipulate the value of String. Copy() and Dupe() provide similar functions; Copy() copies the contents of String into a pre-defined C-string, and Dupe() returns a pointer to a C-string (filled with the value of its Txt buffer) it created on the heap. Copy() is useful when an existing C-string needs to be filled with the value of String. Dupe() can be used as the equivalent of the standard function strdup().

Next we come to a series of operator definitions. The first is the operator = method, which handles assignments between Strings. Note that this is not duplicating the function of the copy constructor discussed earlier. The copy constructor is used when a new String is created; the assignment operator is used when the value of an existing String is assigned to another extant String.

You may wonder why I didn't create an assignment operator to copy a C-string to an existing String. The truth is that there is no need for such a method. Earlier, we defined a constructor (String(char * Cstr)) that created a String from a C-string. Whenever a C-string is used in a method invocation that expects a String as a parameter, the C-string is automatically converted via the constructor to a temporary String. Once the temporary String has been used, it is destructed.

This principle carries over to the methods for the two additive operators, + and + =. Rather than define methods for every possible combination of adding String and a C-string, these methods operated entirely on Strings. If a C-string is passed as one of the arguments to these methods, it is automatically converted to String by the String(char * Cstr) constructor.

The advantage of using conversion constructors such as String(char * Cstr) is simplicity. Only one method needs to be defined for each operation, and the constructor can take care of the rest. To add compatibility with another data type (say, an alternative string class), you merely need to create a single conversion constructor. The drawback is overhead: custom-written methods for each combination of values will not incur the overhead of hidden constructor and destructor calls.

The next method declared is Compare(). It compares two strings and returns an enumeration of type StrCompVal, which indicates the relationship between the two values. This works much like the standard function strcmp, with the exception that the Case parameter determines whether the comparison is case-sensitive. It's possible to define a series of operator methods for the <, >, and = = operations as well; I've found that Compare() works well enough for my purposes.

The method Find() duplicates the purpose of the standard function strstr(), by locating a given String within another String. It returns an index indicating where the substring begins. Similar to Compare, the Case parameter defaults to SF_IGNORE to indicate a case-insensitive comparison. The search can be made case sensitive by specifying the Case parameter as SF_SENSITIVE.

The Delete() method removes a specified number of characters from a string. The parameter Pos provides the index of the first character to be deleted, and the Count parameter indicates how many characters should be deleted.

Characters and strings can be inserted into String at any position with the Insert() methods. The first method listed inserts a single character, and the second inserts an entire String. Once again, the conversion constructor allows us to use a C-string in place of the String parameter.

Occasionally, it is necessary to extract a section of a string. The SubStr method accomplishes this by copying Count characters beginning at Pos into another String.

Last but not least, the indexing operator is defined. This allows us to extract single characters at specific positions within String in the same fashion as we would if working with a C-string. If the position given is beyond the last character of String, a NULL character is returned.

The Implementation

With the just mentioned definition and a copy of Listing One (string.hpp), a programmer should be able to use and expand upon String objects without ever seeing the implementation of the String class. There are, however, some interesting facets of method implementation that can be seen by studying Listing Two (string.cpp). It is particularly important to see the way in which objects are returned from functions.

If you examine the methods that return String, you will see what looks like a violation of proper programming practice. For example, SubStr() copies the substring into a local String object -- TempStr. Then, it returns TempStr, a seemingly dangerous act. After all, isn't the destructor for TempStr called when the method is exited, meaning that the recipient of the method's return value will get garbage?

Again, C++ is trickier than it looks. When an object is returned from a function, the copy constructor is called first to copy the return value into its destination, and then the destructor is called. This makes it much easier to write methods that return objects.

Summing Up

As I mentioned earlier, it is possible to improve this class. The most obvious improvement is to add methods for the comparison operators (<, >, and = =). I haven't needed them, but perhaps your application will.

The String class does not contain any inline methods. I tend to be cautious about making methods inline. Remember that under the C++ definition, inline methods can be treated as regular method functions by the compiler. Such as the register keyword, inline methods can be ignored by the compiler. No C++ compiler I know of will actually inline a method that contains complex control statements like loops and switches; these methods are made into function methods, regardless of their declaration as inline. Not all C++ translators can inline methods containing if statements, either.

Inline functions also tend to be abused by programmers who are learning C++. While an inline method is certainly faster than a function method, it can cause severe code-size increases. You need to analyze which methods are being used the most, and determine what the speed versus size trade-offs are. In the String class, the most obvious candidates for inlining are the simple methods like Length().

This class has served me well in a number of complex applications. Work with and modify it; if you come up with interesting alternatives and changes, I'd like to hear about your experiences.

_C++ STRING CLASSES_ by Scott Robert Ladd [LISTING ONE]




//  Header:     String  (Dynamic Strings)
//  Version:    1.01    13-Sep-1989
//  Language:   C++ 2.0
//  Environ:    Any
//  Compilers:  Zortech C++
//  Purpose:    Provides a general dynamic string class.
//  Written by: Scott Robert Ladd
//              705 West Virginia
//              Gunnison CO 81230
//              MCI ID:  srl
//              FidoNet: 1:104/45.2

#if !defined(STRING_HPP)
#define STRING_HPP

#include "stddef.h"

class String
    {
    private:
        // instance variables
        unsigned int Siz;    // allocated size
        unsigned int Len;    // current length
        char * Txt;          // pointer to text
        // class constant
        static unsigned int AllocIncr;
        // private method used to shrink a string to its minimum allocation
        void Shrink();
    public:
        enum StrCompVal  {SC_LESS, SC_EQUAL, SC_GREATER};
        enum StrCompMode {SF_SENSITIVE, SF_IGNORE};
        // constructor
        String();
        String(String & Str);
        String(char * Cstr);
        String(char FillCh, unsigned int Count);
        // destructor
        ~String();
        // value return methods
        unsigned int Length();
        unsigned int Size();
        // Function to return a blank string
        friend String Empty();
        // copy String to c-string method
        void Copy(char * Cstr, unsigned int Max);
        // create a c-string from String method
        char * Dupe();
        // assignment method
        void operator = (String & Str);
        // concatenation methods
        friend String operator + (String Str1, String Str2);
        void operator += (String Str);
        // comparison method
        StrCompVal Compare(String Str, StrCompMode Case = SF_IGNORE);
        // substring search methods
        int Find(String Str, unsigned int & Pos, StrCompMode Case = SF_IGNORE);
        // substring deletion method
        void Delete(unsigned int Pos, unsigned int Count);
        // substring insertion methods
        void Insert(unsigned int Pos, char Ch);
        void Insert(unsigned int Pos, String Str);
        // substring retrieval method
        String SubStr(unsigned int Start, unsigned int Count);
        // character retrieval method
        char operator [] (unsigned int Pos);
        // case-modification methods
        String ToUpper();
        String ToLower();
    };
#endif

[LISTING TWO]



//  Module:     String  (Dynamic Strings)
//  Version:    1.01    13-Sep-1989
//  Language:   C++ 2.0
//  Environ:    Any
//  Compilers:  Zortech C++
//  Purpose:    Provides a general dynamic string class.
//  Written by: Scott Robert Ladd
//              705 West Virginia
//              Gunnison CO 81230
//              MCI ID:  srl
//              FidoNet: 1:104/45.2

#include "String.hpp"
#include "string.h"
#include "stddef.h"
#include "ctype.h"

// class-global constant intialization
unsigned int String::AllocIncr = 8;

// private function to shrink the size of an allocated string
void String::Shrink()
    {
    char * Temp;
    if ((Siz - Len) > AllocIncr)
        {
        Siz  = ((Len + AllocIncr - 1) / AllocIncr) * AllocIncr;
        Temp = new char[Siz];
        memcpy(Temp,Txt,Len);
        delete Txt;
        Txt  = Temp;
        }
    }

// constructor
String::String()
    {
    Len    = 0;
    Siz    = AllocIncr;
    Txt    = new char[Siz];
    Txt[0] = '\x00';
    }
String::String(String & Str)
    {
    Len = Str.Len;
    Siz = Str.Siz;
    Txt = new char[Siz];
    memcpy(Txt,Str.Txt,Len);
    }
String::String(char * Cstr)
    {
    Len = strlen(Cstr);
    Siz  = ((Len + AllocIncr - 1) / AllocIncr) * AllocIncr;
    Txt = new char[Siz];
    memcpy(Txt,Cstr,Len);
    }
String::String(char FillCh, unsigned int Count)
    {
    unsigned int Pos;
    Siz = ((Count + AllocIncr - 1) / AllocIncr) * AllocIncr;
    Len = Siz;
    Txt = new char[Siz];
    memset(Txt,FillCh,Count);
    }

// destructor
String::~String()
    {
    delete Txt;
    }

// value return methods
unsigned int String::Length()
    {
    return Len;
    }
unsigned int String::Size()
    {
    return Siz;
    }

// Function to return a blank string
String Empty()
    {
    static String EmptyStr;
    return EmptyStr;
    }

// copy String to c-string method
void String::Copy(char * Cstr, unsigned int Max)
    {
    unsigned int CopyLen;
    if (Max == 0)
        return;
    if (Len >= Max)
        CopyLen = Max - 1;
    else
        CopyLen = Len;
    memcpy(Cstr,Txt,CopyLen);
    Cstr[CopyLen] = '\x00';
    }

// create a c-string from String method
char * String::Dupe()
    {
    char * new_cstr;
    new_cstr = new char[Len + 1];
    memcpy(new_cstr,Txt,Len);
    new_cstr[Len] = '\x00';
    return new_cstr;
    }

// assignment method
void String::operator = (String & Str)
    {
    Len    = Str.Len;
    Siz    = Str.Siz;
    delete Txt;
    Txt = new char[Siz];
    memcpy(Txt,Str.Txt,Len);
    }

// concatenation methods
String operator + (String Str1, String Str2)
    {
    unsigned int NewLen, NewSiz, CopyLen;
    String TempStr;
    char * Temp;
    TempStr = Str1;
    CopyLen = Str2.Len;
    NewLen  = TempStr.Len + Str2.Len;
    NewSiz  = TempStr.Siz + Str2.Siz;
    Temp = new char[NewSiz];
    memcpy(Temp,TempStr.Txt,TempStr.Len);
    delete TempStr.Txt;
    TempStr.Txt = Temp;
    memcpy(&TempStr.Txt[TempStr.Len],Str2.Txt,CopyLen);
    TempStr.Len = NewLen;
    TempStr.Siz = NewSiz;
    TempStr.Shrink();
    return TempStr;
    }
void String::operator += (String Str)
    {
    unsigned int NewLen, NewSiz, CopyLen;
    char * Temp;
    CopyLen = Str.Len;
    NewLen  = Len + CopyLen;
    NewSiz  = Siz + Str.Siz;
    Temp = new char[NewSiz];
    memcpy(Temp,Txt,Len);
    delete Txt;
    Txt = Temp;
    memcpy(&Txt[Len],Str.Txt,CopyLen);
    Len = NewLen;
    Siz = NewSiz;
    Shrink();
    }

// comparison method
StrCompVal String::Compare(String Str, StrCompMode Case)
    {
    char * Temp1, * Temp2;
    Temp1 = new char[Len + 1];
    Copy(Temp1,Len+1);
    Temp2 = new char[Str.Len + 1];
    Str.Copy(Temp2,Str.Len+1);
    if (Case == SF_IGNORE)
        {
        strupr(Temp1);
        strupr(Temp2);
        }
    switch (strcmp(Temp1,Temp2))
        {
        case -1: return SC_LESS;
        case  0: return SC_EQUAL;
        case  1: return SC_GREATER;
        }
    delete Temp1;
    delete Temp2;
    }

// substring search methods
int String::Find(String Str, unsigned int & Pos, StrCompMode Case)
    {
    char * TempStr1, * TempStr2;
    unsigned int LastPos, SearchLen, TempPos;
    int Found;
    TempStr1 = new char[Len + 1];
    memcpy(TempStr1,Txt,Len);
    TempStr1[Len] = '\x00';
    TempStr2 = new char[Str.Len + 1];
    memcpy(TempStr2,Str.Txt,Str.Len);
    TempStr2[Str.Len] = '\x00';
    if (Case == SF_IGNORE)
        {
        strupr(TempStr1);
        strupr(TempStr2);
        }
    Pos     = 0;
    TempPos = 0;
    Found   = 0;
    SearchLen = Str.Len;
    LastPos   = Len - SearchLen;
    while ((TempPos <= LastPos) && !Found)
        {
        if (0 == strncmp(&TempStr1[TempPos],TempStr2,SearchLen))
            {
            Pos   = TempPos;
            Found = 1;
            }
        else
            ++TempPos;
        }
    delete TempStr1;
    delete TempStr2;
    return Found;
    }

// substring deletion method
void String::Delete(unsigned int Pos, unsigned int Count)
    {
    unsigned int CopyPos;
    if (Pos > Len)
        return;
    CopyPos = Pos + Count;
    if (CopyPos >= Len)
        Txt[Pos] = 0;
    else
        while (CopyPos <= Len)
            {
            Txt[Pos] = Txt[CopyPos];
            ++Pos;
            ++CopyPos;
            }
    Len -= Count;
    Shrink();
    }

// substring insertion methods
void String::Insert(unsigned int Pos, char Ch)
    {
    char * Temp;
    if (Pos > Len)
        return;
    if (Len == Siz)
        {
        Siz += AllocIncr;
        Temp = new char[Siz];
        memcpy(Temp,Txt,Len);
        delete Txt;
        Txt = Temp;
        }
    if (Pos < Len)
        for (unsigned int Col = Len + 1; Col > Pos; --Col)
            Txt[Col] = Txt[Col-1];
    Txt[Pos] = Ch;
    ++Len;
    }
void String::Insert(unsigned int Pos, String Str)
    {
    unsigned int SLen = Str.Len;
    SLen = Str.Len;
    if (SLen > 0)
        for (unsigned int I = 0; I < SLen; ++I)
            {
            Insert(Pos,Str.Txt[I]);
            ++Pos;
            }
    }

// substring retrieval method
String String::SubStr(unsigned int Start, unsigned int Count)
    {
    String TempStr;
    char * Temp;
    if ((Start < Len) && (Count > 0))
        for (unsigned int Pos = 0; Pos < Count; ++Pos)
            {
            if (TempStr.Len == TempStr.Siz)
                {
                TempStr.Siz += AllocIncr;
                Temp = new char[TempStr.Siz];
                memcpy(Temp,TempStr.Txt,Len);
                delete TempStr.Txt;
                TempStr.Txt = Temp;
                }
            TempStr.Txt[Pos] = Txt[Start + Pos];
            ++TempStr.Len;
            }
    return TempStr;
    }

// character retrieval method
char String::operator [] (unsigned int Pos)
    {
    if (Pos >= Len)
        return '\x00';
    return Txt[Pos];
    }

// case-modification methods
String String::ToUpper()
    {
    String TempStr = *this;
    for (unsigned int Pos = 0; Pos < Len; ++Pos)
        TempStr.Txt[Pos] = toupper(TempStr.Txt[Pos]);
    return TempStr;
    }
String String::ToLower()
    {
    String TempStr = *this;
    for (unsigned int Pos = 0; Pos < Len; ++Pos)
        TempStr.Txt[Pos] = tolower(TempStr.Txt[Pos]);
    return TempStr;
    }

[LISTING THREE]



#include "String.hpp"
#include "stdio.h"
#include "stream.hpp"

int main();
void print_string(String S);

String s1;
String s2("This is the second string!");

int main()
    {
    String ls;
    String ls2("Another local string");
    unsigned int pos, i;
    char ch;

    s1 = s2;
    ls = "This is the local string.";
    print_string(s1);
    print_string(s2);
    print_string(ls);
    print_string(ls2);
    cout << "\n";

    s1 = s2 + ls;
    print_string(s1);

    s1 = "String one has a value.";
    print_string(s1);

    s1 = s1 + "****";
    print_string(s1);

    s2 += "*****";
    print_string(s2);
    print_string(ls);

    s2 += ls;
    print_string(s2);

    cout << "\n";

    if (s2.Find("Burfulgunk",pos))
        printf("first search = %d\n",pos);
    if (s2.Find("*****",pos))
        printf("second search = %d\n",pos);
    if (s2.Find(ls,pos))
        printf("third search = %d\n",pos);
    if (s2.Find(s1,pos))
        printf("fourth search = %d\n",pos);
    ls2 = "&&";

    s1.Insert(10,'*');
    s1.Insert(15,ls2);
    print_string(s1);

    s1.Insert(s1.Length(),'%');
    s1.Insert(s1.Length(),'%');
    s1.Insert(s1.Length(),'%');
    s1.Insert(s1.Length(),'%');
    print_string(s1);

    for (i = 0; 0 != (ch = s1[i]); ++i)
        putchar(ch);
    putchar('\n');

    s1.Insert(2,"<><><><><>");
    print_string(s1);

    s1.Delete(2,10);
    print_string(s1);

    s2 = s1.ToUpper();
    print_string(s2);

    s2 = s1.ToLower();
    print_string(s2);

    s1 = Empty();
    print_string(s1);

    s1 = s2.SubStr(2,10);
    print_string(s1);

    return 0;
    }
void print_string(String S)
    {
    char * cs;
    cs = S.Dupe();
    cout << cs << " Len = " << S.Length() << " Siz = " << S.Size() << "\n";
    delete cs;
    }