Whenever a data message is transmitted it can be corrupted in ways that cannot be predicted. Even data sitting on a disk can be corrupted by hardware failures, software bugs, or malicious activity.
A key requirement of networking systems and file transfer programs is to ensure that data is transferred reliably. When a message is received, both its length and its contents must exactly match what was transmitted. If a message is modified during transmission, the system asks the sender to retransmit the corrupted message.
The first problem in handling data corruption is determining whether the data has been modified. An early method used in serial communications was to check the parity of the data bytes. Since ASCII characters require only 7 bits, one bit is left over in each byte, and that bit can be used to ensure that each byte has either odd or even parity. In other words, the sender would set or clear the last bit so that each data byte had either an odd or an even number of bits set, depending upon what the receiver was expecting.
The problem with this system is that it only works reliably with errors of one bit. If more than one bit gets corrupted, there is only a 50/50 chance of detecting errors.
Another simple method for error detection is to include a checksum value within each message. The sender calculates the checksum for the data before it is transmitted. The receiver also calculates the checksum on the data and, if the checksum values do not match, the receiver knows that the data has been corrupted in some way.
A simple checksum function is to take the sum of each byte in the message. Take a message consisting of the characters “ABCDEF”, for example. Before transmitting the message, an application can sum the bytes and append the result (405) to the message.
While summing the data bytes allows errors to be detected in some situations, it is possible for many types of errors to slip by. For example, a message consisting of the characters “ACBDEF” differs from the previous string by only two bits, but the sum of the data bytes is the same. The data is obviously incorrect but a checksum won’t catch the error.
What is really needed for reliable error detection is a function that causes small changes in the input to produce large changes in the check value. In other words, we need a function that essentially produces a random value from a data block.
The most common method for generating a check value is known as the Cyclic Redundancy Check, or CRC. A CRC function works by treating a data block as one large binary number and dividing it by a constant value. The CRC result is the division remainder.
In reality, the process of generating a CRC is not true division, but rather something that looks a lot like division. CRC calculations use binary arithmetic with no carries or borrows. Bit positions are XORed rather than subtracted.
Figure A shows the process for generating the CRC value for the message “AB” (0100 0001 0100 0010 in binary) using the divisor 1 10001. Notice in Figure A how XOR operations are used in place of the subtractions of real division. Also notice that the number of bits in the CRC value is one less than the number of bits in the divisor.
CRC is simple to implement in hardware but if the CRC were implemented in software by performing bit-by-bit pseudo-long division, it would be relatively computationally intensive. Fortunately long division is not required in a software implementation.
If you look at the individual XOR operations in Figure A you can see that at each step one data bit is shifted into the remainder, which is then XORed with the divisor if its high bit is set. The individual steps are trivial.
Software CRC implementations are invariably implemented with a lookup table, allowing byte-by-byte processing rather than by bit. The application maintains a register containing the CRC value and updates it as each byte is processed.
The following code fragment shows how the CRC for a buffer is calculated using a lookup table. BITCOUNT is the size of the CRC value in bits. I will explain how the lookup table is generated later.
for (unsigned int ii = 0; ii < length; ++ii)
{
    int index = ((crc_register >> (BITCOUNT - 8)) ^ buffer[ii]);
    crc_register = crc_table[index & 0xFF] ^ (crc_register << 8);
}
In Figure A I made an arbitrary choice for the divisor in the CRC calculation. There are a number of CRC variants in use and one of the ways in which they vary is in the choice of the divisor. The choice of non-zero bits in the divisor determines the types of errors the CRC variant can detect.
Descriptions of the CRC frequently describe it in terms of polynomial division. Instead of representing the CRC divisor as 1 1000 0000 0000 0101, it is represented as x^16 + x^15 + x^2 + 1. This is the reason you will usually see the divisor referred to as the CRC polynomial.
Since the highest-order bit is always 1, when the polynomial is shown in hexadecimal that bit is invariably left out. For example, the CRC polynomial shown above would be listed as the 2-byte value 0x8005 rather than 0x18005.
Most CRC functions produce either 16- or 32-bit values (17- or 33-bit polynomials). There are some 12-bit CRC functions, but these are not very common. The number of bits in the divisor limits the size of the data block in which errors can be reliably detected; divisors with more bits can reliably detect errors in larger data blocks.
Besides the polynomial length and polynomial value, there are three other common variations among CRC functions.
From an implementation point of view, the most significant difference among CRC variants is the ordering of the bits in the data block. When performing division, you start with the most significant bit and work toward the least significant.
However, when data is transmitted, such as over a serial line, the least significant bit within a byte often comes first in the data stream. In a hardware implementation, it is easier to process the bits as they arrive.
As a result, there are CRC variants that process data bits both in the natural order and reversed. Since the CRC uses something that looks like division, but is not actually division, the order in which the bits are processed is irrelevant (as long as they are processed consistently in the same order).
This example shows the process of using a table lookup to calculate a reversed CRC. The two versions are almost identical except for the way the CRC register is shifted.

for (unsigned int ii = 0; ii < length; ++ii)
{
    crc_register = crc_table.values[(crc_register ^ buffer[ii]) & 0xFF] ^
                   (crc_register >> 8);
}
Refer back to the CRC process shown in Figure A. Suppose the message were corrupted with zero bits inserted at the start of the message. Since zero divided by the polynomial is zero, these extra bits do not change the CRC value. In order to detect this type of error, some CRC variants initialize the CRC register to a value other than zero.
Some CRC variants also modify the final remainder by taking its complement (XOR with –1). The variants that modify the remainder invariably initialize the CRC register to a nonzero value; the converse is not true, however.
In the previous sections I have identified the five ways common CRC variants differ: polynomial length, polynomial value, reversed or forward, initialization, and finalization. Table A shows how the major CRC variants fit into these classifications.
Table A: CRC Variants
| Name | Polynomial Length | Polynomial | Reversed | Initialization | Final XOR |
| CRC-16 | 16 | 0x8005 | Yes | 0 | 0 |
| X.25 | 16 | 0x1021 | Yes | -1 | -1 |
| CCITT | 16 | 0x1021 | No | -1 | 0 |
| XMODEM | 16 | 0x1021 | No | 0 | 0 |
| CRC-32 | 32 | 0x04C11DB7 | Yes | -1 | -1 |
C++ templates provide a good mechanism for dealing with parameterized variants, such as the ones shown in Table A. Here we have a template class whose parameters are modeled on Table A.
template <unsigned int BITCOUNT,
          unsigned long POLYNOMIAL,
          bool REVERSE,
          unsigned long INITIAL,
          unsigned long FINALMASK>
class Crc {
public:
    Crc();
    Crc(const Crc &);
    ~Crc() {}
    Crc &operator=(const Crc &);
    unsigned long value() const;
    void reset();
    void update(const char *buffer,
                unsigned int length);
private:
    struct CrcTable {
        enum { MAXBYTEVALUES = 256 };
        CrcTable();
        unsigned long values[MAXBYTEVALUES];
    };
    static const CrcTable crc_table;
    unsigned long crc_register;
};
The following function creates the value for an entry in the lookup table for the forward CRC process. (It relies on the bits[] array, defined below, and on a BITSPERBYTE constant equal to 8.) You can see that this function divides the input value using the process shown in Figure A. Here is the function:
unsigned long ForwardTableEntry(
    unsigned long polynomial,
    unsigned int entryindex,
    unsigned int bitcount)
{
    unsigned long result =
        entryindex << (bitcount - 8);
    for (unsigned int ii = 0;
         ii < BITSPERBYTE; ++ii) {
        if ((result & bits[bitcount - 1]) == 0)
            result <<= 1;
        else
            result = (result << 1) ^ polynomial;
    }
    unsigned long mask =
        ((1UL << (bitcount - 1)) - 1UL) |
        (1UL << (bitcount - 1));
    result &= mask;
    return result;
}
Entries for reversed CRC lookup tables are created in a similar manner, except with bit reversals.
const unsigned long bits [32] = {
0x00000001UL, 0x00000002UL, 0x00000004UL,
0x00000008UL, 0x00000010UL, 0x00000020UL,
0x00000040UL, 0x00000080UL, 0x00000100UL,
0x00000200UL, 0x00000400UL, 0x00000800UL,
0x00001000UL, 0x00002000UL, 0x00004000UL,
0x00008000UL, 0x00010000UL, 0x00020000UL,
0x00040000UL, 0x00080000UL, 0x00100000UL,
0x00200000UL, 0x00400000UL, 0x00800000UL,
0x01000000UL, 0x02000000UL, 0x04000000UL,
0x08000000UL, 0x10000000UL, 0x20000000UL,
0x40000000UL, 0x80000000UL};
unsigned long Reverse(
unsigned long value,
unsigned int bitcount)
{
unsigned long result = 0;
for (unsigned int jj=0;jj<bitcount;++jj)
{
if ((value & bits [jj]) != 0)
result |= bits [bitcount - jj - 1];
}
return result;
}
unsigned long ReverseTableEntry(
    unsigned long polynomial,
    unsigned int entryindex,
    unsigned int bitcount)
{
    unsigned long result = entryindex;
    // Reverse the polynomial once rather
    // than on every loop iteration.
    unsigned long rpoly =
        Reverse(polynomial, bitcount);
    for (unsigned int ii = 0; ii < 8; ++ii)
    {
        if ((result & 1) == 0)
            result >>= 1;
        else
            result = (result >> 1) ^ rpoly;
    }
    unsigned long mask =
        ((1UL << (bitcount - 1)) - 1UL) |
        (1UL << (bitcount - 1));
    result &= mask;
    return result;
}
Here is the class constructor and definition for the lookup table:
template <unsigned int BITCOUNT,
          unsigned long POLYNOMIAL,
          bool REVERSE,
          unsigned long INITIAL,
          unsigned long FINALMASK>
const typename Crc<BITCOUNT, POLYNOMIAL, REVERSE,
                   INITIAL, FINALMASK>::CrcTable
    Crc<BITCOUNT, POLYNOMIAL, REVERSE,
        INITIAL, FINALMASK>::crc_table;

template <unsigned int BITCOUNT,
          unsigned long POLYNOMIAL,
          bool REVERSE,
          unsigned long INITIAL,
          unsigned long FINALMASK>
Crc<BITCOUNT,POLYNOMIAL, REVERSE,
INITIAL,FINALMASK>::CrcTable::CrcTable()
{
if (REVERSE) {
for(unsigned int ii=0;
ii<MAXBYTEVALUES; ++ii) {
values [ii] =
ReverseTableEntry(
POLYNOMIAL, ii, BITCOUNT);
}
} else {
for(unsigned int ii=0;
ii<MAXBYTEVALUES; ++ii) {
values [ii] = ForwardTableEntry(
POLYNOMIAL, ii, BITCOUNT);
}
}
}
The lookup table is a static member object so that all equivalent instantiations share the same table. The table defines a constructor that initializes it at startup.
The reset() function initializes the CRC register using the value specified in the template parameter. Call this function when you want to reuse a single CRC object to calculate CRC values for more than one independent message.
template < unsigned int BITCOUNT, unsigned long POLYNOMIAL, bool REVERSE, unsigned long INITIAL, unsigned long FINALMASK>
void Crc<BITCOUNT,POLYNOMIAL,
REVERSE,INITIAL,FINALMASK>::reset()
{
crc_register = INITIAL;
return;
}
The update() function updates the CRC register using a data block:
template <
unsigned int BITCOUNT,
unsigned long POLYNOMIAL,
bool REVERSE,
unsigned long INITIAL,
unsigned long FINALMASK>
void Crc<BITCOUNT,POLYNOMIAL,
REVERSE,INITIAL,FINALMASK>
::update(const char *buffer,
unsigned int length)
{
// The process for updating depends upon
// whether or not we are using the reversed
// CRC form.
if (REVERSE)
{
for (unsigned int ii=0;ii<length;++ii){
crc_register = crc_table.values
[(crc_register ^ buffer[ii]) & 0xFF]
^ (crc_register >> 8);
}
} else {
for (unsigned int ii=0;ii<length;++ii) {
unsigned long index = (
(crc_register >> (BITCOUNT -
BITSPERBYTE)) ^ buffer[ii]);
crc_register =
crc_table.values[index & 0xFF] ^
(crc_register << BITSPERBYTE);
}
}
return;
}
The value() function returns the contents of the CRC register XORed with the
FINALMASK template parameter:
template <unsigned int BITCOUNT,
unsigned long POLYNOMIAL,
bool REVERSE,
unsigned long INITIAL,
unsigned long FINALMASK>
unsigned long Crc<BITCOUNT,POLYNOMIAL,
REVERSE,INITIAL,FINALMASK>::value() const
{
unsigned long result =
crc_register ^ FINALMASK;
    // The mask is (1UL << BITCOUNT) - 1,
    // built in two steps to avoid shifting
    // by the full width of unsigned long.
static const unsigned long mask =
((1UL << (BITCOUNT - 1)) - 1UL) |
(1UL << (BITCOUNT - 1));
result &= mask;
return result;
}
The example program for this article includes the copy constructor, default constructor, and assignment operator. You can download the example from our Web site at www.bridgespublishing.com.
You need to create a header file and a compilation unit for each CRC class you want to instantiate from the template. To create a CRC-32 class you would create the header file CRC32.H, which defines a type using the values from Table A as template parameters:
#include "crc.h"

typedef Crc<32, 0x04C11DB7, true,
            0xFFFFFFFF, 0xFFFFFFFF> Crc32;
To create the compilation unit CRC32.CPP, all you need are include directives.
#include "crc32.h"
#include "crc.cpp"
When you compile the CRC32.CPP file you must use the -Ja switch, either on the command line or with
#pragma option -Ja
This ensures that the template members get created properly.
The following console application example shows how to use the Crc32 class to calculate the CRC-32 value for the string “abcdefgh12345678.”
#include <iostream>
#include <cstring>
using namespace std;
#include "crc32.h"
const char *msg1 = "abcdefgh";
const char *msg2 = "12345678";
int main() {
Crc32 crc;
crc.update(msg1, strlen(msg1));
crc.update(msg2, strlen(msg2));
cout << hex << crc.value() << endl;
return 0;
}
Don’t be concerned about the efficiency of the tests for the reversed CRC in the update function. Since the value being tested is a compile-time constant, any modern compiler will optimize the tests away.
The CRC is an effective and efficient method for detecting corruption in data. A template implementation allows many CRC variants to be implemented with one set of code. The template CRC class shown here allows you to incorporate data verification into your applications easily.