Applications that exchange data across multiple systems must deal with differences in integer representation. The function WriteIntegers() reads three integers from cin and writes them in binary format to the file DATA.DAT. The function ReadIntegers() reads the file and prints the three integers to cout. Both functions are shown in Listing A.
Listing A: WriteIntegers() and ReadIntegers()
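#include <fstream>
#include <iostream>
using namespace std ;

// Assumption: BYTE4 is a 4-byte signed integer type. Its definition is not
// shown in the article; int is 4 bytes on most current compilers.
typedef int BYTE4 ;

// Reads three integers from cin and writes their raw bytes to DATA.DAT.
// The bytes are written in whatever order the host processor uses.
void WriteIntegers ()
{
    BYTE4 a, b, c ;
    cin >> a >> b >> c ;
    ofstream strm ("DATA.DAT", ios::binary) ;
    strm.write ((char *) &a, sizeof (a)) ;
    strm.write ((char *) &b, sizeof (b)) ;
    strm.write ((char *) &c, sizeof (c)) ;
    return ;
}

// Reads the raw bytes back from DATA.DAT and prints the three integers.
void ReadIntegers ()
{
    BYTE4 a, b, c ;
    ifstream strm ("DATA.DAT", ios::binary) ;
    strm.read ((char *) &a, sizeof (a)) ;
    strm.read ((char *) &b, sizeof (b)) ;
    strm.read ((char *) &c, sizeof (c)) ;
    cout << "a = " << a
         << " b = " << b
         << " c = " << c << endl ;
    return ;
}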
If you execute WriteIntegers() followed by ReadIntegers() on a PC, everything
will work as expected. On the other hand, if you were to run WriteIntegers() on
another system, such as a SPARC or a 680x0-based Mac, and then transfer the file
to a PC and run ReadIntegers(), the output values wouldn't be the same as those
written to the file.
The problem is that two different formats are used to represent multi-byte integers. The Intel 80x86 family, Alpha, and some MIPS processors store the least significant byte first in an integer variable. This format is known as Little Endian. Not surprisingly, systems that store the most significant byte of an integer first are known as Big Endian. The SPARC, 680x0, and some MIPS chips are Big Endian. Table A shows how the value 0x12345678 is stored on both types of systems.
Table A: Two different storage formats
Address offset  +0 +1 +2 +3
Big Endian      |12|34|56|78|
Little Endian   |78|56|34|12|
When you create applications that transmit binary data over a network or exchange it in a file, you need to take the integer format into account. For example, graphics formats, such as GIF (Little Endian) and JPEG (Big Endian), specify the byte order to be used.
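To check how a particular machine lays out 0x12345678, you can copy the value's storage into a byte array and print the bytes in increasing address order. The following is only a minimal sketch: the helper name BytesInMemory() is hypothetical, and it assumes a compiler that provides std::uint32_t and std::snprintf (C++11 or later).

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the original listings): returns the bytes
// of a 32-bit value as hex text, in the order they are stored in memory,
// lowest address first.
std::string BytesInMemory (std::uint32_t value)
{
    unsigned char bytes [sizeof (value)] ;
    std::memcpy (bytes, &value, sizeof (value)) ;   // copy out the raw storage

    std::string text ;
    for (std::size_t i = 0 ; i < sizeof (value) ; ++ i)
    {
        char buffer [4] ;
        std::snprintf (buffer, sizeof (buffer), "%02X", bytes [i]) ;
        if (i != 0)
            text += ' ' ;
        text += buffer ;
    }
    return text ;
}
```

On a Little Endian host, BytesInMemory (0x12345678) should return "78 56 34 12"; on a Big Endian host it should return "12 34 56 78".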
In a Little Endian system, a small value in an int can be correctly accessed with an overlaid char. The function WhatType() uses this relationship to display the type of integer format used by a processor:
void WhatType()
{
    int x = 1 ;
    // On a Little Endian system the lowest-addressed byte holds the 1.
    if (*(char *) &x == 1)
        cout << "Little Endian" << endl ;
    else
        cout << "Big Endian" << endl ;
    return ;
}
Converting between Little Endian and Big Endian format is simply a matter of
swapping bytes, and the conversion is the same in both directions.
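Because the swap is its own inverse, one routine can serve for both directions. The sketch below illustrates this with a hypothetical unconditional helper, SwapBytes4(), which assumes std::uint32_t is available; it is an illustration, not code from the original article.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 32-bit value.
// Byte 0 trades places with byte 3, and byte 1 with byte 2.
inline std::uint32_t SwapBytes4 (std::uint32_t input)
{
    return ((input & 0x000000FFu) << 24)
         | ((input & 0x0000FF00u) <<  8)
         | ((input & 0x00FF0000u) >>  8)
         | ((input & 0xFF000000u) >> 24) ;
}
```

SwapBytes4 (0x12345678) gives 0x78563412, and applying the function a second time restores 0x12345678. It also accounts for the wrong values in the earlier file example: the integer 1 written by a Big Endian machine is read on a PC as SwapBytes4 (1), which is 0x01000000, or 16777216.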
The following example shows how an Endian conversion function could be
implemented for multiple systems. It would be appropriate in applications that
exchange data in Little Endian format. On a Big Endian system, the function
EndianConversion() converts its parameter value to Little Endian format. On a
Little Endian system, this function simply returns its input value. The macro
BIGENDIAN would be defined on the command line:
// UBYTE4 is assumed to hold a 4-byte value; unsigned long is wider than
// 4 bytes on some 64-bit compilers, so check the size on your system.
typedef unsigned long UBYTE4 ;
#if defined (BIGENDIAN)
inline UBYTE4 EndianConversion (UBYTE4 input)
{
    // Reverse the four bytes: the outer bytes move 24 bits and the
    // inner bytes move 8 bits.
    UBYTE4 result =
          ((input & 0x000000FFL) << 24)
        | ((input & 0x0000FF00L) << 8)
        | ((input & 0x00FF0000L) >> 8)
        | ((input & 0xFF000000L) >> 24) ;
    return result ;
}
#else
inline UBYTE4 EndianConversion (UBYTE4 input)
{
    return input ;
}
#endif
If you've ever done any Internet socket programming, you've probably used the
functions htonl() and ntohl(). Internet protocols use Big Endian format to
represent integers. These functions return their argument on Big Endian systems
and do a byte swap on Little Endian systems.
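As a rough illustration of how these two functions are typically used, the sketch below packs a 32-bit value into a byte buffer in network order and unpacks it again. It assumes a POSIX system, where htonl() and ntohl() are declared in <arpa/inet.h> (other platforms use a different header). The helper names PutNetworkOrder() and GetNetworkOrder() are hypothetical.

```cpp
#include <arpa/inet.h>   // htonl(), ntohl() on POSIX systems (assumption)
#include <cstdint>
#include <cstring>

// Hypothetical helper: stores value in buffer in network (Big Endian) order.
void PutNetworkOrder (std::uint32_t value, unsigned char buffer [4])
{
    std::uint32_t wire = htonl (value) ;          // host order -> network order
    std::memcpy (buffer, &wire, sizeof (wire)) ;
}

// Hypothetical helper: reads a network-order value back into host order.
std::uint32_t GetNetworkOrder (const unsigned char buffer [4])
{
    std::uint32_t wire ;
    std::memcpy (&wire, buffer, sizeof (wire)) ;
    return ntohl (wire) ;                         // network order -> host order
}
```

After PutNetworkOrder (0x12345678, buffer), the buffer holds the bytes 12 34 56 78 on either kind of host, and GetNetworkOrder (buffer) returns the original value.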
When you need to exchange integers among different systems, you need to pick an integer format to use. One format is just as good as another. You simply need to pick one and be consistent. Use functions to convert between the exchange format and the system format.