June 1999

Handling differences in integer representation

by John M. Miano

Applications that exchange binary data between different systems must deal with differences in integer representation. To see the problem, consider two functions: WriteIntegers() reads three integers from cin and writes them in binary format to the file DATA.DAT, and ReadIntegers() reads the file back and prints the three integers to cout. Both functions are shown in Listing A.

Listing A: WriteIntegers() and ReadIntegers()

#include <fstream>
#include <iostream>
using namespace std ;

// Assumed here: long is 4 bytes on the target compilers,
// matching the UBYTE4 typedef used later in the article.
typedef long BYTE4 ;

void WriteIntegers ()
{
	BYTE4 a, b, c ;
	cin >> a >> b >> c ;
	ofstream strm ("DATA.DAT", ios::binary) ;
	// Each integer is written exactly as it is laid out in memory.
	strm.write ((char *) &a, sizeof (a)) ;
	strm.write ((char *) &b, sizeof (b)) ;
	strm.write ((char *) &c, sizeof (c)) ;
	return ;
}

void ReadIntegers ()
{
	BYTE4 a, b, c ;
	ifstream strm ("DATA.DAT", ios::binary) ;
	// The bytes are copied back into memory with no conversion.
	strm.read ((char *) &a, sizeof (a)) ;
	strm.read ((char *) &b, sizeof (b)) ;
	strm.read ((char *) &c, sizeof (c)) ;

	cout << "a = " << a
		<< " b = " << b
		<< " c = " << c << endl ;
	return ;
}

If you execute WriteIntegers() followed by ReadIntegers() on a PC, everything will work as expected. On the other hand, if you were to run WriteIntegers() on another system, such as a SPARC or a Mac, and then transfer the file to a PC and run ReadIntegers(), the output values wouldn't be the same as those written to the file.

The problem is that two different formats are used to represent multi-byte integers. The Intel 80x86 family, Alpha, and some MIPS processors store the least significant byte first in an integer variable. This format is known as Little Endian. Not surprisingly, systems that store the most significant byte of an integer first are known as Big Endian. The SPARC, 680x0, and some MIPS chips are Big Endian. Table A shows how the value 0x12345678 is stored on both types of systems.

Table A: Two different storage formats

Lowest address		Highest address
|12|34|56|78|	Big Endian
|78|56|34|12|	Little Endian

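The layout in Table A can be observed directly by copying an integer's bytes out of memory. The following is a minimal sketch, assuming a C++11 compiler that provides std::uint32_t; the function name DumpBytes() is illustrative and does not appear in the original listings.

```cpp
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Returns the bytes of value in memory order (lowest address first)
// as a hex string with one space between bytes.
// DumpBytes() is a hypothetical helper written for this illustration.
std::string DumpBytes(std::uint32_t value)
{
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // copy the in-memory representation

    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (i != 0)
            out << ' ';
        out << std::setw(2) << static_cast<unsigned>(bytes[i]);
    }
    return out.str();
}
```

Calling DumpBytes(0x12345678) should return "78 56 34 12" on a Little Endian system and "12 34 56 78" on a Big Endian system.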
When you create applications that transmit binary data over a network or exchange it in a file, you need to take the integer format into account. For example, graphics formats, such as GIF (Little Endian) and JPEG (Big Endian), specify the byte order to be used.
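One way to honor a byte order fixed by a file format, whatever the host processor, is to assemble each value from individual bytes instead of copying it into an integer variable. The sketch below assumes a C++11 compiler with std::uint16_t; the function names are illustrative only.

```cpp
#include <cstdint>

// Reads a 16-bit unsigned value stored least significant byte first,
// the order used for two-byte fields in a Little Endian format.
inline std::uint16_t ReadLittleEndian16(const unsigned char *p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Reads a 16-bit unsigned value stored most significant byte first,
// the order used for two-byte fields in a Big Endian format.
inline std::uint16_t ReadBigEndian16(const unsigned char *p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

Given the two bytes 0x40 0x01, ReadLittleEndian16() yields 0x0140 (320) and ReadBigEndian16() yields 0x4001 (16385) on any processor, because the shifts operate on values and not on memory layout.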

On a Little Endian system, the least significant byte of an int is stored at the int's own address, so a small value in an int can be read correctly through a char overlaid on it. The function WhatType() uses this relationship to display the type of integer format used by a processor:

void WhatType()
{
	int x = 1 ;
	if (*(char *) &x == 1)
		cout << "Little Endian" << endl ;
	else
		cout << "Big Endian" << endl ;
	return ;
}

Converting between Little Endian and Big Endian format is simply a matter of reversing the order of the bytes, and the conversion is the same in both directions. The following example shows how an Endian conversion function could be implemented for multiple systems. It would be appropriate in applications that exchange data in Little Endian format. On a Big Endian system, the function EndianConversion() converts its parameter value to Little Endian format. On a Little Endian system, this function simply returns its input value. The macro BIGENDIAN would be defined on the command line:

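typedef unsigned long UBYTE4 ;
#if defined (BIGENDIAN)
// Big Endian host: reverse the four bytes to produce Little Endian order.
inline UBYTE4 EndianConversion (UBYTE4 input)
{
	UBYTE4 result =
		  ((input & 0x000000FFUL) << 24)	// byte 0 moves to byte 3
		| ((input & 0x0000FF00UL) << 8)	// byte 1 moves to byte 2
		| ((input & 0x00FF0000UL) >> 8)	// byte 2 moves to byte 1
		| ((input & 0xFF000000UL) >> 24) ;	// byte 3 moves to byte 0
	return result ;
}
#else
// Little Endian host: the value is already in the exchange format.
inline UBYTE4 EndianConversion (UBYTE4 input)
{
	return input ;
}
#endif
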
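The typedef above relies on unsigned long being four bytes wide, which does not hold on many 64-bit compilers. As a minimal sketch, assuming a C++11 compiler that provides std::uint32_t, an unconditional four-byte swap can be written and checked as follows. The names SwapBytes4() and CheckSwapBytes4() are illustrative and are not part of the original listing.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the order of the four bytes in input.
// Hypothetical helper; uses a fixed-width type so the result does not
// depend on the size of unsigned long.
inline std::uint32_t SwapBytes4(std::uint32_t input)
{
    return ((input & 0x000000FFu) << 24)
         | ((input & 0x0000FF00u) << 8)
         | ((input & 0x00FF0000u) >> 8)
         | ((input & 0xFF000000u) >> 24);
}

// Checks the two properties described in the text: the bytes are
// reversed, and the conversion is the same in both directions.
inline void CheckSwapBytes4()
{
    assert(SwapBytes4(0x12345678u) == 0x78563412u);
    assert(SwapBytes4(SwapBytes4(0x12345678u)) == 0x12345678u);
}
```

Because swapping twice restores the original value, a single function can serve for both reading and writing the exchange format.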
If you've ever done any Internet socket programming, you've probably used the functions htonl() and ntohl(). Internet protocols use Big Endian format to represent integers. These functions return their argument on Big Endian systems and do a byte swap on Little Endian systems.
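As an illustration, the sketch below uses htonl() and ntohl() to store and load a 32-bit value in network byte order. It assumes a POSIX system, where these functions are declared in <arpa/inet.h> (Windows declares them in its socket headers instead), and a C++11 compiler. The wrapper names StoreNetworkOrder() and LoadNetworkOrder() are hypothetical.

```cpp
#include <arpa/inet.h>  // htonl(), ntohl() on POSIX systems (assumed platform)
#include <cstdint>
#include <cstring>

// Copies value into out[0..3] in network (Big Endian) byte order.
inline void StoreNetworkOrder(std::uint32_t value, unsigned char *out)
{
    std::uint32_t wire = htonl(value);  // byte swap only on Little Endian hosts
    std::memcpy(out, &wire, sizeof wire);
}

// Reads four bytes in network byte order and returns the host value.
inline std::uint32_t LoadNetworkOrder(const unsigned char *in)
{
    std::uint32_t wire;
    std::memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

After StoreNetworkOrder(0x12345678, buffer), the buffer holds the bytes 12 34 56 78 on either kind of system, and LoadNetworkOrder(buffer) returns 0x12345678.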

When you need to exchange integers among different systems, you need to pick an integer format to use. One format is just as good as another. You simply need to pick one and be consistent. Use functions to convert between the exchange format and the system format.
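To make the advice concrete, here is one possible pair of conversion functions that always uses Little Endian as the exchange format. This is a sketch under stated assumptions, not the article's own implementation: it assumes a C++11 compiler with std::uint32_t, and the names WriteExchange4() and ReadExchange4() are invented for the example. Because the bytes are produced and consumed with shifts, the same source works unchanged on Big Endian and Little Endian systems.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes value as four bytes, least significant byte first.
inline void WriteExchange4(std::ostream &strm, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        strm.put(static_cast<char>((value >> shift) & 0xFF));
}

// Reads four bytes written by WriteExchange4() into value.
// Returns false, leaving value unchanged, if fewer than four bytes remain.
inline bool ReadExchange4(std::istream &strm, std::uint32_t &value)
{
    std::uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int byte = strm.get();
        if (byte == std::char_traits<char>::eof())
            return false;
        result |= static_cast<std::uint32_t>(byte & 0xFF) << shift;
    }
    value = result;
    return true;
}
```

Functions like these could replace the raw write() and read() calls in Listing A, with the streams still opened in binary mode, so that DATA.DAT has the same contents no matter which system produced it.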