Data Representation: Byte order and alignment

Some low-level issues that you may encounter in lab 4b are byte order and alignment.

Byte order

CPUs store multi-byte numbers in different ways. The order in which a machine stores the bytes of a two-byte (short) or four-byte (long) integer in memory defines its category: big endian or little endian. This category is called the host byte order. A computer that stores the most significant byte first (at the lowest address) is a big endian machine. Examples of big endian machines are Sun SPARC, the Motorola 68000 family, and PowerPC.

The computer running Kattis is a Sun SPARC, that is, a big endian machine.

A computer that stores the least significant byte first is a little endian machine. Examples of little endian machines are those based on Intel microprocessors, such as typical PCs.

Therefore, if you test your program on a PC and then submit it to Kattis, you may run into byte order problems.
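To see which category a given host belongs to, one can inspect the byte stored at the lowest address of a known multi-byte value. The following is a minimal sketch; the helper name host_is_little_endian is hypothetical and not part of the lab code.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little endian host and 0 on a big endian host.
 * host_is_little_endian is a hypothetical helper name for this sketch. */
int host_is_little_endian(void)
{
    uint16_t probe = 0x0102;      /* most significant byte 0x01, least significant 0x02 */
    uint8_t first;

    memcpy(&first, &probe, 1);    /* read the byte stored at the lowest address */
    return first == 0x02;         /* little endian stores the least significant byte first */
}
```

Running the same check on a PC and on a big endian machine should give different answers, which is exactly why code that ignores byte order can pass locally and fail after submission.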

The TCP/IP protocols use big endian byte order as the network byte order when encoding data into messages. A little endian machine therefore has to translate between network byte order and its internal representation. For portability, code normally translates between host and network byte order on all hosts, so that the same code can be used on different targets. On a big endian machine, the translation is a no-operation.

There is a set of functions that perform byte order conversions:

  1. ntohs - network to host byte order for a short integer (2 bytes)
  2. ntohl - network to host byte order for a long integer (4 bytes)
  3. htons - host to network byte order for a short integer.
  4. htonl - host to network byte order for a long integer.
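As an illustration of how these functions are typically used, the sketch below writes a 16-bit value into a message buffer in network byte order and reads it back. It assumes a POSIX system where the functions are declared in <arpa/inet.h>; the helper names put_port and get_port are hypothetical.

```c
#include <arpa/inet.h>   /* htons, ntohs, htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value in a message buffer in network byte order.
 * put_port and get_port are hypothetical helper names. */
void put_port(uint8_t *msg, uint16_t port_host)
{
    uint16_t port_net = htons(port_host);      /* host -> network byte order */
    memcpy(msg, &port_net, sizeof port_net);   /* copy the two bytes into the message */
}

/* Read the 16-bit value back and convert it to host byte order. */
uint16_t get_port(const uint8_t *msg)
{
    uint16_t port_net;

    memcpy(&port_net, msg, sizeof port_net);
    return ntohs(port_net);                    /* network -> host byte order */
}
```

With this sketch, storing the value 8080 (0x1F90) should leave the bytes 0x1F, 0x90 in the buffer on both big and little endian hosts.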

Alignment

Different processor targets may also have different word alignment restrictions. Typically, a multi-byte integer should start at an address that is a multiple of its size (for example, a four-byte integer on a four-byte boundary) in order to be fetched correctly over a bus. An access to a misaligned memory element may cause a bus error.

Intel/AMD processors (e.g. in a PC) are typically more tolerant of alignment problems than others. The Sun SPARC that Kattis runs on is quite picky about alignment!

Therefore, the software may need to adjust the contents of a message, or copy it into a new location, before interpreting it.
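One common way to do such a copy is to move the bytes of a field into a properly aligned local variable with memcpy before interpreting them. This is a sketch under that assumption; read_u32_net is a hypothetical helper name.

```c
#include <arpa/inet.h>   /* ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Read a 32-bit network-order value from any address, aligned or not.
 * read_u32_net is a hypothetical helper name for this sketch. */
uint32_t read_u32_net(const uint8_t *p)
{
    uint32_t tmp;

    memcpy(&tmp, p, sizeof tmp);   /* byte-wise copy, so p needs no particular alignment */
    return ntohl(tmp);             /* tmp is an aligned local, safe to interpret */
}
```

Because the value is never accessed through a misaligned uint32_t pointer, this pattern should behave the same on alignment-strict and alignment-tolerant machines.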

A typical example of alignment problems in networking is an Ethernet header followed by an IP header. If the 14-byte Ethernet header is correctly aligned on a four-byte boundary, the IP header will not be (since it starts 14 bytes after the start of the Ethernet header, and 14 is not a multiple of four).
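The arithmetic behind this example can be sketched as follows. The macro ETH_HEADER_LEN and the function ip_header_misalignment are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HEADER_LEN 14   /* 6-byte destination + 6-byte source + 2-byte type */

/* Given the address where a frame starts, return how many bytes the IP
 * header start lies past the previous four-byte boundary (0 means aligned).
 * ip_header_misalignment is a hypothetical helper name. */
size_t ip_header_misalignment(uintptr_t frame_addr)
{
    return (size_t)((frame_addr + ETH_HEADER_LEN) % 4);
}
```

For a frame starting on a four-byte boundary the result is 2. Conversely, a frame that starts two bytes past a boundary gives 0, which is why some implementations deliberately place received frames at such an offset; whether that is applicable in the lab is not stated here.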

Due to alignment and byte order problems, it is necessary to translate protocol headers between the network representation (a sequence of bytes) and a structured data type (e.g. a C struct). The process of encoding a structured data type into a sequence of bytes in a message is sometimes called marshaling. The reverse process, unpacking data from its network representation into one interpretable by a host, is called unmarshaling.
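A minimal sketch of marshaling and unmarshaling is shown below. The message type sample_msg and its two fields are invented for illustration and do not correspond to any real protocol header. The code uses shifts rather than the byte-swapping functions, so it does not depend on the host byte order or on alignment.

```c
#include <stdint.h>

/* Hypothetical message with two fields; not a real protocol header. */
struct sample_msg {
    uint16_t id;
    uint32_t value;
};

#define SAMPLE_MSG_WIRE_LEN 6   /* 2 + 4 bytes on the wire, no padding */

/* Marshal: structured type -> byte sequence (big endian on the wire). */
void sample_marshal(const struct sample_msg *m, uint8_t out[SAMPLE_MSG_WIRE_LEN])
{
    out[0] = (uint8_t)(m->id >> 8);
    out[1] = (uint8_t)(m->id & 0xFF);
    out[2] = (uint8_t)(m->value >> 24);
    out[3] = (uint8_t)((m->value >> 16) & 0xFF);
    out[4] = (uint8_t)((m->value >> 8) & 0xFF);
    out[5] = (uint8_t)(m->value & 0xFF);
}

/* Unmarshal: byte sequence -> structured type in host byte order. */
void sample_unmarshal(const uint8_t in[SAMPLE_MSG_WIRE_LEN], struct sample_msg *m)
{
    m->id = (uint16_t)((in[0] << 8) | in[1]);
    m->value = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
               ((uint32_t)in[4] << 8)  |  (uint32_t)in[5];
}
```

Marshaling a message and then unmarshaling the resulting bytes should give back the original field values on any host.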

In the lab, it is recommended to create C structures for the Ethernet and IP headers. Then there are essentially two approaches to handle the translation between the raw packet data and the structured information:

  1. Use the unstructured raw data, and typecast it into structured information. The fields are then accessed via byte-swapping functions. This is the easiest approach, but it has limitations when handling misaligned data: x86 processors are forgiving when accessing misaligned data, but other architectures are not. Example:

       char *buf;                  /* This is the raw data */
       iphdr *ip;                  /* This is the structured type */
       ip = (iphdr *)buf;          /* Typecast */
       ip->len = ntohs(ip->len);   /* Fields may be accessed via the structured type after byte swapping (the IP length field is 16 bits, hence ntohs) */

  2. Copy and translate the raw data into separate structures. The fields can then be accessed without type conversions. This is more complex but is a more general method.
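The second approach could look roughly like the sketch below, which copies selected IPv4 header fields out of the raw packet into a separate structure in host byte order. The type ip_info, its field names, and the function ip_unpack are hypothetical; the field offsets follow the standard IPv4 header layout, and only a subset of the fields is handled.

```c
#include <arpa/inet.h>   /* ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Host-order view of selected IPv4 header fields (hypothetical type). */
struct ip_info {
    uint8_t  version;
    uint8_t  header_len;   /* in bytes */
    uint16_t total_len;
    uint8_t  ttl;
    uint8_t  protocol;
    uint32_t src;
    uint32_t dst;
};

/* Copy and translate the fields; raw may be misaligned.
 * ip_unpack is a hypothetical helper name. */
void ip_unpack(const uint8_t *raw, struct ip_info *out)
{
    uint16_t len16;
    uint32_t addr32;

    out->version    = raw[0] >> 4;                      /* upper 4 bits of byte 0 */
    out->header_len = (uint8_t)((raw[0] & 0x0F) * 4);   /* IHL counts 32-bit words */
    memcpy(&len16, raw + 2, sizeof len16);              /* total length, bytes 2-3 */
    out->total_len  = ntohs(len16);
    out->ttl        = raw[8];
    out->protocol   = raw[9];
    memcpy(&addr32, raw + 12, sizeof addr32);           /* source address, bytes 12-15 */
    out->src        = ntohl(addr32);
    memcpy(&addr32, raw + 16, sizeof addr32);           /* destination address, bytes 16-19 */
    out->dst        = ntohl(addr32);
}
```

After unpacking, the caller works only with the aligned, host-order structure, so no further byte swapping or casting of the raw buffer is needed.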

Examples of useful C types are uint8_t, uint16_t and uint32_t (declared in <stdint.h>), which represent 1-byte, 2-byte and 4-byte unsigned integers, respectively. It is recommended that you use these types when defining fields in protocol headers, for example.
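As a sketch of how these types might be used, the following defines an Ethernet header structure. The struct name eth_header, its field names, and the helper eth_header_matches_wire are hypothetical. The C standard does not guarantee that a compiler adds no padding, so the helper checks that the layout matches the 14-byte wire format.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Ethernet header using fixed-width types (hypothetical names). */
struct eth_header {
    uint8_t  dst[6];   /* destination MAC address */
    uint8_t  src[6];   /* source MAC address */
    uint16_t type;     /* EtherType, in network byte order on the wire */
};

/* Returns 1 if the compiler laid the struct out exactly like the
 * 14-byte wire format (no padding), 0 otherwise. */
int eth_header_matches_wire(void)
{
    return sizeof(struct eth_header) == 14 &&
           offsetof(struct eth_header, src) == 6 &&
           offsetof(struct eth_header, type) == 12;
}
```

On common compilers and ABIs this check is expected to succeed, since no field needs more than two-byte alignment, but it is worth verifying on the target machine before casting raw data to the structure.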