Bill and Lynne are the authors of the 386BSD CD-ROM and can be contacted through the DDJ offices.
Very high-speed networking has become a key component in the race to rapidly and economically deliver large amounts of information. The high visibility of the information highway, the increasing interest in multimedia applications, and the demands of high-profile public-policy issues (such as rapid and confidential access to medical information on a national scale) ensure that very high-speed gigabit networks will be implemented within the next few years, current technology notwithstanding.
In "Very High-Speed Networking" (DDJ, August 1992), we outlined a number of hardware and software approaches which could be useful in achieving the required gigabit rates. Unfortunately, very little work of practical substance has been forthcoming. Many hardware solutions--including protocol engines--have recently fallen out of vogue, primarily due to cost constraints. Changing to different transmission technologies (FDDI and SONET, for instance) has also proved difficult (replacing the infrastructure is costly), so the focus is back on improving rates on existing, copper transmission lines. Popular software solutions (such as header prediction) have been successful, but generally have been mined out.
Even though there has been a great deal of talk about gigabit testbeds, the relative lack of interface hardware has been a stumbling block. Most projects have assembled a gigabit platform by using banks of T3 (45-Mbit) or FDDI (100-Mbit) interfaces, since few interface standards exist in this rarefied area, but these testbeds tend to be beyond the economic reach of most software and hardware application-development groups. However, one extant standard that may be within reach, HiPPI (high-performance parallel interface), allows for an 800-Mbit link to supercomputers; see "HiPPI and High-Performance LANs" by Andy Nicholson (DDJ, June 1993).
In this article, we'll examine two HiPPI-based projects--the PC-Supercomputer HiPPI Project and Project SIGNA--both of which utilize the 386BSD publicly accessible research software. However, any system using TCP/IP (Windows NT, for instance) can also be so modified, assuming you have the patience and access to kernel source code.
The Los Alamos National Laboratory (LANL) views supercomputer resources as a kind of "numerical" science laboratory of simulation, and PCs and workstations as the "visualization" devices (terminals) which provide rapid access to these shared resources. By placing all these computer resources (such as oddball supercomputer architectures, massive data stores, and tape/optical backup) on the same high-speed network, LANL can effectively "remove" the bottlenecks that occur in managing an information system that deals in extremely large objects (for example, in a plasma-reaction simulation, where data is shipped to whichever facility needs it at the moment, even between clusters of supercomputers).
To accomplish this, Richard Thomsen, Michael McGowan, and Craig Idler of LANL developed a special HiPPI-based interface to connect these supercomputers and high-speed storage devices to the Internet at very high rates. Since these devices could not work with the Internet protocols directly, a PC running 386BSD is used to interpret the protocol headers stripped off the incoming packet, with the remaining data payload redirected to a separate HiPPI link for reliable delivery to the target hardware. The combination of dual HiPPI interfaces, 486 PC, and software effectively produces a TCP/IP protocol engine running at HiPPI rates (see the accompanying text box entitled, "The LANL HiPPI Protocol Engine Hardware").
Hardware solutions, while intriguing, are usually out of reach for most software programmers. Still, the prospect of developing a scalable network-interface technology is very desirable. Even though hardware interfaces are still evolving, the software technology, coupled with fast (100-MIPS), inexpensive processor technologies and memory systems (greater than 512K write-back caches), is now available.
SIGNA (short for "simplified Internet gigabit networking architecture") is designed as a guide to inexpensively exploring Internet gigabit-networking technologies by running extremely high-speed protocols on an ordinary PC via 386BSD software.
The SIGNA approach currently emphasizes the most minimal of gigabit networking applications: client operation of a PC with a single application. However, when gigabit hardware interfaces become available, a SIGNA platform could allow client PCs to access supercomputer "servers" (as, for example, during image uploading and downloading) and other client PCs (such as in video teleconferencing). By dedicating PC resources to a single "bursty" application, you can essentially create "gigabit-terminal equipment."
Key considerations in the 386BSD SIGNA design included:
To guarantee real-time application response, it is necessary to add a limited real-time mechanism to the 386BSD kernel. This mechanism allows a special single process to preempt the kernel on demand. This special case carefully "violates" the UNIX model of restricted preemption to achieve a rapid response to data delivery; it is not intended as a general-purpose mechanism for real-time programs.
Extant device and driver interfaces, which place the burden of buffer allocation and packet extraction on the device driver, are not appropriate for gigabit-network interfaces. Gigabit-networking interfaces must cope with the fact that while processor speed is increasing, memory-system bandwidth is not keeping pace. Operations involving the most bandwidth (the packet-data payload) are costly; if you require more than a single pass over the packet, you overload the memory-system bandwidth and "get behind" in processing a packet. One way to avoid this is to use extensive amounts of memory (arranged as frame buffers) to assemble and present the link-layer packets in transit. Such memory-based devices require novel device-driver interfaces.
Finally, Internet Core protocol structures (TCP, UDP, IP, ICMP) must themselves be modified to eliminate copies and reduce checksum overhead. By operating on descriptors instead of copying the packet around during processing, you can reduce the average passes required per packet from three to one--a significant reduction in memory overhead; see Figure 3. This is done by combining the copy and checksum operations directed to protocol headers and data. The descriptors selectively reference header/data portions of the packet in place in the interface's buffer.
Header prediction can also be enhanced through a "clustering" mechanism, which synchronizes a half-duplex stream of packets. This effectively locks out other system activity during peak-rate transfers.
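The combined copy-and-checksum operation described above can be sketched in a few lines of C. This is a minimal illustration of the technique, not the actual 386BSD kernel code; the function name and interface are invented for this example. The point is that the payload crosses the memory bus once, with the one's-complement Internet checksum (RFC 1071) accumulated as the bytes are copied.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy len bytes from src to dst while accumulating the one's-complement
 * sum used by the Internet checksum -- one pass over the payload instead
 * of separate copy and checksum passes.  Returns the folded, complemented
 * 16-bit checksum of the copied data. */
uint16_t copy_and_checksum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((s[0] << 8) | s[1]);  /* big-endian 16-bit word */
        d[0] = s[0];
        d[1] = s[1];
        s += 2; d += 2; len -= 2;
    }
    if (len == 1) {                 /* odd trailing byte, zero-padded */
        sum += (uint32_t)(s[0] << 8);
        d[0] = s[0];
    }
    while (sum >> 16)               /* fold carries back into low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;          /* one's complement of the sum */
}
```

In the real system, the descriptor mechanism decides which byte ranges (headers versus payload, in place in the interface buffer) get fed through a loop like this one.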
The LANL HiPPI project, exhibited at the Supercomputer '92 and '93 conferences, is possibly the only successful protocol-engine design ever put into operation. Even software-testbed designs (including Project SIGNA) cannot match the current speed of good protocol-engine designs due to the limitation in the memory system used by the processor itself. As such, anyone interested in getting a hands-on, operational, protocol-engine testbed should look at this design carefully. It could save a company years in design and development costs and also bring very high-speed networking that much closer to reality.
Because gigabit hardware technologies are still a matter of speculation, software-only approaches (such as SIGNA) and testbeds are more than just interesting. Both 600-Mbit ATM (MAN) and 100-Mbit Ethernet might offer affordable desktop bandwidth in the near future, while SONET scaled to multi-gigabit levels offers the possibility of metropolitan-network interconnections. Even HiPPI, originally a supercomputer mass-storage interface, has been demonstrated as a network-interconnect standard. With the recent standardization of the HiPPI serial standard, the cost of implementation has lowered drastically.
While gigabit networking is considered solely the province of the data industry, knowledge of telephony techniques provides insight into design considerations and constraints. In fact, both the SIGNA and LANL HiPPI testbeds could be viewed simply as gigabit-terminal equipment. In addition, new gigabit-networking technologies must rely on switching technologies instead of routing technologies, since the required data rates prohibit the delay imposed by the interim retransmission of a packet.
The inevitable reunion of the data-networking and telecommunications industries will be spurred on by the demand for global very high-speed gigabit networking, although probably not in the manner either of these industries has separately forecast. Ironically, the experts most suited to leading the charge are at risk of being most blind to these new possibilities, since they are used to seeing them only in terms of their respective disciplines.
In the meantime, hardware projects like LANL's HiPPI project and software-testbed engines like SIGNA will provide us with the knowledge and experience needed when very high-speed networking solutions become available. Perhaps they will encourage entrepreneurs from both industries to take the initiative and offer ad hoc solutions, creating a whole new information industry. In any case, the demand for very high-speed networks is real, and that demand will be satisfied--one way or another.
The LANL protocol engine (see Figure 1) consists of two CBI (crossbar interface) cards attached to an ordinary EISA PC. Each CBI card has two unidirectional HiPPI ports (one input, one output), each used to manage one half circuit of the communications between an Internet network and a non-Internet-capable application host. Only data and requests for Internet service flow across the application link, and only Internet-protocol (IP) datagrams appear on the network link. It is the sole responsibility of the PC to handle the transformation of the application's requests into appropriate Internet-protocol operations without ever seeing the application's data (handling only pointers to the data). In this case, the PC is the actual Internet host which operates on behalf of the external host computer.
The key to this architecture is the design of each CBI (see Figure 2), which is built around a large (4-Mbyte) block of video RAM (VRAM). The VRAM has three ports: two serial (one in and one out) for receiving and transmitting HiPPI, and one parallel, bidirectional port that allows the PC to access TCP/IP header and HiPPI Link Layer information. Each board has a port on the network and a port connected to the application host (which runs the network application connected to the network). The data is buffered between the network and the application host solely in the VRAM while the PC arranges the details of the network transfer.
While the roles of application and network are split between two hosts, you could design a delivery mechanism to the application running on the same PC (sort of a "socket protocol engine" for the particular application program) if necessary. This approach can also be used on a single PC or workstation.
By stratifying the design of protocol processing into scalable sections, you can cope with any degree of bandwidth on a networking implementation. Given the rate of technology change, switching a gigabit per second between computers will be routine in less than a decade.
The choice of a PC/supercomputer connection presented some novel problems which had to be resolved to make the LANL HiPPI project fly. One of the most critical issues dealt with the rate of information itself: While a supercomputer has no trouble churning out TCP/IP in order to source a HiPPI link, how could a PC handle it? The secret was to decouple the overhead of the data payload from the protocol processing so that the overhead per packet is fixed, regardless of the size of the packet. Assuming a maximum packet size of 64 Kbytes (2^19 bits, or 512 Kbits), a packet rate of 2^11, or 2048, packets per second would be necessary to support a data rate of a gigabit (2^30 bits per second).
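The arithmetic above reduces to a one-line division, shown here as a C sketch (the function name is ours, introduced only for illustration): the required packet rate is the target bit rate divided by the bits per packet, so larger packets directly lower the per-second protocol-processing load.

```c
#include <stdint.h>

/* Packets per second needed to sustain a given bit rate with
 * fixed-size packets; both arguments are expressed in bits. */
uint64_t packets_per_second(uint64_t bits_per_second, uint64_t packet_bits)
{
    return bits_per_second / packet_bits;
}
```

With a gigabit (2^30 bits per second) and 64-Kbyte packets (2^19 bits), this yields 2^11, or 2048, packets per second.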
Since these packet rates are achievable with a carefully tuned PC Internet implementation, the real key to high-speed networking is to find ways to scale packet-data payload delivery. The LANL CBI project addresses this through clever hardware design. The TCP protocol has two requirements on its data payload: a delivery requirement and a checksum across the span of both the payload and a special, pseudo-protocol header. A hardware-checksum mechanism offloads from the networking implementation a portion of the protocol processing that increases with packet payload.
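For reference, here is a software sketch of what such a checksum unit computes: the standard TCP checksum over a 12-byte pseudo-header (source and destination IP addresses, a zero byte, the protocol number, and the TCP length) followed by the segment itself. This is an illustrative implementation with invented function names, not the CBI firmware; it simply shows why the work grows with payload size and is worth offloading.

```c
#include <stdint.h>
#include <stddef.h>

/* One's-complement sum accumulator (RFC 1071 style). */
static uint32_t cksum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2; len -= 2;
    }
    if (len)                         /* odd trailing byte, zero-padded */
        sum += (uint32_t)(p[0] << 8);
    return sum;
}

/* TCP checksum: covers the pseudo-header plus the entire segment
 * (header and payload).  The segment's own checksum field is assumed
 * to be zeroed by the caller before computing. */
uint16_t tcp_checksum(uint32_t src_ip, uint32_t dst_ip,
                      const uint8_t *segment, uint16_t seg_len)
{
    uint8_t ph[12];
    ph[0] = src_ip >> 24; ph[1] = src_ip >> 16;
    ph[2] = src_ip >> 8;  ph[3] = src_ip;
    ph[4] = dst_ip >> 24; ph[5] = dst_ip >> 16;
    ph[6] = dst_ip >> 8;  ph[7] = dst_ip;
    ph[8] = 0;            ph[9] = 6;               /* protocol: TCP */
    ph[10] = seg_len >> 8; ph[11] = seg_len & 0xff;

    uint32_t sum = cksum_add(0, ph, sizeof ph);
    sum = cksum_add(sum, segment, seg_len);        /* grows with payload */
    while (sum >> 16)                              /* fold carries */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The `cksum_add` call over the segment is the part that scales with packet size; everything else is fixed per-packet overhead of the kind the PC can comfortably handle.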
A second hardware mechanism eliminates the remaining payload overhead from delivering the data to the application. Essentially, the PC never touches the data inside the packets--it merely manages the association of hardware data-buffer pointers between the two interfaces. The PC simply does the bookkeeping of the protocol, which is the same whether the packet is 64 bytes or 64 Kbytes.
--W.F.J.
More information on the LANL HiPPI Project, including documentation on the CBI, is available via ftp at the Internet site ftp.lanl.gov in the /pub/cbi directory. For information on Project SIGNA, 386BSD, or pointers to further information about the LANL HiPPI Project, please send e-mail to wjolitz@cardio.ucsf.edu.
Figure 1: The LANL protocol engine. Figure 2: The design of each CBI. Figure 3: Reducing the average passes required per packet; (a) three passes; (b) one pass.
Copyright © 1994, Dr. Dobb's Journal