Next: File Systems
Up: Linux & Cluster
Previous: Why Use Linux for
Contents
The term ``Linux'' is most correctly applied to the name for the Unix-like kernel, the heart of an operating system that directly controls the hardware and provides
- true multitasking,
- virtual memory,
- shared libraries,
- demand loading,
- shared copy-on-write executables,
- TCP/IP networking,
- and file systems.
The kernel is lean and mean. It contains neither an integrated Web browser nor a graphic windowing system. Linux, in keeping with its Unix heritage,
follows the rule that smaller and simpler should be applied to every component in the system and that components should be easily replaceable. However, the term ``Linux'' has also been applied in a very general way to mean the entire system, the Linux kernel combined will all of the other programs that make the system easy to use, such as the graphic interface, the compiler tools, the e-mail programs, and the utilities for copying and naming files. Strictly speaking, Linux is the kernel.
Many Linux distribution companies exist. They take the freely available Linux kernel and add an ``installer'' and all the other goodies. In fact, those companies (Red Hat, Turbolinux, Caldera, SuSE, and a host of smaller companies) have the freedom to customize, optimize, support, and extend their Linux distribution to satisfy the needs of their users.
The open source software movement has gathered both publicity and credibility over the past couple of years. Richard Stallman began work in 1984 on creating a free, publicly available set of Unix-compatible tools. He uses the term ``free software'' to describe the freedom users have to modify it, not the price. Several years later, the GNU General Public License (GPL) was released. The GPL (sometimes called the ``copyleft'') became the license for all of the GNU products, such as gcc (a C compiler) and emacs (a text editor). The GPL strives to ensure that nobody can restrict access to the original source code of GPL licensed software or can limit other rights to using the software. Probably the most important aspect of the GPL is that any modifications to GPLed source code must also be GPLed. The Linux kernel also uses a clarified GPL license.
Therefore, modifying the Linux kernel for private use is fine, but users may not modify the kernel and then make binary-only versions of the kernel for distribution. Instead, they must make the source code available if they intend to share their changes with the rest of the world.
What exactly does the kernel do? Its first responsibility is to be an interface to the hardware and provide a basic environment for processes and memory management. When user code opens a file, requests 30 megabytes of memory for user data, or sends a TCP/IP message, the kernel does the resource management. If the Linux server is a firewall, special kernel code can be used to filter network traffic. In general, there are no additives to the Linux kernel to make it better for scientific clusters-usually, making the kernel smaller and tighter is the goal.
- However, sometimes a virtual memory management algorithm can be twiddled to improve cache locality, since the memory access patterns of scientific applications are often much different from the patterns common Web servers and desktop workstations, the applications for which Linux kernel parameters and algorithms are generally tuned.
- Likewise, occasionally someone creates a TCP/IP patch that makes message passing for Linux clusters work a little better.
Since each kernel version can have different configuration options and module names, it is not possible simply to provide the
Beowulf user a list of kernel configuration options. Some basic principles can be outlined, however.
- Think compute server: Most compute servers don't need support for amateur radio networking. The list for what is really needed for a compute server is actually quite small. IrDA (infrared), quality of service, ISDN, ARCnet, Appletalk, Token ring, WAN, AX.25, USB support, mouse support, joysticks, and telephony are probably all useless for a Beowulf.
- Optimize for your CPU: By default, many distributions ship their kernels compiled for the first-generation Pentium CPUs, so they will work on the widest range of machines. For your high-performance Beowulf, however, compiling the kernel to use the most advanced CPU instruction set available for your CPU can be an important optimization.
- Optimize for the number of processors: If the target server has only one CPU, don't compile a symmetric multiprocessing kernel, because this adds unneeded locking overhead to the kernel.
- Remove firewall or denial-of-service protections: Since Linux is usually optimized for Web serving or the desktop, kernel features to prevent or reduce the severity of denial-of-services attacks are often compiled into the kernel. Unfortunately, an extremely intense parallel program that is messaging bound can flood the interface with traffic, often resembling a denial-of-service attack. Removing the special checks and
detection algorithms can make the Beowulf more vulnerable, but putting it behind a firewall is relatively easy compared with securing and hampering every node's computation to perform some additional security checks.
- Other Considerations. Many Beowulf users slim down their kernel and even remove loadable module support. Since most hardware for a Beowulf is known, and scientific applications are very unlikely to require dynamic modules be loaded and unloaded while they are running, many administrators simply compile the required kernel code into the core. Particularly careful selection of kernel features can trim the kernel from a 1.5-megabyte compressed file with 10 megabytes of possible loadable modules to a 600-kilobyte compressed kernel image with no loadable modules. Some of the kernel features that should be considered for Beowulfs include the following:
- NFS: While NFS does not scale to hundreds of node, it is very convenient for small clusters.
- Serial console: Rather than using KVM (Keyboard, Video, Mouse) switches or plugging a VGA (video graphics array) cable directly into a node, it is often very convenient to use a serial concentrator to aggregate 32 serial consoles into one device that the system administrator can control.
- Kernel IP configuration: This lets the kernel get its IP address from BOOTP or DHCP, often convenient for initial deployment of servers.
- NFS root: Diskless booting is an important configuration for some Beowulfs. NFS root permits the node to mount the basic distribution files such as /etc/passwd from an NFS server.
- Special high-performance network drivers: Often, an extreme performance Beowulf will use high-speed networking, such as Gigabit Ethernet or Myrinet. Naturally, those specialized drivers as well as the more common 100BT Ethernet driver can be compiled into the kernel.
- A file system: It is important the kernel is compiled to support the file system chosen for the compute nodes.
- Network Booting. Because of the flexibility of Linux, many options are available to the cluster builder. While certainly most clusters are built using a local hard drive for booting the operating system, it is certainly not required. Network booting permits the kernel to be loaded from a network-attached server. Generally, a specialized network adapters or system BIOS is required. Until recently, there were no good standards in place for networking booting commodity hardware. Now, however, most companies are offering network boot-capable machines in their high-end servers. The most common standard is the Intel PXE 2.0 net booting mechanism. On such machines, the firmware boot code will request a network address and kernel from a network attached server, and then receive the kernel using TFTP (Trivial File Transfer Protocol). Unfortunately, the protocol is not very scalable, and attempting to boot more than a dozen or so nodes simultaneously will yield very poor results. Large
Beowulfs attempting to use network boot protocols must carefully consider the number of simultaneously booting nodes or provide multiple TFTP servers and separate Ethernet collision domains. For a Linux cluster, performing a network boot and then mounting the local hard drive for the remainder of the operating system does not seem advantageous; it probably would have been much simpler to store the kernel on hard drive. However, network booting can be important for some clusters if it is used in conjunction with diskless nodes.
Next: File Systems
Up: Linux & Cluster
Previous: Why Use Linux for
Contents
Cem Ozdogan
2009-01-05