Our 1,000-Pentium Beowulf-style cluster computer system consists of a server and 1,000 Pentium II 350-MHz processors. It is constructed entirely from "Commodity Off The Shelf" (COTS) components.
Each of the 1,000 processors of our parallel computer system has a Pentium II 350-MHz processor. Each processor uses 64 megabytes of RAM (so that the system as a whole has 64 gigabytes of RAM).
The 1000 Pentium II processors reside on 500 dual-CPU Tyan Tiger 100 (ATX form factor) motherboards and each motherboard is housed in a standard mini-tower box. Each mini-tower box contains 128 megabytes of RAM, a 100 megabit-per-second Intel PRO/100+ Ethernet Network Interface Card (NIC), and a standard 300W power switching power supply. There is no hard disk, video monitor, keyboard, floppy disk drive, or other input-output device associated with any of the 1,000 processors. The processors run the GNU / Linux operating system (Red Hat Linux 6.0). The communication between processors and between the server and the processors is by means of 100 megabit-per-second Ethernet. Each group 40 processors (20 boxes) is connected to one 24-port 100 megabit-per-second Ethernet hub. There are 25 hubs in the system. Each hub is connected on its uplink to one of two 100 megabit-per-second 16-port Ethernet switches. The two switches are connected to each other and to the server.
The server computer is also a dual Pentium II 350-MHz processor. The server has 256 MB of RAM. It runs on the GNU / Linux operating system (Red Hat Linux 6.0 from VA Linux Research). The server contains a 14 GB hard disk, a video display monitor, a floppy disk drive, a CD ROM drive, and a keyboard.
Our parallel genetic programming application views the 1,000 processors as if they were arranged in a toroidal network. Each processor communicates only with the server and four toroidally nearest neighbors (North, East, South, West). Except for occasional trace messages, the sole type of communication from the processors to the server during a run of genetic programming is the end-of-generation message. The end-of-generation message contains the best individual of the current generation (and certain additional statistics about the progress of the run). The sole type of inter-processor communication consists of the migration of a certain small number of individuals from one processor to each of its four toroidally adjacent processors. The 1,000 processors are logically arranged into a 25 by 40 toroidal grid. Each group of 40 processors that share a hub are arranged into one 8 by 5 rectangle within the larger grid. Because of the toroidal communication between processors and their neighbors, most of the traffic for a given group of 40 processors is local to its group.
Approximately half of 64 MB of RAM associated with each of the 1,000 processors is available for the storage of the population. The remainder of the memory is used to house the operating system (shared by the two processors on a given motherboard), the GP application software, and various buffers used for exporting and importing individuals, and other items of overhead. For genetic programming, a population of 10,000 individuals, each occupying 3,000 bytes of RAM, can be accommodated with 30 MB of RAM. Alternately, 30,000 individuals, each occupying 1,000 bytes of RAM, can be accommodated with 30 MB of RAM. Using the one-byte-per-point method of storing individual program trees in genetic programming, each 3,000-byte individual in the population can possess 3,000 points (functions or terminals) and each 1,000-byte individual in the population can possess 1,000 points. A 1,000-node system can therefore accommodate a population of 10,000,000 3,000-byte individuals or 30,000,000 1,000-byte individuals. If, for example, a fitness evaluation consumes 0.10 seconds, then evaluating the fitness of 10,000 individuals will consume 1,000 seconds. This means that a generation will require 0.25 hours and, consequently, 96 generations can be executed per day.
The 100 megabit-per-second Ethernet is more than sufficient to handle the migration of individuals in most practical runs of genetic programming using the island model. Migration usually occurs at a rate of approximately 2% in each of four directions on each generation for each processor. If the population size is 10,000 3,000-byte individuals at each processor and 2% of the population migrates in each of four directions, then communication of 800 individuals (2.4 MB) is required for every generation for each processor. If one generation takes about15 minutes (900 seconds), this amounts to transmission of 2,666 bytes (about 2 kilobits) per second for each processor. Even without considering the hierarchical arrangement of the Ethernet and the localization of communication, total communication activity thus amounts to only about 2 megabits per second. This inter-node communication for migration obviously does not tax a 100 megabit-per-second Ethernet. Similarly, the end-of-generation messages from each processor (which consists of about 10,000 bytes each and occur only once per generation) do not tax the 100 megabit-per-second Ethernet.
The use of a Commodity Off-The-Shelf (COTS) mini-tower case solves the electromagnetic emission problems associated with a 350 MHz processor as well as the heat dissipation requirements associated with the chip. The use of standard cases does not minimize the space occupied by the system; however, it provides a highly cost-effective solution to the emission and heat problems. The standard 300 watt power supplies (produced and priced as a commodity product) are similarly more cost-effective than a centralized power supply would be. Each case has four fans (one for each processor chip, one for the power supply, and one for the case). The fan on each processor contains a sensor that shuts down the node if it fails.
An Ethernet ("dumb") hub is sufficient for handling the communication requirements of each group of 40 processors. Hubs are Bay Stack 100 from Bay Networks. The two ("smart") switches are Bay Networks Bay Stack (Nortel) 350T 16-port 10/100 BT Ethernet switches.
We use a large central uninterruptable power supply (UPS) providing 15 minutes of support for the system. Fifteen minutes bridges all but the most severe (and most occasional) power outages. Our application is not mission-critical, so we need backup only to prevent very brief power outages from ruining a run. For picture of uninterruptable power supply (UPS) for new 1000-Pentium computer.
The GNU / Linux operating system is, by far, the most common operating system used on individual nodes of Beowulf-style parallel computer systems (whether the nodes are Alpha processors, Pentium processors, or other processors). This operating system is free. In addition, our experience is that it is remarkably robust. Between May 1998 and July 1999, we have operated a 70-node system (arranged in essentially the same manner as described in this paper and composed of 70 533 MHz Alpha processors) for 14 months on a 24-hour / 7-day-per-week basis with only three crashes (for whatever reason).
Since the main requirement for memory in genetic programming work is storage of the population and the relatively small genetic programming application code, we chose not to have hard disks at each processor. The relatively small size of the GNU / Linux operating system obviates the need for disk storage at each processor. Hard disks present potential reliability problems. The existence of hard disks (and the consequent state of the system) add complication to the system (for which we would receive no benefit, given our particular application).
The system is booted using a DHCP message from the server to the 1,000 processors. We used Beoboot software from Rembo Technology Sarl of Switzerland. The 500 boxes directly access the server's file system using NFS.
The server receives the end-of-generation reports from each processor. It creates an output file containing statistics about the run and all pace-setting individuals (i.e., best-of-generation individuals that are better than any previously seen individual). This output file is stored on the hard disk of the host computer. The size of this output file is not large because only pace-setting individuals are retained.
A 1,000-node system generates a noticeable amount of heat and air conditioning (two 25-ton air conditioners) are required to keep the system at 70 degrees.
The 1,000-Pentium system operates at an aggregate clock rate of 0.350 Tera-Hertz. The SPECfp95 rating of microprocessors is produced by the Standard Performance Evaluation Corporation (SPEC), a non-profit group of computer vendors, system integrators, universities, and research organizations. The SPECfp95 rating is designed to provide measures of performance for comparing computationally intensive floating-point workloads on different computer systems. The SPECfp95 rating measures the floating-point performance of a computer’s processor, memory architecture, and compiler using a suite of existing application and benchmark source code running across multiple platforms. The SPECfp95 rating of a 350-MHz Pentium II processor is about 12, so the 1,000-Pentium system delivers about 12,000 SPECft95. Since less than 1% of the processors cycles are consumed by the low-bandwidth migration and communications tasks of the parallel genetic programming algorithm, the system as a whole delivers essentially this full amount of computation.
The 1,000-Pentium machine was assembled by Stan Fox of the COMPAQ Sunnyvale Staging Center.
The 70-Alpha machine was provided by Stephen Gaudet of DCG Computers of Londonderry, New Hampshire.
Design and contracting of site for 1000-Pentium computer by Gordon Prill Inc. of Mountain View, California.
Additional information on genetic programming can be found in the book Genetic Programming III: Darwinian Invention and Problem Solving and its accompanying videotape. Visit www.genetic-programming.org for citations and links to numerous other authored books, edited collections of papers, conference proceedings, and individual published papers concerning genetic programming.
Bennett, Forrest H III, Koza, John R., Shipman, James, and Stiffelman, Oscar. 1999. Building a parallel computer system for $18,000 that performs a half peta-flop per day. In Banzhaf, Wolfgang, Daida, Jason, Eiben, A. E., Garzon, Max H., Honavar, Vasant, Jakiela, Mark, and Smith, Robert E. (editors). 1999. GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, July 13-17, 1999, Orlando, Florida USA. San Francisco, CA: Morgan Kaufmann. Pages 1484 - 1490.
Techniques of evolutionary computation generally require significant computational resources to solve non-trivial problems of interest. Increases in computing power can be realized either by using a faster computer or by parallelizing the application. Techniques of evolutionary computation are especially amenable to parallelization. This paper describes how to build a 10-node Beowulf-style parallel computer system for $18,000 that delivers about a half peta-flop (1015 floating-point operations) per day on runs of genetic programming. Each of the 10 nodes of the system contains a 533 MHz Alpha processor and runs with the GNU / Linux operating system. This amount of computational power is sufficient to yield solutions (within a couple of days per problem) to 14 published problems where genetic programming has produced results that are competitive with human-produced results.
Click here for Postscript copy of this GECCO-99 conference paper on "Building a parallel computer."
Forrest H Bennett III
Stan Fox of COMPAQ Computers
Ray Prill of by Gordon Prill Inc. of Mountain View, California
VA Linux Research of Mountain View, California
· The home page of Genetic Programming Inc. at www.genetic-programming.com.
· For information about the field of genetic programming in general, visit www.genetic-programming.org
· Information about the 1992 book Genetic Programming: On the Programming of Computers by Means of Natural Selection, the 1994 book Genetic Programming II: Automatic Discovery of Reusable Programs, the 1999 book Genetic Programming III: Darwinian Invention and Problem Solving, and the 2003 book Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Click here to read chapter 1 of Genetic Programming IV book in PDF format.
· For information on 3,198 papers (many on-line) on genetic programming (as of June 27, 2003) by over 900 authors, see William Langdon’s bibliography on genetic programming.
· For information on the Genetic Programming and Evolvable Machines journal published by Kluwer Academic Publishers
· For information on the Genetic Programming book series from Kluwer Academic Publishers, see the Call For Book Proposals
· For information about the annual Genetic and Evolutionary Computation (GECCO) conference (which includes the annual GP conference) to be held on June 26–30, 2004 (Saturday – Wednesday) in Seattle and its sponsoring organization, the International Society for Genetic and Evolutionary Computation (ISGEC). For information about the annual NASA/DoD Conference on Evolvable Hardware Conference (EH) to be held on June 24-26 (Thursday-Saturday), 2004 in Seattle. For information about the annual Euro-Genetic-Programming Conference to be held on April 5-7, 2004 (Monday – Wednesday) at the University of Coimbra in Coimbra Portugal.
Last updated on August 27, 2003