Apple Developer Connection

Xgrid: High Performance Computing for the Rest of Us

James Reynolds considered the unused power available in the 350 Power Macintosh computers in the University of Utah's Student Computing Labs. Each night when the labs would close, the computers would sit idle. He and his co-workers wanted to tap into this potential power, but implementing a distributed computing system had not been a priority.

Then Apple's Xgrid came along, and within a few days, those Power Macs were no longer idle.

With the Xgrid technology preview, James explains, "I was just curious and one night I tried it." By the end of the night he had a proof of concept up and running and by the end of the week he was using Xgrid to render frames for an animation he was working on. "I like Xgrid so much because it is so easy," James says.

Simple grid computing was the design goal for the Apple Advanced Computation Group. They spent the last two years looking at a modern way to make it easy to create and use an ad hoc cluster. The result of their efforts—Xgrid—allows almost anyone to easily run a set of calculations on many machines using machine-dependent parameters. You can keep your focus on the science and mathematics and not distract yourself learning the details of setting up a network of computers. You don't have to become a system administrator and set up user accounts and manage the network topology. With Xgrid you click a few buttons, make a modification here or there to your application, and take advantage of the extra power of the grid.

James used his experience with rendering software together with information he found in an article by Dr. Daniel Côté (see below for more from Dr. Côté) on using a graphics application called a POV-ray renderer (for Persistence of Vision Raytracer) with Xgrid. He modified the generate script to use parameters suggested by the POV-ray team, and ran tests by rendering the POV-ray benchmark graphic. The job took 36.5 minutes using one of the two processors on a dual processor G5 2.0GHz with 1 GB of RAM. Running it on a grid consisting primarily of Power Mac G4/400's reduced the time to 26 minutes. Adding the G5 to the grid reduced the total time to 24 minutes.

With the benchmark out of the way, James turned to his goal of producing an animation of LDraw models. Altogether there were 180 jobs, each of which consisted of rendering about 26 frames. He found it easy to experiment with Xgrid until he got his application right. A little tweak here or there and James smiles and repeats, "It's so easy. This isn't a tool designed for a dedicated cluster like Virginia Tech. If you have computers that are designed to do something other than distributed computing, and you want to run a job on multiple machines, Xgrid is perfect."

Getting Started with Xgrid

The reaction from early users echoes Reynolds's comment about how easy Xgrid is to set up and use. Step one is to download Xgrid and install it on a single machine. Step two is to install Xgrid on other machines and configure them to be available for grid calculations. There is no step three. Xgrid is designed to take care of most of the details of distributing your calculation so that you can concentrate on the problem you are trying to solve.

You can download Xgrid Technology Preview 2 on the Apple Advanced Computation Group Xgrid page. To test it out, try running Xgrid on a single machine; this will give you a taste of the three roles that are involved in every Xgrid application: Agent, Controller, and Client.

The Agent is the worker bee in an Xgrid system. Each agent makes itself available to a single controller. It receives computational tasks and returns the results of these computations to the controller. Once you have installed Xgrid, you will find a new Xgrid item in your System Preferences. This is where you get to make key decisions about the agent. You can set the agent to accept tasks when the computer is idle in the same way that a screensaver only runs when the computer is not being used. If the calculation of tasks being submitted to the grid is more important than tasks performed by a user sitting at the machine, you can set the agent to be a dedicated agent that always accepts tasks. The remaining configuration options allow you to choose whether the agent will bind to a specific location, service name, or just to the first available service. You can also require the controller to provide a password.

The Controller is the middle man. It receives jobs from clients and breaks the jobs into tasks that it then sends to agents. It then receives the results of the calculations from the agents and sends the information back to the client. You can use the System Preferences to set the password that the controller uses to communicate with agents and that the clients use to communicate with the controller. You can think of the controller as defining the Xgrid system. Each grid can only have a single controller but can have multiple clients and certainly multiple agents. Since each agent is associated with only one controller at a time, you can think of the grid as revolving around the controller.

The Client submits a job to the grid by communicating with a controller. Once you have installed Xgrid you can use either the Xgrid.app for a GUI version of the client or you can open a Terminal window and use the Xgrid command-line version. For more details on the command-line version you can check the man pages by typing man xgrid from the command line. The clients use either method to originate jobs and to receive the results.

The Xgrid workflow is summarized in Figure 1: Xgrid Workflow. Jobs originate with the client and are split into tasks and sent on to agents by the controller. The completed tasks are returned to the controller which collects them and reports back to the client.

Figure 1: Xgrid Workflow
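
The division of labor among the three roles can be mimicked in a few lines of Python. This is only an illustrative sketch and not Xgrid's own interface: the function names agent, controller, and client are hypothetical, the job (summing squares over a range) is invented for the example, and a local thread pool stands in for the networked agents.

```python
from concurrent.futures import ThreadPoolExecutor


def agent(task):
    """Hypothetical agent: performs one independent task and returns its result."""
    start, stop = task
    return sum(n * n for n in range(start, stop))


def controller(job, num_agents=4, task_size=250):
    """Hypothetical controller: splits a job into tasks, hands the tasks to
    agents, and combines their results for the client."""
    start, stop = job
    tasks = [(lo, min(lo + task_size, stop)) for lo in range(start, stop, task_size)]
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        partial_results = list(pool.map(agent, tasks))
    return sum(partial_results)


def client():
    """Hypothetical client: submits one job and receives the combined result."""
    return controller((0, 1000))


if __name__ == "__main__":
    print(client())
```

The client never talks to an agent directly. Everything passes through the controller, which is why the grid can be thought of as revolving around it.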

Cooking with Xgrid

After downloading and installing Xgrid, you are a few button presses away from having it up and running. From the System Preferences start an agent and a controller. Double-click the Xgrid application and choose "Start Local Service". Select the job type "Mandelbrot" and your machine will act as client, controller, and agent to render different regions of the famous Mandelbrot fractal image. You can view the combined CPU power available on all the machines by watching the needle on the Tachometer (see Figure 2: Xgrid Tachometer at Reed College). To add more power, you will have to add more agents. The Mandelbrot example is an ideal Xgrid job. Each agent is working independently on the same problem with different parameters.

Figure 2: Xgrid Tachometer at Reed College

A first step in identifying potential candidate applications that can benefit from Xgrid is to distinguish serial calculations from parallelizable calculations. Imagine you are given a frozen chicken to cook and a recipe that basically consists of these three steps: defrost the chicken, marinate it, and roast it. Each step must be done in sequence. Xgrid doesn't help in these cases of serial calculations. On the other hand, imagine you are preparing a simple vegetable platter where you need to cut up carrots, celery, and cucumber. It doesn't really matter which order you choose to cut up your vegetables. In fact, if you have enough counter space, cutting boards, knives, and friends you may speed up this task dramatically. This second recipe is clearly parallelizable and may benefit from more people working on it.

Whether you are cooking or programming, there are other issues to think about before deciding on a distributed approach. If you are only cutting up enough vegetables for your own lunch then it may not be worth your while to gather friends and set up and later clean the extra cutting boards and knives. If you are cutting up enough vegetables to serve at lunch at this year's WWDC then parallelizing the task is clearly a good idea. The size of the job, the setup required for each task, and the management costs for distributing the task are all factors. This is especially true if you want to use your friend's kitchen across the street: sometimes the time it takes to distribute the job (communication latency) overwhelms the benefits of parallelization.
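
The trade-off can be made concrete with a deliberately simple model. It assumes that the work divides evenly, that every worker is equally fast, and that handing out the work costs a fixed amount of time per worker. None of these assumptions comes from Xgrid itself, and the function name and the numbers are purely illustrative.

```python
def estimated_speedup(work_minutes, workers, overhead_per_worker_minutes):
    """Toy model: parallel time is the evenly split work plus a fixed
    distribution overhead paid once for each worker."""
    parallel_minutes = work_minutes / workers + overhead_per_worker_minutes * workers
    return work_minutes / parallel_minutes


if __name__ == "__main__":
    # Lunch for one: 10 minutes of chopping, 5 friends, 3 minutes of setup each.
    print(estimated_speedup(10, 5, 3))
    # Lunch for a conference: 600 minutes of chopping, same friends, same setup.
    print(estimated_speedup(600, 5, 3))
```

In this model the small job comes out slower with help than without it (a speedup of roughly 0.59), while the large job finishes about 4.4 times faster.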

Another consideration is how tightly coupled the tasks are. As another food example, it may be important in a restaurant that all dinner orders are delivered to a table at the same time. In this case the "client" is the waiter who delivers a ticket full of orders to a "controller" in the kitchen. The controller then calls out the order and various agents start to work. The salad station may start to work on the Caesar salad while the grill station starts the steak. Each agent knows how to perform their particular task and how long it takes. By communicating with each other they can ensure that the meals are ready within a few seconds of each other. Tight coupling can refer to interdependencies or to constraints common to two tasks. With Technology Preview 2, you can use LnxMPI Library (a Carbon-free, sockets-based, Message Passing Interface implementation based on MacMPI from UCLA's Project AppleSeed) to communicate among the agents. This was added in response to requests after the initial technology preview.

The Monte Carlo Method—“Embarrassingly Parallel”

At the Ontario Cancer Institute at the University of Toronto, Dr. Daniel Côté has been using Xgrid to reduce the time required to perform a Monte Carlo simulation made up of billions of calculations. This technique consists of calculations that are parallelizable and loosely coupled. Côté reported, "I have been using Xgrid since the beginning, and it has been a breeze to set up. For a technology preview, this is great." Côté works in the area of medical biophysics known as biophotonics. He explains that the best way to make predictions about the tissue he is studying is to send in one photon at a time and calculate where it goes and to score it. When you do this between ten million and one billion times, you can begin to see a picture of the tissue you are investigating.

You may have performed a simple Monte Carlo experiment when you were a student to estimate the value of pi. There the experiment consists of placing a circle of radius one inside a two-by-two unit square. You pick a random point inside the square and determine whether or not it is inside the circle. If you repeat this experiment thousands of times and divide the number of points that fall inside the circle by the total number of points, you will approximate pi/4, the ratio of the circle's area to the square's. Classroom teachers can have each student perform the experiment one hundred times and then aggregate their results to get an estimate of pi based on more readings.
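
The classroom experiment translates almost directly into code. The sketch below is an illustration only: the function names are hypothetical, and each "student" is simply a separate call with its own random seed, which is the same shape a grid job would take if each tally were handed to a different agent.

```python
import random


def count_hits(num_points, seed):
    """One 'student': count random points in the 2x2 square that land
    inside the circle of radius one centered in that square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(num_students=30, points_per_student=100):
    """Aggregate every student's tally; hits / total approximates pi/4."""
    total_hits = sum(count_hits(points_per_student, seed)
                     for seed in range(num_students))
    return 4.0 * total_hits / (num_students * points_per_student)


if __name__ == "__main__":
    print(estimate_pi())
```

Because no tally depends on any other, the calls to count_hits can run in any order or on any machine, and only the final sum has to be gathered in one place.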

In Côté's work the key to parallelization is the complete independence of the results of each photon from all past and future events. Calculating the results for one billion photons can take a very long time. Côté explained that on a single Power Mac G4 this job takes between two and three days. "With Xgrid," he says, "you take your job and send it to ten computers in ten smaller chunks and you get the job done in under six hours. The calculation is linear, so it is easy to split it up and send it to more machines. Because the problem is 'embarrassingly parallel' (some use the moniker "embarrallel"—editor), it is the kind of calculation that Xgrid is well suited for. The time reduction is almost proportional to the number of machines involved in the calculation."

Côté would love to harness even more computing power. He smiles, referring to the Virginia Tech Supercluster, and says, "I would love to have one thousand Xserves available but for now I could have a maximum of 20 or 30 Macs. We have access to tons of Linux, SunOS, and Alpha machines here or on our collaborators' networks."

For now that means that much of Côté's calculations must be performed on these other platforms using other solutions. He explained that "If you use other computers, controlling who will do what task and with what parameter and monitoring the status is extremely annoying, painful, complicated, and error prone. You often have to set it all up yourself, and it is hard. On Mac OS X, Xgrid bypasses all of that. Apple was smart enough to build Xgrid on top of open source protocols such as BEEP. This should be easy to implement on other machines."

Côté says, "I look forward to sitting at my Mac and controlling all of the machines in the institution and sending them jobs through a nice interface. That would be paradise. Mac OS X is already shining in an academic environment, because it integrates extremely well with all of the Mac and Unix software out there. Xgrid makes that even better. In addition to being the digital hub, Mac OS X could become the High Performance Computing hub."

Using Spare Cycles

While thousand-node dedicated clusters like the one at Virginia Tech are unusual, five- and ten-node clusters for bioinformatics or other research problems are becoming more common. The agents can be dedicated to performing grid tasks first or they can operate as screen saver agents that only accept grid calculations when the computer is otherwise idle. "I am thrilled to see a computer being used even where there's no one sitting at the machine," says Ethan Benatan, Reed College's Director of Computer User Services. See Figure 3: The Cluster at Reed College. At the Center for Advanced Computation at Reed College, Xgrid is being used for nonlinear-system computations. A centerpiece of this work is the evolution of a very complex, discrete epidemic model. It is estimated that 10^18 (or 1 quintillion) operations will be required to resolve certain questions about survivor sets in such epidemiological scenarios. Xgrid will be able to complete this computational effort in about one year, with a facility full of Power Mac G5's.

Figure 3: The Cluster at Reed College

Benatan explained another component to Reed's early Xgrid investigations. They were investigating whether Xgrid would work well on public stations taking advantage of idle time on a computer. "This opens up a new realm of usability," Ethan explained. "We wanted to see if we can put Xgrid in our public labs or clusters and make sure it works with no impact on the day-to-day users. It worked out great. A user could sit at a machine and not know that they were working on a machine that was part of a grid." He says that they installed Xgrid on machines known to students as a test wing and no one reported noticing any drop in performance. They are a small college with seventy-five Power Mac G4's that typically spend two-thirds to three-quarters of their time doing nothing. Xgrid allows these spare cycles to be used in much the same way that projects such as SETI@home and Folding@Home use spare cycles to search for extraterrestrial intelligence or to study protein folding while the screensaver is active.

Mathematical researchers led by Dr. Peter Borwein at Simon Fraser University in Burnaby, British Columbia have used Xgrid in their exploration of the well-known, difficult problem of finding low autocorrelation binary sequences. With the help of Xgrid, the group has harnessed the computing power of machines in student labs at the university to create a system capable of processing at more than 30 GHz.

Record-breaking results produced with the Xgrid system will be highlighted in an upcoming academic publication. Borwein says, "Xgrid has given us a tremendous number of essentially free and easy to cluster computer cycles on our lab Power Macs. It is a very efficient way to exploit our resources '24/7' on interesting research problems."

You can read more about the problem being solved on Joshua Knauer's LABS web page on Least Correlated Binary Sequences. Each calculation is straightforward. You take a sequence of ones and negative ones and add up products of pairs of numbers that occupy the same position. You do this while shifting one of the sequences over and add up the squares of these results to calculate the energy of the sequence. This number is then used to calculate the merit factor which is a measure of the self similarity of a sequence. Although none of the calculations are difficult, there are a large number of them to make and these calculations are easily split among many agents.
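
The calculation described above is short enough to write out. The sketch below assumes the standard definition of the merit factor, F = n^2 / (2E), where n is the sequence length and E is the energy; the article itself does not spell out the formula, and the function names are hypothetical rather than taken from the Simon Fraser code.

```python
def autocorrelation(seq, shift):
    """Sum of products of entries that line up when one copy of the
    sequence is shifted by `shift` positions against the other."""
    return sum(seq[i] * seq[i + shift] for i in range(len(seq) - shift))


def energy(seq):
    """Sum of the squared autocorrelations over every nonzero shift."""
    return sum(autocorrelation(seq, k) ** 2 for k in range(1, len(seq)))


def merit_factor(seq):
    """Assumed standard definition: n^2 / (2 * energy).
    Requires a sequence whose energy is nonzero."""
    n = len(seq)
    return n * n / (2 * energy(seq))


if __name__ == "__main__":
    # The length-13 Barker sequence, a classic low-autocorrelation example.
    barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
    print(energy(barker13), merit_factor(barker13))
```

For the length-13 Barker sequence the energy is 6, giving a merit factor of 169/12, or about 14.08. Each candidate sequence is scored without reference to any other, which is what makes the search easy to split among agents.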

Simon Fraser's Noah Adams says, "We didn't have to do very much to move from the version that ran on a single machine to the version that ran on Xgrid. Mainly we had to bring in random seeds. You can't really use time as your seed when you are starting 100 jobs at the same time. Xgrid supports providing this random seed. We've seen the tachometer peak at around 60 GHz while running on a grid of 700MHz eMacs."
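
The seeding problem Adams describes is easy to reproduce: jobs started in the same instant that all seed from the clock can end up exploring the same random sequence. One common remedy is to derive a distinct seed for every job from a single base seed, as sketched below. The function names are hypothetical, and the sketch does not show how Xgrid itself hands a seed to a job.

```python
import random


def job_seeds(base_seed, num_jobs):
    """Derive one distinct, reproducible seed per job from a single base seed.
    Sampling without replacement guarantees that no two jobs share a seed."""
    rng = random.Random(base_seed)
    return rng.sample(range(2**32), num_jobs)


def run_job(seed, sequence_length=13):
    """Hypothetical job: build one random sequence of +1/-1 values using a
    private generator, so that jobs never share random state."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(sequence_length)]


if __name__ == "__main__":
    seeds = job_seeds(base_seed=2005, num_jobs=100)
    candidates = [run_job(seed) for seed in seeds]
    print(len(set(seeds)), len(candidates))
```

Recording the base seed is enough to rerun the whole batch later with exactly the same 100 starting points.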

These are just some of the projects that Xgrid is being used for. Others include a FORTRAN-based jet noise prediction code, "Jet3D," tested at NASA Langley Research Center in Hampton, Virginia. The project has been run across a distributed cluster of Power Mac G5, Power Mac G4, and Xserve G4 systems using the Xfeed job submission interface of Xgrid. A total of eight G4 and two G5 processors were run, resulting in performance of approximately 32 GFLOPS.

Another Xgrid project is at NASA Langley Research Center's AAAC/Configuration Aerodynamics Branch, where Dr. Craig Hunter helped to alpha test Xgrid. Hunter's main contribution was "to validate that Xgrid worked to run FORTRAN-based software in a typical research environment and provide feedback to the developers."

These projects should give you some idea of what you might be able to do with Xgrid.

Creating Your Xgrid

Look around at the CPUs that you would put to use if you had an easy way of harnessing that unused power. Consider the problems you work on each day. If you have scientific computation that is loosely coupled and embarrassingly parallel, Xgrid can provide you with the framework to distribute your jobs among all of the available machines. Then take another look at the tachometer from Reed, and consider what you could do with 30 GHz of power.

Updated: 2005-03-09