
James Reynolds considered the unused power available in the 350 Power Macintosh computers
in the University of Utah's Student Computing Labs. Each night when the
labs would close, the computers would sit idle. He and his co-workers wanted to tap
into this potential power, but implementing a distributed computing
system had not been a priority.
Then Apple's Xgrid came along, and within a few days, those Power Macs were no longer idle.
With the Xgrid technology preview, James
explains, "I was just curious and one night I tried it." By the end
of the night he had a proof of concept up and running and by the end of
the week he was using Xgrid to render frames for an animation he was
working on. "I like Xgrid so much because it is so easy," James says.
Simple grid computing was the design goal for the Apple Advanced Computation Group. They
spent the last two years looking at a modern way to make it easy to
create and use an ad hoc cluster. The result of their
efforts—Xgrid—allows almost anyone to easily run a set of
calculations on many machines using machine-dependent parameters. You can keep your focus
on the science and mathematics and not distract yourself learning the
details of setting up a network of computers. You don't have to become
a system administrator and set up user accounts and manage the network
topology. With Xgrid you click a few buttons, make a modification here
or there to your application, and take advantage of the extra power of
the grid.
James used his experience with rendering software together with information
he found in an article by Dr.
Daniel Côté (see below for more from Dr. Côté) on
using a graphics application called a POV-ray renderer (for Persistence of
Vision Raytracer) with Xgrid. He modified the generate script to use parameters
suggested by the POV-ray team, and ran tests by rendering the POV-ray benchmark
graphic. The job took 36.5 minutes using one of the two processors on a dual
processor G5 2.0GHz with 1 GB of RAM. Running it on a grid consisting primarily
of Power Mac G4/400's reduced the time to 26 minutes. Adding the G5 to the grid reduced
the total time to 24 minutes.
With the benchmark out of the way, James turned to his goal of producing an
animation of LDraw models. Altogether there were 180 jobs, each of which
consisted of rendering about 26 frames. He found it easy to experiment with
Xgrid until he got his application right. A little tweak here or there and James
smiles and repeats, "It's so easy. This isn't a tool designed for a dedicated
cluster like Virginia Tech. If you have computers that are designed to do
something other than distributed computing, and you want to run a job on multiple
machines, Xgrid is perfect."
Getting Started with Xgrid
The reaction from early users echoes Reynolds's comment about how easy
Xgrid is to set up and use. Step one is to download Xgrid and install
it on a single machine. Step two is to install Xgrid on other machines and
configure them to be available for grid calculations. There is no step
three. Xgrid is designed to take care of most of the details of distributing
your calculation so that you can concentrate on the problem you are trying to solve.
You can download Xgrid Technology Preview
2 on the Apple Advanced Computation
Group Xgrid page. To test it out, try running Xgrid on a single machine; this will give you a taste of
the three roles that are involved in every Xgrid application: Agent, Controller,
and Client.
The Agent is the worker bee in an Xgrid system. Each agent makes itself
available to a single controller. It receives computational tasks and returns
the results of these computations to the controller. Once you have installed
Xgrid, you will find a new Xgrid item in your System Preferences. This is where
you get to make key decisions about the agent. You can set the agent to accept
tasks when the computer is idle in the same way that a screensaver only runs
when the computer is not being used. If the calculation of tasks being submitted
to the grid is more important than tasks performed by a user sitting at the
machine, you can set the agent to be a dedicated agent that always accepts
tasks. The remaining configuration options allow you to choose whether the agent
will bind to a specific location, service name, or just to the first available
service. You can also require the controller to provide a password.
The Controller is the middle man. It receives jobs from clients and breaks the
jobs into tasks that it then sends to agents. It then receives the results of
the calculations from the agents and sends the information back to the client.
You can use the System Preferences to set the password that the controller uses
to communicate with agents and that the clients use to communicate with the
controller. You can think of the controller as defining the Xgrid system. Each
grid can only have a single controller but can have multiple clients and
certainly multiple agents. Since each agent is associated with only one
controller at a time, you can think of the grid as revolving around the
controller.
The Client submits a job to the grid by communicating with a controller. Once
you have installed Xgrid you can use either the Xgrid.app for a GUI version of
the client or you can open a Terminal window and use the Xgrid command-line
version. For more details on the command-line version you can check the man
pages by typing man xgrid from the command line. The clients use either
method to originate jobs and to receive the results.
The Xgrid workflow is summarized in Figure 1: Xgrid
Workflow. Jobs originate with the client and are split into tasks and
sent on to agents by the controller. The completed tasks are returned to the
controller which collects them and reports back to the client.

Figure 1: Xgrid Workflow
Cooking with Xgrid
After downloading and installing Xgrid, you are a few button presses away
from having it up and running. From the System Preferences start an agent and a
controller. Double-click the Xgrid application and choose "Start Local Service".
Select the job type "Mandelbrot" and your machine will act as client,
controller, and agent to render different regions of the famous Mandelbrot
fractal image. You can view the combined CPU power available on all the machines by watching the
needle on the Tachometer (see Figure 2: Xgrid Tachometer at Reed College). To add more power, you will have to add more agents.
The Mandelbrot example is an ideal Xgrid job. Each agent is working
independently on the same problem with different parameters.

Figure 2: Xgrid Tachometer at Reed College
A first step in identifying potential candidate applications that can benefit
from Xgrid is to distinguish serial calculations from parallelizable
calculations. Imagine you are given a frozen chicken to cook and a recipe that
basically consists of these three steps: defrost the chicken, marinate it, and
roast it. Each step must be done in sequence. Xgrid doesn't help in these cases
of serial calculations. On the other hand, imagine you are preparing a simple
vegetable platter where you need to cut up carrots, celery, and cucumber. It
doesn't really matter which order you choose to cut up your vegetables. In fact,
if you have enough counter space, cutting boards, knives, and friends you may
speed up this task dramatically. This second recipe is clearly parallelizable
and may benefit from more people working on it.
Whether you are cooking or programming, there are other issues to think about
before deciding on a distributed approach. If you are only cutting up enough
vegetables for your own lunch then it may not be worth your while to gather
friends and set up and later clean the extra cutting boards and knives. If you
are cutting up enough vegetables to serve at lunch at this year's WWDC then parallelizing the task
is clearly a good idea. The size of the job, the setup required for each task,
and the management costs for distributing the task are all factors. This is
especially true if you want to use your friend's kitchen across the street:
sometimes the time it takes to distribute the job (communication latency)
overwhelms the benefits of parallelization.
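The trade-off described above can be put into rough numbers. The sketch below is a toy model and is not taken from Xgrid: it assumes the work divides evenly among helpers and that each extra helper costs a fixed amount of time to set up. The function name `parallel_time` and all of the figures are made up for illustration.

```python
def parallel_time(work_minutes, helpers, overhead_per_helper):
    """Toy estimate of elapsed time when a job is shared among helpers.

    Assumptions (illustrative only): the work splits evenly, and each
    helper beyond the first adds a fixed distribution cost.
    """
    if helpers < 1:
        raise ValueError("need at least one helper")
    return work_minutes / helpers + overhead_per_helper * (helpers - 1)


# Lunch for one: 10 minutes of chopping, 5 minutes to set up each extra helper.
lunch_alone = parallel_time(10, 1, 5)      # 10 / 1 + 0  = 10.0
lunch_shared = parallel_time(10, 4, 5)     # 2.5 + 15    = 17.5

# Conference catering: 1000 minutes of chopping, same per-helper cost.
catering_alone = parallel_time(1000, 1, 5)    # 1000.0
catering_shared = parallel_time(1000, 10, 5)  # 100 + 45 = 145.0

print(lunch_alone, lunch_shared, catering_alone, catering_shared)
```

Under these assumed numbers the small job gets slower with four helpers, while the large job finishes in about a seventh of the time with ten.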
Another consideration is how tightly coupled the tasks are. As another food
example, it may be important in a restaurant that all dinner orders are
delivered to a table at the same time. In this case the "client" is the waiter
who delivers a ticket full of orders to a "controller" in the kitchen. The
controller then calls out the order and various agents start to work. The salad
station may start to work on the Caesar salad while the grill station starts the
steak. Each agent knows how to perform their particular task and how long it
takes. By communicating with each other they can ensure that the meals are ready
within a few seconds of each other. Tight coupling can refer to
interdependencies or to constraints common to two tasks. With Technology Preview
2, you can use LnxMPI
Library (a Carbon-free, sockets-based, Message Passing Interface
implementation based on MacMPI from UCLA's Project AppleSeed) to communicate
among the agents.
This was added in response to requests after the initial technology preview.
The Monte Carlo Method—“Embarrassingly Parallel”
At the Ontario Cancer Institute at the University of Toronto, Dr. Daniel Côté
has been using Xgrid to reduce the time required to perform a Monte Carlo
simulation made up of billions of calculations. This technique consists of
calculations that are parallelizable and loosely coupled. Côté reported, "I have
been using Xgrid since the beginning, and it has been a breeze to set up. For a
technology preview, this is great." Côté works in the area of medical biophysics
known as biophotonics. He explains that the best way to make predictions about
the tissue he is studying is to send in one photon at a time and calculate where
it goes and to score it. When you do this between ten million and one billion
times, you can begin to see a picture of the tissue you are investigating.
You may have performed a simple Monte Carlo experiment when you were a student to
estimate the value of pi. There the experiment consists of placing a circle of
radius one inside of a two by two unit square. You pick a random point inside
the square and determine whether or not it is inside the circle. If you repeat
this experiment thousands of times and divide the number of points that fall
inside the circle by the total number of points you will approximate pi/4.
Classroom teachers often have each student perform the experiment one
hundred times and then aggregate their results to get an estimate of pi based on
more readings.
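The classroom experiment can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the projects described here: the names `run_task` and `estimate_pi` are hypothetical, and the tasks run one after another on a single machine. Each call to `run_task` plays the part of one student, or of one agent on a grid.

```python
import random


def run_task(seed, n_points):
    """One independent task: count random points that land inside the circle."""
    rng = random.Random(seed)  # each task gets its own seeded generator
    hits = 0
    for _ in range(n_points):
        # Pick a point in the 2-by-2 square centered on the origin.
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # inside the circle of radius one
            hits += 1
    return hits


def estimate_pi(n_tasks, points_per_task):
    """Aggregate the per-task counts, as the teacher does with the class."""
    # Tasks run sequentially here; since none depends on another,
    # each could just as well run on a different machine.
    total_hits = sum(run_task(seed, points_per_task) for seed in range(n_tasks))
    # hits / total approximates pi/4, so multiply by 4.
    return 4.0 * total_hits / (n_tasks * points_per_task)


print(estimate_pi(n_tasks=10, points_per_task=20_000))
```

Because the tasks share nothing but the final tally, only the hit counts need to travel back to whoever is collecting the results.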
In Côté's work the key to parallelization is the complete independence of the
results of each photon from all past and future events. Calculating the results
for one billion photons can take a very long time. Côté explained
that on a single Power Mac G4 this job takes between two and three days. "With
Xgrid," he says, "you take your job and send it to ten computers in ten smaller
chunks and you get the job done in under six hours. The calculation is linear,
so it is easy to split it up and send it to more machines. Because the problem
is 'embarrassingly parallel' (some use the moniker
"embarrallel"—editor), it is the kind of calculation that Xgrid is
well suited for. The time reduction is almost proportional to the number of machines involved in the calculation."
Côté would love to harness even more computing power. He smiles, referring to
the Virginia Tech Supercluster, and says, "I would love to have one thousand
Xserves available but for now I could have a maximum of 20 or 30
Macs. We have access to tons of Linux, SunOS, and Alpha machines here or on our
collaborator's networks."
For now that means that much of Côté's calculations must be performed on these
other platforms using other solutions. He explained that "If you use other
computers, controlling who will do what task and with what parameter and
monitoring the status is extremely annoying, painful, complicated, and error
prone. You often have to set it all up yourself, and it is hard. On Mac OS X,
Xgrid bypasses all of that. Apple was smart enough to build Xgrid on top of open
source protocols such as BEEP. This should be easy to implement on other
machines."
Côté says, "I look forward to sitting at my Mac and controlling all of the
machines in the institution and sending them jobs through a nice interface. That
would be paradise. Mac OS X is already shining in an academic environment,
because it integrates extremely well with all of the Mac and Unix software out
there. Xgrid makes that even better. In addition to being the digital hub, Mac
OS X could become the High Performance Computing hub."
Using Spare Cycles
While thousand-node dedicated clusters like the one at Virginia Tech are
unusual, five- and ten-node clusters for bioinformatics or other research
problems are becoming more common. The agents can be dedicated to performing grid
tasks first or they can operate as screen saver agents that only accept grid
calculations when the computer is otherwise idle. "I am thrilled to see a
computer being used even where there's no one sitting at the machine," says
Ethan Benatan, Reed College's Director of Computer User Services. See
Figure 3: The Cluster at Reed College. At the Center for
Advanced Computation at Reed College, Xgrid is being used for nonlinear-system
computations. A centerpiece of this work is the evolution of a very complex,
discrete epidemic model. It is estimated that 10^18 (or, 1 quintillion)
operations will be required to resolve certain questions about survivor sets in
such epidemiological scenarios. Xgrid will be able to complete this computational
effort in about one year, with a facility full of Power Mac G5's.

Figure 3: The Cluster at Reed College
Benatan explained another component to Reed's early Xgrid investigations.
They were investigating whether Xgrid would work well on public stations taking
advantage of idle time on a computer. "This opens up a new realm of usability,"
Ethan explained. "We wanted to see if we can put Xgrid in our public labs or
clusters and make sure it works with no impact on the day-to-day users. It
worked out great. A user could sit at a machine and not know that they were
working on a machine that was part of a grid." He says that they installed Xgrid
on machines known to students as a test wing and no one reported noticing any drop in performance. They
are a small college with seventy-five Power Mac G4's that typically spend two-thirds to three-quarters of
their time doing nothing. Xgrid allows these spare cycles to be used in much the
same way that projects such as SETI@home and Folding@Home
use spare cycles to search for extraterrestrial intelligence or to study protein folding while the screensaver is
active.
Mathematical researchers led by Dr. Peter Borwein at Simon Fraser University
in Burnaby, British Columbia have used Xgrid in their exploration of the
well-known, difficult problem of finding low autocorrelation binary
sequences. With the help of Xgrid, the group has harnessed the computing power of
machines in student labs at the university to create a system capable of
processing at more than 30 GHz.
Record-breaking results produced with the Xgrid system will be highlighted in an upcoming academic
publication. Borwein says, "Xgrid has given us a tremendous number of essentially
free and easy to cluster computer cycles on our lab Power Macs. It is a very efficient
way to exploit our resources '24/7' on interesting research problems."
You can read more about the problem being solved on Joshua Knauer's LABS web
page on Least Correlated Binary Sequences.
Each calculation is straightforward. You take a sequence of ones and negative ones,
line it up against a copy of itself shifted over by some number of positions, and
add up the products of the pairs of numbers that occupy the same position. You do
this for each shift and add up the squares of these results to calculate the energy
of the sequence. This number is then used to calculate the merit factor, which is
a measure of the self-similarity of a sequence. Although none of the
calculations are difficult, there are a large number of them to make and these
calculations are easily split among many agents.
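The calculation just described can be written out as a short Python sketch. The article does not give the final formula, so this assumes the usual definition of the merit factor from the literature: the square of the sequence length divided by twice the energy. The function names are hypothetical and this is not the Simon Fraser group's code.

```python
def autocorrelations(seq):
    """Sum of products of overlapping entries for each shift k = 1 .. n-1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + k] for i in range(n - k)) for k in range(1, n)]


def energy(seq):
    """Sum of the squared autocorrelations over all nonzero shifts."""
    return sum(c * c for c in autocorrelations(seq))


def merit_factor(seq):
    """Assumed standard definition: n^2 / (2 * energy)."""
    n = len(seq)
    return n * n / (2 * energy(seq))


# The length-13 Barker sequence, a well-known low-autocorrelation example.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(energy(barker13), merit_factor(barker13))
```

Scoring one sequence needs no information about any other, which is why a search over many candidate sequences splits so easily among agents.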
Simon Fraser's Noah Adams says, "We didn't have to do very much to move
from the version that ran on a single machine to the version that ran on Xgrid.
Mainly we had to bring in random seeds. You can't really use time as your seed
when you are starting 100 jobs at the same time. Xgrid supports providing this
random seed. We've seen the tachometer peak at around 60 GHz while running on a
grid of 700MHz eMacs."
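Adams's point about seeds can be illustrated with a small Python sketch. It does not show how Xgrid itself supplies seeds, which the article does not describe; `make_task_seeds` is a hypothetical helper that derives a distinct seed for each task from one base value, so that jobs started at the same instant do not all draw the same random numbers.

```python
import random


def make_task_seeds(base_seed, n_tasks):
    """Derive n_tasks distinct 32-bit seeds from a single base seed."""
    rng = random.Random(base_seed)
    # sample() draws without replacement, so no two tasks share a seed.
    return rng.sample(range(2**32), n_tasks)


seeds = make_task_seeds(base_seed=2005, n_tasks=100)

# Each task builds its own generator from its own seed.
first_draws = [random.Random(s).random() for s in seeds]

print(len(set(seeds)), "distinct seeds")
```

Recording the base seed also makes a run repeatable, since the same base value always yields the same list of task seeds.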
These are just some of the projects that Xgrid is being used for. Others
include a FORTRAN-based jet noise prediction code, "Jet3D," tested at NASA
Langley Research Center in Hampton, Virginia. The project has been run across a
distributed cluster of Power Mac G5, Power Mac G4, and Xserve G4 systems using the
Xfeed job submission interface of Xgrid. A total of eight G4 and two G5 processors
were run, resulting in performance of approximately 32 GFLOPS.
Another Xgrid project is at NASA Langley Research Center's AAAC/Configuration Aerodynamics Branch, where Dr.
Craig Hunter helped to alpha test Xgrid. Hunter's main contribution was "to
validate that Xgrid worked to run FORTRAN-based software in a typical research
environment and provide feedback to the developers."
These projects should give you some idea of what you might be able to do with Xgrid.
Creating Your Xgrid
Look around at the CPUs that you would put to use if you had an easy
way of harnessing that unused power. Consider the problems you work on each
day. If you have scientific computation that is loosely coupled and
embarrassingly parallel, Xgrid can provide you with the framework to distribute
your jobs among all of the available machines. Then take another look at the
tachometer from Reed, and consider what you could do with 30 GHz of
power.
For More Information
Updated: 2005-03-09