Virtual computers reconfigure on the fly
By Ted Smalley Bowen, Technology Research News
Grid computing, which pieces together temporary,
virtual computers from resources on the Internet, is in theory a good
way to handle tough number-crunching tasks that change over time.
Grid software combines the muscle of a few or even hundreds of computers
by coordinating scheduling and security across the different types of
systems. These combined resources are needed to speed up scientific and
engineering applications that frequently involve complicated equations
and elaborate graphical simulations.
But early efforts have only been able to handle simple, relatively predictable
programs, rather than the complex, custom programs run by scientists,
engineers, and their ilk.
What is lacking is a reliable mechanism for maintaining sufficient levels
of compute power for the duration of a virtual Grid computer's tasks.
Grid applications need to be able to monitor the resources and performance
of the systems that fuel them and switch to other appropriate systems
when the original contributors fail to meet their requirements.
Toward this end, a group of researchers at Argonne National Laboratory,
Lawrence Berkeley National Laboratory, the University of Chicago, and the
Max Planck Institute for Gravitational Physics has developed software
that reconfigures a virtual Grid computer on the fly in order to keep
it humming.
This adaptive approach is designed to help existing Grid computing software
address the compute power problem. "Grid computing must be adaptive, because...
one is required to operate in an environment about which one has imperfect
knowledge and that has dynamically varying characteristics," said Ian
Foster, a professor of computer science at the University of Chicago,
and an associate director in the Department of Computer Science at Argonne
National Laboratory in Argonne, IL.
The researchers' software uses notification and event services to determine
when things change, said Foster.
To create the system, the researchers started with the Cactus set of Grid
computing tools, which allow programmers to run groups of calculations
in parallel across multiple computers that can range from PCs to supercomputers.
The researchers also used the Globus toolkit to provide Grid resource
discovery, access, location, migration and security functions.
The researchers added programs for adapting applications to run on different
types of computer systems, for detecting drops in performance, for finding
appropriate resources, and for handling the migration process.
They also added software that keeps tabs on the progress of a given program
through a series of checkpoints in order to carry that information over
to new systems as they are recruited.
The checkpoints save a snapshot of the computation in a form that permits
the job to be shifted to another system, even one that has a very different
architecture and operating system, or different amounts of disk space
and memory, said Foster.
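The idea of a checkpoint that survives a move between dissimilar machines can be sketched in a few lines. This is only an illustration, not the researchers' code: the function names `save_checkpoint` and `restore_checkpoint`, the state fields, and the choice of JSON as the portable format are all assumptions made for the example. The point is that the snapshot is written in a text form that does not depend on the byte order, word size, or operating system of the machine that produced it.

```python
import json


def save_checkpoint(step, grid_values, params):
    """Serialize the computation state to portable text.

    Hypothetical helper: the field names are illustrative only.
    JSON is used because it is independent of byte order and OS.
    """
    snapshot = {
        "step": step,                      # last completed time step
        "grid_values": list(grid_values),  # simulation data at that step
        "params": dict(params),            # run parameters to carry over
    }
    return json.dumps(snapshot)


def restore_checkpoint(text):
    """Rebuild the state on the new system from the portable snapshot."""
    snapshot = json.loads(text)
    return snapshot["step"], snapshot["grid_values"], snapshot["params"]


# Example: snapshot on one machine, resume on another.
saved = save_checkpoint(42, [0.5, 1.25], {"dt": 0.01})
step, values, params = restore_checkpoint(saved)
```

A real scientific code would likely use a binary portable format for large arrays; plain text is chosen here only to keep the sketch short.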
The various systems involved in a virtual Grid computer using the researchers'
software must perform to the standards of a contract between the user
and the systems providing the compute power. If a contract is broken,
the software finds other resources and reconfigures the virtual computer.
The researchers evaluated their software on several Grid testbeds, loading
down virtual Grid computers with more and more tasks until performance
dropped by more than 10 percent. They set the software so it found alternative
resources that gave the bogged-down virtual computer more compute power
after three such drop-offs.
The researchers' system currently requires the operators to monitor this
performance manually. "We obtain per-time-step measurements, and monitor
according to a user-specified definition of what forms a contract violation.
Future plans have us doing this automatically," Foster said.
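The trigger used in the evaluation, a performance drop of more than 10 percent observed three times, can be expressed as a small monitor fed with per-time-step measurements. The sketch below is a guess at the shape of such a check, not the researchers' implementation: the class name `ContractMonitor`, its parameters, and the decision to count drop-offs cumulatively rather than consecutively are assumptions.

```python
class ContractMonitor:
    """Counts contract violations from per-time-step measurements.

    Hypothetical sketch: a violation is a measured rate more than
    `max_drop` below the baseline; migration is signalled once
    `violations_allowed` violations have been seen (cumulative count,
    which is an assumption; the article does not say).
    """

    def __init__(self, baseline_rate, max_drop=0.10, violations_allowed=3):
        self.baseline_rate = baseline_rate
        self.max_drop = max_drop
        self.violations_allowed = violations_allowed
        self.violations = 0

    def record(self, rate):
        """Record one time step's rate; return True if migration is due."""
        if rate < self.baseline_rate * (1 - self.max_drop):
            self.violations += 1
        return self.violations >= self.violations_allowed


# Example: baseline of 100 steps per hour, progressively loaded system.
monitor = ContractMonitor(baseline_rate=100.0)
decisions = [monitor.record(rate) for rate in (95.0, 85.0, 80.0, 70.0)]
```

In this run the first measurement is within the contract, and the third violation, on the fourth measurement, is the one that signals a migration.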
The experiment involved no scheduling software, although eventually computers
participating in Grid applications will be subject to random use and will
need to prioritize their resources. "So far, we assume no scheduling technology:
applications discover unloaded servers, and initiate computation there
if authorized," said Foster.
The researchers plan to add asynchronous notification of resources, meaning
an application can begin at a lower speed or fidelity, and improve if
and when more resources become available, he said.
The software is powerful and generic enough that it can accommodate many
different variables for determining application migration, said Henri
Casanova, a researcher in the Grid computing lab at the University of
California San Diego. It is likely to "motivate Grid application developers
to architect their applications in ways that will support migration,"
he said.
In doing so it will open up the interesting questions of how to decide
whether to trigger migration, and when and where to do it, Casanova said.
The work also opens the way for a more detailed exploration of Grid computing
issues like scheduling, resource selection, and application adaptability,
he said.
In general, the scheme is best suited for large applications that must
run over long periods, said Casanova. In large scientific simulations
that consume large amounts of tightly coordinated resources, migration
will be useful if the cost of migrating is not greater than the cost of
running the application on potentially sub-optimal resources, he said.
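Casanova's condition can be written as a simple comparison of estimated completion times. The following is a minimal sketch under stated assumptions, not a method from the study: the function `should_migrate` and its parameters are hypothetical, rates are assumed constant for the rest of the run, and the migration cost is assumed to be a single fixed delay.

```python
def should_migrate(remaining_steps, current_rate, candidate_rate,
                   migration_seconds):
    """Return True if moving is estimated to finish sooner than staying.

    Hypothetical model: rates are in time steps per second and are
    assumed to stay constant; migration_seconds covers checkpointing,
    transfer, and restart on the new resources.
    """
    time_if_staying = remaining_steps / current_rate
    time_if_moving = migration_seconds + remaining_steps / candidate_rate
    return time_if_moving < time_if_staying


# A long run justifies a 100-second migration; a short one does not.
long_run = should_migrate(1000, 1.0, 2.0, 100)
short_run = should_migrate(100, 1.0, 2.0, 100)
```

With these numbers the long run would finish in 600 seconds after moving versus 1,000 seconds in place, while the short run would take 150 seconds after moving versus 100 in place, which matches the observation that the scheme suits large, long-running applications.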
Foster’s colleagues in the study were Gabrielle Allen, Gerd Lanfermann,
Thomas Radke and Ed Seidel of the Max Planck Institute, David Angulo
and Chuang Liu of the University of Chicago, and John Shalf of Lawrence
Berkeley National Laboratory. The work is slated to appear in an upcoming
issue of the International Journal of Supercomputer Applications. The
study was funded by the National Science Foundation (NSF).
Timeline: Now
Funding: Government
TRN Categories: Distributed Computing; Applied Computing;
Supercomputing
Story Type: News
Related Elements: Technical paper, “The Cactus Worm: Experiments
with Dynamic Resource Discovery and Allocation in a Grid Environment”,
slated for publication in November in the International Journal of Supercomputer
Applications.
November 28, 2001