Yet another headache is latency — the delays that build up as data are transmitted over the Internet. The speed of light sets a limit on how fast electronic (or, indeed, optical) signals can travel. A round trip halfway around the earth in an optical fibre takes about two-tenths of a second, an aeon for an impatient processor. Smart software is needed to ensure just-in-time data delivery. Otherwise, the range of problems that the Grid will be able to deal with will be confined to the so-called “embarrassingly parallel”.
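The latency figure quoted above can be checked with a back-of-envelope calculation. This is only a sketch: the 20,000 km path length and the refractive index of 1.5 are round-number assumptions for light travelling halfway around the earth in silica fibre.

```python
# Back-of-envelope check of light-speed latency in optical fibre.
# Assumptions: light in silica fibre travels at roughly c / 1.5,
# and "halfway around the earth" is about 20,000 km of fibre.
C = 299_792_458            # speed of light in vacuum, m/s
REFRACTIVE_INDEX = 1.5     # typical for silica optical fibre
DISTANCE_M = 20_000_000    # half the earth's circumference, in metres

one_way = DISTANCE_M / (C / REFRACTIVE_INDEX)   # seconds, one direction
round_trip = 2 * one_way                        # request plus response

print(f"one way:    {one_way * 1000:.0f} ms")
print(f"round trip: {round_trip * 1000:.0f} ms")
```

The one-way delay comes out at roughly a tenth of a second, and a request-and-response round trip at two-tenths — hundreds of millions of clock cycles for a gigahertz processor left twiddling its thumbs.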
Such computations are carried out on different machines that do not need to wait for results from one another to proceed. This is much simpler to organise than the parallel processing typically run on commodity clusters, where the calculations have to move in lockstep, sharing information at regular intervals. It is more primitive still when compared with advanced supercomputers such as IBM’s Blue Gene, in which constant communication between processors is the core of the design.
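The distinction can be made concrete with a minimal sketch. The functions and data below are invented for illustration (plain Python standing in for machines on a Grid, not real Grid middleware): the first pattern hands out fully independent work units, while the second must synchronise neighbours after every step.

```python
# Two styles of parallel work, sketched with stand-in functions.

def embarrassingly_parallel(chunks):
    # Each work unit is independent: any machine can take any chunk,
    # in any order, and never waits on a result from a neighbour --
    # so network latency barely matters.
    return [sum(x * x for x in chunk) for chunk in chunks]

def lockstep(cells, steps):
    # Cluster-style parallelism: after every step, each cell needs its
    # neighbours' latest values before anyone may proceed, so the
    # slowest machine (or slowest link) paces the whole job.
    for _ in range(steps):
        updated = []
        for i, cell in enumerate(cells):
            left = cells[i - 1] if i > 0 else 0.0
            right = cells[i + 1] if i < len(cells) - 1 else 0.0
            updated.append((left + cell + right) / 3)  # simple smoothing
        cells = updated  # implicit barrier: all cells advance together
    return cells

chunks = [range(i * 100, (i + 1) * 100) for i in range(4)]
partials = embarrassingly_parallel(chunks)       # order irrelevant
smoothed = lockstep([1.0, 0.0, 0.0, 0.0], steps=3)  # order essential
print(sum(partials), smoothed)
```

In the first pattern the results are only combined at the very end; in the second, every step is a rendezvous, which is why such codes suffer badly on a high-latency network like the Internet.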
Challenging as this technical issue may be, more mundane problems could be a greater nuisance. Much to the chagrin of Grid purists, the system will probably have to include a means of brokering computer power. This will be needed for accounting purposes, especially when commercial applications are involved. At the first Global Grid Forum in Amsterdam last March, Bob Aiken, a manager at Cisco Systems in San Jose, California, warned that the biggest challenges to the successful deployment of the Grid will be social and political rather than technical. Several academics have already tried to devise solutions to this problem — by incorporating some of the business tricks adopted by the peer-to-peer companies. But until there are large applications running on the Grid, such proposals remain literally academic.
Debugged by Science
As with the Internet, scientific computing will be the first to benefit from the Grid — and the first to have to deal with the Grid’s teething problems. For instance, GriPhyN is a Grid being developed by a consortium of American laboratories for physics projects. One such study aims to analyse the enormous amounts of data logged during digital surveys of the whole sky using large telescopes. The Earth System Grid is part of another American academic initiative. In this case, the object is to make huge climate simulations spanning hundreds of years, and then analyse the massive banks of data that result. Other initiatives include an Earthquake Engineering Simulation Grid, a Particle Physics Data Grid, and an Information Power Grid Project supported by NASA for massive engineering calculations.
Perhaps the most urgent example of where a Grid solution is needed is at CERN, the European high-energy physics laboratory near Geneva. It is here, beneath the green fields straddling the French border, that the next-generation Large Hadron Collider (LHC) will produce data at unheard-of rates when it starts running in 2005. The particle collisions in the LHC’s underground ring will spew out petabytes (billions of megabytes) of data per second — enough to fill all the hard-drives in the world within days.