Careful filtering will slow this prodigious rate down to a few petabytes a year. Even so, this is still too much for CERN to deal with on site. The data will therefore be disseminated through a four-tiered system of computer centres, allowing the load to be spread over thousands of individual workstations in dozens of universities around the world.
Here, the Grid will be in its element. Repeatedly searching and retrieving large data sets from thousands of computers would be extremely inefficient. So researchers at CERN are looking at ways of processing the data where it is — allowing a physicist sitting at any participating workstation to talk to the rest of the complex web of computers and data storage systems as if they were in his own laboratory.
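The principle of shipping the computation to the data, rather than the reverse, can be sketched in a few lines of Python. The `Site` class and the event records below are purely illustrative stand-ins, not anything from CERN's actual software:

```python
# Sketch: move the computation to the data, not the data to the computation.
# Rather than downloading terabytes of events, a physicist ships a small
# filter function to run at each site where the data already lives.

def count_candidate_events(events):
    """Small task shipped to the remote site; only its result travels back."""
    return sum(1 for e in events if e["energy_gev"] > 100)

class Site:
    """Toy stand-in for a remote computer centre holding a slice of the data."""
    def __init__(self, name, events):
        self.name, self.events = name, events

    def run(self, task):
        # On a real grid, `task` would be serialised and executed remotely;
        # here we simply call it on the locally held events.
        return task(self.events)

sites = [
    Site("tier1-a", [{"energy_gev": 120}, {"energy_gev": 40}]),
    Site("tier1-b", [{"energy_gev": 310}]),
]

# Only the small per-site counts cross the network, never the raw events.
total = sum(site.run(count_candidate_events) for site in sites)
print(total)  # 2
```

The design choice is the point: the function is tiny and the data sets are enormous, so moving the former is cheap while moving the latter is prohibitive.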
This gives rise to a radical new concept: “virtual data”. The idea is that new data, derived from the analysis of raw data, should not be discarded after use but saved for retrieval, in the same way that raw data are stored. In a later calculation, the Grid’s software could then automatically detect whether a computation had to be started from scratch, or could take a short cut using a previous result stored somewhere in the system.
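At its core, the virtual-data idea is a form of memoisation: derived results are stored under a key describing how they were produced, so a later request can be answered from the store instead of being recomputed. A minimal sketch, in which the cache, the key scheme and the derivation function are all assumptions for illustration:

```python
# Sketch of "virtual data": derived results are saved under a key recording
# how they were made, so later calculations can take a short cut.

cache = {}  # maps (transformation name, input id) -> stored derived data

def derive(transformation, input_id, raw_data):
    key = (transformation.__name__, input_id)
    if key in cache:
        return cache[key]                 # short cut: reuse a stored result
    result = transformation(raw_data)     # otherwise compute from scratch...
    cache[key] = result                   # ...and save it, like raw data
    return result

def select_high_energy(raw):
    return [e for e in raw if e > 100]

raw = [50, 150, 300]
first = derive(select_high_energy, "run-42", raw)   # computed from scratch
second = derive(select_high_energy, "run-42", raw)  # retrieved from the cache
print(first == second)  # True
```

The Grid version of this would have to decide, across thousands of storage systems, whether fetching a stored result is actually cheaper than recomputing it — but the bookkeeping is the same in spirit.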
The European response to all the Grid activity in America is the DataGrid project co-ordinated by CERN. Here, the objective is to develop middleware for research projects in the biological sciences, earth observation and high-energy physics. As befits a project overseen by bureaucrats in Brussels, the search for an overarching Grid standard for many different science projects has become a leitmotif. Fabrizio Gagliardi, the project manager for DataGrid, despairs at the many Grid initiatives already underway. “If we each develop similar solutions,” he says, “it simply won’t work.”
A plethora of Grid standards is a real possibility. After all, even electricity has no worldwide standard of voltage or frequency. Much of the talk at the first Global Grid Forum was about giving Grid development a sense of common cause. But once the commercial potential of the Grid begins to dawn, standard-setting skirmishes will break out between companies and even countries.
While scientists gear up to use the Grid, the question remains whether — beyond the search for such exotica as Higgs bosons, running vast protein-folding calculations or simulating the weather — there is any real need for such massive computing. Cynics reckon that the Grid is merely an excuse by computer scientists to milk the political system for more research grants so they can write yet more lines of useless code. There is some truth in this. Many of the Grid projects running today resemble solutions in search of problems. And given the number of independent Grid initiatives, much of the work under way is going to be redundant in any case. But there is always the hope that the competition will breed a better class of Grid in the end.
A more serious criticism is that today’s Grid projects focus too much on storing and analysing large data sets, and making the sort of “embarrassingly parallel” calculations that only scientists need. These are obvious applications of the Grid. The real challenge is to figure out how to achieve the more ambitious goals that Grid enthusiasts have set themselves. Some of these are outlined by Mr Foster and colleagues in a paper that will appear shortly in the International Journal of Supercomputer Applications. The authors argue that the Grid will really come into its own only when people learn to build “virtual organisations”. Such a virtual organisation could be a crisis-management team dealing with an earthquake or chemical spill. In such circumstances, the Grid would be ideal for analysing local weather, soil models, water supplies and local demographics. It could even help with communications, allowing field workers to discuss problems with office staff by means of video conferencing.
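What makes a calculation “embarrassingly parallel” is that its pieces need no communication with one another, so they can be farmed out to thousands of machines with no coordination. A minimal illustration using Python’s standard library, with `simulate` as a toy stand-in for one independent unit of work:

```python
# An "embarrassingly parallel" job: each piece is independent, so the work
# can be split across many processors with no communication between them.
from multiprocessing import Pool

def simulate(seed):
    # Stand-in for an independent unit of work, e.g. folding one protein
    # conformation or simulating one collision event.
    return (seed * 2654435761) % 1000  # toy deterministic "result"

if __name__ == "__main__":
    with Pool(4) as pool:
        # Each seed is handled in isolation; the order of completion
        # does not matter, which is exactly what a Grid exploits.
        results = pool.map(simulate, range(8))
    print(len(results))  # 8
```

Crisis-management uses of the Grid, by contrast, would mix such batch calculations with data access, communication and coordination — which is precisely why Mr Foster’s “virtual organisations” are the harder problem.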