GRID

Purpose

The LHC experiments will produce unprecedented amounts of data, reaching several PB in a standard ALICE data-taking year. Handling such amounts of data calls for automated procedures for analysis and data management. Grid techniques will be used to handle the worldwide computing task necessary to extract the ALICE physics results.

The general idea of a computing grid is to make worldwide computing resources seamlessly available, in analogy to the way the World Wide Web makes information available. Several Grid prototypes are being developed, but a wide range of challenges remains before a full-fledged Grid solution can be provided.


Grid systems

Several Grid systems are being developed, among them the gLite/EGEE system from a European project and the Open Science Grid with American roots. The ARC middleware developed by the NorduGrid project is actively used in the Nordic countries. The ALICE collaboration has developed the Grid prototype AliEn (ALICE Environment). As other middleware evolves, AliEn will be changed into a common high-level interface for the ALICE collaboration.

Grid dictionary

Grid is a rapidly evolving technology. With its roots in Computer Science, it is hardly surprising that it comes with a lot of names and acronyms. The following is a non-exhaustive list of terms relevant to the ALICE Grid activity:

AliEn - Alice Environment 
Grid prototype developed by the ALICE collaboration. Originally used as a "standalone" Grid prototype, AliEn will be further developed with interfaces to major Grid middleware systems, and will be used as a common platform for all Grid based analysis in ALICE.
LCG - LHC Computing Grid 
A common project set up to coordinate computing needs for all LHC experiments. LCG is also the name of a Grid middleware package, which will be merged into the software to be developed by the common European project EGEE. The LCG project will be maintained in order to organise LHC computing (which involves far more than middleware development).
EGEE - Enabling Grids for E-Science in Europe 
EU-funded project to develop a general Grid middleware for scientific needs. The EGEE project has strong roots in the original LCG project. The current version of the EGEE middleware is called gLite.
gLite 
Common European middleware developed by the EGEE project (see above). This middleware will be used by most of the European Tier-1 centres (the Nordic centre being an exception).
OSG - Open Science Grid 
US computing infrastructure including Grid middleware. It will be used by US and other computing centres, including for LHC computing.
NorduGrid 
A Nordic project that has developed a Grid prototype now distributed as ARC. This middleware is in widespread use in Scandinavia and Northern Europe, and has been widely used in ATLAS data challenges.
ARC - Advanced Resource Connector 
Middleware developed by the NorduGrid project. This middleware will be used by the Nordic Tier-1 centre. Interfacing between AliEn and ARC will be developed.
Tier-x 
The computer centres that will take part in LHC data processing have been organised in a hierarchical structure. Tier-0 is at CERN, where all data will originally be recorded. There will be a limited number of Tier-1 centres, which will be manned/serviced 24/7 and which will provide permanent storage facilities along with computing resources. Tier-2 centres will connect to CERN through a Tier-1 centre and will mainly provide computing resources.
Nordic Tier-1 centre 
The Nordic countries have agreed to provide a distributed Tier-1 centre for LHC processing. This centre will appear as one entity towards the central facility at CERN, but will be physically distributed among computing centres in the Nordic countries. All the participating centres will fulfil the service level requirements for a Tier-1 centre. The Nordic Tier-1 centre will use ARC as its middleware for the internal distribution of jobs and resources.
NDGF - Nordic Data Grid Facility 
The organisation that has been set up to manage the Nordic Tier-1 centre. No machines are owned or run by NDGF; the organisation provides resources and some manpower to existing computing centres.
Condor 
A system to handle jobs in a heterogeneous distributed computer environment. ARC uses Condor to handle jobs inside each participating computer cluster.
VO - Virtual Organisation 
A logical Grid subdivision that groups Grid users who belong to the same project and should have access to the same resources. As an example, ALICE will be a VO.
VO Box 
In a full-fledged Grid all necessary computing resources should be handled through Grid mechanisms, and the user should not care where a given job is actually handled. As Grid middleware is still under development, and as LHC needs data processing at startup in 2007, an "interim" solution has been agreed upon: each VO gets access to dedicated computing resources at each participating centre, where specific software can be installed. The persons responsible for an experiment may have root access on the VO Box, but not on the general computing resources.
CE - Computing element 
The part of the Grid middleware that does job management and handles CPU resources. The AliEn installation at the hansa cluster in Bergen will use Condor as its Computing Element.
SE - Storage element 
The part of the Grid middleware that handles files residing locally. In Tier-1 centres, which will provide permanent storage for Grid-wide access, this will normally be some storage management system including a tape robot and staging space on disk. At the hansa cluster a 250 GB disk is set aside to operate as a Grid SE for AliEn.
FTD - File Transfer Daemon 
The part of the Grid middleware that handles transport of files between sites. This should happen transparently, so that the user in principle does not need to be aware of where a file is actually located. Authentication is handled through Grid certificates, which are somewhat analogous to the SSH keys used for traditional remote computing.
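
The hierarchy described under Tier-x above can be pictured as a simple tree in which every centre knows the centre it connects through. The following Python sketch is only an illustration of that structure: the class `Centre`, the method `route_to_tier0` and the Tier-2 centre name are hypothetical and are not part of AliEn, ARC or any other Grid middleware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Centre:
    """A computing centre in the Tier hierarchy (illustrative only)."""
    name: str
    tier: int
    parent: Optional["Centre"] = None  # the centre this one connects through

    def route_to_tier0(self) -> List[str]:
        """Return the names of the centres from this one up to Tier-0."""
        path = []
        node = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path


# Tier-0 is at CERN; the Nordic Tier-1 connects to it directly.
cern = Centre("CERN", 0)
nordic_t1 = Centre("Nordic Tier-1", 1, parent=cern)
# Hypothetical Tier-2 centre, invented for this example.
example_t2 = Centre("example Tier-2", 2, parent=nordic_t1)

print(" -> ".join(example_t2.route_to_tier0()))
```

The point of the sketch is only that a Tier-2 centre has no direct link to CERN; its route always passes through a Tier-1 centre.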

AliEn in Bergen

System resources for ALICE Grid production will be operated by Parallab connected to the Nordic Data Grid Facility. The NDGF will run ARC as its basic middleware. The necessary software to interface ARC and AliEn is currently being developed.

For development and prototyping purposes, Grid software is also installed in the local cluster at the Department of Physics and Technology, called hansa. The following information provides more details on the Grid/AliEn installation on the hansa cluster.


Needed tools

The AliEn software needs an underlying batch system to run and distribute jobs. AliEn supports many different batch systems, such as BQS, CONDOR, DQS, Globus, LSF and PBS. For hansa we are using Condor, in order to be able to use its flocking feature. AliEn will be installed on top of Condor so that hansa can participate in the AliEn grid.
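
To give an idea of what the batch system underneath works with, the sketch below renders a minimal Condor submit description file of the kind a user would pass to `condor_submit` by hand. It is an illustration only: the helper `render_submit_description` and the job name are hypothetical, and this is not how AliEn itself hands jobs to Condor on hansa.

```python
def render_submit_description(executable, arguments="", job_name="job"):
    """Build the text of a minimal Condor submit description file.

    Hypothetical helper for illustration; file names are derived from
    job_name, which is an assumption of this sketch.
    """
    lines = [
        "universe   = vanilla",            # plain, unmodified executable
        f"executable = {executable}",
        f"arguments  = {arguments}",
        f"output     = {job_name}.out",    # stdout of the job
        f"error      = {job_name}.err",    # stderr of the job
        f"log        = {job_name}.log",    # Condor's own job event log
        "queue",                           # submit one instance of the job
    ]
    return "\n".join(lines) + "\n"


# Example: a trivial job that reports which node it ran on.
print(render_submit_description("/bin/hostname", job_name="hello"))
```

Saving the printed text to a file and running `condor_submit` on it would queue the job in the local pool. Whether the job may also run in another pool through flocking is decided by the pool configuration, not by the submit description.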

Installing Condor

Condor is developed by the University of Wisconsin-Madison.

A detailed description of the Condor Installation on hansa can be found here.


Installing AliEn

AliEn is developed at CERN as GRID middleware for ALICE.

A detailed description of the AliEn installation on hansa00 can be found here (preliminary).

A general description of installing AliEn is here.


GRID Meeting Minutes

Here are the minutes from our (weekly) GRID meeting at Høygskolen, starting with 27 November 2006.