Title: Accessing the Cluster Machine of VECC
1. Accessing the Cluster Machine of VECC
- Grid and cluster concepts
- Grid requirements / infrastructure
- Hardware requirement for making a cluster
- Software requirement for cluster making
- Open source software
- About Quattor
- Quattor installation
- Other software requirements for a cluster
- How to connect to the VECC cluster
- How to submit a job on the VECC cluster
- New concept of using Web/WWW
- Future MPI work
2. What is Grid?
Grid is a type of parallel and distributed system
that enables the sharing, selection, and
aggregation of geographically distributed
"autonomous" resources dynamically at runtime
depending on their availability, capability,
performance, cost, and users' quality-of-service
requirements.
Grid computing
Grids bring together a wide variety of geographically
distributed computational resources (such as
supercomputers, computing clusters, storage
systems, data sources, instruments, and people) and
present them as a single, unified resource for
solving large-scale compute- and data-intensive
applications. The emphasis is on:
- Distributed supercomputing
- High-throughput, data-intensive applications
- Large-scale storage
- High-speed network connectivity
3. Basic Elements of Grid Environment
4. ALICE Data Requirement
- 1 year of Pb-Pb running: 1 PByte of data
- 1 year of p-p running: 1 PByte of data
- Simulations: 2 PBytes
- Total data storage: 4 PBytes/year
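The yearly total is simply the sum of the three contributions. A minimal sketch of that arithmetic, where the dictionary and function names are illustrative and only the volumes come from the slide:

```python
# Yearly ALICE data volumes in petabytes, as quoted on the slide.
yearly_data_pb = {
    "Pb-Pb running": 1,
    "p-p running": 1,
    "simulations": 2,
}


def total_storage_pb(volumes):
    """Return the sum of all yearly data volumes, in PB."""
    return sum(volumes.values())


if __name__ == "__main__":
    for source, size in yearly_data_pb.items():
        print(f"{source}: {size} PB")
    print(f"total: {total_storage_pb(yearly_data_pb)} PB/year")
```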
ALICE computing requirements
Simulations, data reconstruction and analysis will
use about 10,000 PC-years. For fast pilot
analysis and calibration, 7 MSI2k of computing
resources will be needed.
Data Grid is the solution for ALICE
- Connect high-performance computers from all
  collaborating countries with a high-speed,
  secure network.
- Implement one virtual environment that is easy
  for the end user.
5. Grid Applications
Scientific applications
- Distributed supercomputing (stellar dynamics)
- High-throughput computing (parametric studies)
- On-demand computing (smart instruments)
- Data-intensive computing (data mining)
- Data exploration
- Collaborative engineering (collaborative design)
- High-energy physics
Different applications
- Molecular modeling for drug design
- Brain activity analysis
- Analogous to the electric power network (grid)
- Nuclear simulations
- Environmental studies
- Astrophysics, etc.
6. ALICE Computing Model / Task Distribution for Different Tiers
- T0
  - Does first-pass reconstruction
  - Stores one copy of RAW, calibration data and
    first-pass ESDs; makes AOD and Tag objects
- T1
  - Does reconstruction and scheduled analysis
  - Stores the second collective copy of RAW, one copy
    of all data to be kept, and disk replicas of ESDs
    and AODs
- T2
  - Does simulation and end-user analysis,
    distribution of Monte Carlo simulated data, etc.
  - Stores disk replicas of ESDs and AODs
- T3 / T4
  - Processes Derived Physics Data (DPD) objects
  - End-user analysis (physicists)
7. Infrastructure Required for a Tier-2 / Tier-3 Centre of the ALICE Grid
- For an individual site, resource requirements and
  procurement vary according to its purpose or its
  place in a tiered structure like MONARC.
- An individual site is known as a cluster.
- One can add oneself to the global Grid.
- Registration and a certificate are required.
- There are two main aspects to developing a Tier-2 /
  Tier-3 centre:
  - Hardware aspects
  - Software aspects
8. Hardware Requirement for Building a Tier-3 Centre
Money: Money matters. Money cannot buy research,
but it provides a good platform for doing research.
Key issue: The exact hardware specification, in terms
of advanced technology, will not affect our research
business, so strike an intelligent balance between cost
and the latest technology when procuring hardware.
Rule of thumb: Always purchase one step below the
latest hardware specification. Purchase server-class
systems according to your requirement. No
special-purpose hardware is needed for building a
cluster. Hardware cost is always falling, so purchase
the required items only when you are about to suffer
without them. I have one slide with the specification
of Tier-3 hardware.
9. Hardware Suggestion for Tier-3 Centre
We all have individually allotted money for
hardware in a Tier-3 centre.
Server machines: 4 or more (budget dependent)
- If space or room is a problem, go for 1U and 2U
  rack-mountable servers.
- According to the local market or availability,
  choose any branded vendor (we can rely on branded
  products).
Particular specifications:
- Processor: dual-CPU Xeon or equivalent, 3.0 GHz or more
- L2 cache: 2 MB or more
- RAM (DDR2): 4 GB or more
- NIC: dual Gigabit
- Chipset: suitable for Linux (Scientific Linux compatible)
- Technology: i386 (32-bit machine)
10. Hardware Suggestion for Tier-3 Centre
Storage: 1 TB or more (in a 12- or 16-port storage box)
- NAS (SATA): cheaper. 250 GB x N disks (price and
  budget dependent); procure more later as required.
  (Purchase one small UPS for power-failure protection.)
- NAS (SCSI): a bit costlier but more reliable.
- SAN: the access technique is different from NAS
  (byte-wise access); costlier and not very extendable.
- FC (Fibre Channel) based storage: new technology,
  fast but costliest.
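To size the SATA option, one can count how many 250 GB disks reach the 1 TB target and what a full 12- or 16-port box would hold. This is a rough sketch under stated assumptions: 1 TB is taken as 1000 GB, RAID and filesystem overhead are ignored, and the function names are illustrative.

```python
import math


def disks_needed(target_gb, disk_gb=250):
    """Smallest number of disks whose raw capacity reaches target_gb."""
    return math.ceil(target_gb / disk_gb)


def box_capacity_gb(ports, disk_gb=250):
    """Raw capacity of a storage box with every port populated."""
    return ports * disk_gb


if __name__ == "__main__":
    print("disks for 1 TB:", disks_needed(1000))
    for ports in (12, 16):
        print(f"{ports}-port box, fully populated: {box_capacity_gb(ports)} GB")
```

Usable capacity will be lower once redundancy is configured, so treat these numbers as upper bounds.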
11. Hardware Suggestion for Tier-3 Centre
Bandwidth: 512 kbps or more (over VLAN)
- This is the main concern, and money spent on it is
  well spent because our business is data intensive.
- The vendor here is whichever local supplier is
  available nearest to your location, but get a
  commitment to extend the link according to your
  requirement.
- At VECC, a dedicated 4 Mbps leased line for the Grid
  machine was purchased from Bharti.
LAN: one 12- or 16-port Gigabit switch (3Com / Cisco
or equivalent)
WAN: modem / router, depending on the provider
(preferably Cisco)
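To see why bandwidth is the main concern for data-intensive work, it helps to estimate how long a transfer takes on each link. A minimal sketch, assuming the VECC leased line is 4 megabits per second (the slide's unit is ambiguous), decimal units (1 GB = 10^9 bytes), and no protocol overhead; the function name is illustrative.

```python
def transfer_hours(size_gb, link_kbps):
    """Ideal time in hours to move size_gb gigabytes over a link_kbps link."""
    bits = size_gb * 1e9 * 8            # gigabytes -> bits
    seconds = bits / (link_kbps * 1e3)  # kilobits per second -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    for label, kbps in (("512 kbps", 512), ("4 Mbps", 4000)):
        print(f"1 GB over {label}: {transfer_hours(1, kbps):.2f} hours")
```

Real transfers are slower than this ideal figure, so the estimate is a lower bound on the waiting time.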
12. Software Requirement for Making a Cluster
This is a very time-consuming task if we do not
want to invest a single penny. If we have money, we
can outsource it and rest, but that is not a good
approach. Its main drawbacks are:
- Hardware dependency
- Maintenance (AMC) cost is also high
- It gives no benefit or efficiency to the whole system
Going open source, many cluster-making software
packages or suites are available. Each needs some
expertise, because being free of cost they do not
come with spoon-feeding. Examples:
- OSCAR: free, but the number of clients is limited
- SCALI: paid, with network cards
- Red Hat Cluster Suite: not very suitable
- CPM (Central Processor Manager): IBM proprietary
- Rocks
- Quattor: free and best suited
Some R&D is required to select which one is best
for our requirement.
13. Software Requirement for Making a Cluster
Installing a Quattor Server and Client
Quattor is a large-scale management system for
managing medium to very large (>1000 node)
clusters.
Quattor is an administration toolkit for
optimizing resources.
- Download the Quattor RPMs for your system.
- Three sets of RPMs are available:
  - i386: for all Pentium or Xeon processors, or any
    processor with the IA-32 instruction set
  - ia64: for 64-bit machines, meaning Intel Itanium
  - x86_64: for 64-bit machines that also support the
    x86 instruction set, like AMD Opteron
Site address: http://quattor.org
Package RPMs: http://quattorsw.web.cern.ch/quattorsw/software/quattor
Requirements: SL3 (including SLC3) or RH Linux 7.3.
Disk: 6.5 GB for the server, 2.5 GB per client OS.
No specific hardware or software is required for
building a Quattor cluster.
14. Installing a Quattor Server and Client
- CDB: Configuration Database
- SWRep: Software Repository
- AII: Automated Installation Infrastructure
Available meta-packages:
- quattor-client: installs the client packages (CCM,
  NCM basic components, CDB CLI, SWRep CLI)
- quattor-cdb: installs the Configuration Database
  (CDB) server
- quattor-cdbsql: installs the CDBSQL backend server
- quattor-swrep: installs the SPMA Software Repository
  (SWRep) server
- quattor-aii: installs the Automated Installation
  Infrastructure (AII)
Steps:
- First install SLC3 and Apache, then quattor-cdb
- Start up Apache
- Initialize CDB
- Add users to CDB
- Set up cdb-cli
- Run cdbop to check that the setup is OK
- Install the SW Repository
  - SWRep can be installed on the same server as CDB
    or on a different one.
15. Bootstrap Installation
- Create platforms
- Create areas
- Upload packages
- Query the repository
- Generate a Pan template with the repository contents
Installing the AII
- AII (Automated Installation Infrastructure) works
  on top of the native RH/SL installer using PXE:
  - Anaconda/Kickstart
  - DHCP server (IP address, kernel location)
  - TFTP server (boot kernel)
  - HTTP server (OS images, packages)
16. Basic Requirements for Installing a Cluster Site
Cluster building: Quattor
- Some basic steps after Quattor installation:
  - Disable the password prompt for switching to client nodes
  - Make the password file the same on all clients
  - C3 commands
  - For high availability (if dual NIC), install the
    bonding package
Job scheduling: Condor
Download the Condor RPM and install it. Following the
manual, make some changes in the configuration file and
start the master daemon. (Some expertise is needed.)
Cluster monitoring and job throwing: Ganglia with UDG
Grid connectivity monitoring: MonALISA (work is going on)
17. How to Connect to the VECC Cluster / Grid Machine
Here I will describe two methods for throwing jobs.
1st method: command-line connectivity
By SSH, e.g. ssh vikas@grid.veccal.ernet.in
Our machine name: grid.veccal.ernet.in
IP address: 202.41.93.67
- For different collaborators, the login is:
  - For Jammu University: jammu
  - For Jaipur University: jaipur
  - For Chandigarh University: chandigarh
  - For IIT Bombay: mumbai
  - For IOP Bhubaneswar: bhubaneswar
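The login table above can be turned into a small lookup that builds the ssh command for each collaborator. This is only an illustrative sketch: the dictionary and function names are made up here, and the host name and logins are the ones listed on the slide.

```python
GRID_HOST = "grid.veccal.ernet.in"

# Institution -> login name, as listed on the slide.
COLLABORATOR_LOGINS = {
    "Jammu University": "jammu",
    "Jaipur University": "jaipur",
    "Chandigarh University": "chandigarh",
    "IIT Bombay": "mumbai",
    "IOP Bhubaneswar": "bhubaneswar",
}


def ssh_command(institution, host=GRID_HOST):
    """Return the ssh argument list for the given collaborating institution."""
    login = COLLABORATOR_LOGINS[institution]
    return ["ssh", f"{login}@{host}"]


if __name__ == "__main__":
    for name in COLLABORATOR_LOGINS:
        print(" ".join(ssh_command(name)))
```

The returned list can be passed to subprocess.run if you want to open the session from a script rather than typing the command by hand.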
18. The VECC Cluster / Grid Machine Has Two Types of Nodes
Interactive node: at present there is only one
interactive node. After logging in to the Grid
machine, run
ssh interactive001
and you can work on the Grid machine itself. But be
NICE to others.
Computing nodes: there are 6 computing nodes
(node001 to node006). You cannot log in to these
nodes, but you can run your jobs there in batch
mode, not in interactive mode. I will explain this
online; the Condor batch scheduler fulfils this
requirement.
19. How to Throw a Job on the Grid Machine
After logging in to the Grid machine, use Condor, the
job scheduler, for job scheduling. I will show this
online. It is a very lightweight process and runs a
few threads in the background.
Write your program as you normally would and make
everything ready. For compilation you have to use
some Condor facilities: compile with the help of
condor_compile.
condor_compile <write your compilation line as earlier>
E.g. condor_compile cc -o myprogram myprogram.c
Now we have to run the compiled program. For this we
have to do a little work to make a submit description
file, e.g. sample.cmd. In this file we write a few
simple lines:
executable = myprogram
input = sample.in
output = sample.out
error = sample.err
log = sample.log
queue
Then submit it:
condor_submit sample.cmd
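If you submit many jobs, the submit description file can be generated rather than typed each time. A minimal sketch, assuming the usual name = value syntax of Condor submit files; the function name is illustrative, and using a separate sample.err file for standard error is an assumption made here so that error output and the Condor log do not share one file.

```python
def submit_description(executable, stdin_file, stdout_file, stderr_file, log_file):
    """Build the text of a Condor submit description file."""
    lines = [
        f"executable = {executable}",
        f"input = {stdin_file}",
        f"output = {stdout_file}",
        f"error = {stderr_file}",
        f"log = {log_file}",
        "queue",  # queue one instance of the job
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = submit_description(
        "myprogram", "sample.in", "sample.out", "sample.err", "sample.log"
    )
    # Write the file that is later passed to condor_submit.
    with open("sample.cmd", "w") as handle:
        handle.write(text)
    print(text, end="")
```

After running the script, submit the generated file with condor_submit sample.cmd as before.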
20. Getting Help for All This
- All Condor commands are like Linux commands, so
  every command you can use has manual (man) pages.
  Just do:
  - man condor_compile
  - man condor_submit
  - man condor_q
  - man condor_status
  - man condor_rm
2nd method: through WWW / Web. This one I will
show online.
21. Future Work
For doing Day One physics:
Learn MPI (Message Passing Interface) for writing
code in a parallel language or facility.
Queries, discussion and suggestions