Accessing the Cluster Machine of VECC

Provided by: vikass9

1
Accessing the Cluster Machine of VECC
  • Grid and cluster concepts
  • Grid requirements / infrastructure
  • Hardware requirements for making a cluster
  • Software requirements for cluster making
  • Open source software
  • About Quattor
  • Quattor installation
  • Other software requirements for the cluster
  • How to connect to the VECC cluster
  • How to submit a job in the VECC cluster
  • New concept of using Web/WWW
  • Future MPI work
2
What is a Grid?
A grid is a type of parallel and distributed system
that enables the sharing, selection, and
aggregation of geographically distributed
"autonomous" resources dynamically at runtime,
depending on their availability, capability,
performance, cost, and users' quality-of-service
requirements.
Grid computing
Grids take a wide variety of geographically
distributed computational resources (such as
supercomputers, computing clusters, storage
systems, data sources, instruments, people) and
present them as a single, unified resource for
solving large-scale compute- and data-intensive
computing applications. The emphasis is on:
  • Distributed supercomputing
  • High-throughput and data-intensive applications
  • Large-scale storage
  • High-speed network connectivity
3
Basic Elements of Grid Environment
4
ALICE Data Requirement
  • 1 year of Pb-Pb running: 1 PByte of data
  • 1 year of p-p running: 1 PByte of data
  • Simulations: 2 PBytes
  • Total data storage: 4 PBytes/year
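The yearly total can be checked with a few lines of arithmetic. The sketch below is only an illustration: the average-rate figure assumes 1 PByte = 10^15 bytes and a 365-day year, neither of which is stated on the slide.

```python
# Yearly ALICE data volumes from the slide, in petabytes (PB).
yearly_data_pb = {
    "Pb-Pb running": 1,
    "p-p running": 1,
    "Simulations": 2,
}

total_pb = sum(yearly_data_pb.values())

# Average sustained rate needed to write one year's data within one year.
# Assumptions (not from the slide): 1 PB = 10**15 bytes, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600
average_rate_mb_per_s = total_pb * 10**15 / SECONDS_PER_YEAR / 10**6

print(f"Total storage: {total_pb} PB/year")
print(f"Average rate:  {average_rate_mb_per_s:.1f} MB/s")
```

Under these assumptions the sustained rate is a little over 100 MB/s, which gives a feel for why the storage is spread over many sites.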
ALICE computing requirements
Simulation, data reconstruction, and analysis will
use about 10,000 PC-years. For fast pilot
analysis and calibration, a 7 MSI2k computing
resource will be needed.
Data GRID is the solution for ALICE
  • Connect high-performance computers from all
    collaborating countries with a high-speed
    secured network.
  • Implement one virtual environment that is easy
    for the end user.
5
Grid Applications
Scientific Applications
  • Distributed supercomputing (stellar dynamics)
  • High-throughput computing (parametric studies)
  • On-demand (smart instruments)
  • Data-intensive (data mining)
  • Data exploration
  • Collaborative engineering (collaborative design)
  • High energy physics
Different Applications
  • Molecular modeling for drug design
  • Brain activity analysis
  • Analogous to the electric power network (Grid)
  • Nuclear simulations
  • Environmental studies
  • Astrophysics, etc.
6
ALICE computing model / Task distribution for
Different Tiers
  • T0
  • Does first pass reconstruction
  • Stores one copy of RAW, calibration data and
    first-pass ESDs, Making of AOD and Tag Objects
  • T1
  • Does reconstructions and scheduled analysis
  • Stores second collective copy of RAW, one copy
    of all data to be kept, disk replicas of ESDs
    and AODs
  • T2
  • Does simulation and end-user analysis,
    distribution of Monte-Carlo simulated data etc.
  • Stores disk replicas of ESDs and AODs
  • T3 and T4
  • Derived Physics Data (DPD) objects will be
    processed
  • End-user analysis (Physicists)

7
Infrastructure Required for a Tier-2 / Tier-3
Centre of the ALICE Grid
  • For an individual site, resource requirements
    and procurement will vary according to its
    purpose or its place in a tiered structure such
    as MONARC.
  • An individual site is known as a cluster.
  • One can add oneself to the global Grid.
  • Registration and a certificate are required.
  • There are two main aspects to developing a
    Tier-2 / Tier-3 centre:
  • Hardware Aspects
  • Software Aspects

8
Hardware Requirement for Building a Tier-3 Centre
Money: Money matters. Money cannot purchase
research, but it provides a good platform for
doing research.
Key issue: The particular hardware specification,
in terms of advanced technology, will not affect
our research work, so strike an intelligent
balance between cost and the latest technology
when procuring hardware.
Thumb rule: Always purchase one step below the
latest hardware specification. Purchase
server-class systems according to your
requirement. No hardware-specific machine is
needed for building a cluster. Hardware cost is
always falling, so purchase the required items
only when you are about to suffer without them.
I have one slide for the specification of Tier-3
hardware.
9
Hardware Suggestion for Tier-3 Centre
We each have individually allotted money for
hardware in a Tier-3 centre.
Server machines: 4 or more (budget dependent)
  • If space/room is a problem, go for 1U and 2U
    rack-mountable servers.
  • Decide on a branded vendor according to the
    local market or availability (we can rely on
    branded products).
Particular specifications:
  • Processor: dual-CPU Xeon or equivalent,
    3.0 GHz or more
  • L2 cache: 2 MB or more
  • RAM (DDR2): 4 GB or more
  • NIC: dual Gigabit
  • Chipset: suitable for Linux (Scientific Linux
    compatible)
  • Technology: i386 (32-bit machine)
10
Hardware Suggestion for Tier-3 Centre
Storage: 1 TB or more (in a 12- or 16-port storage
box)
  • NAS (SATA): cheaper. 250 GB × N (price and
    budget dependent); procure more later as
    required. (Purchase one small UPS for
    power-failure protection.)
  • NAS (SCSI): a bit costlier but more reliable.
  • SAN: the access technique differs from NAS
    (block-level rather than file-level access);
    costlier and not very extendable.
  • FC (Fibre Channel) based storage: new
    technology, fast but costliest.
11
Hardware Suggestion for Tier-3 Centre
Bandwidth: 512 kbps or more (over VLAN)
This is the main concern, and money spent on it
is well spent because our work is data
intensive. The vendor here is whichever local
supplier is available nearest to your location,
but get a commitment that the bandwidth can be
extended according to our requirement. At VECC,
a dedicated 4 Mbps leased line for the Grid
machine is purchased from BHARTI.
LAN: one 12- or 16-port Gigabit switch
(3Com / Cisco or equivalent)
WAN: modem / router, depending on the provider
(preferably Cisco)
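To see why bandwidth is the main concern for data-intensive work, one can estimate ideal transfer times. The sketch below is a rough illustration under stated assumptions: it reads the two link figures as 512 kilobits/s and 4 megabits/s, uses a hypothetical 1 GB file, and ignores protocol overhead and link sharing.

```python
def transfer_time_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal time to move size_bytes over a link of rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s


ONE_GB = 10**9  # hypothetical 1 GB data file (decimal gigabyte)

# Link speeds, assumed to be in bits per second.
links = {
    "512 kbps minimum": 512_000,
    "4 Mbps leased line": 4_000_000,
}

for name, rate in links.items():
    hours = transfer_time_seconds(ONE_GB, rate) / 3600
    print(f"{name}: {hours:.2f} h per GB")
```

Even in this ideal case a single gigabyte takes hours on the minimum link, so real transfers will be slower still.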
12
Software Requirement for Making a Cluster
This is a very time-consuming task if we do not
want to invest a single penny. If we have money,
we can outsource it and rest, but that is not
good. The main drawbacks of outsourcing:
  • Hardware dependency
  • Maintenance (AMC) cost is also high
  • It gives no benefit or efficiency to the whole
    system
Going open source: many cluster-building software
packages or suites are available, but each needs
some expertise, because free software does not
come with spoon-feeding. Examples:
  • OSCAR: free, but the number of clients is
    limited
  • SCALI: paid, with network cards
  • Red Hat Cluster Suite: not very suitable
  • CPM (Central Processor Manager): IBM
    proprietary
  • Rocks
  • Quattor: free and best suited
Some R&D is required to select which one is best
for our requirement.
13
Software Requirement for making Cluster
Installing a Quattor Server and Client
Quattor is a large-scale management system for
managing medium to very large (>1000 node)
clusters.
Quattor is an administration toolkit for
optimizing resources.
  • Download the Quattor RPMs that match your
    system.
  • Three sets of RPMs are available:
  • i386: for all Pentium or Xeon processors, or
    any processor with the IA-32 instruction set
  • ia64: for 64-bit Intel Itanium machines
  • x86_64: for 64-bit machines that also support
    the x86 instruction set, such as AMD Opteron
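Choosing among the three RPM sets listed above comes down to the machine architecture. As a minimal sketch, the helper below maps the machine name reported by Python's `platform.machine()` to one of those sets. The mapping table and the function name `rpm_set_for` are hypothetical illustrations, not part of Quattor.

```python
import platform

# Hypothetical mapping from `uname -m` style machine names to the three
# RPM sets named on the slide; extend it for your own hardware.
RPM_SET_BY_MACHINE = {
    "i386": "i386",
    "i486": "i386",
    "i586": "i386",
    "i686": "i386",
    "ia64": "ia64",
    "x86_64": "x86_64",
    "amd64": "x86_64",
}


def rpm_set_for(machine: str) -> str:
    """Return the RPM set for a machine name, or raise for unknown ones."""
    try:
        return RPM_SET_BY_MACHINE[machine.lower()]
    except KeyError:
        raise ValueError(f"no RPM set known for {machine!r}") from None


# Report the set for the machine this script runs on.
machine = platform.machine()
try:
    print(machine, "->", rpm_set_for(machine))
except ValueError as err:
    print(err)
```

On a cluster node, running `uname -m` gives the same machine name that the script inspects.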

Site address: http://quattor.org
Package RPMs:
http://quattorsw.web.cern.ch/quattorsw/software/quattor
Requirements:
  • OS: SL3 (including SLC3) or RH Linux 7.3
  • Disk: 6.5 GB for the server, 2.5 GB per client
    OS
  • No specific hardware or software is required
    for building a Quattor cluster
14
Installing a Quattor Server and Client
  • CDB: Configuration Database
  • SWRep: Software Repository
  • AII: Automated Installation Infrastructure
  • Client
Available meta-packages:
  • quattor-client: installs the client packages
    (CCM, NCM basic components, CDB CLI, SWRep CLI)
  • quattor-cdb: installs the Configuration
    Database (CDB) server
  • quattor-cdbsql: installs the CDBSQL backend
    server
  • quattor-swrep: installs the SPMA SW Repository
    (SWRep) server
  • quattor-aii: installs the Automated
    Installation Infrastructure (AII)
Installation steps:
  • First install SLC3 and Apache, then quattor-cdb
  • Start up Apache
  • Initialize CDB
  • Add users to CDB
  • Set up cdb-cli
  • Run cdbop to check that the setup is OK
  • Install the SW Repository
  • SWRep can be installed on the same server as
    CDB or on a different one.

15
Bootstrap installation
  • Create platforms
  • Create areas
  • Upload packages
  • Query the repository
  • Generate PAN template with repository contents

Installing the AII
  • AII (Automated Installation Infrastructure) works
    on top of native RH/SL installer using PXE.
  • Anaconda/KickStart
  • DHCP server (IP address, kernel location)
  • TFTP server (boot kernel)
  • HTTP server (OS images, packages)

16
Basic Requirements for Installing a Cluster Site
Cluster building: Quattor
  • Some basic steps after Quattor installation:
  • Disable the password prompt for switching to
    client nodes
  • Make the password file the same on all clients
  • Use C3 commands
  • For high availability (if dual NIC), install
    the bonding package

Job scheduling: Condor
Download the Condor RPM and install it. Following
the manual, make some changes in the configuration
file and start the master daemon. (Some expertise
is needed.)
Cluster monitoring and job throwing: Ganglia
with UDG
Grid connectivity monitoring: MonALISA (work is
going on)
17
How to Connect to the VECC Cluster / Grid Machine
Here I will describe 2 methods for throwing jobs.
1st Method: command-line connectivity by SSH
E.g. ssh vikas@grid.veccal.ernet.in
Our machine name: grid.veccal.ernet.in
IP address: 202.41.93.67
  • For the different collaborators, the login is:
  • For Jammu University - jammu
  • For Jaipur University - jaipur
  • For Chandigarh University - chandigarh
  • For IIT Bombay - mumbai
  • For IOP Bhubaneswar - bhubaneswar
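The login names above combine with the machine name to give the SSH command for each collaborator. The sketch below only builds and prints those commands; the helper name `ssh_command` is a hypothetical illustration, and nothing is actually connected.

```python
import shlex

GRID_HOST = "grid.veccal.ernet.in"

# Collaborator logins as listed on the slide.
LOGINS = {
    "Jammu University": "jammu",
    "Jaipur University": "jaipur",
    "Chandigarh University": "chandigarh",
    "IIT Bombay": "mumbai",
    "IOP Bhubaneswar": "bhubaneswar",
}


def ssh_command(institute: str) -> list[str]:
    """Build the ssh argument list for one collaborating institute."""
    return ["ssh", f"{LOGINS[institute]}@{GRID_HOST}"]


for institute in LOGINS:
    print(f"{institute}: {shlex.join(ssh_command(institute))}")
```

The argument list form can be passed straight to `subprocess.run` if a script should open the session itself.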

18
The VECC Cluster / Grid Machine Has Two Types of
Nodes
Interactive node: at present there is only one
interactive node. After logging in to the Grid
machine, do
  ssh interactive001
and you can work on the grid machine itself. But
be NICE to others.
Computing nodes: there are 6 computing nodes
(node001 to node006). You cannot log in to these
nodes, but you can run your jobs there, in batch
mode only, not in interactive mode. I will
explain this online; the Condor batch scheduler
lets you fulfil your requirement.
19
How to Throw a Job on the Grid Machine
After logging in to the Grid machine, use Condor,
the job scheduler, for job scheduling. I will show
this online. It is a very lightweight process and
runs a few threads in the background.
Write your program as you usually do and make
everything ready. For compilation you have to use
a Condor facility: compile with the help of
condor_compile.
  condor_compile <your compilation line as before>
E.g.
  condor_compile cc -o myprogram myprogram.c
Now we have to run the compiled program. For this
we do a little work and make a submit description
file, e.g. sample.cmd. In this file we write a few
simple lines:
  executable = myprogram
  input = sample.in
  output = sample.out
  error = sample.err
  log = sample.log
  queue
Then submit the job:
  condor_submit sample.cmd
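When many jobs need similar submit description files, they can be generated by a small script. The following is a minimal sketch, not part of Condor: the function names are hypothetical, and the `sample.err` name for the error file is an assumption, chosen so that the program's error output and the Condor event log do not share one file.

```python
from pathlib import Path


def submit_description(executable: str, stem: str) -> str:
    """Return the text of a simple Condor submit description file."""
    lines = [
        f"executable = {executable}",
        f"input = {stem}.in",
        f"output = {stem}.out",
        f"error = {stem}.err",  # assumed name, kept separate from the log
        f"log = {stem}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def write_submit_file(path: str, executable: str, stem: str) -> None:
    """Write the submit description to the given path."""
    Path(path).write_text(submit_description(executable, stem))


text = submit_description("myprogram", "sample")
print(text, end="")
```

Calling `write_submit_file("sample.cmd", "myprogram", "sample")` would create the file, which is then passed to `condor_submit sample.cmd` on the Grid machine.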
20
Getting Help with All This
  • All Condor commands work like Linux commands,
    so every command you can use has a manual
    (man) page.
  • Just do:
  • man condor_compile
  • man condor_submit
  • man condor_q
  • man condor_status
  • man condor_rm

2nd Method: through WWW / Web. I will show this
one online.
21
Future Work
For doing Day One physics:
Learn MPI (Message Passing Interface) for writing
code in a parallel language or facility.
Queries, Discussion, and Suggestions