Title: Grid Computing: an introduction
1Grid Computing an introduction
- José C. Cunha, DI-FCT/UNL
2Distributed and Parallel Computing
- Distributed Computing
- Parallel Computing
- Grid Computing
4Distributed Computing
- Physically distributed computations and data
- Goals
- Adapt to geographical application distribution
- Provide appropriate levels of transparency
- Geographical distribution (LAN or WAN)
- Users / Access / Processing / Archiving Sites
- Availability and Reliability
- Fault tolerance / Redundancy
5Transparency
- Depends on the layer
- Failure
- Communication (message, RPC, memory)
- Design choices can be revised
- Interactions: events, uncertainty, causality
- Loose / tight interactions / collaboration
- Pessimistic / Optimistic Choices (DBs)
- Sometimes there is no choice
- mobility, disconnected operation
6Transparency and Virtualisation
- Transparency and Awareness
- The concept of transparency has been revised as time passes
- Raw hardware, Assembly, High-Level Languages, etc.; Operating Systems; text editors and processing tools; ...
- The Grid is one of the current revisions
8Parallel Computing
- Goal: to reduce execution time, compared to sequential execution
- Computer System Architectures
- Supercomputers
- Shared / Distributed memory multiprocessors
- LANs and Clusters of PCs
- Parallel Programming requires
- Decompose application in parts
- Launch tasks in parallel processes
- Plan the cooperation between tasks
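The three requirements above can be illustrated with a minimal Python sketch. The problem (a sum of squares), the function names and the use of a thread pool are assumptions made for illustration only; `concurrent.futures.ProcessPoolExecutor` offers the same interface when separate processes are wanted.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(data, parts):
    """Step 1: split the input into roughly equal contiguous chunks."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum(chunk):
    """Work done independently by each task."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    chunks = decompose(data, workers)
    # Step 2: launch one task per chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Step 3: the cooperation here is a single final reduction.
    return sum(partials)
```

In this sketch the tasks only cooperate at the end; applications whose tasks exchange data during execution need an explicit communication plan as well.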
10In 2006... there are increased reasons to exploit parallelism
11 Classical Application Areas
- Science and Engineering
- Fluid Dynamics
- Particle Systems in Physics
- Weather Forecast and Climate
- Simulation of VLSI systems
- Parallel Search in Databases
- Artificial Intelligence
- ...
12Great Application Successes
The development of scalable massively parallel computers was motivated largely by a set of Grand Challenge applications (courtesy Prof. David Walker)
- Climate modelling: to understand the Earth's climate and predict future changes
- Computational fluid dynamics: to design aerospace vehicles and cars
- Numerical turbulence: to develop realistic fluid and particle simulations of plasma turbulence to optimise performance of fusion devices
- Rational drug design: to discover / design new drugs with simulations of macro-molecular structure
13- But the application profiles have changed
14Evolution of Application Characteristics
- Complex models simulations
- Large volumes of input / generated data
- Difficult interpretation and classification
- High degree of User interaction
- Offline / online data processing / visualisation
- Distinct user interfaces
- Computational steering
- Multidisciplinary
- Heterogeneous models / components
- Interactions among multiple users / collaboration
- Require parallel and distributed processing
15Heterogeneous Components
- Sequential, Parallel, Distributed Problem Solvers (simulators, mathematical packages, etc.)
- Tools for data / result processing, interpretation and visualisation
- Online access to scientific data sets and databases
- Interactive (online) computational steering
17Ambitious application requirements
- Distinct operation modes (offline/online)
- Distinct user interfaces
- User / Agent driven control
- Dynamic modification of operation modes and interactions
- Multiple users concurrently join ongoing experiments with distinct roles (observers, controllers)
18Complex cycle of user activities
- Problem specification
- Configuration of the environment
- Component selection (simulation, control, visualisation) and configuration
- Component activation and mapping
- Initial set up of simulation parameters
- Start of execution, possibly with monitoring, visualisation and steering
- Analysis of intermediate / final results
19- Requirements
- To meet more complex applications
- To ease the cycle of application development, deployment and control
- To integrate heterogeneous components into an environment
- To allow transparent access to parallel and distributed resources
- To support collaborative modeling and simulation
20Problem-solving perspective
- Integrated environments for solving a class of
related problems in an application domain
21Problem-Solving Environments
- A different approach
- Specific methods for each problem domain are encapsulated in components (libraries, packages, OO class repositories)
- Development and runtime support tools are also made available
- Application components and computational tools are integrated into a single unified environment (PSE)
- Easy to use by the end-user
22PSE Functionalities
- Support for problem specification
- Resource management
- Execution support services
23PSEs Users and Developers
- End-users (scientists, engineers, etc.)
- Solve a particular problem in a specific application domain
- Perform experiments
- PSE Developers
- Develop new algorithms and techniques
- Integrate them into components and place them in component repositories
- Develop tools to support problem specification and application composition
- Develop tools to help the user choose the best solutions and to locate the resources
- Use the services and interfaces built by the System Developers
24PSEs Problem Specification / Solution
- Use either
- Visual programming environment to link software components
- High-level language specification
- Recommender systems can be used to help the user choose the best way to solve a problem and locate the required software
- They are very important to enable use of a complex computing environment
25Software Components
- Components specified in terms of their input and output interfaces
- User ignores internal details of components
- Components can be interconnected within a visual development environment
26Plug and Play Components (courtesy Prof. David
Walker)
- Can link the output of one component to the input of another.
- Store components in a repository.
See Triana for example: http://triana.co.uk/
27An Example
28Impact of PSEs in many areas (1990-1999-2000...)
- Fully developed PSEs in the Industry, e.g. Automotive, Aerospace
- Many applications in Science and Engineering
- Design optimisation
- Application behavior studies (parameter sweeping)
- Rapid prototyping
- Decision support
- Process control
- Emerging areas: Education, Environment, Health, Finance
- A new profile of end-user, beyond the scientist and engineer
30Computer technology advances
31Storage evolution (Carl Kesselman)
- Storage density doubles every 12 months
- Dramatic growth in online data (1 petabyte = 1000 terabytes = 1,000,000 gigabytes)
- 2000: 0.5 petabyte
- 2005: 10 petabytes
- 2010: 100 petabytes
- 2015: 1000 petabytes?
- Transforming entire disciplines in physical and, increasingly, biological sciences etc.
32Networks evolution (Carl Kesselman)
- Network vs. computer performance
- Computer speed doubles every 18 months
- Network speed doubles every 9 months
- Difference: an order of magnitude per 5 years
- 1986 to 2000
- Computers: x 500
- Networks: x 340,000
- 2001 to 2010
- Computers: x 60
- Networks: x 4000
Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vinod Khosla, Kleiner, Caufield and Perkins.
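The "order of magnitude per 5 years" figure follows directly from the two doubling periods quoted on this slide. A small worked sketch (the function name is an assumption for illustration):

```python
def growth_factor(years, doubling_months):
    """Improvement factor after `years` if performance doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


computers = growth_factor(5, 18)   # doubling every 18 months: roughly 10x in 5 years
networks = growth_factor(5, 9)     # doubling every 9 months: roughly 100x in 5 years
ratio = networks / computers       # gap between the two: roughly 10x
```

Because the network doubling period is half the computer doubling period, the gap after any interval equals the computer growth factor over that same interval.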
33Enabling factors
- The Internet
- Broadband communications (eg optical-based)
- Faster processors / HPC using standard / open OS
- World Wide Web infrastructure and services
34Major Phases
- Networking: TCP/IP
- Communications: Internet and e-mail
- Information: the Web
- Computing: the Grid
35Computing milestones
- Mainframes, time-sharing, Unix, minicomputers: 1960-70
- PCs, commercial Unix, Crays, Workstations, MPPs: 1980s
- Clusters, PVM, Linux, PDAs, Open Source, P2P: 1990s
- Globus Project, 1G Grids: 1995-2000
- 2G Grids, OGSI/OGSA, 3G Grids: 2000-05
- Mainframes, 1960s: focus on efficient exploitation of shared resources, virtual monitor concept, time-sharing
- Minicomputers, microcomputers, desktops, 1970-80s: large dissemination of computing power
- Client-server computing, 1990: distributed functionality to the endpoints, the clients
- Developments on networks and interconnects, 1980-90s: rise of commercial Internet
- Now is the Grid time
36Communication milestones
- Packet switching, e-mail, ARPAnet, LANs/Ethernet, TCP/IP: 1960-70-80
- Internet era, broadband, WWW, wireless: 1990s
- Fibre channel, Gigabit Ethernet, Web services, XML: 1995-2000
- Internet: from a military project (DARPA) to academic NSF projects
- 1969: ARPAnet had 4 nodes
- mid-1970s: around 30 university, military, and gov. sites
- 1974/78: TCP/IP (Transmission Control Protocol / Internet Protocol)
- 1983: ARPAnet had hundreds of nodes
- NSFnet, 1980s: scientific communication network to access NSF supercomputer centers
- mid-1980s: NSF and DARPA joined efforts; IETF (Internet Engineering Task Force) shaped the modern Internet
37Communication milestones
- WWW: mid-late 1980s, goal: to share information
- HyperText Markup Language (HTML): a standard to create/organise docs
- HyperText Transfer Protocol (HTTP), browsers and servers: to link and access docs online, transparently
- W3C(onsortium), mid-1990s: new standards for information interchange (XML etc)
- SONET/DWDM (Synchronous Optical Network / Dense Wavelength Division Multiplexing): optical technology, late 1990s, early 2000s
- - Provides broadband connectivity and services at reasonable prices
- - Corporate WANs at 155 Mbps vs USA 56 Kbps in the mid-1980s
- - at 2.5 Gbps since mid-1990s
- - OC-768 (about 40 Gbps)
- - A single fiber in the range of 1 Tbps using high-density DWDM, but still out of reach of individual organisations
38Communication milestones
- Recent past
- Theoretical WAN performance doubled every 9-12 months, supported by optical technology
- But commercial user-available bandwidth (BW) has grown at a much slower rate
39How about the theoretical maximum communication
speed?
- 1. In general, the max. available speed is not affordable by the end-user
- Communication speed/cost will continue to increase
- But quality high-speed BW will hardly ever be free
- Providers must react to the continuous pressure for more multiplexing of wavelengths in DWDM products
- And profits are in order
- the individual end-user will hardly afford the theoretical performance
- Cf. actual cost of a bottle of mineral water
- 2. Plus the effects of the overheads due to the communication protocol layers
- 3. And also, it all depends on the application profile: is it CPU-bound or I/O-bound?
- A grid job splits into multiple components which are spread on the grid
- → need to locate one another
- → to establish communication connections
- → to send data
40History of sharing
- 1965: MIT Multics operating system (multi-user time-sharing system)
- A computer facility should operate like a power company or water company
- Late 1960s-early 1970s: when computers were first linked by data communication networks, the ARPAnet supported early experiments on exploiting unused remote machine cycles
- 1973: Xerox PARC worm program replicated itself in about 100 Ethernet-connected computers
- Each worm used idle resources to perform a computation
- Could replicate and send clones to other nodes
- Since 1990s: parallel and distributed computing
- Widely available PCs and workstations
- High-speed networks such as Gigabit Ethernet
- Clusters for HPC
41History of sharing
- Clusters motivated interest in
- Aggregating distributed resources to solve complex problems via parallel computing and also to support reliability via redundancy
- 2002: NSF installed the TeraGrid, a transcontinental (virtual) supercomputer; set up HPC clusters at 4 sites (NCSA / ANL in Illinois, and Caltech / SDSC in California)
- aimed at problems with requirements in the TFLOPS range
43Modern applications demanding more ambitious
goals
- Enable heavy applications in science and engineering
- Complex simulations with visualisation and steering
- Access and analysis of large remote datasets
- Access to remote data sources and special instruments (satellite data, particle accelerators)
- distributed in wide-area networks, and
- accessed through collaborative and multi-disciplinary PSEs, via Web Portals.
45The Grid
- Treat CPU cycles and software like commodities.
- Enable the coordinated use of geographically distributed resources in the absence of central control and existing trust relationships.
- Computing power is produced much like utilities such as power and water are produced for consumers.
- Users will have access to power on demand
- "When the Network is as fast as the computer's internal links, the machine disintegrates across the Net into a set of special purpose appliances" (Gilder Technology Report, June 2000)
This slide is courtesy of Professor Jack Dongarra
46US Software Infrastructure, 1998
The Grid is a computational and network
infrastructure providing pervasive, uniform, and
reliable access to distributed resources.
- Globus provides core services for grid-enabled computing: http://www.globus.org/
47Concept of a Grid
- Gathers a diversity of resources, distributed at large-scale
- supercomputers and parallel machines, and clusters
- massive storage systems
- databases and data sources
- special devices
- Provides globally unified access to virtual resources
- Transient: to support experiments
- (computation, data, scientific instruments)
- Persistent
- (databases, catalogues, archives)
- Collaboration spaces
48What is a Grid Computing System
- A virtualised computing environment
- Enabling dynamic runtime selection, sharing, aggregation of geographically distributed autonomous resources
- Based on the availability, capability, performance and cost
- Based on an application's or organisation's requirements
- Relies on a highly interconnected networking infrastructure
50Related concepts
- Virtualisation
- IBM: allows several OSs to run simultaneously on one large computer (Virtual Machine Monitor)
- Generic approach to
- Allow logical access to types of remote, heterogeneous, and distributed resources
- As if they were a single larger homogeneous resource, locally available
- Applies to computation, storage, and network resources and to any other LOGICAL RESOURCE
- Dynamically adjust resource mappings to match application demands
51Virtualisation
- The logical functions of the server, storage and network resources are separated from their physical functions and representations (processors, memories, I/O devices, switches).
- Resources are aggregated into pools
- Elements from the pools are allocated, provisioned, managed, manually or automatically, to meet application demands
52Virtualisation examples
- Processes
- Server
- Network
- Storage
- Data center: groups of servers, storage, and network resources can be reallocated on the fly
- Software resources
53Cluster computing
- Aggregate processors locally in parallel-based configurations, integrate them and provide access as a single unified resource
- Central resource manager and scheduler
- Centralised control and knowledge of system and user states
- Typically owned by a single organisation
54Cluster vs Grids
- Clusters: focus on datacenter, single organisation
- Grids: focus on geo distributed, multi-organisation, utility-based (outsourced) networking
55Changing perspectives - Grid Views
- The Grid. Use distributed hardware and software infrastructure → reliable, pervasive, inexpensive access to computational resources irrespective of physical location or access point.
- The Consumer Grid. Services and resources anywhere. Issues of dynamic resource discovery, trust, and digital reputation.
- Application Service Provider. Provide or sell computational or data services via Web.
- Virtual Organisation. Group of people or institutions with some common purpose that need to share resources.
56Grids Towards uniform and standard large-scale
computing environments
- Analogy to the Electrical Power Grid
- Simple local interface
- Transparency
- Pervasive access
- Secure
- Dependable
- Efficient
- Inexpensive
- The Computational, Data, and Interaction Grids
- Not really true (yet!?)
57The Transparent Grid
- Transparency: the user is not aware (and doesn't care) what computing resources are used to solve their problem
- Similarly, in an electrical grid we ignore the source of the power
- Heterogeneity
- Resource discovery
- Scheduling
Distributed computing issues
59EGEE: Enabling Grids for E-science in Europe (www.eu-egee.org), EU IST project
60The Grid metaphor
62the future Grid!
63The Pervasive Grid
- Pervasive: the Grid can be accessed from any networked device, e.g. laptop, mobile phone, PDA, etc.
- In the electrical analogy, any appliance can access power through a standard interface, e.g. a wall socket.
- Standard interfaces
- Protocols
- Legacy software
64The transparent grid access
65Grid is an evolving field
- Multiple views, perspectives
- Concepts, models and architectures still being defined and tested
- Applications still emerging
- Wide variety of interests
66The main questions
- Grid benefits, challenges, status and directions
- Grid architectures
- Portal and UI, User and node security, Brokers, Schedulers, Data managers, Job and resource managers
- Standardisation efforts
- Architecture: OGSA/OGSI (Open Grid Service Arch/Infrast)
- Execution Models: Workflows, Events, Transactions
- System services: Security, Monitoring, Billing and Accounting, Implementation (Globus Toolkit)
- Economics
- Grid deployment
- Local, national, and global grids
67Applications and benefits
- The Grid can be seen as an evolution of
- Parallel and Distributed computing
- The Web
- And Virtualisation concepts
- As such, the Grid will probably improve existing
application types, and will enable new types of
applications
68Applications example
- Virtual access to special instruments
- electron microscopes, particle accelerators, wind tunnels,
- coupled with remote supercomputers, DBs,
- to enable
- interactive use,
- online scenario comparisons,
- and collaborative data analysis
69Applications example
- Virtual access to distributed supercomputing
- For complex computations
- Migrate CPU-bound operations to more powerful
remote computing resources supported by large
virtual supercomputers, assembled to solve
problems too large to fit on a single computer
system
70Applications example
- Collaborative engineering
- Design of complex systems
- Based on highly interactive environments
- Relying on high-bandwidth access to shared virtual spaces, supporting
- Interactive manipulation of shared datasets
- Management of complex simulations
71Applications example
- Parameter studies
- Rapid, large-scale parametric studies
- A single program is run many times
- To explore a multidimensional parameter space
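A parameter study of this kind can be sketched in a few lines of Python. The `simulate` function and its two parameters are hypothetical stand-ins for the real program; on a grid each call would be submitted as an independent job rather than run in a local loop.

```python
from itertools import product


def simulate(viscosity, velocity):
    """Hypothetical stand-in for the single program that is run many times."""
    return viscosity * velocity ** 2


def parameter_sweep(program, grid):
    """Run `program` once for every point of the multidimensional parameter space."""
    names = list(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = program(**params)
    return results


grid = {"viscosity": [0.1, 0.2], "velocity": [1.0, 2.0, 3.0]}
results = parameter_sweep(simulate, grid)  # one run per (viscosity, velocity) pair
```

Since the runs do not depend on one another, the number of jobs is simply the product of the number of values per dimension (here 2 x 3 = 6).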
72Summary Grid applications
- Distributed supercomputing: for Computational science and Engineering
- High-capacity throughput: large-scale simulation/chip design, and parameter studies
- Content sharing: digital contents
- Data-intensive: drug design, particle physics, stock prediction, etc.
- On-demand: real-time medical instrumentation, mission-critical
- Collaborative: e-science, e-engineering, design, data exploration, education, e-learning
- Remote software access/renting services (ASP and Web services)
- Utility/service-oriented computing
73Question: Is this just an academic exercise? No!
- Real application needs
- Solve new or larger problems by aggregating available resources at large-scale
- for bigger, longer experiments, and more accurate models
- Easier access to remote resources
- a large diversity of computation, data and information services
- Increased levels of interaction for increased productivity and capability to analyse and react
- enable coordinated resource sharing and collaboration across virtual organisations
74Applications and User Profiles
- Computational Grids
- provide a single point of access to a high-performance computing service
- Scientific Data Grids
- Access large datasets with optimized data transfers and interactions for data processing
- Virtual Organisations and Interactions
- Access to virtual environments for resource sharing, user interaction and collaboration
- Real-time interactions for decision support
- Information and Knowledge services
- Access large geographically distributed data repositories, e.g. for data mining applications
75Grid benefits
- Resource sharing
- Transparent access to remote resources
- Efficient exploitation of resources: reduce execution time, large-scale data processing, support load smoothing across the network, exploit time and work differences
- Enable the concept of a virtual data center
- Access to remote DB and software
- Reduce the local services needed
- On-demand aggregation of resources, to meet dynamic needs (including real-time response)
- Fault-tolerance and dependability
76Ultimate goal
- Allows an organisation to
- Integrate and share heterogeneous pools of resources (physical and logical)
- Presenting them as one large, cohesive, virtual, transparent computing system
- In order to deliver agreed services at specified levels of quality (application functionality, efficiency and performance)
77Grid mechanisms
- To enable online discovery and access to distributed resources
- And online collaboration
78Grid ideas
- Internet: a network of communication
- Grid: a network of cooperation / computation
- Grid relies on the ability to negotiate resource-sharing among partners (providers and consumers) and using the resulting resource pool for some specific application goal
79Grid Views
80View - Computational Grids
- Service-oriented view
- NetSolve: an example
81 View Grids as Frameworks for Application
Service Providers
- Application Service Provider. Provide or sell computational services via web interface.
- Provide remote services such as compute cycles, specific applications, or storage.
- Selective outsourcing: certain functions are performed remotely.
- Application hosting: remote sites act as application servers.
- Browser-based computing: online applications accessible through web site.
- (Courtesy Prof. David Walker)
82An Example NetSolve as a Scientific ASP
- A client-server system for remote solutions of complex scientific problems
- On request performs computational tasks on a set of servers
- Searches for computational resources on a network, chooses the best one available, and returns the answers to the user.
- Based on agents or resource brokers
- Developed by Professor Jack Dongarra and colleagues at University of Tennessee, Knoxville
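The agent/broker idea can be sketched as follows. This is not NetSolve's actual API: the `Server` and `Agent` classes, the problem names and the choice of "lowest current load" as the meaning of "best" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    problems: frozenset   # problem types this server can solve (hypothetical names)
    load: float           # current load; lower is assumed to be better


class Agent:
    """Toy resource broker: knows the servers and picks one per request."""

    def __init__(self, servers):
        self.servers = list(servers)

    def choose(self, problem):
        candidates = [s for s in self.servers if problem in s.problems]
        if not candidates:
            raise LookupError(f"no server offers {problem!r}")
        # Assumption: the least loaded capable server is the "best one available".
        return min(candidates, key=lambda s: s.load)


servers = [
    Server("S1", frozenset({"linsolve"}), load=0.9),
    Server("S2", frozenset({"linsolve", "fft"}), load=0.2),
    Server("S3", frozenset({"fft"}), load=0.1),
]
agent = Agent(servers)
chosen = agent.choose("linsolve")  # the client would then send its request to this server
```

A real broker would also weigh network distance and server speed; the point of the sketch is only that the client asks the agent and never selects a server itself.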
83NetSolve The Big Picture (David Walker)
[Figure: Client (Matlab, Mathematica, C, Fortran, Java, Excel); AGENT(s) with Schedule Database; servers S1, S2, S3, S4]
84Data grids
- Aggregate underused/unused storage
- Into a larger virtual data store
- For improved performance and reliability and for
increased capacity
85- Storage
- a file or a DB can span multiple physical devices
- a unifying distributed file system can solve this problem
- storage hierarchy
- - primary (attached to a CPU)
- - secondary (in hard disks such as RAID)
- - tertiary (in near-real-time accessible media such as tape)
- - distributed
- Using mountable network file systems
- such as Network File System (NFS), Distributed File System (DFS) or
- General Parallel File System (GPFS)
- DB management software can federate a group of individual DBs and files to build a larger DB
86- Grid file systems can manage automatic replication of files or data sets
- for performance and reliability
- Applications may require different semantics for synchronous replication of data files and so require specific data placement decisions exploiting locality of access; this may critically affect the resulting performance
- → an intelligent grid data scheduler can consider not only the computational requirements of an application but also its data requirements, based on usage patterns and replication needs
- and then can schedule jobs closer to the data
- and/or on processors with direct SAN access to storage devices
- → Need to revise traditional scheduling strategies and models, typically based on computational requirements only
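A data-aware placement decision of the kind described on this slide can be sketched as a simple cost comparison. The dictionary layout, the field names and the linear cost model (queue wait plus a per-gigabyte transfer penalty) are assumptions for illustration, not a description of any particular grid scheduler.

```python
def schedule(job, sites, transfer_cost_per_gb=1.0):
    """Pick the site with the lowest combined compute-wait and data-movement cost."""

    def cost(site):
        # Only inputs without a local replica have to be moved to the site.
        missing_gb = sum(
            size for name, size in job["inputs"].items()
            if name not in site["replicas"]
        )
        return site["queue_wait"] + transfer_cost_per_gb * missing_gb

    return min(sites, key=cost)


sites = [
    {"name": "A", "queue_wait": 5.0, "replicas": {"run1.dat"}},
    {"name": "B", "queue_wait": 1.0, "replicas": set()},
]
job = {"inputs": {"run1.dat": 50.0}}  # input file name -> size in GB

best = schedule(job, sites)  # site A: a longer queue, but the data is already there
```

Setting `transfer_cost_per_gb` to zero reproduces a traditional scheduler that looks at computational availability only, and in this example it would pick site B instead.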
87View Scientific Data Grids
- EU DataGrid projects
- Large-scale environment for accessing and analysing large amounts of data
- High energy physics, Biology, Earth observation
- Petabytes of data (1 petabyte = 1 000 000 gigabytes)
- Thousands of researchers
- Scalable storage of datasets: replicated, catalogued, distributed in distinct sites
88Distributed Computing Grid Experiences in CMS
Data Challenge
A. Fanfani, Dept. of Physics and INFN, Bologna
- Introduction about LHC and CMS
- CMS Production on Grid
- CMS Data challenge
89Large Hadron Collider LHC
- bunch-crossing rate: 40 MHz
- ~20 p-p collisions for each bunch-crossing
- p-p collisions: ~10^9 evt/s (Hz)
90CMS detector
91CMS Data Acquisition
- Bunch crossing: 40 MHz
- 1 event is ~1 MB in size
- Collision rate: ~GHz (~PB/sec)
- Online system: multi-level trigger to filter out uninteresting events and reduce data volume
- After the Level 1 Trigger (special hardware): 75 kHz (75 GB/sec)
- Data recording: 100 Hz (100 MB/sec)
- Offline analysis
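The data rates on this slide are event rates multiplied by the event size of about 1 MB. A short worked sketch, using decimal units and the slide's approximate figures (the variable names are assumptions):

```python
EVENT_SIZE_BYTES = 1e6  # slide: one event is about 1 MB


def data_rate(events_per_second, event_size=EVENT_SIZE_BYTES):
    """Bytes per second produced at a given event rate."""
    return events_per_second * event_size


collisions_per_second = 40e6 * 20          # 40 MHz crossings x ~20 collisions = 8e8, i.e. ~1e9
raw = data_rate(collisions_per_second)     # 8e14 B/s, i.e. of the order of 1 PB/s
level1 = data_rate(75e3)                   # 75 kHz  -> 7.5e10 B/s = 75 GB/s
recorded = data_rate(100)                  # 100 Hz  -> 1e8 B/s   = 100 MB/s
reduction = collisions_per_second / 100    # the trigger chain keeps about 1 event in 8 million
```

The last figure shows why filtering has to happen online: only a tiny fraction of the collisions can ever be written to storage.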
92CMS Computing
- Large amounts of events will be available when the detector starts collecting data
- Large-scale distributed Computing and Data Access
- Must handle PetaBytes per year
- Tens of thousands of CPUs
- Tens of thousands of jobs
- heterogeneity of resources
- hardware, software, architecture and Personnel
- Physical distribution of the CMS Collaboration
93CMS Computing Hierarchy
[Figure: tiered computing model; 1 PC ~ PIII 1 GHz]
- Online system → Offline farm: ~PB/sec in, ~100 MB/sec recorded data
- Tier 0: CERN Computer center, ~10K PCs
- - Filter → raw data
- - Data Reconstruction
- - Data Recording
- - Distribution to Tier-1
- Tier 1: Regional Centers (Italy, Fermilab, France, ...), ~2K PCs, network link ~2.4 Gbits/sec
- - Permanent data storage and management
- - Data-heavy analysis
- - Re-processing
- - Simulation
- - Regional support
- Tier 2: Tier2 Centers, ~500 PCs, network link ~0.6 - 2. Gbits/sec
- - Well-managed disk storage
- - Simulation
- - End-user analysis
- Tier 3: Institutes (InstituteA, InstituteB, ...), workstations, network link ~100-1000 Mbits/sec
94View - Virtual Organisations
- Resource sharing and collaboration between dynamically changing collections of individuals and organisations
- e.g. Consortium of companies collaborating in the design of a new product
- Sharing design data, Collaborative simulations, etc.
- e.g. Scientists collaborating in common experiments via a distributed virtual laboratory
95Example Collaborative Immersive Visualisation
- Scientific simulations, experiments, and observations generate vast amounts of data that often overwhelm data management, analysis, and visualization capabilities.
- Observer appears to be in the same space as the visualised data and can navigate within the visualisation space relative to the data.
- Important in interpreting and extracting insights from the data.
- Several observers can co-exist in the same visualisation space - ideal for remote collaboration.
- CAVE: a fully immersive environment. Systems with stereoscopic projections onto 3 walls and the floor.
- ImmersaDesk or stereoscopic workstation: projects stereoscopic images onto a single flat panel display.
96CAVE
98Virtual organisations (VO)
- A set of entities (individuals and institutions) defining a set of resource sharing and access rules
- Highly controlled sharing
- What is shared
- Who is allowed access
- Conditions to allow such sharing
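The three questions above (what, who, under which conditions) map naturally onto a small rule table. The sketch below is only illustrative: the rule layout, the resource and member names and the `allowed` function are assumptions, not part of any grid middleware.

```python
def allowed(rules, user, resource, context):
    """True if some rule shares `resource` with `user` and its condition holds."""
    return any(
        rule["resource"] == resource          # what is shared
        and user in rule["members"]           # who is allowed access
        and rule["condition"](context)        # conditions to allow such sharing
        for rule in rules
    )


rules = [
    {
        "resource": "design-data",
        "members": {"alice", "bob"},
        "condition": lambda ctx: ctx.get("purpose") == "simulation",
    },
]

ok = allowed(rules, "alice", "design-data", {"purpose": "simulation"})
```

Anything not covered by a rule is denied by default, which matches the "highly controlled sharing" stated on the slide.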
99Keys
- Resource sharing and problem-solving in dynamic multi-institutional VOs
- Service providers
- Application
- Storage
- Machine-cycles (computation)
- Collaboration in industry consortia
100Commercial, IT, data center applications
- First grid generations had limitations, namely for database interoperability
- This has motivated approaches for business-centric solutions, developed by commercial software and DB suppliers
101Commercial and financial
- Enabling
- Data-mining, pattern-detection, scenario-modeling processes
- Applied to banks, credit card processing, financial institutions
- Improve the financial transaction flow, better understanding of customer profitability, and risk modeling done in real time (knowledge-based analysis and simulation are common in financial firms)
102Financial applications
- Instead of
- Manually subdivide algorithms
- Run them on separate machines
- Manually merge and integrate the results
- Exploit grid tools to do the same, more or less automatically, in a virtualised environment
103Business goals
- Improve
- Utilisation
- Responsiveness
- Reduce IT costs
104Traditionally
- Business applications
- Dedicated platforms of servers and storage devices associated to each server
- Not able to share resources
- Not exploiting abilities to predict, anticipate, and exploit expected levels of processing loads
- Design for excess capacity to handle excess peak loads
- Higher overall costs
105Virtualisation of resources
- Exploit
- Synergistic integration
- Economies of scale
- Load smoothing
- Due to the sharing and aggregation of distributed resources
- And the delivery of services in a highly transparent way to the end-user
- Several solutions
- Dedicated local Clusters
- Grids
106Cost savings
- Cluster computing
- Aggregating processors in parallel-based configurations
- Reductions in IT costs and costs of operations, confirmed.
- Enterprise grids
- Middleware-based to exploit unused CPU cycles → avoiding growth/expansion costs
- Expected savings.
107Expectations
- 2005-06: the Grid will become commercially viable
- Early adoption for enterprise applications, at single-site and multi-site
- Exploitation of solutions from Web services and utility computing
- By 2005, a significant 50% of companies were already aware of the IT utility model for outsourcing (IT services from Service Providers as a commodity)
- A significant share of companies have some sort of utility computing and a significant share of IT services are being delivered from offshore centers
- Uncertainties remain about cost, security, and integration with existing IT systems
108Grid for enterprises
- Obtain computing services over networks from remote Service Providers
- Aggregate an organisation's dispersed set of independent resources into one unified single virtual environment
109Data grids
- Connection
- connect DBs at different locations in a single company
- Significant savings in finding information → staff efficiency gains
- Requires large investment in broadband links to connect remote data centers
110Cluster Computational Grid
- Processing power for HPC
- Big saving in processing time → efficiency and savings in R&D costs
- No initial impact on broadband until cluster computing evolves to an enterprise grid
111- Cluster/Local Grid
- few homogeneous processors connected in a data center on a LAN or SAN (Storage Area Network) → more a cluster than a grid
- under the same OS and a central administration
- Enterprise or IntraGrids
- heterogeneous processors and OS, geo distributed and interconnected by Intranet links (or high-quality, high-throughput, high-security communications)
- owned by different departments of a single organisation
- may be structured as a hierarchy: cluster of clusters
112Enterprise Grid
- Processing power and connection within a single company, links R&D centers at different geo locations
- Efficiency due to processing power and access to data
- Savings on R&D times and time-to-market
- Investment in broadband links: require very high speed due to the large amount of data transmitted
113Partner Grid
- Processing power and connection for multiple companies
- Savings in design time and R&D time, and time-to-market
- More efficient collaboration between partners in a supply chain relationship
- Significant investment in secure, high-performance, broadband links between the companies
114- Enterprise Grids
- require policies and operations to control actual use of grid resources, based on priorities, and kinds of applications
- also requiring security control across distinct departments
- Global Grids
- cross organisation borders
- more critical security
- allow sharing, trading, brokering resources over global pools
115Web Services
- Provide secure Internet access to new services for consumers and business
- Develops closely alongside cluster and data grids
- Big gain in productivity
- savings in cost of offering services and in time-to-market of new services
- requires a data grid-like structure to provide rapid updating of information
- Large spending on broadband to link data centers
- Significant spending on software and integration services
- Example: Bank of America over the Internet
116How to evolve to a Grid?
- Transform individual components (computers, storage, networks) into an aggregated, virtual pool of resources, to be allocated and monitored automatically
- Provide defined business services on the basis of specified goals and priorities: develop and automate policies and service-level objectives to manage the needed applications and resources
- Build an enterprise grid infrastructure using open-source and vendors' proprietary tools
- Enable these tools to comply with new standards, and combine components together.
117How?
- Concept of outsourcing
- Delegate the provision of a service to an external, reliable and trusted supplier
- Install the concept of utility computing
- → Expected as a major trend in the 2010s
- Virtualisation of resources
- Dynamically manage and adjust a logical pool of resources and their mappings to share the physical infrastructure
118Virtualisation without limit
- → Application software and licenses
- Specific business software may be installed on a few designated grid processors and be shared among clients.
- eventually limiting the number of concurrent users → virtual licenses
- Compare with installing the same licensed software on thousands of servers
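The virtual-license idea can be pictured as a counter of seats shared by all clients. The sketch below is only an illustration under assumptions: the class name `VirtualLicensePool` and the seat count are hypothetical, and a real grid would enforce the limit in a license service rather than inside one process.

```python
import threading


class VirtualLicensePool:
    """Hypothetical pool that caps concurrent users of one licensed package."""

    def __init__(self, seats: int) -> None:
        self._seats = threading.BoundedSemaphore(seats)

    def try_acquire(self) -> bool:
        # Non-blocking: returns False when every seat is already in use.
        return self._seats.acquire(blocking=False)

    def release(self) -> None:
        # Give a seat back so another client can use the software.
        self._seats.release()


pool = VirtualLicensePool(seats=2)
granted = [pool.try_acquire() for _ in range(3)]  # the third request is refused
pool.release()                                    # one client finishes
granted_after_release = pool.try_acquire()        # the freed seat is reused
```

The software is installed once, and the pool only decides how many clients may use it at the same time.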
119Grid requirements include
- Online negotiation of access to services: who, what, why, when, how
- Establishment of applications and systems able to deliver multiple qualities of service
- Autonomic management of infrastructure elements
- Dynamic formation and management of virtual organisations
- Open, extensible, evolvable infrastructure
121More Complex Applications and Environments
- Large number of components
- Complex interactions
- Dynamic configuration
122Software Engineering Challenges
- Suitable levels of flexibility in all stages of the software lifecycle
- Application specification and design
- Program transformation and refinement
- Simulation
- Code generation
- Configuration and deployment
- Coordination and control of the execution
123Issues - 1
- Clear separation and representation of concepts
- Computation and interaction
- Structure and behaviour
- Specification of multiple components
- Enabling alternative mappings
- Varying degrees of automated processing
- Supported by pattern and template repositories
with relevant attributes
124Issues - 2
- Mapping the programming models onto the underlying computing platforms
- Interacting with resource descriptions and discovery services
- For flexible configuration and deployment
- Coordination of distributed execution
- Allowing workflow descriptions
- With adaptability and dynamic reconfiguration
125Component Based Development / Software Architecture
- Repositories (Skeletons/Templates/Patterns)
- Abstract Description Language
- specify, design, compose
- For structure, behaviour, computation, and interaction
- Mappings
- verify, analyse, evaluate, predict
- Programming Levels (Models)
- Resource Description and Discovery
- Deploy and Configure
- Grid Execution Environments
- control, coordinate, execute, reconfigure
- Methodology
126Global conceptual layers
- Software architectures
- Coordination models
- Resource management
- Execution, monitoring and control
- Support infrastructures
128 1 - Software Architectures
- Specification of components, their composition and interactions
- Modeling and reasoning on global structure and behavior
- Specification languages
- for structure and behavior
- incremental refinement and dynamic composition
129 2 - Coordination models
- Represent and manage interaction patterns among components
- Communication and cooperation models
- Consistency guarantees
- Abstract, logical, dynamic organisation models
- Dynamic application structure, interaction
patterns and operation modes
130Handle dynamic characteristics
- Looking at the past
- Fault tolerance, Load balancing, Task spawning
- At present and in the future
- Changes in the configuration and availability of resources, variations of characteristics and behaviour
- Changes at the application level: user control of a dynamic experiment
- Flexibility to build PSEs
- Mobility of agents and devices
131 3 - Resource management
- Configuration of parallel and distributed virtual machines
- Resource discovery, scheduling, and reservation
- Execution and monitoring at local and large scales
- Quality of service
132- Need to be fair and efficient in
- locating software resources
- negotiating for use of resources
- scheduling components on distributed resources to achieve
- Minimum execution time
- Maximum throughput
- Need to be able to monitor resource usage and level of availability
- Need for Resource Specification Languages
- A difficult problem in a dynamic environment.
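To make the "minimum execution time" goal concrete, here is a minimal sketch of one common heuristic: place the largest components first, each on the resource that would finish it earliest. The function name, the component costs and the resource speeds are all hypothetical, and the heuristic is not guaranteed to be optimal.

```python
def greedy_schedule(costs, speeds):
    """Assign each component to the resource that would finish it earliest.

    costs:  {component: work units}            (hypothetical inputs)
    speeds: {resource: work units per second}
    Returns (assignment, makespan). A heuristic, not an optimal scheduler.
    """
    finish = {r: 0.0 for r in speeds}      # time at which each resource is free
    assignment = {}
    # Place the largest components first (longest-processing-time rule).
    for comp, work in sorted(costs.items(), key=lambda kv: -kv[1]):
        best = min(speeds, key=lambda r: finish[r] + work / speeds[r])
        finish[best] += work / speeds[best]
        assignment[comp] = best
    return assignment, max(finish.values())


assignment, makespan = greedy_schedule(
    {"mesh": 8, "solve": 6, "render": 4},
    {"fast": 2.0, "slow": 1.0},
)
```

In a dynamic environment the speeds and availability change while jobs run, which is one reason the slide calls this a difficult problem; a static plan like this one would need to be revised as monitoring data arrives.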
133New challenges
- New problem-solving strategies with adaptive behaviour
- Awareness of Quality of Service factors
- Management at intermediate layers
- By intermediate agents / planners
- Contract negotiation
- Dynamic revision of plans
- Reconfiguration
- Specify, compose, develop, understand dynamic distributed large-scale applications: models, languages, and tools
134Two Views of Components
- A component as an executable that runs on a certain specified machine.
- A component can be viewed as a contract. It says: "If you give me these inputs then I'll give you these outputs."
- In the second case the component is not tied to any particular executable. Problem specification is separate from service provision.
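The contract view can be sketched as a function type that names only inputs and outputs, with several interchangeable providers behind it. Everything below is illustrative: `SumContract`, the two providers and the `providers` table are hypothetical names, and the "remote" provider merely stands in for a service running elsewhere.

```python
from typing import Callable, Dict, List

# The contract: "give me a list of numbers, I give you their sum".
# It names inputs and outputs only, not an executable or a machine.
SumContract = Callable[[List[float]], float]


def local_sum(values: List[float]) -> float:
    """Provider 1: runs in the caller's process."""
    return float(sum(values))


def chunked_sum(values: List[float]) -> float:
    """Provider 2: stands in for a remote service that sums in chunks of two."""
    return float(sum(sum(values[i:i + 2]) for i in range(0, len(values), 2)))


# Hypothetical binding table; on a grid a scheduler would make this choice.
providers: Dict[str, SumContract] = {"local": local_sum, "remote": chunked_sum}


def solve(values: List[float], site: str) -> float:
    # The caller states the problem; which provider runs it is decided separately.
    return providers[site](values)


results = {site: solve([1, 2, 3, 4, 5], site) for site in providers}
```

Both providers honour the same contract, so the caller cannot tell which one was bound to the request.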
135Binding Service Requests to Resources
- In a fully transparent system the scheduler would decide where components execute based on
- Availability and performance of resources
- Cost and time constraints
- This is a hard problem.
- A possible solution is to supply hints about where it can run, e.g., in a component's XML specification.
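A minimal sketch of the hint idea follows. The XML element and attribute names (`component`, `hint`, `host`), the resource table and the cost figures are all invented for illustration; the slides do not define a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical component specification; element and attribute names are invented.
SPEC = """
<component name="fft">
  <hint host="clusterA" />
  <hint host="clusterC" />
</component>
"""

# Hypothetical resource status, as a monitoring service might report it.
RESOURCES = {
    "clusterA": {"available": False, "cost": 1.0},
    "clusterB": {"available": True, "cost": 0.5},
    "clusterC": {"available": True, "cost": 2.0},
}


def choose_host(spec_xml, resources, max_cost):
    """Return the first hinted host that is available and within budget, else None."""
    root = ET.fromstring(spec_xml.strip())
    for hint in root.findall("hint"):
        host = hint.get("host")
        status = resources.get(host)
        if status and status["available"] and status["cost"] <= max_cost:
            return host
    return None


chosen = choose_host(SPEC, RESOURCES, max_cost=3.0)
over_budget = choose_host(SPEC, RESOURCES, max_cost=1.5)
```

The hints narrow the search to a few candidate hosts, so the scheduler only has to check availability and cost for those rather than solve the general placement problem.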
136High Level View of Network Computing
- Services are advertised on the network
- A service typically consists of
- A component that actually provides the service, and
- An agent that mediates access to the service.
- Scheduler must be able to locate services and then schedule use.
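The advertise / locate / mediate pattern can be sketched in a few lines. This is an in-memory stand-in under stated assumptions: `Registry`, `Agent` and the service name `scale` are hypothetical, and a real deployment would use a network directory rather than a Python dictionary.

```python
class Agent:
    """Mediates access to a component; here it only counts and forwards calls."""

    def __init__(self, component):
        self._component = component
        self.calls = 0

    def invoke(self, *args):
        self.calls += 1      # a real agent might also check credentials or queue requests
        return self._component(*args)


class Registry:
    """Hypothetical in-memory stand-in for a network directory of services."""

    def __init__(self):
        self._services = {}

    def advertise(self, name, component):
        # Wrap the component so that clients only ever talk to its agent.
        self._services[name] = Agent(component)

    def locate(self, name):
        return self._services.get(name)   # None when nothing is advertised


registry = Registry()
registry.advertise("scale", lambda xs, k: [k * x for x in xs])

agent = registry.locate("scale")
result = agent.invoke([1, 2, 3], 10)
missing = registry.locate("unknown-service")
```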
137Service-oriented architecture
- Defines how two entities interact so that one performs a unit of work for the other
- - the unit of work is a service
- - service interactions are defined in a description language
- - each interaction is self-contained, loosely coupled, and independent of other interactions
- - applications are assembled as collections of services, each with different functions
- and are exposed as services on the network, to be (re)used
- - different users can communicate with the services differently
- an intermediate layer between providers and consumers
- - building applications means identifying the required components, finding them, and gluing them together
138Service Providers and Brokers
- NetSolve is an example of an ASP (Application Service Provider) providing numerical software, still limited to client-server style.
- Trend to network-based computing paradigm.
- Nodes offer sets of computing services with known advertised interfaces.
- Software seen as a pay-as-you-go service rather than a product that you buy once
- → Computational Economies
- Open Service-Oriented Architectures
- Shifting paradigms to master-slave and tighter cooperation models
139Grids: Key components
- Resource management
- Security
- Data management
- Services management
140Grid types
- Space scale: local, metropolitan, regional, national, global
- Time scale: logically aggregate resources for long or short periods of time
- Crossing borders: resources can span a single or multiple organisations, or a service provider space
141Very complex systems
- Aim at providing unifying abstractions to the end-user
- Large-scale universe of distributed, heterogeneous, and dynamic resources
- Critical aspects
- Distributed
- Large-scale
- Multiple administrative domains
- Security and access control
- Heterogeneity
- Dynamic
142Layers of a Grid Architecture
- User Interfaces, Applications, PSEs
- Programming Models, Development Tools and Environments
- Grid middleware: Services and Resource Management
- Heterogeneous Resources and Infrastructure
143Elements of a Grid Architecture
- Applications, User interfaces, Grid portals and PSEs
- Models, tools and environments for application composition, programming and deployment
- Grid operating environment (middleware)
- Services and resource management, discovery and scheduling
- Information registration and querying
- Authentication, Security
- Computation, data management, and communication
- Monitoring, Quality of Service
- Heterogeneous resources and infrastructure
144Grid tools(1)
- Infrastructure: includes hardware and software components (file systems, resource managers, messaging systems, security applications, certificate authorities, file transfer mechanisms)
- Middleware: software plug-ins that facilitate using the Grid
- open source Globus GT 3: first implementation of OGSI, as a set of services and software libraries
- based on a security model plus a mechanism for hierarchically collecting data about the grid
- includes support for
- security
- information infrastructure
- resource management
- data management
- communication
- fault detection
- portability
145Grid tools(2)
- Directory services: to discover available services, to define and monitor the grid topology
- generally based on the Lightweight Directory Access Protocol (LDAP)
- and the Domain Name System (DNS)
- Schedulers and load balancers: ensure job completion under priority, deadline or urgency constraints and distribute tasks and data across systems to reduce the chance of bottlenecks
- Developer tools: for file transfer, communications, environment control, ranging from utilities to APIs
- Security: authenticate and authorise, control who/what can access a grid's resources. Includes
- message integrity
- message confidentiality
146Grid architecture concepts
- Influenced by the Globus Toolkit
- a de facto standard for security, information discovery, resource and data management, communication, fault detection, and portability
- Driven by the Global Grid Forum (GGF)
- An industry advisory group for community-driven development of new standards
- Grid architectures heavily dependent on existing Internet protocols and services (for communication, routing, name resolution, ...)
147Grid logical hierarchy
- L1: Grid fabric resources (computers, storage, networks, special devices) → managed by a local resource manager (RM) with a local policy, and interconnected in LAN, MAN, or WAN
- L2: Security infrastructure (authentication, secure connectivity, access to resources)
- L3: Core Grid middleware (job management, storage access, accounting) → uniform access to the fabric resources, and hides partitioning, distribution, and load-balancing
- L4: User-level middleware: resource aggregators (scheduling services and resource brokers)
- L5: Grid programming environments and tools (languages, libraries, compilers, and support tools)
- L6: Applications (commercial, scientific, engineering)
148GGF layered architecture
- Fabric: controlling things locally
- Connectivity: talking to things; communication (Internet protocols) and security
- Resource: sharing single resources; negotiating access, controlling use
- Collective: coordinating multiple resources; ubiquitous infrastructure services, application-specific distributed services
- Applications: putting things to work
149Critical Grid Issues
- Security: When resources are shared across organisation boundaries, security is an important issue.
- Dependability: The Grid must be robust and resilient to failure.
- Efficiency: Resources should not be wasted; good load balancing is needed.
- Cost: For broad impact, the Grid should be inexpensive.
- Portability: Grid applications should be able to run on a wide range of hardware.
150Functional perspective of a Grid
- a) Grid Portal UI
- interface to launch applications
- with transparent access to resources and services
- b) Grid Security
- b1) USER's view
- - provides authentication, authorization, data confidentiality, data integrity, and availability, from the user's view
- - a single sign-on, run-anywhere, uniform authentication service
- - a user job requires on-the-fly confidential message-passing services
- or may require a long-lived service
- - user must be allowed to check availability of such security services
151- provide security across organisation borders
- with support for local control over access rights and mapping
- uniform authentication, authorization and message-protection
- with delegation of credentials for computations involving multiple geographically distributed resources
- usually relies on public key technology
- b2) SYSTEM's view
- - the user needs to be authenticated, but remote resources too!
- - secure (authenticated and confidential) communication between internal grid components
- - a Certificate Authority establishes the identity of users and grid resources
152- c) Broker and Directory
- a user's request to launch an application →
- requires identifying suitable resources
- based on the application's parameters
- -- informs about available resources and working status
- -- allows defining and monitoring of grid topology/resources
- → supported by a Directory mechanism (LDAP and/or DNS)
- d) Scheduler
- to coordinate the concurrent execution of a job's components
- in a simple case
- - selection of a suitable processor
- - grid request to send the job code and data to the selected processor
153- in general cases
- - a scheduler must dynamically react to grid load
- by getting measurement information obtained by grid monitoring and resource management
- scheduler strategies
- - simple round-robin (cf. default PVM)
- - usually, try to find the most appropriate processor(s)
- - hierarchical scheduling
- a metascheduler submits a job to a cluster scheduler
- the cluster scheduler manages a cluster as a single resource and uses an internal scheduling strategy
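The three strategies named above can be compared in a short sketch. The class names, node names and load figures are hypothetical, and "most appropriate processor" is reduced here to "lowest reported load", which is only one possible criterion.

```python
from itertools import cycle


class RoundRobin:
    """Hands out nodes in a fixed rotation, ignoring load."""

    def __init__(self, nodes):
        self._ring = cycle(nodes)

    def pick(self, load):
        return next(self._ring)


class LeastLoaded:
    """Picks the node with the smallest reported load."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def pick(self, load):
        return min(self._nodes, key=lambda n: load[n])


class MetaScheduler:
    """Chooses a cluster, then delegates node choice to that cluster's scheduler."""

    def __init__(self, clusters):
        self._clusters = clusters        # {cluster name: (scheduler, node list)}

    def pick(self, load):
        def cluster_load(name):
            # Treat each cluster as a single resource with one aggregate load.
            return sum(load[n] for n in self._clusters[name][1])

        name = min(self._clusters, key=cluster_load)
        return name, self._clusters[name][0].pick(load)


# Hypothetical load figures, as a monitoring service might report them.
load = {"a1": 5, "a2": 1, "b1": 2, "b2": 2}
rr = RoundRobin(["a1", "a2"])
rr_picks = [rr.pick(load) for _ in range(3)]
meta = MetaScheduler({
    "A": (LeastLoaded(["a1", "a2"]), ["a1", "a2"]),
    "B": (RoundRobin(["b1", "b2"]), ["b1", "b2"]),
})
meta_pick = meta.pick(load)
```

The metascheduler never looks inside a cluster beyond its aggregate load; each cluster keeps its own internal strategy, as the slide describes.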
154- schedulers also monitor job progression
- - to automatically resubmit to other nodes, in case of losses
- - to check for job completion (e.g. with timeouts)
- some use a static resource reservation system
- -- a calendar-based mechanism (like in old batch processing)
- managing pools of resources
- - processors automatically report their availability to grid management
- → allows reassignment of jobs to such processors
- - local nodes may report start of local NON-GRID work
- → reduces node availability for grid work
- may cause unpredictable completion times
- -- suggests use of DEDICATED grid resources
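Resubmission after a lost job can be sketched as a loop over candidate nodes. This is a simplified illustration: the `JobLost` exception, the node stand-ins and the attempt limit are hypothetical, and timeouts are simulated by raising an exception rather than measured with a clock.

```python
class JobLost(Exception):
    """Raised by a node stand-in when a job is lost or exceeds its timeout."""


def run_with_resubmit(job, nodes, max_attempts=3):
    """Try nodes in order until one completes the job.

    nodes: list of (name, run) pairs, where run(job) returns a result or
    raises JobLost. Returns (result, list of node names that were tried).
    """
    history = []
    for name, run in nodes[:max_attempts]:
        history.append(name)
        try:
            return run(job), history
        except JobLost:
            continue                     # resubmit to the next node
    raise JobLost(f"{job!r} failed on {history}")


def flaky(job):
    raise JobLost("no heartbeat before timeout")   # simulated loss


def healthy(job):
    return job.upper()


result, history = run_with_resubmit("render", [("n1", flaky), ("n2", healthy)])
```

Each resubmission adds to the completion time, which illustrates why the slide suggests dedicated resources when predictable completion matters.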
155- e) Grid data management
- reliable and secure method for moving files and data
- f) Grid job/resource management
- Grid Resource Allocation and Management (GRAM)
- f1) keeps track of grid available resources, node capacities and current utilisation levels, and of current grid users
- → passes this information to the Scheduler, for deciding where to submit jobs
- → also uses this to monitor the grid's unpredictable incidents: outages, congestion
- → and for administration: overall usage patterns, statistics, logging resource usage for accounting purposes
- f2) services to launch a job on a set of resources
- to check status
- to get results when job is complete