1
CSCI 504 Computer Organization, Lecture 14
Basic Concepts of Grid Computing and Its
Reliability (not included in Quiz 1)
Dr. Yuan-Shun Dai, Computer Science 504, Fall 2004
2
Why Grid?
  • Computing System Development

[Figure: computing system development toward the future. Stages shown: Parallel Computing, Distributed Computing, GRID Computing. Labels: multiple computers (C) on a network; distributed, networked; large-scale resource sharing; global-area collaboration; open-service architecture.]
3
GRID Computing
  • Next Generation of Computing Systems

  • Commercial grids: IBM Grid, Gridware (Sun Microsystems), .Net Grid
    Computing, Avaki, Entropia, Platform Computing
  • Possible applications: life sciences, e-business, military usage,
    aerospace, financial services, research and development, government,
    education, and so on
  • Competition in reliability: Highly Reliable Grid
4
GRID Structure
[Figure: layered structure of the grid. Layers: Program Layer, Request Layer, Management Layer, Network Layer, Resource Layer. Labelled elements: programs, request queue, interrequest RM, global RM, Resource Management System (RMS), resource requests, resource offers, matches, claiming, resource descriptions, resource sites, resource access, access control, network, resources.]

Notations
  • P: Program
  • R: Resource
  • RM: Resource Manager
  • RMS: Resource Management System
5
Failure Analysis of the Grid
  • Program layer: software failure
  • Request layer: block failure, time-out failure
  • Management layer: RM server failure
  • Resource layer: resource failure
  • Network layer: network failure

6
Grid Computing
  • Stage 1: Matchmaking
  • RM servers down
  • Block failure
  • Time-out failure

7
Grid Computing
  • Stage 2: Executing
  • Network failure
  • Program failure
  • Resource failure

[Figure: programs (P) and resources (R) on grid nodes connected through the network. P: Program, R: Resource.]
8
Grid Reliability for RMS
  • Markov model for N RM servers

[State diagram: states 0, 1, 2, ..., N-2, N-1, N; k = 1, 2, ..., N]
  • Given that k RM servers (k = 1, ..., N) are reliable,
  • then the reliability for Stage 1 can be
    obtained by [equation not captured in the transcript]
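As a rough sketch only: one common way to model N repairable servers is a birth-death Markov chain over the number of working servers, with the Stage 1 reliability then obtained by the law of total probability over k. The per-server rates `fail_rate` and `repair_rate`, the assumption that servers fail and are repaired independently, and both function names are illustrative assumptions, not taken from the lecture.

```python
def rm_server_distribution(n_servers, fail_rate, repair_rate):
    """Steady-state P(k servers working) for k = 0..N.

    Assumed model (hypothetical): each working server fails at
    `fail_rate`, each failed server is repaired at `repair_rate`.
    """
    # Detailed balance between states k and k+1:
    #   w[k] * (N - k) * repair_rate == w[k+1] * (k + 1) * fail_rate
    weights = [1.0]
    for k in range(n_servers):
        up = (n_servers - k) * repair_rate
        down = (k + 1) * fail_rate
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return [w / total for w in weights]


def stage1_reliability(p_working, reliability_given_k):
    """Total probability over k = 1..N; with k = 0 no matchmaking is possible."""
    return sum(
        p_working[k] * reliability_given_k(k)
        for k in range(1, len(p_working))
    )


# Example: 2 servers with equal failure and repair rates.
dist = rm_server_distribution(2, 1.0, 1.0)
print(dist)  # [0.25, 0.5, 0.25]
```

Here `reliability_given_k` stands for whatever conditional reliability the queue analysis yields for a fixed k.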

9
Grid Reliability for RMS
  • Markov model for the request queue (M is the queue length limit)

[State diagram: queue states 0, 1, 2, ..., k-1, k, k+1, ..., M-1, M]
  • Arrival rate of a grid service
  • X: number of requests of an unknown service, a discrete r.v.
  • k: number of working RM servers (k = 1, 2, ..., N)
  • Completion rate of each request by one RM server
10
Grid Reliability for RMS
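  • Markov model for the request queue (M is the queue length limit)

[State diagram: queue states 0, 1, 2, ..., k-1, k, k+1, ..., M-1, M]
[Equations not captured in the transcript; their index ranges are (m = 1, ..., k-1) and (m = k, ..., M-1).]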
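The two index ranges on this slide match the usual balance equations of a finite multi-server queue, where m requests are served at m times the per-server rate while m < k and at k times that rate once all k servers are busy. A minimal sketch under that reading follows. It assumes requests arrive one at a time (the lecture's batch variable X is ignored for simplicity), and the names `arrival_rate` and `service_rate` are hypothetical stand-ins for the symbols lost from the slide.

```python
def queue_state_probabilities(arrival_rate, service_rate, k, max_len):
    """Steady-state P(m requests in the queue), m = 0..M, for k servers.

    Assumed birth-death chain: arrivals at `arrival_rate` in every
    state below M, departures at min(m, k) * service_rate in state m.
    """
    if k < 1 or max_len < 0:
        raise ValueError("need k >= 1 and max_len >= 0")
    weights = [1.0]
    for m in range(1, max_len + 1):
        departure = min(m, k) * service_rate
        # Detailed balance: w[m] * departure == w[m-1] * arrival_rate
        weights.append(weights[-1] * arrival_rate / departure)
    total = sum(weights)
    return [w / total for w in weights]


# Example: 2 servers, queue length limit M = 2.
print(queue_state_probabilities(2.0, 1.0, k=2, max_len=2))  # [0.2, 0.4, 0.4]
```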
11
Grid Reliability for RMS
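  • Markov model for the request queue (M is the queue length limit)

[State diagram: queue states 0, 1, 2, ..., k-1, k, k+1, ..., M-1, M]
  • Probability of the system staying at state m (m = 0, 1, ..., M)
  • Suppose the grid service under consideration contains H requests
  • Then [equation not captured in the transcript]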
12
Grid Reliability for RMS
  • Given (m, k, H), the time to complete the m + H
    requests is a r.v. with the p.d.f. [equation not captured in the transcript]
  • Given the due time Td, then [equation not captured in the transcript]
13
Grid Reliability for RMS
  • Thus, the reliability model for Stage 1 can
    be obtained by [equation not captured in the transcript]

Y.S. Dai and M. Xie, Hierarchical Markov Reward
Model for Reliability Analysis and Optimization
of the Grid Computing System, International
Conference on Mathematical Methods in Reliability
(MMR), 2004, Santa Fe, USA.
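A hedged sketch of how the pieces might combine, not necessarily the expression in the cited paper: Stage 1 succeeds when at least one RM server is working, the H requests are not blocked, and they complete before the due time Td. All symbols below are assumed notation: P_k is the probability that k RM servers are working, q_m(k) is the probability of m requests in the queue given k servers, and T_{m+H}(k) is the time to complete m + H requests with k servers.

```latex
R_{1} \;=\; \sum_{k=1}^{N} P_k \sum_{m=0}^{M-H} q_m(k)\,
\Pr\bigl\{\, T_{m+H}(k) \le T_d \,\bigr\}
```

The inner sum stops at m = M - H because larger m would overflow the queue (block failure), and the probability term excludes time-out failure.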
14
Grid Reliability for Network
  • Executing Stage
  • Network, Programs and Resources

[Figure: programs (P) and resources (R) on nodes connected through a wide-area network. P: Program, R: Resource.]

[1] Y.S. Dai, M. Xie, K.L. Poh, Reliability
Analysis of Grid Computing Systems, IEEE Pacific
Rim International Symposium on Dependable
Computing, IEEE Computer Press, pp. 97-103, 2002,
Japan.
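To make the executing stage concrete, here is a small illustrative calculation, not necessarily the method of the cited paper: treat each network link as independently up with a given probability, and compute the chance that the node running a program can reach every node holding a resource it needs. The brute-force enumeration, the link reliabilities, and the node names are all assumptions, and the approach only suits tiny networks.

```python
from itertools import product


def connection_reliability(links, source, targets):
    """P(source can reach every target) over independent undirected links.

    `links` maps (u, v) pairs to the probability that the link is up.
    Enumerates all 2**len(links) up/down combinations.
    """
    edges = list(links)
    total = 0.0
    for states in product([True, False], repeat=len(edges)):
        prob = 1.0
        adjacency = {}
        for (u, v), up in zip(edges, states):
            r = links[(u, v)]
            prob *= r if up else 1.0 - r
            if up:
                adjacency.setdefault(u, set()).add(v)
                adjacency.setdefault(v, set()).add(u)
        # Depth-first search over the links that are up.
        seen = {source}
        stack = [source]
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if all(t in seen for t in targets):
            total += prob
    return total


# Hypothetical triangle network: program on node A, resource on node C.
links = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.5}
r = connection_reliability(links, "A", ["C"])  # approximately 0.86
```

Program and resource failures would multiply further factors into this value; they are left out to keep the sketch short.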
15
Grid Reliability for Programs and Resources
  • Adaptation

[Figure: a node with programs P1, Px, Pm and resources R1, Ry, Rk; network and other nodes; failures of programs and resources.]

[1] Y.S. Dai, M. Xie, K.L. Poh, Reliability
Analysis of Grid Computing Systems, IEEE Pacific
Rim International Symposium on Dependable
Computing, IEEE Computer Press, pp. 97-103, 2002,
Japan.
16
HRGrid (Highly Reliable Grid)
  • Difference from and relationship with other Grid
    projects
  • A Grid system with high reliability.
  • Does not conflict with other Grid projects.
  • Supports other Grid systems.

17
HRGrid
  • Reliability of components: software, hardware, network, etc.
  • Design, schedule, allocate
  • Internal factors: integration of Grid services, number of RM servers,
    length limitation of the request queue, network topology, routing
    algorithm
  • External factors: human resources, testing resources, protection

18
HRGrid
  • Integration of Grid Services (Chapter 9.5, pp.
    269-272).
  • Programs and Resources are integrated on the Grid
    Nodes.
  • To Maximize the Grid Service Reliability
  • Heuristic Algorithm

G.Q. Liu, M. Xie, Y.S. Dai and K.L. Poh (2004),
On program and file assignment for distributed
systems, Computer Systems Science and
Engineering, Accepted for publication.
19
HRGrid
  • The Length Limitation of the Request Queue
  • Affects block failure
  • Affects time-out failure
  • To Maximize the Reliability
  • The Number of RM servers
  • To Maximize the Profit
  • The Network Topology (Chapter 9.4, pp. 266-268)
  • The Routing Algorithm

M. Xie, Y.S. Dai, K.L. Poh, C.D. Lai, Optimal
number of redundant hosts in distributed systems
based on cost criteria, International Journal of
Systems Science, Accepted for publication.
20
HRGrid
  • The Human Resource Allocation
  • The Testing Resource Allocation
  • The Protection Design

Y.S. Dai, M. Xie, K.L. Poh and B. Yang (2003),
Optimal testing-resource allocation with genetic
algorithm for modular software systems, Journal
of Systems and Software, vol. 66, no. 1, pp.
47-55.
G. Levitin, Y.S. Dai, M. Xie, K.L. Poh (2003),
Optimizing Survivability of Multi-State Systems
with Multi-Level Protection by Multi-Processor
Genetic Algorithm, Reliability Engineering and
System Safety, vol. 82, no. 1, pp. 93-104.
21
Conclusion
  • Blooming of Grid Computing
  • Grid Structure
  • Resource Management System
  • Wide-area Network
  • Programs and Resources
  • Highly Reliable Grid
  • Improve Components
  • Allocation and Scheduling