Information Resources Management

Provided by: KevinSt4

Learn more at: https://www.andrew.cmu.edu

Transcript and Presenter's Notes

1
Information Resources Management

April 17, 2001

2
Agenda

Administrivia
Database Architectures

3
Administrivia

Homework 8

4
Database Architectures

Centralized
Client-Server
Parallel - single site
Distributed - multiple sites

5
Database Architectures

[Figure: Centralized (Parallel), Distributed, and Client-Server architectures arranged by distribution of Function and Data]

6
Centralized

PC, Mini, or Mainframe
Single Database
Single Database Manager
One or More Users
Data and Function in One Place

7
Client-Server

PCs to Mainframes to Minis
PC to PC
Mainframe to Mainframe
Use Desktop Processing Power
Better User Interface
Greater Functionality
Retain Centralized Control of Data

8
Client-Server Basic Model

[Figure: several clients send a Request to one server, which returns a Result]

9
Servers

Supercomputer
Mainframe
Mini
PC Server
All retain all data

10
Client-Server Architecture

[Figure: Data and Function divided between Server (Back-End) and Client (Front-End), ranging from Thin Client to Fat Client]

11
Functionality

Presentation
I/O Processing
Validation
Business Rules
Application Logic
Data Management
Validation
Error Handling

12
Thin Client

Presentation Services Only
Accept Input
Format Output
Display
Server does all processing

13
Fat Client

Presentation
Validation
Application Logic - Programs
Data Management
Send SQL to Server
Server is just DBMS

14
In Between Client

Client
Presentation
Some Application Logic
Server
Some Application Logic
Data Management and Services

15
Benefits of Client-Server

Use Local Processing Power
Better User Interface
Some Functionality if System Down
Use Sunk Costs of PCs
Support Reengineering
Support Intranets
Flexibility, Scalability, Customizability

16
Challenges of Client-Server

Cost of (Upgraded) PCs
Network Reliance
Distributing Application Updates
Management of Complex System
Problem Identification & Resolution
Application Partitioning

17
Other Client-Server Architectures

Traditional is Two-Tiered (client-server)
Three-Tiered
Client-Application Server-DB Server
(PC - Mini - Mainframe)
(PC - PC Server - Mainframe)
Beyond Three
PC - PC Server - Web Server - Mini - Mainframe

18
Client-Server vs. Distributed

Client-Server: Application Distribution
Distributed: Data Distribution
Often, "client-server" is used to refer to
either application distribution, data
distribution, or both.

19
Middleware

What if
Multiple databases (sources) need to be accessed
from a single client?
Different kinds of clients?
Mix of clients and servers?
Want to take advantage of existing base of
applications (legacy systems)?

20
Middleware

Fat Clients just send SQL transactions
Other types of transactions may be needed based
on the server (system)

21
Middleware

Software that shields applications from the
complexity of the operating environment.
[Figure: three clients connected through Middleware to two legacy systems]

22
Types of Middleware

Transaction Processing (TP) Monitor
Database Middleware
Remote Procedure Call (RPC)
Message-Oriented Middleware (MOM)
Object-Request Brokers
(CORBA - ORB)

23
TP Monitor

Synchronous - sender must wait
Queuing
Message Delivery
Assured Delivery
Either Direction

24
Database Middleware

Variety of Clients/Platforms
Variety of Servers/DBMSs/Platforms
Specific to DB transactions (SQL)

25
Message-Oriented Middleware (MOM)

Asynchronous - clients do not wait
Queues - Queue Management/Recovery
Message Delivery
Assured Delivery
Either Direction
(like email or EDI, but for transactions only)
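
The asynchronous behavior above can be sketched in a few lines of Python. This is only an illustration: the in-process `queue.Queue` stands in for a real message queue, and the names `orders`, `client_send`, and `server_worker` are hypothetical. Real MOM products also persist messages and run across machines, which this sketch does not attempt.

```python
import queue
import threading

# Hypothetical stand-in for a MOM queue (in-process only, not persistent).
orders = queue.Queue()
processed = []

def client_send(message):
    """Enqueue the message and return at once; the client does not wait."""
    orders.put(message)

def server_worker():
    """Drain the queue until the None sentinel arrives."""
    while True:
        message = orders.get()
        if message is None:
            break
        processed.append(f"done:{message}")

# The client sends three messages before any server is running.
for msg in ("order-1", "order-2", "order-3"):
    client_send(msg)

# The server starts later and works through the queued messages in order.
worker = threading.Thread(target=server_worker)
worker.start()
orders.put(None)   # sentinel: no more messages
worker.join()

print(processed)   # ['done:order-1', 'done:order-2', 'done:order-3']
```

The point of the sketch is the ordering: the client finishes sending before the server has even started, which a synchronous TP monitor call would not allow.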

26
Advantages of Middleware

Leverage sunk costs (legacy systems)
Reduce development cost
Reduce development time
Increase responsiveness
Improve overall systems management
Consolidate diffuse information

27
Challenges of Middleware

Cost
Session management - Transaction state
Security
Network reliance
Diversity of systems - lack of standards
Constant technology change
Availability of talent
Middleware Management

28
Parallel and Distributed

Client-Server is an attempt to improve
performance
Reduce time to execute a transaction
Parallel
Reduce time to get the data
Distributed

29
Parallel Systems

Single site for data
Very Large databases
Operations performed simultaneously

30
Parallel Database Architectures

Shared Memory
Shared Disk
Shared Nothing
Hierarchical

31
Shared Memory

[Figure: several processors (P) attached to one shared memory (M)]

32
Shared Memory

Advantages
Extremely efficient communications
Disadvantages
Max of 32/64 processors
Bus becomes bottleneck

33
Shared Disk

[Figure: three processors (P), each with its own memory (M), sharing disk storage]

34
Shared Disk

Advantages
No bus bottleneck
Fault tolerance provided
Disadvantages
Disk access becomes bottleneck

35
Shared Nothing

[Figure: three processors (P), each with its own memory (M) and no shared resources]

36
Shared Nothing

Advantages
No disk bottleneck
Highly scalable
Disadvantages
High communication overhead/cost
Between processors
To another processor's data

37
Hierarchical

[Figure: processor (P) and memory (M) subsystems combined in a hierarchy]

38
Hierarchical

Advantages
Best of all worlds
Disadvantages
Worst of all worlds
Some high communication overhead/cost
Between subsystems
Complexity

39
Distributed Databases

Client-Server - distribute functionality
What about distributing data?

40
Distributed Databases

Overview
Distributed Storage
Distributed Queries
Distributed Transactions
Multidatabase (Middleware)

41
Distributed Databases

Multiple locations
Single logical database
Several physical databases
Network connections

42
Advantages

Sharing across locations
Local control
Availability

43
Challenges

Development costs
People & Equipment
Testing
Problem identification & resolution
Technical expertise
Network dependence
Increased processing overhead

44
Distributed Data Storage

Replication
Fragmentation
Both

45
Replication

Data is repeated
Spectrum of options available
Temporary replication of specific rows
Replicate infrequently changed data
Replicate by site
Central site - all / each local site - their data
only
Full replication
Everything everywhere

46
Concerns with Replication

Availability needed
Amount of parallelism in reads
Overhead of updates
Keeping replicas updated
Conflicting updates

47
Fragmentation

Partitioning
Divide data into subsets based on need
Have to be able to pull back together to get
original tables

48
Fragmentation

Horizontal
by rows
specified conditions
Vertical
by column
each requires primary key (or created key)
Mixed
by row and column

49
Fragmentation & Replication

Repeat as necessary
Replicate fragments
Fragment replicas
Don't lose track of what you have and where it
is!

50
Network Transparency

Distributing data should not require that the
user know where or how it's been distributed.
The database should be seen as a single entity no
matter how fragmented and replicated it becomes.

51
Network Transparency

Some DBMSs are starting to provide this level of
functionality so transparency exists even at the
program level, but in many cases this
transparency must be programmed into the
applications.
It must always be designed into the database.

52
Distributed Queries

How do you query data that is everywhere?

53
Efficiency vs. Overhead

Splitting the query apart
Keeping track of the data/locations
Making sure everything gets executed
Putting the results back together
Generating network traffic
Handling partial results

54
Distributed Queries

Full replication can avoid the overhead
Huge increase in update overhead
Parallel execution no longer possible
Additional costs of replication

55
Example

5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
Data fragmented by site - no replication
Query (in Pgh)
SELECT Name, MAX(Salary) FROM Employee

56
Option 1 - High Bandwidth

1. Have all sites send their full employee tables
to Pgh.
2. Build a temporary employee table.
3. Run the query against this table.

57
Option 2 - Not so High Bandwidth

1. Examine the query and determine it can be run
separately at each location and the results
combined.
2. Submit just the query to each location.
3. Wait for the results from each city.
4. As results return, build a temporary table (5
rows only).
5. Find the max using the temporary table.
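
The two options can be compared with a small Python simulation. The employee rows and salaries below are invented, and the functions `option1` and `option2` are hypothetical names; the sketch reads the query as "find the highest-paid employee" and only counts how many rows end up in the temporary table at Pgh.

```python
# Hypothetical fragments, one per site; names and salaries are made up.
sites = {
    "NY": [("Ann", 82000), ("Raj", 91000)],
    "Pgh": [("Lee", 75000), ("Mia", 88000)],
    "Chicago": [("Tom", 95000), ("Ida", 67000)],
    "Dallas": [("Sam", 72000)],
    "Los Angeles": [("Eve", 99000), ("Max", 64000)],
}

def top_paid(rows):
    """Return the (name, salary) row with the highest salary."""
    return max(rows, key=lambda row: row[1])

def option1(fragments):
    """Ship every row to Pgh, build a temporary table, run the query once."""
    temp_table = [row for rows in fragments.values() for row in rows]
    return top_paid(temp_table), len(temp_table)

def option2(fragments):
    """Run the query at each site and ship only one result row per site."""
    temp_table = [top_paid(rows) for rows in fragments.values()]
    return top_paid(temp_table), len(temp_table)

print(option1(sites))   # (('Eve', 99000), 9)
print(option2(sites))   # (('Eve', 99000), 5)
```

Both options return the same answer, but Option 2 builds a temporary table of five rows no matter how large each site's employee table is.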

58
Distributed Transactions

Transaction Types
Coordinators
Commit Protocols
Concurrency Controls
Deadlocks

59
Transaction Types

Local - transaction only needs local data
Global - transaction uses non-local data
My global becomes someone else's local
Either type of transaction must still have ACID
properties - global is the concern

60
System Structure

Things to do
1. Process local transactions
(transaction manager)
2. Process and track global transactions
(transaction coordinator)

61
Global Processing

1. Recognize as global
2. Break up transaction
3. Distribute pieces
4. Assemble results
5. Coordinate termination
6. Handle problems

62
Coordinator of Coordinators

Coordinate among sites
Detect problems
Attempt to fix
Share status with others

63
Coordinator Failure

Backup Coordinator
receives all messages - maintains state
monitors coordinator
automatically takes over if coordinator down
avoids delays - increases overhead
Election
highest pre-assigned number
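
The election rule can be sketched in Python as follows. The site numbers and the `elect_coordinator` function are hypothetical; the sketch assumes every surviving site already knows which sites are still reachable, which a real election protocol has to discover by exchanging messages.

```python
def elect_coordinator(site_numbers, alive):
    """Pick the reachable site with the highest pre-assigned number.

    site_numbers maps a site name to its pre-assigned number;
    alive is the set of site names that are still reachable.
    """
    candidates = {name: num for name, num in site_numbers.items() if name in alive}
    if not candidates:
        raise RuntimeError("no site is available to act as coordinator")
    return max(candidates, key=candidates.get)

# Hypothetical numbering; Los Angeles was coordinator and has gone down.
numbers = {"NY": 1, "Pgh": 2, "Chicago": 3, "Dallas": 4, "Los Angeles": 5}
print(elect_coordinator(numbers, alive={"NY", "Pgh", "Chicago", "Dallas"}))   # Dallas
```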

64
Commit Protocols

Two-Phase
Three-Phase
All sites must commit or all sites have to
roll back
Replicated data only

65
Two-Phase Commit

Phase 1
Send PREPARE to all sites
Sites respond READY or ABORT
Phase 2
If all sites READY,
COMMIT locally - Send COMMITs
If not READY or time expires
ROLLBACK locally - Send ROLLBACK
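
The coordinator's side of the two phases can be sketched in Python. This is a simplified simulation, not a real protocol implementation: the function name `two_phase_commit` is hypothetical, replies are passed in as a dictionary rather than received over a network, and a value of `None` stands for a site whose reply did not arrive before the time limit.

```python
def two_phase_commit(site_votes):
    """Simulate one round of two-phase commit from the coordinator's side.

    site_votes maps a site name to its Phase 1 reply: "READY", "ABORT",
    or None when no reply arrived in time.
    Returns the decision and the list of messages the coordinator sends.
    """
    # Phase 1: send PREPARE to all sites and collect their replies.
    sent = [("PREPARE", site) for site in site_votes]
    replies = dict(site_votes)

    # Phase 2: commit only if every site replied READY.
    if all(reply == "READY" for reply in replies.values()):
        decision = "COMMIT"
    else:
        decision = "ROLLBACK"   # some site sent ABORT or did not reply
    sent += [(decision, site) for site in site_votes]
    return decision, sent

print(two_phase_commit({"NY": "READY", "Pgh": "READY"})[0])   # COMMIT
print(two_phase_commit({"NY": "READY", "Pgh": "ABORT"})[0])   # ROLLBACK
print(two_phase_commit({"NY": "READY", "Pgh": None})[0])      # ROLLBACK
```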

66
Two-Phase Commit

[Figure: coordinator and three sites; a site requests commit]

67
Two-Phase Commit - Phase 1

[Figure: coordinator and three sites; coordinator sends PREPARE to all sites]

68
Two-Phase Commit - Phase 1

[Figure: coordinator and three sites; sites respond READY]

69
Two-Phase Commit - Phase 2

[Figure: coordinator and three sites; coordinator performs COMMIT locally]

70
Two-Phase Commit - Phase 2

[Figure: coordinator and three sites; coordinator sends COMMIT to all sites]

71
Two-Phase Commit - Phase 1

[Figure: coordinator and three sites; a site responds ABORT or does not respond]

72
Two-Phase Commit - Phase 2

[Figure: coordinator and three sites; coordinator performs ROLLBACK locally]

73
Two-Phase Commit - Phase 2

[Figure: coordinator and three sites; coordinator sends ROLLBACK to all sites]

74
Site Failure - Recovery

COMMIT and ROLLBACK as normal
If READY only
Check with coordinator or other sites
Either COMMIT or ROLLBACK
If no one found, ROLLBACK

75
Coordinator Failure

Ask the sites
If one has COMMIT, then REDO
If one has ROLLBACK, then UNDO
If one doesn't have READY, UNDO
If all READY only
Coordinator must decide
Sites must wait and locks are held
Blocking occurs
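
The decision rules above can be written as a short Python function. The state labels and the function name `resolve_after_coordinator_failure` are assumptions made for this sketch: each site reports the last record in its log for the transaction, with "NONE" meaning it never logged READY.

```python
def resolve_after_coordinator_failure(site_states):
    """Apply the slide's rules to the last log record reported by each site.

    site_states maps a site name to "COMMIT", "ROLLBACK", "READY", or
    "NONE" (hypothetical label for a site with no READY record).
    """
    states = set(site_states.values())
    if "COMMIT" in states:
        return "REDO"       # the coordinator had already decided to commit
    if "ROLLBACK" in states:
        return "UNDO"       # the coordinator had already decided to roll back
    if "NONE" in states:
        return "UNDO"       # a site never voted READY, so no commit was possible
    return "BLOCKED"        # all READY: only the coordinator knows the decision

print(resolve_after_coordinator_failure({"NY": "READY", "Pgh": "COMMIT"}))   # REDO
print(resolve_after_coordinator_failure({"NY": "READY", "Pgh": "NONE"}))     # UNDO
print(resolve_after_coordinator_failure({"NY": "READY", "Pgh": "READY"}))    # BLOCKED
```

The last case is the blocking problem: the sites must wait, holding their locks, until the coordinator recovers.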

76
Three-Phase Commit

Phase 1
Send PREPARE
Sites respond READY or ABORT
Phase 2
If all sites READY, send PRECOMMIT
Else, ROLLBACK
Sites must ACKNOWLEDGE
Phase 3
If at least K sites ACKNOWLEDGE, send COMMIT

77
Coordinator Failure

Three-Phase Commit prevents blocking
If coordinator fails
New coordinator is selected
Sites queried to determine status
New coordinator resumes

78
Network Partitioning

Network split creates two separate networks
Each half selects a coordinator
Coordinators make independent decisions
Result could be different decisions
Resolution of network problem may create need to
resolve database problems

79
Concurrency Control

Single Lock Manager
Multiple Lock Managers

80
Single Lock Manager

One site for all locking
All other sites must go to it
Can read from anywhere
Updates must be to all copies
Advantages: Simple, Easy deadlock detection
Disadvantages: Bottleneck, Vulnerability

81
Simple Multiple Lock Mgrs

Each site locks a unique partition of the data
non-replicated data
Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Complicated deadlock detection

82
Majority Protocol

Each site locks its own data
replication possible
Request owner for lock on data that isn't local
When multiple owners, n/2 + 1 (majority) must
provide the lock
Advantages: No bottlenecks
Disadvantages: More messages sent, Complicated
deadlock detection, More deadlocks (each gets 1/2)
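
A minimal Python sketch of the majority rule, reading "n/2 + 1" with integer division. The function names and the replica site list are hypothetical. The last two lines show the "each gets 1/2" case from the disadvantages: with four replicas, two transactions that each hold two locks are both stuck.

```python
def majority_needed(replica_count):
    """Locks required under the majority protocol: n/2 + 1 (integer division)."""
    return replica_count // 2 + 1

def has_majority(replica_sites, granted_sites):
    """True if locks were granted at a majority of the replica sites."""
    granted = [site for site in replica_sites if site in granted_sites]
    return len(granted) >= majority_needed(len(replica_sites))

replicas = ["NY", "Pgh", "Chicago", "Dallas"]   # hypothetical replica sites

print(majority_needed(4), majority_needed(5))              # 3 3
print(has_majority(replicas, {"NY", "Pgh", "Chicago"}))    # True

# Two transactions each hold half the locks: neither reaches a majority.
print(has_majority(replicas, {"NY", "Pgh"}))               # False
print(has_majority(replicas, {"Chicago", "Dallas"}))       # False
```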

83
Biased Protocol

Reduced form of Majority Protocol
For a READ, only need any single lock
For a WRITE, need all locks
Advantages: No bottlenecks, Reduced traffic
Disadvantages: Update traffic, Deadlocks
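
The read-one, write-all rule can be sketched in Python as below. The function names `locks_required` and `may_proceed` and the sample grants are hypothetical; the sketch only counts granted locks and ignores lock modes and queuing.

```python
def locks_required(operation, replica_count):
    """Biased protocol: a READ needs any one lock, a WRITE needs all of them."""
    if operation == "READ":
        return 1
    if operation == "WRITE":
        return replica_count
    raise ValueError(f"unknown operation: {operation}")

def may_proceed(operation, grants):
    """grants maps each replica site to True if it granted the lock."""
    granted = sum(1 for ok in grants.values() if ok)
    return granted >= locks_required(operation, len(grants))

# Hypothetical case: only one of three replica sites granted the lock.
grants = {"NY": False, "Pgh": True, "Chicago": False}
print(may_proceed("READ", grants))    # True
print(may_proceed("WRITE", grants))   # False
```

Reads are cheap because one message is enough, while every write has to reach all replicas, which is the update traffic listed as a disadvantage.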

84
Primary Copy

Site designated to hold primary copy
Multiple sites
Replicated Data
All locks through that site
Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Vulnerability, Complicated
deadlock detection

85
Other Than Locking

Timestamps
Centralized generation
Local generation
Timestamp tests determine ability to read or write
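
One common form of these tests is basic timestamp ordering, sketched below in Python. The slide does not say which rules the lecture used, so treat this as an assumed example: the `Item` class and the `try_read` and `try_write` functions are hypothetical, and timestamps are plain integers.

```python
class Item:
    """A data item that remembers the latest read and write timestamps."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def try_read(item, ts):
    """Reject a read by a transaction older than the item's last writer."""
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def try_write(item, ts):
    """Reject a write if a younger transaction already read or wrote the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True

salary = Item()
print(try_write(salary, 5))   # True:  first write, stamps the item with 5
print(try_read(salary, 3))    # False: transaction 3 is older than writer 5
print(try_read(salary, 7))    # True:  read timestamp becomes 7
print(try_write(salary, 6))   # False: transaction 7 has already read the item
```

With local generation, a site would typically pair its local counter with a unique site number so that timestamps from different sites never collide; the integers here skip that detail.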

86
Deadlocks & Distributed Data

Centralized
One Site
Distributed
Centralized - same advantages and disadvantages
as other centralized control (database or locking)

87
Distributed Deadlock Detection

Each site tracks all transactions accessing its
own data
Dummy transaction for transactions that
originated here but are executing elsewhere
If deadlock found that includes dummy transaction
Must send deadlock information to other sites
They check for deadlock
May have to pass on to another site
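
The scheme above can be sketched in Python with wait-for graphs. Everything named here is an assumption for illustration: the `EX` node plays the role of the dummy transaction, an edge `(a, b)` means "a waits for b", and the two-site scenario is invented. A real system would pass the information site to site rather than merge it in one place.

```python
EX = "EX"   # dummy transaction standing for work executing at other sites

def has_cycle(edges):
    """Depth-first search for a cycle in a wait-for graph."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
    visiting, done = set(), set()

    def visit(node):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        if any(visit(nxt) for nxt in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(graph))

def real_edges(edges):
    """Drop the dummy transaction, keeping only waits between real transactions."""
    return [(a, b) for a, b in edges if EX not in (a, b)]

# Hypothetical scenario: T1 waits for T2 at site A, T2 waits for T1 at site B.
site_a = [(EX, "T1"), ("T1", "T2"), ("T2", EX)]
site_b = [(EX, "T2"), ("T2", "T1"), ("T1", EX)]

print(has_cycle(real_edges(site_a)))   # False: no deadlock visible locally
print(has_cycle(site_a))               # True: cycle through the dummy, so ask site B
print(has_cycle(real_edges(site_a) + real_edges(site_b)))   # True: global deadlock
```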
