Title: Information Resources Management
1Information Resources Management
2Agenda
- Administrivia
- Database Architectures
3Administrivia
4Database Architectures
- Centralized
- Client-Server
- Parallel - single site
- Distributed - multiple sites
5Database Architectures
[Diagram: Centralized (Parallel), Distributed, and Client-Server architectures, with axes labeled Function and Data]
6Centralized
- PC, Mini, or Mainframe
- Single Database
- Single Database Manager
- One or More Users
- Data and Function in One Place
7Client-Server
- PCs to Mainframes to Minis
- PC to PC
- Mainframe to Mainframe
- Use Desktop Processing Power
- Better User Interface
- Greater Functionality
- Retain Centralized Control of Data
8Client-Server Basic Model
[Diagram: several Clients each send a Request to the Server and receive a Result]
9Servers
- Supercomputer
- Mainframe
- Mini
- PC Server
- All retain all data
10Client-Server Architecture
[Diagram: Data and Function divided between Server (Back-End) and Client (Front-End), ranging from Thin Client to Fat Client]
11Functionality
- Presentation
- I/O Processing
- Validation
- Business Rules
- Application Logic
- Data Management
- Validation
- Error Handling
12Thin Client
- Presentation Services Only
- Accept Input
- Format Output
- Display
- Server does all processing
13Fat Client
- Presentation
- Validation
- Application Logic - Programs
- Data Management
- Send SQL to Server
- Server is just DBMS
14In Between Client
- Client
- Presentation
- Some Application Logic
- Server
- Some Application Logic
- Data Management and Services
15Benefits of Client-Server
- Use Local Processing Power
- Better User Interface
- Some Functionality if System Down
- Use Sunk Costs of PCs
- Support Reengineering
- Support Intranets
- Flexibility, Scalability, Customizability
16Challenges of Client-Server
- Cost of (Upgraded) PCs
- Network Reliance
- Distributing Application Updates
- Management of Complex System
- Problem Identification and Resolution
- Application Partitioning
17Other Client-Server Architectures
- Traditional is Two-Tiered (client-server)
- Three-Tiered
- Client-Application Server-DB Server
- (PC - Mini - Mainframe)
- (PC - PC Server - Mainframe)
- Beyond Three
- PC - PC Server - Web Server - Mini - Mainframe
18Client-Server vs. Distributed
- Client-Server: Application Distribution
- Distributed: Data Distribution
- Often, client-server is used to refer to either application distribution or data distribution or both.
19Middleware
- What if
- Multiple databases (sources) need to be accessed from a single client?
- Different kinds of clients?
- Mix of clients and servers?
- Want to take advantage of existing base of applications (legacy systems)?
20Middleware
- Fat Clients just send SQL transactions
- Other types of transactions may be needed based
on the server (system)
21Middleware
Software that shields applications from the
complexity of the operating environment.
[Diagram: three Clients connect through Middleware to two Systems (Legacy)]
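As a rough illustration of the shielding idea, the sketch below puts a small middleware layer between a client and two back-end systems. The class names (`SqlBackend`, `LegacyBackend`, `Middleware`), the fixed-width record layout, and the sample data are all hypothetical, invented for this example.

```python
class SqlBackend:
    """Hypothetical back end that answers keyed lookups with row dictionaries."""
    def __init__(self, rows):
        self.rows = rows

    def select_by_id(self, key):
        return self.rows.get(key)


class LegacyBackend:
    """Hypothetical legacy system that returns fixed-width text records."""
    def __init__(self, records):
        self.records = records

    def fetch_record(self, key):
        return self.records.get(key)  # e.g. "0042SMITH     "


class Middleware:
    """Gives clients one call, whichever system actually holds the data."""
    def __init__(self, sql_backend, legacy_backend):
        self.sql = sql_backend
        self.legacy = legacy_backend

    def get_name(self, key):
        row = self.sql.select_by_id(key)
        if row is not None:
            return row["name"]
        record = self.legacy.fetch_record(key)
        if record is not None:
            # Assumed layout: 4-digit id followed by a padded, upper-case name.
            return record[4:].strip().title()
        return None


middleware = Middleware(
    SqlBackend({1: {"name": "Jones"}}),
    LegacyBackend({42: "0042SMITH     "}),
)
print(middleware.get_name(1))   # Jones
print(middleware.get_name(42))  # Smith
```

The client only ever calls `get_name`; the differences between the two systems stay inside the middleware class.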
22Types of Middleware
- Transaction Processing (TP) Monitor
- Database Middleware
- Remote Procedure Call (RPC)
- Message-Oriented Middleware (MOM)
- Object-Request Brokers
- (CORBA - ORB)
23TP Monitor
- Synchronous - sender must wait
- Queuing
- Message Delivery
- Assured Delivery
- Either Direction
24Database Middleware
- Variety of Clients/Platforms
- Variety of Servers/DBMSs/Platforms
- Specific to DB transactions (SQL)
25Message-Oriented Middleware (MOM)
- Asynchronous - clients do not wait
- Queues - Queue Management/Recovery
- Message Delivery
- Assured Delivery
- Either Direction
- (like email or EDI, but for transactions)
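A minimal sketch of the asynchronous pattern, assuming Python's standard `queue` and `threading` modules stand in for the middleware. The function names, the message text, and the `None` shutdown marker are inventions of this example, not part of any MOM product.

```python
import queue
import threading

message_queue = queue.Queue()   # stands in for the middleware's queue
processed = []

def server():
    """Consume messages until the None shutdown marker arrives."""
    while True:
        message = message_queue.get()
        if message is None:
            break
        processed.append(f"done: {message}")

def client_send(message):
    """Enqueue and return immediately; the client does not wait for a result."""
    message_queue.put(message)

client_send("debit account 17")
client_send("credit account 23")
message_queue.put(None)   # shutdown marker (an assumption of this sketch)

# The server starts only after the client has finished sending,
# which shows that the queue holds the work in the meantime.
worker = threading.Thread(target=server)
worker.start()
worker.join()
print(processed)
```

A synchronous TP monitor call would instead block the client until the server replied.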
26Advantages of Middleware
- Leverage sunk costs (legacy systems)
- Reduce development cost
- Reduce development time
- Increase responsiveness
- Improve overall systems management
- Consolidate diffuse information
27Challenges of Middleware
- Cost
- Session management - Transaction state
- Security
- Network reliance
- Diversity of systems - lack of standards
- Constant technology change
- Availability of talent
- Middleware Management
28Parallel and Distributed
- Client-Server is an attempt to improve performance
- Reduce time to execute a transaction
- Parallel
- Reduce time to get the data
- Distributed
29Parallel Systems
- Single site for data
- Very Large databases
- Operations performed simultaneously
30Parallel Database Architectures
- Shared Memory
- Shared Disk
- Shared Nothing
- Hierarchical
31Shared Memory
[Diagram: several processors (P) connected to one shared memory (M)]
32Shared Memory
- Advantages
- Extremely efficient communications
- Disadvantages
- Max of 32/64 processors
- Bus becomes bottleneck
33Shared Disk
[Diagram: processors (P), each with its own memory (M), sharing the disks]
34Shared Disk
- Advantages
- No bus bottleneck
- Fault tolerance provided
- Disadvantages
- Disk access becomes bottleneck
35Shared Nothing
[Diagram: processors (P), each with its own memory (M) and its own disk]
36Shared Nothing
- Advantages
- No disk bottleneck
- Highly scalable
- Disadvantages
- High communication overhead/cost
- Between processors
- To another processor's data
37Hierarchical
[Diagram: subsystems of processors (P) and memory (M) combined in a hierarchy]
38Hierarchical
- Advantages
- Best of all worlds
- Disadvantages
- Worst of all worlds
- Some high communication overhead/cost
- Between subsystems
- Complexity
39Distributed Databases
- Client-Server - distribute functionality
- What about distributing data?
40Distributed Databases
- Overview
- Distributed Storage
- Distributed Queries
- Distributed Transactions
- Multidatabase (Middleware)
41Distributed Databases
- Multiple locations
- Single logical database
- Several physical databases
- Network connections
42Advantages
- Sharing across locations
- Local control
- Availability
43Challenges
- Development costs
- People and Equipment
- Testing
- Problem identification and resolution
- Technical expertise
- Network dependence
- Increased processing overhead
44Distributed Data Storage
- Replication
- Fragmentation
- Both
45Replication
- Data is repeated
- Spectrum of options available
- Temporary replication of specific rows
- Replicate infrequently changed data
- Replicate by site
- Central site - all / each local site - its own data only
- Full replication
- Everything everywhere
46Concerns with Replication
- Availability needed
- Amount of parallelism in reads
- Overhead of updates
- Keeping replicas updated
- Conflicting updates
47Fragmentation
- Partitioning
- Divide data into subsets based on need
- Have to be able to pull back together to get
original tables
48Fragmentation
- Horizontal
- by rows
- specified conditions
- Vertical
- by column
- each requires primary key (or created key)
- Mixed
- by row and column
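The sketch below shows one way to picture horizontal and vertical fragmentation and how each is pulled back together. The `employees` rows and column names are made up for illustration; a real DBMS would do this with selections, projections, unions, and joins.

```python
employees = [
    {"id": 1, "name": "Ann", "site": "Pgh", "salary": 50000},
    {"id": 2, "name": "Bob", "site": "NY", "salary": 60000},
    {"id": 3, "name": "Cy", "site": "Pgh", "salary": 55000},
]

# Horizontal fragmentation: split by rows using a specified condition.
pgh_rows = [row for row in employees if row["site"] == "Pgh"]
other_rows = [row for row in employees if row["site"] != "Pgh"]

# Reconstruction is a union of the fragments.
rebuilt_horizontal = sorted(pgh_rows + other_rows, key=lambda row: row["id"])

# Vertical fragmentation: split by columns; every fragment keeps the primary key.
public_part = [{"id": r["id"], "name": r["name"], "site": r["site"]} for r in employees]
payroll_part = [{"id": r["id"], "salary": r["salary"]} for r in employees]

# Reconstruction is a join on the primary key.
payroll_by_id = {row["id"]: row for row in payroll_part}
rebuilt_vertical = [{**row, **payroll_by_id[row["id"]]} for row in public_part]

print(rebuilt_horizontal == employees)  # True
print(rebuilt_vertical == employees)    # True
```

Without the shared `id` column, the vertical fragments could not be matched up again, which is why each fragment needs the primary key (or a created key).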
49Fragmentation and Replication
- Repeat as necessary
- Replicate fragments
- Fragment replicas
- Don't lose track of what you have and where it is!
50Network Transparency
- Distributing data should not require that the user know where or how it's been distributed.
- The database should be seen as a single entity no matter how fragmented and replicated it becomes.
51Network Transparency
- Some DBMSs are starting to provide this level of functionality so transparency exists even at the program level, but in many cases this transparency must be programmed into the applications.
- It must always be designed into the database.
52Distributed Queries
- How do you query data that is everywhere?
53Efficiency vs. Overhead
- Splitting the query apart
- Keeping track of the data/locations
- Making sure everything gets executed
- Putting the results back together
- Generating network traffic
- Handling partial results
54Distributed Queries
- Full replication can avoid the overhead
- Huge increase in update overhead
- Parallel execution no longer possible
- Additional costs of replication
55Example
- 5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
- Data fragmented by site - no replication
- Query (in Pgh)
- SELECT Name, MAX(Salary) FROM Employee
56Option 1 - High Bandwidth
- 1. Have all sites send their full employee tables to Pgh.
- 2. Build a temporary employee table.
- 3. Run the query against this table.
57Option 2 - Not so High Bandwidth
- 1. Examine the query and determine it can be run separately at each location and the results combined.
- 2. Submit just the query to each location.
- 3. Wait for the results from each city.
- 4. As results return, build a temporary table (5 rows only).
- 5. Find the max using the temporary table.
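The two options can be compared with a small simulation. The employee names and salaries below are made up, and each site's fragment is just a Python list. The sketch assumes the query is meant to return the name of the highest-paid employee.

```python
# Made-up fragments of the Employee table, one per site (no replication).
site_tables = {
    "NY": [("Ann", 70000), ("Raj", 82000)],
    "Pgh": [("Lee", 64000), ("Mia", 91000)],
    "Chicago": [("Tom", 58000)],
    "Dallas": [("Eve", 77000), ("Sam", 73000)],
    "Los Angeles": [("Kim", 88000)],
}

def local_max(rows):
    """Run at each site: return the (name, salary) row with the highest salary."""
    return max(rows, key=lambda row: row[1])

# Option 2, steps 2-4: each site answers with one row; Pgh collects them.
temp_table = [local_max(rows) for rows in site_tables.values()]

# Option 2, step 5: find the overall maximum from the temporary table.
name, salary = max(temp_table, key=lambda row: row[1])

# Rows gathered into the temporary table under each option.
rows_option_1 = sum(len(rows) for rows in site_tables.values())  # full tables
rows_option_2 = len(temp_table)                                  # one row per site

print(name, salary)                  # Mia 91000
print(rows_option_1, rows_option_2)  # 8 5
```

With realistic table sizes the gap between the two row counts is far larger, since Option 2 always gathers one row per site.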
58Distributed Transactions
- Transaction Types
- Coordinators
- Commit Protocols
- Concurrency Controls
- Deadlocks
59Transaction Types
- Local - transaction only needs local data
- Global - transaction uses non-local data
- My global becomes someone else's local
- Either type of transaction must still have ACID
properties - global is the concern
60System Structure
- Things to do
- 1. Process local transactions
- (transaction manager)
- 2. Process and track global transactions
- (transaction coordinator)
61Global Processing
- 1. Recognize as global
- 2. Break up transaction
- 3. Distribute pieces
- 4. Assemble results
- 5. Coordinate termination
- 6. Handle problems
62Coordinator of Coordinators
- Coordinate among sites
- Detect problems
- Attempt to fix
- Share status with others
63Coordinator Failure
- Backup Coordinator
- receives all messages - maintains state
- monitors coordinator
- automatically takes over if coordinator down
- avoids delays - increases overhead
- Election
- highest pre-assigned number
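The election rule can be sketched in a few lines. The function name and the way failed sites are passed in are assumptions of this example; a real system would discover failures through timeouts and messages.

```python
def elect_coordinator(site_numbers, failed_sites):
    """Pick the surviving site with the highest pre-assigned number."""
    alive = [number for number in site_numbers if number not in failed_sites]
    if not alive:
        raise RuntimeError("no site is available to act as coordinator")
    return max(alive)

# Site 5 was the coordinator and has gone down.
print(elect_coordinator([1, 2, 3, 4, 5], failed_sites={5}))  # 4
```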
64Commit Protocols
- Two-Phase
- Three-Phase
- All sites must commit or all sites have to rollback
- Replicated data only
65Two-Phase Commit
- Phase 1
- Send PREPARE to all sites
- Sites respond READY or ABORT
- Phase 2
- If all sites READY,
- COMMIT locally
- Send COMMITs
- If not READY or time expires
- ROLLBACK locally
- Send ROLLBACK
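A minimal sketch of the coordinator's decision, assuming the Phase 1 replies have already been collected. The function name and the use of `None` to represent a site that timed out are choices made for this example.

```python
def two_phase_commit(site_votes):
    """Return the coordinator's decision and the messages it sends.

    site_votes maps a site name to its Phase 1 reply: "READY", "ABORT",
    or None when the site did not answer before the time expired.
    """
    # Phase 1: PREPARE has gone out; the replies are in site_votes.
    all_ready = all(vote == "READY" for vote in site_votes.values())

    # Phase 2: decide locally, then send the same decision to every site.
    decision = "COMMIT" if all_ready else "ROLLBACK"
    messages = [(site, decision) for site in site_votes]
    return decision, messages

print(two_phase_commit({"A": "READY", "B": "READY", "C": "READY"})[0])  # COMMIT
print(two_phase_commit({"A": "READY", "B": "ABORT", "C": "READY"})[0])  # ROLLBACK
print(two_phase_commit({"A": "READY", "B": None, "C": "READY"})[0])     # ROLLBACK
```

A single ABORT or a single missing reply is enough to roll back everywhere, which is what keeps the outcome the same at all sites.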
66Two-Phase Commit
[Diagram: Coordinator and three Sites. Site requests commit]
67Two-Phase Commit - Phase 1
[Diagram: Coordinator and three Sites. Send PREPARE - all sites]
68Two-Phase Commit - Phase 1
[Diagram: Coordinator and three Sites. Sites respond READY]
69Two-Phase Commit - Phase 2
[Diagram: Coordinator and three Sites. COMMIT locally]
70Two-Phase Commit - Phase 2
[Diagram: Coordinator and three Sites. Send COMMIT - all sites]
71Two-Phase Commit - Phase 1
[Diagram: Coordinator and three Sites. Site responds ABORT or does not respond]
72Two-Phase Commit - Phase 2
[Diagram: Coordinator and three Sites. ROLLBACK locally]
73Two-Phase Commit - Phase 2
[Diagram: Coordinator and three Sites. Send ROLLBACK - all sites]
74Site Failure - Recovery
- COMMIT and ROLLBACK as normal
- If READY only
- Check with coordinator or other sites
- Either COMMIT or ROLLBACK
- If no one found, ROLLBACK
75Coordinator Failure
- Ask the sites
- If one has COMMIT, then REDO
- If one has ROLLBACK, then UNDO
- If one doesn't have READY, UNDO
- If all READY only
- Coordinator must decide
- Sites must wait and locks are held
- Blocking occurs
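The rules above can be written as one small function. The state names mirror the slide; representing a site that has logged nothing yet as the string "NONE", and the return value "BLOCKED", are assumptions of this sketch.

```python
def resolve_after_coordinator_failure(site_states):
    """Decide what the surviving sites can do, given the last state each one logged.

    Each state is "COMMIT", "ROLLBACK", "READY", or "NONE" (nothing logged yet).
    """
    if "COMMIT" in site_states:
        return "REDO"      # the coordinator had decided to commit
    if "ROLLBACK" in site_states:
        return "UNDO"      # the coordinator had decided to roll back
    if any(state != "READY" for state in site_states):
        return "UNDO"      # a site never voted READY, so commit was impossible
    return "BLOCKED"       # all READY: only the coordinator knows the outcome

print(resolve_after_coordinator_failure(["READY", "COMMIT", "READY"]))  # REDO
print(resolve_after_coordinator_failure(["READY", "NONE", "READY"]))    # UNDO
print(resolve_after_coordinator_failure(["READY", "READY", "READY"]))   # BLOCKED
```

In the blocked case the sites keep their locks until the coordinator comes back.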
76Three-Phase Commit
- Phase 1
- Send PREPARE
- Sites respond READY or ABORT
- Phase 2
- If all sites READY, send PRECOMMIT
- Else, ROLLBACK
- Sites must ACKNOWLEDGE
- Phase 3
- If at least K sites ACKNOWLEDGE, send COMMIT
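A rough sketch of the coordinator's side of the three phases, with the replies passed in as plain values. The slide does not say what happens when fewer than K sites acknowledge, so this example simply reports "WAITING" in that case; that return value and the function name are assumptions.

```python
def three_phase_commit(votes, acknowledgements, k):
    """Return the coordinator's decision.

    votes: Phase 1 replies, each "READY" or "ABORT".
    acknowledgements: number of sites that acknowledged PRECOMMIT in Phase 2.
    k: minimum number of acknowledgements required before COMMIT is sent.
    """
    # Phase 1 and 2: anything other than unanimous READY means rollback.
    if any(vote != "READY" for vote in votes):
        return "ROLLBACK"

    # Phase 3: PRECOMMIT was sent; commit once at least K sites have acknowledged.
    if acknowledgements >= k:
        return "COMMIT"
    return "WAITING"  # assumption: the slide leaves this case open

print(three_phase_commit(["READY", "READY", "READY"], acknowledgements=2, k=2))  # COMMIT
print(three_phase_commit(["READY", "ABORT", "READY"], acknowledgements=0, k=2))  # ROLLBACK
print(three_phase_commit(["READY", "READY", "READY"], acknowledgements=1, k=2))  # WAITING
```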
77Coordinator Failure
- Three-Phase Commit prevents blocking
- If coordinator fails
- New coordinator is selected
- Sites queried to determine status
- New coordinator resumes
78Network Partitioning
- Network split creates two separate networks
- Each half selects a coordinator
- Coordinators make independent decisions
- Result could be different decisions
- Resolution of network problem may create need to
resolve database problems
79Concurrency Control
- Single Lock Manager
- Multiple Lock Managers
80Single Lock Manager
- One site for all locking
- All other sites must go to it
- Can read from anywhere
- Updates must be to all copies
- Advantages: Simple, Easy deadlock detection
- Disadvantages: Bottleneck, Vulnerability
81Simple Multiple Lock Mgrs
- Each site locks a unique partition of the data
- non-replicated data
- Advantages: Fairly simple, reduced bottlenecks
- Disadvantages: Complicated deadlock detection
82Majority Protocol
- Each site locks its own data
- replication possible
- Request owner for lock on data that isn't local
- When multiple owners, n/2 + 1 (majority) must provide the lock
- Advantages: No bottlenecks
- Disadvantages: More messages sent, Complicated deadlock detection, More deadlocks (each gets 1/2)
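The majority rule and the "each gets 1/2" deadlock can be checked with a few lines of arithmetic. The function names are made up, and integer division is assumed for n/2.

```python
def majority(n_replicas):
    """Number of replica locks needed: n/2 + 1, using integer division."""
    return n_replicas // 2 + 1

def has_lock(grants, n_replicas):
    """A transaction holds the lock once a majority of replicas have granted it."""
    return grants >= majority(n_replicas)

print(majority(5))      # 3
print(has_lock(3, 5))   # True

# Four replicas, two transactions, each granted the lock at two sites:
# neither reaches the majority of 3, so both wait on each other.
print(has_lock(2, 4), has_lock(2, 4))  # False False
```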
83Biased Protocol
- Reduced form of Majority Protocol
- For a READ, only need any single lock
- For a WRITE, need all locks
- Advantages: No bottlenecks, Reduced traffic
- Disadvantages: Update traffic, Deadlocks
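The read-one, write-all rule can be stated as two checks. The function names and the idea of counting locks held per transaction are assumptions made for this sketch.

```python
def can_read(locks_held):
    """A READ may proceed once any single replica has granted a lock."""
    return locks_held >= 1

def can_write(locks_held, n_replicas):
    """A WRITE may proceed only when every replica has granted a lock."""
    return locks_held == n_replicas

n_replicas = 4
print(can_read(1))               # True: one lock is enough to read
print(can_write(3, n_replicas))  # False: one replica is still unlocked
print(can_write(4, n_replicas))  # True
```

Reads become cheap (one lock request) while writes carry the cost of contacting every replica, which matches the reduced traffic and update traffic points above.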
84Primary Copy
- Site designated to hold primary copy
- Multiple sites
- Replicated Data
- All locks through that site
- Advantages: Fairly simple, reduced bottlenecks
- Disadvantages: Vulnerability, Complicated deadlock detection
85Other Than Locking
- Timestamps
- Centralized generation
- Local generation
- Timestamp tests determine ability to read or write
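The slide does not say which tests are applied. One common choice is the basic timestamp-ordering rule, sketched below as an illustration rather than as the method the course uses. Timestamps are generated locally as a (counter, site id) pair so that no two sites produce the same value; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Item:
    """A data item that remembers the latest read and write timestamps."""
    read_ts: tuple = (0, 0)
    write_ts: tuple = (0, 0)

def make_timestamp(local_counter, site_id):
    """Local generation: the site id breaks ties between equal counters."""
    return (local_counter, site_id)

def try_read(item, ts):
    """Reject a read of a value written by a younger transaction."""
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def try_write(item, ts):
    """Reject a write if a younger transaction has already read or written the item."""
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True

item = Item()
older = make_timestamp(1, 2)   # counter 1 at site 2
newer = make_timestamp(1, 3)   # counter 1 at site 3, ordered after site 2

print(try_write(item, newer))  # True
print(try_read(item, older))   # False: the item was written by a younger transaction
print(try_read(item, newer))   # True
```

A rejected transaction would normally be rolled back and restarted with a new timestamp.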
86Deadlocks - Distributed Data
- Centralized
- One Site
- Distributed
- Centralized - same advantages and disadvantages
as other centralized control (database or locking)
87Distributed Deadlock Detection
- Each site tracks all transactions accessing its own data
- Dummy transaction for transactions that originated here but are executing elsewhere
- If deadlock found that includes dummy transaction
- Must send deadlock information to other sites
- They check for deadlock
- May have to pass on to another site
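One way to picture this is with wait-for graphs, where an edge T1 -> T2 means T1 is waiting for T2. In the sketch below the dummy transaction is a node named "EXT"; that name, the two example sites, and the helper functions are all assumptions made for illustration.

```python
def has_cycle(graph):
    """Return True if the wait-for graph (node -> set of nodes) contains a cycle."""
    in_progress, finished = set(), set()

    def visit(node):
        if node in finished:
            return False
        if node in in_progress:
            return True          # reached a node already on the current path
        in_progress.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        in_progress.discard(node)
        finished.add(node)
        return found

    return any(visit(node) for node in list(graph))

def without_dummy(graph, dummy="EXT"):
    """Drop the dummy node, leaving only waits between real transactions."""
    return {node: {n for n in targets if n != dummy}
            for node, targets in graph.items() if node != dummy}

def merge(*graphs):
    """Combine the wait-for information received from several sites."""
    merged = {}
    for graph in graphs:
        for node, targets in graph.items():
            merged.setdefault(node, set()).update(targets)
    return merged

# Site A: T1 waits for T2, and T2 is busy elsewhere (EXT).
# Site B: T2 waits for T1, and T1 is busy elsewhere (EXT).
site_a = {"T1": {"T2"}, "T2": {"EXT"}, "EXT": {"T1"}}
site_b = {"T2": {"T1"}, "T1": {"EXT"}, "EXT": {"T2"}}

print(has_cycle(site_a))                 # True: a possible deadlock through EXT
print(has_cycle(without_dummy(site_a)))  # False: no purely local deadlock
print(has_cycle(merge(without_dummy(site_a), without_dummy(site_b))))  # True
```

The local cycle through the dummy node only signals a possible deadlock; it is confirmed once the other site's information is combined with it.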
88Homework 9
- Continuing with the Carnegie Library
- Client/Server
- Distributed Database