Title: Clustering
1Clustering
2Objectives
- At the end of this module the student will understand the following tasks and concepts:
- What clustering is and why you would want it
- Clustering options
- Differences between various types of clustering: advantages and disadvantages
- Factors to consider when choosing a cluster type
3What is a cluster?
- My definition
- Multiple systems performing a single function
- Black box
4Why Cluster?
- Performance
- Availability
- Recoverability
5Features
- Speedup
- Faster response times
- Transactions finish faster
- Scaleup
- More work done
- More capacity, more concurrent transactions
- Scalability
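Speedup and scaleup are usually expressed as simple ratios. The sketch below assumes the common textbook definitions (same workload, less time, for speedup; same time, more work, for scaleup). The function names and sample figures are illustrative and do not come from the slides.

```python
def speedup(time_before: float, time_after: float) -> float:
    """Ratio of elapsed times for the SAME workload (higher is better)."""
    return time_before / time_after


def scaleup(work_before: float, work_after: float) -> float:
    """Ratio of work completed in the SAME elapsed time (higher is better)."""
    return work_after / work_before


# Hypothetical figures: a batch that took 120 s on one node takes 40 s on four.
print(speedup(120.0, 40.0))     # 3.0
# Hypothetical figures: 1000 transactions/min on one node, 3500 on four.
print(scaleup(1000.0, 3500.0))  # 3.5
```

A ratio equal to the number of nodes added would be linear scaling; real clusters typically fall short of that because of coordination overhead.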
6Single Node Scaling
- Scales to multiple CPUs
- Doesn't scale beyond one node
- Multiple single points of failure
[Diagram: Users, Database, Database]
7Cluster Definitions
- Shared Nothing (Federated)
- Replicated Site
- Shared Disk
- Failover
- Active/Passive
- Active/Active
- Shared Everything
8Shared Nothing Cluster
- Only one CPU is connected to a disk
- May have shared memory
- MPP Systems are Shared Nothing
- Other vendors have Shared Nothing clusters
9Federated (Shared Nothing) Cluster
- Distributed database (separate database on each machine)
- Data is spread across nodes: each machine has part of the data
- Function is spread across nodes
- Two-Phase Commit
[Diagram: two-phase commit between two databases. 1. "Got it?" 2. "Got it!" 3. "Good!"]
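The two-phase commit named above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular database implements it; the `Participant` class and `two_phase_commit` function are hypothetical names introduced here, and real implementations also handle logging, timeouts, and coordinator failure.

```python
class Participant:
    """One database node in the federation (hypothetical in-memory stand-in)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Return True if the transaction committed on every node."""
    # Phase 1 (prepare): ask every node to vote.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): every node voted yes, so all of them commit.
        for p in participants:
            p.commit()
        return True
    # A single "no" vote rolls the transaction back everywhere.
    for p in participants:
        p.rollback()
    return False


nodes = [Participant("db1"), Participant("db2")]
print(two_phase_commit(nodes))  # True
```

The point of the extra round trip is that no node commits until every node has promised it can, so the databases never end up with only part of a transaction applied.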
10Replicated System
- Data replicated at the server (network) level or at the storage (SAN) level
- Multiple copies of the same database
- Most common implementation is Active/Passive
- Failover between nodes
[Diagram: Passive Node and Active Node, each with a Database, linked by server-level replication or storage-level replication]
11Shared Disk Cluster
- Shared file system
- Multiple systems attached to the same disk
- All nodes must have access to data
- Only one database instance: only one node has ownership of the shared disk
- Synchronization between systems: if one node fails, then the other takes over
12Cluster Interconnect
- Most Shared Disk clusters require some form of Cluster Interconnect
- Network, e.g. Gigabit Ethernet
- Specialized, e.g. InfiniBand, Myrinet
- Most clusters implement a heartbeat between cluster nodes to monitor node health
- Multiple nodes require a switch
- Usually separated from the LAN
- Some shared disk clusters implement a heartbeat mechanism to a quorum disk via the SAN in addition to/instead of network heartbeat
- Oracle RAC implements Cache Fusion across the interconnect
- Extra network traffic increases the throughput requirements
- UDP implementation requires a separate network
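The heartbeat idea above can be illustrated with a small sketch. It assumes a simple timeout rule (a node is suspect when no heartbeat has arrived within a fixed window); the `HeartbeatMonitor` class, its method names, and the 3-second timeout are hypothetical and not taken from any cluster product. Timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (illustrative only)."""

    def __init__(self, timeout_seconds: float):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node name -> timestamp of last heartbeat

    def record(self, node: str, now: float) -> None:
        # Called whenever a heartbeat message arrives over the interconnect.
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        # A node is considered failed once it has been silent too long.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record("Node A", now=0.0)
monitor.record("Node B", now=0.0)
monitor.record("Node A", now=4.0)      # Node B has gone quiet
print(monitor.failed_nodes(now=5.0))   # ['Node B']
```

A timeout alone cannot tell a dead node from a broken network link, which is one reason some clusters add a second heartbeat path such as the quorum disk mentioned above.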
13Failover Cluster
- One system is a standby system for another
- Only one system doing work at a time
- Pseudo-Shared Disk
- Limited scalability in active/passive mode
14Failover Clustering
- Fault-tolerant systems are highly available
- Basic failover clusters don't scale beyond two nodes
[Diagram: Users, Database, Database]
15Active/Passive vs. Active/Active
- Both are failover only
- Active/Passive
- One node is active
- The other is passive until failover
- Active/Active
- Still uses active/passive technology
- 2 separate databases
- One is active on Node A and passive on Node B
- The second database is active on Node B and passive on Node A
- Separate applications and user connections to each of the different databases
16Active/Passive
[Diagram: Node A, Node B]
- Node A is active
- Node B is passive until/unless Node A fails
- Only one Oracle license is required
17Active/Passive
[Diagram: Node A (marked X), Node B]
If Node A fails
18Active/Passive
[Diagram: Node A (marked X), Node B]
- Node B becomes active
- Node A is dead (definitely passive!) until repaired and then failed back if necessary.
19Active/Active
- Application Group A and User Group A are active on Node A
- Application Group B and User Group B are active on Node B
- Each node serves as failover for the other.
- 2 separate databases. The two nodes never access the same data at the same time.
- Oracle license required on each node
[Diagram: Node A runs Application A / User Group A and is the passive failover for B; Node B runs Application B / User Group B and is the passive failover for A]
20Switchover vs. Failover
- Many cluster systems utilize the concept of Service Groups
- Service Groups allow granular control of individual software packages (e.g. individual Oracle instances)
- An individual group can be manually moved to another server without affecting other service groups: a switchover versus a failover
- Adds greater management flexibility
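The difference between the two operations can be sketched with a plain mapping of service groups to servers. This is an illustration only; the function names, group names, and server names are hypothetical and do not correspond to any vendor's commands.

```python
def switchover(placement: dict, group: str, target: str) -> dict:
    """Planned, manual move of ONE service group; the rest stay put."""
    moved = dict(placement)
    moved[group] = target
    return moved


def failover(placement: dict, failed_node: str, target: str) -> dict:
    """Unplanned move of EVERY service group hosted on the failed node."""
    return {
        group: (target if node == failed_node else node)
        for group, node in placement.items()
    }


# Hypothetical service groups, e.g. two Oracle instances and a web tier.
placement = {"oracle_sales": "server1", "oracle_hr": "server1", "web": "server2"}

print(switchover(placement, "oracle_hr", "server2"))
# {'oracle_sales': 'server1', 'oracle_hr': 'server2', 'web': 'server2'}
print(failover(placement, "server1", "server2"))
# {'oracle_sales': 'server2', 'oracle_hr': 'server2', 'web': 'server2'}
```

Both functions return a new mapping rather than changing the input, so the planned and unplanned outcomes can be compared side by side.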
21N-to-1 Failover Configuration
- Node D is a dedicated failover node for failures on Node A, B, and C
- Extends number of active nodes
- A problem is that once the failed node is available, the Service Groups on Node D (failover node) must fail back to the original server to restore High Availability
22N+1 Failover Configuration
- Node D is a dedicated failover node for failures on Node A, B, and C
- Extends number of active nodes
- Once Node C is restored, it becomes the failover node, leaving Node D in production.
[Diagram: Nodes A through D hosting Application / User Group pairs A through I; one node is marked failed (X) and groups G, H, and I fail over to the failover node]
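The only difference between the N-to-1 and N+1 configurations is what happens after the failed node is repaired. The sketch below models that with a single flag; the `SpareCluster` class and the placement of groups on nodes are hypothetical, chosen only to mirror the Node A to Node D example on the slides.

```python
class SpareCluster:
    """Cluster with one standby node (illustrative model, not a product API)."""

    def __init__(self, placement: dict, spare: str, rotate_spare: bool):
        self.placement = dict(placement)  # service group -> hosting node
        self.spare = spare
        self.rotate_spare = rotate_spare  # True = N+1, False = N-to-1
        self.failed = None

    def node_failed(self, node: str) -> None:
        # Both models: the failed node's groups move to the spare.
        self.failed = node
        for group, host in self.placement.items():
            if host == node:
                self.placement[group] = self.spare

    def node_repaired(self) -> None:
        repaired, self.failed = self.failed, None
        if self.rotate_spare:
            # N+1: groups stay where they are; the repaired node
            # simply becomes the new spare.
            self.spare = repaired
        else:
            # N-to-1: groups must fail back so the dedicated spare
            # is free to cover the next failure.
            for group, host in self.placement.items():
                if host == self.spare:
                    self.placement[group] = repaired


groups = {"G": "Node C", "H": "Node C", "I": "Node C", "A": "Node A"}
n_plus_1 = SpareCluster(groups, spare="Node D", rotate_spare=True)
n_plus_1.node_failed("Node C")
n_plus_1.node_repaired()
print(n_plus_1.spare, n_plus_1.placement["G"])  # Node C Node D
```

In the N+1 case the repair involves no second move of the service groups, which is the advantage the slide describes.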
23N-to-N Failover Configuration
- Node C fails, and its Service Groups are re-distributed across surviving nodes
- Optimal solution for > 2 nodes
- Implemented on third party failover clusters and Oracle RAC
[Diagram: Nodes A through D hosting Application / User Group pairs A through L; one node is marked failed (X) and groups G, H, and I fail over to the surviving nodes]
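Re-distribution across surviving nodes can be sketched as follows. The slides do not say how targets are chosen, so this example assumes a least-loaded policy purely for illustration; the `redistribute` function and the assignment of groups to nodes are hypothetical.

```python
def redistribute(placement: dict, failed: str, nodes: list) -> dict:
    """Spread the failed node's service groups over the surviving nodes.

    Assumed policy: each orphaned group goes to the survivor currently
    hosting the fewest groups (ties go to the earliest node in `nodes`).
    """
    survivors = [n for n in nodes if n != failed]
    load = {n: 0 for n in survivors}
    result = {}
    # Groups on healthy nodes stay where they are.
    for group, node in placement.items():
        if node != failed:
            result[group] = node
            load[node] += 1
    # Orphaned groups are placed one at a time on the least-loaded survivor.
    for group in sorted(g for g, n in placement.items() if n == failed):
        target = min(survivors, key=lambda n: load[n])
        result[group] = target
        load[target] += 1
    return result


nodes = ["Node A", "Node B", "Node C", "Node D"]
placement = {
    "A": "Node A", "B": "Node A", "C": "Node A",
    "D": "Node B", "E": "Node B", "F": "Node B",
    "G": "Node C", "H": "Node C", "I": "Node C",
    "J": "Node D", "K": "Node D", "L": "Node D",
}
after = redistribute(placement, "Node C", nodes)
print(after["G"], after["H"], after["I"])  # Node A Node B Node D
```

Because no node is held in reserve, every server does useful work in normal operation, at the cost of each survivor carrying extra load after a failure.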
24Third Party Clusters
- Support for extended cluster nodes: up to 32 nodes for vendor clustering
- Supports N+1 and N-to-N failover clustering
- Integrated with hardware and/or software replication for long distance clusters
25Clustering Solutions from Oracle
- Oracle Failsafe
- Oracle Data Guard
- Advanced Replication
- Shared Nothing Cluster
- Oracle Parallel Server
- Real Application Clusters (RAC)
26Failsafe
- MS Clustering Enabled
- Two servers, one disk subsystem
- Switches in the event of a hardware failure
- Requires recovery
27Standby Database
- Copy of Database (usually remote)
- Kept up to date with Archive Logs
- Oracle 8i feature
- Oracle 9i-10g version of a standby database is Data Guard
28Oracle Data Guard
- Mirrored Server
- Physical Standby
- Archive Logs are applied to the remote database
- Switchover occurs in the event of a failure
- Logical Standby
- LogMiner technology is used to generate SQL
- Standby Database can also be used for read-only reporting
- Advantages
- Safe from user failure
- Can be in different location
- No recovery required
29Advanced Replication
- Uses Updatable-Snapshots
- Replicates to another system
- Systems stay in sync
30Oracle Parallel Server
- Shared disk cluster product
- Loosely Coupled
- Scalable performance
- No downtime in the event of a system failure
- Replaced by RAC in 9i
31True Shared Disk Server (RAC)
- ONE database
- Separate multiple instances (processes and memory)
- All nodes can access data simultaneously
- Shared Everything Cluster
- Transparent Application Failover
- Oracle license required on each node
- Highest level of cluster functionality
[Diagram: Node A, Node B]
32Factors to Consider for Clustering
- Which do you need most?
- High Availability: Failover Clusters, Synchronous Replication, Data Guard
- Performance scalability: Active/Active failover clusters, N-to-N failover clusters
- Both: Oracle RAC
- Administration complexity
- Failover clusters: relatively low
- Oracle RAC: relatively high
- Substantially less complex for 10g RAC than 9i RAC
- Local or long distance?
- Local: Failover, RAC
- Remote: Federated database, Replication, Standby database/Data Guard
- Oracle license costs
- Active/Passive failover clusters: active nodes only
- Active/Active failover clusters, RAC: per node
33Review
- What type of commit is required for a Federated (shared nothing) cluster?
- What is the difference in how the database is kept up-to-date in Oracle Data Guard vs. Advanced Replication?
- What is the difference between N-to-1 failover clusters and N+1 failover clusters?
- How many databases are there in an 8 node Oracle RAC cluster?
34Summary
- Types of clusters
- Shared Nothing Clusters
- Federated databases
- Replication
- Shared Disk Clusters
- Failover
- Oracle RAC
- Failover Clusters
- Active/Passive
- Active/Active
- N-to-1
- N+1
- N-to-N
- Shared Everything Clusters
- Oracle RAC
- Choosing a cluster type involves trade-offs in functionality, costs, and administration complexity