1
High Availability and Fault-Tolerance in
Real-Time Databases

Jan Lindström
University of Helsinki
Department of Computer Science

2
Overview

The causes of downtime
Availability solutions
CASE 1 Clustra
CASE 2 TelORB
CASE 3 RODAIN

3
The Causes of Downtime

Planned downtime
Hardware expansion
Database software upgrades
Operating system upgrades
Unplanned downtime
Hardware failure
OS failure
Database software bugs
Power failure
Disaster
Human error

4
Traditional Availability Solutions

Replication
Failover
Primary restart

5
CASE 1 Clustra

Developed for telephony applications such as
mobility management and intelligent networks.
Relational database with location and replication
transparency.
Real-time data is locked in main memory, and the API
provides precompiled transactions.
NOT a real-time database!

6
Clustra hardware architecture
7
Data distribution and replication
8
How Clustra Handles Failures

Real-time failover: hot-standby data is up to date, so failover occurs in milliseconds.
Automatic restart and takeback: restart of the failed node and takeback of operations are automatic, and again transparent to users and operators.
Self-repair: if a node fails completely, data is copied from the complementary node to a standby node. This is also automatic and transparent.
Limited failure effects
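The three mechanisms above can be illustrated with a small model of one data fragment replicated on two complementary nodes. This is a minimal sketch under stated assumptions, not Clustra's implementation: the classes `Node` and `ReplicaPair` and their methods are hypothetical, every write is assumed to be applied to both replicas (which is what keeps the standby "hot"), and the catch-up after restart is modelled as a plain copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node holding one replica of a fragment.
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)


class ReplicaPair:
    """Hypothetical model of one fragment on a primary and a hot-standby node."""

    def __init__(self, primary: Node, standby: Node) -> None:
        self.home = primary      # node that normally owns the primary role
        self.primary = primary
        self.standby = standby

    def write(self, key, value) -> None:
        # Hot standby: each write is applied to both replicas while the
        # standby is alive, so the standby is always up to date.
        self.primary.data[key] = value
        if self.standby.alive:
            self.standby.data[key] = value

    def fail_primary(self) -> None:
        # Real-time failover: the up-to-date standby simply takes over.
        self.primary.alive = False
        self.primary, self.standby = self.standby, self.primary

    def restart_and_takeback(self) -> None:
        # Automatic restart: the failed node catches up from its complementary
        # node, then takes back the primary role if it normally owns it.
        failed = self.standby
        failed.alive = True
        failed.data = dict(self.primary.data)
        if failed is self.home:
            self.primary, self.standby = failed, self.primary

    def self_repair(self, spare: Node) -> None:
        # Self-repair: a permanently lost replica is rebuilt on a spare node
        # by copying from the surviving complementary node.
        lost = self.standby
        if self.home is lost:
            self.home = self.primary  # the survivor now owns the primary role
        spare.data = dict(self.primary.data)
        self.standby = spare
```

Only the surviving node is ever read during the failure, which is why failover in this model needs no data movement; the copy happens later, during takeback or self-repair.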

9
How Clustra Handles Upgrades

Hardware, operating system, and database software upgrades are performed without ever going down.
The process is called a rolling upgrade, i.e. the required changes are performed node by node.
Each upgraded node catches up to the state of its complementary node.
When this is completed, the operation is performed on the next node.
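A rolling upgrade can be sketched as a loop over complementary node pairs that takes down at most one node per pair at a time. This is a hypothetical sketch: `Node`, `rolling_upgrade`, and the version strings are illustrative names, and the catch-up step after restart is not modelled. The returned log records how many nodes of the pair were serving while each node was down.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical node with a software version and an in-service flag.
    name: str
    version: str
    up: bool = True


def rolling_upgrade(pairs, new_version):
    """Upgrade complementary node pairs one node at a time (sketch).

    Returns a list of (node name, serving nodes in its pair) taken while
    each node was out of service.
    """
    log = []
    for pair in pairs:
        for node in pair:
            partner = pair[1] if node is pair[0] else pair[0]
            # Never take a node down unless its complementary node is serving.
            if not partner.up:
                raise RuntimeError(f"{partner.name} must be up before upgrading {node.name}")
            node.up = False               # take the node out of service
            node.version = new_version    # apply the HW/OS/database change
            log.append((node.name, sum(n.up for n in pair)))
            node.up = True                # restart (catch-up not modelled)
    return log
```

Because each node waits for its partner, every fragment stays available through the whole upgrade, at the cost of running without redundancy while a node is down.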

10
CASE 2 TelORB

Characteristics
Very high availability (HA) and robustness, implemented in software
(Soft) real time
Scalability by using loosely coupled processors
Openness
Hardware: Intel/Pentium
Language: C, Java
Interoperability: CORBA/IIOP, TCP/IP, Java RMI
3rd-party software: Java

11
TelORB Availability

Real-time object-oriented DBMS supporting
Distributed Transactions
ACID properties expected from a DBMS
Data Replication (providing redundancy)
Network Redundancy
Software Configuration Control
Automatic restart, on working processors, of processes
that originally executed on a faulty processor
Self-healing
In-service upgrade of software with no
disturbance to operation
Hot replacement of faulty processors
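The automatic restart of processes from a faulty processor can be sketched as a reassignment of the affected processes to the remaining processors. This is a hypothetical sketch: the function `reassign_processes`, the process and processor names, and the least-loaded placement rule are assumptions for illustration, not TelORB's actual policy.

```python
def reassign_processes(placement, processors, failed):
    """Restart processes from a failed processor on working ones (sketch).

    placement:  dict mapping process name -> processor name
    processors: all processor names in the cluster
    failed:     name of the processor that became faulty
    """
    working = [p for p in processors if p != failed]
    if not working:
        raise RuntimeError("no working processor left")
    # Current load (number of processes) on each working processor.
    load = {p: 0 for p in working}
    for cpu in placement.values():
        if cpu in load:
            load[cpu] += 1
    new_placement = dict(placement)
    # Assumed policy: restart each orphaned process on the least-loaded
    # working processor (ties broken by processor name).
    for proc in sorted(p for p, cpu in placement.items() if cpu == failed):
        target = min(working, key=lambda p: (load[p], p))
        new_placement[proc] = target
        load[target] += 1
    return new_placement
```

Spreading the orphaned processes over several processors, rather than moving them all to one spare, keeps the extra load from a single failure small in a loosely coupled cluster.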

12
Automatic Reconfiguration
13
Software upgrade

Smooth software upgrade, since the old and new versions
of the same process can coexist
The application can arrange for state transfer
between the old and new static process
(unless the important state is already stored in
the database)
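The state transfer between an old and a new version of a static process can be sketched as an export/import handshake. This is a hypothetical sketch: `CounterV1`, `CounterV2`, `export_state`, and `upgrade` are illustrative names, and the `state_in_database` flag stands for the case where the process keeps its important state in the database, so nothing needs to be handed over.

```python
class CounterV1:
    """Old version of a hypothetical static process with local state."""

    def __init__(self):
        self.calls = 0

    def handle(self, request):
        self.calls += 1
        return f"v1:{request}"

    def export_state(self):
        # Package the state the new version will need.
        return {"calls": self.calls}


class CounterV2:
    """New version; it can import the state exported by the old version."""

    def __init__(self, state=None):
        self.calls = (state or {}).get("calls", 0)

    def handle(self, request):
        self.calls += 1
        return f"v2:{request}"


def upgrade(old, new_cls, state_in_database=False):
    # The old process is still alive while the new one is created, so the
    # two versions coexist; routing requests to the new one is not modelled.
    state = None if state_in_database else old.export_state()
    return new_cls(state)
```

When the state already lives in the database, the new process starts empty here and would reload what it needs from the database instead.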

14
Partitioning Types and Data
15
Advantages

Standard interfaces through CORBA
Standard languages: C, Java
Based on commercial hardware
(Soft) real-time OS
Fault tolerance implemented in software
Fully scalable architecture
Includes powerful middleware: a database
management system and functions for software
management
Fully compatible simulated environment for
development on Unix/Linux/NT workstations

16
CASE 3 RODAIN

Real-Time Object-Oriented Database Architecture
for Intelligent Networks
Real-time main-memory database system
Runs on the real-time OS Chorus/ClassiX (and on Linux)

17
RODAIN Cluster
18
RODAIN Database Node
Database Primary Unit
User Request Interpreter Subsystem
Object-Oriented Database Management Subsystem
Watchdog Subsystem
Distributed Database Subsystem
Fault-Tolerance and Recovery Subsystem
Database Mirror Unit
Fault-Tolerance and Recovery Subsystem
Object-Oriented Database Management Subsystem
Distributed Database Subsystem
Watchdog Subsystem
User Request Interpreter Subsystem
19
RODAIN Database Node II
20
ORD Architecture
OCC
Data
Index
TRP
DDS
ORD
FTRS
21
Fault-Tolerance

Based on logs and mirroring
Logs are sent to the Mirror
The Mirror stores the logs on disk in SSS
The Mirror maintains a copy of the main-memory database
The Mirror makes disk copies of its database image
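The log-and-mirror scheme above can be sketched as a Primary that ships one log record per commit and a Mirror that stores it, applies it, and periodically writes an image. This is a hypothetical sketch: the classes `Primary` and `Mirror`, the `(lsn, key, value)` record format, the `checkpoint_every` parameter, and the in-memory lists standing in for the SSS disks are all assumptions, not the RODAIN code.

```python
class Mirror:
    """Hypothetical Mirror: stores shipped logs and keeps an MMDB copy."""

    def __init__(self, checkpoint_every=3):
        self.mmdb = {}          # copy of the main-memory database
        self.log_disk = []      # log records on disk (stands in for SSS)
        self.image_disk = None  # (last LSN, snapshot) of the database image
        self.checkpoint_every = checkpoint_every

    def receive(self, record):
        lsn, key, value = record
        self.log_disk.append(record)   # 1. store the log on disk
        self.mmdb[key] = value         # 2. keep the MMDB copy current
        if lsn % self.checkpoint_every == 0:
            # 3. periodically write a disk copy of the database image
            self.image_disk = (lsn, dict(self.mmdb))


class Primary:
    """Hypothetical Primary that ships a log record for every commit."""

    def __init__(self, mirror):
        self.mmdb = {}
        self.lsn = 0            # log sequence number of the last record
        self.mirror = mirror

    def commit(self, key, value):
        self.lsn += 1
        self.mmdb[key] = value
        self.mirror.receive((self.lsn, key, value))  # send the log to Mirror
```

In this arrangement the Primary never touches the disk itself; all disk I/O for logging and imaging is done by the Mirror.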

22
Recovery

Based on role switching
When Primary fails
Mirror brings its MMDB up to date
Mirror starts acting as the new Primary
Active transactions are restarted or lost
When Mirror fails
Primary stores logs directly in SSS
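Role switching can be sketched with a pair object that routes log records to the Mirror while it is alive and directly to SSS otherwise. This is a hypothetical sketch: `Node`, `Pair`, `ship_log`, `primary_failed`, and `mirror_failed` are illustrative names, and returning the active transactions to the caller stands in for "restarted or lost".

```python
class Node:
    """Hypothetical database node in the role "primary" or "mirror"."""

    def __init__(self, name, role):
        self.name, self.role = name, role
        self.alive = True
        self.mmdb = {}
        self.pending = []   # log records received but not yet applied

    def apply_pending(self):
        for key, value in self.pending:
            self.mmdb[key] = value
        self.pending.clear()


class Pair:
    def __init__(self, primary, mirror, sss_log):
        self.primary, self.mirror = primary, mirror
        self.sss_log = sss_log   # stands in for the logs on SSS disks

    def ship_log(self, record):
        if self.mirror is not None and self.mirror.alive:
            self.mirror.pending.append(record)  # normal case: log to Mirror
        else:
            self.sss_log.append(record)         # no Mirror: log to SSS

    def primary_failed(self, active_txns):
        self.primary.alive = False
        self.mirror.apply_pending()     # Mirror brings its MMDB up to date
        self.mirror.role = "primary"    # ... and acts as the new Primary
        self.primary, self.mirror = self.mirror, None
        # Active transactions are not carried over; hand them back so the
        # caller can restart them (or drop them, i.e. they are lost).
        return list(active_txns)

    def mirror_failed(self):
        self.mirror.alive = False
```

After a Primary failure the pair runs without a Mirror, so in this model subsequent logs go to SSS until a recovered node rejoins as Mirror.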

23
Recovery II

During recovery, the failed node
always starts as a mirror node,
loads the most recent database image from the disks in
SSS,
applies the log tail to the loaded image,
receives the logs from the primary node,
and continues as a normal mirror node
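The recovery steps above can be sketched as a single function that rebuilds the failed node's MMDB. This is a hypothetical sketch: `recover_as_mirror`, the `(lsn, snapshot)` image format, and the `(lsn, key, value)` log records are assumptions. Log sequence numbers are used to apply only records newer than the image and to skip records from the Primary that the log tail already covered.

```python
def recover_as_mirror(image, sss_log, primary_log_stream):
    """Rebuild a failed node as a Mirror (sketch).

    image:              (lsn, snapshot dict), most recent image on SSS disks
    sss_log:            list of (lsn, key, value) records stored on SSS
    primary_log_stream: iterable of (lsn, key, value) records from the Primary
    """
    image_lsn, snapshot = image
    mmdb = dict(snapshot)                    # 1. load the database image
    last = image_lsn
    for lsn, key, value in sorted(sss_log, key=lambda r: r[0]):
        if lsn > image_lsn:                  # 2. apply the log tail
            mmdb[key] = value
            last = lsn
    for lsn, key, value in primary_log_stream:
        if lsn > last:                       # 3. apply logs from the Primary,
            mmdb[key] = value                #    skipping ones already applied
            last = lsn
    return mmdb, last                        # 4. continue as a normal Mirror
```

Always restarting as a Mirror means the recovering node never has to serve transactions from a possibly stale image; it only replays logs until it has caught up.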
