1
High Availability and Fault-Tolerance in
Real-Time Databases
  • Jan Lindström
  • University of Helsinki
  • Department of Computer Science

2
Overview
  • The causes of downtime
  • Availability solutions
  • CASE 1 Clustra
  • CASE 2 TelORB
  • CASE 3 RODAIN

3
The Causes of Downtime
  • Planned downtime
    • Hardware expansion
    • Database software upgrades
    • Operating system upgrades
  • Unplanned downtime
    • Hardware failure
    • OS failure
    • Database software bugs
    • Power failure
    • Disaster
    • Human error

4
Traditional Availability Solutions
  • Replication
  • Failover
  • Primary restart

5
CASE 1 Clustra
  • Developed for telephony applications such as
    mobility management and intelligent networks.
  • Relational database with location and replication
    transparency.
  • Real-time data is locked in main memory, and the
    API provides precompiled transactions.
  • NOT a real-time database!

6
Clustra hardware architecture

7
Data distribution and replication

8
How Clustra Handles Failures
  • Real-time failover: Hot-standby data is up to
    date, so failover occurs in milliseconds.
  • Automatic restart and takeback: Restart of the
    failed node and takeback of operations are
    automatic, and again transparent to users and
    operators.
  • Self-repair: If a node fails completely, its data
    is copied from the complementary node to a
    standby. This is also automatic and transparent.
  • Limited failure effects

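The failover, takeback, and self-repair behaviour on the previous slide can be illustrated with a toy model. This is a minimal Python sketch, assuming one data fragment replicated on two complementary nodes (a primary and a hot standby) and synchronous updates to every live replica; `FragmentPair`, `fail`, and `restart` are hypothetical names, not Clustra's interface.

```python
class FragmentPair:
    """Toy model: one data fragment held by a primary and a hot standby.

    Hypothetical illustration only; this is not Clustra's API.
    """

    def __init__(self, node_a, node_b):
        self.preferred = node_a          # node that normally holds the primary role
        self.primary, self.standby = node_a, node_b
        self.up = {node_a: True, node_b: True}
        self.data = {node_a: {}, node_b: {}}

    def write(self, key, value):
        # Assumption: every update is applied to all live replicas, so the
        # hot standby is always as current as the primary.
        for node, alive in self.up.items():
            if alive:
                self.data[node][key] = value

    def read(self, key):
        return self.data[self.primary][key]

    def fail(self, node):
        self.up[node] = False
        if node == self.primary:
            # Real-time failover: the up-to-date standby takes over at once,
            # without copying or replaying any data.
            self.primary, self.standby = self.standby, self.primary

    def restart(self, node):
        # Self-repair / catch-up: copy the fragment from the live complementary node.
        live = self.primary
        self.data[node] = dict(self.data[live])
        self.up[node] = True
        if node == self.preferred:
            # Automatic takeback of the primary role by the preferred node.
            self.primary, self.standby = node, live


pair = FragmentPair("n1", "n2")
pair.write("subscriber:42", 7)
pair.fail("n1")                   # n2 becomes primary immediately
pair.write("subscriber:42", 8)    # served by n2 while n1 is down
pair.restart("n1")                # n1 catches up and takes back the primary role
print(pair.primary, pair.read("subscriber:42"))  # prints: n1 8
```

Because the standby already holds current data, promotion in this model is only a role change, which is the property the slide relies on for millisecond failover.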
9
How Clustra Handles Upgrades
  • Hardware, operating system, and database
    software upgrades without ever going down.
  • This process is called a rolling upgrade:
    • Required changes are performed node by node.
    • Each upgraded node catches up to the state of
      its complementary node.
    • When this is completed, the next node is
      upgraded.

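The rolling upgrade steps on the previous slide can be sketched as a loop over the nodes. This is a minimal Python illustration, assuming each node has exactly one complementary node holding the same data and that a log sequence number (`lsn`) stands in for the data state; `rolling_upgrade` and the dictionary layout are hypothetical.

```python
def rolling_upgrade(nodes, complement, new_version):
    """Upgrade nodes one at a time while each complementary node keeps serving.

    nodes: dict node -> {"version": str, "online": bool, "lsn": int}
    complement: dict node -> node that holds the same data
    Returns the order in which nodes were upgraded.
    """
    order = []
    for node in nodes:
        partner = complement[node]
        # Never take a node out of service unless its complement is serving.
        if not nodes[partner]["online"]:
            raise RuntimeError(f"{partner} must be online before upgrading {node}")
        nodes[node]["online"] = False
        # Simulated traffic: the partner keeps accepting updates meanwhile.
        nodes[partner]["lsn"] += 1
        nodes[node]["version"] = new_version   # hardware, OS, or DB software change
        # Catch up to the status of the complementary node before rejoining.
        nodes[node]["lsn"] = nodes[partner]["lsn"]
        nodes[node]["online"] = True
        order.append(node)
    return order


cluster = {
    "n1": {"version": "1.0", "online": True, "lsn": 0},
    "n2": {"version": "1.0", "online": True, "lsn": 0},
}
order = rolling_upgrade(cluster, {"n1": "n2", "n2": "n1"}, "2.0")
print(order, {n: s["version"] for n, s in cluster.items()})
```

The guard on the partner's status is what keeps every piece of data available throughout: at no point are both copies offline at once.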
10
CASE 2 TelORB
  • Characteristics
    • Very high availability (HA), robustness
      implemented in SW
    • (Soft) real time
    • Scalability by using loosely coupled processors
    • Openness
      • Hardware: Intel/Pentium
      • Languages: C, Java
      • Interoperability: CORBA/IIOP, TCP/IP, Java RMI
      • 3rd-party SW: Java

11
TelORB Availability
  • Real-time object-oriented DBMS supporting
    • Distributed transactions
    • ACID properties expected from a DBMS
    • Data replication (providing redundancy)
  • Network redundancy
  • Software configuration control
  • Automatic restart, on the working processors, of
    processes that originally executed on a faulty
    processor
  • Self-healing
  • In-service upgrade of software with no
    disturbance to operation
  • Hot replacement of faulty processors

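The automatic restart of processes from a faulty processor on the working ones can be sketched as a placement function. This is an illustration only: the least-loaded placement policy, the function name `redistribute`, and the process names are assumptions, not TelORB's documented behaviour.

```python
def redistribute(placement, failed, processors):
    """Restart processes from a failed processor on the remaining working ones.

    placement: dict process -> processor
    Returns a new placement; each displaced process goes to the currently
    least-loaded surviving processor (an assumed policy, ties broken by name).
    """
    survivors = [p for p in processors if p != failed]
    if not survivors:
        raise RuntimeError("no working processor left")
    load = {p: 0 for p in survivors}
    new_placement = {}
    # Processes on healthy processors stay where they are.
    for proc, cpu in placement.items():
        if cpu != failed:
            new_placement[proc] = cpu
            load[cpu] += 1
    # Processes from the faulty processor are restarted elsewhere.
    for proc, cpu in sorted(placement.items()):
        if cpu == failed:
            target = min(survivors, key=lambda p: (load[p], p))
            new_placement[proc] = target
            load[target] += 1
    return new_placement


placement = {"hlr": "cpu1", "sms": "cpu1", "billing": "cpu2", "routing": "cpu3"}
new = redistribute(placement, "cpu1", ["cpu1", "cpu2", "cpu3"])
print(new)
```

Spreading the displaced processes across survivors, rather than moving them all to one spare, avoids overloading a single processor after a failure; a real system would also need the processes' state, which in TelORB can be kept in the replicated database.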
12
Automatic Reconfiguration

13
Software upgrade
  • Smooth software upgrade when old and new versions
    of the same process can coexist
  • The application can arrange for state transfer
    between the old and new static process (unless
    the important state is already stored in the
    database)

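The upgrade path on the previous slide, where old and new versions coexist and the application may move state between them, can be sketched as follows. It is a minimal Python model under stated assumptions: `Process`, `upgrade`, the `convert` hook, and the `state_in_db` flag are hypothetical, and the switch-over is reduced to flipping a flag.

```python
class Process:
    """A static process with some in-memory state (hypothetical model)."""

    def __init__(self, name, version, state=None):
        self.name = name
        self.version = version
        self.state = dict(state or {})
        self.running = True


def upgrade(old, new_version, convert=lambda s: s, state_in_db=False):
    """Start the new version next to the old one, transfer state, retire the old one."""
    new = Process(old.name, new_version)   # old and new versions now coexist
    if not state_in_db:
        # Application-arranged state transfer, optionally converting the format.
        new.state = convert(dict(old.state))
    # If the important state already lives in the database, the new version
    # reads it from there and no transfer is needed.
    old.running = False                    # switch over; the old version stops
    return new


old = Process("call_control", "1", {"active_calls": 3})
new = upgrade(old, "2", convert=lambda s: {**s, "schema": 2})
print(new.version, new.state, old.running)
```

The `convert` hook reflects the fact that a new version may expect a different state layout; keeping state in the database removes the need for it entirely.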
14
Partitioning Types and Data

15
Advantages
  • Standard interfaces through CORBA
  • Standard languages: C, Java
  • Based on commercial hardware
  • (Soft) real-time OS
  • Fault tolerance implemented in software
  • Fully scalable architecture
  • Includes powerful middleware: a database
    management system and functions for software
    management
  • Fully compatible simulated environment for
    development on Unix/Linux/NT workstations

16
CASE 3 RODAIN
  • Real-Time Object-Oriented Database Architecture
    for Intelligent Networks
  • Real-time main-memory database system
  • Runs on the real-time OS Chorus/ClassiX (and Linux)

17
Rodain Cluster

18
Rodain Database Node
  • Database Primary Unit
    • User Request Interpreter Subsystem
    • Object-Oriented Database Management Subsystem
    • Watchdog Subsystem
    • Distributed Database Subsystem
    • Fault-Tolerance and Recovery Subsystem
  • Database Mirror Unit
    • Fault-Tolerance and Recovery Subsystem
    • Object-Oriented Database Management Subsystem
    • Distributed Database Subsystem
    • Watchdog Subsystem
    • User Request Interpreter Subsystem

19
RODAIN Database Node II

20
ORD Architecture
  • Components: OCC, Data, Index, TRP, DDS, ORD,
    FTRS

21
Fault-Tolerance
  • Based on logs and mirroring
    • Logs are sent to the Mirror
    • Mirror stores the logs on disk in SSS
    • Mirror maintains a copy of the main-memory
      database
    • Mirror makes disk copies of its database image

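The log-based mirroring on the previous slide can be sketched as a primary that ships each committed update to the mirror. This is a minimal Python sketch, assuming synchronous shipping at commit and representing the SSS disks as plain Python lists; `Primary`, `Mirror`, and the `(lsn, key, value)` record format are hypothetical.

```python
class Mirror:
    def __init__(self):
        self.mmdb = {}       # copy of the primary's main-memory database
        self.sss_log = []    # log records stored on disk in SSS (modelled as a list)
        self.images = []     # disk copies of the database image: (lsn, snapshot)
        self.applied = 0     # LSN of the last applied log record

    def receive(self, record):
        lsn, key, value = record
        self.sss_log.append(record)   # store the log record on disk first
        self.mmdb[key] = value        # then apply it to the in-memory copy
        self.applied = lsn

    def checkpoint(self):
        # The mirror, not the primary, writes the disk image of the database,
        # so the primary's real-time work is not disturbed.
        self.images.append((self.applied, dict(self.mmdb)))


class Primary:
    def __init__(self, mirror):
        self.mmdb = {}
        self.lsn = 0
        self.mirror = mirror

    def commit(self, key, value):
        self.lsn += 1
        self.mmdb[key] = value
        # Logs are sent to the Mirror (synchronously, in this sketch).
        self.mirror.receive((self.lsn, key, value))


mirror = Mirror()
primary = Primary(mirror)
primary.commit("a", 1)
primary.commit("b", 2)
mirror.checkpoint()
print(mirror.mmdb, mirror.images[-1])
```

In this design the primary never touches the disk on the normal path; durability comes from the mirror's log and images.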
22
Recovery
  • Based on role switching
  • When the Primary fails
    • Mirror brings its MMDB up to date
    • Mirror starts acting as the new Primary
    • Active transactions are restarted or lost
  • When the Mirror fails
    • Primary stores logs directly to SSS

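The role switch on the previous slide can be sketched as follows. This is a Python illustration under assumptions: the `Node` class, the `unapplied` buffer of received but not yet applied log records, and the `restartable` set that decides which active transactions are restarted rather than lost are all hypothetical; the slide does not say how that choice is made.

```python
class Node:
    def __init__(self, role):
        self.role = role
        self.mmdb = {}          # main-memory database copy
        self.unapplied = []     # (key, value) log records received but not yet applied


def on_primary_failure(mirror, active_txns, restartable):
    """Promote the Mirror; return (restarted, lost) transaction ids."""
    # Bring the mirror's MMDB up to date with every log record it already holds.
    for key, value in mirror.unapplied:
        mirror.mmdb[key] = value
    mirror.unapplied.clear()
    mirror.role = "primary"     # the Mirror starts acting as the new Primary
    # Hypothetical rule: only transactions in `restartable` are rerun.
    restarted = [t for t in active_txns if t in restartable]
    lost = [t for t in active_txns if t not in restartable]
    return restarted, lost


def log_destination(mirror_alive):
    # With no Mirror available, the Primary writes its logs directly to SSS.
    return "mirror" if mirror_alive else "sss"


mirror = Node("mirror")
mirror.mmdb = {"a": 1}
mirror.unapplied = [("a", 2), ("b", 3)]
restarted, lost = on_primary_failure(mirror, ["t1", "t2"], {"t1"})
print(mirror.role, mirror.mmdb, restarted, lost, log_destination(False))
```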
23
Recovery II
  • During recovery, the failed node
    • always starts as a mirror node
    • loads the most recent database image from
      disks in SSS
    • applies the log tail to the loaded image
    • receives the logs from the primary node
    • continues as a normal mirror node

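The recovery steps on the previous slide can be sketched as a single function that rebuilds the failed node as a mirror. This is a minimal Python sketch, assuming disk images tagged with the log sequence number (LSN) they include and log records of the form `(lsn, key, value)`; `recover_as_mirror` is a hypothetical name.

```python
def recover_as_mirror(images, sss_log, primary_stream):
    """Rebuild a failed node as the new Mirror.

    images: list of (lsn, snapshot_dict) database images on disk in SSS
    sss_log: list of (lsn, key, value) log records stored in SSS
    primary_stream: iterable of (lsn, key, value) records from the current Primary
    """
    # 1. Start in the mirror role from the most recent database image.
    image_lsn, snapshot = max(images, key=lambda img: img[0])
    db = dict(snapshot)
    last = image_lsn
    # 2. Apply the log tail written after that image was taken.
    for lsn, key, value in sorted(sss_log, key=lambda r: r[0]):
        if lsn > last:
            db[key] = value
            last = lsn
    # 3. Continue with log records shipped by the Primary, skipping any
    #    records already covered by the image or the log tail.
    for lsn, key, value in primary_stream:
        if lsn > last:
            db[key] = value
            last = lsn
    return {"role": "mirror", "db": db, "lsn": last}


images = [(1, {"a": 1}), (3, {"a": 1, "b": 2})]
sss_log = [(2, "b", 2), (3, "b", 2), (4, "a", 5)]
stream = [(4, "a", 5), (5, "c", 9)]
node = recover_as_mirror(images, sss_log, stream)
print(node)
```

Comparing LSNs makes the replay idempotent, so overlap between the log tail on disk and the records received from the primary does no harm.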
24
Further reading
  • Bratsberg, Humborstad: Online Scaling in a Highly
    Available Database. Proceedings of the 27th VLDB
    Conference, Rome, Italy, pp. 451-460, 2001.
  • Clustra Database Technical Overview.
    http://www.clustra.com
  • Björnerstedt, Ketoja, Sintorn, Sköld: Replication
    between Geographically Separated Clusters - An
    Asynchronous Scalable Replication Mechanism for
    Very High Availability. Proceedings of the
    International Workshop on Databases in
    Telecommunications II, LNCS vol. 2209, pp.
    102-115, 2001.
  • Lindström, Niklander, Porkka, Raatikainen: A
    Distributed Real-Time Main-Memory Database for
    Telecommunications. Proceedings of the
    International Workshop on Databases in
    Telecommunications, LNCS vol. 1819, pp. 158-173,
    2000.