Title: GORDA Kickoff meeting INRIA
1. GORDA Kickoff meeting - INRIA Sardes project
- Emmanuel Cecchet
- Sara Bouchenak
2. Outline
- INRIA, ObjectWeb, Sardes
- GORDA
3. INRIA key figures
A public scientific and technological research institute in computer science and control, under the dual authority of the Ministry of Research and the Ministry of Industry.
A scientific force of 3,000 (Jan. 2003):
- 900 permanent staff: 400 researchers, 500 engineers, technical and administrative staff
- 450 researchers from other organizations
- 700 Ph.D. students
- 200 external collaborators
- 750 trainees, post-doctoral students and visiting researchers from abroad (universities or industry)
INRIA Rhône-Alpes
6 Research Units
Budget: 120 M€ (tax not incl.)
4. iCluster 2
- Itanium-2 processors
- 104 nodes (dual 64-bit 900 MHz processors, 3 GB memory, 72 GB local disk) connected through a Myrinet network
- 208 processors, 312 GB memory, 7.5 TB disk
- Connected to the GRID
- Linux OS (RedHat Advanced Server)
- First Linpack experiments at INRIA (Aug. 03) reached 560 GFlop/s
- Applications: Grid computing, classical scientific computing, high performance Internet servers, ...
5. ObjectWeb key figures
- Open source middleware development
- Based on open standards: J2EE, CORBA, OSGi
- International consortium founded by INRIA, Bull and France Telecom R&D in 2001
- Academic partners: European universities and research centers
- Industrial partners: RedHat, Suse, MySQL, NEC, Bull, France Telecom, Dassault, Cap Gemini, ...
6. Common Software Architecture for Component Based Development
[Diagram of the ObjectWeb code base: JMOB, JOnAS, OSCAR, OpenCCM, ProActive, Speedo, RUBiS, JORAM, DotNetJ, CAROL, Enhydra XMLC, JORM/MEDOR, JOTM, Kilim, Zeus, C-JDBC, Fractal, Jonathan, RmiJdbc, Bonita, Think, JAWE, Octopus]
7. Sardes project
- Distributed Systems group
- Main research themes
- Reflective component technology
- Autonomous systems management
- Application areas
- high-availability J2EE servers
- dynamic monitoring, configuration and resource management in large scale distributed systems
- (embedded system networks, ubiquitous computing)
- Result dissemination by ObjectWeb
8. Outline
- INRIA, ObjectWeb, Sardes
- GORDA
9. Sardes experiences
- Component-based open source middleware
- ObjectWeb (http://www.objectweb.org)
- J2EE application servers
- JOnAS clustering (http://jonas.objectweb.org)
- Database replication middleware
- C-JDBC (http://c-jdbc.objectweb.org)
- Benchmarking
- RUBiS (http://rubis.objectweb.org)
- TPC-W (http://jmob.objectweb.org)
- CLIF (http://clif.objectweb.org)
- Monitoring
- LeWYS (http://lewys.objectweb.org)
10. Common scalability practice
- Cons
- Cost
- Scalability limit
[Diagram: Internet - Web frontend - App. server - database]
11. Replication with shared disks
- Cons
- still expensive hardware
- availability
[Diagram: Internet - Web frontend - App. server - database nodes sharing disks ("another well-known database vendor")]
12. Master/Slave replication
- Cons
- consistency
- failover time on master failure
- scalability
[Diagram: Internet - Web frontend - App. server - master database with slaves]
13. Atomic broadcast-based replication
- Database tier should be
- scalable
- highly available
- without modifying the client application
- database vendor independent
- on commodity hardware
[Diagram: Internet-facing tier with database replicas synchronized by atomic broadcast]
14. C-JDBC
- JDBC compliant (no client application modification), see the sketch below
- database vendor independent
- JDBC driver required
- heterogeneity support
- no 2PC, no group communication between databases
- group communication for controller replication only
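Since C-JDBC is exposed as a regular JDBC driver, a client only swaps the driver class and the connection URL. A minimal sketch, assuming the driver class name org.objectweb.cjdbc.driver.Driver and the URL form shown later on the controller replication slide (both should be checked against the C-JDBC distribution):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CjdbcClientSketch {
    public static void main(String[] args) throws Exception {
        // Load the C-JDBC driver instead of the database vendor's driver
        // (class name assumed here, check the C-JDBC documentation).
        Class.forName("org.objectweb.cjdbc.driver.Driver");

        // The URL names a C-JDBC controller and a virtual database,
        // not a physical database instance. Table name is hypothetical.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:c-jdbc://node1:25322/myDB", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id FROM items")) {
            while (rs.next()) {
                System.out.println(rs.getInt("id"));
            }
        }
    }
}
```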
15. RAIDb - Definition
- Redundant Array of Inexpensive Databases
- better performance and fault tolerance than a single database, at a low cost, by combining multiple database instances into an array of databases
- RAIDb levels offer various tradeoffs between performance and fault tolerance
16. RAIDb
- Redundant Array of Inexpensive Databases
- better performance and fault tolerance than a single database, at a low cost, by combining multiple database instances into an array of databases
- RAIDb controller
- gives the view of a single database to the client
- balances the load on the database backends
- RAIDb levels
- RAIDb-0: full partitioning
- RAIDb-1: full mirroring
- RAIDb-2: partial replication
- composition possible
17. C-JDBC Key ideas
- Middleware implementing RAIDb
- Two components
- generic JDBC 2.0 driver (C-JDBC driver)
- C-JDBC Controller
- C-JDBC Controller provides
- performance scalability
- high availability
- failover
- caching, logging, monitoring, ...
- Supports heterogeneous databases
18. C-JDBC Overview
19. Heterogeneity support
- unload a single Oracle DB with several MySQL databases
- RAIDb-2 for partial replication
20. Inside the C-JDBC Controller
[Diagram: client and backend connections over sockets, administration via JMX]
21. C-JDBC features
- unified authentication management
- tunable concurrency control
- automatic schema detection
- tunable replication: full partitioning, partial replication, full replication
- caching: metadata, parsing, results with various invalidation granularities
- various load balancing strategies
- on-the-fly query rewriting for macros and heterogeneity support
- recovery log for dynamic backend addition and failure recovery
- database backup/restore using Octopus
- JMX based monitoring and administration
- graphical administration console
22. Functional overview
23. Functional overview
24. Failures
- No 2-phase commit
- parallel transactions
- failed nodes are automatically disabled
[Diagram: an INSERT INTO t statement executed across backends]
25. Controller replication
jdbc:c-jdbc://node1:25322,node2:12345/myDB
- Prevent the controller from being a single point of failure
- Group communication for controller synchronization
- C-JDBC driver supports multiple controllers with automatic failover (see the sketch below)
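With the multi-controller URL form above, the driver can fail over between controllers transparently. A minimal sketch (URL syntax taken from the slide; class name assumed, behaviour as described rather than verified against a specific C-JDBC release):

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class MultiControllerSketch {
    // Listing both controllers in the URL lets the C-JDBC driver pick an
    // available one and fail over automatically if it becomes unreachable.
    private static final String URL =
            "jdbc:c-jdbc://node1:25322,node2:12345/myDB";

    public static Connection open() throws Exception {
        Class.forName("org.objectweb.cjdbc.driver.Driver"); // assumed class name
        return DriverManager.getConnection(URL, "user", "password");
    }
}
```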
26. Controller replication
27. Mixing horizontal and vertical scalability
28. Lessons learned
- SQL parsing cannot be generic
- many discrepancies in JDBC implementations
- minimize the use of group communications
- IP multicast does not scale
- notification infrastructure needed
- users want
- no single point of failure
- control (monitoring, pluggable recovery policies, ...)
- no database vendor lock-in
- no database modification
- need for an exhaustive test suite
- benchmarking accurately is very difficult
- load injection requires resources
- monitoring and exploiting results is tricky
29. Sardes role in GORDA
- provide input
- GORDA APIs
- group communication requirements
- monitoring and management requirements
- middleware implementation based on C-JDBC
- dissemination effort
- ObjectWeb
- possible participation in the JCP for JDBC extensions
- hardware resources for experiments
- eCommerce benchmarks
30. Other interests
- LeWYS (http://lewys.objectweb.org)
- monitoring infrastructure
- generic hardware/kernel probes for Linux/Windows
- software probes: JMX, SNMP, ...
- monitoring repository
- autonomic behavior
- building supervision loops
- self-healing clusters
- self-sizing (expand or shrink)
- SLAs
31. Q without A
- do we consider distributed query execution?
- XA support?
- cluster size targeted?
- do we target grids or cluster of clusters?
- reconciliation
- consistency/caching
- network architecture considered?
- are relaxed or loose consistency models an option?
- what will the GRI really cover?
- do we impose a specific way of doing replication?
- access to read-set/write-set is difficult to implement with legacy databases
- which workloads are considered?
- which WP deals with backup/recovery?
- licensing issues?
32. Q&A
Thanks to all users and contributors ...
33. Bonus slides
34. INTERNALS
35. Virtual Database
- gives the view of a single database
- establishes the mapping between the database name used by the application and the backend specific settings
- backends can be added and removed dynamically
- configured using an XML configuration file
36. Authentication Manager
- Matches the real login/password used by the application with backend specific login/password
- Administrator login to manage the virtual database
37. Scheduler
- Manages concurrency control (see the sketch below)
- Specific implementations for Single DB, RAIDb-0, 1 and 2
- Query-level
- Optimistic and pessimistic transaction level
- uses the database schema that is automatically fetched from the backends
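As an illustration of what a pessimistic transaction-level scheduler can do with the automatically fetched schema, here is a minimal sketch (not C-JDBC code; class and method names are invented for the example): writes take exclusive per-table locks derived from the statement, so conflicting updates are serialized before they reach the load balancer.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical pessimistic scheduler: one lock per table of the fetched schema. */
public class PessimisticSchedulerSketch {
    private final Map<String, ReentrantLock> tableLocks = new ConcurrentHashMap<>();

    /** Acquire locks on every table the write statement touches (in a fixed
     *  order to avoid deadlocks), then let the caller dispatch the query. */
    public void scheduleWrite(List<String> tables, Runnable dispatch) {
        List<String> ordered = tables.stream().sorted().distinct().toList();
        ordered.forEach(t ->
                tableLocks.computeIfAbsent(t, k -> new ReentrantLock()).lock());
        try {
            dispatch.run(); // e.g. hand the statement to the load balancer
        } finally {
            for (int i = ordered.size() - 1; i >= 0; i--) {
                tableLocks.get(ordered.get(i)).unlock();
            }
        }
    }
}
```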
38. Request cache
- caches results from SQL requests
- improved SQL statement analysis to limit cache invalidations
- table based invalidations
- column based invalidations
- single-row SELECT optimization
- request parsing possible in the C-JDBC driver
- offloads the controller
- parsing caching in the driver
39. Load balancer 1/2
- RAIDb-0
- query directed to the backend having the needed tables
- RAIDb-1
- reads executed by the current thread
- writes executed in parallel by a dedicated thread per backend (see the sketch below)
- result returned if one, a majority or all commit
- if one node fails but others succeed, the failing node is disabled
- RAIDb-2
- same as RAIDb-1 except that writes are sent only to nodes owning the written tables
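A minimal sketch of the RAIDb-1 write path described above (names are invented for the example, not C-JDBC's): the write is pushed to every backend in parallel, the call returns once the configured number of backends has acknowledged, and backends that fail are disabled rather than aborting the whole statement.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical RAIDb-1 write dispatcher: one task per backend. */
public class ParallelWriteSketch {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** Executes the write on all backends; returns once 'required' backends
     *  succeed. Failing backends are reported so they can be disabled. */
    public void write(List<Connection> backends, String sql, int required)
            throws InterruptedException {
        CountDownLatch acks = new CountDownLatch(required);
        for (Connection backend : backends) {
            pool.submit(() -> {
                try (Statement stmt = backend.createStatement()) {
                    stmt.executeUpdate(sql);
                    acks.countDown();           // success on this backend
                } catch (SQLException e) {
                    disableBackend(backend, e); // others may still succeed
                }
            });
        }
        // "one", "majority" or "all" is just the value of 'required';
        // a real implementation would also detect when too many backends fail.
        acks.await();
    }

    private void disableBackend(Connection backend, SQLException cause) {
        System.err.println("Disabling backend after write failure: " + cause);
    }
}
```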
40. Load balancer 2/2
- Static load balancing policies
- Round-Robin (RR)
- Weighted Round-Robin (WRR)
- Least Pending Requests First (LPRF)
- request sent to the node that has the shortest pending request queue (see the sketch below)
- efficient if backends are homogeneous in terms of performance
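A minimal sketch of the LPRF policy as described above (illustrative only, not the C-JDBC implementation): pick the backend with the fewest in-flight requests, counting a request in while it executes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical Least Pending Requests First selection over named backends. */
public class LprfBalancerSketch {
    private final Map<String, AtomicInteger> pending = new ConcurrentHashMap<>();

    /** Chooses the backend with the shortest pending request queue. */
    public String choose(List<String> backends) {
        return backends.stream()
                .min(Comparator.comparingInt(
                        b -> pending.computeIfAbsent(b, k -> new AtomicInteger()).get()))
                .orElseThrow();
    }

    /** Call around each request so the pending counters stay accurate. */
    public void execute(String backend, Runnable request) {
        AtomicInteger count = pending.computeIfAbsent(backend, k -> new AtomicInteger());
        count.incrementAndGet();
        try {
            request.run();
        } finally {
            count.decrementAndGet();
        }
    }
}
```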
41. Connection Manager
- Connection pooling for a backend
- Simple: no pooling
- RandomWait: blocking pool
- FailFast: non-blocking pool
- VariablePool: dynamic pool
- Connection pools defined on a per-login basis (see the sketch below)
- resource management per login
- dedicated connections for admin
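For illustration, a minimal sketch of a blocking pool in the spirit of the RandomWait/VariablePool managers listed above (invented names; the real C-JDBC connection managers are configured per backend and per login):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical blocking pool: grows lazily up to a maximum, then blocks. */
public class BlockingPoolSketch {
    private final LinkedBlockingQueue<Connection> idle = new LinkedBlockingQueue<>();
    private final String url, user, password;
    private final int max;
    private int created = 0;

    public BlockingPoolSketch(String url, String user, String password, int max) {
        this.url = url; this.user = user; this.password = password; this.max = max;
    }

    public Connection acquire() throws SQLException, InterruptedException {
        Connection c = idle.poll();
        if (c != null) return c;
        synchronized (this) {
            if (created < max) {        // grow the pool lazily
                created++;
                return DriverManager.getConnection(url, user, password);
            }
        }
        return idle.take();             // pool exhausted: block until a release
    }

    public void release(Connection c) {
        idle.offer(c);                  // hand the connection back to waiters
    }
}
```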
42. Recovery Log
- Checkpoints are associated with database dumps
- Records all updates and transaction markers since a checkpoint
- Used to resynchronize a database from a checkpoint
- JDBCRecoveryLog (see the sketch below)
- stores information in a database
- can be re-injected in a C-JDBC cluster for fault tolerance
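A minimal sketch of what a JDBC-backed recovery log might look like (table layout and names are assumptions for illustration, not C-JDBC's schema): every write and transaction marker is appended with an increasing id, so replaying from a checkpoint is an ordered scan.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical JDBC-backed recovery log. */
public class JdbcRecoveryLogSketch {
    private final Connection logDb; // connection to the database holding the log

    public JdbcRecoveryLogSketch(Connection logDb) throws SQLException {
        this.logDb = logDb;
        try (Statement s = logDb.createStatement()) {
            // DDL is illustrative and dialect-dependent.
            s.execute("CREATE TABLE IF NOT EXISTS recovery_log ("
                    + "id BIGINT GENERATED ALWAYS AS IDENTITY, "
                    + "tx_id BIGINT, login VARCHAR(64), sql_text VARCHAR(4096))");
        }
    }

    /** Appends a write statement or transaction marker (begin/commit/rollback). */
    public void append(long txId, String login, String sql) throws SQLException {
        try (PreparedStatement ps = logDb.prepareStatement(
                "INSERT INTO recovery_log (tx_id, login, sql_text) VALUES (?, ?, ?)")) {
            ps.setLong(1, txId);
            ps.setString(2, login);
            ps.setString(3, sql);
            ps.executeUpdate();
        }
    }

    /** Replays every logged statement after the given checkpoint id on a backend.
     *  In a real log, transaction markers would be interpreted, not executed. */
    public void replaySince(long checkpointId, Connection backend) throws SQLException {
        try (PreparedStatement ps = logDb.prepareStatement(
                "SELECT sql_text FROM recovery_log WHERE id > ? ORDER BY id")) {
            ps.setLong(1, checkpointId);
            try (ResultSet rs = ps.executeQuery(); Statement st = backend.createStatement()) {
                while (rs.next()) {
                    st.executeUpdate(rs.getString("sql_text"));
                }
            }
        }
    }
}
```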
43. SCALABILITY
44. C-JDBC scalability
- Horizontal scalability
- prevents the controller from being a Single Point Of Failure (SPOF)
- distributes the load among several controllers
- uses group communications for synchronization
- C-JDBC Driver
- multiple controllers with automatic failover
- jdbc:c-jdbc://node1:25322,node2:12345/myDB
- connection caching
- URL parsing/controller lookup caching
45. C-JDBC scalability
- Vertical scalability
- allows nested RAIDb levels
- allows tree architectures for scalable write broadcast
- necessary with a large number of backends
- C-JDBC driver re-injected in the C-JDBC controller
46. C-JDBC vertical scalability
- RAIDb-1-1 with C-JDBC
- no limit to composition depth
47. C-JDBC vertical scalability
48. CHECKPOINTING
49. Fault tolerant recovery log
[Diagram: handling of an UPDATE statement]
50. Checkpointing
- Octopus is an ETL tool
- Use Octopus to store a dump of the initial database state
- Currently done by the user using the database specific dump tool
51. Checkpointing
- Backend is enabled
- All database updates are logged (SQL statement, user, transaction, ...)
52. Checkpointing
- Add new backends while the system is online
- Restore the dump corresponding to the initial checkpoint with Octopus
53. Checkpointing
- Replay updates from the log
54. Checkpointing
- Enable backends when done
55. Making new checkpoints
- Disable one backend to have a coherent snapshot
- Mark the new checkpoint entry in the log
- Use Octopus to store the dump
56. Making new checkpoints
- Replay missing updates from the log
57. Making new checkpoints
- Re-enable backend when done
58. Recovery
- A node fails!
- Automatically disabled, but should be fixed or replaced by the administrator
59. Recovery
- Restore latest dump with Octopus
60. Recovery
- Replay missing updates from the log
61. Recovery
- Re-enable backend when done
62. HORIZONTAL SCALABILITY
63. Horizontal scalability
- JGroups for controller synchronization
- Group messages for writes only
64. Horizontal scalability
- Centralized write approach issues
- Issues with transactions assigned to connections
65. Horizontal scalability
- General case for a write query
- 3 multicasts + 2n unicasts
66. Horizontal scalability
- Solution: no backend sharing
- 1 multicast + n unicasts + 1 multicast
67. Horizontal scalability
- Issues with JGroups
- resources needed by a channel
- instability of throughput with UDP
- performance scalability
- TCP better than UDP but
- unable to disable reliability on top of TCP
- unable to disable garbage collection
- ordering implementation is sub-optimal
- Need for a new group communication layer optimized for clusters
68. Horizontal scalability
- JGroups performance on UDP/Fast Ethernet
69. USE CASES
70. Budget High Availability
- High availability infrastructure on a budget
- Typical eCommerce setup
- http://www.budget-ha.com
71. OpenUSS University Support System
- eLearning
- High availability
- Portability
- Linux, HP-UX, Windows
- InterBase, Firebird, PostgreSQL, HypersonicSQL
- http://openuss.sourceforge.net
72. Flood alert system
- Disaster recovery
- Independent nodes synchronized with C-JDBC
- VPN for security issues
- http://floodalert.org
73. J2EE benchmarking
- Large scale J2EE clusters
- http://jmob.objectweb.org
74. PERFORMANCE
75. TPC-W
76. TPC-W
77. TPC-W
78. Result cache
- Cache contains a list of SQL -> ResultSet mappings
- Policy defined by queryPattern -> Policy
- 3 policies (see the sketch below)
- EagerCaching: variable granularities for invalidations
- RelaxedCaching: invalidations based on timeout
- NoCaching: never cached
RUBiS bidding mix with 450 clients:
                      No cache   Coherent cache   Relaxed cache
Throughput (rq/min)   3892       4184             4215
Avg response time     801 ms     284 ms           134 ms
Database CPU load     100%       85%              20%
C-JDBC CPU load       -          15%              7%
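A minimal sketch of the queryPattern -> Policy idea with a RelaxedCaching-style timeout (invented names; C-JDBC's actual cache keys off the parsed request and supports the finer invalidation granularities listed above):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical result cache with a timeout-based (relaxed) policy. */
public class RelaxedCacheSketch<R> {
    private record Entry<R>(R result, long expiresAtMillis) {}

    private final Map<String, Entry<R>> cache = new LinkedHashMap<>();
    private final long timeoutMillis;

    public RelaxedCacheSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the cached result for this SQL text, or executes and caches it. */
    public synchronized R get(String sql, Supplier<R> execute) {
        Entry<R> e = cache.get(sql);
        if (e != null && e.expiresAtMillis() > System.currentTimeMillis()) {
            return e.result();                       // served from the cache
        }
        R fresh = execute.get();                     // run the query on a backend
        cache.put(sql, new Entry<>(fresh, System.currentTimeMillis() + timeoutMillis));
        return fresh;
    }
}
```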
79. Outline
- Motivations
- RAIDb
- C-JDBC
- Performance
- Lessons learned
- Conclusion
80. Open problems
- Partitioning of clusters
- Users want control over the failure policy
- Reconciliation must also be user controlled
81. LeWYS overview
[Diagram: observers connected to a monitoring repository through DREAM channels]
82. LeWYS
83. LeWYS components
- Library of probes
- hardware resources: cpu, memory, disk, network
- generic sensors: SNMP, JMX, JVMPI, ...
- Monitoring pump
- dynamic deployment of sensors
- manages monitoring leases
- Event channels (see the sketch below)
- propagate monitored events to interested observers
- allow for filtering, aggregation, content-based processing, ...
- Optional monitoring repository
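As an illustration of the event channel role (a plain Java sketch with invented types; LeWYS actually builds its channels out of DREAM components): probes push events into a channel, which applies a filter before forwarding them to interested observers.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical monitoring event and a filtering event channel. */
public class EventChannelSketch {
    public record MonitoringEvent(String probe, String host, long timestamp, double value) {}

    private final List<Consumer<MonitoringEvent>> observers = new CopyOnWriteArrayList<>();
    private final Predicate<MonitoringEvent> filter;

    public EventChannelSketch(Predicate<MonitoringEvent> filter) {
        this.filter = filter;
    }

    public void subscribe(Consumer<MonitoringEvent> observer) {
        observers.add(observer);
    }

    /** Called by the monitoring pump for every probe reading. */
    public void publish(MonitoringEvent event) {
        if (filter.test(event)) {                    // e.g. only forward CPU spikes
            observers.forEach(o -> o.accept(event));
        }
    }
}
```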
84. LeWYS design choices
- Component-based framework
- probes, monitoring pump, event channels
- provides (re)configurability capabilities
- Minimize intrusiveness on monitored nodes
- No global clock
- timestamp generated locally by pump
- Information processing in DREAM channels
85. Centralized monitoring using a monitoring repository (1)
86. Centralized monitoring using a monitoring repository (2)
- Monitoring repository
- stores monitoring information
- service to retrieve monitoring information
- Pros
- a DB allows for storing large amounts of data
- powerful queries
- correlate data from various probes at different locations
- resynchronize clocks
- browsing history to diagnose failures
- use history for system provisioning
- Cons
- requires a DB (heavyweight solution)
87. Outline
- J2EE Cluster
- Group communications
- Monitoring
- motivations
- LeWYS
- implementation
- Status & Perspectives
88. Monitoring pump implementation
[Component diagram: ProbeManager, Probes and CachedProbes with a Cache, Probe Repository, Binding Controller, MonitoringPumpManager, Monitoring Pump Thread, TimeStamp, Pull/Push Multiplexer, OutputManagers (ChannelOut, RMI)]
89. Hardware Probes
- Pure Java probes
- using /proc (see the sketch below)
- cost: 0.01 ms/call (Linux)
[Diagram: cpu, mem, disk, net and kernel probes over /proc on Linux; C probes behind JNI (.DLL) on Solaris and Windows]
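A minimal sketch of such a pure Java probe (illustrative, not LeWYS code): it reads the aggregate CPU line of /proc/stat twice and derives utilization from the jiffy counters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pure Java CPU probe reading /proc/stat (Linux only). */
public class ProcStatCpuProbeSketch {
    /** Returns {idle jiffies, total jiffies} from the first line of /proc/stat. */
    private static long[] sample() throws IOException {
        String cpuLine = Files.readAllLines(Path.of("/proc/stat")).get(0);
        String[] fields = cpuLine.trim().split("\\s+"); // "cpu user nice system idle ..."
        long idle = Long.parseLong(fields[4]);
        long total = 0;
        for (int i = 1; i < fields.length; i++) {
            total += Long.parseLong(fields[i]);
        }
        return new long[] { idle, total };
    }

    public static void main(String[] args) throws Exception {
        long[] before = sample();
        Thread.sleep(1000);                      // sampling interval
        long[] after = sample();
        double idleDelta = after[0] - before[0];
        double totalDelta = after[1] - before[1];
        System.out.printf("CPU utilization: %.1f%%%n", 100.0 * (1.0 - idleDelta / totalDelta));
    }
}
```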
90. Software Probes
- Application level monitoring (see the sketch below)
- JMX
- ad-hoc
- JVM
[Diagram: SNMP and ad-hoc probes, JVM probes (JVMPI), JMX based probes on top of the JVM and hardware resources]
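For illustration, a minimal JMX-style software probe using the standard platform MXBeans (not LeWYS code; LeWYS wraps such readings in probe components):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

/** Hypothetical software probe reading JVM metrics through platform MXBeans. */
public class JmxProbeSketch {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // The same values could be published as monitoring events to an event channel.
        System.out.println("Heap used (bytes): " + memory.getHeapMemoryUsage().getUsed());
        System.out.println("Live threads     : " + threads.getThreadCount());
        System.out.println("System load avg  : " + os.getSystemLoadAverage());
    }
}
```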