GORDA Kickoff meeting INRIA
1
GORDA Kickoff meeting
INRIA Sardes project
  • Emmanuel Cecchet
  • Sara Bouchenak

2
Outline
  • INRIA, ObjectWeb & Sardes
  • GORDA

3
INRIA key figures
A public scientific and technological research institute in computer science and control, under the dual authority of the Ministry of Research and the Ministry of Industry.
A scientific force of 3,000 (Jan. 2003)
  • 900 permanent staff: 400 researchers; 500 engineers, technical and administrative staff
  • 450 researchers from other organizations
  • 700 Ph.D. students
  • 200 external collaborators
  • 750 trainees, post-doctoral students and visiting researchers from abroad (universities or industry)
6 Research Units (including INRIA Rhône-Alpes)
Budget: 120 M€ (tax not incl.)
4
iCluster 2
  • Itanium-2 processors
  • 104 nodes (dual 64-bit 900 MHz processors, 3 GB
    memory, 72 GB local disk) connected through a
    Myrinet network
  • 208 processors, 312 GB memory, 7.5 TB disk
  • Connected to the GRID
  • Linux OS (RedHat Advanced Server)
  • First Linpack experiments at INRIA (Aug. 03)
    reached 560 GFlop/s
  • Applications: Grid computing, classical
    scientific computing, high performance Internet
    servers, …

5
ObjectWeb key figures
  • Open source middleware development
  • Based on open standards
  • J2EE, CORBA, OSGi
  • International consortium
  • Founded by INRIA, Bull and FT R&D in 2001
  • Academic partners
  • European universities and research centers
  • Industrial partners
  • RedHat, Suse, MySQL, …
  • NEC, Bull, France Telecom, Dassault, Cap Gemini, …

6
Common Software Architecture for Component
Based Development
[Diagram: ObjectWeb projects built on the common architecture: JMOB, JOnAS, OSCAR, OpenCCM, ProActive, Speedo, RUBiS, JORAM, DotNetJ, CAROL, Enhydra, XMLC, JORM/MEDOR, JOTM, Kilim, Zeus, C-JDBC, Fractal, Jonathan, RmiJdbc, Bonita, Think, JAWE, Octopus]
7
Sardes project
  • Distributed Systems group
  • Main research themes
  • Reflective component technology
  • Autonomous systems management
  • Application areas
  • high-availability J2EE servers
  • dynamic monitoring, configuration and resource
    management in large scale distributed systems
  • (embedded system networks, ubiquitous computing)
  • Result dissemination by ObjectWeb

8
Outline
  • INRIA, ObjectWeb & Sardes
  • GORDA

9
Sardes experiences
  • Component-based open source middleware
  • ObjectWeb (http://www.objectweb.org)
  • J2EE application servers
  • JOnAS clustering (http://jonas.objectweb.org)
  • Database replication middleware
  • C-JDBC (http://c-jdbc.objectweb.org)
  • Benchmarking
  • RUBiS (http://rubis.objectweb.org)
  • TPC-W (http://jmob.objectweb.org)
  • CLIF (http://clif.objectweb.org)
  • Monitoring
  • LeWYS (http://lewys.objectweb.org)

10
Common scalability practice
  • Cons
  • Cost
  • Scalability limit

[Diagram: Internet / web frontend / app. server tiers]
11
Replication with shared disks
  • Cons
  • still expensive hardware
  • availability

[Diagram: Internet / web frontend / app. server / database on shared disks ("another well-known database vendor")]
12
Master/Slave replication
  • Cons
  • consistency
  • failover time on master failure
  • scalability

[Diagram: Internet / web frontend / app. server / master database with slave replicas]
13
Atomic broadcast-based replication
  • Database tier should be
  • scalable
  • highly available
  • without modifying the client application
  • database vendor independent
  • on commodity hardware

[Diagram: database tier replicated through atomic broadcast]
14
C-JDBC
  • JDBC compliant (no client application
    modification; see the sketch below)
  • database vendor independent
  • JDBC driver required
  • heterogeneity support
  • no 2PC, no group communication between databases
  • group communication for controller replication
    only

[Diagram: clients access C-JDBC through the standard JDBC interface over the Internet]
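A minimal sketch of what "no client application modification" means in practice: the application keeps the standard JDBC API and only swaps the driver class and URL. The driver class name org.objectweb.cjdbc.driver.Driver is assumed here; the URL format is the one shown on the controller replication slide.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Client {
      public static void main(String[] args) throws Exception {
        // Load the C-JDBC driver instead of a vendor driver
        // (driver class name assumed).
        Class.forName("org.objectweb.cjdbc.driver.Driver");
        // From here on, plain JDBC, exactly as against a single database.
        Connection con = DriverManager.getConnection(
            "jdbc:c-jdbc://node1:25322/myDB", "user", "password");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM t");
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
      }
    }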
15
RAIDb - Definition
  • Redundant Array of Inexpensive Databases
  • better performance and fault tolerance than a
    single database, at a low cost, by combining
    multiple database instances into an array of
    databases
  • RAIDb levels offer various tradeoffs between
    performance and fault tolerance

16
RAIDb
  • Redundant Array of Inexpensive Databases
  • better performance and fault tolerance than a
    single database, at a low cost, by combining
    multiple database instances into an array of
    databases
  • RAIDb controller
  • gives the view of a single database to the client
  • balances the load on the database backends
  • RAIDb levels
  • RAIDb-0: full partitioning
  • RAIDb-1: full mirroring
  • RAIDb-2: partial replication
  • composition possible

17
C-JDBC Key ideas
  • Middleware implementing RAIDb
  • Two components
  • generic JDBC 2.0 driver (C-JDBC driver)
  • C-JDBC Controller
  • C-JDBC Controller provides
  • performance scalability
  • high availability
  • failover
  • caching, logging, monitoring, …
  • Supports heterogeneous databases

18
C-JDBC Overview
19
Heterogeneity support
  • unload a single Oracle DB with several MySQL databases
  • RAIDb-2 for partial replication

20
Inside the C-JDBC Controller
[Diagram: controller internals; clients and backends communicate over sockets, administration via JMX]
21
C-JDBC features
  • unified authentication management
  • tunable concurrency control
  • automatic schema detection
  • tunable replication: full partitioning, partial
    replication, full replication
  • caching (metadata, parsing, results) with various
    invalidation granularities
  • various load balancing strategies
  • on-the-fly query rewriting for macros and
    heterogeneity support
  • recovery log for dynamic backend adding and
    failure recovery
  • database backup/restore using Octopus
  • JMX based monitoring and administration
  • graphical administration console

22
Functional overview
23
Functional overview
24
Failures
[Diagram: an INSERT INTO t statement executed on all backends]
  • No 2-phase commit
  • parallel transactions
  • failed nodes are automatically disabled

25
Controller replication
jdbc:c-jdbc://node1:25322,node2:12345/myDB
  • Prevent the controller from being a single point
    of failure
  • Group communication for controller
    synchronization
  • C-JDBC driver supports multiple controllers with
    automatic failover

26
Controller replication
27
Mixing horizontal & vertical scalability
28
Lessons learned
  • SQL parsing cannot be generic
  • many discrepancies in JDBC implementations
  • minimize the use of group communications
  • IP multicast does not scale
  • notification infrastructure needed
  • users want
  • no single point of failure
  • control (monitoring, pluggable recovery policies, …)
  • no database vendor lock-in
  • no database modification
  • need for an exhaustive test suite
  • benchmarking accurately is very difficult
  • load injection requires resources
  • monitoring and exploiting results is tricky

29
Sardes role in GORDA
  • provide input
  • GORDA APIs
  • group communication requirements
  • monitoring and management requirements
  • middleware implementation based on C-JDBC
  • dissemination effort
  • ObjectWeb
  • possible participation in the JCP for JDBC extensions
  • hardware resources for experiments
  • eCommerce benchmarks

30
Other interests
  • LeWYS (http://lewys.objectweb.org)
  • monitoring infrastructure
  • generic hardware/kernel probes for Linux/Windows
  • software probes: JMX, SNMP, …
  • monitoring repository
  • autonomic behavior
  • building supervision loops
  • self-healing clusters
  • self-sizing (expand or shrink)
  • SLAs

31
Q without A
  • do we consider distributed query execution?
  • XA support?
  • cluster size targeted?
  • do we target grids or cluster of clusters?
  • reconciliation
  • consistency/caching
  • network architecture considered?
  • are relaxed or loose consistency models an option?
  • what will really cover the GRI?
  • do we impose a specific way of doing replication?
  • access to read-set/write-set difficult to
    implement with legacy databases
  • which workloads are considered?
  • which WP deals with backup/recovery?
  • licensing issues?

32
Q&A
Thanks to all users and contributors ...
33
Bonus slides
34
INTERNALS
35
Virtual Database
  • gives the view of a single database
  • establishes the mapping between the database name
    used by the application and the backend specific
    settings
  • backends can be added and removed dynamically
  • configured using an XML configuration file

36
Authentication Manager
  • Matches the real login/password used by the
    application with backend-specific login/password
  • Administrator login to manage the virtual database

37
Scheduler
  • Manages concurrency control
  • Specific implementations for Single DB, RAIDb 0,
    1 and 2
  • Query-level
  • Optimistic and pessimistic transaction-level (see
    the sketch below)
  • uses the database schema that is automatically
    fetched from backends
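A minimal sketch of the pessimistic transaction-level idea: per-table read/write locks over the automatically fetched schema. Illustrative only, not the C-JDBC scheduler API.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Reads take a shared lock, writes an exclusive lock,
    // on each table a request touches.
    public class PessimisticScheduler {
      private final Map<String, ReentrantReadWriteLock> tableLocks =
          new ConcurrentHashMap<>();

      private ReentrantReadWriteLock lockFor(String table) {
        return tableLocks.computeIfAbsent(table,
            t -> new ReentrantReadWriteLock());
      }

      public void scheduleRead(String table)   { lockFor(table).readLock().lock(); }
      public void scheduleWrite(String table)  { lockFor(table).writeLock().lock(); }
      public void readCompleted(String table)  { lockFor(table).readLock().unlock(); }
      public void writeCompleted(String table) { lockFor(table).writeLock().unlock(); }
    }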

38
Request cache
  • caches results from SQL requests
  • improved SQL statement analysis to limit cache
    invalidations
  • table based invalidations
  • column based invalidations
  • single-row SELECT optimization
  • request parsing possible in the C-JDBC driver
  • offload the controller
  • parsing caching in the driver

39
Load balancer 1/2
  • RAIDb-0
  • query directed to the backend having the needed
    tables
  • RAIDb-1
  • read executed by current thread
  • write executed in parallel by a dedicated thread
    per backend
  • result returned if one, majority or all commit
    (see the sketch below)
  • if one node fails but others succeed, failing
    node is disabled
  • RAIDb-2
  • same as RAIDb-1 except that writes are sent only
    to nodes owning the written table
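A minimal sketch of the RAIDb-1 write path with a "majority" completion policy. The Backend interface is hypothetical; a real controller also disables failing nodes, and the timeout stands in for proper handling of the case where a majority can never commit.

    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelWrite {
      interface Backend { void executeWrite(String sql) throws Exception; }

      // One dedicated task per backend; return once a majority commits.
      public static boolean execute(List<Backend> backends, String sql)
          throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(backends.size());
        CountDownLatch majority = new CountDownLatch(backends.size() / 2 + 1);
        for (Backend b : backends) {
          pool.execute(() -> {
            try {
              b.executeWrite(sql);   // parallel write on this backend
              majority.countDown();
            } catch (Exception e) {
              // a failing node would be disabled by the controller
            }
          });
        }
        // Timeout guards against a majority that can never be reached.
        boolean ok = majority.await(30, TimeUnit.SECONDS);
        pool.shutdown();
        return ok;
      }
    }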

40
Load balancer 2/2
  • Static load balancing policies
  • Round-Robin (RR)
  • Weighted Round-Robin (WRR)
  • Least Pending Requests First (LPRF)
  • request sent to the node that has the shortest
    pending request queue (see the sketch below)
  • efficient if backends are homogeneous in terms of
    performance
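A minimal sketch of LPRF; the names are illustrative, not C-JDBC's.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class LeastPendingRequestsFirst {
      public static class Backend {
        final String name;
        final AtomicInteger pending = new AtomicInteger();
        Backend(String name) { this.name = name; }
      }

      // Pick the backend with the shortest pending request queue.
      public static Backend choose(List<Backend> backends) {
        Backend best = backends.get(0);
        for (Backend b : backends)
          if (b.pending.get() < best.pending.get()) best = b;
        return best;
      }
    }

The caller increments pending before dispatching a request to the chosen backend and decrements it on completion.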

41
Connection Manager
  • Connection pooling for a backend
  • Simple: no pooling
  • RandomWait: blocking pool
  • FailFast: non-blocking pool
  • VariablePool: dynamic pool (see the sketch below)
  • Connection pools defined on a per login basis
  • resource management per login
  • dedicated connections for admin
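A minimal sketch of a dynamic pool in the spirit of VariablePool: grow on demand up to a maximum, then block until a connection is released. ConnectionFactory is a hypothetical helper.

    import java.sql.Connection;
    import java.util.concurrent.LinkedBlockingDeque;

    public class VariablePool {
      public interface ConnectionFactory { Connection newConnection() throws Exception; }

      private final LinkedBlockingDeque<Connection> idle = new LinkedBlockingDeque<>();
      private final ConnectionFactory factory;
      private final int max;
      private int created = 0;

      public VariablePool(ConnectionFactory factory, int max) {
        this.factory = factory;
        this.max = max;
      }

      public Connection get() throws Exception {
        Connection c = idle.pollFirst();        // reuse an idle connection
        if (c != null) return c;
        synchronized (this) {
          if (created < max) { created++; return factory.newConnection(); }
        }
        return idle.takeFirst();                // block until one is released
      }

      public void release(Connection c) { idle.offerFirst(c); }
    }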

42
Recovery Log
  • Checkpoints are associated with database dumps
  • Record all updates and transaction markers since
    a checkpoint
  • Used to resynchronize a database from a
    checkpoint
  • JDBCRecoveryLog
  • stores information in a database (see the sketch below)
  • can be re-injected in a C-JDBC cluster for fault
    tolerance
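A minimal sketch of a JDBC-backed recovery log; the recovery_log table and its columns are assumed for illustration.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class JdbcRecoveryLog {
      private final Connection log;   // connection to the log database

      public JdbcRecoveryLog(Connection log) { this.log = log; }

      // Called for every update and transaction marker (begin/commit/rollback).
      public void record(long transactionId, String login, String sql)
          throws SQLException {
        PreparedStatement ps = log.prepareStatement(
            "INSERT INTO recovery_log (tid, login, sql_stmt) VALUES (?, ?, ?)");
        ps.setLong(1, transactionId);
        ps.setString(2, login);
        ps.setString(3, sql);
        ps.executeUpdate();
        ps.close();
      }
    }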

43
SCALABILITY
44
C-JDBC scalability
  • Horizontal scalability
  • prevents the controller from being a Single Point
    Of Failure (SPOF)
  • distributes the load among several controllers
  • uses group communications for synchronization
  • C-JDBC Driver
  • multiple controllers with automatic failover
  • jdbc:c-jdbc://node1:25322,node2:12345/myDB
  • connection caching
  • URL parsing/controller lookup caching

45
C-JDBC scalability
  • Vertical scalability
  • allows nested RAIDb levels
  • allows tree architecture for scalable write
    broadcast
  • necessary with a large number of backends
  • C-JDBC driver re-injected in C-JDBC controller

46
C-JDBC vertical scalability
  • RAIDb-1-1 with C-JDBC
  • no limit to composition depth

47
C-JDBC vertical scalability
  • RAIDb-0-1 with C-JDBC

48
CHECKPOINTING
49
Fault tolerant recovery log
[Diagram: an UPDATE statement recorded by the fault tolerant recovery log]
50
Checkpointing
  • Octopus is an ETL tool
  • Use Octopus to store a dump of the initial
    database state
  • Currently done by the user with the
    database-specific dump tool

51
Checkpointing
  • Backend is enabled
  • All database updates are logged (SQL statement,
    user, transaction, …)

52
Checkpointing
  • Add new backends while the system is online
  • Restore dump corresponding to initial checkpoint
    with Octopus

53
Checkpointing
  • Replay updates from the log

54
Checkpointing
  • Enable backends when done

55
Making new checkpoints
  • Disable one backend to have a coherent snapshot
  • Mark the new checkpoint entry in the log
  • Use Octopus to store the dump

56
Making new checkpoints
  • Replay missing updates from log

57
Making new checkpoints
  • Re-enable backend when done

58
Recovery
  • A node fails!
  • Automatically disabled, but must be fixed or
    replaced by the administrator

59
Recovery
  • Restore latest dump with Octopus

60
Recovery
  • Replay missing updates from log

61
Recovery
  • Re-enable backend when done (the replay step is
    sketched below)
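A minimal sketch of the replay step, reusing the recovery_log table assumed in the recovery log sketch above (an auto-incremented id column orders the entries):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class Recovery {
      // Restore the latest dump first (Octopus), then replay the log.
      public static void replay(Connection logDb, Connection backend,
                                long checkpointId) throws SQLException {
        PreparedStatement ps = logDb.prepareStatement(
            "SELECT sql_stmt FROM recovery_log WHERE id > ? ORDER BY id");
        ps.setLong(1, checkpointId);
        ResultSet rs = ps.executeQuery();
        Statement replay = backend.createStatement();
        while (rs.next()) {
          replay.executeUpdate(rs.getString(1));  // replay updates in log order
        }
        replay.close();
        ps.close();
        // finally, re-enable the backend in the controller
      }
    }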

62
HORIZONTAL SCALABILITY
63
Horizontal scalability
  • JGroups for controller synchronization
  • Group messages are used for writes only (see the
    sketch below)
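A minimal sketch of the write-only broadcast over JGroups. This roughly follows the JGroups 3.x API; signatures differ across versions, and the group name is an assumption.

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class WriteBroadcast {
      private final JChannel channel;

      public WriteBroadcast() throws Exception {
        channel = new JChannel();                 // default protocol stack
        channel.setReceiver(new ReceiverAdapter() {
          @Override public void receive(Message msg) {
            String sql = (String) msg.getObject();
            // apply the write to the local backends
          }
        });
        channel.connect("cjdbc-controllers");     // group name assumed
      }

      // Reads stay local; only writes are multicast to the group.
      public void broadcastWrite(String sql) throws Exception {
        channel.send(new Message(null, sql));     // null destination = multicast
      }
    }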

64
Horizontal scalability
  • Centralized write approach issues
  • Issues with transactions assigned to connections

65
Horizontal scalability
  • General case for a write query
  • 3 multicasts + 2n unicasts

66
Horizontal scalability
  • Solution: no backend sharing
  • 1 multicast + n unicasts + 1 multicast

67
Horizontal scalability
  • Issues with JGroups
  • resources needed by a channel
  • instability of throughput with UDP
  • performance scalability
  • TCP better than UDP, but:
  • unable to disable reliability on top of TCP
  • unable to disable garbage collection
  • ordering implementation is sub-optimal
  • Need for a new group communication layer
    optimized for clusters

68
Horizontal scalability
  • JGroups performance on UDP/FastEthernet

69
USE CASES
70
Budget High Availability
  • High availability infrastructure on a budget
  • Typical eCommerce setup
  • http://www.budget-ha.com

71
OpenUSS University Support System
  • eLearning
  • High availability
  • Portability
  • Linux, HP-UX, Windows
  • InterBase, Firebird, PostgreSQL, HypersonicSQL
  • http://openuss.sourceforge.net

72
Flood alert system
  • Disaster recovery
  • Independent nodes synchronized with C-JDBC
  • VPN for security issues
  • http://floodalert.org

73
J2EE benchmarking
  • Large scale J2EE clusters
  • http://jmob.objectweb.org

74
PERFORMANCE
75
TPC-W
  • Browsing mix performance

76
TPC-W
  • Shopping mix performance

77
TPC-W
  • Ordering mix performance

78
Result cache
  • Cache contains a list of SQL -> ResultSet mappings
  • Policy defined by queryPattern -> Policy mappings
  • 3 policies (sketched after the table below)
  • EagerCaching: variable granularities for invalidations
  • RelaxedCaching: invalidations based on timeout
  • NoCaching: never cached

RUBiS bidding mix with 450 clients:

                          No cache    Coherent cache    Relaxed cache
  Throughput (rq/min)     3892        4184              4215
  Avg response time       801 ms      284 ms            134 ms
  Database CPU load       100%        85%               20%
  C-JDBC CPU load         -           15%               7%
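A minimal sketch of the pattern-to-policy matching, illustrative rather than the actual C-JDBC implementation:

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Pattern;

    public class ResultCachePolicies {
      enum Policy { EAGER_CACHING, RELAXED_CACHING, NO_CACHING }

      // Insertion order gives pattern priority; first match wins.
      private final Map<Pattern, Policy> policies = new LinkedHashMap<>();

      public void addRule(String queryPattern, Policy policy) {
        policies.put(Pattern.compile(queryPattern), policy);
      }

      public Policy policyFor(String sql) {
        for (Map.Entry<Pattern, Policy> e : policies.entrySet())
          if (e.getKey().matcher(sql).matches()) return e.getValue();
        return Policy.NO_CACHING;   // uncached unless a rule says otherwise
      }
    }

Eager entries would be removed by table- or column-based invalidation on writes; relaxed entries simply expire after their timeout.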
79
Outline
  • Motivations
  • RAIDb
  • C-JDBC
  • Performance
  • Lessons learned
  • Conclusion

80
Open problems
  • Partition of clusters
  • Users want control over the failure policy
  • Reconciliation must also be user controlled

81
LeWYS overview
[Diagram: observers connected through DREAM event channels to a monitoring repository]
82
LeWYS
83
LeWYS components
  • Library of probes
  • hardware resources: cpu, memory, disk, network
  • generic sensors: SNMP, JMX, JVMPI, …
  • Monitoring pump
  • dynamic deployment of sensors
  • manages monitoring leases
  • Event channels
  • propagate monitored events to interested
    observers
  • allow for filtering, aggregation, content-based
    processing, … (see the sketch below)
  • Optional monitoring repository
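A minimal sketch of a filtering event channel; the names are hypothetical, not the LeWYS/DREAM API.

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.function.Predicate;

    public class EventChannel<E> {
      public interface Observer<E> { void onEvent(E event); }

      private static class Subscription<E> {
        final Predicate<E> filter;
        final Observer<E> observer;
        Subscription(Predicate<E> f, Observer<E> o) { filter = f; observer = o; }
      }

      private final List<Subscription<E>> subs = new CopyOnWriteArrayList<>();

      public void subscribe(Predicate<E> filter, Observer<E> observer) {
        subs.add(new Subscription<E>(filter, observer));
      }

      // Probes push events; only observers whose filter matches are notified.
      public void publish(E event) {
        for (Subscription<E> s : subs)
          if (s.filter.test(event)) s.observer.onEvent(event);
      }
    }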

84
LeWYS design choices
  • Component-based framework
  • probes, monitoring pump, event channels
  • provides (re)configurability capabilities
  • Minimize intrusiveness on monitored nodes
  • No global clock
  • timestamps generated locally by the pump
  • Information processing in DREAM channels

85
Centralized monitoring using a monitoring
repository (1)
86
Centralized monitoring using a monitoring
repository (2)
  • Monitoring repository
  • stores monitoring information
  • service to retrieve monitoring information
  • Pros
  • a DB allows for storing large amounts of data
  • powerful queries
  • correlate data from various probes at different
    locations
  • resynchronize clocks
  • browsing history to diagnose failures
  • use history for system provisioning
  • Cons
  • requires a DB (heavyweight solution)

87
Outline
  • J2EE Cluster
  • Group communications
  • Monitoring
  • motivations
  • LeWYS
  • implementation
  • Status & Perspectives

88
Monitoring pump implementation
[Diagram: Fractal component assembly of the monitoring pump: a ProbeManager binding Probes and a CachedProbe with its Cache, a Probe Repository, a Binding Controller, the MonitoringPumpManager with its Monitoring Pump Thread and TimeStamp, a Pull/Push Multiplexer, and OutputManagers (ChannelOut, RMI)]
89
Hardware Probes
  • Pure Java probes
  • using /proc
  • cost: 0.01 ms/call (Linux); see the sketch below

[Diagram: on Linux, pure Java probes (cpu, mem, disk, net, kernel, …) read /proc directly; on Solaris and Windows, probes reach hardware resources through JNI and native C code (a .DLL on Windows)]
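A minimal sketch of a pure Java CPU probe for Linux, reading the aggregate line of /proc/stat (the fields after "cpu" are user, nice, system and idle jiffies):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class CpuProbe {
      // Returns {user, nice, system, idle} jiffies since boot.
      public static long[] sample() throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("/proc/stat"));
        try {
          String[] f = in.readLine().trim().split("\\s+");  // "cpu user nice system idle ..."
          return new long[] {
              Long.parseLong(f[1]), Long.parseLong(f[2]),
              Long.parseLong(f[3]), Long.parseLong(f[4])
          };
        } finally {
          in.close();
        }
      }
    }

CPU utilization is then computed from the difference between two successive samples.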
90
Software Probes
  • Application level monitoring
  • JMX
  • ad-hoc
  • JVM

SNMP, ad-hoc, probes
JVM probes
JMX based probes
JMX
JVMPI
JVM
Linux
Solaris
Windows
Linux
Hardware resources