Going Large-Scale in P2P Experiments Using the JXTA Distributed Framework

1
Going Large-Scale in P2P Experiments Using the
JXTA Distributed Framework
  • Mathieu Jan, Sébastien Monnet
  • Projet PARIS

Paris, 13 February 2004
2
Outline
  • How to test P2P systems at large scale?
  • The JDF tool
  • Experimenting with various network configurations
  • Experimenting with various volatility conditions
  • Ongoing and future work

3
How to test P2P systems at large scale?
  • How to reproduce and test P2P systems?
  • Volatility
  • Heterogeneous architectures
  • Large-scale
  • Many papers on Gnutella, KaZaA, etc
  • Behavior not yet fully understood
  • Experiments on CFS, PAST, etc
  • Mostly simulation
  • Real experiments up to a few tens of physical
    nodes
  • Large-scale (thousands of nodes) via emulation
  • The methodology for testing is not discussed
  • Deployment
  • How to control the volatility?
  • A need for infrastructures

4
Solutions used for testing P2P prototypes
  • Simulation
  • Results are reproducible
  • May require significant adaptations
  • Simplified model compared to reality
  • Emulation
  • Configure network with various characteristics
  • Heterogeneity not fully captured
  • Results are not reproducible
  • Deployment and management
  • Experiments on real testbeds
  • Needed step when validating software
  • Real heterogeneity
  • Results are not reproducible
  • Deployment and management

5
Conducting JXTA-based experiments with JDF (1/2)
  • A framework for automated testing of JXTA-based
    systems from a single node (control node)
  • http://jdf.jxta.org/
  • Two modes
  • Run one distributed test
  • Multiple tests called batch mode (useful with
    crontab)
  • We added support for PBS

6
Conducting JXTA-based experiments with JDF (2/2)
  • Hypothesis
  • All the nodes must be visible to the control
    node
  • Requirements
  • Java Virtual Machine
  • Bourne shell
  • SSH/RSH configured to run with no password on
    each node
  • JDF is a set of shell scripts
  • Deployment of the needed resources for a test or
    several tests
  • Jar files and scripts used on each node
  • Configuration of JXTA peers
  • Launching peers
  • Collect log and result files from each node
  • Analyze results on the control node
  • Cleanup deployed and generated files for the test
  • Kill remaining processes
  • Update resources for a test

7
How to define a test using JDF?
  • An XML description file of the JXTA-based network
  • Type of peers (rendezvous, edge peers)
  • How peers are interconnected, etc
  • A set of Java classes describing the behavior of
    each peer (see the sketch below)
  • Extend the JDF framework (start/stop JXTA,
    etc.)
  • A Java class for analyzing collected results
  • A file containing the list of nodes and the path
    of the JVM on each node
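
As an illustration, a minimal sketch of such a behavior class follows. The base class JdfPeerSketch and its hook names are invented for this example; the slides only state that test classes extend the JDF framework, which starts and stops JXTA and collects results.

    import java.util.Properties;

    // Stand-in for the JDF base class (hypothetical API).
    abstract class JdfPeerSketch {
        protected abstract void behave() throws Exception; // peer-specific behavior
        protected void startJxta() { /* framework: start the JXTA platform */ }
        protected void stopJxta()  { /* framework: stop JXTA cleanly */ }
        protected void saveResults(Properties p) { /* framework: write the result file */ }

        public final void runPeer() throws Exception {
            startJxta();
            behave();
            stopJxta();
        }
    }

    // Example behavior for a provider peer.
    class ProviderBehavior extends JdfPeerSketch {
        @Override
        protected void behave() throws Exception {
            long begin = System.currentTimeMillis();
            // ... exercise the JuxMem provider under test here ...
            Properties results = new Properties();
            results.setProperty("elapsed.ms",
                    Long.toString(System.currentTimeMillis() - begin));
            saveResults(results); // later collected and fed to the analysis class
        }
    }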

8
Describing a simple JuxMem network (1/2)
  • Notion of profile
  • A set of peers having the same behavior
  • The instances attribute of profile
  • Specifies the total number of nodes hosting this
    type of peer
  • The instances attribute of peer
  • Specifies the number of peers of this type on
    each node
  • Simplest example: one cluster manager and one
    provider

[Diagram: cluster A group, containing Cluster Manager A and Provider A]
9
Describing a simple JuxMem network (2/2)
    <profile name="clusterManagerA" instances="1">
      <peer base-name="clusterManagerA" instances="1"/>
      <rdvs is-rdv="true"/>
      <transports>
        <tcp enabled="true" base-port="13000"/>
      </transports>
      <bootstrap class="juxmem.service.test.load.ClusterManager">
        <jvmarg value="xxxx"/>
        <arg value="xxx"/>
      </bootstrap>
    </profile>

    <profile name="providerA" instances="1">
      <peer base-name="providerA" instances="1"/>
      <rdvs is-rdv="false">
        <rdv cluster="clusterManagerA"/>
      </rdvs>
      <transports>
        <tcp enabled="true" base-port="13000"/>
      </transports>
    </profile>

10
A more complex JuxMem network (1/2)
[Diagram: the juxmem group, containing the cluster A, cluster B and cluster C groups]
11
A more complex JuxMem network (2/2)
    <profile name="clusterManagerA" instances="1"> … </profile>
    <profile name="clusterManagerB" instances="1"> … </profile>
    <profile name="clusterManagerC" instances="1"> … </profile>

    <profile name="providerA" instances="42">
      <peer base-name="providerA" instances="4"/>
      <rdv cluster="clusterManagerA"/>
    </profile>

    <profile name="providerB" instances="42">
      <peer base-name="providerB" instances="5"/>
      <rdv cluster="clusterManagerB"/>
    </profile>

    <profile name="providerC" instances="35">
      <peer base-name="providerC" instances="6"/>
      <rdv cluster="clusterManagerC"/>
    </profile>

12
Usage of JDF's scripts
  • runAll.sh <flags> <list-of-hosts>
    <network-descriptor>
  • -debug: show all script commands executed
  • -unsecure: use rsh instead of ssh
  • -cleanup: clean up the JDF directory on each host
  • -bundle: create a bundle for distribution
  • -install: install the distribution bundle
  • -update: update files on each peer
  • -config: configure the JXTA network
  • -kill: kill existing JDF processes
  • -run: run the test
  • -nohup: run and return without waiting for peers
    to exit
  • -analyze: analyze test results
  • -log: keep test results and log4j logs from peers
  • batchAll.sh <flags>
    <file-listing-each-test-directories>

13
Experimental results with JDF (1/2)
  • Experimental setup
  • Distributed ASCI Supercomputer 2 (DAS-2) managed
    by PBS (The Netherlands)
  • 5 clusters, for a total of 200 dual 1-GHz
    Pentium-III nodes
  • Site mainly used: 72 nodes
  • SSH/SCP used
  • Experiments with JDF on up to 64 nodes
  • Deployment of JXTA, JDF and JuxMem
  • Configuration of JuxMem peers
  • Update only JuxMem

14
Experimental results with JDF (2/2)
15
Standard JDF vs Optimized JDF
16
Launching peers
  • For each peer, a JVM is started
  • Several JXTA peers cannot share the same JVM
  • How to deal with connections between edge and
    rendezvous peers?
  • Rendezvous peers must be started before edge
    peers
  • JDF uses the notion of delay
  • Time to wait before launching (edge) peers, as in
    the sketch below
  • Need a mechanism for distributed synchronization
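
A minimal sketch of the delay idea, assuming the peer's role and the delay are passed in by the launching script (the actual JDF code differs):

    // Edge peers sleep for a configured delay so that rendezvous
    // peers are already up when connections are attempted.
    public class DelayedLaunch {
        public static void main(String[] args) throws InterruptedException {
            boolean isRendezvous = Boolean.parseBoolean(args[0]);
            long delayMs = Long.parseLong(args[1]); // delay from the test descriptor

            if (!isRendezvous) {
                Thread.sleep(delayMs); // wait for rendezvous peers to start
            }
            // ... start the JXTA platform here ...
        }
    }

A distributed synchronization mechanism (e.g. a barrier) would replace this fixed delay, which is why the slide lists it as a needed mechanism.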

17
Getting the logs and the results
  • Framework of JDF
  • Start and stop JXTA (net peergroup as well as
    custom groups as in JuxMem)
  • Store the results in a property file
  • Retrieve log files generated on each node
  • Library used: log4j (see the sketch below)
  • Files starting with log.
  • Retrieve result files from each node
  • The specified analysis class is called
  • Display results
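
A minimal sketch of these per-node conventions, assuming log4j 1.x; the file names and the property key are illustrative, not JDF's exact ones:

    import java.io.FileOutputStream;
    import java.util.Properties;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.SimpleLayout;

    public class NodeReport {
        public static void main(String[] args) throws Exception {
            // Log to a file whose name starts with "log." so it is retrieved
            Logger log = Logger.getLogger(NodeReport.class);
            log.addAppender(new FileAppender(new SimpleLayout(), "log.peer0"));
            log.info("peer started");

            // Store measured results in a property file for the analysis class
            long start = System.currentTimeMillis();
            // ... run the test step being measured ...
            Properties results = new Properties();
            results.setProperty("step.duration.ms",
                    Long.toString(System.currentTimeMillis() - start));
            try (FileOutputStream out = new FileOutputStream("results.properties")) {
                results.store(out, "per-node results collected by JDF");
            }
        }
    }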

18
Experimenting with various volatility conditions
  • Goals
  • Provide multiple failure conditions
  • Experiment with various failure detection
    techniques
  • Experiment with various replication strategies
  • Identify classes of applications and system
    states
  • Adapt fault tolerance mechanisms

19
Providing multiple failure conditions
  • Go large scale
  • Control faults on thousands of nodes
  • Precision
  • Possibility to kill a node at a given time/state
  • Some nodes may be fail-safe
  • Easy to use
  • Changing the failure model should not affect the
    code being tested

20
Failure injection: going large scale
  • Using statistical distributions
  • Advantages
  • Ease of use: multiple failure dates can be
    generated automatically
  • Suitable for large scale
  • Which statistical distributions?
  • Exponential (to model life expectancy)
  • Uniform (to choose among numerous nodes); see the
    sketch below
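
A sketch of how such failure dates can be drawn. The PSOL library cited on a later slide provides such generators; plain java.util.Random with inverse-CDF sampling is used here to keep the example self-contained:

    import java.util.Random;

    public class FailureDates {
        public static void main(String[] args) {
            Random rng = new Random();
            double mtbfMs = 64 * 60_000.0; // per-node MTBF, e.g. 64 minutes

            // Exponential lifetime with mean mtbfMs: -mean * ln(U), U ~ Uniform(0,1)
            double lifetimeMs = -mtbfMs * Math.log(1.0 - rng.nextDouble());

            // Uniform choice of a victim among 64 nodes
            int victim = rng.nextInt(64);
            System.out.printf("kill node %d after %.0f ms%n", victim, lifetimeMs);
        }
    }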

21
Failure injection: precision
  • Why?
  • Play the role of the enemy
  • Kill a node that holds a lock
  • Kill multiple nodes during some data replication
  • Model reality
  • Some nodes may be almost fail-safe
  • A particular node may have a very high MTBF
  • How?
  • Combine statistical laws with a more precise
    configuration file

22
Failure injection in JDF: design
  • A single configuration file
  • Generated by a set of tools
  • Using the Probability/Statistics Object Library
    (http://www.math.uah.edu/psol)
  • Deployed on each node by JDF
  • A new Java thread is launched that (see the
    sketch below):
  • Reads the configuration file
  • Sleeps for a while
  • Kills its node at the given time
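
A minimal sketch of such a killer thread, assuming a hypothetical death.time.ms key in fi.properties; the real fi.Killer may differ in its details:

    import java.io.FileInputStream;
    import java.util.Properties;

    public class Killer extends Thread {
        @Override
        public void run() {
            try {
                Properties conf = new Properties();
                try (FileInputStream in = new FileInputStream("fi.properties")) {
                    conf.load(in); // deployed on each node by JDF
                }
                long deathMs = Long.parseLong(conf.getProperty("death.time.ms", "-1"));
                if (deathMs < 0) return; // no entry: this node is fail-safe
                Thread.sleep(deathMs);   // sleep until the failure date
                Runtime.getRuntime().halt(1); // abrupt JVM exit, emulating a crash
            } catch (Exception e) {
                // if the killer itself fails, the node simply survives
            }
        }
    }

The test class only needs new Killer().start(), so changing the failure model never touches the code being tested.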

23
Failure injection: execution flow
[Diagram: the main flow (test class) starts the killer thread with new Killer().start() and writes the result file (fi.results); the Killer thread reads the configuration file (fi.properties) and then kills its own node (suicide).]
24
Failure injection: sample experiment
  • 64 peers running on 64 nodes
  • Creating fi.properties for an initial MTBF of 1
    minute
  • With 64 nodes and a global MTBF of 1 minute, each
    node's lifetime follows an exponential law with a
    rate of 1/64 per minute (mean: 64 minutes)
  • With JDF it becomes easy to use
  • java -cp .:PSOL.jar CreateFiProperties 60000
  • new fi.Killer().start() // in the test class
  • runAll.sh -cleanup -with-nfs -install -config
    -run -analyze -log paraci_01-64 test.xml

25
Failure injection: sample experiment (results)
26
Failure injection: ongoing work
  • Time deviation
  • Initial time (t0)
  • Clock drift
  • Tools to precisely specify fi.properties
  • Suicide interface (event handler)
  • More flexibility

27
Failure detection and replication strategies
  • Running the same test multiple times
  • Failure detection
  • Change the failure detection technique
  • Tune Δ (the delay between heartbeats); see the
    sketch after this list
  • Which Δ for which MTBF?
  • Replication strategies
  • Adapt replication degree to current MTBF (level
    of risk)
  • Experiment with multiple replication strategies
    in various conditions (failures/detection)
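
A generic sketch of the role of Δ, not Marin Bertier's actual detector: a monitor expects a heartbeat every Δ milliseconds and suspects the peer once roughly two intervals pass without one. A smaller Δ detects failures faster but generates more traffic and more false suspicions:

    // Heartbeat-based failure detector sketch with a tunable Δ.
    public class HeartbeatMonitor extends Thread {
        private final long deltaMs; // Δ: delay between heartbeats
        private volatile long lastHeartbeat = System.currentTimeMillis();

        public HeartbeatMonitor(long deltaMs) { this.deltaMs = deltaMs; }

        // Called whenever a heartbeat arrives from the monitored peer
        public void onHeartbeat() { lastHeartbeat = System.currentTimeMillis(); }

        @Override
        public void run() {
            while (!isInterrupted()) {
                try { Thread.sleep(deltaMs); } catch (InterruptedException e) { return; }
                if (System.currentTimeMillis() - lastHeartbeat > 2 * deltaMs) {
                    System.out.println("peer suspected"); // trade-off governed by Δ
                }
            }
        }
    }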

28
Fault tolerance in JuxMem: road map
  • Finalize the failure injection tools
  • Experiment with Marin Bertier's failure detectors
    using JXTA/JDF
  • Integrate the failure detectors in JuxMem
  • Experiment with various replication strategies
  • Automatic adaptation

29
Ongoing work
  • Improving JDF
  • There is a lot to do!
  • Enable concurrent tests via PBS
  • Submitting issues to Bugzilla
  • Write more tests for JuxMem
  • Measuring the cost of elementary operations in
    JuxMem
  • Various consistency protocols at large-scale
  • Benchmarking other elementary steps of JDF
  • Launch peers
  • Collect result and log files
  • Use of emulation tools like Dummynet or NIST NET
  • Visit of Fabio Picconi at IRISA

30
Future work
  • Hierarchical deployment
  • Ka-run/Taktuk-like (ID IMAG)
  • Distributed synchronization mechanism
  • Support more complex tests
  • Allow the use of JDF over Globus
  • Support protocols other than SSH/RSH
  • Especially when updating resources