Going Large-Scale in P2P Experiments Using the JXTA Distributed Framework

1
Going Large-Scale in P2P Experiments Using the
JXTA Distributed Framework
  • Mathieu Jan, Sébastien Monnet
  • Projet PARIS

Paris, 13 February 2004
2
Outline
  • How to test P2P systems at large scale?
  • The JDF tool
  • Experimenting with various network configurations
  • Experimenting with various volatility conditions
  • Ongoing and future work

3
How to test P2P systems at large scale?
  • How to reproduce and test P2P systems?
  • Volatility
  • Heterogeneous architectures
  • Large-scale
  • Many papers on Gnutella, KaZaA, etc
  • Behavior not yet fully understood
  • Experiments on CFS, PAST, etc
  • Mostly simulation
  • Real experiments up to a few tens of physical
    nodes
  • Large-scale (thousands of nodes) via emulation
  • The methodology for testing is not discussed
  • Deployment
  • How to control the volatility?
  • A need for infrastructures

4
Solutions used for testing P2P prototypes
  • Simulation
  • Results are reproducible
  • May require significant adaptations
  • Simplified model compared to reality
  • Emulation
  • Configure network with various characteristics
  • Heterogeneity not fully captured
  • Results are not reproducible
  • Deployment and management
  • Experiments on real testbeds
  • Needed step when validating software
  • Real heterogeneity
  • Results are not reproducible
  • Deployment and management

5
Conducting JXTA-based experiments with JDF (1/2)
  • A framework for automated testing of JXTA-based
    systems from a single node (control node)
  • http://jdf.jxta.org/
  • Two modes
  • Run one distributed test
  • Multiple tests called batch mode (useful with
    crontab)
  • We added support for PBS

6
Conducting JXTA-based experiments with JDF (2/2)
  • Hypothesis
  • All the nodes must be visible to the control
    node
  • Requirements
  • Java Virtual Machine
  • Bourne shell
  • SSH/RSH configured to run with no password on
    each node
  • JDF is a set of shell scripts
  • Deployment of the needed resources for a test or
    several tests
  • Jar files and scripts used on each node
  • Configuration of JXTA peers
  • Launching peers
  • Collect log and result files from each node
  • Analyze results on the control node
  • Cleanup deployed and generated files for the test
  • Kill remaining processes
  • Update resources for a test

7
How to define a test using JDF?
  • An XML description file of the JXTA-based network
  • Type of peers (rendezvous, edge peers)
  • How peers are interconnected, etc
  • A set of Java classes describing the behavior of
    each peer (see the sketch below)
  • Extend the JDF framework (start/stop JXTA,
    etc.)
  • A Java class for analyzing collected results
  • A file containing the list of nodes and the path
    of the JVM on each node
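
As an illustration, a minimal sketch of such a behavior class follows. The base class JdfPeerSketch and its hook names are invented for this example; the slides only state that test classes extend the JDF framework, which starts and stops JXTA and collects results.

    import java.util.Properties;

    // Stand-in for the JDF base class (hypothetical API).
    abstract class JdfPeerSketch {
        protected abstract void behave() throws Exception; // peer-specific behavior
        protected void startJxta() { /* framework: start the JXTA platform */ }
        protected void stopJxta()  { /* framework: stop JXTA cleanly */ }
        protected void saveResults(Properties p) { /* framework: write the result file */ }

        public final void runPeer() throws Exception {
            startJxta();
            behave();
            stopJxta();
        }
    }

    // Example behavior for a provider peer.
    class ProviderBehavior extends JdfPeerSketch {
        @Override
        protected void behave() throws Exception {
            long begin = System.currentTimeMillis();
            // ... exercise the JuxMem provider under test here ...
            Properties results = new Properties();
            results.setProperty("elapsed.ms",
                    Long.toString(System.currentTimeMillis() - begin));
            saveResults(results); // later collected and fed to the analysis class
        }
    }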

8
Describing a simple JuxMem network (1/2)
  • Notion of profile
  • A set of peers having the same behavior
  • The instances attribute of profile
  • Specifies the total number of nodes hosting this
    type of peer
  • The instances attribute of peer
  • Specifies the number of peers of this type on
    each node
  • Simplest example: one cluster manager and one
    provider

[Diagram: cluster A group, containing Cluster Manager A and Provider A]
9
Describing a simple JuxMem network (2/2)
    <profile name="clusterManagerA" instances="1">
      <peer base-name="clusterManagerA" instances="1"/>
      <rdvs is-rdv="true"/>
      <transports>
        <tcp enabled="true" base-port="13000"/>
      </transports>
      <bootstrap class="juxmem.service.test.load.ClusterManager">
        <jvmarg value="xxxx"/>
        <arg value="xxx"/>
      </bootstrap>
    </profile>

    <profile name="providerA" instances="1">
      <peer base-name="providerA" instances="1"/>
      <rdvs is-rdv="false">
        <rdv cluster="clusterManagerA"/>
      </rdvs>
      <transports>
        <tcp enabled="true" base-port="13000"/>
      </transports>
    </profile>

10
A more complex JuxMem network (1/2)
[Diagram: the juxmem group, containing the cluster A, cluster B and cluster C groups]
11
A more complex JuxMem network (2/2)
    <profile name="clusterManagerA" instances="1"> … </profile>
    <profile name="clusterManagerB" instances="1"> … </profile>
    <profile name="clusterManagerC" instances="1"> … </profile>

    <profile name="providerA" instances="42">
      <peer base-name="providerA" instances="4"/>
      <rdv cluster="clusterManagerA"/>
    </profile>

    <profile name="providerB" instances="42">
      <peer base-name="providerB" instances="5"/>
      <rdv cluster="clusterManagerB"/>
    </profile>

    <profile name="providerC" instances="35">
      <peer base-name="providerC" instances="6"/>
      <rdv cluster="clusterManagerC"/>
    </profile>

12
Usage of JDF's scripts
  • runAll.sh <flags> <list-of-hosts>
    <network-descriptor>
  • -debug: show all script commands executed
  • -unsecure: use rsh instead of ssh
  • -cleanup: clean up the JDF directory on each host
  • -bundle: create a bundle for distribution
  • -install: install the distribution bundle
  • -update: update files on each peer
  • -config: configure the JXTA network
  • -kill: kill existing JDF processes
  • -run: run the test
  • -nohup: run and return without waiting for peers
    to exit
  • -analyze: analyze test results
  • -log: keep test results and log4j logs from peers
  • batchAll.sh <flags>
    <file-listing-each-test-directories>

13
Experimental results with JDF (1/2)
  • Experimental setup
  • Distributed ASCI Supercomputer 2 (DAS-2) managed
    by PBS (The Netherlands)
  • 5 clusters, for a total of 200 dual 1-GHz
    Pentium-III nodes
  • Site mainly used: 72 nodes
  • SSH/SCP used
  • Experiments with JDF on up to 64 nodes
  • Deployment of JXTA, JDF and JuxMem
  • Configuration of JuxMem peers
  • Update only JuxMem

14
Experimental results with JDF (2/2)
15
Standard JDF vs Optimized JDF
16
Launching peers
  • For each peer, a JVM is started
  • Several JXTA peers cannot share the same JVM
  • How to deal with connections between edge and
    rendezvous peers?
  • Rendezvous peers must be started before edge
    peers
  • JDF uses the notion of delay
  • Time to wait before launching (edge) peers, as in
    the sketch below
  • Need a mechanism for distributed synchronization
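
A minimal sketch of the delay idea, assuming the peer's role and the delay are passed in by the launching script (the actual JDF code differs):

    // Edge peers sleep for a configured delay so that rendezvous
    // peers are already up when connections are attempted.
    public class DelayedLaunch {
        public static void main(String[] args) throws InterruptedException {
            boolean isRendezvous = Boolean.parseBoolean(args[0]);
            long delayMs = Long.parseLong(args[1]); // delay from the test descriptor

            if (!isRendezvous) {
                Thread.sleep(delayMs); // wait for rendezvous peers to start
            }
            // ... start the JXTA platform here ...
        }
    }

A distributed synchronization mechanism (e.g. a barrier) would replace this fixed delay, which is why the slide lists it as a needed mechanism.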

17
Getting the logs and the results
  • Framework of JDF
  • Start and stop JXTA (net peergroup as well as
    custom groups as in JuxMem)
  • Store the results in a property file
  • Retrieve log files generated on each node
  • Library used: log4j (see the sketch below)
  • Files starting with log.
  • Retrieve result files from each node
  • The specified analysis class is called
  • Display results
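
A minimal sketch of these per-node conventions, assuming log4j 1.x; the file names and the property key are illustrative, not JDF's exact ones:

    import java.io.FileOutputStream;
    import java.util.Properties;
    import org.apache.log4j.FileAppender;
    import org.apache.log4j.Logger;
    import org.apache.log4j.SimpleLayout;

    public class NodeReport {
        public static void main(String[] args) throws Exception {
            // Log to a file whose name starts with "log." so it is retrieved
            Logger log = Logger.getLogger(NodeReport.class);
            log.addAppender(new FileAppender(new SimpleLayout(), "log.peer0"));
            log.info("peer started");

            // Store measured results in a property file for the analysis class
            long start = System.currentTimeMillis();
            // ... run the test step being measured ...
            Properties results = new Properties();
            results.setProperty("step.duration.ms",
                    Long.toString(System.currentTimeMillis() - start));
            try (FileOutputStream out = new FileOutputStream("results.properties")) {
                results.store(out, "per-node results collected by JDF");
            }
        }
    }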

18
Experimenting with various volatility conditions
  • Goals
  • Provide multiple failure conditions
  • Experiment with various failure detection
    techniques
  • Experiment with various replication strategies
  • Identify classes of applications and system
    states
  • Adapt fault tolerance mechanisms

19
Providing multiple failure conditions
  • Go large scale
  • Control faults on thousands of nodes
  • Precision
  • Possibility to kill a node at a given time/state
  • Some nodes may be fail-safe
  • Easy to use
  • Changing the failure model should not affect the
    code being tested

20
Failure injection: going large scale
  • Using statistical distributions
  • Advantages
  • Ease of use: multiple failure dates can be
    generated automatically
  • Suitable for large scale
  • Which statistical distributions?
  • Exponential (to model life expectancy)
  • Uniform (to choose among numerous nodes); see the
    sketch below
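
A sketch of how such failure dates can be drawn. The PSOL library cited on a later slide provides such generators; plain java.util.Random with inverse-CDF sampling is used here to keep the example self-contained:

    import java.util.Random;

    public class FailureDates {
        public static void main(String[] args) {
            Random rng = new Random();
            double mtbfMs = 64 * 60_000.0; // per-node MTBF, e.g. 64 minutes

            // Exponential lifetime with mean mtbfMs: -mean * ln(U), U ~ Uniform(0,1)
            double lifetimeMs = -mtbfMs * Math.log(1.0 - rng.nextDouble());

            // Uniform choice of a victim among 64 nodes
            int victim = rng.nextInt(64);
            System.out.printf("kill node %d after %.0f ms%n", victim, lifetimeMs);
        }
    }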

21
Failure injection: precision
  • Why?
  • Play the role of the enemy
  • Kill a node that holds a lock
  • Kill multiple nodes during some data replication
  • Model reality
  • Some nodes may be almost fail-safe
  • A particular node may have a very high MTBF
  • How?
  • Combine statistical laws with a more precise
    configuration file

22
Failure injection in JDF: design
  • A single configuration file
  • Generated by a set of tools
  • Using the Probability/Statistics Object Library
    (http://www.math.uah.edu/psol)
  • Deployed on each node by JDF
  • A new Java thread is launched that (see the
    sketch below):
  • Reads the configuration file
  • Sleeps for a while
  • Kills its node at the given time
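
A minimal sketch of such a killer thread, assuming a hypothetical death.time.ms key in fi.properties; the real fi.Killer may differ in its details:

    import java.io.FileInputStream;
    import java.util.Properties;

    public class Killer extends Thread {
        @Override
        public void run() {
            try {
                Properties conf = new Properties();
                try (FileInputStream in = new FileInputStream("fi.properties")) {
                    conf.load(in); // deployed on each node by JDF
                }
                long deathMs = Long.parseLong(conf.getProperty("death.time.ms", "-1"));
                if (deathMs < 0) return; // no entry: this node is fail-safe
                Thread.sleep(deathMs);   // sleep until the failure date
                Runtime.getRuntime().halt(1); // abrupt JVM exit, emulating a crash
            } catch (Exception e) {
                // if the killer itself fails, the node simply survives
            }
        }
    }

The test class only needs new Killer().start(), so changing the failure model never touches the code being tested.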

23
Failure injection: execution flow
[Diagram: the main flow (test class) starts the killer thread with new Killer().start() and writes the result file (fi.results); the Killer thread reads the configuration file (fi.properties) and then kills its own node (suicide).]
24
Failure injection: sample experiment
  • 64 peers running on 64 nodes
  • Creating fi.properties for an initial MTBF of 1
    minute
  • With 64 nodes and a global MTBF of 1 minute, each
    node's lifetime follows an exponential law with a
    rate of 1/64 per minute (mean: 64 minutes)
  • With JDF it becomes easy to use
  • java -cp .:PSOL.jar CreateFiProperties 60000
  • new fi.Killer().start() // in the test class
  • runAll.sh -cleanup -with-nfs -install -config
    -run -analyze -log paraci_01-64 test.xml

25
Failure injection: sample experiment (results)
26
Failure injection: ongoing work
  • Time deviation
  • Initial time (t0)
  • Clock drift
  • Tools to precisely specify fi.properties
  • Suicide interface (event handler)
  • More flexibility

27
Failure detection and replication strategies
  • Running the same test multiple times
  • Failure detection
  • Change the failure detection technique
  • Tune Δ (the delay between heartbeats); see the
    sketch after this list
  • Which Δ for which MTBF?
  • Replication strategies
  • Adapt replication degree to current MTBF (level
    of risk)
  • Experiment with multiple replication strategies
    in various conditions (failures/detection)
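
A generic sketch of the role of Δ, not Marin Bertier's actual detector: a monitor expects a heartbeat every Δ milliseconds and suspects the peer once roughly two intervals pass without one. A smaller Δ detects failures faster but generates more traffic and more false suspicions:

    // Heartbeat-based failure detector sketch with a tunable Δ.
    public class HeartbeatMonitor extends Thread {
        private final long deltaMs; // Δ: delay between heartbeats
        private volatile long lastHeartbeat = System.currentTimeMillis();

        public HeartbeatMonitor(long deltaMs) { this.deltaMs = deltaMs; }

        // Called whenever a heartbeat arrives from the monitored peer
        public void onHeartbeat() { lastHeartbeat = System.currentTimeMillis(); }

        @Override
        public void run() {
            while (!isInterrupted()) {
                try { Thread.sleep(deltaMs); } catch (InterruptedException e) { return; }
                if (System.currentTimeMillis() - lastHeartbeat > 2 * deltaMs) {
                    System.out.println("peer suspected"); // trade-off governed by Δ
                }
            }
        }
    }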

28
Fault tolerance in JuxMem: road map
  • Finalize the failure injection tools
  • Experiment with Marin Bertier's failure detectors
    using JXTA/JDF
  • Integrate the failure detectors in JuxMem
  • Experiment with various replication strategies
  • Automatic adaptation

29
Ongoing work
  • Improving JDF
  • There is a lot to do!
  • Enable concurrent tests via PBS
  • Submitting issues to Bugzilla
  • Write more tests for JuxMem
  • Measuring the cost of elementary operations in
    JuxMem
  • Various consistency protocols at large-scale
  • Benchmarking other elementary steps of JDF
  • Launch peers
  • Collect result and log files
  • Use of emulation tools like Dummynet or NIST NET
  • Visit of Fabio Picconi at IRISA

30
Future work
  • Hierarchical deployment
  • Ka-run/Taktuk-like (ID IMAG)
  • Distributed synchronization mechanism
  • Support more complex tests
  • Allow the use of JDF over Globus
  • Support protocols other than SSH/RSH
  • Especially when updating resources