Title: UP2P
1U-P2P
- A Peer-to-peer System for Description and Discovery of Resource-sharing Communities
- Aloke Mukherjee, Carleton University
- August 28, 2003
2Peer-to-peer File-sharing
- Exploit storage capability of the edge
- Balance load
- Robustness to failure
- Weaknesses: Search and Communities
3Search Problem
- Lack of structured metadata
- Filenames, Keyword matching
- Opaque identifiers
- Support for popular formats
- Ignoring structured metadata
- Implicit indicators
- Collaborative filtering
4State of the Art: Search
5Community Problem
- Not simple to create a community for sharing a new file format
- Current state
- Different protocols/apps (Gnutella, FastTrack, JXTA Search)
- Inadequate metadata (filename matching, limited schemas)
- Ad-hoc attempts aimed at specific domains
- Scattered and isolated: there is no easy way to discover communities
6State of the Art: Communities
7Improving Search
- Standard metadata layer
- Explicit structured metadata
- All resources are XML files
- XML Schema used to describe format (e.g. MP3,
design pattern)
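As a hedged illustration of why explicit structured metadata helps search, the sketch below stores a resource as an XML document and matches on one named field instead of on a filename. The element names (`designpattern`, `name`, `author`, `image`) and the helper `matches` are assumptions made for illustration; the slides do not give the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource: element names are illustrative, not U-P2P's schema.
RESOURCE_XML = """
<designpattern>
  <name>singleton</name>
  <author>gang of four</author>
  <image>http://example.com/singleton.jpg</image>
</designpattern>
"""

def matches(resource_xml: str, field: str, term: str) -> bool:
    """Return True if the named metadata field contains the search term."""
    root = ET.fromstring(resource_xml)
    return any(term.lower() in (el.text or "").lower()
               for el in root.iter(field))

# A query scoped to the author field finds the resource...
author_hit = matches(RESOURCE_XML, "author", "four")
# ...while the same term scoped to the name field does not.
name_hit = matches(RESOURCE_XML, "name", "four")
```

Scoping the query to a field is what filename or keyword matching cannot express: the same term gives different answers depending on which field is asked about.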
8Schema instantiates resource
[Figure: a schema with five fields of type string and one of type anyURI instantiates a resource with the values: singleton; gang of four; "when creating a new class ..."; "ensure a class only has ..."; "make the class itself responsible ..."; http://example.com/singleton.jpg]
9Automated interface generation
[Figure: diagram with the labels "xslt", "instantiates", "xslt"]
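A rough sketch of generating an interface from a schema follows. It walks an XML Schema in plain Python rather than using XSLT, so it shows the idea, not the mechanism U-P2P uses. The schema content, the field names, and the function `form_fields` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: field names are illustrative only.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="designpattern">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="diagram" type="xsd:anyURI"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

def form_fields(schema_xml):
    """Emit one HTML input per typed element declared in the schema."""
    root = ET.fromstring(schema_xml)
    fields = []
    for el in root.iter(XSD_NS + "element"):
        xsd_type = el.get("type")
        if xsd_type is None:          # container element without a simple type
            continue
        input_type = "url" if xsd_type.endswith("anyURI") else "text"
        fields.append(f'<input name="{el.get("name")}" type="{input_type}"/>')
    return fields

fields = form_fields(SCHEMA)
```

Because the form is derived from the schema, a community with a new format gets create and search forms without any hand-written interface code.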
12Community Creation and Discovery: What is a Community?
- Concrete object with defined tuple of attributes
- Simplest form (format, protocol, ...)
- Known examples
- (mp3, napster), (video, kazaa)
- Examples that don't exist
- (design patterns, gnutella), (p2p papers, jxtasearch)
- Tuple is specified as an XML file
13Simplifying Community Creation
[Figure: community XML file with the values designpatterns, designpattern.xsd, gnutella, designpattern.stylesheet]
- User-designed communities
- Compose schema to describe format
- Compose community XML file
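A minimal sketch of composing a community XML file is shown below, using the four values from the design-patterns example on this slide. The tag names (`community`, `name`, `schema`, `protocol`, `stylesheet`) and the function `compose_community` are assumptions; the actual U-P2P file layout may differ.

```python
import xml.etree.ElementTree as ET

def compose_community(name, schema, protocol, stylesheet):
    """Build a community descriptor as an XML string (tag names are assumed)."""
    community = ET.Element("community")
    for tag, value in (("name", name), ("schema", schema),
                       ("protocol", protocol), ("stylesheet", stylesheet)):
        ET.SubElement(community, tag).text = value
    return ET.tostring(community, encoding="unicode")

# Values taken from the design-patterns example.
xml_text = compose_community("designpatterns", "designpattern.xsd",
                             "gnutella", "designpattern.stylesheet")
```

The descriptor is itself an ordinary XML resource, so it can be stored, shared, and searched like any other file.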
14Community as class
15Metaclass analogy
16Community discovery is File discovery
- MP3 community shares MP3 files
- Community community shares communities
17Simplifying Community Discovery
- A Community for Communities: The Root Community
- Communities are files shared in a real community
- Root Community includes schema for communities
- (format, protocol) = (community, centralized db)
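The following sketch illustrates the claim that community discovery reduces to file discovery: community descriptors are published into a root community and found with the same search used for any other resource. The class `Community`, its methods, and the XML tag names are hypothetical and are not taken from the U-P2P code.

```python
import xml.etree.ElementTree as ET

class Community:
    """A named collection of XML resources sharing one format (sketch)."""

    def __init__(self, name, protocol):
        self.name, self.protocol = name, protocol
        self.resources = []           # XML strings published to this community

    def publish(self, xml_text):
        self.resources.append(xml_text)

    def search(self, field, term):
        """Return resources whose `field` element text contains `term`."""
        hits = []
        for xml_text in self.resources:
            root = ET.fromstring(xml_text)
            if any(term in (el.text or "") for el in root.iter(field)):
                hits.append(xml_text)
        return hits

    def to_xml(self):
        """Describe this community as a resource (tag names are assumed)."""
        return (f"<community><name>{self.name}</name>"
                f"<protocol>{self.protocol}</protocol></community>")

# The root community shares community descriptors as its files.
root_community = Community("community", "central-db")
root_community.publish(Community("mp3", "napster").to_xml())
root_community.publish(Community("designpatterns", "gnutella").to_xml())

# Discovering a community is an ordinary search in the root community.
found = root_community.search("name", "mp3")
```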
18Schema for Communities
[Schema fragment: ... name="name" type="xsd:string"/> ... name="protocol" type="protocolTypes"/>]
[Figure: The Root Community, a community XML file with the values root community, community.xsd, central-db, community.stylesheet]
19What is U-P2P?
- A framework that breathes life into these ideas
- Explicit metadata search and creation for every Community
- Creation of Community tuples
- (format, protocol, etc.)
- Discovery of Community tuples
20Design
21Technologies
- Java
- Tomcat Servlet Container
- JavaServer Pages (JSP) and Servlets
- XSLT (transforms), XPath (queries)
- Java components for XSLT, XPath (Xerces, Xalan)
- eXist XML Database
- Log4j (logging infrastructure), JUnit (unit
testing)
22Evaluation and Validation: Areas of Interest
- Publish and Search times as Community size increases
- Breaking down Publish and Search operations
- Community effect
- Multiple central servers
23Publish
24Search
25Community Effect
26Multiple Central Servers
27Publish with Multiple Servers
28Vs. Without Multiple Central Servers
29Contributions
- Standard Metadata Layer
- All communities include support for explicit metadata search and creation
- User-designed Communities
- Users can easily share new formats with full support for metadata
- Community for Communities
- Prevents fragmented, isolated communities by providing metadata about communities and a standard method for discovering them
- Performance and Scalability Gains
- Communities can improve performance and scalability vs. systems where resources are undifferentiated
30Future Work
- Performance improvements
- Protocol independence (adapters for Gnutella, Freenet, etc.)
- Community-aware Gnutella routing
- More Community parameters (security, authentication, etc.)
31Future Work continued
- Trust metrics (to differentiate between communities, metadata quality)
- Community evolution
- Inheritance and multiple inheritance for Communities
32U-P2P Publications
- A. Mukherjee, B. Esfandiari, N. Arthorne, "U-P2P: A Peer-to-peer System for Description and Discovery of Resource-sharing Communities," ICDCS Workshops 2002, pp. 701-705, July 2002.
- Neal Arthorne, Babak Esfandiari and Aloke Mukherjee, "U-P2P: A Peer-to-peer Framework for Universal Resource Sharing and Discovery," Proceedings of the Freenix track of Usenix 2003, pp. 29-38, June 2003.
- http://u-p2p.sourceforge.net
33Backup slides
34WebAdapter User Interaction Model
35Repository Design
36Repository Design: Resource IDs
37Repository Design: XML Database
- Requirements
- Flexibility to store wide variety of formats
- Handle powerful queries over all metadata
- XML Database better suited than RDBMS
- Difficult to map fields to rows and columns
- Chose eXist XML database
- Open source
- Written in Java
- Support for XMLDB API
38Network Adapter Design
- Abstract interface to Peer-to-peer Network
- Routing search requests, handling results, handling incoming search requests, etc.
- Only implemented Hybrid model (Napster model)
- All peers can act as client and/or server
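One way to picture the abstract network interface and its hybrid implementation is sketched below. All class and method names (`NetworkAdapter`, `HybridAdapter`, `CentralIndex`, `publish`, `search`) are assumptions for illustration; the slides do not list the actual interface.

```python
from abc import ABC, abstractmethod

class NetworkAdapter(ABC):
    """Abstract interface to a peer-to-peer network (method names assumed)."""

    @abstractmethod
    def publish(self, peer, resource_id, metadata): ...

    @abstractmethod
    def search(self, term): ...

class CentralIndex:
    """Stand-in for the central server of a hybrid (Napster-style) network."""

    def __init__(self):
        self.entries = []             # (peer, resource_id, metadata) tuples

class HybridAdapter(NetworkAdapter):
    """Metadata is indexed centrally; files stay on the publishing peers."""

    def __init__(self, index):
        self.index = index

    def publish(self, peer, resource_id, metadata):
        self.index.entries.append((peer, resource_id, metadata))

    def search(self, term):
        # Return (peer, resource_id) pairs so the caller can fetch the
        # resource directly from the peer that holds it.
        return [(peer, rid) for peer, rid, meta in self.index.entries
                if any(term in value for value in meta.values())]

index = CentralIndex()
adapter = HybridAdapter(index)
adapter.publish("peer-a", "r1", {"name": "singleton"})
results = adapter.search("single")
```

Keeping the rest of the system coded against the abstract class is what would let a Gnutella or Freenet adapter be substituted later without touching the callers.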
39Network Adapter Protocol
40Evaluation and Validation: Challenges
- Finding large XML collections
- Berkeley Drosophila Genome Project genome annotations
- Other sources: DBLP (CS papers), EDGAR (SEC filings), GeneOntology (gene-related concepts)
- Transforming DTDs to XML Schema (DTDXS package)
- Automation
- XML-RPC interface for publish and search
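To show how an XML-RPC interface can drive publish and search from a test script, here is a small sketch using Python's standard `xmlrpc` modules. The method names `publish` and `search`, their parameters, and the in-memory `repository` are assumptions, not the U-P2P interface. Requests are dispatched in-process through the dispatcher's underscore-prefixed `_marshaled_dispatch` method to avoid opening a socket; a real setup would serve them over HTTP, for example with `SimpleXMLRPCServer`.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

repository = []   # in-memory stand-in for the real repository

def publish(community, xml_text):
    """Store a resource and return its index (hypothetical signature)."""
    repository.append((community, xml_text))
    return len(repository) - 1

def search(community, term):
    """Return resources in a community containing the term (hypothetical)."""
    return [x for c, x in repository if c == community and term in x]

dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(publish)
dispatcher.register_function(search)

def call(method, *params):
    """Encode a request, dispatch it in-process, and decode the response."""
    request = xmlrpc.client.dumps(params, methodname=method)
    response = dispatcher._marshaled_dispatch(request.encode("utf-8"))
    return xmlrpc.client.loads(response)[0][0]

first_id = call("publish", "designpatterns", "<name>singleton</name>")
hits = call("search", "designpatterns", "singleton")
```

A script can loop over a large XML collection with `call("publish", ...)` and time each request, which is the kind of automation the evaluation needs.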
41Publish: Breakdown of Operations
42Publish: Client Timings
43Publish: Server Timings
44Network Adapter Protocol
45Search: Breakdown of Operations
46Search: Total vs. Server Timings