Title: Minutes of Edos meeting
1Minutes of Edos meeting
- January 2005, Geneva
- Serge Abiteboul
2Part 1
3Organization
- Presentation of participants
- Bryce, Deriaz, Pawlak (Geneva)
- Abiteboul, Vrdoljak (INRIA-Futurs)
- Lauriere, Pop, Warley (MandrakeSoft)
- Milo (Tel Aviv)
- Lavagno, Panto, de Simone (CSP Torino)
- Hodel (Zurich)
- Geneva and measurements: see document
- MandrakeSoft and details of the problem
4Some observations
- Most systems use mirrors and are moving to P2P
- Usually packaged with a bug-tracking system
- Cannot be separated from testing, consistency and other issues: where is the frontier?
- Variants of a distribution have an impact
- Mirrors work fine for stable releases
- BitTorrent works fine for snapshots such as betas
- Neither works well for many modules changing very frequently
- Limitations of BitTorrent
- Problem of too many files because of several versions
- Problem of getting a set of coherent files
- Different formats for packages and data integration
- Technological watch
- OceanStore
- New version of BitTorrent
5Decision
- Cooperative work: twiki
- Boris and Serge move the old stuff
- Install the slides of the presentations
6Second part
7Purpose
- Understand the problem
- Brainstorming
- Articulate the R&D issues
- Each group
- Isolate problems you would be interested in
working on
8Organization
- The problem
- Possible architectures
- Issues
- New functionalities
- Deliverables
- The issues (who works on what)
9The problem
10Problem definition
- A central authority distributes some data to a community of users
- The data consists of a large collection of objects
- The data keeps changing: versions
- Coherence must be maintained
- Replication is needed to improve access
- Centralized updates
- Data is versioned
- Many similar situations
- Web site maintenance (more emphasis on coherence)
- Distributed databases with replication
11Formal model (needs more work)
- Release, e.g. release EiffelTower
- (O1,K1), ..., (O55,34), ..., (On,Kn)
- The objects are organized in a hierarchy with object sharing (and views)
- Notion of coherent collections such as versions of a release
- Main issue is naming of objects and collections
- Updates (from unique source)
- Start a new release PisaTower 4 (Oi,Ki): a new version of the release, a new coherent snapshot (rather frequent)
- Add a new object version (Oi,Ki+1) in the current release
- Add a new object (NewObject,1) in the current release
- Remove an object
- Access (for all developers or users?)
- Metadata: What is the (current) release? What is in it?
- Data: get (Oi,Ki) or get the last Oi; get the set of needed packages; get generic sets of objects, typically a directory
- Get all the current release
- get EiffelTower.System5.Set3 ???
- Get Deltas
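A minimal sketch of the model above, assuming a release is simply a mapping from object name to version number. The class and function names (`Release`, `delta`) and the version number chosen for O1 are illustrative assumptions, not part of the minutes; hierarchy, sharing and views are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """A coherent snapshot: a mapping from object name to version number."""
    name: str
    objects: dict[str, int] = field(default_factory=dict)

    def add_object(self, obj: str) -> None:
        # "Add a new object (NewObject,1) in the current release"
        if obj in self.objects:
            raise ValueError(f"{obj} already exists")
        self.objects[obj] = 1

    def add_version(self, obj: str) -> None:
        # "Add a new object version": (Oi,Ki) becomes (Oi,Ki+1)
        self.objects[obj] += 1

    def remove_object(self, obj: str) -> None:
        del self.objects[obj]

    def start_new_release(self, name: str) -> "Release":
        # New coherent snapshot that starts from a copy of the current content
        return Release(name, dict(self.objects))


def delta(old: Release, new: Release) -> dict[str, set[str]]:
    """Names of objects added, removed or changed between two releases."""
    common = set(old.objects) & set(new.objects)
    return {
        "added": set(new.objects) - set(old.objects),
        "removed": set(old.objects) - set(new.objects),
        "changed": {o for o in common if old.objects[o] != new.objects[o]},
    }


# Version 3 for O1 is an arbitrary value chosen for the demo.
eiffel = Release("EiffelTower", {"O1": 3, "O55": 34})
pisa = eiffel.start_new_release("PisaTower")
pisa.add_version("O55")       # (O55,34) -> (O55,35)
pisa.add_object("NewObject")  # (NewObject,1)
pisa.remove_object("O1")
print(delta(eiffel, pisa))
```

Copying the mapping when a release starts keeps older releases immutable, which is one simple way to read "coherent snapshot"; a real system would more likely share unchanged objects between releases.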
12Other issues
- Security and access rights
- Subscriptions
- Constraints
- Degree of autonomy of partners
- Resources: number of servers, bandwidth, disk
13Refine the model and define the Edos/distribution API
- This is the next task to achieve?
- End of the year?
- Java or Web service based API?
- Not really a big issue for the moment
- XML vs. AXML possible
- UML?
14Possible architectures
15Alternatives
- Alternatives for package distribution
- Hierarchical mirrors, P2P, mixture of both
- Alternatives for indexing
- Centralized, replication, distributed via DHT
- Data vs. metadata
- In principle, could use different strategies
- What is expected from distributors
- Degree of autonomy
- Software necessary and support
- Push and pull
16One extreme: the current
- Hierarchical distribution
- Limited central index of DistribServ
- Index on each DistribServ
- Total independence
- Small number: 50
- All pull
- Publish its content
17Other extreme: all P2P
- P2P distribution
- Distributed index
- Less independence
- Possibly larger number
- Pull or push
18Issues
19Issues in the hierarchical architecture
- Distribution
- Improve the quality of the metadata Mandrakesoft has from mirrors (describe the mirrors' policy in Active XML: what is stored, automatic frequency of downloads)
- Freshness: push to mirrors and tools to select policy for servers
- More servers up to date, so better accessibility
- Access
- Improve a central index managed by Mandrake
- Which package sits where, with freshness and coherence of sites
- Keep more versions of packages at source
- Propagate to all the servers; tools do not allow for more customization
- More convenient tool to mirror
- Support for load balancing
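The "which package sits where, with freshness" idea could be prototyped along these lines. This is only a sketch under assumptions: the names `Copy` and `CentralIndex`, the mirror names, and the use of integer timestamps are all hypothetical, and the minutes do not say how mirrors would report their content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    mirror: str      # hypothetical mirror name
    version: int     # package version held by that mirror
    synced_at: int   # time of the mirror's last synchronization


class CentralIndex:
    """Central index: for each package, which mirror holds which version."""

    def __init__(self) -> None:
        self._copies: dict[str, list[Copy]] = {}

    def report(self, package: str, copy: Copy) -> None:
        """Record (or refresh) what a mirror says it holds."""
        copies = self._copies.setdefault(package, [])
        copies[:] = [c for c in copies if c.mirror != copy.mirror]
        copies.append(copy)

    def up_to_date(self, package: str, latest: int) -> list[str]:
        """Mirrors holding the latest version, most recently synced first."""
        fresh = [c for c in self._copies.get(package, []) if c.version == latest]
        fresh.sort(key=lambda c: c.synced_at, reverse=True)
        return [c.mirror for c in fresh]


index = CentralIndex()
index.report("kernel", Copy("mirror-a", version=7, synced_at=100))
index.report("kernel", Copy("mirror-b", version=8, synced_at=150))
index.report("kernel", Copy("mirror-c", version=8, synced_at=200))
index.report("kernel", Copy("mirror-a", version=8, synced_at=120))  # mirror-a catches up
print(index.up_to_date("kernel", latest=8))
```

The returned list could also feed a load-balancing choice, for instance by picking among the up-to-date mirrors at random instead of always taking the first.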
20Hierarchical - continued
- Delta could be improved
- Rsync is based on packages, and sometimes two packages with different names have almost the same content (in particular source and documentation)
- Issue is granularity: needs deltas on packages
- Minor gain
- Can we change something on the mirrors?
- Probably
- Improvements so that they have incentive to do it
- Do we want to follow this path?
- Fits actual needs, so it is worth continuing
- Improve it
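To illustrate the granularity point, here is a small sketch that looks for content shared between packages regardless of their names. It compares fixed-size, aligned blocks by hash, which is a deliberate simplification and not rsync's rolling-checksum algorithm; the function names and the tiny block size are assumptions made for the demo.

```python
import hashlib

BLOCK = 4  # tiny block size for the demo; real tools use much larger blocks


def block_hashes(data: bytes, size: int = BLOCK) -> list[str]:
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def missing_blocks(new: bytes, held: list[bytes], size: int = BLOCK) -> list[int]:
    """Indices of the blocks of `new` found in none of the packages already held."""
    have = {h for pkg in held for h in block_hashes(pkg, size)}
    return [i for i, h in enumerate(block_hashes(new, size)) if h not in have]


held = [b"AAAABBBBCCCC"]        # e.g. a source package already on the mirror
incoming = b"AAAAXXXXCCCCDDDD"  # differently named package with similar content
print(missing_blocks(incoming, held))  # only blocks 1 and 3 need to be transferred
```

Because the blocks are aligned, an insertion near the start of a file would shift every later block and defeat the match; that is the case rolling checksums are designed to handle.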
21Issues in P2P architecture
- Distribution/storage in P2P?
- Access/indexing in P2P?
22Issues in P2P - continued
- What is different from DHT?
- Update and versioning (see OceanStore)
- Does not manage evolving sets of files metadata
- Notion of view
- Different kinds of P2P
- DHT
- Tell your friend (Gnutella-like)
- Flooding
- What is stored on a peer
- Release? package? coherence?
- Centralized vs. distributed metadata/index
- How much replication?
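For the DHT option, the basic placement idea can be sketched with a consistent-hash ring, where the number of successors used gives the degree of replication. This is an assumption-laden toy: the key format, peer names and class name are hypothetical, and it covers neither updates nor versioning, which are the open questions above.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring: a key is stored on the peers that follow it."""

    def __init__(self, peers: list[str]) -> None:
        self._points = sorted((_h(p), p) for p in peers)

    def peers_for(self, key: str, replicas: int = 1) -> list[str]:
        hashes = [h for h, _ in self._points]
        start = bisect.bisect_right(hashes, _h(key))
        n = len(self._points)
        # Walk clockwise from the key's position, wrapping around the ring.
        return [self._points[(start + i) % n][1] for i in range(min(replicas, n))]


ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
# Hypothetical key naming: release/object/version
print(ring.peers_for("EiffelTower/O55/34", replicas=2))
```

Since the key includes the version, every new version lands on a different part of the ring, which is one way to see why evolving sets of files are awkward in a plain DHT.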
23Use of Active XML
- For modeling, e.g. who has/does what
- For managing information, both pull and push
- For obtaining and managing measures (bandwidth)
- Limitation: every peer must have an AXML server
- Reliability: to be improved
- Size: small for a software
- Environment: Web server
- XML is seen negatively by the community? Effort for describing RPM in XML
- One alternative we will follow, but not the only one
24Two issues
- Code quality in the consortium
- Expectation: prototype level
- Open source style: the open source community is welcome to take part in it
- Social issue
- Understand how the system operates (how the community operates) and so how it can be improved
25Meta data: what?
- Package
- URI, name and version number
- Architecture, format (source, binary, doc)
- Content and signature: package owner, Mandrake machine where the package was created
- Signature, date, summary, full description, size when installed
- Dependencies
- For the source package, the URI where they find the source
- License information (now static but could be virtual; little used)
- Virtual information: which releases it belongs to and in which environments it has been compiled
- Virtual information: bug status
- Change log between different versions: text ("I fixed this bug")
- Group field: editors, perl development; some categorization in plain text
- URL of the project
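As a starting point for discussion, the package fields listed above could be grouped into a record like the following. The field names, types and the example values are assumptions for illustration only; this is not the RPM header layout, and only part of the list is covered.

```python
from dataclasses import dataclass, field


@dataclass
class PackageMetadata:
    uri: str
    name: str
    version: str
    architecture: str        # e.g. "src", "sparc", "noarch"
    format: str              # "source", "binary" or "doc"
    summary: str = ""
    installed_size: int = 0  # size when installed, in bytes
    dependencies: list[str] = field(default_factory=list)
    license: str = ""
    group: str = ""          # plain-text categorization
    project_url: str = ""
    # "Virtual" information: computed elsewhere rather than stored in the package
    releases: list[str] = field(default_factory=list)
    bug_status: str = "unknown"

    def is_source(self) -> bool:
        return self.architecture == "src"


pkg = PackageMetadata(
    uri="http://example.org/foo-1.0-1.src.rpm",  # placeholder URI
    name="foo",
    version="1.0-1",
    architecture="src",
    format="source",
    dependencies=["libbar"],
    releases=["EiffelTower"],
)
print(pkg.is_source(), pkg.releases)
```

Keeping the virtual fields separate in the record makes it explicit which values a server would have to recompute or look up when a release changes.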
26Meta data - continued
- Other granularity such as version of a release
- Collection of all metadata of all the packages in it
- Information on servers
- What is available, where it comes from, when it was obtained
- Statistics of servers, trust, other servers, etc.
- No real delta
- Remark: local jargon
- Source: given (architecture src)
- Binary: something built from a source package (architecture sparc, or no-arch)
- Example 1
- Source: documentation in txt is the source, where to install it
- Binary: documentation when installed
- Example 2
- Source: XML, and binary: HTML
27Choice of architectures
- Limits of mirroring: scaling
- Limits of P2P: updating is more difficult
- Combine both: use the hierarchy of mirrors to reflect rapid changes and eventually flood them in the P2P system
- Difficult to compare the architectures
- Do some evaluation and measurements
- Measure and analyse the actual system
28Security
- Fake files
- How important is security?
- Signature from Mandrakesoft
- Security holes in the upload procedure and central server
- Should be detected when tested
- Not fully sure (e.g., Trojan horse)
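One small piece of the fake-file problem can be sketched as a digest check on the downloaded file. This assumes the expected digest comes from a trusted source (for example a manifest that is itself signed); verifying that signature is out of scope here, and the function name is hypothetical.

```python
import hashlib
import hmac


def matches_digest(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against a SHA-256 digest from a trusted source."""
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison of the two hexadecimal strings
    return hmac.compare_digest(actual, expected_hex.lower())


package = b"pretend this is an RPM"
published = hashlib.sha256(package).hexdigest()  # stands in for the published digest
print(matches_digest(package, published))
print(matches_digest(package + b"!", published))
```

Such a check only detects files altered after the digest was published; it does nothing against a Trojan horse introduced before the package was signed, which is the case the last bullet worries about.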
29New functionalities and motivations for convincing peers to use a new system
- Facilitate the management of distribution (easy-to-use interface) without losing control
- Improve access
- Subscriptions
- Incentive: better service if you are a good guy
- Issue of trust and cheaters
- sprobe.cs.washington.edu: measures peers' bandwidth
- Customizing versions
- Big company that wants to distribute its own version of Mandrake expanded with some software
- Also local developers
30New functionalities
- User-friendly catalog is put together by Mandrakesoft
- Incentive to interest the Mandrake community of users
- Client application to share RPM at the package level with coherence
- Subscription mechanism: notification
- May be useful in test phase
- UDDI-like repository
- "I want a software from gif to jpeg": out of scope but fun
31Research Issues
- Modeling of the problem and declarative specification of a peer policy (with AXML)
- Optimization of distribution
- Flooding of a new release/package update
- Optimization of access and download
- Degree of replication and selection of the copy
- Replication in a P2P/DHT: evaluation and control
- Customizing and other new functionalities
- Monitoring (for Mandrake and users)
- Deltas and incremental
32The future
33Deliverables
- 1st deliverable: docMetrics
- docStateArt: review of the state of the art
- (Turn it into a published paper?)
- Next pass at it: Geneva (short), then INRIA, which splits it in 2 and places them on the twiki - EMAIL
- Others may include their contribution in it
34Precise description of the distribution process: docModel
- Glossary of the domain
- Functional aspect of the process
- Define API
- Must contain features not present (extensible)
- Frontiers: at least distribution; not forbidden to think of more
- Syntax
- Java, Web services, UML, RDF, BPEL
- None of the above
35Who is interested in what?
- Mandrakesoft: improvements of current architecture, metrics (tests and measures), API, state of the art
- Geneva: measures, state of the art, API (short), security
- Zurich: distributed databases
- Torino: API, security, students as testers
- INRIA: measures, state of the art, API, P2P architecture, security?
- Tel Aviv: architecture, state of the art, load balancing, metadata
- Use Nuxeo as sanity test because they are also interested in the distribution process: ask them
36Task leaders
- Measurements: Ciaran
- State of the art: Radu and Tova
- API: Serge
- Security?: Torino
- Last week of April: phone meeting
- Start a blog for each group: very short descriptions of what is going on
37The end
- Thanks to Geneva for the organization