Minutes of Edos meeting

Provided by: edospr

1
Minutes of Edos meeting
  • January 2005, Geneva
  • Serge Abiteboul

2
Part 1
3
Organization
  • Presentation of participants
  • Bryce, Deriaz, Pawlak (Geneva)
  • Abiteboul, Vrdoljak (INRIA-Futurs)
  • Lauriere, Pop, Warley (MandrakeSoft)
  • Milo (Tel Aviv)
  • Lavagno, Panto, de Simone (CSP Torino)
  • Hodel (Zurich)
  • Geneva and measurements: see document
  • MandrakeSoft and details of the problem

4
Some observations
  • Most systems use mirrors; some are moving to P2P
  • Usually packaged with a bug-tracking system
  • Cannot be separated from testing, consistency,
    and other issues: where is the frontier?
  • Variants of distribution have impact
  • Mirror works fine for stable
  • BitTorrent works fine for snapshots such as betas
  • Neither works well for many modules changing
    very frequently
  • Limitations of BitTorrent
  • Problem of too many files because of several
    versions
  • Problem of getting a set of coherent files
  • Different formats for packages and data
    integration
  • Technological watch
  • OceanStore
  • New version of BitTorrent

5
Decision
  • Cooperative work: twiki
  • Boris and Serge move the old stuff
  • Install the slides of the presentations

6
Second part
  • Brainstorming

7
Purpose
  • Understand the problem
  • Brainstorming
  • Articulate the R&D issues
  • Each group
  • Isolate problems you would be interested in
    working on

8
Organization
  • The problem
  • Possible architectures
  • Issues
  • New functionalities
  • Deliverables
  • The issues (who works on what)

9
The problem
10
Problem definition
  • A central authority distributes some data to a
    community of users
  • The data consists of a large collection of
    objects
  • The data keeps changing versions
  • Coherence must be maintained
  • Replication is needed to improve access
  • Centralized updates
  • Data is versioned
  • Many similar situations
  • Web site maintenance (more emphasis on coherence)
  • Distributed databases with replication

11
Formal model (needs more work)
  • Release, e.g. release EiffelTower
  • (O1,K1) ... (O55,34) ... (On,Kn)
  • The objects are organized in a hierarchy with
    object sharing (and views)
  • Notion of coherent collections such as versions
    of a release
  • Main issue is the naming of objects and
    collections
  • Updates (from a unique source)
  • Start a new release PisaTower, a set of (Oi,Ki):
    a new version of the release, a new coherent
    snapshot (rather frequent)
  • Add a new object version (Oi,Ki+1) in the current
    release
  • Add a new object (NewObject,1) in the current
    release
  • Remove an object
  • Access (for all developers or users?)
  • Metadata: What is the (current) release? What is
    in it?
  • Data: get (Oi,Ki) or get the latest Oi; get the
    set of needed packages; get generic sets of
    objects, typically a directory
  • Get all of the current release
  • Get EiffelTower.System5.Set3 ???
  • Get deltas
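
The model above can be illustrated with a small Python sketch. This is not part of the minutes: the class and method names (Repository, start_release, add_version, delta) are hypothetical, and it assumes a release is simply a named mapping from object name to version number, with a new release starting as a snapshot of the current one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """Toy model: a release is a named, coherent set of (object, version) pairs."""
    releases: dict[str, dict[str, int]] = field(default_factory=dict)
    current: Optional[str] = None

    def start_release(self, name: str) -> None:
        # Assumption: a new release begins as a copy of the current one.
        base = self.releases[self.current] if self.current else {}
        self.releases[name] = dict(base)
        self.current = name

    def add_version(self, obj: str) -> int:
        # (Oi,Ki) becomes (Oi,Ki+1); an unknown object starts at version 1.
        rel = self.releases[self.current]
        rel[obj] = rel.get(obj, 0) + 1
        return rel[obj]

    def remove(self, obj: str) -> None:
        del self.releases[self.current][obj]

    def contents(self, release: Optional[str] = None) -> dict[str, int]:
        # Metadata query: what is in a release (the current one by default)?
        return dict(self.releases[release or self.current])

    def delta(self, old: str, new: str) -> dict[str, list[str]]:
        # Object-level delta between two releases.
        a, b = self.releases[old], self.releases[new]
        return {
            "added": sorted(o for o in b if o not in a),
            "removed": sorted(o for o in a if o not in b),
            "changed": sorted(o for o in b if o in a and a[o] != b[o]),
        }


repo = Repository()
repo.start_release("EiffelTower")
repo.add_version("O1")
repo.add_version("O2")
repo.start_release("PisaTower")   # snapshot of EiffelTower
repo.add_version("O1")            # (O1,1) -> (O1,2)
repo.add_version("NewObject")     # (NewObject,1)
repo.remove("O2")
changes = repo.delta("EiffelTower", "PisaTower")
```

Here the delta only names the objects that differ between two releases; deltas on the content of an object would need a finer granularity than this sketch models.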

12
Other issues
  • Security and access rights
  • Subscriptions
  • Constraints
  • Degree of autonomy of partners
  • Resources: number of servers, bandwidth, disk

13
Refine the model and define the Edos/distribution
API
  • This is the next task to achieve?
  • End of the year?
  • Java or Web service based API?
  • Not really a big issue for the moment
  • XML vs. AXML possible
  • UML?

14
Possible architectures
15
Alternatives
  • Alternatives for package distribution
  • Hierarchical mirrors, P2P, mixture of both
  • Alternatives for indexing
  • Centralized, replication, distributed via DHT
  • Data vs. metadata
  • In principle, could use different strategies
  • What is expected from distributors
  • Degree of autonomy
  • software necessary and support
  • Push and pull

16
One extreme: the current
  • Hierarchical distribution
  • Limited central index of DistribServ
  • Index on each DistribServ
  • Total independence
  • Small number: 50
  • All pull
  • Publish its content

17
Other extreme: all P2P
  • P2P distribution
  • Distributed index
  • Less independence
  • Possibly larger number
  • Pull or push

18
Issues
19
Issues in the hierarchical architecture
  • Distribution
  • Improve the quality of the meta data Mandrakesoft
    has from mirrors (describe the mirrors' policy in
    Active XML: what is stored, automatic frequency
    of downloads)
  • Freshness: push to mirrors, and tools to select
    a policy for servers
  • More servers up to date, so better accessibility
  • Access
  • Improve a central index managed by Mandrake
  • Which package sits where, with freshness and
    coherence of sites
  • Keep more versions of packages at the source
  • Propagate to all the servers: tools do not allow
    for more customization
  • More convenient tool to mirror
  • Support for load balancing
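
The "which package sits where, with freshness" index can be pictured with the following Python sketch. It is only an illustration: MirrorState, mirrors_with and coherent_mirrors are hypothetical names, the mirror names and timestamps are made up, and it assumes each mirror reports the package versions it holds together with the time of its last synchronization.

```python
from dataclasses import dataclass


@dataclass
class MirrorState:
    packages: dict[str, int]   # package name -> version held by the mirror
    last_sync: float           # time of the last synchronization (epoch seconds)


def mirrors_with(index: dict[str, MirrorState], package: str, version: int) -> list[str]:
    """Mirrors holding the requested package version, freshest first."""
    hits = [m for m, s in index.items() if s.packages.get(package) == version]
    return sorted(hits, key=lambda m: index[m].last_sync, reverse=True)


def coherent_mirrors(index: dict[str, MirrorState], release: dict[str, int]) -> list[str]:
    """Mirrors that hold every (package, version) pair of the release."""
    return sorted(
        m for m, s in index.items()
        if all(s.packages.get(p) == v for p, v in release.items())
    )


# Hypothetical central index covering three mirrors.
index = {
    "mirror-a": MirrorState({"bash": 3, "perl": 7}, last_sync=1000.0),
    "mirror-b": MirrorState({"bash": 3, "perl": 6}, last_sync=2000.0),
    "mirror-c": MirrorState({"bash": 2}, last_sync=3000.0),
}
release = {"bash": 3, "perl": 7}

bash_sources = mirrors_with(index, "bash", 3)
coherent = coherent_mirrors(index, release)
```

In this example mirror-b is fresher than mirror-a but holds an older perl, so only mirror-a is coherent with the release. This is the distinction between freshness and coherence of sites raised above.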

20
Hierarchical - continued
  • Delta could be improved
  • Rsync is based on packages, and sometimes two
    packages with different names have almost the
    same content (in particular source and
    documentation)
  • Issue is granularity: needs deltas on packages
  • Minor gain
  • Can we change something on the mirrors?
  • Probably
  • Improvements so that they have incentive to do it
  • Do we want to follow this path?
  • Fits current needs, so it is worth continuing
  • Improve it

21
Issues in P2P architecture
  • Distribution/storage in P2P?
  • Access/indexing in P2P?

22
Issues in P2P - continued
  • What is different from DHT?
  • Update and versioning (see OceanStore)
  • Does not manage evolving sets of files, metadata
  • Notion of view
  • Different kinds of P2P
  • DHT
  • Tell your friend (Gnutella-like)
  • Flooding
  • What is stored on a peer
  • Release? package? coherence?
  • Centralized vs. distributed metadata/index
  • How much replication?
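
As one way to picture what a DHT decides about "what is stored on a peer" and "how much replication", here is a minimal consistent-hashing sketch in Python. It is an assumption made for illustration, not a design from the meeting: the Ring class, the peer names and the key format are hypothetical, and real DHTs add routing, churn handling and versioning on top of this placement rule.

```python
import bisect
import hashlib


def _h(s: str) -> int:
    # Map a string to a point on the ring (SHA-1 is used only for placement).
    return int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hashing ring: a key lives on the next peers clockwise."""

    def __init__(self, peers):
        self._points = sorted((_h(p), p) for p in peers)
        self._hashes = [h for h, _ in self._points]

    def holders(self, key: str, replicas: int = 1) -> list[str]:
        # First peer at or after the key's point, then its successors.
        n = len(self._points)
        start = bisect.bisect_right(self._hashes, _h(key)) % n
        return [self._points[(start + i) % n][1] for i in range(min(replicas, n))]


peers = ["peer-1", "peer-2", "peer-3", "peer-4"]
ring = Ring(peers)
key = "EiffelTower/bash-3"            # hypothetical key: release plus package
copies = ring.holders(key, replicas=2)

# Removing a peer that does not hold the key leaves its placement unchanged.
other = next(p for p in peers if p not in copies)
smaller = Ring([p for p in peers if p != other])
```

The replicas parameter is the "how much replication?" knob. The sketch says nothing about updates, which is where the minutes note that a plain DHT falls short.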

23
Use of Active XML
  • For modeling, e.g. who has/does what
  • For managing information, both pull and push
  • For obtaining and managing measures (bandwidth)
  • Limitation: every peer must have an AXML server
  • Reliability: to be improved
  • Size: small for a piece of software
  • Environment: Web server
  • XML is seen negatively by the community? Effort
    for describing RPM in XML
  • One alternative we will follow, but not the only
    one

24
2 issues
  • Code quality in the consortium
  • Expectation: prototype level
  • Open source style: the open source community is
    welcome to take part in it
  • Social issue
  • Understand how the system operates (how the
    community operates) and so how it can be improved

25
Meta data: what?
  • Package
  • URI, name and version number
  • Architecture, format (source, binary, doc)
  • Content and signature; package owner; Mandrake
    machine where the package was created
  • Signature, date, summary, full description, size
    when installed
  • Dependencies
  • For the source package, the URI where they find
    the source
  • License information (now static but could be
    virtual; little used)
  • Virtual information: which releases it belongs
    to and in which environments it has been compiled
  • Virtual information: bug status
  • Change log between different versions: text ("I
    fixed this bug")
  • Group field: editors, perl development; some
    categorization in plain text
  • URL of the project

26
Meta data - continued
  • Other granularity, such as version of a release
  • Collection of all meta-data of all the packages
    in it
  • Information on servers
  • What is available, where it comes from, when it
    was obtained
  • Statistics of servers, trust, other servers, etc.
  • No real delta
  • Remark: local jargon
  • Source: given (architecture src)
  • Binary: something built from a source package
    (architecture sparc, or no-arch)
  • Example 1
  • Source: documentation in txt is the source where
    to install it
  • Binary: documentation when installed
  • Example 2
  • Source: XML, and binary: HTML

27
Choice of architectures
  • Limits of mirroring: scaling
  • Limits of P2P: updating is more difficult
  • Combine both: use the hierarchy of mirrors to
    reflect rapid changes and eventually flood them
    in the P2P system
  • Difficult to compare the architectures
  • Do some evaluation and measurements
  • Measure and analyse the current system

28
Security
  • Fake files
  • How important is security?
  • Signature from Mandrakesoft
  • Security holes in the up-load procedure and
    central server
  • Should be detected when tested
  • Not fully sure (e.g., Trojan horse)

29
New functionalities and motivations for
convincing peers to use a new system
  • Facilitate the management of distribution (easy
    to use interface) without losing control
  • Improve access
  • Subscriptions
  • Incentive: better service if you are a good guy
  • Issue of trust and cheaters
  • sprobe.cs.washington.edu: measures peers'
    bandwidth
  • Customizing versions
  • Big company that wants to distribute its own
    version of Mandrake extended with some software
  • Also local developers

30
New functionalities
  • User-friendly catalog is put together by
    Mandrakesoft
  • Incentive to interest the Mandrake community of
    users
  • Client application to share RPM at the package
    level with coherence
  • Subscription mechanism: notification
  • May be useful in test phase
  • UDDI-like repository
  • "I want software to convert gif to jpeg": out of
    scope but fun

31
Research Issues
  • Modeling of the problem and declarative
    specification of a peer policy (with AXML)
  • Optimization of distribution
  • Flooding of a new release/package update
  • Optimization of access and download
  • Degree of replication and selection of the copy
  • Replication in a P2P/DHT: evaluation and control
  • Customizing and other new functionalities
  • Monitoring (for Mandrake and users)
  • Deltas and incremental

32
The future
33
Deliverables
  • 1st deliverable: docMetrics
  • docStateArt: review of the state of the art
  • (Turn it into a published paper?)
  • Next pass at it: Geneva (short), then INRIA, which
    splits it in 2 and places them on the twiki -
    EMAIL
  • Others may include their contribution in it

34
Precise description of the distribution process:
docModel
  • Glossary of the domain
  • Functional aspect of the process
  • Define API
  • Must contain features not present (Extensible)
  • Frontiers: at least distribution; not forbidden
    to think of more
  • Syntax
  • Java, Web services, UML, RDF, BPEL
  • None of the above

35
Who is interested in what?
  • Mandrakesoft: improvements of current
    architecture, metrics (tests and measures), API,
    state of the art
  • Geneva: measures, state of the art, API (short),
    security
  • Zurich: distributed databases
  • Torino: API, security, students as testers
  • INRIA: measures, state of the art, API, P2P
    architecture, security?
  • Tel Aviv: architecture, state of the art, load
    balancing, meta data
  • Use Nuxeo as sanity test because they are also
    interested in the distribution process: ask them

36
Task leaders
  • Measurements: Ciaran
  • State of the art: Radu and Tova
  • API: Serge
  • Security? Torino
  • Last week of April: phone meeting
  • Start a blog for each group: very short
    descriptions of what is going on

37
The end
  • Thanks to Geneva for the organization