Title: The Implementation and Architecture Working Group


1
The Implementation and Architecture Working Group
  • ISDA Workshop, March 1998

2
  • How is the library implemented in the case
    studies?
  • A variety of implementations: ODBMS, RDBMS, and
    binary files with an index.
  • How can scientific data archives be stored?
    Compare and contrast simple files, relational
    databases, and object databases.
  • Many possibilities: the solution is domain
    specific, and often a tradeoff between efficiency
    and functionality. Databases certainly have the
    advantage that they offer very good querying
    functionality. ODBMS practitioners are largely
    convinced that object databases are superior to
    relational databases for scientific data.
  • What are the scalability requirements of
    scientific databases at the petabyte level? In
    particular, how does one design a fully scalable
    digital library?
  • Complicated: federated, distributed, 64-bit, SMP
    support, network, replicated. The whole system
    has to be considered, not just the database.
  • What are the scalability plans for the case
    studies?
  • There is a general need for middleware that
    allows connection of new (heterogeneous)
    repositories and integrates the data by offering
    services. We noted that, apart from software
    scalability, the hardware itself is not yet a
    solved problem, even with mega-bucks.
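
As an illustration of the middleware idea in the last answer above, here is a minimal Python sketch, not taken from the workshop or from any of the case studies. It assumes a hypothetical common `Repository` interface; the class names, the `search` method, and the in-memory stand-ins for a file index and a relational table are all invented for the example.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Hypothetical common interface that every back end must implement."""

    @abstractmethod
    def search(self, keyword):
        """Return a list of result dicts for the given keyword."""


class FileRepository(Repository):
    """Stands in for binary files with an index (here, an in-memory dict)."""

    def __init__(self, index):
        self.index = index  # maps keyword -> file name

    def search(self, keyword):
        if keyword in self.index:
            return [{"source": "files", "item": self.index[keyword]}]
        return []


class TableRepository(Repository):
    """Stands in for a relational store (here, a list of row tuples)."""

    def __init__(self, rows):
        self.rows = rows  # each row is (keyword, record id)

    def search(self, keyword):
        return [{"source": "table", "item": record_id}
                for key, record_id in self.rows if key == keyword]


class Middleware:
    """Offers one search service over all registered repositories."""

    def __init__(self):
        self.repositories = []

    def register(self, repo):
        # New repositories can be connected without changing client code.
        self.repositories.append(repo)

    def search(self, keyword):
        results = []
        for repo in self.repositories:
            results.extend(repo.search(keyword))
        return results
```

A client would call `Middleware.search` only, so connecting a further repository type means writing one more adapter class and registering it. Scalability, replication, and network concerns are deliberately left out of this sketch.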

3
  • What are the special services required to support
    scientific data, e.g. visualization, comparison,
    pattern matching, cataloguing?
  • There are many: metadata processing, browsing,
    mining, agent-based learning, quality evaluation,
    etc. All services should be components,
    distributable, and well-documented.
  • What is the perceived and real impact of
    component technologies in scientific data
    archives?
  • We saw components as analogous to hardware
    plug-and-play (PnP) devices. The use of this
    software technology is not yet well understood in
    the science field, with the notable exception of
    Java. But we are confident we need software
    components!
  • What is the impact of distributed object
    technologies such as CORBA, Java RMI, and
    distributed object databases?
  • We were unable to agree on a good answer to this
    question. One of us noted that CORBA had
    destroyed more projects than it had saved. We
    were unanimous in being optimistic about
    distributed Java.

4
  • How can we implement authentication, security,
    accounting, encryption, billing policies?
  • Do we want to? If it is required, then use one of
    the various methods such as SSL, PGP, etc. All
    implementations of these features should be
    available as components.
  • What kinds of private storage will be available
    at the library facility and at the client's
    office, and to what kinds of users? How, when and
    where do we cache results?
  • We used a real library as an analogy: public,
    departmental, or domain-specific. There is a need
    to have the equivalent of a library desk, and the
    equivalent of a notepad that one takes out of
    the library. Putting new material into the
    library is a special topic that should address
    how to make private caches available to others.
  • What are the most significant maintenance tasks
    in a scientific digital library? What tasks
    require what levels of professional skill?
  • Similar tasks to those of a real librarian:
    content deletion, archival, backup, restore, and
    versioning; the long-term viability of the media;
    checking out new data for insertion; software and
    hardware upgrades; reacting to feedback from
    users; and preparing manifests of the data
    provenance and quality.
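
The manifest task mentioned in the last answer above could look roughly like the following Python sketch. This is an assumption-laden illustration and not a format proposed at the workshop: the field names (`name`, `size_bytes`, `sha256`, `source`) and both helper functions are hypothetical. It only uses the standard-library `hashlib` module.

```python
import hashlib


def manifest_entry(name, data, source):
    """Build one manifest record for a data item held as bytes.

    The field names are hypothetical; a real archive would define its own.
    """
    return {
        "name": name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # content fingerprint
        "source": source,  # free-text provenance note
    }


def verify(entry, data):
    """Check that data still matches the size and checksum in the entry."""
    return (len(data) == entry["size_bytes"]
            and hashlib.sha256(data).hexdigest() == entry["sha256"])
```

Re-running `verify` after a backup, restore, or media migration gives a simple check that the stored bytes have not changed. Recording quality judgements would need additional fields that this sketch does not attempt to define.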

5
  • How do we estimate and minimise the wait time for
    a user query on the library?
  • The user must be given feedback when using the
    library: initial estimates, continually updated
    estimates of completion, and progress times.
    Trivial queries must take trivial time to
    process. Prioritising queries is a complex
    issue. The data can be ordered to optimise query
    times (striping, etc.).
  • How can user-written applications use scientific
    data collections, etc.?
  • There should be published APIs and components.
    Restrictions sometimes have to be imposed on
    access to the library. Autonomous agents that can
    search offline on behalf of the user. The latter
    part of the question is a Grand Challenge, as it
    implies a level of autonomous agent
    sophistication that is not yet possible.
  • Information-Based Computing?
  • The Information Boom will occur as new large
    scientific data repositories come online in the
    next few years. The major challenge will be to
    integrate them successfully, lest we become
    swamped in a morass of data. It is important to
    remember that it is the creativity of the human
    being that must be allowed full sway when we
    design our systems.
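
Returning to the first question on this slide (estimating and minimising wait time), the two ideas of continual completion estimates and of letting trivial queries finish quickly can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a method from the workshop: it assumes each query comes with a cost estimate and that progress is linear, and both function names are hypothetical.

```python
import heapq


def estimate_remaining(done, total, elapsed_s):
    """Estimate seconds left, assuming work proceeds at a constant rate.

    done and total are counts of work units (for example, data blocks).
    """
    if done <= 0:
        raise ValueError("no progress yet, cannot estimate")
    return elapsed_s * (total - done) / done


def run_order(queries):
    """Order queries cheapest first (shortest-job-first scheduling).

    queries is a list of (name, estimated_cost) pairs; the cost estimates
    are assumed to be supplied by some other part of the system.
    """
    # The position i breaks ties so equal-cost queries keep arrival order.
    heap = [(cost, i, name) for i, (name, cost) in enumerate(queries)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

For example, a query that has processed 25 of 100 blocks in 10 seconds would be reported as having about 30 seconds left. Always running the cheapest query first keeps trivial queries fast, but it can starve expensive ones, which is one reason prioritisation remains a complex issue.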