Running a Data Centre on the Long Term - PowerPoint PPT Presentation

Running a Data Centre on the Long Term, François Ochsenbein (Françoise Genova), Centre de Données astronomiques de Strasbourg (CDS)


Transcript and Presenter's Notes


1
Running a Data Centre on the Long Term
  • François Ochsenbein
  • (Françoise Genova)
  • Centre de Données astronomiques de Strasbourg
    (CDS)

2
Lessons learnt from more than 30 years of CDS
history
  • Introduction
  • CDS Organisation
  • CDS Activities
  • Dealing with technical evolution
  • Relation with the Astronomical Community
  • Conclusions

3
A bit of history...
  • Created in 1972 by the (French) Institut National
    d'Astronomie et de Géophysique (INAG), now
    Institut National des Sciences de l'Univers
    (INSU): 33 years ago!
  • Originally named CDS for Centre de Données Stellaires,
    later renamed Centre de Données astronomiques de
    Strasbourg after extending the CDS scope to
    non-stellar objects

4
CDS Charter
  • collect useful data on astronomical objects, in
    electronic form
  • improve them by critical evaluation, comparison
    and combination
  • distribute the results to the international
    astronomical community
  • conduct research using these data -- at the time
    of CDS creation, the original objective was to
    gather stellar data to study the galactic
    structure.
  • → provide science tools to the community

5
CDS Organisation
  • collect → Catalog Service, VizieR, data
    for Simbad
  • homogenize → Simbad, catalog metadata
  • distribute → CDS Services
  • preserve → keep original versions
  • extended to images (Aladin), metadata
    (nomenclature), links, registries,...

6
SIMBAD Story
  • A database running over 3 decades
  • 1971-1981: CSI (Catalog of Stellar
    Identifications) and BSI (Bibliographical Star
    Index)
  • 1981-1990: SIMBAD was born (Set of
    Identifications, Measurements and Bibliography
    for Astronomical Data) and evolved over different
    mainframe architectures
  • 1990-2005: SIMBAD3 on workstations, using
    object-oriented concepts
  • 2006-...: SIMBAD4, based on Java technology and
    open source databases

7
Simbad Evolution...
  • 1976: 350,000 stars, IBM360/65, PL1/Assembler
  • 1988: 700,000 objects, Univac 1110, Real-Time Update
  • 1998: 2,500,000 objects, Sun workstation, C,
    Object-Oriented

8
Simbad data on the long term
  • From 300,000 stars to 3.5 million astronomical
    objects (stars, galaxies, nebulae, ...)
  • Regular editions of the full database as ASCII
    files, stored:
  • on paper (1972-1977)
  • on microfiches (1977-1985)
  • as disk files (1990-...), from 1/year to 1/month
  • → gives back details on the database evolution
  • all modifications archived
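
A minimal sketch of how two archived ASCII editions could be compared to recover the database evolution. The record layout (one object per line, identifier before the first "|") and the function name `diff_editions` are assumptions made for illustration; the actual format of the Simbad editions is not described in the slides.

```python
def diff_editions(old_lines, new_lines):
    """Compare two editions given as iterables of text lines.

    Assumed (hypothetical) layout: "identifier|rest of record".
    Returns identifiers that were added, removed, or changed.
    """
    def index(lines):
        records = {}
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            ident, _, rest = line.partition("|")
            records[ident.strip()] = rest.strip()
        return records

    old, new = index(old_lines), index(new_lines)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    # Made-up records, for illustration only.
    edition_a = ["HD 1|A0", "HD 2|G2"]
    edition_b = ["HD 2|G5", "HD 3|K0"]
    print(diff_editions(edition_a, edition_b))
```

Keeping every edition, rather than only the latest one, is what makes this kind of after-the-fact comparison possible.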

9
CDS Catalog Service and
  • Catalog service (storage and distribution on
    magnetic tapes) since CDS creation in 1972
  • FTP access to CDS collection since 1991
  • Reorganisation of the catalog descriptions
    (metadata) since 1992
  • VizieR (catalogs organised as relational
    database) from 1996

10
Metadata Evolutions
(Figure: catalog metadata in 1981 and in 2005)

11
Catalog preservation
  • Keep records on modifications
  • on paper until 1991
  • on electronic files since 1991
  • original file preserved as much as possible
  • Mirror copies (x4 for data files, x8 for database)

12
Aladin Sky Atlas
  • Aladin Server started around 1994 with digitized
    Schmidt plates used for the Guide Star Catalog,
    organized as a database
  • Aladin portal started around 1997 as our first
    Java test and became a widely used Virtual
    Observatory portal

13
From 1997 to 2005
  • Originally: visualize catalog data and images at
    CDS
  • Today: a widely used VO portal

14
CDS organisation
  • The CDS team integrates staff with different
    profiles: astronomers, computer engineers,
    specialized librarians (mainly permanent
    positions)
  • Scientific objectives, strategy and work program
    are discussed in the group chaired by the CDS
    director
  • Since its creation, the CDS activities have been
    examined by a Scientific Council (6F6f)

15
The CDS Scientific Council
17
Partnership
  • Historically, CDS was associated with
    participating institutes:
  • Lausanne/Genève (photometry)
  • Astronomisches Rechen-Institut, Heidelberg
    (astrometry)
  • Paris-Meudon Observatory (bibliography)
  • Marseille Observatory (radial velocities)
  • ... and Strasbourg Observatory (spectral
    classification)

18
Partnership (continued)
  • Participation in projects related to missions,
    e.g. Hipparcos, XMM
  • Participation in the networking of services
    (observatory archives, journals, ADS, ...)
  • Participation in the Virtual Observatory
    enterprise (AVO, IVOA, VOTECH): brings
    interoperability skills, enables rapid
    prototyping
  • Involvement with the CDS user community

19
Partnership with publishers
  • The data published in the specialized literature
    can't be re-used:
  • PS/PDF files are not re-usable
  • interest in the data used ends with the
    publication
  • convince the astronomical journal editors (A&A,
    AJ, ApJ) to store the tables in a reusable form
    (standardized description)
  • → improves the data quality and reliability

20
User Support
  • Communication with users (astronomers) is a
    privileged way:
  • to get feedback on the data service (good and
    bad)
  • to be aware of the astronomers' wishes for an
    efficient research work
  • to get help e.g. clean up some datasets
  • to motivate the CDS team
  • Demos to users (AAS, ...), constant discussions
    with other service developers (ADASS)

21
User Support (continued)
  • Hotline: question_at_simbad.u-strasbg.fr
  • in existence for 15 years
  • still our major source of error reports and
    feedback
  • all astronomers and most engineers take part in
    this duty (even the Director...)
  • most questions now come from non-astronomers

22
Dealing with Technical Evolutions
  • Example of the Simbad history
  • from mainframes managed by computing centers to
    clusters of PCs
  • changes of languages: PL1/Assembler, C,
    Perl, Java...
  • from batch queries and updates on punched cards,
    to interactive terminals, real-time updating,
    graphical interfaces, WWW, Java...
  • while keeping the scientific quality
  • → database lifetime: 10 to 15 years

23
Taking advantage of Electronic Era
  • Take advantage of the existence of electronic
    data
  • using the electronic versions of the journals to
    feed Simbad (ToCs)
  • using the standards to perform extensive
    verifications of the database contents
  • improved data quality and reliability
  • (but some surprises from time to time...)

24
Combine the technological evolutions
  • Example of Aladin
  • 1997: first exercise of Java implementation at
    CDS
  • make use of the open access to data servers (URLs
    and HTTP)
  • make use of a registry (GLU) since its
    beginning
  • implementation of VO standards: XML, VOTable,
    SIA, ConeSearch, Skynode
  • connectivity with other applications: VOPlot,
    SpecView, VOSpec
  • API access (scripting mode, ExtApp interface)
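
To illustrate the "URLs and HTTP" and ConeSearch points above: under the IVOA Simple Cone Search convention, a query is a plain HTTP GET whose URL carries RA, DEC and SR (search radius), all in decimal degrees. The sketch below only builds such a URL and does not contact any server. The function name `cone_search_url` and the `example.org` address are placeholders, not a real CDS or Aladin endpoint.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search request URL.

    RA, DEC and SR are the standard cone search parameters,
    expressed in decimal degrees. base_url is a placeholder here.
    """
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be within [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be within [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("SR must not be negative")

    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Pick the separator depending on how the base URL already ends.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return f"{base_url}{separator}{query}"


if __name__ == "__main__":
    # Hypothetical service address, for illustration only.
    print(cone_search_url("http://example.org/scs", 10.5, -20.25, 0.1))
```

A compliant service answers such a request with a VOTable document, which is what lets a client like Aladin overlay results from servers it knows nothing else about.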

25
Technological watch
  • New technologies are taken into account
  • not too late: new technologies open new
    functionalities, and accessibility by recent
    technologies is a requirement
  • ...but not too early: this requires reliability,
    maintainability ... and time to implement!
  • Keep compatibility with older material
  • do not require the users to change their
    hard/software every 6 months
  • example of Aladin: still a significant fraction
    of users with Java 1.1

26
Methodological watch
  • New methodologies are tested
  • software architecture: for example, object
    orientation has been used for 15 years (Simbad)
  • contents and metadata: ontologies prototyped as
    UCDs (Unified Content Descriptors) brought
    extensive coherence-checking methods
  • Pragmatic, bottom-up approach

27
Relations with the Scientific Community
  • At the beginning: essentially personal contacts
    of the CDS director and his staff with their
    peers
  • need to convince the astronomers to provide their
    data for wide distribution, through scientific
    collaborations
  • Bulletin d'Information du CDS (1971-1998)
  • scientific results
  • orientations discussed in the CDS Council
    meetings
  • news / controversies about astronomical data

28
Relations with the Scientific Community
(continued)
  • Deep Impact of the Web
  • new non-astronomer users
  • necessity of improving the documentation
  • much larger usage → improved reliability
  • CDS role has changed
  • now asked by scientists to include their data
    among datasets available from CDS and mirrors
  • databases now play a fundamental role, quick data
    ingestion is now a requirement

29
Conclusions
  • Evolution from a service offered to a small
    community to major reference services
  • precursor/actor of the International Virtual
    Observatory
  • Importance of the quality, lifetime, motivation,
    diversity of the staff, mixture of scientists,
    computer scientists, and librarians
  • Constant search for networking / partnership

30
Co-authors
  • Françoise Genova
  • François Ochsenbein
  • Mark Allen
  • Olivier Bienaymé
  • Thomas Boch
  • François Bonnarel
  • Laurent Cambrésy
  • Sébastien Derriere
  • Pascal Dubois
  • Pierre Fernique
  • Soizick Lesteven
  • Cécile Loup
  • André Schaaff
  • Bernd Vollmer
  • Marc Wenger
  • Gérard Jasniewicz (GRAAL)
  • Emmanuel Davoust (OMP)
  • Daniel Egret (OSP)
  • the CDS Bibliographical team: S. Borde (OSP), M.
    Brouty, C. Bruneau, C. Brunet, G. Chassagnard
    (IAP), S. Laloë, A. Schreyeck, P. Vannier, P.
    Vonflie, M.J. Wagner, F. Woelfel

31
Thank you
  • Questions?