Title: Running a Data Centre on the Long Term
1Running a Data Centre on the Long Term
- François Ochsenbein
- (Françoise Genova)
- Centre de Données astronomiques de Strasbourg (CDS)
2Lessons learnt from more than 30 years of CDS history
- Introduction
- CDS Organisation
- CDS Activities
- Dealing with technical evolution
- Relation with the Astronomical Community
- Conclusions
3A bit of history...
- Created in 1972 by the (French) Institut National d'Astronomie et de Géophysique (INAG), now the Institut National des Sciences de l'Univers (INSU) (33 years ago!)
- Originally named CDS: Centre de Données Stellaires; later renamed Centre de Données de Strasbourg, after extending the CDS scope to non-stellar objects
4CDS Charter
- collect useful data on astronomical objects, in electronic form
- improve them by critical evaluation, comparison and combination
- distribute the results to the international astronomical community
- conduct research using these data (at the time of CDS creation, the original objective was to gather stellar data to study the galactic structure)
- → provide science tools to the community
5CDS Organisation
- collect → Catalog Service, VizieR, data for Simbad
- homogenize → Simbad, catalog metadata
- distribute → CDS Services
- preserve → keep original versions
- extended to images (Aladin), metadata (nomenclature), links, registries, ...
6SIMBAD Story
- A database running over 3 decades
- 1971-1981: CSI (Catalog of Stellar Identifications) and BSI (Bibliographical Star Index)
- 1981-1990: SIMBAD was born (Set of Identifications, Measurements and Bibliography for Astronomical Data), evolved over different mainframe architectures
- 1990-2005: SIMBAD3 on workstations, using object-oriented concepts
- 2006-...: SIMBAD4 based on Java technology and open source databases
7Simbad Evolution...
- 1976: 350,000 stars; IBM 360/65; PL1/Assembler
- 1988: 700,000 objects; Univac 1110; real-time update
- 1998: 2,500,000 objects; Sun workstation; C, object-oriented
8Simbad data on the long term
- From 300,000 stars to 3.5 million astronomical objects (stars, galaxies, nebulae, ...)
- Regular editions of the full database as ASCII files, stored:
- on paper (1972-1977)
- on microfiches (1977-1985)
- as disk files (1990-...) 1/year ... 1/month
- → gives back details on database evolution
- all modifications archived
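As an illustration of how dated ASCII editions can give back details on the database evolution, here is a minimal sketch that compares two editions record by record. The record layout (`identifier|object type`) and the function names are assumptions made for this example; they are not the actual Simbad dump format.

```python
def parse_edition(lines):
    """Map each identifier to its description (hypothetical 'id|type' layout)."""
    records = {}
    for line in lines:
        identifier, _, description = line.partition("|")
        records[identifier.strip()] = description.strip()
    return records


def compare_editions(old_lines, new_lines):
    """Report which identifiers were added, removed, or changed between editions."""
    old = parse_edition(old_lines)
    new = parse_edition(new_lines)
    common = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(key for key in common if old[key] != new[key]),
    }


# Two tiny, made-up editions of the same database.
old_edition = ["HD 1|star", "HD 2|star", "M 31|nebula"]
new_edition = ["HD 1|star", "M 31|galaxy", "NGC 1|galaxy"]

report = compare_editions(old_edition, new_edition)
print(report)
```

Keeping full editions rather than only the latest state is what makes this kind of after-the-fact comparison possible.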
9CDS Catalog Service and
- Catalog service (storage and distribution on magnetic tapes) since CDS creation in 1972
- FTP access to CDS collection since 1991
- Reorganisation of the catalog descriptions (metadata) since 1992
- VizieR (catalogs organised as relational database) from 1996
10Metadata Evolutions
[Figure: metadata examples from 1981 and 2005]
11Catalog preservation
- Keep records on modifications
- on paper until 1991
- on electronic files since 1991
- original file preserved as much as possible
- Mirror copies (x4 for data files, x8 for database)
12Aladin Sky Atlas
- Aladin Server started around 1994 with digitized Schmidt plates used for the Guide Star Catalog, organized as a database
- Aladin portal started around 1997 as our first Java test and has become a widely used Virtual Observatory portal
13From 1997 to 2005
- Originally: visualize catalog data and images at CDS
- Today: a widely used VO portal
14CDS organisation
- The CDS team integrates staff with different profiles: astronomers, computer engineers, specialized librarians (mainly permanent positions)
- Scientific objectives, strategy and work program are discussed in the group chaired by the CDS director
- Since its creation, the CDS activities have been examined by a Scientific Council (6F6f)
15The CDS Scientific Council
17Partnership
- Historically, CDS was associated with participating institutes:
- Lausanne/Genève (photometry)
- Astronomisches Rechen-Institut, Heidelberg (astrometry)
- Paris-Meudon Observatory (bibliography)
- Marseille Observatory (radial velocities)
- ... and Strasbourg Observatory (spectral classification)
18Partnership (continued)
- Participation in projects related to missions, e.g. Hipparcos, XMM
- Participation in the networking of services (observatory archives, journals, ADS, ...)
- Participation in the Virtual Observatory enterprise (AVO, IVOA, VOTECH): brings interoperability skills, enables rapid prototyping
- Involvement with the CDS user community
19Partnership with publishers
- The data published in the specialized literature can't be re-used:
- PS/PDF files are not re-usable
- interest in the data used ends with the publication
- convince the astronomical journal editors (A&A, AJ, ApJ) to store the tables in a reusable form (standardized description)
- → improves the data quality and reliability
20User Support
- Communication with users (astronomers) is a privileged way:
- to get feedback on the data service (good and bad)
- to be aware of the astronomers' wishes for efficient research work
- to get help, e.g. to clean up some datasets
- to motivate the CDS team
- Demos to users (AAS, ...), constant discussions with other service developers (ADASS)
21User Support (continued)
- Hotline: question_at_simbad.u-strasbg.fr
- in existence for 15 years
- is still our major source of error reports and feedback
- all astronomers and most engineers participate in this duty (even the Director...)
- most questions now come from non-astronomers
22Dealing with Technical Evolutions
- Example of the Simbad history:
- from mainframes managed by computing centers to clusters of PCs
- changes of languages between PL1/assembler, C, Perl, Java...
- from batch queries and updates on punched cards, to interactive terminals, real-time updating, graphical interfaces, WWW, Java...
- while keeping the scientific quality
- → database lifetime 10/15 years
23Taking advantage of Electronic Era
- Take advantage of the existence of electronic data:
- using the electronic versions of the journals to feed Simbad (ToCs)
- using the standards to perform extensive verifications of the database contents
- improved data quality and reliability
- (but some surprises from time to time...)
24Combine the technological evolutions
- Example of Aladin:
- 1997: first exercise of Java implementation at CDS
- make use of the open access to data servers (URLs and HTTP)
- make use of a registry (GLU) since its beginning
- implementation of VO standards: XML, VOTable, SIA, ConeSearch, Skynode
- connectivity with other applications: VOPlot, SpecView, VOSpec
- API access (scripting mode, ExtApp interface)
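To make the "URLs and HTTP" point concrete, here is a minimal sketch of how a ConeSearch request can be built as a plain URL. It assumes the query parameters of the IVOA Simple Cone Search convention (`RA`, `DEC` and `SR`, all in decimal degrees); the service address and the function name are hypothetical placeholders, not an actual CDS endpoint or Aladin code.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search query URL: sky position plus search radius, in degrees."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("right ascension must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("declination must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + query


# Hypothetical service address, used only for illustration.
url = cone_search_url("http://example.org/cone", 10.68, 41.27, 0.1)
print(url)
```

Because the whole request fits in one URL, any HTTP-capable client can call the service without a dedicated library.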
25Technological watch
- New technologies taken into account:
- not too late: new technologies open new functionalities, accessibility by recent technologies is a requirement
- ...but not too early: requires reliability, maintainability ... and time to implement!
- Keep compatibility with older material:
- do not require the users to change their hard/software every 6 months
- example of Aladin: still a significant fraction of users with Java 1.1
26Methodological watch
- New methodologies are tested:
- software architecture: example of object-orientation, used for 15 years (Simbad)
- contents and metadata: ontologies prototyped as UCD (Unified Content Descriptors) brought extensive coherence checking methods
- Pragmatic, bottom-up approach
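One kind of coherence check that content descriptors make possible can be sketched as follows: once columns from different catalogs carry the same descriptor, their units can be compared automatically. This is only an illustration under stated assumptions; the catalog names, column names, descriptor strings and units below are made-up sample values, and the function is hypothetical rather than the method used at CDS.

```python
from collections import defaultdict


def inconsistent_units(columns):
    """Return descriptors whose columns disagree on the unit."""
    units_by_ucd = defaultdict(set)
    for column in columns:
        units_by_ucd[column["ucd"]].add(column["unit"])
    # Keep only descriptors associated with more than one distinct unit.
    return {
        ucd: sorted(units)
        for ucd, units in units_by_ucd.items()
        if len(units) > 1
    }


# Made-up column descriptions from two hypothetical catalogs.
sample_columns = [
    {"catalog": "cat_A", "name": "RAdeg", "ucd": "pos.eq.ra", "unit": "deg"},
    {"catalog": "cat_B", "name": "RArad", "ucd": "pos.eq.ra", "unit": "rad"},
    {"catalog": "cat_A", "name": "Vmag", "ucd": "phot.mag", "unit": "mag"},
    {"catalog": "cat_B", "name": "V", "ucd": "phot.mag", "unit": "mag"},
]

problems = inconsistent_units(sample_columns)
print(problems)
```

A flagged descriptor is not necessarily an error, but it points a human reviewer at columns that need a unit conversion or a corrected description.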
27Relations with the Scientific Community
- At the beginning: essentially personal contacts of the CDS director and his staff with their peers
- need to convince the astronomers to provide their data for wide distribution, through scientific collaborations
- Bulletin d'Information du CDS (1971-1998):
- scientific results
- orientations discussed in the CDS Council meetings
- news / controversies about astronomical data
28Relations with the Scientific Community (continued)
- Deep impact of the Web:
- new non-astronomer users
- necessity of improving the documentation
- much larger usage → improved reliability
- CDS role has changed:
- now asked by scientists to include their data among datasets available from CDS and mirrors
- databases now play a fundamental role; quick data ingestion is now a requirement
29Conclusions
- Evolution from a service offered to a small community to major reference services
- precursor/actor of the International Virtual Observatory
- Importance of the quality, lifetime, motivation, diversity of the staff: mixture of scientists, computer scientists, and librarians
- Constant search for networking / partnership
30Co-authors
- Françoise Genova
- François Ochsenbein
- Mark Allen
- Olivier Bienaymé
- Thomas Boch
- François Bonnarel
- Laurent Cambrésy
- Sébastien Derriere
- Pascal Dubois
- Pierre Fernique
- Soizick Lesteven
- Cécile Loup
- André Schaaff
- Bernd Vollmer
- Marc Wenger
- Gérard Jasniewicz (GRAAL)
- Emmanuel Davoust (OMP)
- Daniel Egret (OSP)
- the CDS Bibliographical team: S. Borde (OSP), M. Brouty, C. Bruneau, C. Brunet, G. Chassagnard (IAP), S. Laloë, A. Schreyeck, P. Vannier, P. Vonflie, M.J. Wagner, F. Woelfel
31Thank you