Title: 3rd International Digital Curation Conference
1 3rd International Digital Curation Conference
- The Digital Future
- Professor John Wood, Principal of the Faculty of
Engineering, Imperial College, London
12th December 2007, Washington DC
2 Issues
- Data Deluge
- Curation and Provenance
- Interoperability
- Multi-disciplinarity of research
- Linking of publications to data
- What is coming up: Tower of Babel or Nations
Speaking unto Nations? The need for
International Strategies
3 The UK position
- A recent report has been published by the Office
of Science and Technology entitled "Developing
the UK's e-infrastructure for science and
innovation"
- A strategic team is now meant to be looking at
the implementation implications.
- Research Councils fund basic academic research
and some data repositories. Some Councils see
this as a long-term commitment, others as
short-term research projects.
- The Joint Information Systems Committee (JISC),
which runs the academic network, supports a number
of institutional publication repositories, the
Digital Curation Centre, the National Grid Service
and developments in Virtual Research
Environments.
4 OSI Report: Key recommendations
- The UK's e-infrastructure should provide
researchers with:
- Access to the systems, services, networks and
resources that they need at the point that they
need them
- Facilities to discover resources easily and use
them appropriately
- Confidence in the integrity, authenticity and
quality of the services and resources they use
- Assurance that their outputs will be accessible
now and in the future
- A location-independent physical infrastructure
for combining computation and information from
multiple data sources
- Advanced technologies to support collaborative
research
- The training and skills needed to exploit the
services and resources available to them
5 OSI Report: Key recommendations
- The e-infrastructure should allow researchers to:
- Exploit the power of advanced information
technologies and applications to continuously
enhance the process of research itself
- Collaborate and communicate securely with
others, across disciplines, institutions and
sectors
- Maximise the potential of advanced technologies
to support innovation and experimentation
- Share their research outputs with others and
re-use them in the future
- Engage with industry in support of wider
economic goals
6 OSI Report: Key recommendations
- The e-infrastructure must enable:
- The growth of knowledge transfer and the
development of the commercial applications of
research outputs
- Research funders to track the outputs from the
research they fund
- The protection of individuals' privacy and
work, within regulatory, legal and ethical
constraints
- The protection of intellectual property and
rights management
- The preservation of digital information output
as a vital part of the nation's cultural and
intellectual heritage
7 MRC Data Sharing and Preservation
- The MRC expects valuable data arising from
MRC-funded research to be made available to the
scientific community with as few restrictions as
possible. Such data must be shared in a timely
and responsible manner...
- 1. MRC research data are publicly-funded and, as a
public good, must be made available for new
research purposes in a timely, responsible
manner.
- 2. Governance of researcher access to MRC-funded
research data must balance the interests of data
creators, custodians, users and data subjects.
- 3. Access policies and practices for individual
MRC-funded datasets must be transparent,
equitable, practicable and provide clear
decisions consistent with MRC data sharing
policy.
- 4. Access to, and use of, MRC-funded data must
comply with statutory and other regulatory
requirements, and good research practice.
8 JISC's Role in Developing a UK e-infrastructure
- Needs a mandate to be agreed on what the data
service of the future will be.
- Needs to take account of relevant international
developments, including work by:
- e-IRG (e-Infrastructure Reflection Group)
- ESFRI (European Strategy Forum on Research
Infrastructures)
- US initiatives to develop a cyberinfrastructure
9 The latest step
- The Conference looked at the best governance
models, at the development of a common European
strategy, and at the international dimension of
Research Infrastructures.
- This provided valuable feedback in the launch of
the EC's 7th Framework Programme and in the
updating of the European roadmap of Research
Infrastructures.
- The Conference covered all types of Research
Infrastructures including the e-infrastructures
and especially distributed ones.
10 OECD principles and guidelines (2007)
- OECD Principles and Guidelines for Access to
Research Data from Public Funding
- 13 principles
- A. Openness
- Openness means access on equal terms for the
international research community at the lowest
possible cost, preferably at no more than the
marginal cost of dissemination. Open access to
research data from public funding should be easy,
timely, user-friendly and preferably
Internet-based.
- B, C, ... M: Flexibility, Transparency, Legal
conformity, Protection of intellectual property,
Formal responsibility, Professionalism,
Interoperability, Quality, Security, Efficiency,
Accountability, Sustainability.
- http://www.oecd.org/dataoecd/9/61/38500813.pdf
11 Implications of e-science
- e-science is about inventing and exploiting new
advanced computational methods to:
- create a new approach to shared research between
groups and facilities
- generate, curate and analyze data
- link publications to data
- develop and explore models and simulations at an
unprecedented scale and to use simulations to run
experiments
- help the set-up of distributed virtual
organizations to ease collaboration and sharing
of resources and information and the remote
operation of facilities
12 Who are the users today?
- Research communities in urgent need of new
advanced methods because they face unprecedented
computational challenges
- Example: High Energy Physics
- LHC
- Neutrino Mass
- Gravitational Waves
- Research communities foreseeing the need for new
advanced computational methods because of new
major projects
- Example: fusion (ITER)
- Other research communities - a holistic approach
- Geophysics
- Condensed Matter
- Meteorology
- Energy
13 A New Approach
- Increasingly there will be multi-disciplinary
approaches to mining data relating, for example:
- Biological with Social Science Data
- Fundamental physics with environmental
monitoring and solar activity
14 The early adopters: HEP
- High Energy Physics was the first research
community to adopt globally the grid paradigm for
data collection and analysis
- High Energy Physics adopted grids for LHC to
handle the unprecedented volume of data produced
- Highly structured community acting as guinea
pig
- High Energy Physics is the No. 1 user of
e-infrastructures around the world
- 99.9% of the data from ATLAS has to be removed in
the first few microseconds to avoid web
overload!!
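As a rough illustration of what discarding 99.9% of the data implies, the sketch below converts the discard percentage into the fraction kept and the overall reduction factor. The event count of one million is a hypothetical input chosen for the example, not a figure from the slides.

```python
from fractions import Fraction


def trigger_reduction(discard_percent: str, raw_events: int):
    """Return (events kept, reduction factor) for a given discard percentage.

    Exact fractions are used so that 99.9% does not pick up
    floating-point rounding error.
    """
    discard = Fraction(discard_percent) / 100
    keep = 1 - discard
    return raw_events * keep, 1 / keep


# raw_events is a hypothetical value, used only to make the ratio concrete
kept, factor = trigger_reduction("99.9", raw_events=1_000_000)
print(f"kept {int(kept)} of 1,000,000 events; reduction factor {int(factor)}x")
```

Under that assumption, one event in a thousand survives the first selection step.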
15 Looking forward - the LHC at CERN
CMS
NExT project
LHC analysis initiative with Southampton
ATLAS tracker at RAL
CMS calorimeter crystal
Invented at RAL
Physics data in 2008!
LHC computing at RAL
ATLAS
16 Particle Physics
- Progress towards the LHC at CERN - first beam in
2008
17 Achievements in High Energy Physics
- the example of EGEE
- (Enabling Grids for E-science)
- 50K jobs/day
- > 10K simultaneous jobs during prolonged periods
- Reliable data distribution service demonstrated
at 1.6 GB/sec from CERN to LHC Computing Grid
national nodes
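To put the EGEE figures on a daily scale, the following sketch converts the quoted 1.6 GB/sec and 50K jobs/day into a daily data volume and a mean interval between job submissions. It assumes the rate is sustained for a full 24 hours and that 1 TB = 1000 GB; neither assumption is stated on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Data moved in one day at a sustained rate, in TB (assuming 1 TB = 1000 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


def mean_job_interval_s(jobs_per_day: int) -> float:
    """Average time between job submissions, in seconds."""
    return SECONDS_PER_DAY / jobs_per_day


volume = daily_volume_tb(1.6)           # rate quoted on the slide
interval = mean_job_interval_s(50_000)  # job count quoted on the slide
print(f"{volume:.1f} TB/day, one job every {interval:.2f} s on average")
```

On these assumptions the demonstrated rate corresponds to roughly 138 TB per day.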
18 The new world of neutrino physics
SNO detector
The sun imaged with neutrinos (by the SuperK
experiment)
SuperKamiokande and SNO open a new world of
neutrino oscillations: the discovery that neutrinos
have tiny masses and mix
Oscillations confirmed by MINOS in 2006
Neutrino discovery timeline ?
19 Looking forward - Neutrino Physics
T2K at J-PARC - starts 2009. A strong role in
detector and accelerator development and in
physics analysis
- MICE at RAL - installing now
- Demonstrate cooling a muon beam
Learn more about neutrino mixing angles
Technology demonstration
Neutrino Factory: international scoping study ->
design study. RAL is one credible site
Explore CP violation: origin of matter in the
universe?
20 Neutrino Mass
The acceleration of the expanding universe has
been accepted. To understand the expansion
dynamics, several models have been proposed.
- Further data will be acquired in the next few
years through the construction of new surveys of
galaxies and clusters of galaxies, which will map
billions instead of the current millions of
objects.
- Cosmologists will be producing petabytes of data
per year from these surveys, a massive increase
on their current data volumes. Experience in the
management of such vast data volumes is needed
to guide the cosmology research community on how
to address its data management problems.
21 Gravitational Waves: LISA
- Laser Interferometer Space Antenna (LISA)
Pathfinder is intended to test the technology
needed for LISA to detect gravitational waves.
LISA must control the spacecraft position with
an accuracy of a few millionths of a millimetre.
ESA will design, develop, launch and operate the
LISA Pathfinder spacecraft. A consortium of
European scientific institutes will provide two
test-masses in a nearly perfect gravitational
free-fall and a sophisticated system to measure
and control their motion with unprecedented
accuracy.
22 Meteorology
23 Geophysics
- Key technologies for Oil & Gas
- seismic processing platform
- reservoir simulation
- Added value
- Capability to solve complex problems and to
validate innovative algorithms on real size
data sets
- Close the gap between Research and Industrial
environment
- Attract and keep brightest researchers
- Framework for Industry/Research collaboration
24 No longer one technique!
- The problems facing society demand a
multi-technique approach.
- Users are not expert in these techniques
- E.g. biologists will send samples and remotely
access data.
- Access Grid will enable several scientists to
control in real time
- Interoperability between equipment and data sets
becomes imperative.
- In real time, who can drive the experiment?
Computer simulations will have real-time feedback
during the experiment. One informs the other.
25 Rutherford Appleton Laboratory
26 CCLRC Technology
efficient large solid angle detectors...
fast electronics
- detectors advanced data acquisition
- unique synergy within CCLRC
- e-technologies (e-science)
- bringing the central facilities into the
universities
27 In practice: It's all about scale
- Creation
- Examining the detector arrays on the MAPS
spectrometer at ISIS
29 SNS target stations and beamlines
30 Major laser developments over the past few years
have ensured the CLF is home to the world's most
intense lasers
Astra-Gemini
Vulcan
31 A one-off experiment
- Collection
- An ATSR image of Sicily with Mount Etna eruption
taken 24 July 2001
32 In practice: It's all about scale
- Computation
- 3-D rabbit heart MRI rendered at 512 x 512 x 1400
using 12 GPUs
- Data needs interpretation and analysis
Picture of heart
33 Developing a new detector for transmission
electron microscopes
1 mm
Commercial programme with MRC LMB Cambridge, MPI
(Germany) and SEI (Holland).
34 Data Deluge!
Capacity, e.g. at RAL: 20 PB by 2010
1 PB = 10^15 bytes
- Billions of floppies
- Millions of CDs
- Thousands of PCs (today's)
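The comparisons above can be checked with a short calculation. The sketch below counts how many storage units hold one petabyte, and the 20 PB projected for RAL. The media capacities are assumptions introduced for illustration (a nominal 1.44 MB floppy, a 700 MB CD and a hypothetical 250 GB desktop disk of the period); none of them comes from the slides.

```python
PETABYTE = 10**15  # bytes, as defined on the slide

# Assumed capacities in bytes (illustrative, not from the slides)
MEDIA = {
    "floppies": 1_440_000,        # nominal 1.44 MB floppy disk
    "CDs": 700 * 10**6,           # 700 MB CD
    "PCs": 250 * 10**9,           # hypothetical 250 GB desktop disk
}


def units_needed(total_bytes: int, unit_bytes: int) -> int:
    """Smallest whole number of units that can hold total_bytes (ceiling division)."""
    return -(-total_bytes // unit_bytes)


for petabytes in (1, 20):
    for name, size in MEDIA.items():
        count = units_needed(petabytes * PETABYTE, size)
        print(f"{petabytes:>2} PB = {count:,} {name}")
```

With these assumed capacities, one petabyte comes to several hundred million floppies, over a million CDs and thousands of PCs, and the 20 PB figure takes the floppy count well into the billions.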
35 Curation: who is responsible?
- Curation
- Some STFC based Repositories
- The Atlas Datastore
- The British Atmospheric Data Centre
- The CCLRC Data Portal
- The CCLRC Publications Archive
- The CCPs (Collaborative Computational Projects)
- The Chemical Database Service
- The Digital Curation Centre
- The EUROPRACTICE Software service
- The HPCx Supercomputer
- The JISCmail service
- The NERC Datagrid
- The NERC Earth Observation Data Centre
- The Starlink Software suite
- The UK Grid Support Centre
- The UK Grid for Particle Physics Tier 1A
- The World Data Centre for Solar-Terrestrial
Physics
Atlas Datastore Tape Robot
36 The problem will grow
- New large scale facilities are being planned and
built around the world.
- They will be run remotely and have to interact in
real time with HPC simulations, each informing
the other. What will be the role of the
researcher once the experiment starts?
- Data storage etc. needs to be planned right at the
start.
- An example: XFEL in Hamburg
37 Schematic layout of a single pass XFEL
A new X-ray source is needed for studies of new,
non-equilibrium states of matter at atomic
resolution in space and time
38 Peak brightness of pulsed X-ray sources
Ultrafast X-ray sources will probe space and time
with atomic resolution.
Figure: peak brightness [phot./(s mrad^2 mm^2
0.1% bandw.)] against FWHM X-ray pulse duration
[ps], with points labelled 2nd Gen. SR, 3rd Gen.
SR, SPPS, Laser Slicing and Initial
(H.-D. Nuhn, H. Winick)
What do we do today and what tomorrow?
39 Fascination - FELs for hard X-rays
The X-ray free-electron lasers will provide
coherent radiation of the proper wavelength and
the proper time structure, so that materials and
the changes of their properties can be portrayed
at atomic resolution in four dimensions, in space
and time.
Diffraction pattern of 10 x 10 x 10 Au cluster
40 Take a movie of chemical reactions
Schematic presentation of transition states in a
chemical reaction
41 Imaging of a single bio-molecule
Lysozyme
with atomic resolution
crystal
single molecule
Oversampling: J. Miao, K.O. Hodgson and D. Sayre,
PNAS 98 (2001) 6641-6645
42 Coulomb explosion of lysozyme
t = 0
t = 50 fsec
t = 100 fsec
R. Neutze, R. Wouts, D. van der Spoel, E.
Weckert, J. Hajdu, Nature 406 (2000) 752-757
43 The VUV-FEL user facility at DESY
44 Dynamics of condensed-matter systems
- Lateral coherence
- Phase transitions: nucleation and growth
- Optical and acoustic phonons
- Fast dynamics in magnetic systems
- Surfaces and interfaces
- Melting, solidification
- Lubrication, friction
- Nanoparticles
- Vibrational modes
- Capillary waves
- Melting and nucleation
45 VUV-FEL
46 European XFEL Facility in Hamburg
phase II
HERA
phase I
PETRA
XFEL Length ca. 3.3 km
47 XFEL Office and Laboratory Building
48 The European Roadmap for Large Research
Facilities
- European Strategy Forum on Research
Infrastructures (ESFRI)
- Launched in April 2002
- Commissioned by the Council in 2004 to produce a
forward-look Roadmap akin to the DoE Large
Facilities Roadmap but including all disciplines
- First edition published in October 2006
- Many of the projects are now being funded for
drawing up preliminary proposals including the
requirements for e-infrastructure, remote access
etc.
49 What is ESFRI?
- The European Strategy Forum on Research
Infrastructures
- Brings together representatives of the 27
Member States, 5 Associated States, and one
representative of the European Commission (EC)
- Projects must be open access and genuinely
Pan-European or Global
50 Excellence and Research Infrastructures
- Europe has a long-standing tradition of
excellence in research and its teams continue to
lead progress in many fields
- However, our centres of excellence often fail to
reach critical mass
- There is a need to bring resources together and
to build a research and innovation area
equivalent to the "common market"
- Renewed impetus behind the European Research Area
51 Social Science and Humanities
6 Projects
CLARIN
CESSDA
EROHS
ESS
SHARE
DARIAH
52 CLARIN
Social Science and Humanities
- Common Language Resources and Technology
Infrastructure
- language resources and technology available and
useful to scholars of all disciplines, in
particular the humanities and social sciences
- harmonise structural and terminological
differences
- based on a Grid-type infrastructure and using
Semantic Web technology
www.mpi.nl/clarin
53 CESSDA
Social Science and Humanities
- Council of European Social Science Data Archives
- distributed RI that provides and facilitates
access of researchers to high quality data and
supports their use
- now 21 countries in Europe
- 15,000 data collections
- access to over 20,000 researchers
www.nsd.uib.no/cessda
54 DARIAH
Social Science and Humanities
- Digital Research Infrastructure for the Arts and
Humanities
- based upon an existing network of Data Centres
and Services based in Germany (Max Planck
Society), France (CNRS), the Netherlands (DANS)
and the United Kingdom (AHDS)
- bring essential cultural heritage online.
www.dariah.eu
55 The European Social Survey
Social Science and Humanities
- monitor long term changes in social values
throughout Europe
- produce data relevant to academic debate, policy
analysis and better governance
- covers 27 European countries.
www.europeansocialsurvey.org
56 SHARE
Social Science and Humanities
- Survey of Health, Ageing and Retirement in
Europe
- fact-based economic and social science analyses
of the on-going changes in Europe due to
population ageing
- will be expanded to all 25 Member States of the
EU.
www.share-project.org
57 Environmental Sciences
AURORA BOREALIS
IAGOS-ERI
7 Projects
EUFAR
EURO-ARGO
LIFEWATCH
EMSO
ICOS
58 EMSO
Environmental Sciences
- deep sea-floor observatories deployed on
specific sites off the European coastline
- allow continuous monitoring for environment and
security.
- part of a global endeavour in sea-floor
observatories
- long term monitoring of environmental processes
related to ecosystem life and evolution, global
changes and geo-hazards
- key component of GMES and GEOSS.
www.ifremer.fr/esonet/emso
59 EUFAR
Environmental Sciences
- Heavy-Payload fleet of airborne research in
Environmental and Geo-Sciences
- more than 30 instrumented aircraft for
tropospheric research, over oceanic, polar and
remote continental areas, which are especially
crucial for climate studies
www.eufar.net
60 EURO-ARGO
Environmental Sciences
- European component of a world wide in situ
global ocean observing system
- based on autonomous profiling floats throughout
the ice-free areas of the deep ocean
- data are transmitted in real time by satellite
to data centres for processing, management, and
distribution
www.coriolis.eu.org
61 IAGOS-ERI
Environmental Sciences
- integration of routine commercial passenger
aircraft measurements into a Global Observing
System
- regular observations of atmospheric composition
by installing autonomous instrument packages,
certified for commercial aircraft (Airbus)
www.fz-juelich.de/icg/icg-ii/iagos
62 LIFEWATCH
Environmental Sciences
- protection, management and sustainable use of
biodiversity
- network of observatories, facilities for data
integration and interoperability
- virtual laboratories offering a range of
analytical and modelling tools
- a Service Centre providing special services for
scientific and policy users, including training
and research opportunities for young scientists
www.lifewatch.eu/
63 ICOS
Environmental Sciences
- Integrated Carbon Observation System
- co-ordinated, integrated, long-term high-quality
observational data of the greenhouse balance of
Europe and of adjacent key regions of Siberia and
Africa
www.carboeurope.org
64 Energy
Need to nucleate further work
IFMIF
HiPER
3 Projects
JHR
65 HiPER
Energy
- large scale laser system designed to demonstrate
significant energy production from inertial
fusion
- supporting a broad base of high power laser
interaction science
- revolutionary approach to laser-driven fusion
known as Fast Ignition
www.hiper-laser.org
66 IFMIF
Energy
- International Fusion Materials Irradiation
Facility
- accelerator-based very high flux neutron source
to provide a suitable database on irradiation
effects on materials needed for the construction
of a fusion reactor
www-dapnia.cea.fr
67 Biomedical and Life Sciences
6 Projects
STRUCTURAL BIOLOGY
BIOBANKS
CLINICAL TRIALS
EATRIS
INFRAFRONTIER
Upgrade of EBI
68 European Biobanking and Biomolecular Resources
Biomedical and Life Sciences
- network of existing and de novo biobanks and
biomolecular resources
- samples from patients and healthy persons,
molecular genomic resources and bioinformatics
tools
www.biobanks.eu
69 INFRAFRONTIER
Biomedical and Life Sciences
- Phenomefrontier: in vivo imaging and data
management tools for the phenotyping of
medically relevant mouse models
- Archivefrontier: state-of-the-art archiving and
dissemination of mouse models (major upgrade of
the European Mouse Mutant Archive (EMMA))
www.eumorphia.org www.emma.rm.cnr.it
70 Infrastructures for Clinical Trials and
Biotherapy
Biomedical and Life Sciences
- interconnect existing national networks of
clinical research centres and clinical trial
units
- upgrade or create new facilities for the
evaluation of innovative biotherapy agents
- make available professional data centres allowing
high quality data management across the European
Union
- establish connections with disease-oriented
patient associations and registries, and
disease-oriented investigators' networks in order
to foster patients' enrolment.
www.ecrin.org
71 Upgrade of European Bioinformatics Infrastructure
Biomedical and Life Sciences
- platform for data collection, storage,
annotation, validation, dissemination and
utilisation
- substantial upgrade to the existing European
Bioinformatics Institute (EBI)
- integrate secondary data resources that are
distributed across Europe and make the most of
the diverse expertise of its scientists.
www.ebi.ac.uk
72 Material Sciences
7 Projects
IRUVX
ESS
XFEL
ESRF
ILL
ELI
PRINS
73 The European Spallation Source
Material Sciences
- world's most powerful source of neutrons.
- built-in upgradeability
- initial 20 instruments
- will serve 4,000 users annually across many
areas of science and technology.
http://neutron.neutron-eu.net/n_ess
74 ESRF Upgrade
Material Sciences
- European Synchrotron Radiation Facility (ESRF)
- supported and shared by 17 European countries
and Israel.
- wide range of disciplines including physics,
chemistry and materials science as well as
biology, medicine, geophysics and archaeology
- many industrial applications, including
pharmaceuticals, cosmetics, petrochemicals and
microelectronics.
www.esrf.fr
75 Astronomy, Astrophysics and Nuclear Physics
SPIRAL2
5 Projects
European ELT
KM3NeT
SKA
FAIR
76 The European ELT
Astronomy, Astrophysics and Nuclear Physics
- highest priorities in ground-based astronomy
- detailed studies of inter alia planets around
other stars, the first objects in the Universe,
super-massive Black Holes, and the nature and
distribution of the Dark Matter and Dark Energy
which dominate the Universe
- maintain and reinforce Europe's position at the
forefront of astrophysical research.
www.eso.org/projects/e-elt
77 FAIR
Astronomy, Astrophysics and Nuclear Physics
- high energy primary and secondary beams of ions
of highest intensity and quality
- including an antimatter beam of antiprotons
allowing forefront research
- experiments with primary beams of ion masses up
to Uranium and the production of a broad range of
radioactive ion beams.
www.gsi.de/fair/index_e.html
78 KM3NeT
Astronomy, Astrophysics and Nuclear Physics
- deep-sea research infrastructure in the
Mediterranean Sea
- cubic-kilometre sized deep-sea neutrino
telescope for astronomy
- detection of high-energy cosmic neutrinos
- long-term deep-sea measurements.
www.km3net.org
79 SKA
Astronomy, Astrophysics and Nuclear Physics
- Square Kilometre Array
- next generation radio telescope
- 50 times more sensitive than current facilities
- survey the sky more than 10,000 times faster
than any existing radio telescope.
www.skatelescope.org
80 EU-HPC
- scientific computing network to utilise
top-level machines
- limited number of world top-tier centres
- associated national, regional and local centres
- Capability (high-performance) and Capacity
Computing (high-throughput)
- different machine architectures will fulfil the
requirements of different scientific domains and
applications
www.hpcineuropetaskforce.eu
81 Computer Data Treatment, Particles and Space
Physics: Inputs from e-IRG, ESA, CERN
EUHPC (e-IRG)
The CERN Council strategy for particle physics
The ESA Cosmic Vision
82 Global Dimension
- Several of the projects on the Roadmap require a
global approach.
- Discussions are taking place on how the EU can
act with one voice
- A Forum for decision making is urgently needed.
The next Carnegie meeting this weekend may start to
resolve this
- Major players are Australia, Japan, Russia, South
Africa, USA, China, India
83 Science driver - Integration of Data (and
publications)
Neutron diffraction
X-ray diffraction
NMR
High-quality structure refinement
84 Supporting the Research Lifecycle
85 The Information Infrastructure
The Body of Knowledge
86 Current View
87 Future View
88 In practice: The web has changed everything!
- Scientists increasingly expect
- Access to everything
- distributed, interoperating information sources
- Interlinking of everything
- Revalidation of results: repeat experiment
- Discovery across everything
- new knowledge from old
- Archiving of everything
- Recording unique events
- Antarctic environmental data
The challenge is to keep pace with increasing
expectations
89 Attracting new research communities: a proposed
way forward
- e-infrastructures should move from computing
grids toward knowledge grids - the use of semantic
webs and data mining
- Many research communities have limited computing
needs
- Need to achieve deployment of large scale data
oriented scientific applications
- And beyond data integration and knowledge
management
- Requirement: development and deployment of
services
- Middleware services must be extended
- Data management
- Security
- User access must be eased
- In terms of user friendliness
- In terms of user support
90 The Future?
- How much supercomputing power do we need? The
requirement to balance capacity and capability
- Enabling data sets from many different sources
and disciplines to be mined effectively. Just
how did the scientists and engineers work
together across boundaries in the construction
and running of ATLAS at CERN? A possible
history of science Ph.D in the future!
- Matching the pull of computer scientists with the
needs of the academic community. Raising
aspirations and integrating e-science with the
development of large networks and facilities
- How to cope with massive data sets and to protect
them
- How will students and teachers know something is
true? The need for strong measures to track
provenance.