JISCSURFCNImtgmay05 - PowerPoint PPT Presentation

1
From research data to new knowledge: a lifecycle
approach
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
JISC/SURF/CNI Conference, May 2005, Amsterdam
UKOLN is supported by
www.bath.ac.uk
www.ukoln.ac.uk
a centre of expertise in digital information
management
2
Overview
  • Scholarly communications in flux
  • e-Research and the diversity of data
  • Repositories meta-functionality
  • Realising the link to learning: eBank UK
  • Providing value-added services
  • Enabling knowledge extraction and post-processing
  • Look at (some of) the issues en route

3
1. Scholarly communications in flux
4
A medieval scriptorium...
5
The scholarly knowledge cycle. Liz Lyon, Ariadne,
July 2003.
  • Presentation services: subject, media-specific,
    data, commercial portals
  • Searching, harvesting, embedding
  • Resource discovery, linking, embedding
  • Data creation / capture / gathering: laboratory
    experiments, Grids, fieldwork, surveys, media
  • Aggregator services: national, commercial
  • Data analysis, transformation, mining, modelling
  • Harvesting metadata
  • Research & e-Science workflows
  • Repositories: institutional, e-prints, subject,
    data, learning objects
  • Deposit / self-archiving
  • Validation
  • Publication
  • Peer-reviewed publications: journals, conference
    proceedings
6
  • Presentation services: subject, media-specific,
    data, commercial portals
  • Searching, harvesting, embedding
  • Resource discovery, linking, embedding
  • Aggregator services: national, commercial
  • Learning object creation, re-use
  • Harvesting metadata
  • Learning & Teaching workflows
  • Repositories: institutional, e-prints, subject,
    data, learning objects
  • Institutional presentation services: portals,
    Learning Management Systems, u/g, p/g courses,
    modules
  • Deposit / self-archiving
  • Validation
  • Peer-reviewed publications: journals, conference
    proceedings
  • Quality assurance bodies
7
  • Presentation services: subject, media-specific,
    data, commercial portals
  • Searching, harvesting, embedding
  • Resource discovery, linking, embedding
  • Data creation / capture / gathering: laboratory
    experiments, Grids, fieldwork, surveys, media
  • Aggregator services: national, commercial
  • Data analysis, transformation, mining, modelling
  • Learning object creation, re-use
  • Harvesting metadata
  • Learning & Teaching workflows
  • Research & e-Science workflows
  • Repositories: institutional, e-prints, subject,
    data, learning objects
  • Institutional presentation services: portals,
    Learning Management Systems, u/g, p/g courses,
    modules
  • Deposit / self-archiving
  • Validation
  • Publication
  • Peer-reviewed publications: journals, conference
    proceedings
  • Quality assurance bodies
8
2. e-Research and the diversity of data
9
Assuring permanent open access to the records of
science & the humanities?
  • Long term access to primary data
  • Increasing data volumes from eScience and
    Grid-enabled / cyberinfrastructure applications
  • Changing research paradigm: data-driven science,
    big science
  • Observational data, simulations, large-scale
    experimentation, computations
  • Multi-media resources, statistical data,
    surveys, geo-spatial data

10
Diversity of data collections
  • Very large, relatively homogeneous: Large Hadron
    Collider (LHC) outputs from CERN
  • Smaller, heterogeneous and richer collections:
    World Data Centre for Solar-Terrestrial Physics,
    CCLRC
  • Small-scale laboratory results: jumping robots
    project at the University of Bath
  • Population survey data: UK Biobank
  • Highly sensitive, personal data: patient care
    records

11
Taxonomy of data collections
  • Research collections: jumping robots
  • Community collections: FlyBase at Indiana (with
    UC Berkeley)
  • Reference collections: Protein Data Bank
  • Source: NSF Long-Lived Digital Data Collections,
    draft report, March 2005

12
Taxonomy of data collections
Evolution
  • Research collections: jumping robots
  • Community collections: FlyBase at Indiana (with
    UC Berkeley)
  • Reference collections: Protein Data Bank
  • Source: NSF Long-Lived Digital Data Collections,
    draft report, March 2005

13
Repository evolution
  • 1971: Research collection, <12 files
  • 2005: Reference collection, >2700 structures
    deposited in 6 months
14
1. Issues: research data as content
  • Sharing it!
  • Data diversity
  • Homo- or heterogeneous
  • Raw and derived / processed
  • Sensitivity
  • Fast or slow growth in volume
  • Repository evolution
  • Likelihood to scale up (from bytes to petabytes)
  • Quality assurance (from the start)
  • Community-based standards development
    (folksonomies)
  • Build robust services

15
3. Repositories meta-functionality
16
eBank UK: linking research data to learning
  • JISC-funded: September 2003; Phase 2: February
    2005
  • UKOLN at the University of Bath (lead),
    University of Southampton, University of
    Manchester
  • Exemplar e-Science testbed: Comb-e-Chem
  • Grid-enabled combinatorial chemistry
  • Crystallography, laser and surface chemistry
    examples
  • Development of an e-Lab using pervasive computing
    technology
  • National Crystallography Service
  • Resource Discovery Network / PSIgate physical
    sciences portal
  • http://www.ukoln.ac.uk/projects/ebank-uk/

17
  • Presentation services: subject, media-specific,
    data, commercial portals
  • Searching, harvesting, embedding
  • Resource discovery, linking, embedding
  • Data creation / capture / gathering: laboratory
    experiments, Grids, fieldwork, surveys, media
  • Data analysis, transformation, mining, modelling
  • Learning object creation, re-use
  • Aggregator services: eBank UK
  • Harvesting metadata
  • Learning & Teaching workflows
  • Research & e-Science workflows
  • Repositories: institutional, e-prints, subject,
    data, learning objects
  • Institutional presentation services: portals,
    Learning Management Systems, u/g, p/g courses,
    modules
  • Deposit / self-archiving
  • Validation
  • Publication
  • Peer-reviewed publications: journals, conference
    proceedings
  • Quality assurance bodies
18
Data Flow in eBank UK
Create
OAI-PMH
Index and Search
Institutional repository
eBank aggregator
Data files
Metadata
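The flow on this slide (metadata exposed by an institutional repository, harvested over OAI-PMH, then indexed and searched by an aggregator) can be sketched roughly as follows. This is a minimal illustration, not the eBank UK implementation: the base URL, the sample response and the helper names `list_records_url`, `parse_records` and `build_index` are assumptions made for the example. Only the `ListRecords` verb, its parameters and the XML namespaces come from the OAI-PMH protocol itself, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical harvest response, trimmed to two invented records.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Crystal structure of compound A</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Crystal structure of compound B</dc:title>
          <dc:creator>Example, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
            "creators": [e.text for e in rec.findall(".//dc:creator", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, token or None


def build_index(records):
    """Map each lower-cased title word to the identifiers that contain it."""
    index = {}
    for rec in records:
        for title in rec["titles"]:
            for word in title.lower().split():
                index.setdefault(word, set()).add(rec["id"])
    return index


records, token = parse_records(SAMPLE_RESPONSE)
index = build_index(records)
print(list_records_url("http://repo.example.org/oai"))
print(sorted(index["crystal"]), token)
```

A real harvester would keep requesting pages with the returned resumption token until none is returned; the sketch stops after one page to stay self-contained.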
19
Comb-e-Chem Project
Video
Simulation
Properties
Analysis
Structures Database
Diffractometer
X-Ray e-Lab
Properties e-Lab
Grid Middleware
21
The digital repository: ecrystals.chem.soton.ac.uk
Acknowledgement: Simon Coles
22
Access to the underlying data
23
Harvesting: OAIster
24
Aggregating: search & discover
25
Linking to publications
26
eBank embedded in a science portal
27
eBank Phase 2: linking to learning
  • Embedding in e-Learning processes
  • Evaluating the pedagogical benefits
  • MChem course
  • Chemical informatics course

28
2. Issues: generic data models, metadata schema &
terminology
  • Validation against other schema
      • CCLRC Scientific Data Model version 2
  • Complex digital objects and packaging options
      • METS
      • MPEG-21 DIDL
  • Terminologies
      • Domain: crystallography
      • Inter-disciplinary, e.g. biomaterials
  • Metadata enhancement: subject keyword additions
    to datasets based on knowledge of keywords in
    related publications
  • Meaningful resource discovery?
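
The metadata enhancement idea on this slide (adding subject keywords to a dataset from the keywords of related publications) might look something like the sketch below. The record layout, the `related_publications` link field and the `enhance_keywords` helper are assumptions made for illustration, and the sample records are invented; the slide does not say how eBank UK implements this step.

```python
def enhance_keywords(dataset, publications):
    """Return a copy of `dataset` with keywords added from linked publications.

    Existing keywords keep their order; new ones are appended once each,
    compared case-insensitively.
    """
    seen = {k.lower() for k in dataset["keywords"]}
    merged = list(dataset["keywords"])
    for pub_id in dataset["related_publications"]:
        for keyword in publications.get(pub_id, {}).get("keywords", []):
            if keyword.lower() not in seen:
                seen.add(keyword.lower())
                merged.append(keyword)
    return {**dataset, "keywords": merged}


# Hypothetical records, invented for the example.
publications = {
    "eprint:101": {"keywords": ["crystallography", "hydrogen bonding"]},
    "eprint:102": {"keywords": ["Crystallography", "polymorphism"]},
}
dataset = {
    "id": "dataset:7",
    "keywords": ["crystallography"],
    "related_publications": ["eprint:101", "eprint:102"],
}

enhanced = enhance_keywords(dataset, publications)
print(enhanced["keywords"])
```

Whether borrowed keywords should be stored separately from depositor-supplied ones is a curation decision the sketch leaves open.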

29
3. Issues: linking and identifiers
  • Links to individual datasets within an experiment
  • Links to all datasets associated with an
    experiment or a data collection
  • Links to derived eprints and published literature
  • Context-sensitive linking: find me
      • Datasets by this author / creator
      • Datasets related to this subject
      • Learning objects by this author / creator
      • Learning objects related to this subject
  • Identifiers and persistence
      • Generic
      • Domain: International Chemical Identifier
        (InChI code)
  • Resource discovery: Google Scholar?
  • Provenance: authenticity, authority, integrity?
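
The "find me" style of context-sensitive linking listed on this slide could be sketched as a lookup over harvested metadata records. Everything here is hypothetical: the record fields (`kind`, `creators`, `subjects`), the identifiers and the `find_related` helper are invented for the example and do not describe an eBank UK interface.

```python
def find_related(records, current, kind, by):
    """Find records of `kind` sharing a creator or subject with `current`.

    `by` is "creator" or "subject"; the current record itself is excluded.
    """
    field = {"creator": "creators", "subject": "subjects"}[by]
    wanted = set(current[field])
    return [
        r["id"]
        for r in records
        if r["kind"] == kind and r["id"] != current["id"] and wanted & set(r[field])
    ]


# Hypothetical metadata records, invented for the example.
records = [
    {"id": "eprint:1", "kind": "eprint",
     "creators": ["Example, A."], "subjects": ["crystallography"]},
    {"id": "dataset:1", "kind": "dataset",
     "creators": ["Example, A."], "subjects": ["crystallography"]},
    {"id": "dataset:2", "kind": "dataset",
     "creators": ["Example, B."], "subjects": ["crystallography"]},
    {"id": "lo:1", "kind": "learning object",
     "creators": ["Example, B."], "subjects": ["surface chemistry"]},
]

current = records[0]  # the eprint a reader is currently viewing
print(find_related(records, current, "dataset", "creator"))
print(find_related(records, current, "dataset", "subject"))
print(find_related(records, current, "learning object", "subject"))
```

Matching on creator name strings is fragile, which is one reason the slide pairs linking with persistent identifiers.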

30
4. Issues: embedding and workflow
  • Into the crystallographic publishing community:
    International Union of Crystallography
  • Into the chemistry research workflow
      • SMART TEA Digital Lab Book e-synthesis Lab
      • Other analytical techniques and instrumentation
  • Into the curriculum and e-Learning workflows
      • MChem course
      • Undergraduate Chemical Informatics courses

31
Repositories and digital curation
  • Data preservation: static. For later use?
  • Data curation: dynamic. In use now (and the
    future)? Maintaining and adding value to a
    trusted body of digital information for current
    and future use
32
Provide value-added services
  • Annotation
  • e-Lab books (Smart Tea Project in chemistry)
  • Gene and protein sequences

33
Enable post-processing and knowledge extraction
  • The acquisition of newly-derived information and
    knowledge from repository content
  • Run complex algorithms over primary datasets
  • Mining (data, text, structures)
  • Modelling (economic, climate, mathematical,
    biological)
  • Analysis (statistical, lexical, pattern
    matching, gene)
  • Presentation (visualisation, rendering)

35
5. Issues: knowledge services
  • Layered over repositories
  • Annotation
  • Mining, modelling, analysis
  • Visualisation
  • Across multiple repositories
  • Grid enabled applications
  • Highly distributed, dynamic and collaborative
  • Associated with curatorial responsibility
  • UK Digital Curation Centre: http://www.dcc.ac.uk

36
Issues summary
  • Research data is diverse, increasing rapidly in
    volume and complexity
  • Repository collections are dynamic and evolve
  • Technical challenges associated with
    interoperability, persistence, provenance,
    resource discovery and infrastructure provision
  • Embedding in workflow is critical: scholarly
    communications, research practice, learning
  • Knowledge extraction tools will generate new
    discoveries based on repository content
  • Repository solutions must scale: M2M processing
    will become the norm