IWIR-CRIS '06 - PowerPoint PPT Presentation

Author: Bo Alr. Last modified by: Harrie Lalieu. Document presentation format: Custom.

Learn more at: http://www.eurocris.org

1
IWIR-CRIS '06
Data retrieval in PURE
Data retrieval in the 4-year-old PURE CRIS project at 9 universities
2
Agenda
  • Overview
  • Retrieval
      • Validated manual data gathering
      • Dynamic integration to local back-end systems
      • Aggregation, enrichment and import of historic data
      • Experiments with automated imports of historic data
  • Exposure
      • Two web services
      • OAI
      • Z39.50
      • Reports
      • Portal framework
  • Archiving
  • Near future

3
Overview
  • A brief overview of PURE
      • to discuss ingestion, integration, conversion and import in a specific context

4
Overview
  • Brief overview
  • History
      • Development began in 2002
  • Users
      • 9 universities (DK, SE), several hospitals and other research institutions
  • Platform and architecture
      • J2EE enterprise application
      • Release management: all users run instances of the same release version, on the same code base
  • Business model
      • Commercial software licenses, a powerful user group, shared budgets
  • Modular
      • Basic module, Reporting module, Student thesis module, External publications module, Bibliometrics module, Press module

5
Overview
6
Retrieval
  • Manual data gathering
      • User roles/rights workflow
      • Decentralized data gathering
      • Validated data gathering
      • Continuous data gathering
      • GUI example
  • Management focus is necessary
      • Reports and statistics, KPI management, etc.
  • Adding value for researchers is necessary
      • Instant inclusion in Google indexes, instantly updated personal websites, instantly updated CVs, increased citations (source in paper), etc.
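A role-based, validated workflow of this kind can be pictured as a small state machine in which each role may only perform certain transitions. The sketch below assumes hypothetical states (`draft`, `for_validation`, `validated`) and roles (`researcher`, `validator`); it illustrates the idea and is not PURE's actual workflow configuration.

```python
from enum import Enum
from typing import Dict, Set, Tuple

class State(Enum):
    # Hypothetical workflow states for a registered record.
    DRAFT = "draft"
    FOR_VALIDATION = "for_validation"
    VALIDATED = "validated"

# Hypothetical rights table: (role, current state) -> states that role may move to.
ALLOWED: Dict[Tuple[str, State], Set[State]] = {
    ("researcher", State.DRAFT): {State.FOR_VALIDATION},
    ("validator", State.FOR_VALIDATION): {State.VALIDATED, State.DRAFT},
    ("validator", State.VALIDATED): {State.DRAFT},
}

class Record:
    def __init__(self, title: str) -> None:
        self.title = title
        self.state = State.DRAFT  # every record starts as a draft

    def transition(self, role: str, target: State) -> None:
        """Move the record to `target` if `role` holds that right, otherwise raise."""
        if target not in ALLOWED.get((role, self.state), set()):
            raise PermissionError(f"{role} may not move {self.state.value} -> {target.value}")
        self.state = target
```

Keeping the rights in a table rather than in `if` branches means a new role or state is a data change, which matches the decentralized setting where each institution may tune who validates what.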

7
Retrieval
  • Dynamic integration
      • Dynamic integration to local back-end systems
          • Personnel systems, payroll systems (for data retrieval)
          • LDAPs, Active Directories (for data retrieval and authentication)
          • Single sign-on systems (for authentication)
      • to automatically create object types such as person or organization
      • and yes, PURE hosts data, too
      • We need complete objects according to the meta-data model
  • Plug-in architecture in PURE
      • Pro: individually adapted integration
      • Con: an individually programmed plug-in is necessary
      • Future: GUI, standardized plug-ins
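A plug-in architecture like this can be sketched as one common interface that each institution implements for its own back-end system, with the core accepting only objects that are complete according to the meta-data model. The interface `SourcePlugin`, the required-field set and the CSV personnel export with its column names are all hypothetical.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List

# Hypothetical minimal meta-data model for a person object.
REQUIRED_PERSON_FIELDS = ("id", "first_name", "last_name", "organisation")

class SourcePlugin(ABC):
    """Hypothetical interface each institution-specific back-end plug-in implements."""

    @abstractmethod
    def fetch_persons(self) -> Iterator[Dict[str, str]]:
        ...

class CsvPersonnelPlugin(SourcePlugin):
    """Example plug-in reading a hypothetical CSV export from a personnel system."""

    def __init__(self, csv_text: str) -> None:
        self.csv_text = csv_text

    def fetch_persons(self) -> Iterator[Dict[str, str]]:
        for row in csv.DictReader(io.StringIO(self.csv_text)):
            # Map the local (hypothetical) column names onto the model's field names.
            yield {
                "id": row["emp_no"],
                "first_name": row["given"],
                "last_name": row["family"],
                "organisation": row["dept"],
            }

def sync_persons(plugin: SourcePlugin) -> List[Dict[str, str]]:
    """Keep only person objects whose required fields are all present and non-empty."""
    return [p for p in plugin.fetch_persons()
            if all(p.get(f) for f in REQUIRED_PERSON_FIELDS)]
```

The trade-off on the slide shows up directly: only `CsvPersonnelPlugin` is institution-specific, but someone has to write it for every new back end.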

8
Retrieval
  • Import
      • Historic data
          • Many sources
          • More or less useful data
          • More or less consistent use of formats ;-)
      • The PXA format
          • PURE XML Archive format, .zip-based
          • Meta-data, relations between entities, binary files
      • Aggregation → enrichment → conversion → import
          • The process is external to PURE
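The slide describes PXA only as a .zip-based XML archive holding meta-data, relations between entities and binary files. A minimal sketch of writing and reading such a package with Python's standard library follows; the entry names (`metadata.xml`, `files/<name>`) and the element names are assumptions for illustration, not the actual PXA schema.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict, List, Tuple

def write_archive(objects: List[Dict[str, str]],
                  relations: List[Tuple[str, str, str]],
                  files: Dict[str, bytes]) -> bytes:
    """Pack meta-data, relations and binary files into one zip (hypothetical layout)."""
    root = ET.Element("archive")
    objs = ET.SubElement(root, "objects")
    for obj in objects:
        ET.SubElement(objs, "object", attrib=obj)
    rels = ET.SubElement(root, "relations")
    for source, rel_type, target in relations:
        ET.SubElement(rels, "relation", source=source, type=rel_type, target=target)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("metadata.xml", ET.tostring(root, encoding="utf-8"))
        for name, data in files.items():
            zf.writestr(f"files/{name}", data)  # binaries stored next to the XML
    return buf.getvalue()

def read_relations(archive: bytes) -> List[Tuple[str, str, str]]:
    """Read the relation triples back out of an archive produced by write_archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        root = ET.fromstring(zf.read("metadata.xml"))
    return [(r.get("source"), r.get("type"), r.get("target"))
            for r in root.iter("relation")]
```

Bundling everything in one file lets the external aggregation and conversion steps hand PURE a single self-contained unit per import.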

9
Retrieval
  • Experiments
      • Experiments with automated imports of historic data from specific, identified sources
      • Source format → PXA conversion → import → enrichment/validation
      • Very poor data quality demands the concept of draft objects in PURE
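The draft concept is what lets poor-quality records in at all. A sketch of the import step under that idea: records that pass a validity check become regular objects, the rest are kept as drafts (with the reasons attached) instead of being rejected. The required fields and the plausible-year range are hypothetical rules.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical validity rules for a publication object.
REQUIRED = ("title", "year", "authors")
YEAR_RANGE = (1900, 2100)

@dataclass
class ImportResult:
    valid: List[Dict[str, Any]] = field(default_factory=list)
    drafts: List[Dict[str, Any]] = field(default_factory=list)

def problems(record: Dict[str, Any]) -> List[str]:
    """Return the reasons why `record` is not (yet) a valid object."""
    issues = [f"missing {name}" for name in REQUIRED if not record.get(name)]
    year = record.get("year")  # expected as an int
    if isinstance(year, int) and not YEAR_RANGE[0] <= year <= YEAR_RANGE[1]:
        issues.append(f"implausible year {year}")
    return issues

def import_records(records: List[Dict[str, Any]]) -> ImportResult:
    result = ImportResult()
    for rec in records:
        issues = problems(rec)
        if issues:
            # Keep it as a draft for later manual enrichment/validation.
            result.drafts.append({**rec, "status": "draft", "issues": issues})
        else:
            result.valid.append({**rec, "status": "valid"})
    return result
```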

10
Exposure
  • Web services
      • RPC/encoded and document/literal
      • Rich libraries of methods
          • Including format-specific methods: APA, MLA, Harvard, Vancouver and CBE
      • Free, near-instant addition of new methods
      • WS code example (if time)
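Format-specific methods return the same record rendered in a given citation style. The sketch shows the idea for two styles with deliberately simplified rules (real APA and Vancouver conventions cover many more cases); the function names and record layout are hypothetical, not PURE's web-service methods.

```python
from typing import Dict

def _initials(given: str) -> str:
    """'Anne Marie' -> 'AM'."""
    return "".join(part[0] for part in given.split())

def format_apa(pub: Dict) -> str:
    """Simplified APA-like reference: Family, G. M., & Family, G. (Year). Title. Journal."""
    names = [f"{a['family']}, {'. '.join(_initials(a['given']))}." for a in pub["authors"]]
    authors = names[0] if len(names) == 1 else ", ".join(names[:-1]) + ", & " + names[-1]
    return f"{authors} ({pub['year']}). {pub['title']}. {pub['journal']}."

def format_vancouver(pub: Dict) -> str:
    """Simplified Vancouver-like reference: Family GM, Family G. Title. Journal. Year."""
    authors = ", ".join(f"{a['family']} {_initials(a['given'])}" for a in pub["authors"])
    return f"{authors}. {pub['title']}. {pub['journal']}. {pub['year']}."
```

Because each style is just another function over the same record, adding a style is cheap, which is one way "near-instant adding of methods" can work.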

11
Exposure
  • OAI support
      • OAI-PMH data provider
      • OAI-PMH formats
          • DC
          • DDF-MXD (Danish national format)
          • SVEP (Swedish national format)
          • more to come
      • Also used to harvest other PURE repositories for external publications
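OAI-PMH is plain HTTP: a harvester sends requests such as `verb=ListRecords&metadataPrefix=oai_dc` and receives XML, paging with resumption tokens. The sketch builds such requests and extracts Dublin Core titles plus the next token from one response page. The base URL is hypothetical, and the response is parsed from a string so the example runs offline.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is sent with the verb only."""
    params = {"verb": "ListRecords"}
    if token:
        params["resumptionToken"] = token
    else:
        params["metadataPrefix"] = "oai_dc"
    return f"{base_url}?{urlencode(params)}"

def parse_titles(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the dc:title values and the resumptionToken (if any) of one page."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in
              root.iterfind(".//oai:record/oai:metadata/oai_dc:dc/dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token
```

A harvester, for example one collecting external publications from another repository, would fetch each URL with an HTTP client, parse the body, and repeat with the returned token until it is `None`.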

12
Exposure
  • Z39.50
      • Enables searches in PURE from library systems
  • SRW/SRU
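SRU expresses a search as URL parameters carrying a CQL query. The sketch builds an SRU 1.1 `searchRetrieve` URL; the endpoint, the `dc.title` index and the `dc` record schema are assumptions about what a given server supports. Z39.50 itself, a binary protocol, is not shown.

```python
from urllib.parse import urlencode

def sru_search_url(base_url: str, cql_query: str, max_records: int = 10,
                   schema: str = "dc") -> str:
    """Build an SRU 1.1 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # CQL, e.g. dc.title = "data retrieval"
        "maximumRecords": str(max_records),
        "recordSchema": schema,
    }
    return f"{base_url}?{urlencode(params)}"
```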

13
Exposure
  • Reports
  • PURE reporting module
  • GUI example

14
Exposure
  • Reference Manager
      • Export of data to a local Reference Manager installation
      • Using an RM-formatted export file
      • Promotes registering in the repository rather than in RM
      • GUI example
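Reference Manager imports tagged text files such as the RIS format. Assuming the RM-formatted export is RIS-like, a record could be written as below. Only a handful of common tags are shown (`TY`, `AU`, `TI`, `PY`, `JF`, `ER`), and the record layout on the PURE side is hypothetical.

```python
from typing import Dict, List

def to_ris(pub: Dict) -> str:
    """Render one journal article as a RIS record (TY first, ER last, CRLF line ends)."""
    lines: List[str] = ["TY  - JOUR"]
    for author in pub["authors"]:
        lines.append(f"AU  - {author}")
    lines.append(f"TI  - {pub['title']}")
    lines.append(f"PY  - {pub['year']}")
    if pub.get("journal"):
        lines.append(f"JF  - {pub['journal']}")
    lines.append("ER  - ")
    return "\r\n".join(lines) + "\r\n"

def export_ris(pubs: List[Dict]) -> str:
    """Concatenate several records into one export file body."""
    return "".join(to_ris(p) for p in pubs)
```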

15
Exposure
  • Portal framework
      • PUREportal: a free, PURE-specific framework for custom development of research exhibition portals
      • Online example
      • Typical cost scenario: 20,000
      • Typical delivery time: 1 month
      • Little need for requirements specification
      • Automatic PURE-API maintenance

16
Archiving
  • Data archiving at 2 levels
      • SQL environment
          • Meta-data and relations
          • Binary files simply stored in the server file system
      • FEDORA via connector (not PURE-specific, open source)
          • Facilitates
              • Higher-quality archival of binary files
              • Long-term preservation in general
              • Adoption of PURE in institutions' general FEDORA strategies
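A minimal sketch of the first archiving level: meta-data in an SQL database, binary files as plain files on the server's file system, with the database holding only a path and a checksum so stored files can be re-verified. SQLite, the table layout and SHA-256 are illustrative assumptions; the FEDORA connector is not modelled.

```python
import hashlib
import sqlite3
from pathlib import Path

def open_store(db_path: str) -> sqlite3.Connection:
    con = sqlite3.connect(db_path)
    # Hypothetical schema: one table for meta-data, one pointing at binary files.
    con.execute("CREATE TABLE IF NOT EXISTS objects (id TEXT PRIMARY KEY, title TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS files (object_id TEXT, path TEXT, sha256 TEXT)")
    return con

def store(con: sqlite3.Connection, file_root: Path, obj_id: str, title: str,
          filename: str, data: bytes) -> Path:
    """Write the bytes to the file system, then record meta-data and checksum in SQL."""
    target = file_root / obj_id / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    digest = hashlib.sha256(data).hexdigest()
    with con:  # commits both inserts together
        con.execute("INSERT INTO objects VALUES (?, ?)", (obj_id, title))
        con.execute("INSERT INTO files VALUES (?, ?, ?)", (obj_id, str(target), digest))
    return target

def verify(con: sqlite3.Connection, obj_id: str) -> bool:
    """Re-hash each stored file of an object and compare with the recorded checksum."""
    rows = con.execute("SELECT path, sha256 FROM files WHERE object_id = ?",
                       (obj_id,)).fetchall()
    return all(hashlib.sha256(Path(p).read_bytes()).hexdigest() == h for p, h in rows)
```

Nothing in this level guards the files beyond a checksum, which is the gap a dedicated repository such as FEDORA is meant to close for long-term preservation.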

17
Near future
  • The near future regarding data retrieval
      • More automated imports using increasingly advanced converters
      • Automated data delivery (push and harvest) to
          • Industry-specific search services (e.g. PubMed, Nordicom)
          • Documentary data collections (such as clinicaltrials.org) and national collections (such as DDF (DK), ForskDok (NO), etc.)
      • Temporary import objects
          • When imported data are not of sufficient quality to create valid objects
          • When data cannot be properly related to other objects upon import
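The second case for temporary import objects, data that cannot be related to existing objects, can be sketched as a resolution step: each imported author is matched against known person objects, and a record with any unmatched author is held as a temporary object rather than created as a valid one. Matching on an accent-stripped "family name + first initial" key is a deliberately naive, hypothetical rule.

```python
import unicodedata
from typing import Dict, List, Tuple

def name_key(family: str, given: str) -> str:
    """Hypothetical match key: accent-stripped, lower-case family name plus first initial."""
    text = unicodedata.normalize("NFKD", f"{family} {given[:1]}")
    return "".join(c for c in text if not unicodedata.combining(c)).lower()

def resolve(records: List[Dict], persons: Dict[str, str]) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (related, temporary); `persons` maps match keys to person ids."""
    related, temporary = [], []
    for rec in records:
        ids = [persons.get(name_key(a["family"], a["given"])) for a in rec["authors"]]
        if all(ids):
            related.append({**rec, "person_ids": ids})
        else:
            unresolved = [a for a, pid in zip(rec["authors"], ids) if pid is None]
            temporary.append({**rec, "status": "temporary", "unresolved": unresolved})
    return related, temporary
```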