Title: Enabling Access to Scientific
1 Enabling Access to Scientific and Technical Data-sets in e-Science: A Role for Library Science
- James L. Mullins, PhD
- Dean of Libraries
- Professor of Library Science
- Purdue University
- March 24, 2008
University of Illinois Mortenson Center South
African Librarian Program
2 e-Science
- What is meant by e-Science?
- Large-scale science increasingly carried out through distributed global collaborations enabled by the Internet
- Such collaborative scientific enterprise requires access to massive data collections, large-scale computing resources, and high-performance visualization back to the individual user scientists
- Requires large-scale storage, retrieval, and transfer
3 Innovative Research Concepts
- Data Authors benefit from their own work, broadly disseminated, safely archived.
- Data Managers collaborate by ensuring successful retention and dissemination through technical infrastructure
- Data Scientists conduct creative inquiry and analysis, enhance the research of data authors
National Science Board, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, p. 27.
4 Innovative Research Concepts
Data Scientists are crucial to the successful management of a digital data collection; their interests lie in having their contributions fully recognized
National Science Board, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century, p. 27.
5 National Science Foundation Recognition of the Challenge for Data Curation
Dr. Christopher Greer, Former Program Director, Office of Cyberinfrastructure, NSF
6 To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering
- A report to the National Science Foundation from the ARL Workshop on New Collaborative Relationships: The Role of Academic Libraries in the Digital Data Universe
- Supported by NSF, September 26-27, 2006
- Attendees: NSF program directors, disciplinary researchers, information technologists, computer scientists, and librarians
- http://www.arl.org/bmdoc/digdatarpt.pdf
7 To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering
Overarching Recommendation
- NSF should facilitate the establishment of a sustainable framework for the long-term stewardship of data. This framework should involve multiple stakeholders by supporting:
- Research to understand, model, and prototype data stewardship
- Training and educational programs to develop a new workforce
- Efforts to effect change in the research enterprise regarding the importance of the stewardship of digital data produced
8 To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering
Specific Recommendations
How can Libraries respond? How can Libraries
prepare?
9 Conceptualization By Chris Greer, NSF
ICenter
10 Scholarly Communication
In the past, libraries involved at this end: traditional research publication
Published research, non-traditional
Unpublished research, traditional
Secondary and tertiary resources
Published research, traditional
Published data/datasets
Analyzed data/datasets
Currently many attempts to data mine to uncover data
Processed data/datasets
Metadata curation profiles for data allow forward/backward movement through the scholarly communication process
Raw data/datasets
Source: D. Scott Brandt, Purdue University
11 One Example
12 One Example
Purdue University
13 Purdue University
- Founded 1869 by gift from John Purdue
- Premier programs: engineering (astronautics; alumnus Neil Armstrong), agriculture, hospitality and tourism, business, computer science, communications.
- 39,102 students, 2007/08
- Third largest international student enrollment in U.S.: 4,994 for 2007/08 (over 2,000 from India, China and Korea combined).
14 Purdue University
- Nine Colleges: Agriculture, Consumer and Family Sciences, Education, Engineering, Liberal Arts, Management, Pharmacy/Nursing/Health Sciences, Technology, Vet Medicine
- 73 Departments, several cross-disciplinary, e.g. Agricultural and Biological Engineering
15 Interdisciplinary collaboration
Discovery Park: a 44-acre site with interdisciplinary centers designed to facilitate and promote leading-edge research
Cyber Infrastructure
Energy
Oncology
Entrepreneurship
e-Enterprise
Manufacturing
Nanotechnology
Environment
Learning Center
Bioscience
16 Envisioning New Interdisciplinary Collaborations
Associate Dean for Research, D. Scott Brandt, Professor of Library Science: facilitates individual and interdisciplinary research efforts of the fifty Libraries faculty
17 Purdue University Libraries
- Since 2004, initiative for Libraries faculty to collaborate with other faculty across campus: apply library science knowledge and expertise to research problems
- Collect, organize, describe, curate, archive, disseminate data/information
18 Determine need for collaboration
- Hypothesized that researchers have data management needs and that librarians can help meet them
- Employed top-down and bottom-up investigation for data collection
- Verified: PU researchers said they need help in collecting, organizing and providing access to their data
19 Outside of the library
- Attended research seminars, call-outs, etc., to identify collaboration and funding opportunities
- Built relationships: found researchers who understood that collecting, organizing and providing access to data and information are not only important, but critical
- Found problems to solve, then collaborated on solutions
- Talked about what we know: organizing data and information (different meanings to different groups)
- Brought something to the table. Had to be prepared to demonstrate something tangible (initially a proof-of-concept or a prototype).
20 Current areas of collaboration
- Discovery Learning Center
- Earth and Atmospheric Sciences
- English
- IT at Purdue
- Mechanical Engineering Technology
- Regenstrief Center
- Graduate School
- Oncological Sciences
- Agronomy
- Biology
- Cancer Center
- Center for the Environment
- Chemical Engineering
- Chemistry
- Civil Engineering
- Cyber Center
21 Motivation (library participants)
- Directly related to work, and makes something difficult easier
- It's an extension of our everyday job
- Something new and exciting to do
- Breaking new ground, want to contribute to interdisciplinary initiative
- Force the issue of how it gets done (i.e., more people added to help out)
22 Motivation (non-participants)
- Articulation of what is expected by the Dean
- Partly determined on a case-by-case basis
- Has to be interesting to me
- Something that uses the skills I can bring to it
- Need to get credit for it (recognition, reward)
- Important to allow individual to define what interdisciplinary research is
- Should be opportunities to "stick your toe in the water" before making big commitment
- Need time to do it, and to do the things I want to do
23 Distributed Data Curation Center (D2C2)
- Sustainability for data curation repositories
- Ontological and taxonomic organization of disciplinary datasets
- Metadata to facilitate access to data collections
- Data curation profiles for archiving and preserving datasets
- http://d2c2.lib.purdue.edu/
24 Recap
- 22 Libraries faculty involved in 32 grants since April of 2006
- New positions: Data Research Scientist to support research
- Computer Science/Libraries discussion on joint appointment
- Sun Microsystems gift: Sun StorageTek 5800, 32 terabytes, for D2C2 research.
25 One hundred conversations lead to 20 discussions, which lead to 5 grants, which lead to 1 award
26 Web 2.0 for e-Science: nanoHUB, from a PowerPoint presentation by Mark Lundstrom, Gerhard Klimeck, and Michael McLennan, Purdue University, Network for Computational Nanotechnology.
27 nanoHUB, Purdue University
https://www.nanohub.org/home
28 National Science Foundation (NSF) DataNet 07-601
- e-Science data stewardship, five $20 million grants
- Competition to build a data network
- Replicate for data what exists for print resources
- Proposals being led by librarians; collaborators are computer scientists, information technologists, domain scientists, sociologists, information scientists, computer engineers, museum and K-12 educators, metadata and ontology specialists, data visualization specialists.
- International collaboration with UK, Australia, and China.
- Submission deadline March 21st, announcement summer 2008.
29 Thank you! Questions and Answers?
James L. Mullins, Purdue University, USA
jmullins_at_purdue.edu