Title: Reinventing Science Librarianship
1 Reinventing Science Librarianship
- Education for New Roles
- Catherine Blake
- cablake_at_email.unc.edu
- http://www.ils.unc.edu/cablake
- University of North Carolina at Chapel Hill
2 Source: The DCC Curation Lifecycle Model
3 Creation
- Jupiter has moons
- Galileo, Sidereus Nuncius, 1610
- Relative sizes of the Earth, Sun and Moon
- Aristarchus, 3rd century BC
- This image: 10th century AD
Source: Wikipedia
4 Creation
- Little Dipper microarray processors
- Biology/pharmacology
- The first beam in the Large Hadron Collider at CERN was successfully steered around the full 27 kilometers of the world's most powerful particle accelerator
Source: http://www.scigene.com/products/little_dipper.html
http://mediaarchive.cern.ch/MediaArchive/Photo/Public/2008/0809002/0809002_01/0809002_01-A5-at-72-dpi.jpg
5 Acquisition and Collection
- Data acquired directly from scientists
- Heterogeneous formats
- multi-media
- annotations on a spreadsheet
- Varying quality
- experimental settings
- Student vs verified data
6 Identification and Cataloging
- Collectively identifying resources
- Group think
- Social bookmarking
- Participatory cataloging
- E.g., UNC photographs
7 Storage and Preservation
- Storage
- 92% on magnetic media
- About 5 exabytes of print, film, magnetic, and optical storage media produced in 2002
- Preservation
- Heterogeneous
- Changing hardware
- Changing software
Image Source: http://www.cray.com/products/index.html
http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/
8 Barriers to access removed
- Environment
- New source of information providers (Scientists, Granting agencies)
- NIH Mandated access
- Consequences
- No single point of access
- Different levels of access required
- HIPAA compliance
- Maintaining cultural norms
9 Use and Reuse
- Data and Text Mining
- Use data collected for a different purpose
- E.g., a side-effect of one drug becomes the purpose of another
- Information Synthesis
- Combine speculative information
- Literature-Based Discovery
- Uncover transitive connections from text
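The "transitive connections" idea behind literature-based discovery can be illustrated with a small sketch. It assumes term pairs that co-occur in the literature have already been extracted. The function name `transitive_candidates` and the toy terms are hypothetical and not from the slides. If A is linked to B and B is linked to C, but A and C are never linked directly, then A and C become a candidate connection worth investigating.

```python
from collections import defaultdict


def transitive_candidates(pairs):
    """Return {(a, c): [linking terms]} for terms joined only indirectly.

    `pairs` is an iterable of (term, term) co-occurrences. The extraction
    step that produces them is assumed and not shown here.
    """
    neighbors = defaultdict(set)
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)

    found = {}
    for b, linked in neighbors.items():
        for a in linked:
            for c in linked:
                # a < c avoids reporting each pair twice; skip direct links.
                if a < c and c not in neighbors[a]:
                    found.setdefault((a, c), set()).add(b)
    return {pair: sorted(via) for pair, via in found.items()}


# Hypothetical toy co-occurrences, for illustration only.
toy_pairs = [("drug", "enzyme"), ("enzyme", "disease"), ("drug", "symptom")]
candidates = transitive_candidates(toy_pairs)
for (a, c), via in candidates.items():
    print(f"{a} ~ {c} via {', '.join(via)}")
```

Real systems typically rank such candidates, for example by how many linking terms support them. This sketch only enumerates them.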
10 Data Oriented Roles
- Data Consultant
- Share best practice regarding how to organize and share data
- Data Distributor
- Scientists control the data, distributor makes the data available to others
- Data Manager
- Manager organizes and keeps the data
11 New Roles
- Data Service Provider
- Data conversion and pre-processing
- Data and Text Analyst
- Scientist provides the data, analyst applies visualization, data and text mining tools.
- Embedded Roles (Data Scientist)
- Information workflow
12 Data Oriented Roles
- Information organization
- Conceptual Modeling
- Create and understand
- ER diagrams
- UML diagrams
- Concept maps
13 Reference Model for an Open Archival Information System
Source: nost.gsfc.nasa.gov/isoas/presentations/oais_tutorial_200005.ppt
14 Data Oriented Roles
- Conceptual to relational models
- Good database design
- Normalization
- Methods to enforce
- data quality
- referential integrity
- Ongoing maintenance
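As a minimal sketch of the "methods to enforce" points above, the example below uses Python's built-in sqlite3 module. The table and column names (`experiment`, `measurement`, `temperature_c`) are hypothetical and not from the slides. A CHECK constraint stands in for a data quality rule, and a foreign key enforces referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    title         TEXT NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    experiment_id  INTEGER NOT NULL REFERENCES experiment(experiment_id),
    temperature_c  REAL NOT NULL CHECK (temperature_c >= -273.15)
);
""")


def try_insert(sql, params):
    """Run one INSERT; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok_parent = try_insert("INSERT INTO experiment VALUES (?, ?)", (1, "Pilot run"))
ok_child = try_insert("INSERT INTO measurement VALUES (?, ?, ?)", (1, 1, 21.5))
# Rejected: experiment 99 does not exist (referential integrity).
orphan = try_insert("INSERT INTO measurement VALUES (?, ?, ?)", (2, 99, 21.5))
# Rejected: below absolute zero (data quality rule).
bad_value = try_insert("INSERT INTO measurement VALUES (?, ?, ?)", (3, 1, -500.0))
```

Keeping measurements in their own table, keyed to the experiment, is also a small instance of normalization: experiment details are stored once rather than repeated on every row.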
15 New Roles
- Text Mining: A case study
- All text is not created equal
- Things that get in the way
- Page breaks
- Figures
- Tables
- Special characters
- Implications for preservation
16 Human readable form (PDF)
17 Data Services Case Study
18 Machine readable form
></TABLE
><P
>Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. The development of accurate models often requires that a scientist resolve conflicting evidence.</P
><P
>One activity that consumes much of a scientists' time is <I
>synthesis</I
>, <IMG
SRC="/giflibrary/12/ldquo.gif"
BORDER="0">the dialectic combination of thesis and antithesis into a higher stage of truth<IMG
SRC="/giflibrary/12/rdquo.gif"
BORDER="0"> (<I
>Merriam-Webster's Collegiate Dictionary</I
>, <A
HREF="BIB24"
>2004</A
>). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding&nbsp;-&nbsp;a higher stage of truth&nbsp;-&nbsp;which scientists achieve by
19 First phase pre-processing
></TABLE>
<P>Scientists engage in the discovery process more than any other user population, yet their day-to-day activities are often elusive. The development of accurate models often requires that a scientist resolve conflicting evidence.</P>
<P>One activity that consumes much of a scientists' time is <I>synthesis</I>, <IMG SRC="/giflibrary/12/ldquo.gif" BORDER="0">the dialectic combination of thesis and antithesis into a higher stage of truth<IMG SRC="/giflibrary/12/rdquo.gif" BORDER="0"> (<I>Merriam-Webster's Collegiate Dictionary</I>, <A HREF="BIB24">2004</A>). This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena. The synthesis activity results in an overall finding&nbsp;-&nbsp;a higher stage of truth&nbsp;-&nbsp;which scientists achieve by
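The first phase appears to rejoin tags that the publisher's markup split across lines. The slides do not show how this was done, so the following is only a sketch under that assumption, and the function name `join_split_tags` is hypothetical. It collapses whitespace, including line breaks, inside each tag and leaves the text between tags untouched.

```python
import re


def join_split_tags(html: str) -> str:
    """Collapse line breaks and extra spaces that fall inside HTML tags."""

    def tidy(match: re.Match) -> str:
        # Squeeze all whitespace runs inside the tag to single spaces.
        tag = " ".join(match.group(0).split())
        # Drop any space left before the closing bracket, e.g. "<P >".
        return re.sub(r"\s+>$", ">", tag)

    return re.sub(r"<[^<>]*>", tidy, html)


raw = (
    "<P\n>One activity is <I\n>synthesis</I\n>, <IMG\n"
    'SRC="/giflibrary/12/ldquo.gif"\nBORDER="0">the dialectic'
)
joined = join_split_tags(raw)
print(joined)
```

One caveat of this approach: whitespace inside quoted attribute values is collapsed as well, which is harmless for the markup shown here but may matter for other sources.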
20 Second phase pre-processing
- Add Identifiers
- Break paragraphs into sentences
- Add document, section, paragraph, sentence IDs
- Replacements
- Symbols, references
- Output
- Identifiers: One activity that consumes much of a scientists' time is synthesis the dialectic combination of thesis and antithesis into a higher stage of truth _BIB_24.
- Identifiers: This dictionary definition reflects the alternative viewpoints that often occur when multiple empirical studies explore the same phenomena.
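The second-phase steps can be sketched roughly as follows. This is not the author's implementation. The identifier format (`document.section.paragraph.sentence`), the function names, and the naive sentence splitter are all assumptions made for illustration. Unlike the slide's output, this sketch replaces only the reference link with `_BIB_n` and leaves the surrounding citation text in place.

```python
import re


def replace_references(html: str) -> str:
    """Turn <A HREF="BIB24">2004</A> into the placeholder _BIB_24."""
    pattern = r'<A\s+HREF="#?BIB(\d+)"\s*>.*?</A>'
    return re.sub(pattern, r"_BIB_\1", html, flags=re.I | re.S)


def strip_markup(html: str) -> str:
    """Remove remaining tags, replace &nbsp;, and normalize whitespace."""
    text = re.sub(r"<[^<>]*>", "", html).replace("&nbsp;", " ")
    return " ".join(text.split())


def split_sentences(paragraph: str) -> list:
    """Naive split: end punctuation, whitespace, then a capital letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph)


def preprocess(doc_id: str, section_id: str, html: str) -> list:
    """Return (identifier, sentence) pairs for every <P>...</P> block."""
    rows = []
    paragraphs = re.findall(r"<P>(.*?)</P>", html, flags=re.I | re.S)
    for p, para in enumerate(paragraphs, start=1):
        clean = strip_markup(replace_references(para))
        for s, sentence in enumerate(split_sentences(clean), start=1):
            rows.append((f"{doc_id}.{section_id}.{p}.{s}", sentence))
    return rows


sample = (
    '<P>One activity is <I>synthesis</I> (<A HREF="BIB24">2004</A>). '
    "This definition reflects viewpoints.</P>"
    "<P>Second&nbsp;paragraph here.</P>"
)
for identifier, sentence in preprocess("D1", "S2", sample):
    print(identifier, sentence)
```

A regex splitter like this will break on abbreviations such as "e.g." or "Fig. 2", so scientific text usually needs a sentence segmenter tuned to the domain.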
21 Text Analytics
- Clustering
- Categorization
- Association Rules
22 Visualization
NCI-funded research 1995-2001
23 Embedded Roles
24 Embedded Roles
- Workflow
- Deep understanding
- Data formats
- Access norms
- Reward structures
- Custom pre-processing
25 Closing Remarks
- Not everyone will have every skill
- Existing skills that will remain critical
- Strong ties to faculty
- Strong negotiating skills
- Knowledge of standards and resources
- The roles exist; it's not clear where they will live within an institution
- The ability to think like someone within a discipline