Title: Data Management Plans: Approaches to Dissemination and Preservation
1 Data Management Plans: Approaches to
Dissemination and Preservation
- Micah Altman
- Archival Director, Henry A. Murray Research
Archive
- Associate Director, Harvard-MIT Data Center
- Senior Research Scientist, Institute for
Quantitative Social Sciences
Portions of this presentation are based on joint
work with Charles Franklin for Managing Social
Science Research Data (forthcoming 2008, Chapman
& Hall). However, all errors are certainly my own.
2 Contents
- Context of a data management plan
- Technologies for Dissemination
- Technical and Institutional Approaches to
Preservation
3 Data Lifecycle
- Collection
- Data Entry and Verification
- Processing
- Identifier assignment
- Internal metadata
- Coding
- Merging
- Cleaning
- Research and Publication Support
- Documentation, Dissemination
- Archiving
- Recoding, Secondary Analysis, Merging
- (... and back to the beginning)
(Rinse, Lather, Repeat)
4 What's Wrong with the Previous Slide?
- Data is changed all the time.
- What metadata?
- Publications emerge before processing (the
conference is next week)
- Dissemination happens early (sometimes, maybe, if
I know you)
- Much data is never archived
(So, I lied)
5 Questions to ask when developing a Data
Management Plan
- Who are the stakeholders?
- What do the stakeholders require and want?
- How does the plan engage with the data lifecycle
to ensure these things?
- What are the motivating questions, use cases,
scenarios?
- (Example) How do I update all of my tables and
figures correctly when I add the new data the
reviewers suggested?
- (Example) Can I prove the direct connection from
data collection to publication, before a hostile
prosecutor, in court?
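One common way to answer the first example question is to generate every table from the raw data with a script, so that adding data means rerunning the script rather than editing tables by hand. The sketch below is illustrative only: the column names (`region`, `score`), the sample records, and the function names are assumptions invented for this example, not part of any actual study or tool.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would be read from the archived file.
RAW = (
    "respondent,region,score\n"
    "1,north,3.0\n"
    "2,north,5.0\n"
    "3,south,4.0\n"
)

def summarize(rows, group_key, value_key):
    """Return {group: (n, mean)} computed directly from the raw rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(float(row[value_key]))
    return {g: (len(v), statistics.mean(v)) for g, v in sorted(groups.items())}

def render_table(summary):
    """Render the summary as a tab-separated table, one group per line."""
    lines = ["group\tn\tmean"]
    for group, (n, mean) in summary.items():
        lines.append(f"{group}\t{n}\t{mean:.2f}")
    return "\n".join(lines)

def build_table(raw_csv_text):
    """Full path from raw data to published table, with no manual steps."""
    rows = list(csv.DictReader(io.StringIO(raw_csv_text)))
    return render_table(summarize(rows, "region", "score"))

if __name__ == "__main__":
    print(build_table(RAW))
    # After the reviewers' new data arrives, rerun on the extended file.
    print(build_table(RAW + "4,south,6.0\n"))
```

Because the table is a pure function of the raw file, the same script also documents the path from collection to publication, which bears on the second example question.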
6 Stakeholders and Legal Requirements
7 Stakeholders Implicated as Information Flows
8 Elements of Dissemination
- Dissemination ? promoting use
- Discoverability
- Accessibility
- Comprehensibility
- (Oh yeah, ease of ingest, that too.)
9 Emerging Technologies for Dissemination
- Institutional Repositories
- Web services
- Virtual-hosted archives
Plus Ça Change, Plus C'est Fou (Translation:
The more things change, the more they stay insane)
10 Institutional Repositories
- Some tradeoffs when used for data
- Generality vs. domain-specific services
- Institutional adoption vs. use within
institution
- Preservation motive vs. dissemination motive
- None of these tradeoffs is fundamental; they
reflect the current stage of evolution
11 A Web Services Rebus
?
Can you count how many ?s are in this picture?
12 Archival Virtual Hosting
- It's virtual
- Nothing to install
- Virtual collections
- Institutionally supported
- Persistent identifiers and citations
- No worries about file formats changing, backups,
etc.
- All the initial setup work is done for you
- You retain total control over
- content
- access
- presentation
- Get it now; we'll set it up and load the data,
for free.
13 http://dvn.iq.harvard.edu/dvn
14 Dissemination Options: Stakeholders and Lifecycles
See, Desmond Dekker
15 Elements of Preservation
- Preservation ? ensuring the possibility of use
- Finding, selecting, acquiring intellectual
objects
- Identifying intellectual content
- Safeguarding
- Verifying
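The "Verifying" element is often implemented as a fixity check: record a cryptographic checksum for each file at deposit time, then recompute and compare later. The following is a minimal sketch using only the Python standard library; the function names and the report layout are assumptions made for illustration, not the interface of any particular archive system.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify(root, manifest):
    """Compare the files under root against a previously stored manifest."""
    current = build_manifest(root)
    report = {"missing": [], "changed": [], "unexpected": []}
    for name, digest in manifest.items():
        if name not in current:
            report["missing"].append(name)
        elif current[name] != digest:
            report["changed"].append(name)
    # Files present now that were never recorded in the manifest.
    report["unexpected"] = sorted(set(current) - set(manifest))
    return report
```

A check like this detects that bits have changed or gone missing; it does not by itself show that the content is still comprehensible, which is a separate concern.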
16 Early 19th Century Thinking on Digital
Preservation
- I met a traveler from an antique land
  Who said: "Two vast and trunkless legs of stone
  Stand in the desert... Near them, on the sand,
  Half sunk, a shattered visage lies, whose frown,
  And wrinkled lip, and sneer of cold command,
  Tell that its sculptor well those passions read
  Which yet survive, stamped on these lifeless things,
  The hand that mocked them and the heart that fed;
  And on the pedestal these words appear:
  'My name is Ozymandias, King of Kings:
  Look on my works, ye Mighty, and despair!'
  Nothing beside remains. Round the decay
  Of that colossal wreck, boundless and bare
  The lone and level sands stretch far away."
- Only half the digits were preserved.
17 Technical Strategies for Preservation
- Musical Chairs (media format migration)
- Bag and Tag (save as is, decide on action
later)
- Assume a Can Opener (Universal Virtual
Computer)
- RomZ and EmuZ (emulation)
- Freeze-Dry (save a reduced version)
18 Institutional Strategies for Preservation
- Ignore it, maybe someone else will take care of
it
- (Internet Archive, ...)
- We'll always be here
- (self-preservation)
- We are ever true to [Insert Alma Mater]
- (institutional archives)
- Ask someone else to do it
- (ICPSR, MRA, Roper, ...)
- Ask someone(s) else to do it
- (Data-PASS, MetaArchive, CLOCKSS)
- Trust No One
- (LOCKSS)
All quotes are entirely fictional :-)
19 Dissemination and Preservation (Mix & Match
Examples)
- Refactor web pages (slightly) to facilitate
LOCKSS (lots of journals!)
- Deliver NARA content in Dataverse Network
(HARVARD, NARA, CENSUS)
- Harvest Dataverse content into DSpace (MIT,
HARVARD)
- Replicate Dataverse content in SRB (UNC)
- Dataverse Datapass
20 Potential Nexuses for Preservation Failure
- Technical
- Media failure: storage conditions, media
characteristics
- Format obsolescence
- Preservation infrastructure software failure
- Storage infrastructure software failure
- Storage infrastructure hardware failure
- External Threats to Institutions
- Third party attacks
- Institutional funding
- Change in legal regimes
- Quis custodiet ipsos custodes?
- Unintentional curatorial modification
- Loss of institutional knowledge and skills
- Intentional curatorial deaccessioning
- Change in institutional mission
21 Institutional Preservation Strategies --
Corollaries
- There are potential single points of failure in
technology, organization, and legal regimes
- Diversify your portfolio: multiple software
systems, hardware, organizations (e.g., Data-PASS
:-))
- Find international partners
- Many combinations of preservation and
dissemination strategies are compatible
- Layer technologies and strategies
- Leverage dissemination (in a planned way) for
preservation (and vice versa)
- Preservation is impossible to demonstrate
conclusively
- Consider organizational credentials
- No organization is absolutely certain to be
reliable
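The "diversify your portfolio" corollary can be made concrete by keeping copies at several independent sites and periodically comparing them. The sketch below audits replicas by majority vote over checksums. It is a simplified illustration under stated assumptions: the site names and the `audit_replicas` function are hypothetical, and this is not the polling protocol actually used by LOCKSS or by any other system named in these slides.

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    """SHA-256 checksum of one replica's content."""
    return hashlib.sha256(content).hexdigest()

def audit_replicas(replicas):
    """Compare copies held at independent sites.

    replicas maps a site name to the bytes it holds, or None if the
    site has lost its copy. Returns (consensus_digest, sites_to_repair).
    If no strict majority agrees, consensus_digest is None and every
    site is flagged, because automated repair would be unsafe.
    """
    digests = {
        site: (digest(content) if content is not None else None)
        for site, content in replicas.items()
    }
    counts = Counter(d for d in digests.values() if d is not None)
    if not counts:
        return None, sorted(digests)
    top, votes = counts.most_common(1)[0]
    if votes <= len(replicas) / 2:
        return None, sorted(digests)
    return top, sorted(site for site, d in digests.items() if d != top)

if __name__ == "__main__":
    # Hypothetical sites; one copy has silently changed.
    copies = {"site_a": b"survey data", "site_b": b"survey data", "site_c": b"survey dat4"}
    consensus, repair = audit_replicas(copies)
    print(consensus is not None, repair)
```

Majority voting only helps when the sites fail independently, which is the reason to vary software, hardware, and organizations rather than simply adding copies on the same infrastructure.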
22 More Information
- Get a Dataverse at IQSS Now
- http://dvn.iq.harvard.edu/dvn
- Dataverse Network Software
- http://TheData.Org
- Data-PASS
- http://www.icpsr.umich.edu/DATAPASS/