Title: NDNP and CONTENTdm: A Match?
1. Ohio Newspaper Digitization Project
- Since 2008
Read all about it!
NDNP and CONTENTdm: A Match?
Today's Lead Article: The Ohio Newspaper Digitization Project: Read All About It
2. Connecting to collections: Ohio Newspaper Digitization Project
Read all about it!
- Since 2008
Two Men Suspected In Presentation
- Today's presenters
- Eric W. Schnittke, Project Coordinator
- Phil Sager, Digital Projects Developer
3. The National Digital Newspaper Program
- Enhance access to all American newspapers
- Improve products of United States Newspaper Program (USNP) using current technologies
- Establish standards and best practices for newspaper digital reformatting and access
- Develop geographically-diverse program that benefits all US communities
- Use multi-phased approach for research and scaled development
- http://www.neh.gov/projects/ndnp.html
4. GOSSIP: Ohio Newspaper Digitization Project
- The Ohio Historical Society's Involvement With NDNP
- Chosen in the summer of 2008
- Digitize 100,000 pages
- Covering 1880-1922
- Ten Ohio Regions, Advisory Board
- Vendors for digitization, duplication
- Start with sample roll, ramp up
5. What's the Buzz? Chronicling America!
- www.loc.gov/chroniclingamerica/
- LC's website with participants' entries
- Highly searchable
- Eleven states so far
- 1880-1910
- Start with sample roll, ramp up
7. http://www.loc.gov/chroniclingamerica/
9. How Hard Is It To Do Online Newspapers?
- Can range from the
- Crude to the sophisticated
- Easy to the complicated
10. Questions to Think About
- How much time, effort, money do you have to expend?
- How strictly would you like to adhere to the latest best practices and standards?
- There is a certain amount of tradeoff
11. NDNP Specification
- NDNP Spec: gold standard, with respect to newspapers on microfilm
- Digitization standards
- Metadata creation
- http://www.loc.gov/ndnp/pdf/NDNP_200911TechNotes.pdf for more info.
12. NDNP Specification
- Although in several ways it stops short of the ideal, for example:
- Page-level (vs. article-level) representation
- No added-value descriptive metadata beyond title/edition
- (E.g. manually keyed text such as birth and death notices, etc.)
13. Loading NDNP-formatted output into CONTENTdm
- Two Choices
- 1. OCLC NDNP loader software
- Advantages
- We wouldn't have to be involved much in the conversion process
- Yet still have some control over what assets and metadata are processed.
14. Loading NDNP-formatted output into CONTENTdm
- Two Choices
- 1. OCLC NDNP loader software
- Disadvantages
- New version still being worked on for CDMv5 (OHS NDNP data will be test-case)
- Fee-based
- License software (cf. OCR license)?
15. Loading NDNP-formatted output into CONTENTdm
- Two Choices
- 2. Vendor-prepared display files, plus tab file
- Advantages: Free
- Disadvantages
- Probably will require more back-and-forth to get mappings correct for tab file
- Batch upload with Project Client
- May rule out option of JP2-based word-bounded highlighting
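To make the "tab file" option more concrete, here is a minimal sketch of generating a tab-delimited metadata file, one row per page image, with field names in the first row. The column names (Title, Date, Page, File Name), the sample newspaper title, and the file names are all hypothetical; the actual columns would have to match whatever field mappings are agreed with the vendor and configured in the collection.

```python
import csv
import io

# Hypothetical column names; real ones must match the collection's field mappings.
FIELDS = ["Title", "Date", "Page", "File Name"]


def build_tab_file(pages):
    """Return tab-delimited text: a header row, then one row per page record."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for page in pages:
        writer.writerow(page)
    return buf.getvalue()


# Made-up sample records for illustration only.
pages = [
    {"Title": "Example Gazette", "Date": "1900-01-01", "Page": "1", "File Name": "0001.jp2"},
    {"Title": "Example Gazette", "Date": "1900-01-01", "Page": "2", "File Name": "0002.jp2"},
]

tab_text = build_tab_file(pages)
print(tab_text)
```

Generating the file from structured records, rather than editing it by hand, may reduce some of the back-and-forth over mappings, since a column rename is a one-line change.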
16. Vendor discussion points
- What metadata?
- Basic descriptive and structural metadata (e.g. title, issue, pagination, etc.)?
- Technical and administrative metadata?
- METS ALTO
- ALTO: Analyzed Layout and Text Object
- METS schema extension
- Used to wrap word coordinate metadata (and other page layout data)
- Any hope of using without the OCLC loader software?
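As a rough illustration of what "word coordinate metadata" looks like in practice, the sketch below reads a tiny ALTO-style document and returns the bounding boxes of words matching a search term, which is the raw material for word-bounded highlighting. The sample XML and its namespace URI are invented for this example; the String element and its CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes follow the ALTO schema. Because real files declare a version-specific namespace, the sketch matches on local element names only. This is not how CONTENTdm or the OCLC loader processes ALTO; it only shows that the data can be read with standard tools.

```python
import xml.etree.ElementTree as ET

# Invented minimal sample; the namespace URI is a placeholder, not a real ALTO namespace.
SAMPLE_ALTO = """<alto xmlns="http://example.org/alto-sample">
  <Layout>
    <Page ID="P1" HEIGHT="1000" WIDTH="800">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Ohio" HPOS="100" VPOS="200" WIDTH="60" HEIGHT="20"/>
            <SP/>
            <String CONTENT="News" HPOS="170" VPOS="200" WIDTH="70" HEIGHT="20"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def word_boxes(alto_xml, term):
    """Return (word, hpos, vpos, width, height) for each word equal to term, ignoring case."""
    root = ET.fromstring(alto_xml)
    hits = []
    for el in root.iter():
        if local_name(el.tag) != "String":
            continue
        word = el.get("CONTENT", "")
        if word.lower() == term.lower():
            box = tuple(float(el.get(k, "0")) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            hits.append((word,) + box)
    return hits


print(word_boxes(SAMPLE_ALTO, "ohio"))
```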
17. Vendor discussion points
- What files?
- Searchable PDFs only?
- JPEG2000s only?
- Both?
- Implications for
- Storage and backup
- Online user experience
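The storage implication of keeping one format versus both can be estimated with simple arithmetic. The sketch below is only a back-of-the-envelope calculator: the per-page file sizes are placeholder values, not measurements from this project, and the number of backup copies is likewise an assumption. The 100,000-page figure is the project target mentioned earlier.

```python
def total_storage_gb(pages, mb_per_page, copies=1):
    """Estimate storage in decimal GB.

    pages: number of newspaper pages
    mb_per_page: dict mapping format name to average MB per page
    copies: total copies kept (e.g. 2 = primary plus one backup)
    """
    per_page_mb = sum(mb_per_page.values())
    return pages * per_page_mb * copies / 1000


# Placeholder sizes for illustration only; substitute real vendor figures.
PLACEHOLDER_SIZES_MB = {"jp2": 5.0, "pdf": 1.0}

both = total_storage_gb(100_000, PLACEHOLDER_SIZES_MB, copies=2)
jp2_only = total_storage_gb(100_000, {"jp2": PLACEHOLDER_SIZES_MB["jp2"]}, copies=2)

print(f"Both formats: {both} GB; JP2 only: {jp2_only} GB")
```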
18. Ohio Jewish Chronicle (non-NDNP)
- Looked at two methods
- 1. Multi-page PDF method
- Advantages
- Like the Adobe Reader 9 viewer interface
- Flexible format for printing
19. Ohio Jewish Chronicle (non-NDNP)
- Looked at two methods
- 1. Multi-page PDF method
- Disadvantages
- Don't like the old Adobe Readers (e.g. v6)
- Need to perform second in-document search to see highlighted hits
- Slower to load
- Vendor cost per page higher
20. Ohio Jewish Chronicle (non-NDNP)
- Second method
- 2. TIFFs → JP2s with OCR
- Advantages
- Viewer is all server-side
- Can take advantage of OCR and word-bounding functionality built into the Project Client
- Vendor cost per page lower
21. Ohio Jewish Chronicle (non-NDNP)
- Second method
- 2. TIFFs → JP2s with OCR
- Disadvantages
- Viewer not as good (as Reader 9)
- (Institutions trying alternatives like Zoomify, etc.)
22. Future Newspaper Efforts
- NDNP spec is excellent, but demanding
- However, the difference in cost per page to produce NDNP-formatted output is substantial
- Preference is to outsource microfilm digitization and metadata creation
- Might be hard to convince some institutions (including our own) that the NDNP way is the best way to go
23. CLASSIFIEDS: Questions? Comments?
Wiki: http://ohsweb.ohiohistory.org/ondp
Ohio Memory: http://www.ohiomemory.org
Email us at eschnittke_at_ohiohistory.org, psager_at_ohiohistory.org