Title: The Historical and Digital Brown and White
1 THE HISTORICAL AND DIGITAL Brown and White
Turning reels and paper into a CONTENTdm
collection with article-level access
Christine Guenther, Product Manager, Digital
Services, Bethlehem, PA (formerly OCLC
Preservation Service Center)
2 The Roadmap
Selection
ACCESS QUALITY
METADATA QUALITY
IMAGE QUALITY
3 Workflow Choices
Lehigh U Project
5 Evaluation of source material
Start with FILM or digitize ORIGINAL?
- Print master? Preservation-grade film?
Budget -> Scan from existing film
- Poor filming? Only poor copies? (color
content) -> Scan direct
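The questions on this slide can be read as a small decision rule. The sketch below is one possible reading, not the project's documented procedure; the class name, field names, and function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VolumeSource:
    """Hypothetical description of what exists for one volume."""
    has_preservation_film: bool  # print master or preservation-grade film exists
    film_quality_ok: bool        # filming is legible and complete
    has_color_content: bool      # color content is mentioned on the slide as a reason to scan direct


def choose_scan_source(volume: VolumeSource) -> str:
    """Return "film" or "original" for a volume.

    Assumed reading of the slide: scanning existing film is the
    budget-friendly default, while poor filming, poor copies, or
    color content argue for scanning the original directly.
    """
    if (volume.has_preservation_film
            and volume.film_quality_ok
            and not volume.has_color_content):
        return "film"
    return "original"


good_film = VolumeSource(True, True, False)
poor_film = VolumeSource(True, False, False)
print(choose_scan_source(good_film))
print(choose_scan_source(poor_film))
```

Applied volume by volume, a rule like this yields a mix of film scans and direct scans of originals.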
6 Hybrid approach: scan good film, use originals
for the other volumes
112 years, 114 volumes, about 55,000
pages. Production: 2007/2008.
7 Scanning Original B&W volumes
- Zeutschel 7000 bookscanner
- 400 dpi capture
- 8-bit grayscale
- Bound volumes vs. gutter text loss
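To give a feel for what 400 dpi, 8-bit grayscale capture means for storage, the arithmetic can be sketched as below. The page size of 11 x 17 inches is an assumption made only for this example (the slides do not state page dimensions), and the function name is hypothetical. The page count of about 55,000 comes from the project figures.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int = 400, bits_per_pixel: int = 8) -> int:
    """Uncompressed raster size in bytes for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# ASSUMPTION: 11 x 17 inch pages; substitute the real page dimensions.
per_page = uncompressed_bytes(11, 17)   # 4400 x 6800 pixels, 1 byte each
total = per_page * 55_000               # about 55,000 pages in the project

print(f"per page: {per_page / 1e6:.1f} MB")
print(f"all pages: {total / 1e12:.2f} TB")
```

These are uncompressed figures under the stated assumption; compressed delivery formats will be considerably smaller.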
8 Scanning Preservation Microfilm
- NextScan Eclipse
- 35mm roll film scanner
- Same specifications
- Pro/Con: Scan rate is high, but post-scan image
enhancements are necessary (deskew, crop, etc.)
9 Quality Assurance!
- Bethlehem Digital QA team
- View every image for proper capture and
completeness
- Collect issue date and page sequence data for all
pages
- Deal with naming irregularities typical for
student newspapers, or dupes on film
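Once issue dates and page sequences have been collected, part of the completeness check can be automated. The following is a minimal sketch, assuming the QA data for one issue is simply a list of page numbers in capture order; the function name and the data layout are hypothetical, not the QA team's actual tooling.

```python
from collections import Counter


def check_issue(pages: list[int]) -> dict:
    """Report duplicated and missing page numbers for one issue.

    `pages` holds the page numbers recorded during QA, in capture order.
    Pages are assumed to be numbered from 1 up to the highest number seen.
    """
    counts = Counter(pages)
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    expected = range(1, max(pages) + 1) if pages else range(0)
    missing = [p for p in expected if p not in counts]
    return {"duplicates": duplicates, "missing": missing}


# Page 2 was filmed twice and page 3 is absent.
print(check_issue([1, 2, 2, 4]))
```

A check like this only flags candidates; an operator still has to look at the images, since a duplicate frame on film may be the better capture of the two.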
10 From pixels to content
11 Beyond OCR
- Challenge: OCR is cost-effective, but only as
accurate as the quality and appearance of the text
in the source file
- Simple OCR is not reliable for recognizing
specific metadata such as issue date, and is blind
to document structure (articles)
- Used metadata elements similar to NDNP guidelines
(including LCCN, geographic coverage, etc.)
- Advanced search goes beyond full-text
search, offering access points for discovery.
12 Metadata collection - Results
13 Content conversion
- CCS docWORKS software
- CCS developed the schema in cooperation with 12
EU and US libraries during the EU-funded METAe
project.
- ALTO: Analyzed Layout and Text Object
- The Library of Congress chose CCS's ALTO schema
for the National Digital Newspaper Program (NDNP)
14 METS/ALTO
- METS: Contains all digital preservation data
(bibliographical, administrative, technical,
structural). ONE PER ISSUE
- Passport for digital preservation
- ALTO: Contains layout information and OCR results;
each word is mapped to a specific location in
an image. ONE PER PAGE
- Can also include article-level information
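The mapping of each word to a location in the image can be illustrated with a short sketch. It reads String elements and their CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, which are part of the ALTO schema. The sample document is invented for this example and omits the namespace declaration that real ALTO files carry; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented, simplified sample; real ALTO files declare a version-specific namespace.
SAMPLE_ALTO = """
<alto>
  <Layout>
    <Page ID="P1" WIDTH="4400" HEIGHT="6800">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Lehigh" HPOS="120" VPOS="200" WIDTH="310" HEIGHT="60"/>
            <SP/>
            <String CONTENT="wins" HPOS="450" VPOS="200" WIDTH="190" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}String' down to 'String'."""
    return tag.rsplit("}", 1)[-1]


def words_with_boxes(alto_xml: str) -> list[tuple]:
    """Return (text, hpos, vpos, width, height) for every String element."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) == "String":
            words.append((
                element.get("CONTENT"),
                float(element.get("HPOS")),
                float(element.get("VPOS")),
                float(element.get("WIDTH")),
                float(element.get("HEIGHT")),
            ))
    return words


def find_word(words: list[tuple], term: str) -> list[tuple]:
    """Case-insensitive exact match, returning the hits with their boxes."""
    return [w for w in words if w[0].lower() == term.lower()]


words = words_with_boxes(SAMPLE_ALTO)
print(find_word(words, "lehigh"))
```

The coordinates returned for a hit are what allow a viewer to highlight the search term on the page image.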
15 Summary: The data package per page
access
- Best Image in standard format
- SUSTAIN
- REPROCESS(?)
- Well suited for oversize content
- QUICK ACCESS w/ DETAIL
- De-facto standard
- PRINT
- CONTAINED
- Searchability (XML)
- FULL TEXT
CCS docWORKS
16 Extra feature: Segmentation
- Layout analysis
- Manual correction
- Article jumps
- Headline correction
17 Production floor in India
18 Headline correction
19 Online presentation system: CONTENTdm
20 CONTENTdm Collection Building
- Import METS/ALTO, JPEG2000, PDF: Compound
Objects (one per issue)
- Troubleshooting
- Quality Assurance
21 Thank you!
- For further information
- Christine Guenther
- Backstage Library Works, 9 South Commerce
Way, Bethlehem, PA 18017
- 1-800-773 7222
- guenthec_at_oclc.org
- Lara Henry
- Sales Representative
- lara_at_bslw.com
- 1-800-288-1265
- www.bslw.com