Digital Archives at the National Library of Medicine

Slides: 48
Provided by: diane67
Learn more at: https://www.nlm.nih.gov
Transcript and Presenter's Notes

1
Digital Archives at the National Library of
Medicine
  • A presentation at the MLA Session
  • Lighting the Path: Digital Repositories in the
    Real World
  • May 24, 2004
  • by Diane Boehr
  • Cataloging Unit Head, National Library of
    Medicine,
  • National Institutes of Health,
  • Health and Human Services
  • boehrd_at_mail.nlm.nih.gov

2
Scope
  • Historical medical works
  • The NLM Archive
  • PubMed Central

3
Considerations as you begin a project
  • It will take much longer than you anticipate
  • You will learn a great deal about topics outside
    your normal work duties
  • Be willing to take baby steps and make a start
  • It is very rewarding to see the fruits of your
    labor

4
HMD Projects
  • Historical Anatomies
  • Medicine in the Americas

5
Historical Anatomies
  • http://www.nlm.nih.gov/exhibition/historicalanatomies/home.html
  • Provides high-resolution downloadable scans of
    selected important images from illustrated
    anatomical atlases dating from the 15th to the
    20th century
  • Titles and images selected by Michael North, Head
    of Rare Books and Early Manuscripts

6
Historical Anatomies
  • Consists of large JPEGs and zoomable digitized
    images from the books and a brief bibliographical
    and historical introduction to each title

7
Technical details
  • The imaging for this project is contracted out
  • The contractor makes archival quality TIFF files
    (800 ppi resolution) and from that, thumbnail and
    JPEG images are made for the site, using Adobe
    Photoshop
  • Zoomifyer Pro is used to create the pan and zoom
    images
  • The TIFF files are backed up on CD-ROMs
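
As a rough illustration of the derivative step above, the sketch below works out pixel dimensions for a thumbnail and a Web JPEG from an 800 ppi master. The page size, the target widths, and the function names are assumptions made for illustration; the slides do not give the contractor's actual settings.

```python
from __future__ import annotations


def master_pixels(width_in: float, height_in: float, ppi: int = 800) -> tuple[int, int]:
    """Pixel dimensions of a scan of a page of the given size in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def fit_width(size: tuple[int, int], target_width: int) -> tuple[int, int]:
    """Scale (width, height) down to target_width, keeping the aspect ratio."""
    width, height = size
    if target_width >= width:
        return size  # never upscale a derivative beyond the master
    return target_width, round(height * target_width / width)


# Hypothetical 10 x 15 inch atlas plate scanned at 800 ppi.
master = master_pixels(10, 15)
thumbnail = fit_width(master, 150)   # assumed thumbnail width
web_jpeg = fit_width(master, 1200)   # assumed Web JPEG width
print(master, thumbnail, web_jpeg)
```

Keeping the sizing logic separate from the image editor makes it easy to check that every derivative preserves the proportions of the archival master.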

8
Search and retrieval
  • Individual images do not have any metadata
    associated with them at this time
  • Bibliographic citations on the site match the
    LocatorPlus records
  • As the focus of the site is selected individual
    images from the books, rather than the entire
    text, there are currently no links from the
    LocatorPlus records for the individual titles to
    images on the Web site

9
Sample screen
10
Medicine in the Americas
  • Monographic original source materials on the
    development of medicine in the New World published
    prior to 1914 are being digitized in their
    entirety
  • (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books)

11
Technical details
  • Digitizing is being done in-house
  • Books are scanned, and from the initial scan a
    photocopy and a TIFF file are created
  • Photocopies are scanned to create OCR Word text
    files, which are then manually reviewed and
    cleaned up to create a searchable, downloadable
    PDF text in a modern font
  • The TIFF file is used to reproduce the typeface and
    layout of the original published work

12
Technical details
  • Mounting of these texts on the Web and the XML
    coding of the Word files are done using the NLM
    Bookshelf platform
  • Bookshelf was developed by NCBI for medical texts
    supplied by publishers in SGML or other desktop
    publishing formats
  • Platform has an existing template that allows the
    record creators to easily input metadata without
    needing to know XML

13
Search and Retrieval
  • Bookshelf site only supports keyword searching
  • Standard bibliographic data from LocatorPlus and
    brief historical data are included with the text
  • Catalog records have hot links to the Bookshelf
    site

14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
(No Transcript)
18
Timeframes
  • Both projects went from planning to
    implementation in about one year, although both
    projects will be adding more material to their
    sites
  • Use of standard, off-the-shelf products or
    existing technologies made implementation easier

19
NLM Archives
  • A site to store material of permanent value that
    has been published on the NLM Web site, but is
    now outdated or superseded
  • Searchable, yet clearly distinguished from
    current material

20
What do we mean by permanent?
  • Three aspects to permanence were identified
  • 1) Identifier validity: The extent to which the
    given name or identifier will always provide
    access to the same resource
  • 2) Resource availability: The extent to which a
    given resource is guaranteed to remain available
    in electronic form
  • 3) Content invariability: The extent to which the
    content of the resource could change

21
NLM Permanence Ratings
  • Four categories of permanence have been defined
  • 1) Permanent, unchanging content: NLM has made
    a commitment to keep this resource permanently
    available. Its identifier will always provide
    access to the resource. Its content will not
    change.

22
NLM Permanence Ratings
  • 2) Permanent, stable content: NLM has made a
    commitment to keep this resource permanently
    available. Its identifier will always provide
    access to the resource. Its content is subject
    only to minor corrections or additions.

23
NLM Permanence Ratings
  • 3) Permanent, dynamic content: NLM has made a
    commitment to keep this resource permanently
    available. Its identifier will always provide
    access to the resource. Its content could be
    revised or replaced.

24
NLM Permanence Ratings
  • 4) Permanence not guaranteed: NLM has made no
    commitment to retain this resource. It could
    become unavailable at any time. Its identifier
    could be changed.
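
One way to picture how the four ratings relate to the three aspects of permanence is as a small lookup structure. The sketch below is only an illustration: the class and field names are invented here, and the content value for the fourth rating is marked "unspecified" because the slides do not state it.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Guarantees:
    identifier_valid: bool     # identifier will always provide access
    resource_available: bool   # NLM commits to keeping the resource available
    content_change: str        # "none", "minor", "any", or "unspecified"


class Permanence(Enum):
    # Values follow the wording of the four rating definitions.
    PERMANENT_UNCHANGING = Guarantees(True, True, "none")
    PERMANENT_STABLE = Guarantees(True, True, "minor")
    PERMANENT_DYNAMIC = Guarantees(True, True, "any")
    NOT_GUARANTEED = Guarantees(False, False, "unspecified")


def is_permanent(rating: Permanence) -> bool:
    """True when both the identifier and availability are guaranteed."""
    g = rating.value
    return g.identifier_valid and g.resource_available


for rating in Permanence:
    print(rating.name, is_permanent(rating), rating.value.content_change)
```

Read this way, the first three ratings differ only in how much the content may change, while the fourth drops the identifier and availability commitments altogether.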

25
Workflows
  • Permanence ratings are assigned when a resource
    is promoted to the NLM Web site
  • Default permanence ratings are generated based on
    the category to which the resource belongs
  • Resource creators use a template which adds basic
    metadata, in addition to the category and
    permanence rating
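
The default-rating step might look something like the sketch below. The category names, the fallback rating, and the ability to override the default are all assumptions for illustration; the slides do not list NLM's actual categories or describe how TeamSite stores them.

```python
from __future__ import annotations

# Hypothetical categories mapped to the four NLM rating labels.
DEFAULT_RATING = {
    "annual_report": "Permanent, unchanging content",
    "fact_sheet": "Permanent, dynamic content",
    "meeting_announcement": "Permanence not guaranteed",
}
FALLBACK = "Permanence not guaranteed"  # assumed rating for unknown categories


def build_record(title: str, category: str, rating: str | None = None) -> dict[str, str]:
    """Return basic metadata; an explicit rating overrides the category default."""
    return {
        "title": title,
        "category": category,
        "permanence": rating or DEFAULT_RATING.get(category, FALLBACK),
    }


print(build_record("Sample Fact Sheet", "fact_sheet"))
print(build_record("Sample Report", "annual_report", "Permanent, stable content"))
```

Generating the rating from the category means a resource creator only has to pick a category, which fits the goal of a minimal template.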

26
Templates
  • Metadata input template is a feature of TeamSite,
    our Web content management software
  • No knowledge of HTML is needed to use these
    templates
  • Minimal set of required fields, with default
    values or drop-down menus supplied wherever
    possible

27
Required metadata
28
(No Transcript)
29
  • The NLM metadata set is based on Dublin Core,
    with some local adaptations
  • The full scheme may be seen at
  • http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html

30
Workflows
  • Every resource has the minimal metadata assigned
    by the resource creator
  • Permanent resources are routed to the Cataloging
    Section
  • Complete MARC bibliographic records are created
  • Includes standardized access points, including
    MeSH and an NLM classification number
  • Accessible in LocatorPlus
  • Distributed to the utilities and other NLM
    licensees.

31
Workflows
  • The enhanced metadata created in Cataloging is
    then added back to the header information of the
    online resource
  • Preliminary metadata and the enhanced versions
    can be seen by clicking on "View source"
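
Metadata that is visible through "View source" is typically carried in HTML meta elements in the page header. The sketch below shows one common way of writing Dublin Core elements that way. The sample values and the helper name are assumptions; the actual NLM element set, including its local adaptations, is the one documented at the URL given earlier.

```python
from __future__ import annotations

from html import escape


def meta_tags(metadata: dict[str, str]) -> str:
    """Render name/value pairs as HTML meta elements, one per line."""
    return "\n".join(
        f'<meta name="{escape(name, quote=True)}" content="{escape(value, quote=True)}">'
        for name, value in metadata.items()
    )


# Hypothetical basic metadata entered by the resource creator.
basic = {"DC.Title": "Sample Fact Sheet", "DC.Date": "2004-05-24"}

# Hypothetical enhancement added in Cataloging (a MeSH subject heading).
enhanced = {**basic, "DC.Subject": "Libraries, Medical"}

print(meta_tags(basic))
print(meta_tags(enhanced))
```

Escaping the values matters because titles and headings can contain quotation marks or ampersands that would otherwise break the header markup.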

32
(No Transcript)
33
(No Transcript)
34
Basic metadata
35
(No Transcript)
36
Enhanced metadata
37
Archive Design
  • Separate, distinct, but integral part of the NLM
    Web site
  • Searchable with standard NLM search software,
    Mindserver from Recommind

38
Archive contents
  • Out-of-date resources--older material that was
    once up on the site, but is no longer of current
    interest
  • Earlier versions of current documents that have
    undergone major revisions

39
(No Transcript)
40
(No Transcript)
41
(No Transcript)
42
Still to come
  • Archiving non-HTML files, such as PDF, video and
    audio clips, etc.
  • Archiving resources from areas in the library
    which do not get promoted through TeamSite

43
Impact on Cataloging
  • PubMed Central (PMC)
  • A bibliographic record must exist in the NLM
    catalog before a journal is added to PMC
  • Records must be created if the title is not
    already in the catalog, either downloaded from
    OCLC or built as a skeletal record from a local
    template
  • High-priority, 24 hr. turnaround time
  • Records are then fully cataloged

44
Impact on Cataloging
  • PMC
  • If the title is already in the catalog, holdings
    must be updated
  • Indicate the title is available in PMC
  • Range of issues
  • Any embargo periods
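
The PMC cataloging rules on these slides amount to a small decision procedure, sketched below. The function and parameter names are invented for illustration, and the choice between OCLC and the local template is assumed to depend on whether an OCLC record exists.

```python
from __future__ import annotations


def pmc_cataloging_steps(in_catalog: bool, in_oclc: bool = False) -> list[str]:
    """List the cataloging steps needed before a journal is added to PMC."""
    if in_catalog:
        # Title already has a record: only the holdings need updating.
        return [
            "update holdings: PMC availability, range of issues, embargo period",
        ]
    # No record yet: create one quickly, then complete it later.
    first_step = (
        "download record from OCLC"
        if in_oclc
        else "create skeletal record from local template"
    )
    return [first_step, "fully catalog the record"]


print(pmc_cataloging_steps(in_catalog=True))
print(pmc_cataloging_steps(in_catalog=False, in_oclc=True))
print(pmc_cataloging_steps(in_catalog=False))
```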

45
Impact on Cataloging
  • NLM Archive
  • Cataloger creates core level MARC records for any
    new resource on the NLM Web site rated Permanent
  • View the site, as well as utilize metadata
    supplied by record creator for descriptive data
  • Supply MeSH and NLM classification
  • Establish authorized name headings in the
    national authority file
  • Transfer this enhanced metadata back to the
    resource

46
Impact on Cataloging
  • HMD projects
  • Minimal impact on Cataloging
  • Books being digitized already have records in the
    catalog
  • HMD has its own cataloging staff who can make
    links between existing catalog records and
    digitized material

47
Impact on Cataloging
  • Despite the increased workload, we think
    archiving projects are enhanced when catalogers
    are involved in the projects
  • Catalogers increase their knowledge by becoming
    involved in these projects