Document Objects: Their Formats, Storage and Processing



1
Document Formats and Their Characteristics
T.B. Rajashekar
National Centre for Science Information
Indian Institute of Science, Bangalore 560 012
(E-mail: raja_at_ncsi.iisc.ernet.in)
2
About This Presentation
  • Goal: To understand the different characteristics of
    document formats
  • Related reference:
      • Selecting Electronic Document Formats, by Gary
        Cleveland, IFLA UDT Programme, August 1999.
        www.ifla.org/VI/5/op/udtop11/udtop11.htm

3
Why Study E-Documents?
  • E-documents are at the core of digital libraries
    (DLs)
  • Irrespective of the variety of metadata, search, and
    browse features supported by a DL, users are
    ultimately interested in gaining access to
    e-documents
  • Issues related to their creation, storage, and
    retrieval are therefore of key importance

4
E-document Formats
  • Characteristics of an e-document are determined
    by its format
  • These characteristics also determine the
    processes and technologies used for their
    creation, storage, indexing, retrieval, and their
    appropriateness for use in DLs
  • Understanding these characteristics is useful in
    assessing the importance of different e-document
    formats

5
Characteristics of E-document Formats
  • Machine-readability
  • Multi-lingual characters
  • Layout retention
  • Editability
  • File size
  • Multiple-page
  • Structured or non-structured
  • Multimedia
  • Support for links
  • Screen display
  • Printing
  • Resource overheads
  • Degree of usage
  • Cost

6
Characteristics
  • Machine-readability
      • Ability to recognize text within a document
      • Required for indexing and searching
      • Many formats need special processing to identify
        or extract text
  • Display of multi-lingual characters
      • Ability to display characters of different
        languages
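
The link between machine-readability and indexing can be illustrated with a minimal sketch. It assumes the text of each document is already available as a string; the function names and sample documents are hypothetical and not part of the original presentation.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # \w+ matches runs of letters, digits, and underscores
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the sorted ids of documents containing the word."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample documents. A scanned page image would contribute
# nothing here until its text has been extracted (for example by OCR).
docs = {
    "doc1": "Digital libraries store e-documents",
    "doc2": "Document formats affect indexing",
}
index = build_index(docs)
print(search(index, "indexing"))  # ['doc2']
```

A format that is not machine-readable gives `build_index` no text to work with, which is why such formats need special processing before they can be searched.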

7
Characteristics
  • Layout retention
      • Degree to which a document preserves the look of
        the original document
  • Editability
      • Does the format allow editing of its content, and
        how easy or difficult is it?
  • Multiple-page
      • Does the format support inclusion of all pages of
        a document within the same file?

8
Characteristics
  • File size
      • The same document rendered in different formats
        occupies different amounts of storage
      • Related issues: storage space, server speed,
        bandwidth (server and user)
  • Structured or non-structured
      • Explicit identification of document elements like
        author, title, chapters, sections, etc.

9
Characteristics
  • Multimedia
      • Does the format support multimedia?
      • Does our material require such support?
  • Support for links
      • Linking capability: compound documents,
        multimedia
  • Screen display
      • Is the document to be primarily read online?
        (e.g. website pages)

10
Characteristics
  • Printing
      • How well and how easily the format prints on
        paper
  • Resource overheads
      • Tools required for creation
      • Complexity of preparation
      • Skills needed for preparation

11
Characteristics
  • Degree of usage
      • How commonly the format is used
  • Cost
      • Cost of creation and maintenance: software,
        hardware, staff, training, available funding

12
Common E-document Formats
  • Image formats
      • TIFF
      • GIF
      • JPEG
  • Basic text formats
      • ASCII
      • Unicode
      • RTF
  • Presentation formats
      • PostScript
      • PDF
  • Structured formats
      • HTML
      • SGML
      • XML
  • Audio and video formats
      • Audio: Wave, RealAudio, MP3
      • Video: AVI, MPEG
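
Several of the formats listed above can be recognized from the first few bytes of a file. The following is a minimal sketch, not a complete detector: the table covers only some of the formats, the function name is hypothetical, and plain-text formats such as ASCII, HTML, and XML carry no reliable signature.

```python
# File signatures ("magic numbers") for some of the formats listed above.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"RIFF", "RIFF container (Wave or AVI)"),
    (b"{\\rtf", "RTF"),
]

def guess_format(header: bytes) -> str:
    """Guess a format name from the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

print(guess_format(b"%PDF-1.4\n"))   # PDF
print(guess_format(b"plain text"))   # unknown
```

In practice the header would be read from a file opened in binary mode, for example with `open(path, "rb").read(8)`.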

13
Image Formats
  • Used to display digital images of text pages,
    photographs, illustrations, artwork, and other
    graphical material.
  • Digitization produces digital images of paper
    pages.
  • Common image formats in use include TIFF, GIF,
    and JPEG.
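
To give a rough sense of why file size matters for digitized pages, the sketch below estimates the uncompressed size of a scanned page and compares it with the same page as plain text. The page size (8.5 x 11 inches), the 300 dpi resolution, and the 3,000 characters per page are assumptions chosen for illustration; real files are usually smaller because image formats apply compression, and row padding is ignored.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

bitonal = raw_image_bytes(8.5, 11, 300, 1)   # black-and-white scan
color = raw_image_bytes(8.5, 11, 300, 24)    # 24-bit color scan
text = 3000  # assumed characters per page, one byte each

print(bitonal)  # 1051875
print(color)    # 25245000
print(text)     # 3000
```

Under these assumptions the bitonal image is about 350 times larger than the plain text of the same page, and the color image is larger again by a factor of 24.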

14
Basic Text Formats
  • Simplest form of electronic documents
  • Contain only a simple string of characters and
    are devoid of more complex information
    (structure, layout, diagrams, etc.)
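  • ASCII (7-bit, 128 characters, basic Latin; 8-bit
    extensions add further Latin characters), Unicode
    (originally 16-bit with 65,536 code points, since
    extended; covers all major written languages)
  • RTF (Microsoft's text format with some formatting
    support)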
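
The practical difference between ASCII and Unicode can be shown by encoding the same strings both ways. This is a minimal sketch using Python's built-in codecs; the helper name and the sample words are hypothetical.

```python
def encoded_size(text, encoding):
    """Return the byte length of text in the given encoding,
    or None if the text cannot be represented in it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

for sample in ["Library", "книга"]:
    for encoding in ["ascii", "utf-8", "utf-16-be"]:
        print(sample, encoding, encoded_size(sample, encoding))
```

The Latin word fits in ASCII at one byte per character, while the Cyrillic word cannot be encoded in ASCII at all and needs a Unicode encoding. The 16-bit encoding doubles the size of the Latin text, which ties back to the file size characteristic.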

15
Presentation Formats
  • Formats that have been developed for on-screen
    display or printing.
  • Also called page description formats.
  • They are typically static, single-file formats
    that do not contain any structure information.
  • Common presentation formats: Adobe PostScript and
    Adobe PDF.

16
Structured Formats
  • Formats that support explicit tagging of document
    elements
  • Formats under this category:
      • SGML (Standard Generalized Markup Language)
      • XML (Extensible Markup Language)
      • HTML (HyperText Markup Language)
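
Explicit tagging is what lets software pick out elements such as title and author directly. The sketch below uses Python's standard `xml.etree.ElementTree` module; the element names (`article`, `title`, `author`, `section`, `heading`) are hypothetical and not taken from any particular schema.

```python
import xml.etree.ElementTree as ET

# A small hypothetical document with explicitly tagged elements.
SAMPLE = """<article>
  <title>Document Formats and Their Characteristics</title>
  <author>T.B. Rajashekar</author>
  <section><heading>Image Formats</heading></section>
  <section><heading>Basic Text Formats</heading></section>
</article>"""

def extract_fields(xml_text):
    """Pull the tagged title, author, and section headings out of the XML."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "author": root.findtext("author"),
        "sections": [h.text for h in root.iter("heading")],
    }

fields = extract_fields(SAMPLE)
print(fields["author"])    # T.B. Rajashekar
print(fields["sections"])  # ['Image Formats', 'Basic Text Formats']
```

With an unstructured format, the same information would have to be guessed from layout or position in the text.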

17
Table giving detailed comparison of
characteristics of different document formats