WMES3103 : INFORMATION RETRIEVAL - PowerPoint PPT Presentation

About This Presentation
Title:

WMES3103 : INFORMATION RETRIEVAL

Slides: 18
Provided by: FSK6

Transcript and Presenter's Notes

1
WMES3103 INFORMATION RETRIEVAL
  • TEXT AND MULTIMEDIA LANGUAGES AND PROPERTIES

2
INTRODUCTION
  • Text - main form of communicating data and
    information
  • Text also supplemented with multimedia elements -
    to make the contents of an IRS more attractive
    and interactive
  • A website that combines text and multimedia
    tends to attract more visitors than one that
    is text-based only
  • In an IRS, text and multimedia are represented
    using special languages

3
Metadata
  • New concept in information: metadata
  • Information about data arrangement, data domain
    and the relationship between the two
  • Data about data
  • Two types: descriptive and semantic

4
  • Descriptive metadata: metadata that describes a
    document or a unit of information
  • Commonly used descriptive metadata:
  • Authors
  • Date of publication
  • Source of publication
  • Length of document
  • Type of document

5
(No Transcript)
6
Metadata
  • Semantic metadata: characterizes the subject
    matter, which can be obtained from the contents
    of the document
  • Subject headings
  • Keywords
  • LC code (Library of Congress classification)
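
A minimal sketch of how the two metadata types from the slides could sit together in one record. The class name, field names, and all sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentMetadata:
    # Descriptive metadata: facts about the document itself.
    authors: list[str]
    publication_date: str
    source: str
    length_pages: int
    doc_type: str
    # Semantic metadata: what the document is about.
    subject_headings: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    lc_code: str = ""

    def matches_keyword(self, term: str) -> bool:
        """Case-insensitive lookup against the semantic keywords."""
        return term.lower() in (k.lower() for k in self.keywords)


# Hypothetical sample record; none of these values come from the slides.
record = DocumentMetadata(
    authors=["A. Author"],
    publication_date="2001-01-01",
    source="Example Journal",
    length_pages=12,
    doc_type="article",
    subject_headings=["Information retrieval"],
    keywords=["Metadata", "Indexing"],
    lc_code="PLACEHOLDER",
)
```

An IRS could search the semantic fields while showing the descriptive fields in its result list.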

7
TEXT
  • With computers, we need to code text into binary
    digits
  • First coding schemes: EBCDIC (8 bits per symbol)
    and ASCII (7 bits per symbol)
  • Later, 8-bit extensions of ASCII (such as ISO
    Latin-1) were introduced to accommodate other
    languages, accents and diacritical marks
  • Oriental languages: Unicode, originally 16 bits
    per character
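
The coding schemes above can be compared with a short sketch using Python's built-in codecs. The sample strings are arbitrary, and UTF-16 stands in here for the "16-bit Unicode" on the slide.

```python
# "caf\u00e9" is "cafe" with an accented e; "\u6f22\u5b57" is two CJK characters.
accented = "caf\u00e9"
cjk = "\u6f22\u5b57"

plain_ascii = "cafe".encode("ascii")      # 7-bit ASCII: every byte is below 128
latin1 = accented.encode("latin-1")       # 8-bit extension: one byte per character
utf16_accented = accented.encode("utf-16-be")  # two bytes per character here
utf16_cjk = cjk.encode("utf-16-be")

# 7-bit ASCII cannot represent the accented character at all.
try:
    accented.encode("ascii")
    ascii_can_encode_accent = True
except UnicodeEncodeError:
    ascii_can_encode_accent = False
```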

8
TEXT
  • Formats
  • There is no single format for a text document
  • A good IRS should be able to retrieve
    information from any format
  • Early IRSs converted each document to an
    internal format, but this had many
    disadvantages
  • Now, many new formats have been developed for
    document interchange

9
TEXT
  • RTF: Rich Text Format, for word processing
  • PDF: Portable Document Format, for displaying and
    printing documents
  • PostScript: powerful programming language for
    drawing
  • MIME: Multipurpose Internet Mail Extensions, to
    encode e-mail
  • Compressed file formats: compress (Unix), ARJ
    (PCs), ZIP
  • Converting binary files to ASCII text:
    uuencode/uudecode, BinHex
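
A small sketch of the binary-to-ASCII conversion mentioned in the last bullet, using the standard-library `binascii` and `base64` modules. The payload bytes are arbitrary. Base64 is included because it is the encoding commonly used with MIME, and BinHex is not shown.

```python
import base64
import binascii

# Arbitrary binary payload, including bytes outside the ASCII range.
payload = bytes([0, 255, 128, 10, 200])

# uuencode one line (b2a_uu accepts at most 45 input bytes per call).
uu_line = binascii.b2a_uu(payload)

# Base64 encoding of the same payload.
b64_text = base64.b64encode(payload)

# Both encodings are reversible.
uu_roundtrip = binascii.a2b_uu(uu_line)
b64_roundtrip = base64.b64decode(b64_text)
```

Both encoded forms contain only ASCII characters, so they survive transport through channels that are safe only for text.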

10
MARKUP LANGUAGES
  • Markup: extra textual syntax that can be used to
    describe formatting actions, structure
    information, text semantics, attributes, etc.
  • Formal markup languages are more structured
  • Marks are called tags: an initial and an ending
    tag surround the marked text
  • Standard metalanguage: SGML
  • New metalanguage for the Web: XML (eXtensible
    Markup Language), a subset of SGML
  • Most popular markup language used for the Web:
    HTML (HyperText Markup Language)
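
To illustrate tags surrounding marked text, here is a minimal sketch that parses a tiny XML fragment with the standard-library `xml.etree.ElementTree`. The element names and content are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each piece of text sits between an initial and an ending tag.
doc = """<article lang="en">
  <title>Information Retrieval</title>
  <author>A. Author</author>
  <body>Text is the <em>main</em> form of communicating information.</body>
</article>"""

root = ET.fromstring(doc)

title = root.find("title").text                      # structure information
lang = root.get("lang")                              # an attribute on the tag
body_text = "".join(root.find("body").itertext())    # text with markup stripped
tags = [element.tag for element in root.iter()]      # tags in document order
```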

11
MULTIMEDIA
  • Applications that handle different types of
    digital data originating from distinct types of
    media
  • Text, sound, images, video
  • Data from each medium differs in volume,
    format, and processing requirements
  • Different formats are necessary for storing
    each type of medium

12
MULTIMEDIA
  • Different formats used commonly on the Web and in
    digital libraries
  • Images
  • Audio
  • Moving Images
  • Textual Images
  • Graphics and Virtual Reality

13
IMAGES
  • XBM, BMP, PCX: direct representation of a
    bit-mapped (or pixel-based) image
  • GIF (Graphics Interchange Format): includes
    compression and is good for black-and-white
    images or images with a small number of colours
    or gray levels (up to 256)
  • JPEG (Joint Photographic Experts Group):
    includes compression
  • TIFF (Tagged Image File Format): used to
    exchange documents between different
    applications and different computer platforms
  • TGA (Truevision Targa image file): associated
    with video game boards
  • Various other image formats
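
Since an IRS has to cope with several image formats, one simple approach is to recognize a file by its leading bytes. The sketch below is illustrative only: the function name and signature table are hypothetical, it covers just four of the formats listed, and it does not validate the rest of the file.

```python
# Well-known leading byte sequences for a few of the formats above.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_image_format(header: bytes) -> str:
    """Return a format name based only on the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```

In practice the header would come from reading the first few bytes of a file opened in binary mode.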

14
AUDIO
  • Must be digitized before storage
  • AU, MIDI (standard format to interchange music
    between electronic instruments and computers),
    WAVE for small pieces of digital audio
  • Audio libraries: RealAudio or CD formats
  • Animation or moving pictures:
  • MPEG (Moving Picture Experts Group): related to
    JPEG
  • Others: AVI, FLI, QuickTime
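
The first bullet, that audio must be digitized before storage, can be sketched as sampling a tone and writing it in WAVE format with the standard-library `wave` module. The sample rate, tone frequency, and sample count are assumed values chosen for brevity.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000   # samples per second (assumed)
FREQ_HZ = 440.0      # tone frequency (assumed)
N_SAMPLES = 80       # 10 ms of audio at 8000 samples per second


def digitize_tone() -> list[int]:
    """Sample a sine wave and quantize each sample to a signed 16-bit integer."""
    return [
        round(32767 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE))
        for i in range(N_SAMPLES)
    ]


samples = digitize_tone()

# Write the samples to an in-memory WAVE file: mono, 16 bits per sample.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(SAMPLE_RATE)
    wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))

wav_bytes = buffer.getvalue()
```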

15
TEXTUAL IMAGES
  • Images that contain mainly typed or typeset text
  • Obtained by scanning the documents
  • For archival purposes
  • Saved as images, but with further compression
  • Textual and non-textual regions are stored and
    compressed separately and, when needed, can be
    combined and displayed together

16
GRAPHICS AND VIRTUAL REALITY
  • 3-dimensional graphics found on Web
  • CGM (Computer Graphics Metafile): a standard
  • Metafile: a collection of elements
  • The CGM standard specifies which elements are
    allowed to occur in which positions in a metafile
  • VRML (Virtual Reality Modeling Language): file
    format for describing interactive 3D objects and
    worlds; a universal interchange format for 3D
    graphics and multimedia that can be used for
    various applications

17
MULTIMEDIA DOCUMENTS MARKUP
  • HyTime (Hypermedia/Time-based Structuring
    Language): standard defined for multimedia
    document markup
  • An SGML architecture that specifies the generic
    hypermedia structure of documents