LIBR 557 Advanced Information Retrieval - PowerPoint PPT Presentation

Provided by: sheilapo

1
Web Trends and Issues
  • Looking to the near and far future of Web
    searching

2
Objective
  • To identify some of the important issues and
    future trends affecting web search

3
Outline
  • Web standards
  • Semantic Web
  • Z39.50
  • Folksonomy
  • Social Searching
  • Misinformation on the Web

4
Metadata standards for the Web
  • Metadata: data about data
  • Standards: rules on how data describes data
  • Web has grown without any standards in place

5
Goal of metadata on the Web
  • To impose a structure on web resources using a
    descriptive meta-language for the purpose of
    resource discovery and resource sharing
  • In other words, to make the web more searchable
    and sharable

6
XML: eXtensible Markup Language
  • XML is a meta-language: a language for creating
    other languages
  • Defines the rules for tagging web elements, but
    does not define the tags themselves
  • Customized tags can be created for any purpose

7
Why is XML so important?
  • XML is ideal for expressing metadata
  • Web browsers understand it
  • Flexible
  • Supports exchange of data between disparate
    sources (interoperability)
  • Relatively easy to learn
  • Non-proprietary
  • A W3C Recommendation since 1998

8
XML basic tagging rules
  • Starts with an XML declaration
  • <?xml version="1.0"?>
  • Tags must have an opening and a closing tag
  • <author>..</author>
  • Tags must be properly nested
  • <date><month>March</month><year>2004</year></date>
  • Tag attribute values must be in quotation marks
  • <description about="http://www.art.com">..</description>
  • Tags are case-sensitive
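
The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser as the checker; the helper name is_well_formed and the three broken samples are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The nesting example from the slide.
good = "<date><month>March</month><year>2004</year></date>"
# Illustrative violations of the rules (not from the slides).
bad_nesting = "<date><month>March</year></month></date>"
unquoted_attr = "<description about=http://www.art.com>..</description>"
case_mismatch = "<Author>..</author>"

for sample in (good, bad_nesting, unquoted_attr, case_mismatch):
    print(is_well_formed(sample), sample)
```

Only the first sample parses; each of the others breaks one of the rules listed on this slide.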

9
XML is an empty shell
  • On its own, XML is not a complete metadata
    framework
  • It must be used in conjunction with metadata
    languages in order to describe what documents are
    about

10
XML and RDF
  • RDF (Resource Description Framework) is a
    metadata syntax standard for how information
    elements on a web page are described
  • RDF consists of three parts:
  • resource
  • element type
  • value
  • http://www.w3c.org/rdf/

11
RDF might look like this
  • <description about="http://dstc.com.au/report.html">
  • <author>Jacky Crystal</author>
  • Where:
  • resource: http://dstc.com.au/report.html
  • element type: <author>
  • value: Jacky Crystal
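
As a rough illustration of the three parts, the sketch below reads the slide's example and returns (resource, element type, value) tuples. It assumes the standard xml.etree.ElementTree module; the closing description tag is added here so the snippet parses, and the function name statements is hypothetical.

```python
import xml.etree.ElementTree as ET

# The slide's example, with a closing tag added so that it is well-formed.
snippet = """
<description about="http://dstc.com.au/report.html">
  <author>Jacky Crystal</author>
</description>
""".strip()


def statements(xml_text):
    """Return one (resource, element type, value) tuple per child element."""
    root = ET.fromstring(xml_text)
    resource = root.get("about")
    return [(resource, child.tag, (child.text or "").strip()) for child in root]


for resource, element_type, value in statements(snippet):
    print("resource:", resource)
    print("element type:", element_type)
    print("value:", value)
```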

12
XML, RDF, and Schemas
  • RDF provides a structure for metatags, but does
    not say what those metatags should be called, e.g.
    <author> vs <creator>
  • A metadata schema is a controlled list of tags and
    their hierarchical relationship to one another
  • Dublin Core is a commonly used schema

13
Dublin Core metadata schema
  • First proposed in Dublin, Ohio in 1995
  • Metadata schema consisting of 15 core data
    elements and associated qualifiers
  • Not meant to be exhaustive; describes common
    information properties only
  • Often used in combination with other,
    discipline-specific schemas

14
Dublin Core Metadata Element Set

15
XML mark-up of Dublin Core elements in RDF format
  • <?xml version="1.0"?>
  • <RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105"
  • xmlns:DC="http://purl.org/DC">
  • <Description about="http://dstc.com.au/report.html">
  • <DC:Title>The Future of Metadata</DC:Title>
  • <DC:Creator>Jacky Crystal</DC:Creator>
  • <DC:Date>1998-01-01</DC:Date>
  • <DC:Subject>Metadata, RDF, Dublin Core</DC:Subject>
  • </Description>
  • </RDF>
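
To show why namespaced mark-up like this is machine-readable, here is a minimal sketch that pulls the Dublin Core elements out of the record above. It assumes Python's standard xml.etree.ElementTree; the namespace URIs are copied from the slide as written, and the function name dublin_core_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as they appear on the slide.
RDF_NS = "http://w3.org/TR/1999/PR-rdf-syntax-19990105"
DC_NS = "http://purl.org/DC"

record = f"""<?xml version="1.0"?>
<RDF xmlns="{RDF_NS}" xmlns:DC="{DC_NS}">
  <Description about="http://dstc.com.au/report.html">
    <DC:Title>The Future of Metadata</DC:Title>
    <DC:Creator>Jacky Crystal</DC:Creator>
    <DC:Date>1998-01-01</DC:Date>
    <DC:Subject>Metadata, RDF, Dublin Core</DC:Subject>
  </Description>
</RDF>"""


def dublin_core_fields(xml_text):
    """Map each DC element name (without prefix) to its text value."""
    root = ET.fromstring(xml_text)
    description = root.find(f"{{{RDF_NS}}}Description")
    fields = {"about": description.get("about")}
    prefix = f"{{{DC_NS}}}"
    for child in description:
        # ElementTree expands prefixes to "{namespace-uri}LocalName".
        if child.tag.startswith(prefix):
            fields[child.tag[len(prefix):]] = (child.text or "").strip()
    return fields


for name, value in dublin_core_fields(record).items():
    print(f"{name}: {value}")
```

Because the parser matches on the namespace URI rather than the DC prefix, the same code would work if another document bound that URI to a different prefix.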

16
XML in use in libraries
  • California Digital Library's eScholarship
    Editions: http://texts.cdlib.org/escholarship/
  • University of Buffalo's XML-based catalogue:
    http://ublin.lib.buffalo.edu/ub/netcat/
  • MARC-XML conversion services already exist
    (Stanford's Medlane Project)
  • ILS systems (Endeavor) and many databases already
    support XML (Medline, EBSCO)

17
XML in use for RSS feeds
  • Look for this symbol on sites
  • Content marked up in XML to create RSS feeds
    for syndication (distribution)
  • Usually used for news sites or blogs: sites where
    new content is added daily or hourly
  • Need to subscribe
  • Need an RSS reader to view; lots out there, most
    are free to download (look up RSS in Google's
    Web Directory for a good list)
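
What an RSS reader does with a feed can be sketched in a few lines. The example below assumes an RSS 2.0 style document and Python's standard xml.etree.ElementTree; the feed content, the example.org URLs, and the function name list_items are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical feed; titles and URLs are placeholders.
feed = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://example.org/news</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>New databases added</title>
      <link>http://example.org/news/1</link>
    </item>
    <item>
      <title>Holiday hours</title>
      <link>http://example.org/news/2</link>
    </item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Return (title, link) pairs for every item in the channel."""
    channel = ET.fromstring(rss_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in list_items(feed):
    print(f"{title} -> {link}")
```

A real reader would also fetch the feed over HTTP on a schedule and remember which items it has already shown; both are left out of this sketch.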

18
XML in use by search engines
  • Largely remains to be seen
  • The hope is that
  • Search options and performance will improve
  • Richer content will emerge from the invisible
    web in XML format and be searchable through
    search engines
  • e.g. library catalogues

19
For web standards to succeed, there must be
  • Collaboration and agreement on schemas
  • Widespread deployment
  • Crosswalks
  • Support by all stakeholders
  • Time and money
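
A crosswalk is a mapping from the elements of one schema to those of another. The sketch below shows the idea with a deliberately simplified, hypothetical mapping from a few MARC field tags to Dublin Core element names; published crosswalks are far more detailed, and the sample record and the function name crosswalk are invented for illustration.

```python
# Simplified, illustrative mapping only; not an official crosswalk.
MARC_TO_DC = {
    "245": "Title",    # title statement
    "100": "Creator",  # main entry, personal name
    "650": "Subject",  # topical subject heading
    "260": "Date",     # publication information (date only, in this sketch)
}


def crosswalk(marc_record):
    """Convert {MARC tag: [values]} into {Dublin Core element: [values]}."""
    dc_record = {}
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # fields with no mapping are dropped in this sketch
        dc_record.setdefault(element, []).extend(values)
    return dc_record


# A made-up record; tag 999 stands in for a local field with no mapping.
marc_record = {
    "245": ["The Future of Metadata"],
    "100": ["Crystal, Jacky"],
    "650": ["Metadata", "Dublin Core"],
    "999": ["local note"],
}

print(crosswalk(marc_record))
```

The dropped 999 field shows the usual cost of a crosswalk: anything the target schema cannot express is lost in translation.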

20
Pitfalls of metadata standards
  • There are too many of them
  • Slow to develop
  • It's complicated
  • Web still works without them
  • Decentralized (strength or weakness?)

21
2. Semantic Web
  • Semantic (si-'man-tik)
  • 1: of or relating to meaning in language
  • 2: of or relating to semantics (the study of
    meanings)

22
What is the Semantic Web?
  • Currently, just a vision
  • Brainchild of Tim Berners-Lee, inventor of the
    Web
  • Widespread use of RDF/XML and metadata languages
    on the Web to create machine-understandable data
  • In the Semantic Web, computers query and process
    information and can perform specific tasks for you

23
Searching the Semantic Web
  • Decentralized searching
  • Deploy agents to query the Web versus mere
    searching
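
The difference between querying and keyword searching can be sketched with a handful of statements in the resource / element type / value form. This is a toy, in-memory illustration only: the first two statements reuse values from the earlier slides, the third is hypothetical, and the function name query is an assumption rather than any real Semantic Web API.

```python
# Each statement is (resource, element type, value).
statements = [
    ("http://dstc.com.au/report.html", "author", "Jacky Crystal"),
    ("http://dstc.com.au/report.html", "title", "The Future of Metadata"),
    ("http://example.org/other.html", "author", "Jacky Crystal"),  # hypothetical
]


def query(data, resource=None, element_type=None, value=None):
    """Return statements matching every part that is not None (None = wildcard)."""
    return [
        (r, e, v)
        for (r, e, v) in data
        if (resource is None or r == resource)
        and (element_type is None or e == element_type)
        and (value is None or v == value)
    ]


# "Which resources have Jacky Crystal as author?"
for resource, _, _ in query(statements, element_type="author", value="Jacky Crystal"):
    print(resource)
```

A keyword search for "Jacky Crystal" would also return pages that merely mention the name; the query above returns only resources where the name is stated to be the author.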

24
Not everyone is convinced
  • Clay Shirky (www.shirky.com) believes the Semantic
    Web is deeply flawed
  • It describes a world where language is merely
    math done with words
  • Further, the Semantic Web is missing the point

25
It's the needle we want... sort of
  • A known needle in a known haystack
  • A known needle in an unknown haystack
  • An unknown needle in an unknown haystack
  • Any needle in a haystack
  • The sharpest needle in a haystack
  • Most of the sharpest needles in a haystack
  • All the needles in a haystack
  • Affirmation of no needles in the haystack
  • Things like needles in any haystack
  • Let me know whenever a new needle shows up
  • Where are the haystacks?
  • Needles, haystacks -- whatever.
  • - Dr. Matthew Koll

26
3. Z39.50
  • First conceived by the library community in the
    1960s-70s to create a national bibliographic
    network
  • Became the search and retrieval standard for IR
    systems
  • Began with the exchange of MARC records, an early
    metadata standard
  • Way ahead of Semantic Web

27
Key features
  • Ability to search and share resources across a
    distributed network of computers, regardless of
    native search interface
  • Offers a consistent view of information from a
    wide variety of sources
  • Supports complex Boolean, keyword, proximity,
    truncation and limit searching
  • One of the few tried and tested standards for
    shared semantic knowledge
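
The search features named above (Boolean, keyword, truncation) can be illustrated with a small in-memory matcher. This sketch does not speak the Z39.50 protocol; it only shows the query semantics, and the "?" truncation symbol, the function names, and the sample records are all assumptions.

```python
def term_matches(term, words):
    """Match one search term; a trailing '?' means right-hand truncation."""
    if term.endswith("?"):
        stem = term[:-1]
        return any(word.startswith(stem) for word in words)
    return term in words


def search(records, all_of=(), any_of=(), none_of=()):
    """Boolean keyword search: AND over all_of, OR over any_of, NOT over none_of."""
    hits = []
    for record in records:
        words = record.lower().split()
        if (
            all(term_matches(t, words) for t in all_of)
            and (not any_of or any(term_matches(t, words) for t in any_of))
            and not any(term_matches(t, words) for t in none_of)
        ):
            hits.append(record)
    return hits


# Made-up titles standing in for catalogue records.
records = [
    "Metadata for digital libraries",
    "Library cataloguing principles",
    "Web search engines",
]

print(search(records, all_of=["librar?"]))
print(search(records, all_of=["librar?"], none_of=["metadata"]))
```

What the standard adds is that the same query can be sent unchanged to many remote catalogues, each of which translates it for its own native search system.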

28
Lessons learned from Z39.50
  • Standards don't guarantee mutual agreement and
    understanding
  • Incorrect and inconsistent implementation by
    vendors
  • Slow performance
  • Key bits of information still not accessible:
    cannot access holdings information or item status
    via Z39.50

29
4. Folksonomy
  • Meanwhile, the world is self-metadating
  • Folksonomy is the means for people to tag
    objects using their own vocabulary so that it is
    easy for them to find that information again
  • Works best when natural terms are used, not what
    the person perceives will be used by others
  • Commonly referred to as tagging
  • Term coined by Thomas Vander Wal

30
Tagging sites
  • Flickr: www.flickr.com
  • Del.icio.us: http://del.icio.us
  • 43Things: www.43things.com
  • Real-time tag search engine
  • Technorati: www.technorati.com

31
Problems with self-metadating
  • Messy and uncontrolled
  • Flat namespace: no hierarchy
  • Synonyms
  • e.g. Mac and Macintosh and Apple
  • Ambiguity
  • e.g. ramblings, random thoughts, stuff, musings
    aren't that helpful
  • Spellings/Plurals
  • e.g. labor and labour; book and books
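
Some of these problems can be reduced after the fact by normalizing tags before counting them. The sketch below is one naive approach, assuming small hand-made lookup tables; the tables, the plural rule, and the function name normalize are illustrative and would need to be far larger and more careful in practice.

```python
from collections import Counter

# Hypothetical lookup tables, built from the examples on this slide.
SYNONYMS = {"mac": "apple", "macintosh": "apple"}
SPELLINGS = {"labour": "labor"}


def normalize(tag):
    """Lower-case a tag, unify spelling, strip a simple plural, map synonyms."""
    t = tag.strip().lower()
    t = SPELLINGS.get(t, t)
    # Naive plural rule: drop one trailing "s" from longer words.
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return SYNONYMS.get(t, t)


raw_tags = ["Mac", "Macintosh", "Apple", "labour", "labor", "book", "Books"]
print(Counter(normalize(t) for t in raw_tags))
```

Normalizing does nothing for ambiguity: mapping every "Mac" to "apple" still cannot tell the computer from the fruit, which is the kind of judgment a controlled vocabulary handles and a folksonomy leaves to its users.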

32
Plus side of self-metadating
  • Fast, easy, and already a high level of
    participation
  • Reflects real-world vocabulary
  • Not culturally exclusive
  • Direct sharing and communication with other users

33
5. Social searching
  • People helping people
  • Eurekster (www.eurekster.com) personalizes results
    based on the sites used by your network of
    friends or colleagues
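
The general idea of network-based personalization can be sketched as a re-ranking step. This is not Eurekster's actual method, which the slides do not describe; it is a minimal illustration in which results from sites your network has used are moved to the top, and the function name rerank and the example.org hosts are invented.

```python
from urllib.parse import urlparse


def rerank(results, network_sites):
    """Move results hosted on sites used by the network ahead of the rest.

    sorted() is stable, so the original ranking is kept within each group.
    """
    def used_by_network(url):
        return urlparse(url).netloc in network_sites

    return sorted(results, key=lambda url: not used_by_network(url))


# Hypothetical search results, in the engine's original order.
results = [
    "http://a.example.org/page",
    "http://b.example.org/page",
    "http://c.example.org/page",
]
# Hypothetical set of hosts visited by friends or colleagues.
network_sites = {"c.example.org"}

print(rerank(results, network_sites))
```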

34
Web logs or Blogs
  • Online journals with entries organized in reverse
    chronological order
  • Blogs link to other blogs, establishing networks
    of people and communities
  • Growing use of blog software for staff intranets
  • librarian.net: blog by Jessamyn West
  • misbehaving.net: women and technology blog

35
Personalization
  • Search results based on demographic information
  • Yahoo, AOL and Google have announced intentions
    to introduce personalization
  • e.g. Google Personalized:
    http://labs.google.com/personalized

36
Localization
  • Localized searches, e.g. www.metrobot.com/
  • Jon Udell's walking tour of his hometown:
    http://weblog.infoworld.com/udell/gems/gmap2_flash.html

37
6. Antisocial aspects of the Web...
  • Disinformation
  • martinlutherking.org
  • Identity theft
  • Seisint (owned by Lexis-Nexis), ChoicePoint data
    leaks
  • Charity scams, e.g. Nigerian Letter
  • Spoofs and Parodies
  • Lip Balm Anonymous: www.kevdo.com/lipbalm/
  • Counterfeit sites

40
7. Librarians of the future
  • Is finding information going to be easier in the
    future?
  • What skills will we need?
  • Do I need to understand programming?

41
Trust what you know
  • It's not new, just new lingo
  • Databases, databases, databases
  • Cataloguing principles
  • Classification
  • Indexing
  • Critical evaluation of resources
  • Creativity, curiosity, tenacity also help

42
Next week
  • Group presentations