Title: LIBR 557 Advanced Information Retrieval
1Web Trends and Issues
- Looking to the near and far future of Web
searching
2Objective
- To identify some of the important issues and
future trends affecting web search
3Outline
- Web standards
- Semantic Web
- Z39.50
- Folksonomy
- Social Searching
- Misinformation on the Web
4Metadata standards for the Web
- Metadata: data about data
- Standards: rules on how data describes data
- Web has grown without any standards in place
5Goal of metadata on the Web
- To impose a structure on web resources using a
descriptive meta-language for the purpose of
resource discovery and resource sharing
- In other words, to make the web more searchable
and sharable
6XML: eXtensible Markup Language
- XML is a meta-language: a language for creating
other languages
- Defines the rules for tagging web elements, but
does not define the tags themselves
- Customized tags can be created for any purpose
7Why is XML so important?
- XML is ideal for expressing metadata
- Web browsers understand it
- Flexible
- Supports exchange of data between disparate
sources (interoperability)
- Relatively easy to learn
- Non-proprietary
- Becoming the standard; a W3C Recommendation since
1998
8XML basic tagging rules
- Starts with an XML declaration
- <?xml version="1.0"?>
- Tags must have an opening and a closing tag
- <author>..</author>
- Tags must be properly nested
- <date><month>March</month><year>2004</year></date>
- Attribute values must be in quotation marks
- <description about="http://www.art.com">..</description>
- Tags are case-sensitive
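The tagging rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are illustrative and not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples (assumptions), one per rule on the slide.
samples = {
    "declaration + proper nesting": (
        '<?xml version="1.0"?>'
        "<date><month>March</month><year>2004</year></date>"
    ),
    "missing closing tag": "<author>Jacky Crystal",
    "improper nesting": "<date><month>March</date></month>",
    "unquoted attribute": "<description about=http://www.art.com>..</description>",
    "case mismatch": "<Author>Jacky Crystal</author>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample should be accepted. Note that this parser does not insist on the XML declaration itself, so that rule is a convention the check cannot enforce.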
9XML is an empty shell
- On its own, XML is not a complete metadata
framework
- It must be used in conjunction with metadata
languages in order to describe what documents are
about
10XML and RDF
- RDF (Resource Description Framework) is a
metadata syntax standard for how information
elements on a web page are described
- RDF consists of three parts
- resource
- element type
- value
- http://www.w3c.org/rdf/
11RDF might look like this
- <description about="http://dstc.com.au/report.html">
- <author>Jacky Crystal</author>
- Where
- resource: http://dstc.com.au/report.html
- element type: <author>
- value: Jacky Crystal
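To make the three parts concrete, here is a minimal sketch that reads the description from this slide and lists each statement as a (resource, element type, value) tuple. It assumes Python's standard xml.etree.ElementTree; the function name triples is hypothetical, and a closing </description> tag has been added so the snippet parses.

```python
import xml.etree.ElementTree as ET

# The slide's example, with a closing tag added (an assumption) so it parses.
snippet = (
    '<description about="http://dstc.com.au/report.html">'
    "<author>Jacky Crystal</author>"
    "</description>"
)


def triples(xml_text):
    """Return (resource, element type, value) tuples for one description."""
    root = ET.fromstring(xml_text)
    resource = root.get("about")
    return [(resource, child.tag, (child.text or "").strip()) for child in root]


for resource, element_type, value in triples(snippet):
    print("resource:", resource)
    print("element type:", element_type)
    print("value:", value)
```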
12XML, RDF, and Schemas
- RDF provides a structure for metatags, but does
not say what those metatags should be called, e.g.
<author> vs <creator>
- A metadata schema is a controlled list of tags and
their hierarchical relationship to one another
- Dublin Core is a commonly used schema
13Dublin Core metadata schema
- First proposed in Dublin, Ohio in 1995
- Metadata schema consisting of 15 core data
elements and associated qualifiers
- Not meant to be exhaustive; describes common
information properties only
- Often used in combination with other,
discipline-specific schemas
14Dublin Core Metadata Element Set
15XML mark-up of Dublin Core elements in RDF format
- <?xml version="1.0"?>
- <RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105"
- xmlns:DC="http://purl.org/DC">
- <Description about="http://dstc.com.au/report.html">
- <DC:Title>The Future of Metadata</DC:Title>
- <DC:Creator>Jacky Crystal</DC:Creator>
- <DC:Date>1998-01-01</DC:Date>
- <DC:Subject>Metadata, RDF, Dublin Core</DC:Subject>
- </Description>
- </RDF>
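A record like the one on this slide can be read back by software, which is the point of agreeing on a schema. The sketch below is one possible way to do it, assuming Python's standard xml.etree.ElementTree and the namespace URIs exactly as they appear on the slide; the function name dublin_core_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the slide; they are period examples, not checked here.
RDF_NS = "http://w3.org/TR/1999/PR-rdf-syntax-19990105"
DC_NS = "http://purl.org/DC"

record = f"""<?xml version="1.0"?>
<RDF xmlns="{RDF_NS}" xmlns:DC="{DC_NS}">
  <Description about="http://dstc.com.au/report.html">
    <DC:Title>The Future of Metadata</DC:Title>
    <DC:Creator>Jacky Crystal</DC:Creator>
    <DC:Date>1998-01-01</DC:Date>
    <DC:Subject>Metadata, RDF, Dublin Core</DC:Subject>
  </Description>
</RDF>"""


def dublin_core_fields(xml_text):
    """Collect the described resource and its Dublin Core elements in a dict."""
    root = ET.fromstring(xml_text)
    description = root.find(f"{{{RDF_NS}}}Description")
    fields = {"about": description.get("about")}
    prefix = f"{{{DC_NS}}}"
    for child in description:
        # Keep only elements that belong to the Dublin Core namespace.
        if child.tag.startswith(prefix):
            fields[child.tag[len(prefix):]] = (child.text or "").strip()
    return fields


for name, value in dublin_core_fields(record).items():
    print(f"{name}: {value}")
```

Because the lookup goes by namespace rather than by the DC prefix, the same code would still work if a record used a different prefix for the same schema.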
16XML in use in libraries
- California Digital Library's eScholarship
Editions http://texts.cdlib.org/escholarship/
- University of Buffalo's XML-based catalogue
http://ublin.lib.buffalo.edu/ub/netcat/
- MARC-XML conversion services already exist
(Stanford's Medlane Project)
- ILS systems (Endeavor) and many databases already
support XML (Medline, EBSCO)
17XML in use for RSS feeds
- Look for this symbol on sites
- Content marked up in XML to create RSS feeds
for syndication (distribution)
- Usually used for news sites or blogs: sites where
new content is added daily or hourly
- Need to subscribe
- Need an RSS reader to view; lots out there, most
are free to download (look up RSS in Google's
Web Directory for a good list)
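What an RSS reader does with a feed can be sketched in a few lines. This is a minimal illustration, assuming an RSS 2.0 style document and Python's standard xml.etree.ElementTree; the feed content, the example.org URLs, and the function name list_items are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content for illustration only.
feed = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://example.org/news</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>New database trial</title>
      <link>http://example.org/news/1</link>
    </item>
    <item>
      <title>Extended opening hours</title>
      <link>http://example.org/news/2</link>
    </item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Return (title, link) pairs for every item in the feed's channel."""
    channel = ET.fromstring(rss_text).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in list_items(feed):
    print(f"{title} -> {link}")
```

A real reader would fetch the feed on a schedule and show only items it has not seen before; that part is omitted here.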
18XML in use by search engines
- Largely remains to be seen
- The hope is that
- Search options and performance will improve
- Richer content will emerge from the invisible
web in XML format and be searchable through
search engines
- e.g. library catalogues
19For web standards to succeed, there must be
- Collaboration and agreement on schemas
- Widespread deployment
- Crosswalks
- Support by all stakeholders
- Time and money
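A crosswalk, one of the requirements listed above, is at heart a mapping from one schema's element names to another's. The sketch below is only an illustration: the local field names (author, heading, written, keywords, shelf) are hypothetical, the target names are Dublin Core elements mentioned in these slides, and the function name crosswalk is an assumption.

```python
# Hypothetical mapping from a local schema to Dublin Core element names.
LOCAL_TO_DC = {
    "author": "Creator",
    "heading": "Title",
    "written": "Date",
    "keywords": "Subject",
}


def crosswalk(record, mapping):
    """Rename fields using the mapping; return (converted, unmapped) dicts."""
    converted, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            converted[mapping[field]] = value
        else:
            # Fields with no equivalent are kept aside rather than dropped.
            unmapped[field] = value
    return converted, unmapped


local_record = {
    "author": "Jacky Crystal",
    "heading": "The Future of Metadata",
    "written": "1998-01-01",
    "shelf": "B12",
}

converted, unmapped = crosswalk(local_record, LOCAL_TO_DC)
print(converted)
print(unmapped)
```

The unmapped fields show why crosswalks are hard in practice: some elements in one schema have no counterpart in the other, so information can be lost in translation.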
20Pitfalls of metadata standards
- There are too many of them
- Slow to develop
- It's complicated
- Web still works without them
- Decentralized (strength or weakness?)
212. Semantic Web
- Semantic (si-'man-tik)
- 1: of or relating to meaning in language
- 2: of or relating to semantics (the study of
meanings)
22What is the Semantic Web?
- Currently, just a vision
- Brainchild of Tim Berners-Lee, inventor of the
Web
- Widespread use of RDF/XML and metadata languages
on the Web to create machine-understandable data
- In the Semantic Web, computers query and process
information and can perform specific tasks for you
23Searching the Semantic Web
- Decentralized searching
- Deploy agents to query the Web versus mere
searching
24Not everyone is convinced
- Clay Shirky (www.shirky.com) believes the Semantic
Web is deeply flawed
- It describes a world where language is merely
math done with words
- Further, the Semantic Web is missing the point
25It's the needle we want..sort of
- A known needle in a known haystack
- A known needle in an unknown haystack
- An unknown needle in an unknown haystack
- Any needle in a haystack
- The sharpest needle in a haystack
- Most of the sharpest needles in a haystack
- All the needles in a haystack
- Affirmation of no needles in the haystack
- Things like needles in any haystack
- Let me know whenever a new needle shows up
- Where are the haystacks?
- Needles, haystacks -- whatever.
- - Dr. Matthew Koll
263. Z39.50
- First conceived by the library community in
the 1960s-70s to create a national bibliographic
network
- Became the search and retrieval standard for IR
systems
- Began with the exchange of MARC records, an early
metadata standard
- Way ahead of the Semantic Web
27Key features
- Ability to search and share resources across a
distributed network of computers, regardless of
native search interface
- Offers a consistent view of information from a
wide variety of sources
- Supports complex Boolean, keyword, proximity,
truncation and limit searching
- One of the few tried and tested standards for
shared semantic knowledge
28Lessons learned from Z39.50
- Standards don't guarantee mutual agreement and
understanding
- Incorrect and inconsistent implementation by
vendors
- Slow performance
- Key bits of information still not accessible:
cannot access holdings information or item status
via Z39.50
294. Folksonomy
- Meanwhile, the world is self-metadating
- Folksonomy is the means for people to tag
objects using their own vocabulary so that it is
easy for them to re-find that information
- Works best when natural terms are used, not what
the person perceives will be used by others
- Commonly referred to as tagging
- Term coined by Thomas Vander Wal
30Tagging sites
- Flickr www.flickr.com
- Del.icio.us http://del.icio.us
- 43Things www.43things.com
- Real-time tag search engine
- Technorati www.technorati.com
31Problems with self-metadating
- Messy and uncontrolled
- Flat namespace: no hierarchy
- Synonyms
- e.g. Mac and Macintosh and Apple
- Ambiguity
- e.g. ramblings, random thoughts, stuff, musings
aren't that helpful
- Spellings/Plurals
- e.g. labor and labour; book and books
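Some of these problems (case, plurals, spelling variants, simple synonyms) can be reduced by normalizing tags before counting them. The following is a rough sketch, not how any of the tagging sites above work: the SYNONYMS table and the naive plural rule are assumptions chosen to match the slide's examples, and ambiguous tags such as "apple" or "stuff" are deliberately left untouched because no simple rule resolves them.

```python
from collections import Counter

# Hypothetical synonym/spelling table; a real one would need curation.
SYNONYMS = {"mac": "macintosh", "labour": "labor"}


def normalize(tag):
    """Lowercase a tag, strip a simple plural 's', then apply the synonym table."""
    tag = tag.strip().lower()
    # Naive plural rule: will mangle some words, acceptable only for a sketch.
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return SYNONYMS.get(tag, tag)


raw_tags = ["Mac", "macintosh", "Macs", "labor", "labour", "book", "books"]
counts = Counter(normalize(tag) for tag in raw_tags)

for tag, count in counts.most_common():
    print(tag, count)
```

The trade-off is that every rule added here moves the folksonomy a step back toward a controlled vocabulary, which is exactly what tagging set out to avoid.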
32Plus side of self-metadating
- Fast, easy, and already a high level of
participation
- Reflects real-world vocabulary
- Not culturally exclusive
- Direct sharing and communication with other users
335. Social searching
- People helping people
- Eurekster (www.eurekster.com) personalizes results
based on the sites used by your network of
friends or colleagues
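The general idea of personalizing results from a network's browsing can be sketched as a re-ranking step. This is only an illustration of the idea, not Eurekster's actual method, which the slides do not describe; the function name rerank, the visit counts, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse


def rerank(results, network_visits):
    """Move results from sites the network visits most often to the top.

    Python's sort is stable, so results with equal scores keep the
    search engine's original order.
    """
    def score(url):
        return network_visits.get(urlparse(url).netloc, 0)

    return sorted(results, key=score, reverse=True)


# Hypothetical search results in the engine's original order.
results = [
    "http://a.example/page",
    "http://b.example/page",
    "http://c.example/page",
]

# Hypothetical counts of visits by friends or colleagues, keyed by site.
network_visits = {"c.example": 5, "b.example": 2}

for url in rerank(results, network_visits):
    print(url)
```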
34Web logs or Blogs
- Online journals with entries organized by reverse
chronological order
- Blogs link to other blogs, establishing networks
of people and communities
- Growing use of blog software for staff intranets
- librarian.net: blog by Jessamyn West
- misbehaving.net: women and technology blog
35Personalization
- Search results based on demographic information
- Yahoo, AOL and Google have announced intentions
to introduce personalization
- e.g. Google Personalized http://labs.google.com/personalized
36Localization
- Localized searches e.g. www.metrobot.com/
- Jon Udell's walking tour of his hometown
http://weblog.infoworld.com/udell/gems/gmap2_flash.html
376. Antisocial aspects of the Web..
- Disinformation
- martinlutherking.org
- Identity theft
- Seisint (owned by Lexis-Nexis), Choicepoint data
leaks
- Charity scams, e.g. Nigerian Letter
- Spoofs and Parodies
- Lip Balm Anonymous www.kevdo.com/lipbalm/
- Counterfeit sites
405. Librarians of the future
- Is finding information going to be easier in the
future?
- What skills will we need?
- Do I need to understand programming?
41Trust what you know
- It's not new, just new lingo
- Databases, databases, databases
- Cataloguing principles
- Classification
- Indexing
- Critical evaluation of resources
- Creativity, curiosity, tenacity also help
42Next week