Kate Lopez - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Kate Lopez


1
The Development of Accurate Information Retrieval
Solutions in Early Search Engines
  • Kate Lopez
  • December 5, 2008
  • CS 349

2
Idea
  • Historical progression of ideas about the proper
    method of ranking or scoring web pages.
  • Look at several early ideas for ranking and
    mitigating spam and index manipulation.
  • The structure of the Web will inherently change
    the fundamentals of the system of information
    retrieval that we use. It will have to
    continually adapt to changes in content and user
    preference, while maintaining an acceptable level
    of trust in accuracy.

3
Marchiori
  • Security of World Wide Web Search Engines,
    Massimo Marchiori, 1996
  • The Quest for Correct Information on the Web:
    Hyper Search Engines, Massimo Marchiori, 1997

4
Background
  • Economic motivation
  • Defense against attacks
  • Web structure: partial function from URL to
    sequence of bytes
  • Web object: pair of URL and sequence
  • Score function
  • Flattening phenomenon
  • Heavy competition situation
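
The definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the names `WebObject`, `web_structure`, `fetch`, and `count_score` are hypothetical, the example URLs are made up, and the word-counting score is one simple choice rather than a function taken from the cited papers.

```python
from typing import Dict, NamedTuple, Optional


class WebObject(NamedTuple):
    """A web object: a pair of URL and byte sequence."""
    url: str
    content: bytes


# A web structure maps URLs to byte sequences. A dict models the
# "partial" part: most URLs simply have no entry.
web_structure: Dict[str, bytes] = {
    "http://example.org/a": b"search engines rank web pages",
    "http://example.org/b": b"cooking recipes",
}


def fetch(url: str) -> Optional[WebObject]:
    """Return the web object at url, or None where the function is undefined."""
    content = web_structure.get(url)
    return None if content is None else WebObject(url, content)


def count_score(obj: WebObject, key: str) -> int:
    """Illustrative score function: occurrences of key as a whole word."""
    words = obj.content.decode("utf-8").lower().split()
    return words.count(key.lower())
```

A search engine in this model ranks the web objects it knows about by the value the score function assigns to each one for a given key.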

5
Search Engine Persuasion "Spamdexing"
  • Artificial repetition of relevant keys
  • Example: fake commercial web object
  • Impact on reliability
  • Solution: truncation
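
A small sketch of why key repetition works against a counting score, and how truncation blunts it. It assumes truncation means capping the number of occurrences that count toward the score; the function names, the `cap` parameter, and the sample texts are hypothetical.

```python
def count_score(text: str, key: str) -> int:
    """Naive score: how many times key appears as a whole word."""
    return text.lower().split().count(key.lower())


def truncated_score(text: str, key: str, cap: int = 3) -> int:
    """Same score, but occurrences beyond cap are ignored (assumed reading of truncation)."""
    return min(count_score(text, key), cap)


honest = "cheap flights to rome and cheap hotels"
# A fake commercial page that repeats a popular key artificially.
spam = "flights " * 50 + "buy our unrelated product"

naive_gap = count_score(spam, "flights") - count_score(honest, "flights")
capped_gap = truncated_score(spam, "flights") - truncated_score(honest, "flights")
```

With the naive score the spam page leads by 49 occurrences; with a cap of 3 the lead shrinks to 2, so repetition past the cap buys nothing.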

6
Other Approaches to SEP Defense
  • Probabilistic
  • Search engine post-processor
  • Effectiveness grows with market pressure
  • Clustering and shuffling
  • Unique-Top
  • Frequency implies relevance assumption
  • Percentage score function
  • Hyper
  • Advertise competitor web objects to score high

7
Hyper Search Engines
  • How do we properly classify objects in response
    to the user's needs?
  • New measure of informative content: hyper
    information
  • Hypertext vs. textual information
  • Visibility vs. quality

8
Types of Hypertext Evaluation
  • Single links
  • Multiple links
  • Link type
  • Local
  • Frame
  • Other
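
The slides do not give a formula for hyper information. The following is a minimal sketch under stated assumptions: an object's overall information is its own textual score plus the textual scores of the objects it links to, discounted by a fading factor, with the best-scoring link discounted least. The names `overall_information`, `text_info`, `links`, and `fading` are hypothetical, and link types (local, frame, other) are not modeled.

```python
from typing import Dict, List


def overall_information(
    url: str,
    text_info: Dict[str, float],
    links: Dict[str, List[str]],
    fading: float = 0.5,
) -> float:
    """Textual score of url plus faded textual scores of its link targets."""
    # Sort targets so the most informative one receives the smallest discount.
    targets = sorted((text_info[t] for t in links.get(url, [])), reverse=True)
    hyper = sum(fading ** (i + 1) * score for i, score in enumerate(targets))
    return text_info[url] + hyper


# Assumed textual scores for three objects and one page linking to two others.
text_info = {"hub": 0.2, "a": 0.8, "b": 0.4}
links = {"hub": ["b", "a"]}
hub_total = overall_information("hub", text_info, links)
```

Here `hub` has little text of its own but gains 0.5 * 0.8 + 0.25 * 0.4 from its links, which is the sense in which hypertext can add information beyond the text of a single page.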

9
Testing Post-Processor Implementation
  • Randomly select 25 queries
  • Subjects search for relevant information given a
    topic, then evaluate result

10
Lynch
  • When Documents Deceive: Trust and Provenance as
    New Factors for Information Retrieval in a
    Tangled Web, Clifford Lynch, 2001

11
Historical Assumptions of IR
  • Behavior, consistency, and admission
  • Accurate metadata
  • Database type
  • Full documents
  • Surrogates
  • Document passivity

12
In Conflict with the Web
  • Distributed information environment
  • Document inconsistency
  • Document presentation
  • The user
  • The crawler
  • Metadata manipulation
  • Creator
  • Source document vs. page viewed
  • Provenance of data and metadata
  • User trust preferences

13
Security Concerns: Indexing
  • Page manipulation to alter behavior
  • Index spamming
  • Page jacking
  • Selective response
  • Indexer countermeasures
  • Result spot checking
  • Page certification
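
Selective response and result spot checking can be illustrated with a toy example. This is a sketch, not a description of any real indexer: `cloaking_server` simulates a site that answers crawlers and users differently, and `content_fingerprint` and `spot_check` are hypothetical helper names.

```python
import hashlib
from typing import Callable


def content_fingerprint(body: bytes) -> str:
    """Stable fingerprint of a page body."""
    return hashlib.sha256(body).hexdigest()


def cloaking_server(url: str, user_agent: str) -> bytes:
    """Simulated site that shows the crawler one page and users another."""
    if "crawler" in user_agent:
        return b"in-depth guide to information retrieval"
    return b"buy now! limited offer!"


def spot_check(url: str, indexed: str, fetch_as_user: Callable[[str], bytes]) -> bool:
    """True if the page a user would see matches what was indexed."""
    return content_fingerprint(fetch_as_user(url)) == indexed


url = "http://example.org/page"
indexed = content_fingerprint(cloaking_server(url, "example-crawler"))
consistent = spot_check(url, indexed, lambda u: cloaking_server(u, "browser"))
```

In this simulation `consistent` is False, which flags the page for review. An exact hash comparison would also flag legitimately dynamic pages, so a practical check would need a more tolerant comparison.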

14
Security Concerns: Metadata
  • Simple distinctions within searches
  • Accuracy
  • Who generated content?
  • Does it accurately reflect the object it
    describes?
  • Metadata use is uncommon for these reasons
  • Potential solutions
  • Indexer and content provider collaboration
  • Signature of assertion
  • Example: RDF
  • Public key infrastructure systems
  • Pretty Good Privacy system
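
The idea of a signed assertion about a page can be sketched as follows. Note the substitution: the slide refers to public-key systems, while this example uses an HMAC with a shared secret from the Python standard library as a simplified stand-in. The field names in the assertion and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical(assertion: dict) -> bytes:
    """Serialize the assertion deterministically so signatures are repeatable."""
    return json.dumps(assertion, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_assertion(assertion: dict, key: bytes) -> str:
    """HMAC-SHA256 over the canonical form (stand-in for a public-key signature)."""
    return hmac.new(key, canonical(assertion), hashlib.sha256).hexdigest()


def verify_assertion(assertion: dict, signature: str, key: bytes) -> bool:
    """Check that the assertion is unchanged since it was signed with key."""
    return hmac.compare_digest(sign_assertion(assertion, key), signature)


key = b"shared-secret-for-illustration-only"
assertion = {"url": "http://example.org/page", "subject": "information retrieval"}
signature = sign_assertion(assertion, key)

tampered = dict(assertion, subject="cheap flights")
```

Verifying `assertion` against `signature` succeeds, while verifying `tampered` fails. With a real public-key scheme, anyone could verify the signature without being able to forge it, which a shared secret does not provide.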

15
User Expectations
  • Formalization of expectations about behavior and
    trust in behavior
  • Credentials
  • Personal preferences database
  • Levels of trust

16
Conclusions
  • Pre-Lycos
  • Saw the development of web terminology and the
    first attempts to defend against information
    manipulation.
  • Post-Lycos and Pre-Google
  • Developers began to focus more on user
    preferences, which led to progress in methods
    of page ranking.
  • Post-Google
  • Users looked ahead to potential vulnerabilities
    and improvements of the system.

17
Questions?