Semantics and the Web - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Semantics and the Web


1
Semantics and the Web
  • Terry Brooks
  • University of Washington
  • tabrooks_at_u.washington.edu

2
How is Meaning Constructed?
  • (1) Assertion of Meaning on the Web
  • The Semantic Web: History, Description,
    Technology and Sociology
  • (2) Aggregation of Meaning on the Web
  • Web Semantics: HTML, Googlebot, Culture of Lay
    Indexing

3
Environments for Constructing Meaning
  • Open Web vs Closed Web

Culture provides a context for the use of
technology
4
The Idea of The Semantic Web
  • Suppose you wish to find the Ms. Cook you met at
    a trade conference last year. You don't remember
    her first name, but you remember that she worked
    for one of your clients and that her son was a
    student at your alma mater. An intelligent search
    program can sift through all the pages of people
    whose name is "Cook" (sidestepping all the pages
    relating to cooks, cooking, the Cook Islands and
    so forth), find the ones that mention working for
    a company that's on your list of clients and
    follow links to Web pages of their children to
    track down if any are in school at the right
    place. (Berners-Lee 2001)
  • (Google indexes 4.2 billion web pages, April 1,
    2004)

5
World Brain
  • H.G. Wells
  • 1937

6
Memex Machine
  • Vannevar Bush
  • 1945

7
Hypertext
Ted Nelson 1965
8
(No Transcript)
9
The Semantic Web
  • Tim Berners-Lee
  • Scientific American
  • May 17, 2001

A new form of Web content that is meaningful to
computers will unleash a revolution of new
possibilities
10
Meaningful to computers
  • META keywords tag
  • <META name="keywords" content="vacation, Greece, sunshine">

11
Meaningful to computers
  • Dublin Core Metadata Set
  • <META name="DC.Creator" content="Simpson, Homer">
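
To make "meaningful to computers" concrete, here is a minimal sketch of how a program might read the two META tags shown on these slides. It uses only Python's standard html.parser module; the names MetaTagParser and extract_meta and the sample page are hypothetical, made up for this illustration, and real crawlers are far more involved.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        if tag == "meta":
            fields = dict(attrs)
            if "name" in fields and "content" in fields:
                self.meta[fields["name"]] = fields["content"]


def extract_meta(html):
    """Return a dict mapping each META name to its content string."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


# Sample page assembled from the two tags on the slides (the page itself is an assumption).
page = (
    "<html><head>"
    '<META name="keywords" content="vacation, Greece, sunshine">'
    '<META name="DC.Creator" content="Simpson, Homer">'
    "</head><body></body></html>"
)

meta = extract_meta(page)
keywords = [word.strip() for word in meta["keywords"].split(",")]
print(keywords)
print(meta["DC.Creator"])
```

The point of the sketch is that the program simply believes whatever the author asserted in the tag; nothing checks the claim against the visible page.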

12
Meaningful to computers
  • Resource Description Framework
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  • <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
  • <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  • </contact:Person>
  • </rdf:RDF>
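
As a rough illustration of what a machine gets out of the RDF description above, the sketch below turns it into subject/predicate/object triples using only Python's standard xml.etree.ElementTree. It assumes the well-formed RDF/XML shown in the string literal, handles only this flat shape (one typed node with simple properties), and is not a general RDF parser; expand and simple_triples are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed well-formed version of the contact description from the slide.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>
"""


def expand(qname):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    namespace, local = qname[1:].split("}")
    return namespace + local


def simple_triples(xml_text):
    """Return (subject, predicate, object) tuples for this one flat shape only."""
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        # The element name of a typed node states its rdf:type.
        triples.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get("{%s}resource" % RDF_NS)
            value = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, expand(prop.tag), value))
    return triples


triples = simple_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Each printed triple is one assertion about the resource, which is the sense in which RDF content is "meaningful to computers".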

13
Open Web: Link computers together and share
information. Trust strangers? What would your
mother tell you?
14
Attention Economy
  • Information is plentiful
  • Attention is scarce
  • Goldhaber, 1997
  • 0.1% of websites get 32.3% of activity
  • Huberman, 2001

15
  • META keywords tag called a "spam magnet" (Danny
    Sullivan, March 2003)
  • Dublin Core avoided as spam (Dublin Core FAQ)
  • RDF "has not caught on with a large user community"
    (Eberhart, August 15, 2002)
16
Asserting meaning on the open web?
  • My web page means this!

17
Semantics
A Moment to Reflect: Where We're Going
  • Assertion of Meaning on the Open Web
  • The Semantic Web: History, Description,
    Technology and Sociology
  • Aggregation of Meaning on the Open Web
  • Web Semantics: HTML, Googlebot, Culture of Lay
    Indexing

18
There is semantic content on the open web
Indexing your own work
  • HTML: A markup language!

<p>My paragraph</p> <h1>My heading</h1>
Indexing somebody else's work
  • Hyperlink
  • Hyperlink text

<a href="http://goodPage.html">
<a href="http://goodPage.html">Tasty Donuts</a>
19
Google Harvests Lay Indexing
  • Googlebot crawls the web
  • Google parses
  • Web page content
  • Hyperlinks
  • Hyperlink text itself
  • Text surrounding hyperlinks

Lay Indexers
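
The harvesting step listed above can be sketched as follows. This is only an illustration, not Googlebot's actual behavior: it parses a few pages with Python's standard html.parser, records each hyperlink with its link text, and groups the text by target so that other authors' words become index terms for the page they point at. The class and function names and the two sample pages are hypothetical; the target URL is the one used on the earlier slide.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Record (href, link text) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def aggregate_anchor_text(pages):
    """Map each link target to the list of link texts other pages used for it."""
    labels = defaultdict(list)
    for html in pages.values():
        parser = AnchorTextParser()
        parser.feed(html)
        for href, text in parser.links:
            labels[href].append(text)
    return dict(labels)


# Two made-up pages by different "lay indexers" pointing at the same target.
pages = {
    "http://pageA.example/": '<p>Try <a href="http://goodPage.html">Tasty Donuts</a> today.</p>',
    "http://pageB.example/": '<p>I like <a href="http://goodPage.html">great donuts</a>.</p>',
}

labels = aggregate_anchor_text(pages)
print(labels)
```

Note that the target page never described itself here; its description is assembled entirely from what others wrote about it.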
20
PageRank
  • "Google works because it relies on the millions of
    individuals posting websites to determine which
    other sites offer content of value. Instead of
    relying on a group of editors or solely on the
    frequency with which certain terms appear,
    Google ranks every web page using a breakthrough
    technique called PageRank. PageRank evaluates
    all of the sites linking to a web page and
    assigns them a value, based in part on the sites
    linking to them. By analyzing the full structure
    of the web, Google is able to determine which
    sites have been "voted" the best sources of
    information by those most interested in the
    information they offer. This technique actually
    improves as the web gets bigger, as each new site
    is another point of information and another vote
    to be counted."
  • Google Corporate Information
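
The "voting" idea in this description can be illustrated with a small power-iteration sketch. This is the commonly taught simplified form of PageRank, not Google's production ranking, which the slides note depends on many undisclosed factors. The damping value of 0.85, the iteration count, the handling of pages with no outgoing links, and the four-page link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up link graph: three pages "vote" for C, and C votes for A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C collects the most votes, and A outranks B and D only because the highly ranked C links to it, which is the "based in part on the sites linking to them" part of the description.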

21
Probability
  • Satisfies the average web searcher because Google
    has aggregated the valuations of the average web
    author
  • Transforms web authors into lay indexers of web
    content where links are a plebiscite for the most
    important web pages

22
Dogs
  • I-love-dogs.com (PageRank 6/10)
  • Guide Dogs for the Blind (PR 6/10)
  • American Kennel Club (PR 6/10)
  • January 29, 2004
  • 14.5 million web pages

23
Aggregating Meaning
  • Google aggregates the meaning expressed by lay
    indexers in the web content, hyperlinks and
    hyperlink text.
  • Other aggregators
  • Blogdex
  • Price comparison aggregators
  • News feed aggregators

24
Information or Spam?
  • Aggregation: Surreptitiously collecting
    information from other people
  • Bad faith: Anticipating aggregation and
    manipulating the aggregator

25
Corporate Asset
  • Collecting genuine web authorship
  • Bad faith: A web author games Google to assert
    his meaning in place of the meaning given by the
    web community
  • Link farming
  • Google bombing
  • Cloaking
  • Misuse of metadata

26
Culture of Lay Indexing
  • Ignorance: When, if and how the Googlebot will
    aggregate content
  • Mistrust: Google must assume that web writers are
    constantly scheming

27
Google's order of results is automatically
determined by more than 100 factors, including
our PageRank algorithm. ... Due to the nature of
our business and our interest in protecting the
integrity of our search results, this is the only
information we make available to the public about
our ranking system. (PageRank Information)
28
http://faculty.washington.edu/tabrooks/
29
Traditional Methods of Constructing Meaning
  • Trust in the Assertions of a Few
  • Indexers: "index sense" (Nichols, 1892)
  • ERIC database: subject experts, indexers

30
Legacy Culture of Trust
  • Information professionals
  • Controlled access

31
Web: A Lawless Meaning Space
Traditional closed information systems honored the
assertion of meaning by a single individual, but to
succeed Google must distrust it. This is the social
consequence of a network technology that permits
anyone to conflate the roles of author, indexer
and publisher. That is, the Internet is an "open"
system where anyone can author anything and
declare its meaning, i.e., a lawless meaning
space.
32
Technological Consequences of Culture of Lay
Indexing
  • META keywords tag
  • Dublin Core
  • Resource Description Framework (RDF)
  • Also, it is interesting to note that metadata
    efforts have largely failed with web search
    engines, because any text on the page which is
    not directly represented to the user is abused to
    manipulate search engines. There are even
    numerous companies which specialize in
    manipulating search engines for profit. (Brin &
    Page, 1998)

33
Cool Tricks and Feeding the Googlebot
  • Any reduction in ignorance will be neutralized
  • SEO (search engine optimizers)
  • Google has made five significant changes to
    its algorithmic formulas in the past two weeks,
    Brin said.
  • Seattle Post-Intelligencer, Wednesday, February
    18, 2004

34
Need for a Writer's Guide
  • Google lists hazards: JavaScript, cookies,
    session IDs, frames, DHTML, Flash
  • Poems That Go
  • We are open to all forms of multimedia,
    computer-generated, and interactive work that
    include (but are not limited to) HTML, Shockwave,
    Quicktime, streaming media, Flash, Java, and
    DHTML content.

35
Different Webs
  • Open Web
  • Aggregate naïve/genuine behavior
  • Writing Markup Semantics

is the semantic web ?
  • Closed Web

Your choice: Dublin Core, RDF, etc.
Trust, Accountability, Cooperation
36
Semantics and the Web / The Semantic Web
  • Which web?
  • Whose semantics?

37
The Nature of Meaning in the Age of Google
http://www.ischool.washington.edu/tabrooks/Writing/AgeOfGoogle/AgeOfGoogle.htm