Semantics and the Web - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Semantics and the Web

1
Semantics and the Web

Terry Brooks
University of Washington
tabrooks@u.washington.edu

2
How is Meaning Constructed?

(1) Assertion of Meaning on the Web
The Semantic Web: History, Description, Technology and Sociology
(2) Aggregation of Meaning on the Web
Web Semantics: HTML, Googlebot, Culture of Lay Indexing

3
Environments for Constructing Meaning

Open Web vs Closed Web

Culture provides a context for the use of
technology
4
The Idea of The Semantic Web

Suppose you wish to find the Ms. Cook you met at
a trade conference last year. You don't remember
her first name, but you remember that she worked
for one of your clients and that her son was a
student at your alma mater. An intelligent search
program can sift through all the pages of people
whose name is "Cook" (sidestepping all the pages
relating to cooks, cooking, the Cook Islands and
so forth), find the ones that mention working for
a company that's on your list of clients and
follow links to Web pages of their children to
track down if any are in school at the right
place. (Berners-Lee 2001)
(Google indexes 4.2 billion web pages, April 1,
2004)

5
World Brain

H.G. Wells
1937

6
Memex Machine

Vannevar Bush
1945

7
Hypertext
Ted Nelson 1965
8
(No Transcript)
9
The Semantic Web

Tim Berners-Lee
Scientific American
May 17, 2001

A new form of Web content that is meaningful to
computers will unleash a revolution of new
possibilities
10
Meaningful to computers

META keywords tag
<META name="keywords" content="vacation, Greece, sunshine">

11
Meaningful to computers

Dublin Core Metadata Set
<META name="DC.Creator" content="Simpson, Homer">
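Both META approaches store meaning as name/content pairs in the page head, and that is all a crawler gets to read. A minimal sketch of the reading step, assuming Python's standard-library html.parser; MetaCollector and read_meta are hypothetical names for illustration, not any search engine's code:

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name is not None and content is not None:
            self.meta[name] = content


def read_meta(html: str) -> dict[str, str]:
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


page = """<html><head>
<META name="keywords" content="vacation, Greece, sunshine">
<META name="DC.Creator" content="Simpson, Homer">
</head><body>...</body></html>"""

meta = read_meta(page)
keywords = [k.strip() for k in meta["keywords"].split(",")]
print(keywords)            # ['vacation', 'Greece', 'sunshine']
print(meta["DC.Creator"])  # Simpson, Homer
```

Nothing in this step checks the values against the text a visitor actually sees, which is why the keywords tag was so easy to stuff with spam.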

12
Meaningful to computers

Resource Description Framework
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>
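RDF turns the same kind of assertion into subject, predicate, object triples. The sketch below, assuming only Python's xml.etree.ElementTree, pulls the triples out of the Eric Miller example; simple_triples and qname_to_uri are hypothetical helpers that handle just this simple pattern, whereas a real application would use a complete RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def qname_to_uri(tag: str) -> str:
    # ElementTree writes namespaced names as "{namespace}local"; RDF joins them into one URI.
    ns, _, local = tag[1:].partition("}")
    return ns + local


def simple_triples(xml_text: str) -> list[tuple[str, str, str]]:
    """Turn node elements with property children into (subject, predicate, object).

    Handles only rdf:about on the node and literal text or rdf:resource on
    each property; it is not a complete RDF/XML parser.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        # The element name itself asserts the node's type.
        triples.append((subject, RDF + "type", qname_to_uri(node.tag)))
        for prop in node:
            obj = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            triples.append((subject, qname_to_uri(prop.tag), obj))
    return triples


for s, p, o in simple_triples(DOC):
    print(p.rsplit("#", 1)[-1], "->", o)
# type -> http://www.w3.org/2000/10/swap/pim/contact#Person
# fullName -> Eric Miller
# mailbox -> mailto:em@w3.org
# personalTitle -> Dr.
```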

13
Open Web
Link computers together and share information
Trust strangers? What would your mother tell you?
14
Attention Economy

Information is plentiful
Attention is scarce
Goldhaber, 1997
0.1% of websites get 32.3% of activity
Huberman, 2001

15
META keywords tag called a spam magnet (Danny Sullivan, March 2003)
Dublin Core avoided as spam (Dublin Core FAQ)
RDF has not caught on with a large user community (Eberhart, August 15, 2002)
16
Asserting meaning on the open web?

My web page means this!

17
Semantics
A Moment to Reflect: Where We're Going

Assertion of Meaning on the Open Web
The Semantic Web: History, Description, Technology and Sociology
Aggregation of Meaning on the Open Web
Web Semantics: HTML, Googlebot, Culture of Lay Indexing

18
There is semantic content on the open web

Indexing your own work
HTML: A markup language!
<p>My paragraph</p> <h1>My heading</h1>

Indexing somebody else's work
Hyperlink
<a href="http://goodPage.html">
Hyperlink text
<a href="http://goodPage.html">Tasty Donuts</a>
19
Google Harvests Lay Indexing

Googlebot crawls the web
Google parses
Web page content
Hyperlinks
Hyperlink text itself
Text surrounding hyperlinks

Lay Indexers
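The slide's list of what Google parses (page content, hyperlinks, the hyperlink text itself, and the text surrounding hyperlinks) can be made concrete with a small sketch. It assumes Python's standard-library html.parser; LinkHarvester, harvest_links, the window size, and the sample page are all hypothetical, and Google's actual parsing is not public.

```python
from html.parser import HTMLParser


class LinkHarvester(HTMLParser):
    """Record where each hyperlink starts and ends in the page's stream of words."""

    def __init__(self) -> None:
        super().__init__()
        self.words = []   # every visible word, in document order
        self.spans = []   # (href, index of first anchor word, index after last)
        self._href = None  # href of the <a> currently open, if any
        self._start = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._start = len(self.words)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.spans.append((self._href, self._start, len(self.words)))
            self._href = None

    def handle_data(self, data):
        self.words.extend(data.split())


def harvest_links(html: str, window: int = 4) -> list[dict[str, str]]:
    """Return each link's target, anchor text, and up to `window` words on either side."""
    parser = LinkHarvester()
    parser.feed(html)
    parser.close()
    w = parser.words
    return [
        {
            "href": href,
            "anchor": " ".join(w[start:end]),
            "before": " ".join(w[max(0, start - window):start]),
            "after": " ".join(w[end:end + window]),
        }
        for href, start, end in parser.spans
    ]


# Hypothetical page written by a "lay indexer".
page = ('<p>For a morning treat I recommend '
        '<a href="http://goodPage.html">Tasty Donuts</a> '
        'on the corner of Main Street.</p>')

for link in harvest_links(page):
    print(link)
# {'href': 'http://goodPage.html', 'anchor': 'Tasty Donuts', 'before': 'morning treat I recommend', 'after': 'on the corner of'}
```

The author of this page never filled in a metadata field, yet the anchor text and its neighbors still say what goodPage.html is about.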
20
PageRank

Google works because it relies on the millions of individuals posting websites to determine which other sites offer content of value. Instead of relying on a group of editors or solely on the frequency with which certain terms appear, Google ranks every web page using a breakthrough technique called PageRank. PageRank evaluates all of the sites linking to a web page and assigns them a value, based in part on the sites linking to them. By analyzing the full structure of the web, Google is able to determine which sites have been "voted" the best sources of information by those most interested in the information they offer. This technique actually improves as the web gets bigger, as each new site is another point of information and another vote to be counted.
(Google Corporate Information)
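The idea that a link is a vote, weighted by the rank of the page casting it, can be sketched with power iteration over a normalized variant of the formula published by Brin and Page (1998). This is a toy under stated assumptions, not Google's ranking: the damping factor 0.85, the even redistribution of score from pages with no outlinks, and the four-page graph are illustrative choices, and Google's own statement is that PageRank is only one of more than 100 factors.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    `links` maps each page to the pages it links to. A page with no outlinks
    spreads its score evenly over all pages (one common convention).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Score held by pages with no outlinks is shared by every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if targets:
                # Each outlink casts an equal share of the source's current rank.
                share = damping * rank[source] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical four-page web: three pages "vote" for c.html.
web = {
    "a.html": ["c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html", "a.html"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}  {score:.3f}")

# Link farming: ten hypothetical pages that exist only to link to b.html.
farm = {**web, **{f"farm{i}.html": ["b.html"] for i in range(10)}}
print(ranks["b.html"], pagerank(farm)["b.html"])
```

The scores always sum to 1, and c.html, with the most incoming votes, comes out on top. The last two lines show why the vote can be gamed: pages created only to link to b.html raise its score without any genuine authorship behind them.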

21
Probability

Satisfies the average web searcher because Google
has aggregated the valuations of the average web
author
Transforms web authors into lay indexers of web
content where links are a plebiscite for the most
important web pages

22
Dogs

I-love-dogs.com (PageRank 6/10)
Guide Dogs for the Blind (PR 6/10)
American Kennel Club (PR 6/10)
January 29, 2004
14.5 million web pages

23
Aggregating Meaning

Google aggregates the meaning expressed by lay
indexers in the web content, hyperlinks and
hyperlink text.
Other aggregators
Blogdex
Price comparison aggregators
News feed aggregators

24
Information or Spam?

Aggregation: Surreptitiously collecting information from other people
Bad faith: Anticipating aggregation and manipulating the aggregator

25
Corporate Asset

Collecting genuine web authorship
Bad faith: A web author games Google to assert his meaning in place of the meaning given by the web community
Link farming
Google bombing
Cloaking
Misuse of metadata

26
Culture of Lay Indexing

Ignorance: When, if and how the Googlebot will aggregate content
Mistrust: Google must assume that web writers are constantly scheming

27
Google's order of results is automatically
determined by more than 100 factors, including
our PageRank algorithm. ... Due to the nature of
our business and our interest in protecting the
integrity of our search results, this is the only
information we make available to the public about
our ranking system. (PageRank Information)
28
http://faculty.washington.edu/tabrooks/
29
Traditional Methods of Constructing Meaning

Trust in the Assertions of a Few
Indexers: "index sense" (Nichols, 1892)
ERIC database: subject experts, indexers

30
Legacy Culture of Trust

Information professionals
Controlled access

31
Web: A Lawless Meaning Space

Traditional closed information systems honored the assertion of meaning by a single individual, but to succeed Google must distrust it. This is the social consequence of a network technology that permits anyone to conflate the roles of author, indexer and publisher. That is, the Internet is an "open" system where anyone can author anything and declare its meaning, i.e., a lawless meaning space.
32
Technological Consequences of Culture of Lay
Indexing

META keywords tag
Dublin Core
Resource Description Framework (RDF)
Also, it is interesting to note that metadata
efforts have largely failed with web search
engines, because any text on the page which is
not directly represented to the user is abused to
manipulate search engines. There are even
numerous companies which specialize in
manipulating search engines for profit. (Brin and Page, 1998)

33
Cool Tricks and Feeding the Googlebot

Any reduction in ignorance will be neutralized
SEO (search engine optimizers)
"Google has made five significant changes to its algorithmic formulas in the past two weeks," Brin said.
(Seattle Post-Intelligencer, Wednesday, February 18, 2004)

34
Need for a Writer's Guide

Google lists hazards: JavaScript, cookies, session IDs, frames, DHTML, Flash
Poems That Go: We are open to all forms of multimedia, computer-generated, and interactive work that include (but are not limited to) HTML, Shockwave, Quicktime, streaming media, Flash, Java, and DHTML content.

35
Different Webs