Title: Semantics and the Web
1Semantics and the Web
- Terry Brooks
- University of Washington
- tabrooks_at_u.washington.edu
2How is Meaning Constructed?
- (1) Assertion of Meaning on the Web
- The Semantic Web: History, Description,
Technology and Sociology
- (2) Aggregation of Meaning on the Web
- Web Semantics: HTML, Googlebot, Culture of Lay
Indexing
3Environments for Constructing Meaning
Culture provides a context for the use of
technology
4The Idea of The Semantic Web
- Suppose you wish to find the Ms. Cook you met at
a trade conference last year. You don't remember
her first name, but you remember that she worked
for one of your clients and that her son was a
student at your alma mater. An intelligent search
program can sift through all the pages of people
whose name is "Cook" (sidestepping all the pages
relating to cooks, cooking, the Cook Islands and
so forth), find the ones that mention working for
a company that's on your list of clients and
follow links to Web pages of their children to
track down if any are in school at the right
place. (Berners-Lee 2001)
- (Google indexes 4.2 billion web pages, April 1,
2004)
5World Brain
6Memex Machine
7Hypertext
Ted Nelson 1965
9The Semantic Web
- Tim Berners-Lee
- Scientific American
- May 17, 2001
A new form of Web content that is meaningful to
computers will unleash a revolution of new
possibilities
10Meaningful to computers
- META keywords tag
- <META name="keywords" content="vacation, Greece, sunshine">
11Meaningful to computers
- Dublin Core Metadata Set
- <META name="DC.Creator" content="Simpson, Homer">
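A minimal sketch of how a program might read the two META tags shown on these slides, using only Python's standard library. The names MetaCollector, extract_meta and SAMPLE_PAGE are hypothetical and chosen for illustration; this is not how any particular search engine processes metadata.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name] = content


def extract_meta(html):
    """Return a dict mapping each META name to its content string."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


# Assumed sample page combining the two tags from the slides.
SAMPLE_PAGE = """
<html><head>
<META name="keywords" content="vacation, Greece, sunshine">
<META name="DC.Creator" content="Simpson, Homer">
</head><body><p>Hello</p></body></html>
"""

if __name__ == "__main__":
    meta = extract_meta(SAMPLE_PAGE)
    keywords = [k.strip() for k in meta.get("keywords", "").split(",")]
    print(keywords)
    print(meta.get("DC.Creator"))
```

The point of the sketch is that the tags are trivially machine-readable; nothing in it checks whether the author's assertions are true.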
12Meaningful to computers
- Resource Description Framework
- <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
- <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
<contact:fullName>Eric Miller</contact:fullName>
- <contact:mailbox rdf:resource="mailto:em@w3.org"/>
<contact:personalTitle>Dr.</contact:personalTitle>
- </contact:Person>
- </rdf:RDF>
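As a rough illustration of what "meaningful to computers" means here, the RDF description above can be flattened into subject, predicate, object statements. The sketch below uses Python's standard xml.etree.ElementTree and handles only this simple shape of RDF/XML; it is not a general RDF parser, and the names person_triples and RDF_XML are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTACT_NS = "http://www.w3.org/2000/10/swap/pim/contact#"

# The contact description from the slide, with its markup restored.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">
  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>
</rdf:RDF>"""


def person_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for each contact:Person."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for person in root.findall(f"{{{CONTACT_NS}}}Person"):
        subject = person.get(f"{{{RDF_NS}}}about")
        for prop in person:
            # ElementTree writes tags as "{namespace}local"; join them into one URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            # A property either points at a resource or carries literal text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for triple in person_triples(RDF_XML):
        print(triple)
```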
13Open Web
Link computers together and share information
Trust strangers? What would your mother tell you?
14Attention Economy
- Information is plentiful
- Attention is scarce
- Goldhaber, 1997
- 0.1% of websites get 32.3% of activity
- Huberman, 2001
15META keywords tag called "Spam magnet" (Danny
Sullivan, March 2003)
Dublin Core avoided as spam (Dublin Core FAQ)
RDF has not caught on with a large user community
(Eberhart, August 15, 2002)
16Asserting meaning on the open web?
17Semantics
A Moment to Reflect: Where We're Going
- Assertion of Meaning on the Open Web
- The Semantic Web: History, Description,
Technology and Sociology
- Aggregation of Meaning on the Open Web
- Web Semantics: HTML, Googlebot, Culture of Lay
Indexing
18There is semantic content on the open web
Indexing your own work
<p>My paragraph</p> <h1>My heading</h1>
Indexing somebody else's work
<a href=http://goodPage.html>
<a href=http://goodPage.html>Tasty Donuts</a>
19Google Harvests Lay Indexing
- Googlebot crawls the web
- Google parses
- Web page content
- Hyperlinks
- Hyperlink text itself
- Text surrounding hyperlinks
Lay Indexers
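A hedged sketch of harvesting lay indexing: for each hyperlink, record the link text that other authors used to describe the target page. It uses Python's standard html.parser; AnchorCollector, anchor_index and SAMPLE_PAGES are hypothetical names, the example.org URLs are made up, and nothing here reflects how Googlebot is actually implemented.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from the links in one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_index(pages):
    """Map each link target to the anchor texts other pages use for it."""
    index = {}
    for html in pages.values():
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.links:
            index.setdefault(href, []).append(text)
    return index


# Assumed sample: two authors describing the same target in their own words.
SAMPLE_PAGES = {
    "http://a.example.org/": '<p>Try <a href="http://goodPage.html">Tasty Donuts</a> today</p>',
    "http://b.example.org/": '<p><a href="http://goodPage.html">best donuts</a> in town</p>',
}

if __name__ == "__main__":
    print(anchor_index(SAMPLE_PAGES))
```

Here the target page is described entirely by other people's words, which is the sense in which web authors act as lay indexers.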
20PageRank
- Google works because it relies on the millions of
individuals posting websites to determine which
other sites offer content of value. Instead of
relying on a group of editors or solely on the
frequency with which certain terms appear, Google
ranks every web page using a breakthrough
technique called PageRank. PageRank evaluates
all of the sites linking to a web page and
assigns them a value, based in part on the sites
linking to them. By analyzing the full structure
of the web, Google is able to determine which
sites have been "voted" the best sources of
information by those most interested in the
information they offer. This technique actually
improves as the web gets bigger, as each new site
is another point of information and another vote
to be counted. (Google Corporate Information)
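The "voting" idea can be sketched as a simplified power iteration over a link graph. This is a textbook-style approximation under stated assumptions: the damping factor of 0.85, the handling of pages with no outgoing links, and the names pagerank and SAMPLE_LINKS are all illustrative, and Google's production ranking is not public.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a dict of page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Assumed toy web: three pages link to "c", and "c" links back to "a".
SAMPLE_LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    scores = pagerank(SAMPLE_LINKS)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In the toy graph, "c" ranks highest because it collects the most votes, and "a" outranks "b" and "d" only because the highly ranked "c" links to it.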
21Probability
- Satisfies the average web searcher because Google
has aggregated the valuations of the average web
author
- Transforms web authors into lay indexers of web
content where links are a plebiscite for the most
important web pages
22Dogs
- I-love-dogs.com (PageRank 6/10)
- Guide Dogs for the Blind (PR 6/10)
- American Kennel Club (PR 6/10)
- January 29, 2004
- 14.5 million web pages
23Aggregating Meaning
- Google aggregates the meaning expressed by lay
indexers in the web content, hyperlinks and
hyperlink text.
- Other aggregators
- Blogdex
- Price comparison aggregators
- News feed aggregators
24Information or Spam?
- Aggregation: Surreptitiously collecting
information from other people
- Bad faith: Anticipating aggregation and
manipulating the aggregator
25Corporate Asset
- Collecting genuine web authorship
- Bad faith: A web author games Google to assert
his meaning in place of the meaning given by the
web community
- Link farming
- Google bombing
- Cloaking
- Misuse of metadata
26Culture of Lay Indexing
- Ignorance: When, if and how the Googlebot will
aggregate content
- Mistrust: Google must assume that web writers are
constantly scheming
27Google's order of results is automatically
determined by more than 100 factors, including
our PageRank algorithm. ... Due to the nature of
our business and our interest in protecting the
integrity of our search results, this is the only
information we make available to the public about
our ranking system. (PageRank Information)
28 http://faculty.washington.edu/tabrooks/
29Traditional Methods of Constructing Meaning
- Trust in the Assertions of a Few
- Indexers: "index sense" (Nichols, 1892)
- ERIC database: subject experts, indexers
30Legacy Culture of Trust
- Information professionals
- Controlled access
31 Web: A Lawless Meaning Space
Traditional closed information systems honored
the assertion of meaning by a single individual,
but to succeed Google must distrust it. This is
the social consequence of a network technology
that permits anyone to conflate the roles of
author, indexer and publisher. That is, the
Internet is an "open" system where anyone can
author anything and declare its meaning, i.e., a
lawless meaning space.
32Technological Consequences of Culture of Lay
Indexing
- META keywords tag
- Dublin Core
- Resource Description Framework (RDF)
- Also, it is interesting to note that metadata
efforts have largely failed with web search
engines, because any text on the page which is
not directly represented to the user is abused to
manipulate search engines. There are even
numerous companies which specialize in
manipulating search engines for profit. (Brin &
Page, 1998)
33Cool Tricks and Feeding the Googlebot
- Any reduction in ignorance will be neutralized
- SEO (search engine optimizers)
- Google has made five significant changes to
its algorithmic formulas in the past two weeks,
Brin said.
- Seattle Post-Intelligencer, Wednesday, February
18, 2004
34Need for a Writer's Guide
- Google lists hazards: JavaScript, cookies,
session IDs, frames, DHTML, Flash
- Poems That Go
- We are open to all forms of multimedia,
computer-generated, and interactive work that
include (but are not limited to) HTML, Shockwave,
Quicktime, streaming media, Flash, Java, and
DHTML content.
35Different Webs
- Open Web
- Aggregate naïve/genuine behavior
- Writing Markup Semantics
-
is the semantic web ?
Your choice: Dublin Core, RDF, etc.
Trust, Accountability, Cooperation
36Semantics and the Web
The Semantic Web
- Which web?
- Whose semantics?
37The Nature of Meaning in the Age of Google
http://www.ischool.washington.edu/tabrooks/Writing/AgeOfGoogle/AgeOfGoogle.htm