The Semantic Web in use: Analyzing FOAF Documents

1
The Semantic Web in use: Analyzing FOAF Documents
  • Li Ding, Lina Zhou, Tim Finin and Anupam Joshi
  • University of Maryland, Baltimore County

DARPA contract F30602-00-0591 and NSF awards
ITR-IIS-0326460 and ITR-IIS-0325464 provided
partial research support for this work
2
Outline
  • Motivation
  • Introduction
  • The six popular ontologies
  • FOAF vocabulary
  • Why FOAF
  • Building FOAF Document collection
  • FOAF Document Identification
  • FOAF Document Discovery
  • Popular Properties of foaf:Person
  • Applications
  • Personal Information Fusion
  • Social Network Analysis

3
The Semantic Web
  • The semantic web vision is that information and
    services are described using shared ontologies in
    KR-like markup languages, making them accessible
    to machines (programs).
  • How do we get there?
  • What kind of ontologies? IEEE SUO? Cyc?
  • What kind of languages? RDF? OWL? RuleML?
  • It's reasonable to start with the simple and move
    toward the complex
  • From Dublin Core to CYC
  • From RDF to OWL and beyond
  • Significant semantic web content exists today
  • Using simple vocabularies (e.g., FOAF) and
    RDF/RDFS

4
The Semantic Web
  • The more important word in Semantic Web is the
    latter
  • The KR aspects of the SW were taken off the
    shelf, the result of 25 years of research done in
    the AI community
  • Remember hypertext? It was a nice research
    backwater going back to the 50s (recall Memex
    and Xanadu)
  • Hypertext was forever changed by the Web
  • So maybe the web will forever change KR
  • TBL: The Semantic Web will globalize KR, just as
    the WWW globalized hypertext

5
Web of what?
  • What features does the web bring to the table?
  • Anyone can say anything about anything
  • The meaning of RDF terms will be (partly)
    determined socially
  • It's a web of documents, services, agents and
    people

6
What kind of Ontologies?
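[Figure: the ontology spectrum, from simple to expressive]
  • Points on the spectrum: Catalog/ID, Terms/glossary,
    Thesauri (narrower term relation), Informal is-a,
    Formal is-a, Formal instance, Frames (properties),
    Value Restriction, Disjointness/Inverse/part of,
    General Logical constraints
  • Regions: Vocabularies, Taxonomies, Simple Ontologies,
    Expressive Ontologies
  • Systems placed on the spectrum: Wordnet, UMLS, DB Schema,
    OO, RDF, RDFS, DAML, OWL, CYC, IEEE SUO
  • Annotation: space of interest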
After Deborah L. McGuinness (Stanford)
7
The Semantic Web Today
  • There are several simple RDF vocabularies that
    are widely used today
  • Dublin Core
  • RSS
  • FOAF
  • It's instructive to study how these are being
    used today
  • And to track how their usage changes

8
The Six Most Popular Ontologies
[Chart: usage of the six most popular ontologies: RDF, DC, RSS, MCVB, FOAF, RDFS]
The statistics were generated by http://swoogle.umbc.edu
9
A use case: FOAF
  • FOAF (Friend of a Friend) is a simple ontology to
    describe people and their social networks.
  • See the FOAF project page: http://www.foaf-project.org/
  • We recently crawled the web and discovered over
    1,500,000 valid RDF FOAF files.
  • Most of these are from several blogging systems
    that encode basic user info in FOAF
  • See http://apple.cs.umbc.edu/semdis/wob/foaf/

<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:mbox_sha1sum>241037262c252e</foaf:mbox_sha1sum>
  <foaf:homepage rdf:resource="http://umbc.edu/finin/" />
  <foaf:img rdf:resource="http://umbc.edu/finin/images/passport.gif" />
</foaf:Person>
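Per the FOAF vocabulary, a foaf:mbox_sha1sum value like the one in the snippet above is the SHA-1 hash of a person's mailto: URI, which lets a document identify someone without publishing the address itself. A minimal Python sketch follows; the function name and the example address are illustrative assumptions, not taken from the slides.

```python
import hashlib

def mbox_sha1sum(mailbox: str) -> str:
    """Hash a mailbox as foaf:mbox_sha1sum does (SHA-1 of the mailto: URI)."""
    # Accept either a bare address or a full mailto: URI.
    uri = mailbox if mailbox.startswith("mailto:") else "mailto:" + mailbox
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()

# Hypothetical address, used only for illustration.
digest = mbox_sha1sum("someone@example.org")
print(digest)
```

Two documents that publish the same hash are very likely describing the same mailbox, which is what makes the property useful for matching people across files.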
10
FOAF vocabulary: http://xmlns.com/foaf/0.1/
11
FOAF: why RDF? Extensibility!
  • FOAF vocabulary provides 50 basic terms for
    making simple claims about people
  • FOAF files can use other RDF terms too: RSS,
    MusicBrainz, Dublin Core, Wordnet, Creative
    Commons, blood types, starsigns, ...
  • RDF guarantees freedom of independent extension
  • OWL provides fancier data-merging facilities
  • Result: freedom to say what you like, using any
    RDF markup you want, and have RDF crawlers merge
    your FOAF documents with others and know when
    you're talking about the same entities.

After Dan Brickley, danbri@w3.org
12
No free lunch!
  • Consequence
  • We must plan for lies, mischief, mistakes, stale
    data, slander
  • Dataset is out of control, distributed, dynamic
  • Importance of knowing who-said-what
  • Anyone can describe anyone
  • We must record data provenance
  • Modeling and reasoning about trust is critical
  • Legal, privacy and etiquette issues emerge
  • Welcome to the real world

After Dan Brickley, danbri@w3.org
13
FOAF example using XML
  <rdf:RDF
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:Person>
      <foaf:name>Tim Finin</foaf:name>
      <foaf:mbox rdf:resource="mailto:finin@umbc.edu"/>
    </foaf:Person>
  </rdf:RDF>
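For quick inspection, a document like the one above can be read with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree and handles only this flat serialization; RDF/XML allows many equivalent forms, so a real crawler would use a proper RDF parser. The helper name read_people is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:mbox rdf:resource="mailto:finin@umbc.edu"/>
  </foaf:Person>
</rdf:RDF>"""

def read_people(xml_text):
    """Return one dict per top-level foaf:Person element."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall(f"{{{FOAF}}}Person"):
        record = {}
        for child in person:
            # Strip the namespace, keeping only the local property name.
            prop = child.tag.split("}", 1)[-1]
            # A property value is either a resource URI or literal text.
            value = child.get(f"{{{RDF}}}resource") or (child.text or "").strip()
            record.setdefault(prop, []).append(value)
        people.append(record)
    return people

people = read_people(DOC)
print(people)
```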

14
FOAF example using XML
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:mbox rdf:resource="mailto:finin@umbc.edu"/>
    <foaf:nick>Tim</foaf:nick>
    <foaf:homepage rdf:resource="http://umbc.edu/finin/"/>
    <foaf:img rdf:resource="http://umbc.edu/finin/passport.gif"/>
  </foaf:Person>

15
FOAF example using XML
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Anupam Joshi</foaf:name>
        <rdfs:seeAlso rdf:resource="http://umbc.edu/joshi/joshi.foaf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>

16
FOAF isn't the only one
  • Other ontologies are used to publish social
    information
  • Swoogle finds >360 RDFS or OWL classes with the
    local name person.

17
Lots of FOAF tools
18
Why FOAF
  • Information Creators
  • Community membership management
  • Unique Person Identification (privacy preserved)
  • Indicating Authorship
  • Information Consumers
  • Provenance tracking
  • Social networking
  • Expose community information to newcomers
  • Match interests
  • Trust building block

19
Studying how FOAF is being used
  • What counts as a FOAF document?
  • How can we find FOAF documents?

20
Identify a FOAF document
  • D is a generic FOAF document when conditions (1), (2)
    and (3) are met
  • D is a strict FOAF document when conditions (1), (2),
    (3) and (4) are met
  • (1) D is an RDF document.
  • (2) D uses the FOAF namespace
  • (3) The RDF graph serialized by D contains the
    sub-graph below
  • (4) D defines one and only one Person instance

[Figure: sub-graph pattern]
X --rdf:type--> foaf:Person
X --foaf:Y--> Z
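The generic/strict test above can be sketched over a graph that has already been parsed into triples. This is a minimal Python illustration under stated assumptions: triples are plain (subject, predicate, object) string tuples, successful parsing stands in for the "is an RDF document" condition, and the function name classify_foaf is hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def classify_foaf(triples):
    """Return 'strict', 'generic', or None for an already-parsed RDF graph."""
    triples = list(triples)
    # FOAF namespace check: some predicate or object is a FOAF term.
    uses_foaf = any(p.startswith(FOAF) or o.startswith(FOAF)
                    for _, p, o in triples)
    # Subjects typed as foaf:Person.
    persons = {s for s, p, o in triples
               if p == RDF_TYPE and o == FOAF + "Person"}
    # Sub-graph check: some person X also carries a FOAF property (X foaf:Y Z).
    has_pattern = any(s in persons and p.startswith(FOAF)
                      for s, p, _ in triples)
    if not (uses_foaf and has_pattern):
        return None
    # Strict documents define exactly one Person instance.
    return "strict" if len(persons) == 1 else "generic"

one_person = [
    ("_:a", RDF_TYPE, FOAF + "Person"),
    ("_:a", FOAF + "name", "Tim Finin"),
]
two_people = one_person + [
    ("_:b", RDF_TYPE, FOAF + "Person"),
    ("_:b", FOAF + "name", "Anupam Joshi"),
]
print(classify_foaf(one_person), classify_foaf(two_people))
```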
21
Different FOAF collections
  • DS-Swoogle
  • FOAF documents selected from Swoogle's database
    of 340K semantic web documents
  • Swoogle selects at most 1000 documents from any
    site
  • DS-FOAF
  • Custom crawler found 1.5M FOAF documents, most
    from a few large blog sites (e.g., livejournal)
  • DS-FOAF-Small
  • Subset of 7K non-blog FOAF documents from 1K
    sites defining 37K people

22
FOAF document Discovery
  • Bootstrap using a web search engine (got 10,000
    docs)
  • Discovery using rdfs:seeAlso semantics (got
    1.5M docs)

Top 7 FOAF websites
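The two-step discovery on this slide (bootstrap from seed documents, then follow rdfs:seeAlso links) can be sketched as a breadth-first crawl. This is an illustration rather than the authors' crawler: fetch_see_also is a hypothetical caller-supplied function that downloads one document and returns the rdfs:seeAlso URLs found in it, and the toy link table stands in for the web.

```python
from collections import deque

def discover(seeds, fetch_see_also, limit=1000):
    """Breadth-first discovery of documents reachable via rdfs:seeAlso."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(seen) < limit:
        url = queue.popleft()
        for target in fetch_see_also(url):
            # Skip known documents and stop adding once the limit is reached.
            if target not in seen and len(seen) < limit:
                seen.add(target)
                queue.append(target)
    return seen

# Toy link structure: document -> rdfs:seeAlso targets.
links = {"a.rdf": ["b.rdf", "c.rdf"], "b.rdf": ["c.rdf", "d.rdf"]}
found = discover(["a.rdf"], lambda url: links.get(url, []))
print(sorted(found))
```

The limit parameter is a simple safeguard for a sketch; a production crawler would also need politeness delays, error handling and per-site caps.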
23
From DS-Swoogle
  • 17 SWDs add to the definition of foaf:Person
  • e.g., defining superclasses, disjointness, etc.
  • 162 properties are defined for foaf:Person
  • e.g., properties whose domain is foaf:Person
  • 74 properties defined as relations between people
  • e.g., properties with both domain and range of
    foaf:Person
  • 582 properties used
  • e.g., used to assert something of a foaf:Person
    instance

24
Popular properties of foaf:Person
Top 10 popular properties (per document)
Rank  non-blog (26,936)         liveJournal.com (20,298,073)  DS-FOAF-SMALL (33,790)
1     foaf:mbox_sha1sum (0.84)  foaf:mbox_sha1sum (1.0)       foaf:name (0.80)
2     foaf:homepage (0.66)      dc:description (1.0)          foaf:mbox_sha1sum (0.71)
3     foaf:name (0.64)          dc:title (1.0)                foaf:nick (0.51)
4     foaf:nick (0.61)          foaf:nick (1.0)               foaf:homepage (0.40)
5     foaf:weblog (0.60)        foaf:page (1.0)               foaf:depiction (0.35)
6     foaf:knows (0.44)         foaf:weblog (0.99)            foaf:weblog (0.30)
7     foaf:mbox (0.38)          rdfs:seeAlso (0.85)           foaf:knows (0.28)
8     foaf:img (0.38)           foaf:knows (0.85)             foaf:surname (0.27)
9     bio:olb (0.35)            foaf:dateOfBirth (0.71)       foaf:firstName (0.26)
10    rdfs:seeAlso (0.34)       foaf:interest (0.67)          rdfs:seeAlso (0.26)
11                                                            foaf:mbox (0.26)
DS-FOAF-SMALL is a new dataset built in Oct 2004,
based on 7276 evenly sampled documents.
25
Popular properties of foaf:Person
Top 10 popular properties (per instance)
Rank  non-blog (26,936)         liveJournal.com (20,298,073)  DS-FOAF-SMALL (33,790)
1     foaf:name (0.84)          dc:title (1.74)               foaf:name (0.69)
2     foaf:knows (0.79)         foaf:interest (1.68)          foaf:mbox_sha1sum (0.65)
3     foaf:homepage (0.63)      foaf:nick (1.04)              rdfs:seeAlso (0.39)
4     foaf:mbox_sha1sum (0.51)  foaf:weblog (1.00)            foaf:nick (0.26)
5     rdfs:seeAlso (0.40)       rdfs:seeAlso (0.99)           foaf:homepage (0.18)
6     dc:title (0.31)           foaf:knows (0.95)             foaf:mbox (0.15)
7     foaf:nick (0.22)          foaf:page (0.95)              foaf:weblog (0.15)
8     foaf:weblog (0.18)        dc:description (0.046)        foaf:firstName (0.11)
9     foaf:mbox (0.15)          foaf:mbox_sha1sum (0.046)     foaf:surname (0.11)
10    daml:equivalentTo (0.13)  foaf:dateOfBirth (0.046)      foaf:depiction (0.10)
11                                                            foaf:knows (0.07)
DS-FOAF-SMALL is a new dataset built in Oct 2004,
based on 7276 evenly sampled documents.
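The slides do not define the two ratios, so the Python sketch below rests on one plausible reading: "per document" is the share of documents that use a property at least once, and "per instance" is the average number of uses per person instance (which would explain values above 1.0). The input shape and the name property_stats are assumptions made for illustration.

```python
from collections import Counter

def property_stats(documents):
    """documents: non-empty list of documents; each document is a list of
    person instances; each instance is a list of property names."""
    doc_hits = Counter()   # documents using the property at least once
    uses = Counter()       # total uses across all instances
    n_instances = 0
    for doc in documents:
        used_here = set()
        for instance in doc:
            n_instances += 1
            uses.update(instance)
            used_here.update(instance)
        doc_hits.update(used_here)
    per_document = {p: c / len(documents) for p, c in doc_hits.items()}
    per_instance = {p: c / n_instances for p, c in uses.items()}
    return per_document, per_instance

# Two toy documents: the first describes two people, the second one.
docs = [
    [["foaf:name", "foaf:knows", "foaf:knows"], ["foaf:name"]],
    [["foaf:nick"]],
]
per_doc, per_inst = property_stats(docs)
print(per_doc, per_inst)
```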
26
Extracting social networks
  • Three steps
  • Discovering FOAF instances
  • Merging instances representing the same person
  • Linking people via foaf:knows and other FOAF-based
    relations
  • e.g., quaffing:drankBeerWith
  • Integrating other SNA data
  • e.g., from co-author relationships mined from
    CiteSeer

27
Merging instances
  • Named instances
  • Inverse functional properties
  • Set of nearly inverse functional properties
  • OWL constraints
  • rdfs:seeAlso
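Merging on inverse functional properties can be sketched with a union-find structure: two instances that share a value for such a property are fused into one person. The sketch covers only that one rule from the list above; the input shape (one value per property), the chosen property list and the name merge_instances are assumptions for illustration.

```python
def merge_instances(instances,
                    ifps=("foaf:mbox", "foaf:mbox_sha1sum", "foaf:homepage")):
    """instances: dict mapping an instance id to {property: value}.
    Returns a list of sets of instance ids; each set is one fused person."""
    parent = {i: i for i in instances}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}  # (property, value) -> first instance seen with it
    for inst, props in instances.items():
        for prop in ifps:
            if prop in props:
                key = (prop, props[prop])
                if key in first_owner:
                    # Same identifying value: fuse the two groups.
                    parent[find(inst)] = find(first_owner[key])
                else:
                    first_owner[key] = inst

    groups = {}
    for inst in instances:
        groups.setdefault(find(inst), set()).add(inst)
    return list(groups.values())

# Hypothetical instances from four documents.
instances = {
    "doc1#a": {"foaf:name": "Tim Finin", "foaf:mbox_sha1sum": "abc"},
    "doc2#b": {"foaf:mbox_sha1sum": "abc", "foaf:homepage": "http://example.org/tim"},
    "doc3#c": {"foaf:homepage": "http://example.org/tim"},
    "doc4#d": {"foaf:name": "Someone Else"},
}
groups = merge_instances(instances)
print(groups)
```

Because fusion is transitive, a single wrong value in one document can chain unrelated people together, so the property list should be chosen conservatively.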

28
Collecting Personal Information
http://www-2.cs.cmu.edu/People/fgandon/foaf.rdf
http://www.cs.umbc.edu/dingli1/foaf.rdf
29
Caution: Collision? Mistake!
http://www.ilrt.bris.ac.uk/people/cmdjb/webwho.xrdf
http://www.mindswap.org/katz/2002/11/jordan.foaf
30
SNA1: Instances of foaf:Person/doc
  • Zipf's distribution
  • Sloppy tail: few FOAF documents contain thousands
    of instances

[Figure: cumulative distribution]
31
SNA2: Instances of foaf:Person/group
A group refers to a fused person
  • Zipf's distribution
  • Sloppy tail: some instances are wrongly fused due
    to incorrect FOAF documents

[Figure: cumulative distribution]
32
Degree analysis
  • For social networks, the in-degree and out-degree
    measures of a person are of interest
  • Can be used to identify hubs and authorities or
    to compute other interesting properties or
    rankings
  • Analyzing most large social networks reveals that
    in-degree and out-degree follow a power law or
    Zipf distribution
  • We found that to be the case for social networks
    induced by FOAF documents.
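The degree measures described above are easy to compute from an edge list. The Python sketch below counts in-degree and out-degree, then builds one common form of cumulative distribution: the number of nodes whose degree is at least k. The edge list and function names are illustrative assumptions; the slides do not say which cumulative form their plots use.

```python
from collections import Counter

def degrees(edges):
    """edges: iterable of (source, target) pairs, e.g. from foaf:knows."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

def cumulative_distribution(values):
    """Return (k, number of values >= k) pairs in increasing order of k."""
    counts = Counter(values)
    result, running = [], 0
    for k in sorted(counts, reverse=True):
        running += counts[k]
        result.append((k, running))
    return result[::-1]

# Toy network: a names three friends; b and d each name c.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "c")]
in_deg, out_deg = degrees(edges)
in_cdf = cumulative_distribution(in_deg.values())
print(in_cdf)
```

Plotted on log-log axes, a power-law distribution appears as a roughly straight line, which is the usual visual check.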

33
SNA3: In-degree of group
  • Zipf's distribution
  • Sharp tail: few FOAF documents have large
    in-degrees

[Figure: cumulative distribution]
34
SNA4: Out-degree of group
  • Zipf's distribution
  • Sloppy tail: few person directory documents

[Figure: cumulative distribution]
35
SNA5: Patterns of FOAF Network
  • Four types of group
  • Isolated
  • Only in
  • only one inlink (97)
  • Only out
  • Both (intermediate)
  • Basic Patterns
  • Singleton (isolated)
  • Star (only out): an active person publishes
    friends
  • Clique: a small group
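The four group types listed above follow directly from whether a node has incoming links, outgoing links, both, or neither. A minimal Python sketch, with hypothetical node names and function name:

```python
def classify_groups(nodes, edges):
    """Label each node as 'isolated', 'only in', 'only out', or 'both'.
    edges must be a list of (source, target) pairs."""
    has_in = {dst for _, dst in edges}
    has_out = {src for src, _ in edges}
    labels = {}
    for n in nodes:
        if n in has_in and n in has_out:
            labels[n] = "both"
        elif n in has_in:
            labels[n] = "only in"
        elif n in has_out:
            labels[n] = "only out"
        else:
            labels[n] = "isolated"
    return labels

# Toy network: a publishes friends b and c; b also links to c; d is alone.
labels = classify_groups(["a", "b", "c", "d"],
                         [("a", "b"), ("a", "c"), ("b", "c")])
print(labels)
```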

36
SNA6: Size of components
  • Zipf's distribution
  • Sloppy head: singletons
  • Sloppy tail: blog websites (e.g.
    www.livejournal.com)

[Figure: cumulative distribution]
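Component sizes can be computed with a plain graph traversal. The Python sketch below treats links as undirected (weakly connected components), which is an assumption, since the slides do not say how components were defined; the function name and toy data are also illustrative.

```python
from collections import defaultdict

def component_sizes(nodes, edges):
    """Sizes of connected components, ignoring edge direction, largest first."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        # Depth-first traversal of everything reachable from start.
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Toy network: a chain of three, a pair, and one singleton.
sizes = component_sizes(list("abcdef"), [("a", "b"), ("b", "c"), ("d", "e")])
print(sizes)
```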
37
SNA7: Growth of FOAF network
  • The data suggests that there is a natural
    evolution for a social network
  • (1) disjoint, star-like connected components
  • (2) link together to form trees and forests,
  • (3) eventually forming a scale-free network

38
SNA7: Growth of FOAF network
[Figure: growth of the FOAF network, with regions labeled 1, 2 and 3]
39
The Map of FOAF network
[Figure: map of the FOAF network, June 2004; labeled regions: blog.livedoor.jp, non-blog, www.ecademy.com, www.livejournal.com]
40
Conclusions
  • The semantic web is evolving
  • There is a growing volume of RDF content
  • FOAF is one of the early successes.
  • FOAF data is being used
  • FOAF data is relatively easy to collect and
    analyze
  • FOAF data is a good source for social network
    information

41
Questions?
  • Demo: http://apple.cs.umbc.edu/semdis
  • Swoogle: http://swoogle.umbc.edu/
  • ebiquity group: http://ebiquity.umbc.edu