Network Structure of Folksonomies

1
Network Structure of Folksonomies
  • Vito D. P. Servedio
  • Dipartimento di Fisica, Università di Roma "La
    Sapienza"
  • Centro Studi e Ricerche "Enrico Fermi"

TAGora: Semiotic Dynamics in Online Social
Communities, EU-IST-2006-034721
2
In collaboration with
Andrea Baldassarri, Ciro Cattuto, Vittorio Loreto
Miranda Grahl, Andreas Hotho, Christoph Schmitz, Gerd Stumme
3
AGENDA
  • Properties of folksonomy hypergraphs
  • Network of tag co-occurrences
  • Clustering of resources

4
a folksonomy example: del.icio.us screenshot
5
data structure: basic units of information
  • Post: (tags, user, resource)
  • TAS (tag assignment): (tag, user, resource)

Example post:
({bookmarking, sharing, collaborative, folksonomy}, andreab, http://del.icio.us)
Corresponding tag assignments:
(bookmarking, andreab, http://del.icio.us)
(sharing, andreab, http://del.icio.us)
(collaborative, andreab, http://del.icio.us)
(folksonomy, andreab, http://del.icio.us)
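The relation between a post and its tag assignments can be sketched in a few lines of Python. This is only an illustration: the type names `Post` and `TAS` and the function `post_to_tas` are hypothetical, not part of any TAGora tool.

```python
from typing import NamedTuple


class Post(NamedTuple):
    tags: frozenset  # all tags one user attached to one resource
    user: str
    resource: str


class TAS(NamedTuple):
    tag: str
    user: str
    resource: str


def post_to_tas(post):
    """Expand one post into one tag assignment per tag (sorted for determinism)."""
    return [TAS(tag, post.user, post.resource) for tag in sorted(post.tags)]


post = Post(
    frozenset({"bookmarking", "sharing", "collaborative", "folksonomy"}),
    "andreab",
    "http://del.icio.us",
)
tas = post_to_tas(post)
for assignment in tas:
    print(assignment)
```

A post with k tags always yields k tag assignments that share the same user and resource.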
6
folksonomy hypergraph structure
7
data collection
TAGora Project (STREP FP6)
Semiotic Dynamics in Online Social Communities
www.tagora-project.eu
Consortium: University of Roma "La Sapienza", SONY CSL Paris, University of
Kassel, University of Koblenz-Landau, University of Southampton
  • del.icio.us
      • Work co-ordinated by Uni of Kassel
      • Collected data from Nov. 2006
      • Over 667K users
      • 19 million resources
      • Nearly 2.5 million tags
      • 140 million tag assignments
      • 50GB of data
  • flickr
      • Work co-ordinated by Uni of Koblenz
      • Collected data 21st May 2007
      • 300K users
      • 25 million photos
      • 1.5 million tags
      • Over 110 million tag assignments
  • bibsonomy
      • Complete dataset (June 2007)
      • 1385 users
      • 37651 tags
      • Over 149K resources

8
artificial networks: permuted and binomial
In the following slides we shall use some
artificial networks, defined as follows.
PERMUTED: take the original folksonomy and
shuffle all nodes in the same class.
example (original):
  Resource1  User1  Tag1
  Resource1  User1  Tag2
  Resource1  User1  Tag3
  Resource1  User2  Tag1
  Resource1  User2  Tag2
  Resource2  User3  Tag4
permuted example:
  Resource1  User3  Tag2
  Resource2  User2  Tag3
  Resource1  User1  Tag1
  Resource1  User1  Tag2
  Resource1  User2  Tag1
  Resource1  User1  Tag4
We end up with a hypergraph in which every node has
the same degree as in the original one.
BINOMIAL: same number of hyperedges, with
endpoints chosen uniformly at random among T, U, R.
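A minimal sketch of the two null models, assuming that "shuffle all nodes in the same class" means permuting each column (resources, users, tags) independently across hyperedges, as the example suggests. The function names are hypothetical.

```python
import random
from collections import Counter


def permuted(hyperedges, rng):
    """Shuffle each node class independently across hyperedges.

    Every node keeps its degree (the number of hyperedges it appears in),
    but the shuffle may create duplicate hyperedges.
    """
    resources, users, tags = (list(column) for column in zip(*hyperedges))
    for column in (resources, users, tags):
        rng.shuffle(column)
    return list(zip(resources, users, tags))


def binomial(hyperedges, rng):
    """Same number of hyperedges, endpoints drawn uniformly at random."""
    resources, users, tags = (sorted(set(column)) for column in zip(*hyperedges))
    return [
        (rng.choice(resources), rng.choice(users), rng.choice(tags))
        for _ in hyperedges
    ]


def degrees(hyperedges):
    """Degree of every node, as one Counter per class."""
    return [Counter(column) for column in zip(*hyperedges)]


original = [
    ("Resource1", "User1", "Tag1"),
    ("Resource1", "User1", "Tag2"),
    ("Resource1", "User1", "Tag3"),
    ("Resource1", "User2", "Tag1"),
    ("Resource1", "User2", "Tag2"),
    ("Resource2", "User3", "Tag4"),
]
rng = random.Random(0)  # fixed seed, only so that runs are reproducible
shuffled = permuted(original, rng)
randomised = binomial(original, rng)
```

Comparing `degrees(shuffled)` with `degrees(original)` confirms that the permuted network preserves all node degrees, while the binomial one preserves only the number of hyperedges.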
9
average path length (estimated)
moving on hyperedges
10
cliquishness
A high resource cliquishness indicates that many
of the users related to that resource assign
overlapping sets of tags to it.
Example: $T_r = 3$, $U_r = 2$, $tu_r = 3$
  • $tu_r$: number of hyperlinks connected to r
  • $T_r$: number of adjacent tags
  • $U_r$: number of adjacent users
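The transcript does not preserve the formula itself. The sketch below assumes that cliquishness is the number of hyperlinks connected to r divided by the product of the number of adjacent tags and adjacent users, which lies between 0 and 1; treat that definition, and the function name, as assumptions.

```python
def cliquishness(tas, resource):
    """Assumed definition: tu_r / (T_r * U_r) for the given resource.

    `tas` is an iterable of (tag, user, resource) tag assignments.
    """
    pairs = {(tag, user) for tag, user, res in tas if res == resource}
    if not pairs:
        return 0.0  # resource does not occur in the folksonomy
    tags = {tag for tag, _ in pairs}
    users = {user for _, user in pairs}
    return len(pairs) / (len(tags) * len(users))


# Toy data with 3 adjacent tags, 2 adjacent users and 3 hyperlinks.
tas = [
    ("t1", "u1", "r"),
    ("t2", "u1", "r"),
    ("t3", "u2", "r"),
    ("t1", "u3", "other"),
]
value = cliquishness(tas, "r")  # 3 / (3 * 2)
```

Under this definition the value reaches 1 only when every adjacent user has assigned every adjacent tag to the resource.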
11
connectedness / transitivity
12
AGENDA
  • Properties of folksonomy hypergraphs
  • Network of tag co-occurrences
  • Clustering of resources

13
networks of tag co-occurrence
Tags acquire a stronger semantic context when
they co-occur with each other, e.g. {Roma, holidays,
Italy} vs {Roma, football, we_won} vs {Roma,
love, girls}, etc.
  • Tag co-occurrences in posts
      • Weighted graph of tags
      • Weight: number of common posts
  • Strength of a tag
      • Sum of its edge weights
  • Can we study the semantics of tags?
      • (japan, tokyo) more frequent than (physics, sex)
      • Check with Google!
  • Compare statistics with shuffled graphs

14
weighted network of tag co-occurrence
Two tags co-occur if they are present in the same
post.
We can say more:
Two tags t, t' co-occur with weight w if they are
simultaneously present in w posts.
In terms of adjacency tensors:
tensor contraction in flat space
We examine the weighted undirected network
defined by W.
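As a rough sketch, the weights can be computed by grouping tag assignments into posts and counting, for each pair of tags, the posts that contain both. One natural reading of the tensor contraction is a sum over users and resources of products of adjacency-tensor entries; that reading, and the function names, are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_weights(tas):
    """Return {(t, t'): w}, where w is the number of posts containing both tags."""
    posts = defaultdict(set)  # (user, resource) -> set of tags in that post
    for tag, user, resource in tas:
        posts[(user, resource)].add(tag)
    weights = defaultdict(int)
    for tags in posts.values():
        for pair in combinations(sorted(tags), 2):
            weights[pair] += 1
    return dict(weights)


def strengths(weights):
    """Strength of a tag: the sum of the weights of its edges."""
    s = defaultdict(int)
    for (t1, t2), w in weights.items():
        s[t1] += w
        s[t2] += w
    return dict(s)


tas = [
    ("roma", "u1", "r1"), ("holidays", "u1", "r1"), ("italy", "u1", "r1"),
    ("roma", "u2", "r2"), ("football", "u2", "r2"), ("we_won", "u2", "r2"),
    ("roma", "u3", "r3"), ("holidays", "u3", "r3"),
]
W = cooccurrence_weights(tas)
s = strengths(W)
```

Each pair is stored once with its tags in alphabetical order, which keeps the network undirected.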
15
strength cumulative distribution
tag shuffled example:
  Resource1  User1  Tag2
  Resource1  User1  Tag3
  Resource1  User1  Tag1
  Resource1  User2  Tag4
  Resource1  User2  Tag1
  Resource2  User3  Tag2
The tag reshuffling procedure makes almost no
change in P(s): the strength is related to the
frequency of tags, not to semantics.
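A small sketch of the two ingredients of this comparison: a shuffle that permutes only the tag column, and a cumulative distribution of strengths. It assumes that P(s) is the fraction of tags with strength at least s; the slide does not say whether the inequality is strict, and the function names are hypothetical.

```python
import random
from collections import Counter


def shuffle_tags(hyperedges, rng):
    """Permute only the tag column; every (resource, user) pair stays in place."""
    tags = [tag for _, _, tag in hyperedges]
    rng.shuffle(tags)
    return [(res, user, tag) for (res, user, _), tag in zip(hyperedges, tags)]


def cumulative_distribution(strength_values):
    """Map each distinct strength s to the fraction of values that are >= s."""
    n = len(strength_values)
    counts = Counter(strength_values)
    result, remaining = {}, n
    for s in sorted(counts):
        result[s] = remaining / n
        remaining -= counts[s]
    return result


original = [
    ("Resource1", "User1", "Tag1"),
    ("Resource1", "User1", "Tag2"),
    ("Resource1", "User1", "Tag3"),
    ("Resource1", "User2", "Tag1"),
    ("Resource1", "User2", "Tag2"),
    ("Resource2", "User3", "Tag4"),
]
shuffled = shuffle_tags(original, random.Random(1))
P = cumulative_distribution([1, 1, 2, 5])  # hypothetical strengths
```

Because the shuffle keeps every tag's frequency, any quantity that survives it depends on frequency alone, which is the point made about P(s).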
16
average neighbor strength
  • Examine strength correlation between neighbors
      • Positive correlation: assortative mixing (e.g. social networks)
      • Negative correlation: disassortative mixing (e.g. technological networks)
  • Look for spam infection
  • Reveal semantics via shuffled graph
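A minimal sketch of the measure, assuming the unweighted definition: for each tag, the mean strength of its distinct neighbors in the co-occurrence network. A weighted average is an equally common choice, so the definition used in the talk may differ. The function name and the toy weights are hypothetical.

```python
from collections import defaultdict


def average_neighbor_strength(weights):
    """Return (strength, s_nn) for a weighted undirected graph {(a, b): w}."""
    neighbors = defaultdict(set)
    strength = defaultdict(float)
    for (a, b), w in weights.items():
        neighbors[a].add(b)
        neighbors[b].add(a)
        strength[a] += w
        strength[b] += w
    s_nn = {
        node: sum(strength[n] for n in nbrs) / len(nbrs)
        for node, nbrs in neighbors.items()
    }
    return dict(strength), s_nn


W = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2, ("c", "d"): 1}
strength, s_nn = average_neighbor_strength(W)
```

Plotting `s_nn` against `strength` for every tag gives the kind of scatter plot in which an increasing trend indicates assortative mixing and a decreasing one disassortative mixing.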

17
average neighbor strength
[Scatter plot; regions labelled "spam"]
  • Tags introduced by spamming cluster together
  • Shuffling the graph changes the measure
  • Correlations related to semantics correspond to
    a region in the graph
18
AGENDA
  • Properties of folksonomy hypergraphs
  • Network of tag co-occurrences
  • Clustering of resources

19
clustering and community detection
  • Folksonomies: complex tripartite networks
    (tag, user, resource)
  • Cluster detection can reveal
      • subsets of users (social communities)
      • subsets of tags (semantic frames, jargons)
      • subsets of resources (social classification)
      • other

Now we focus on clustering of resources using
only tag assignments.
20
resource similarity network
Weighted network: how to choose weights?
How to take into account tag frequency?
21
tag clouds for resources
  • Each resource is characterised by a tag cloud
  • Tags are assigned by users and appear with
    different frequencies.

22
similarity metrics
TF/IDF-like weighting procedure
STATEMENT: Resources sharing 'rare' tags
are closely related
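The exact weighting used in the talk is not preserved in the transcript. The following sketch shows one possible TF/IDF-like scheme under explicit assumptions: the relative frequency of a tag in a resource's tag cloud plays the role of TF, the logarithm of (number of resources / number of resources carrying the tag) plays the role of IDF, and similarity is the dot product of the weighted vectors. All names and numbers are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(tag_clouds):
    """tag_clouds: {resource: {tag: count}} -> {resource: {tag: weight}}."""
    n = len(tag_clouds)
    # Global frequency: in how many tag clouds does each tag appear?
    document_frequency = Counter(
        tag for cloud in tag_clouds.values() for tag in cloud
    )
    vectors = {}
    for resource, cloud in tag_clouds.items():
        total = sum(cloud.values())
        vectors[resource] = {
            tag: (count / total) * math.log(n / document_frequency[tag])
            for tag, count in cloud.items()
        }
    return vectors


def similarity(v1, v2):
    """Dot product over the tags the two resources share."""
    return sum(w * v2[tag] for tag, w in v1.items() if tag in v2)


tag_clouds = {
    "r1": {"design": 5, "css": 3, "web": 2},
    "r2": {"design": 4, "css": 1, "web": 5},
    "r3": {"politics": 6, "web": 4},
}
vectors = tfidf_vectors(tag_clouds)
w12 = similarity(vectors["r1"], vectors["r2"])
w13 = similarity(vectors["r1"], vectors["r3"])
```

In this toy data the tag "web" appears in every tag cloud, so it carries zero weight and r1 and r3, which share only that tag, end up with zero similarity.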
23
case study
  • Sample of 400 resources
      • 200 resources tagged with "design"
      • 200 resources tagged with "politics"
  • Does the similarity network show two clusters?

Finer structure? Subclusters?
24
similarity matrix
Broad variability of similarity strengths on a
logarithmic scale.
[Plot: distribution P(w) of the similarity weights w]
A small power (0.1) is used as an effective way
to deal with vanishing weights.
TASK: Find column and row permutations that
uncover a block structure
25
spectral analysis
A. Capocci, V.D.P. Servedio, G. Caldarelli and F.
Colaiori, Physica A 352, 669 (2005), and many
others
[Figure labels: Q, eigenvalues, Laplacian matrix]
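The formula for Q is lost in the transcript. A minimal sketch, assuming Q is the standard graph Laplacian Q = S - W, where W is the symmetric similarity matrix and S is the diagonal matrix of node strengths. The toy matrix below is hypothetical and uses NumPy's `numpy.linalg.eigh` for the symmetric eigenproblem.

```python
import numpy as np


def laplacian_spectrum(W):
    """Eigenvalues (ascending) and eigenvectors of Q = S - W."""
    W = np.asarray(W, dtype=float)
    S = np.diag(W.sum(axis=1))  # node strengths on the diagonal
    Q = S - W
    return np.linalg.eigh(Q)  # valid because Q is symmetric


# Toy similarity matrix: two tight blocks of three nodes, weakly linked.
W = np.array([
    [0, 1, 1, 0,    0, 0],
    [1, 0, 1, 0,    0, 0],
    [1, 1, 0, 0.01, 0, 0],
    [0, 0, 0.01, 0, 1, 1],
    [0, 0, 0, 1,    0, 1],
    [0, 0, 0, 1,    1, 0],
])
eigenvalues, eigenvectors = laplacian_spectrum(W)
second = eigenvectors[:, 1]  # eigenvector of the smallest non-zero eigenvalue
print(np.round(eigenvalues, 4))
print(np.sign(second))
```

The number of eigenvalues close to zero matches the number of nearly disconnected blocks, and the components of the corresponding eigenvectors are nearly constant within each block.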
26
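cluster identification
Correlation of homologous components reveals
cluster structure.
$V_2 = (v_{2,1}, v_{2,2}, \ldots, v_{2,n})$, $V_3 = (v_{3,1}, v_{3,2}, \ldots, v_{3,n})$, $V_4 = (v_{4,1}, v_{4,2}, \ldots, v_{4,n})$
[Plot of the points $(v_{2,i}, v_{3,i}, v_{4,i})$, with clusters labelled 1 to 4]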
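One simple way to read such a plot programmatically is to treat each node i as the point formed by its components in the second, third and fourth eigenvectors, and to group points that lie along the same direction from the origin. The cosine threshold and the greedy grouping below are assumptions chosen for illustration, not the authors' procedure, and the component values are made up.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_by_direction(points, threshold=0.9):
    """Greedily group points whose direction matches a cluster's first member."""
    clusters = []  # each cluster is a list of node indices
    for i, point in enumerate(points):
        for cluster in clusters:
            if cosine(point, points[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical eigenvector components for n = 6 nodes.
V2 = [0.50, 0.45, 0.02, 0.01, -0.40, -0.42]
V3 = [0.03, 0.01, 0.55, 0.50, -0.38, -0.41]
V4 = [0.01, 0.02, 0.01, 0.03, 0.45, 0.47]

points = list(zip(V2, V3, V4))  # node i -> (v2_i, v3_i, v4_i)
clusters = group_by_direction(points)
print(clusters)
```

Nodes whose homologous components are strongly correlated end up aligned in this space, so grouping by direction recovers the same clusters the plot shows visually.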
27
cooperative classification
Tag clouds of the six identified clusters of
resources
28
Conclusions and outlook
  • Folksonomies are the way people are building the
    information and communication systems of our
    future.
  • Folksonomies are a laboratory to study
    human/social/semiotic dynamics.
  • A folksonomy is a growing tripartite network,
    whose nodes are users, resources and metadata
    (tags), while (hyper)links are annotation events
    (note that this structure is similar to search
    queries: user, search string, resource retrieved).
  • The statistical structure of folksonomies reveals many
    complex features, typical of interacting humans.
  • Projections of a folksonomy on different spaces can
    be useful to study
      • spam infection
      • semantics of tags
      • emerging resource classification.