Network Structure of Folksonomies - PowerPoint PPT Presentation


About This Presentation

Title:

Network Structure of Folksonomies

Description:

{Roma, holidays, Italy} vs {Roma, football, we_won} vs {Roma, love, ... Global frequency. Relative frequencies. STATEMENT: Resources sharing 'rare' tags are ... – PowerPoint PPT presentation


Slides: 29

Provided by: Andr126


Transcript and Presenter's Notes

Title: Network Structure of Folksonomies

1
Network Structure of Folksonomies

Vito D. P. Servedio

Dipartimento di Fisica, Università di Roma "La Sapienza"
Centro Studi e Ricerche "Enrico Fermi"

TAGora: Semiotic Dynamics of Online Social Communities, EU-IST-2006-034721
2
In collaboration with:
Andrea Baldassarri, Ciro Cattuto, Vittorio Loreto
Miranda Grahl, Andreas Hotho, Christoph Schmitz, Gerd Stumme
3
AGENDA

Properties of folksonomy hypergraphs
Network of tag co-occurrences
Clustering of resources

4
a folksonomy example: del.icio.us (screenshot)
5
data structure: basic units of information

Post: (tags, user, resource)
TAS, tag assignment: (tag, user, resource)

Example: the post ({bookmarking, sharing, collaborative, folksonomy}, andreab, http://del.icio.us) yields four TAS:
(bookmarking, andreab, http://del.icio.us)
(sharing, andreab, http://del.icio.us)
(collaborative, andreab, http://del.icio.us)
(folksonomy, andreab, http://del.icio.us)
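The decomposition of a post into tag assignments can be sketched in Python. This is a minimal illustration, assuming tags, users and resources are plain strings; `TAS`, `post_to_tas` and `tas_to_posts` are hypothetical helper names, not part of any del.icio.us API.

```python
from collections import defaultdict
from typing import NamedTuple


class TAS(NamedTuple):
    """Tag assignment: one hyperedge (tag, user, resource) of the folksonomy."""
    tag: str
    user: str
    resource: str


def post_to_tas(tags, user, resource):
    """Split a post (set of tags, user, resource) into its tag assignments."""
    return [TAS(tag, user, resource) for tag in tags]


def tas_to_posts(assignments):
    """Regroup tag assignments into posts, keyed by (user, resource)."""
    posts = defaultdict(set)
    for tas in assignments:
        posts[(tas.user, tas.resource)].add(tas.tag)
    return dict(posts)


post = (["bookmarking", "sharing", "collaborative", "folksonomy"],
        "andreab", "http://del.icio.us")
assignments = post_to_tas(*post)
for tas in assignments:
    print(tas)
```

Going back from TAS to posts only requires grouping by (user, resource), which is the view used later for tag co-occurrence.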
6
folksonomy hypergraph structure
7
data collection
TAGora Project (STREP FP6)
Semiotic Dynamics in Online Social Communities

del.icio.us
Work co-ordinated by Uni of Kassel
Collected data from Nov. 2006.
Over 667K users
19 million resources
Nearly 2.5 million tags
140 million tag assignments
50GB of data

www.tagora-project.eu
Consortium: University of Roma "La Sapienza", SONY CSL Paris, University of Kassel, University of Koblenz-Landau, University of Southampton

flickr
Work co-ordinated by Uni of Koblenz
Data collected on 21st May 2007.
300K users
25 million photos
1.5 million tags
over 110 million tag assignments

bibsonomy
Complete dataset (June 2007)
1385 users
37651 tags
Over 149K resources

8
artificial networks: permuted and binomial
In the following slides we use two artificial networks, defined as follows.
PERMUTED: take the original folksonomy and shuffle all nodes within the same class (tags, users, resources).

original example:
Resource1 User1 Tag1
Resource1 User1 Tag2
Resource1 User1 Tag3
Resource1 User2 Tag1
Resource1 User2 Tag2
Resource2 User3 Tag4

permuted example:
Resource1 User3 Tag2
Resource2 User2 Tag3
Resource1 User1 Tag1
Resource1 User1 Tag2
Resource1 User2 Tag1
Resource1 User1 Tag4

The result is a hypergraph with the same node degrees as the original one.
BINOMIAL: same number of hyperedges as the original, with endpoints chosen uniformly at random among T, U, R (the sets of tags, users and resources).
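The two null models can be sketched in Python as below. This is a minimal illustration, assuming the folksonomy is a list of (tag, user, resource) triples; `permuted`, `binomial` and `degrees` are hypothetical helper names, and T, U, R are taken to be the sets of tags, users and resources occurring in the original data.

```python
import random
from collections import Counter

# (tag, user, resource) triples of the original example above
example = [
    ("Tag1", "User1", "Resource1"), ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"), ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"), ("Tag4", "User3", "Resource2"),
]


def permuted(assignments, rng=random):
    """PERMUTED: shuffle each node class (column) independently."""
    tags, users, resources = (list(column) for column in zip(*assignments))
    for column in (tags, users, resources):
        rng.shuffle(column)
    return list(zip(tags, users, resources))


def binomial(assignments, rng=random):
    """BINOMIAL: same number of hyperedges, endpoints drawn uniformly from T, U, R."""
    T, U, R = (sorted(set(column)) for column in zip(*assignments))
    return [(rng.choice(T), rng.choice(U), rng.choice(R)) for _ in assignments]


def degrees(assignments):
    """Degree of every tag, user and resource (number of hyperedges it is in)."""
    return [Counter(column) for column in zip(*assignments)]


rng = random.Random(42)
print(degrees(permuted(example, rng)) == degrees(example))  # True: degrees preserved
print(len(binomial(example, rng)) == len(example))          # True: same edge count
```

Shuffling columns independently can produce repeated hyperedges (the same triple twice); this sketch keeps them rather than rejecting them, which is a design choice the slide does not specify.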
9
average path length (estimated)
moving on hyperedges
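One way to estimate the average path length, sketched under assumptions not stated on the slide: two nodes are one step apart when they belong to a common hyperedge, nodes are tagged with their class so that a tag and a user with the same string stay distinct, and the average is taken over breadth-first searches from a random sample of source nodes (pairs in different components are skipped). All function names are hypothetical.

```python
import random
from collections import defaultdict, deque

# (tag, user, resource) triples of the original example
example = [
    ("Tag1", "User1", "Resource1"), ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"), ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"), ("Tag4", "User3", "Resource2"),
]


def node_adjacency(assignments):
    """Nodes are adjacent when they share a hyperedge (t, u, r)."""
    adj = defaultdict(set)
    for t, u, r in assignments:
        # Prefix each node with its class so equal strings in different classes differ
        nodes = (("tag", t), ("user", u), ("resource", r))
        for a in nodes:
            adj[a].update(b for b in nodes if b != a)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def estimated_path_length(assignments, n_sources=100, rng=random):
    """Mean distance over BFS runs from a random sample of source nodes."""
    adj = node_adjacency(assignments)
    nodes = sorted(adj)
    total = count = 0
    for source in rng.sample(nodes, min(n_sources, len(nodes))):
        for d in bfs_distances(adj, source).values():
            if d > 0:  # skip the source itself; unreachable nodes never appear
                total += d
                count += 1
    return total / count if count else float("nan")


print(estimated_path_length(example))  # 46/36 ≈ 1.278: all 9 nodes sampled here
```

On large datasets, sampling a few hundred sources keeps the cost manageable while giving a reasonable estimate; on this toy example every node is a source, so the value is exact.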
10
cliquishness
A high resource cliquishness indicates that many
of the users related to that resource assign
overlapping sets of tags to it
Example: $|T_r| = 3$, $|U_r| = 2$, $t_{ur} = 3$, where
$t_{ur}$ = number of hyperedges connected to r
$|T_r|$ = number of adjacent tags, $|U_r|$ = number of adjacent users
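The slide's formula did not survive extraction, so the sketch below assumes one natural normalisation. With m = max(|T_r|, |U_r|), the fewest hyperedges r can have given its tags and users (each must appear at least once), and |T_r||U_r| the most (every user assigns every tag), c_r = (t_ur - m) / (|T_r||U_r| - m) runs from 0 to 1. The triples are hypothetical; only their counts match the slide's example.

```python
from collections import defaultdict


def cliquishness(assignments):
    """Cliquishness of every resource, from (tag, user, resource) triples.

    ASSUMED normalisation (the slide's formula is not preserved):
        c_r = (t_ur - m) / (|T_r| * |U_r| - m),  m = max(|T_r|, |U_r|)
    so c_r = 0 for the sparsest connected case and c_r = 1 for a full clique.
    """
    tags, users, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for t, u, r in assignments:
        tags[r].add(t)
        users[r].add(u)
        pairs[r].add((t, u))  # distinct hyperedges connected to r
    result = {}
    for r, tu in pairs.items():
        n_t, n_u, t_ur = len(tags[r]), len(users[r]), len(tu)
        m, full = max(n_t, n_u), n_t * n_u
        # Undefined when r has a single tag or a single user (full == m).
        result[r] = (t_ur - m) / (full - m) if full > m else None
    return result


# Counts of the slide's example: |T_r| = 3, |U_r| = 2, t_ur = 3
sparse = [("t1", "u1", "r"), ("t2", "u1", "r"), ("t3", "u2", "r")]
clique = [(t, u, "r") for t in ("t1", "t2", "t3") for u in ("u1", "u2")]
print(cliquishness(sparse)["r"])  # 0.0
print(cliquishness(clique)["r"])  # 1.0
```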
11
connectedness / transitivity
12
AGENDA

Properties of folksonomy hypergraphs
Network of tag co-occurrences
Clustering of resources

13
networks of tag co-occurrence
Tags acquire a stronger semantic context when they co-occur with each other, e.g. {Roma, holidays, Italy} vs {Roma, football, we_won} vs {Roma, love, girls}, etc.

Tag co-occurrences in posts:
weighted graph of tags
weight: number of common posts

Strength of a tag: sum of its edge weights

Can we study the semantics of tags?
((japan, tokyo) co-occur more frequently than (physics, sex))
--check with Google!
Compare statistics with shuffled graphs

14
weighted network of tag co-occurrence
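Two tags co-occur if they are present in the same post.
We can say more:
two tags t, t' co-occur with weight w if they are simultaneously present in w posts.
In terms of the adjacency tensor A ($A_{tur} = 1$ if the tag assignment (t, u, r) exists, 0 otherwise), W is a tensor contraction in flat space:
$W_{tt'} = \sum_{u,r} A_{tur} A_{t'ur}$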
We examine the weighted undirected network
defined by W
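The contraction can be checked on the small example used earlier in these slides. A minimal NumPy sketch, assuming a dense 0/1 adjacency tensor (only practical for toy data; the real datasets would need sparse storage) and assuming the diagonal $W_{tt}$, which simply counts the posts of tag t, is dropped; `cooccurrence_network` is a hypothetical name.

```python
import numpy as np

# (tag, user, resource) triples; the posts are (User1, Resource1),
# (User2, Resource1) and (User3, Resource2)
example = [
    ("Tag1", "User1", "Resource1"), ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"), ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"), ("Tag4", "User3", "Resource2"),
]


def cooccurrence_network(assignments):
    """Return the tag list and W[t, t'] = sum_{u,r} A[t,u,r] A[t',u,r]."""
    tags, users, resources = (sorted(set(col)) for col in zip(*assignments))
    ti = {t: i for i, t in enumerate(tags)}
    ui = {u: i for i, u in enumerate(users)}
    ri = {r: i for i, r in enumerate(resources)}
    A = np.zeros((len(tags), len(users), len(resources)))
    for t, u, r in assignments:
        A[ti[t], ui[u], ri[r]] = 1.0  # adjacency tensor of the hypergraph
    W = np.einsum("aur,bur->ab", A, A)  # contract over the post indices (u, r)
    np.fill_diagonal(W, 0.0)  # diagonal = tag frequency, not a co-occurrence
    return tags, W


tags, W = cooccurrence_network(example)
strength = W.sum(axis=1)  # strength: sum of a tag's edge weights
print(tags)      # ['Tag1', 'Tag2', 'Tag3', 'Tag4']
print(strength)  # [3. 3. 2. 0.]
```

Tag1 and Tag2 appear together in two posts, so their edge has weight 2, while Tag4 never co-occurs with anything and has strength 0.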
15
strength cumulative distribution
tag-shuffled example:
Resource1 User1 Tag2
Resource1 User1 Tag3
Resource1 User1 Tag1
Resource1 User2 Tag4
Resource1 User2 Tag1
Resource2 User3 Tag2
The tag reshuffling procedure leaves P(s) almost unchanged: strength is related to the frequency of tags, not to their semantics.
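The tag-shuffling null model and the strength distribution can be sketched in Python, using the original and tag-shuffled examples as (tag, user, resource) triples. `tag_shuffled`, `strengths` and `cumulative_distribution` are hypothetical names; treating a post as a set, so that a tag landing twice in the same post counts once, is an assumption.

```python
import random
from collections import defaultdict
from itertools import combinations

# The original example and the tag-shuffled one, as (tag, user, resource)
original = [
    ("Tag1", "User1", "Resource1"), ("Tag2", "User1", "Resource1"),
    ("Tag3", "User1", "Resource1"), ("Tag1", "User2", "Resource1"),
    ("Tag2", "User2", "Resource1"), ("Tag4", "User3", "Resource2"),
]
shuffled = [
    ("Tag2", "User1", "Resource1"), ("Tag3", "User1", "Resource1"),
    ("Tag1", "User1", "Resource1"), ("Tag4", "User2", "Resource1"),
    ("Tag1", "User2", "Resource1"), ("Tag2", "User3", "Resource2"),
]


def tag_shuffled(assignments, rng=random):
    """Permute the tag column only: users, resources and tag frequencies are kept."""
    tags = [t for t, _, _ in assignments]
    rng.shuffle(tags)
    return [(t, u, r) for t, (_, u, r) in zip(tags, assignments)]


def strengths(assignments):
    """Strength of every tag in the co-occurrence network of posts (user, resource)."""
    posts = defaultdict(set)
    for t, u, r in assignments:
        posts[(u, r)].add(t)  # a repeated tag inside a post counts once
    s = {t: 0 for t in sorted({t for t, _, _ in assignments})}
    for tagset in posts.values():
        for a, b in combinations(sorted(tagset), 2):  # each pair adds weight 1
            s[a] += 1
            s[b] += 1
    return s


def cumulative_distribution(values):
    """P(s) as the fraction of tags with strength >= s, for each observed s."""
    values = sorted(values)
    return {x: sum(v >= x for v in values) / len(values) for x in sorted(set(values))}


print(strengths(original))  # {'Tag1': 3, 'Tag2': 3, 'Tag3': 2, 'Tag4': 0}
print(strengths(shuffled))  # {'Tag1': 3, 'Tag2': 2, 'Tag3': 2, 'Tag4': 1}
```

On this six-assignment toy example the two strength sequences differ; the observation that P(s) barely changes refers to the full datasets, where tag frequencies dominate.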
16
Average neighbour strength

Examine the strength correlation between neighbors:
positive correlation: assortative mixing (e.g. social networks)
negative correlation: disassortative mixing (e.g. technological networks)
Look for spam infection
Reveal semantics via the shuffled graph
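A minimal NumPy sketch of the measure, assuming s_nn(i) is the plain average of the strengths of the neighbours of i (a strength-weighted average is another option the slides do not rule out); `average_neighbour_strength` and `strength_correlation` are hypothetical names. The sign of the correlation between s and s_nn separates assortative from disassortative mixing.

```python
import numpy as np


def average_neighbour_strength(W):
    """Strength s_i and average neighbour strength s_nn(i) of a weighted graph.

    ASSUMPTION: s_nn(i) is the plain mean of s_j over the neighbours j of i.
    """
    W = np.asarray(W, dtype=float)
    s = W.sum(axis=1)
    A = (W > 0).astype(float)  # unweighted adjacency
    k = A.sum(axis=1)          # number of neighbours
    s_nn = np.divide(A @ s, k, out=np.full_like(s, np.nan), where=k > 0)
    return s, s_nn


def strength_correlation(s, s_nn):
    """Pearson correlation of s and s_nn: > 0 assortative, < 0 disassortative."""
    mask = ~np.isnan(s_nn)
    return np.corrcoef(s[mask], s_nn[mask])[0, 1]


# A star: one hub linked to three leaves, the textbook disassortative case
star = np.zeros((4, 4))
star[0, 1:] = star[1:, 0] = 1.0
s, s_nn = average_neighbour_strength(star)
print(s_nn)                           # [1. 3. 3. 3.]
print(strength_correlation(s, s_nn))  # close to -1: disassortative
```

Isolated nodes get NaN and are excluded from the correlation; comparing the value on the real graph with the one on its shuffled version is what separates semantic correlations from frequency effects.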

17
average neighbor strength
[Figure: scatter plot of average neighbor strength, with regions labeled "spam"]

Tags introduced by spamming cluster together
Shuffling the graph changes the measure
Correlations related to semantics correspond to a region of the plot

18
AGENDA

Properties of folksonomy hypergraphs
Network of tag co-occurrences
Clustering of resources

19
clustering and community detection

Folksonomies are complex tripartite networks (tag, user, resource).
Cluster detection can reveal:
subsets of users (social communities)
subsets of tags (semantic frames, jargons)
subsets of resources (social classification)
others

Here we focus on clustering resources using only tag assignments.
20
resource similarity network
Weighted network: how to choose the weights?
How to take tag frequency into account?
21
tag clouds for resources

Each resource is characterised by a tag cloud: tags are assigned by users and appear with different frequencies.

22
similarity metrics
TF/IDF-like weighting procedure
STATEMENT: resources sharing 'rare' tags are closely related
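The slides do not give the exact weighting, so the sketch below assumes a standard TF/IDF form: each tag in a resource's tag cloud is weighted by its frequency times log(N/df), where N is the number of resources and df counts the resources carrying that tag, and two resources are compared by cosine similarity. The tag clouds are hypothetical, built so that one pair shares only a common tag and another only a rare one, which illustrates the STATEMENT.

```python
import math
from collections import Counter


def tfidf_vectors(tag_clouds):
    """Weight each tag in a resource's cloud by frequency * log(N / df).

    ASSUMED TF/IDF-like scheme: df is the number of resources carrying the tag,
    so rare tags get large weights and tags present everywhere get weight 0.
    """
    n = len(tag_clouds)
    df = Counter(tag for cloud in tag_clouds.values() for tag in cloud)
    return {
        r: {t: f * math.log(n / df[t]) for t, f in cloud.items()}
        for r, cloud in tag_clouds.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse tag-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tag clouds: a and b share only a common tag, c and d only a rare one
clouds = {
    "a": {"common": 1, "x": 1},
    "b": {"common": 1, "y": 1},
    "c": {"rare": 1, "z": 1},
    "d": {"rare": 1, "w": 1},
    "e": {"common": 1},
}
vectors = tfidf_vectors(clouds)
print(cosine(vectors["a"], vectors["b"]))  # about 0.09
print(cosine(vectors["c"], vectors["d"]))  # about 0.25
```

Without the IDF factor both pairs would score 0.5; the logarithmic weight is what makes the rarely shared tag count more.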
23
case study

Sample of 400 resources
200 resources tagged with design
200 resources tagged with politics
Does the similarity network show two clusters?

Finer structure? Subclusters?
24
similarity matrix
Similarity strengths vary broadly on a logarithmic scale.
[Figure: distribution P(w) of the similarity weights w]
Raising the weights to a small power (0.1) is an effective way to deal with vanishing weights.
TASK: find row and column permutations that uncover a block structure
25
spectral analysis
A. Capocci, V. D. P. Servedio, G. Caldarelli and F. Colaiori, Physica A 352, 669 (2005), and many others
[Figure: eigenvalues of the Laplacian matrix Q]
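A minimal NumPy sketch of the spectral step, assuming the unnormalised Laplacian Q = D - W, with D the diagonal matrix of resource strengths (normalised variants are also common). On a hypothetical matrix with two weakly connected groups, one eigenvalue is 0, a second is small, and the eigenvector of that small eigenvalue changes sign between the groups.

```python
import numpy as np


def laplacian_spectrum(W):
    """Eigenvalues (ascending) and eigenvectors of Q = D - W, D = diag(strengths)."""
    W = np.asarray(W, dtype=float)
    Q = np.diag(W.sum(axis=1)) - W
    return np.linalg.eigh(Q)  # Q is symmetric, so eigh applies


# Hypothetical similarity matrix: two groups of three resources
W = np.full((6, 6), 0.01)    # weak similarity across groups
W[:3, :3] = W[3:, 3:] = 1.0  # strong similarity within groups
np.fill_diagonal(W, 0.0)
eigenvalues, eigenvectors = laplacian_spectrum(W)
print(eigenvalues)            # approximately 0, 0.06, 3.03, 3.03, 3.03, 3.03
fiedler = eigenvectors[:, 1]  # its sign pattern splits the two groups
```

The gap between the small eigenvalues and the rest signals how many groups there are; sorting resources by the components of these eigenvectors is one way to obtain the row and column permutation that reveals the block structure.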
26
cluster identification
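Correlation of homologous components reveals cluster structure.
$V_2 = (v_{2,1}, v_{2,2}, \dots, v_{2,n})$, $V_3 = (v_{3,1}, v_{3,2}, \dots, v_{3,n})$, $V_4 = (v_{4,1}, v_{4,2}, \dots, v_{4,n})$
Homologous components of resource i: $(v_{2,i}, v_{3,i}, v_{4,i})$
[Figure: resources plotted by their homologous components, with clusters labeled 1, 2, 3, 4]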
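The correlation step can be sketched as follows, assuming the same unnormalised Laplacian Q = D - W, skipping the trivial eigenvector V1, and measuring the correlation of resources i and j as the cosine between their rows of homologous components. The greedy threshold grouping and all function names are hypothetical choices for illustration. With three groups, two non-trivial eigenvectors suffice in this toy example (the slide uses V2, V3, V4).

```python
import numpy as np


def eigenvector_components(W, n_vectors):
    """Rows: resources; columns: V2, V3, ... of Q = D - W (constant V1 skipped)."""
    W = np.asarray(W, dtype=float)
    Q = np.diag(W.sum(axis=1)) - W
    _, vectors = np.linalg.eigh(Q)  # eigenvalues in ascending order
    return vectors[:, 1:1 + n_vectors]


def component_correlation(X):
    """Cosine correlation between the component rows of resources i and j."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    U = X / norms
    return U @ U.T


def group(C, threshold=0.9):
    """Greedy grouping: unassigned resources strongly correlated with a seed.

    Resources whose components are all zero stay labeled -1.
    """
    labels = np.full(len(C), -1)
    for i in range(len(C)):
        if labels[i] == -1:
            labels[(labels == -1) & (C[i] > threshold)] = labels.max() + 1
    return labels


# Hypothetical similarity matrix: three groups of four resources
W = np.full((12, 12), 0.01)
for g in range(3):
    W[4 * g:4 * g + 4, 4 * g:4 * g + 4] = 1.0
np.fill_diagonal(W, 0.0)
X = eigenvector_components(W, n_vectors=2)  # V2, V3
print(group(component_correlation(X)))      # [0 0 0 0 1 1 1 1 2 2 2 2]
```

Resources in the same group have nearly identical homologous components (correlation close to 1), while resources in different groups point in different directions, which is what makes a simple threshold work on this clean example.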
27
cooperative classification
Tag clouds of the six identified clusters of
resources
28
Conclusions and outlooks

Folksonomies are the way people are building the information and communication systems of our future.
Folksonomies are a laboratory to study
human/social/semiotic dynamics.
A folksonomy is a growing tripartite network whose nodes are users, resources and metadata (tags), while (hyper)links are annotation events (note that this structure is similar to that of search queries: user, search string, resource retrieved).
The statistical structure of folksonomies reveals many complex features, typical of interacting humans.
Projections of the folksonomy onto different spaces can be useful to study:
spam infection
semantics of tags
emerging resource classification.