Title: Network Structure of Folksonomies
1Network Structure of Folksonomies
- Dipartimento di Fisica, Università di Roma "La Sapienza"
- Centro Studi e Ricerche "Enrico Fermi"
TAGora: Semiotic Dynamics in Online Social Communities (EU-IST-2006-034721)
2In collaboration with
Andrea Baldassarri, Ciro Cattuto, Vittorio
Loreto
Miranda Grahl, Andreas Hotho, Christoph
Schmitz, Gerd Stumme
3AGENDA
- Properties of folksonomy hypergraphs
- Network of tag co-occurrences
- Clustering of resources
4a folksonomy example del.icio.us screenshot
5data structure basic units of information
- Post: (tags, user, resource)
- TAS (tag assignment): (tag, user, resource)
Post:
(bookmarking, sharing, collaborative, folksonomy, andreab, http://del.icio.us)
Tag assignments:
(bookmarking, andreab, http://del.icio.us)
(sharing, andreab, http://del.icio.us)
(collaborative, andreab, http://del.icio.us)
(folksonomy, andreab, http://del.icio.us)
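The relation between a post and its tag assignments can be sketched in a few lines of Python. This is only an illustration: the `TAS` type and the `post_to_tas` helper are hypothetical names, not part of any TAGora tooling.

```python
from typing import NamedTuple


class TAS(NamedTuple):
    """One tag assignment: (tag, user, resource). Hypothetical helper type."""
    tag: str
    user: str
    resource: str


def post_to_tas(tags, user, resource):
    """Expand one post into one tag assignment per tag."""
    return [TAS(tag, user, resource) for tag in tags]


# The del.icio.us post from the slide.
post = (["bookmarking", "sharing", "collaborative", "folksonomy"],
        "andreab", "http://del.icio.us")
assignments = post_to_tas(*post)
for tas in assignments:
    print(tuple(tas))
```

A post with k tags therefore contributes k hyperedges to the folksonomy.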
6folksonomy hypergraph structure
7data collection
TAGora Project (STREP FP6)
Semiotic Dynamics in Online Social Communities
www.tagora-project.eu
Consortium: University of Roma La Sapienza, SONY CSL Paris, University of Kassel, University of Koblenz-Landau, University of Southampton
- del.icio.us
  - Work co-ordinated by Uni of Kassel
  - Collected data from Nov. 2006
  - Over 667K users
  - 19 million resources
  - Nearly 2.5 million tags
  - 140 million tag assignments
  - 50GB of data
- flickr
  - Work co-ordinated by Uni of Koblenz
  - Collected data: 21st May 2007
  - 300K users
  - 25 million photos
  - 1.5 million tags
  - Over 110 million tag assignments
- bibsonomy
  - Complete dataset (June 2007)
  - 1385 users
  - 37651 tags
  - Over 149K resources
8artificial networks permuted and binomial
In the following slides we shall use two artificial networks, defined as follows.
PERMUTED: take the original folksonomy and independently shuffle the nodes of each class (resources, users, tags) among the hyperedges.
Original example:
Resource1 User1 Tag1
Resource1 User1 Tag2
Resource1 User1 Tag3
Resource1 User2 Tag1
Resource1 User2 Tag2
Resource2 User3 Tag4
Permuted example:
Resource1 User3 Tag2
Resource2 User2 Tag3
Resource1 User1 Tag1
Resource1 User1 Tag2
Resource1 User2 Tag1
Resource1 User1 Tag4
We end up with a hypergraph in which every node has the same degree as in the original one.
BINOMIAL: same number of hyperedges as the original, with the endpoints of each hyperedge chosen uniformly at random among T, U, R.
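The two null models can be sketched as follows. This is a minimal illustration under stated assumptions: hyperedges are stored as (resource, user, tag) triples, the function names are hypothetical, and duplicate hyperedges produced by the shuffling are not removed.

```python
import random
from collections import Counter


def permuted(hyperedges, rng):
    """Shuffle each column (resources, users, tags) independently.

    Every node keeps its degree; duplicate hyperedges are not removed.
    """
    columns = [list(col) for col in zip(*hyperedges)]
    for col in columns:
        rng.shuffle(col)
    return list(zip(*columns))


def binomial(hyperedges, rng):
    """Same number of hyperedges, endpoints drawn uniformly from R, U, T."""
    node_sets = [sorted(set(col)) for col in zip(*hyperedges)]
    return [tuple(rng.choice(nodes) for nodes in node_sets)
            for _ in hyperedges]


def degrees(hyperedges):
    """Number of hyperedges each node takes part in."""
    return Counter(node for edge in hyperedges for node in edge)


original = [
    ("Resource1", "User1", "Tag1"),
    ("Resource1", "User1", "Tag2"),
    ("Resource1", "User1", "Tag3"),
    ("Resource1", "User2", "Tag1"),
    ("Resource1", "User2", "Tag2"),
    ("Resource2", "User3", "Tag4"),
]
rng = random.Random(0)  # fixed seed only for reproducibility
perm = permuted(original, rng)
binom = binomial(original, rng)
print(degrees(perm) == degrees(original))
```

The permuted model keeps the degree of every node and destroys correlations between classes. The binomial model keeps only the sizes of the node sets and the number of hyperedges.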
9average path length (estimated)
moving on hyperedges
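One way to estimate the average path length is a breadth-first search from a sample of source nodes, treating two nodes as neighbours when they share a hyperedge. The sketch below assumes exactly that; the sampling scheme and all function names are illustrative, not taken from the slides.

```python
import random
from collections import defaultdict, deque


def build_adjacency(hyperedges):
    """Two nodes are adjacent if they appear in the same hyperedge."""
    adj = defaultdict(set)
    for edge in hyperedges:
        # Prefix each node with its class so that names cannot collide.
        nodes = list(zip(("r", "u", "t"), edge))
        for a in nodes:
            adj[a].update(b for b in nodes if b != a)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def estimate_average_path_length(hyperedges, n_sources, rng):
    """Mean distance from sampled sources to all nodes they can reach."""
    adj = build_adjacency(hyperedges)
    if not adj:
        raise ValueError("no hyperedges given")
    sources = rng.sample(sorted(adj), min(n_sources, len(adj)))
    total = count = 0
    for source in sources:
        for node, d in bfs_distances(adj, source).items():
            if node != source:
                total += d
                count += 1
    return total / count


# Hyperedges are (resource, user, tag) triples.
edges = [("R1", "U1", "T1"), ("R1", "U2", "T2")]
print(estimate_average_path_length(edges, 5, random.Random(0)))
```

Unreachable pairs are ignored here, so on a disconnected hypergraph the estimate refers to distances within components only.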
10cliquishness
A high resource cliquishness indicates that many of the users related to that resource assign overlapping sets of tags to it.
Example: T_r = 3, U_r = 2, tu_r = 3
tu_r = # of hyperlinks connected to r
T_r = # of adjacent tags
U_r = # of adjacent users
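The formula itself did not survive in the slide text. The sketch below assumes that the cliquishness of a resource r is the ratio tu_r / (T_r * U_r), i.e. the fraction of possible (tag, user) pairs around r that are actually realised. The function name and the toy data are hypothetical.

```python
def cliquishness(tas, resource):
    """Assumed definition: tu_r / (T_r * U_r) for the given resource.

    tas is an iterable of (tag, user, resource) triples.
    """
    pairs = {(tag, user) for tag, user, res in tas if res == resource}
    if not pairs:
        raise ValueError("resource has no tag assignments")
    tags = {tag for tag, _ in pairs}
    users = {user for _, user in pairs}
    return len(pairs) / (len(tags) * len(users))


# Toy data matching the slide example: T_r = 3, U_r = 2, tu_r = 3.
tas = [
    ("t1", "u1", "r"),
    ("t2", "u1", "r"),
    ("t3", "u2", "r"),
]
print(cliquishness(tas, "r"))  # 3 / (3 * 2) = 0.5
```

Under this assumed definition the value is 1 when every adjacent user has assigned every adjacent tag to the resource.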
11connectedness / transitivity
12AGENDA
- Properties of folksonomy hypergraphs
- Network of tag co-occurrences
- Clustering of resources
13networks of tag co-occurrence
Tags acquire a stronger semantic context when they co-occur with each other, e.g. (Roma, holidays, Italy) vs (Roma, football, we_won) vs (Roma, love, girls), etc.
- Tag co-occurrences in posts
  - Weighted graph of tags
  - Weight: number of common posts
- Strength of a tag
  - Sum of its edge weights
- Can we study the semantics of tags?
  - ((japan, tokyo) more frequent than (physics, sex))
  - check with Google!
- Compare statistics with shuffled graphs
14weighted network of tag co-occurrence
Two tags co-occur if they are present in the same post.
We can say more:
two tags t, t' co-occur with weight w if they are simultaneously present in w posts.
In terms of adjacency tensors: tensor contraction in flat space.
We examine the weighted undirected network defined by W.
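The weight matrix W and the tag strengths can be computed directly from the tag assignments. The sketch below assumes that a post is identified by its (user, resource) pair, so that W[t, t'] counts the posts containing both t and t'. The function names and the toy data are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_weights(tas):
    """W[(t, t')] = number of posts in which t and t' appear together."""
    posts = defaultdict(set)  # (user, resource) -> set of tags
    for tag, user, resource in tas:
        posts[(user, resource)].add(tag)
    weights = Counter()
    for tags in posts.values():
        # Sorting gives each unordered pair a unique key.
        for t1, t2 in combinations(sorted(tags), 2):
            weights[(t1, t2)] += 1
    return weights


def strengths(weights):
    """Strength of a tag = sum of the weights of its edges."""
    s = Counter()
    for (t1, t2), w in weights.items():
        s[t1] += w
        s[t2] += w
    return s


tas = [
    ("roma", "u1", "r1"), ("holidays", "u1", "r1"), ("italy", "u1", "r1"),
    ("roma", "u2", "r2"), ("football", "u2", "r2"),
    ("roma", "u3", "r1"), ("italy", "u3", "r1"),
]
weights = cooccurrence_weights(tas)
print(weights[("italy", "roma")], strengths(weights)["roma"])
```

Storing only one key per unordered pair reflects the fact that the network is undirected.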
15strength cumulative distribution
Tag-shuffled example:
Resource1 User1 Tag2
Resource1 User1 Tag3
Resource1 User1 Tag1
Resource1 User2 Tag4
Resource1 User2 Tag1
Resource2 User3 Tag2
The tag reshuffling procedure makes almost no change to P(s): the strength is related to the frequency of tags, not to semantics.
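A cumulative strength distribution can be computed as below. The convention is an assumption: P(s) is taken here as the fraction of tags whose strength is at least s, which may differ from the convention used for the plot on the slide.

```python
def cumulative_distribution(strength_values):
    """Assumed convention: P(s) = fraction of tags with strength >= s."""
    values = sorted(strength_values)
    n = len(values)
    return {s: sum(1 for v in values if v >= s) / n
            for s in sorted(set(values))}


# Toy strengths for four tags.
p = cumulative_distribution([1, 1, 2, 4])
for s, prob in p.items():
    print(s, prob)
```

Comparing this curve for the original and for the tag-shuffled data is the check described above.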
16Average neighbour strength
- Examine strength correlation between neighbors
  - Positive correlation: assortative mixing (e.g. social networks)
  - Negative correlation: disassortative mixing (e.g. technological networks)
- Look for spam infection
- Reveal semantics via shuffled graph
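The average neighbor strength can be sketched as follows. The slides do not spell out the definition, so this assumes the unweighted mean of the strengths of a tag's neighbors; a weighted mean would be an equally plausible choice. Names and data are illustrative.

```python
from collections import defaultdict


def average_neighbor_strength(weights):
    """For each tag, the mean strength of its neighbors (unweighted mean).

    weights maps an unordered tag pair (t, t') to its co-occurrence weight.
    """
    neighbors = defaultdict(set)
    strength = defaultdict(float)
    for (a, b), w in weights.items():
        neighbors[a].add(b)
        neighbors[b].add(a)
        strength[a] += w
        strength[b] += w
    return {tag: sum(strength[nb] for nb in nbs) / len(nbs)
            for tag, nbs in neighbors.items()}


weights = {
    ("italy", "roma"): 2,
    ("holidays", "roma"): 1,
    ("holidays", "italy"): 1,
    ("football", "roma"): 1,
}
s_nn = average_neighbor_strength(weights)
print(s_nn["roma"], s_nn["football"])
```

Plotting this quantity against the strength of each tag shows whether strong tags tend to attach to strong tags (assortative) or to weak ones (disassortative).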
17average neighbor strength
[Figure: scatter plot of average neighbor strength, with regions labelled "spam"]
- Tags introduced by spamming cluster together
- Shuffling the graph changes the measure
- Correlations related to semantics correspond to a region in the graph
18AGENDA
- Properties of folksonomy hypergraphs
- Network of tag co-occurrences
- Clustering of resources
19clustering and community detection
- Folksonomies: complex tripartite networks
  - (tag, user, resource)
- Cluster detection can reveal
  - subsets of users (social communities)
  - subsets of tags (semantic frames, jargons)
  - subsets of resources (social classification)
- Other
Now we focus on clustering of resources using only tag assignments.
20resource similarity network
Weighted network: how to choose the weights?
How to take tag frequency into account?
21tag clouds for resources
- Each resource is characterised by a tag-cloud
- tags are assigned by users, and appear with
different frequency.
22similarity metrics
TF/IDF-like weighting procedure
STATEMENT: resources sharing rare tags are closely related
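The exact weighting formula is not recoverable from the slide text. As a stand-in, the sketch below uses a textbook TF-IDF weighting of each resource's tag cloud followed by cosine similarity. It illustrates how rare shared tags dominate the similarity and is not necessarily the authors' metric. All names and counts are made up.

```python
import math
from collections import Counter


def tfidf_vectors(tag_clouds):
    """tag_clouds maps resource -> {tag: number of users who assigned it}."""
    n = len(tag_clouds)
    # Document frequency: in how many tag clouds each tag appears.
    df = Counter(tag for cloud in tag_clouds.values() for tag in cloud)
    vectors = {}
    for resource, cloud in tag_clouds.items():
        total = sum(cloud.values())
        vectors[resource] = {tag: (count / total) * math.log(n / df[tag])
                             for tag, count in cloud.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


tag_clouds = {
    "r1": {"design": 5, "css": 2},
    "r2": {"design": 3, "css": 1},
    "r3": {"design": 1, "politics": 4},
    "r4": {"politics": 3, "election": 2},
}
vectors = tfidf_vectors(tag_clouds)
print(cosine(vectors["r1"], vectors["r2"]), cosine(vectors["r1"], vectors["r3"]))
```

A tag present in every tag cloud gets weight log(1) = 0 and so contributes nothing, which is the TF-IDF way of discounting very frequent tags.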
23case study
- Sample of 400 resources
  - 200 resources tagged with design
  - 200 resources tagged with politics
- Does the similarity network show two clusters? Finer structure? Subclusters?
24similarity matrix
Broad variability of similarity strengths on a logarithmic scale.
[Plot: distribution P(w) of the similarity weights w]
A small power (0.1) is used as an effective way to deal with vanishing weights.
TASK: find column and row permutations that uncover a block structure
25spectral analysis
A. Capocci, V.D.P. Servedio, G. Caldarelli and F. Colaiori, Physica A 352, 669 (2005), and many others
Q Eigenvalues
 Laplacian matrix
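The idea behind spectral cluster detection can be illustrated without any linear-algebra library. The sketch below makes several assumptions that are not in the slides: it works with the lazy random-walk matrix (I + S^-1 W) / 2, where S is the diagonal matrix of strengths, finds its second eigenvector by power iteration, and splits the nodes by the sign of the components. It also assumes every node has non-zero strength. A real analysis would use a full eigensolver.

```python
def second_eigenvector(W, iterations=500):
    """Approximate second eigenvector of (I + S^-1 W) / 2 by power iteration."""
    n = len(W)
    s = [sum(row) for row in W]  # strengths, assumed non-zero
    total = sum(s)
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        # Project out the trivial constant eigenvector
        # (orthogonality is taken in the strength-weighted inner product).
        mean = sum(si * xi for si, xi in zip(s, x)) / total
        x = [xi - mean for xi in x]
        # One step of the lazy walk: x <- (x + S^-1 W x) / 2.
        x = [0.5 * (x[i] + sum(W[i][j] * x[j] for j in range(n)) / s[i])
             for i in range(n)]
        # Rescale to keep the numbers in a safe range.
        scale = max(abs(xi) for xi in x)
        x = [xi / scale for xi in x]
    return x


# Toy similarity matrix: two triangles joined by one weak link.
W = [
    [0, 1, 1, 0.01, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0.01, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
v2 = second_eigenvector(W)
clusters = [component > 0 for component in v2]
print(clusters)
```

Nodes in the same block end up with nearly equal components, which is the property exploited when several eigenvectors are compared component by component.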
26cluster identification
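Correlation of homologous components reveals cluster structure.
V_2 = (v_{2,1}, v_{2,2}, ..., v_{2,n}), V_3 = (v_{3,1}, v_{3,2}, ..., v_{3,n}), V_4 = (v_{4,1}, v_{4,2}, ..., v_{4,n})
[Figure: plot of the components v_{2,i}, v_{3,i}, v_{4,i}, with clusters labelled 1 to 4]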
27cooperative classification
Tag clouds of the six identified clusters of
resources
28Conclusions and outlook
- Folksonomies are the way people are building the information and communication systems of our future.
- Folksonomies are a laboratory to study human/social/semiotic dynamics.
- A folksonomy is a growing tripartite network, whose nodes are users, resources and metadata (tags), while (hyper)links are annotation events
  - (note that this structure is similar to search queries: user, search string, resource retrieved).
- The statistical structure of folksonomies reveals many complex features, typical of interacting humans.
- Projections of a folksonomy on different spaces can be useful to study
  - spam infection
  - semantics of tags
  - emerging resource classification.