Title: Information Search in Social Networks (Informationssuche in sozialen Netzen)

1
Information Search in Social Networks
  • Ralf Schenkel

Joint work with Tom Crecelius, Mouna Kacimi,
Sebastian Michel, Thomas Neumann, Josiane
Parreira, Marc Spaniol, Gerhard Weikum
2
Social Tagging Networks
  • Definition: Social Tagging Network
  • Website where people
  • publish and tag information
  • review and rate information
  • publish their interests
  • maintain a network of friends
  • interact with friends

3
Some Statistics
  • Flickr (as of Nov 2008)
  • 3 billion photos, 3 million new photos per day
  • Facebook (as of Nov 2008)
  • 10 billion photos, 30 million new photos per
    day
  • 120 million active users
  • 150,000 new users per day
  • Myspace (as of Apr 2007)
  • 135 million users (6th largest country on Earth)
  • 2 billion images (150,000 req/s), millions added
    daily
  • 25 million songs
  • 60TB videos
  • StudiVZ.net (as of Nov 2008)
  • 11 million users
  • 300 million images, 1 million added daily

Huge volume of highly dynamic data
4
Showcase: librarything.com
5
librarything.com: Social Interaction
6
librarything.com: Tag Clouds
7
librarything.com: Search
Search results are independent of the querying
user (and of the social context)
8
librarything.com: Search
Search is automatically expanded with similar
tags (synonyms)
9
librarything.com: Recommendations
Recommendations depend on user and tags (but not
on social context)
10
librarything.com: Recommendations
Explanation for the recommendation
11
librarything.com: Explanations
12
librarything.com: Explanations
13
Outline
  • Search in Social Tagging Networks
  • Graph Model
  • Different Information Needs
  • Effective Query Scoring
  • Efficient Query Evaluation
  • Summary & Further Challenges

14
Querying Social Tagging Networks
15
Querying Social Tagging Networks
16
Information Need 1: Globally Popular
Example query: harry potter
Most frequently tagged items are best. Tags by all
users are equally important.
17
Information Need 2: Similar Users
Example query: travel
18
Information Need 2: Similar Users
Example query: travel
Tags by users with similar tags/items ("brothers
in spirit") are more important.
19
Information Need 3: Trusted Friends
Example query: probability
20
Information Need 3: Trusted Friends
Example query: probability
Tags by closely related and well-known users are more
important.
21
Towards Social-Aware Social Search
  • Search results may depend on
  • Global popularity of items
  • Spiritual context of the querying user (users
    with similar books and/or tags)
  • Social context of the querying user (known and
    trusted friends)

22
Outline
  • Search in Social Tagging Networks
  • Effective Query Scoring
  • Quantifying Friendship Strengths
  • User-specific Scoring Functions
  • Experimental Evaluation
  • Efficient Query Evaluation
  • Summary & Further Challenges

23
Notation
  • U: set of users
  • T: set of tags
  • I: set of items
  • tags(u): tags used by user u
  • items(u): items tagged by user u
  • items(t): items tagged with tag t by at least one
    user
  • df(t): number of items tagged with tag t
  • tf_u(i,t): number of times user u tagged item i
    with tag t
  • tf(i,t): number of times item i was tagged with
    tag t

24
Quantifying Friendship Strengths
  • Global friendship strength
  • Spiritual friendship strength
  • Social friendship strength
  • Integrated friendship strength

25
Spiritual Friendship Strength
overlap in interests of u and u'
  • Several alternatives
  • based on overlap of tag usage

Figure: tag sets of u and u' (example tags: harry potter wizard, deathlyhallows, philosopherstone)
  • based on overlap of tagged items
  • overlap of behavior (tagging, searching, rating,
    ...)
  • For all u, u'
  • P_spirit(u,u') ≥ 0
  • normalization such that Σ_{u'} P_spirit(u,u') = 1

tags(u): tags used by user u; items(u): items
tagged by user u
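
The tag-overlap alternative can be sketched as follows. This is only an illustration: the slide does not say which overlap measure is used, so the Jaccard coefficient, the function name spiritual_strengths, and the sample users are assumptions.

```python
from typing import Dict, Set


def spiritual_strengths(user: str, tags: Dict[str, Set[str]]) -> Dict[str, float]:
    """Tag-overlap strength of `user` towards every other user, normalized to sum to 1."""
    raw = {}
    for other, other_tags in tags.items():
        if other == user:
            continue
        union = tags[user] | other_tags
        # Jaccard coefficient of the two tag sets (assumed overlap measure).
        raw[other] = len(tags[user] & other_tags) / len(union) if union else 0.0
    total = sum(raw.values())
    # Normalize; if there is no overlap with anyone, all strengths stay 0.
    return {o: (s / total if total else 0.0) for o, s in raw.items()}


# Hypothetical users and tag sets, for illustration only.
tags_by_user = {
    "u": {"harry", "potter", "wizard"},
    "v": {"harry", "potter", "deathlyhallows"},
    "w": {"wizard", "travel"},
}
strengths = spiritual_strengths("u", tags_by_user)
```

In this toy data, v shares two of four tags with u and w shares one of four, so v receives twice the strength of w after normalization.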
26
Graph-Based Friendship Strength
distance of u and u' in the user network
Figure: example user network with users u1 to u7, illustrating P_social(u,u') for the querying user u
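
A distance-based strength could be sketched like this. The slide only states that the strength depends on the distance in the user network; the exponential decay, the decay parameter, and the function name social_strengths are assumptions made for the example.

```python
from collections import deque


def social_strengths(user, friends, decay=0.5):
    """Strength towards every reachable user, decaying with hop distance."""
    dist = {user: 0}
    queue = deque([user])
    # Breadth-first search yields the shortest hop distance to each user.
    while queue:
        cur = queue.popleft()
        for nxt in friends.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    # Assumed decay: each additional hop multiplies the strength by `decay`.
    raw = {v: decay ** d for v, d in dist.items() if v != user}
    total = sum(raw.values())
    # Normalize so that the strengths sum to 1.
    return {v: s / total for v, s in raw.items()} if total else {}


# Hypothetical friendship graph (adjacency lists).
network = {
    "u1": ["u2", "u3"],
    "u2": ["u1", "u4"],
    "u3": ["u1"],
    "u4": ["u2"],
}
social = social_strengths("u1", network)
```

Direct friends (u2, u3) end up with a higher strength than the friend of a friend (u4).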
27
Integrated Friendship Strength
  • Query-dependent mixture of
  • spiritual friendship strength
  • social friendship strength
  • background model (global)
  • (0 ≤ α, β ≤ 1, α + β ≤ 1)

P_int(u,u')
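
A minimal sketch of such a mixture, assuming a linear combination in which the social part gets weight alpha, the spiritual part gets weight beta, and the remaining weight goes to a uniform background model over all users. The linear form, the uniform background, and the parameter names are assumptions, not taken from the slide.

```python
def integrated_strength(p_social, p_spirit, num_users, alpha, beta):
    """Mix social, spiritual and global strength for one pair (u, u')."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1 and alpha + beta <= 1):
        raise ValueError("need 0 <= alpha, beta <= 1 and alpha + beta <= 1")
    # Assumed background model: every user is equally important.
    p_global = 1.0 / num_users
    return alpha * p_social + beta * p_spirit + (1 - alpha - beta) * p_global


# alpha = beta = 0 falls back to the purely global model.
only_global = integrated_strength(0.4, 0.2, num_users=10, alpha=0.0, beta=0.0)
mixed = integrated_strength(0.4, 0.2, num_users=10, alpha=0.2, beta=0.8)
```

Setting both weights to 0 treats all users alike (information need 1), while alpha = 1 or beta = 1 uses only the social or only the spiritual strength.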
28
Excursion: Scoring in Text Retrieval
General scoring framework
Importance of t in the collection (the less
frequent, the better)
Importance of t for item i (the more frequent,
the better)
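
The two factors can be illustrated with a plain tf-idf style score. The concrete formula (raw tf multiplied by a logarithmic idf, summed over the query tags) is one common instance of this framework and is assumed here; the slide does not fix a formula.

```python
import math


def tfidf_score(query_tags, item, tf, df, num_items):
    """Sum over query tags of (importance for the item) x (importance in the collection)."""
    total = 0.0
    for t in query_tags:
        if not df.get(t):
            continue  # tag never used: contributes nothing
        # Rare tags get a high weight, frequent tags a low one.
        idf = math.log(1 + num_items / df[t])
        # The more often the item carries the tag, the better.
        total += tf.get((item, t), 0) * idf
    return total


# Hypothetical statistics: tf[(item, tag)] and df[tag].
tf = {("book1", "harry"): 3, ("book1", "potter"): 2, ("book2", "travel"): 3}
df = {"harry": 10, "potter": 10, "travel": 50}
s1 = tfidf_score(["harry", "potter"], "book1", tf, df, num_items=100)
s2 = tfidf_score(["travel"], "book2", tf, df, num_items=100)
```

Both items have a tag with frequency 3, but the rarer tag (harry) contributes more per occurrence than the frequent one (travel).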
29
Towards a User-specific Score
SIGIR 2008
30
Including Tag Expansion
  • Problem: Users use different tags for similar
    things
  • ⇒ poor recall (missing relevant results)

Example: MPI, MPII, MPI-INF, MPI-CS,
Max-Planck-Institut, D5, AG5, DBIS, MMCI, UdS,
Saarland University, ...
Solution:
  1. Define notion of similar tags
  2. Expand queries with similar tags
  3. Modify scoring function for expanded queries
31
Heuristics for finding similar tags
  • Co-Occurrence heuristics
  • Tags t1 and t2 are similar if they (almost)
    always occur together
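
One way to turn "occur (almost) always together" into a number is the Jaccard coefficient of the item sets of the two tags. The choice of Jaccard and the function name cooccurrence_sim are assumptions for this sketch; other co-occurrence measures would fit the slide equally well.

```python
def cooccurrence_sim(t1, t2, items):
    """Share of items carrying either tag that carry both tags."""
    a, b = items.get(t1, set()), items.get(t2, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical items(t): the set of items tagged with tag t.
items_by_tag = {
    "mafia": {"i1", "i2", "i3"},
    "camorra": {"i1", "i2", "i3", "i4"},
    "travel": {"i9"},
}
sim_close = cooccurrence_sim("mafia", "camorra", items_by_tag)
sim_far = cooccurrence_sim("mafia", "travel", items_by_tag)
```

Tags that label nearly the same items score close to 1, tags that never co-occur score 0.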

32
Scoring Expanded Queries
  • Naive approach
  • For query tag t, add similar tags t' with
    sim(t,t') > δ to the query

But: "transportation disaster" is expanded by train,
car, bus, plane;
"international crime" is expanded by mafia, camorra,
yakuza
Result quality drops due to topic drift
Better: auto-tuning incremental expansion. For
query tag t, consider only the expansion with the highest
combined score per item
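
The "highest combined score per item" rule might look like the sketch below. It assumes that the combined score of an expansion t' is sim(t,t') times the item's score for t', and that the original tag counts with similarity 1; both are assumptions, as are the names and sample values.

```python
def expanded_score(item, tag, sim, score):
    """Score of `item` for `tag`, using only the best-scoring expansion."""
    # Candidate tags: all similar tags plus the query tag itself (weight 1).
    candidates = {**sim.get(tag, {}), tag: 1.0}
    # Take the maximum instead of the sum, so many weak expansions cannot add up.
    return max(w * score.get((item, t), 0.0) for t, w in candidates.items())


# Hypothetical similarities and per-tag item scores.
similar = {"disaster": {"crash": 0.8, "accident": 0.6}}
item_scores = {
    ("i1", "crash"): 0.5,
    ("i1", "accident"): 0.9,
    ("i2", "disaster"): 0.4,
}
s_i1 = expanded_score("i1", "disaster", similar, item_scores)
s_i2 = expanded_score("i2", "disaster", similar, item_scores)
```

Because each item contributes only its single best expansion, an item matching many loosely related tags is not ranked above one matching the query tag well, which limits topic drift.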
33
Experimental Evaluation: Effectiveness
  • Systematic evaluation of result quality is difficult
  • Three possible setups
  • Manual queries + human assessments
  • Queries + assessments derived from external info
    (e.g., DMOZ categories)
  • Automated assessments from context of user
  • Items tagged by friends
  • Items tagged in the future
34
Prototype (VLDB/SIGIR 2008 demo)
35
Preliminary User Study
  • LibraryThing user study (Data Engineering
    Bulletin, June 2008)
  • 6 librarything users with reasonably large
    library and friend sets
  • Overall 49 queries like mystery magic,
    wizard, yakuza
  • Crawled (part of) librarything: 1.3 million books,
    15 million tags, 12,000 users, 18,000 friends
  • Measured NDCG@10

NDCG@10 by α (social, rows) and β (spiritual, columns):

  α \ β   0.0    0.2    0.5    0.8    1.0
  0.0     0.546  0.572  0.568  0.565  0.565
  0.2     0.564  0.572  0.579  0.581  -
  0.5     0.539  0.552  0.559  -      -
  0.8     0.515  0.546  -      -      -
  1.0     0.465  -      -      -      -

  • Result quality generally very high
  • Combination of spiritual and social friends is
    best
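
For reference, NDCG@10 can be computed as in the sketch below. It uses the common log2 discount and, as a simplifying assumption, derives the ideal ranking from the gains of the returned list only; the exact variant used in the study is not stated on the slide.

```python
import math


def ndcg_at_k(gains, k=10):
    """Normalized discounted cumulative gain of a ranked list of relevance gains."""
    def dcg(values):
        # Rank 1 is undiscounted (log2(2) = 1); lower ranks count less.
        return sum(g / math.log2(rank + 2) for rank, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
reversed_order = ndcg_at_k([0, 1, 2, 3])
```

A ranking in ideal order scores 1, and every deviation from that order lowers the value.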

36
Outline
  • Search in Social Tagging Networks
  • Effective Query Scoring
  • Efficient Query Evaluation
  • Threshold Algorithms
  • ContextMerge
  • Experimental Evaluation
  • Summary & Further Challenges

37
Algorithmic Overview
  • Input: query q = {t1, ..., tn} for user u, with parameters α, β
  • Output: k items with highest scores
  • Goals
  • Avoid computing all results
  • Minimize disk I/O and CPU load
  • Utilize precomputed information on disk

Example query: harry potter
38
Excursion: Threshold Algorithms for Text IR
  • Input
  • query q = {t1, ..., tn}
  • lists L(tp) with pairs ⟨i, score(i,tp)⟩, sorted by
    score(i,tp) in descending order
  • Output: k items with highest aggregated score
  • Family of Threshold Algorithms
  • scan lists in parallel
  • maintain partial candidate results with score
    bounds
  • terminate as soon as top-k results are stable

39
Example: Top-1 for 2-term query (NRA)
  • L1: A 0.9, G 0.3, H 0.3, I 0.25, J 0.2, K 0.2, D 0.15
  • L2: D 1.0, E 0.7, F 0.7, B 0.65, C 0.6, A 0.3, G 0.2
  • Tracked during the scan: top-1 item, min-k, candidates
40
Example: Top-1 for 2-term query (NRA)
  • Lists L1 and L2 as on slide 39
  • Value shown: 0.9
41
Example: Top-1 for 2-term query (NRA)
  • Lists L1 and L2 as on slide 39
  • Values shown: 0.9, 1.0
42
Example: Top-1 for 2-term query (NRA)
  • Lists L1 and L2 as on slide 39
  • Value shown: 1.0
43
Example: Top-1 for 2-term query (NRA)
  • Lists L1 and L2 as on slide 39
  • Value shown: 1.0
No more new candidates considered
44
Example: Top-1 for 2-term query (NRA)
  • Lists L1 and L2 as on slide 39
  • Values shown: 1.0, 1.3
Algorithm safely terminates
45
Can we reuse this here?
Figure: precomputed score lists for the tags harry and travel (sample scores 0.87, 0.95, 0.82, 0.85, 0.69, 0.51)
The number of lists to precompute would
explode! (tags × users × parameter space)
46
Revisiting the Social Frequency
Compute sf_u(i,t) on the fly from tf(i,t), friends
of u and their tagged documents
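
A sketch of such an on-the-fly computation, assuming that the social frequency is a weighted sum of the global tf(i,t) and the friends' own tf values weighted by friendship strength. The exact formula is not given on the slide; the mixture form, the parameter global_weight, and all sample values are assumptions.

```python
def social_frequency(item, tag, tf, friends, user_tf, global_weight):
    """User-specific frequency of `tag` on `item` for one querying user."""
    # User-independent part: how often anyone tagged the item with the tag.
    global_part = tf.get((item, tag), 0)
    # User-specific part: friends' taggings, weighted by friendship strength.
    friend_part = sum(
        strength * user_tf.get((friend, item, tag), 0)
        for friend, strength in friends.items()
    )
    return global_weight * global_part + (1 - global_weight) * friend_part


# Hypothetical data: global tf, friendship strengths F(u,u'), per-user tf.
tf_global = {("book1", "harry"): 47}
friends_of_u = {"f1": 0.12, "f2": 0.10}
tf_by_user = {("f1", "book1", "harry"): 1}
sf = social_frequency("book1", "harry", tf_global, friends_of_u, tf_by_user, 0.5)
```

Only tf(i,t) and the per-user lists need to be stored; the user-specific combination is computed at query time, which avoids one precomputed list per tag, user and parameter setting.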
47
Top-K in Social Networks: ContextMerge
  • Precomputed lists
  • ITEMS(t): pairs ⟨i, tf(i,t)⟩, sorted by tf(i,t) in descending order
  • USERITEMS(u,t): pairs ⟨i, tf_u(i,t)⟩, unsorted
  • FRIENDS(u): pairs ⟨u', F(u,u')⟩, sorted by
    F(u,u') in descending order

Figure: ITEMS(harry) with entries 32, 47, 26 (annotation: already exist in systems); USERITEMS(u, harry); FRIENDS(u) with strengths 0.12, 0.085, 0.10

48
ContextMerge
  • Adapted Threshold Algorithm for query (u,t)
  • Scan ITEMS(t) and FRIENDS(u) in parallel
  • pick best list
  • If ITEMS(t): read next entry
  • If FRIENDS(u): read USERITEMS(u',t) for next
    friend u'
  • Maintain candidates with bounds for min and max
    score and current results

Figure: ITEMS(harry) with entries 47, 32, 26; FRIENDS(u) with strengths 0.12, 0.10, 0.085


49
ContextMerge
  • Algorithm steps as on slide 48

Figure: as on slide 48, with the labels "user-independent part of sf" and "user-specific part of sf" next to the entry 47


50
ContextMerge
  • Algorithm steps as on slide 48

Figure: as on slide 49, extended with score-bound annotations (values 0.88 and 47)


51
Experimental Evaluation: Efficiency
  • Testbed: 3 large crawls of real social networks
  • Flickr: 10 million pictures, 50,000 users
  • Del.icio.us: 175,000 bookmarks, 12,000 users
  • Librarything: 6.5 million books, 10,000 users
  • Queries
  • 150 frequent tag pairs
  • for each query, pick a user with enough results and
    friends
  • Abstract cost measure ≈ disk load
  • Baseline: full merge sort
52
Experimental Evaluation: Efficiency (β = 0)
Figure: results plotted against α
53
Outline
  • Search in Social Tagging Networks
  • Effective Query Scoring
  • Efficient Query Evaluation
  • Summary & Further Challenges

54
Summary
  • Need for social-aware social search, supporting
    global, social, and spiritual information needs
  • Social scoring
  • integrating global, collection, and social
    context
  • including dynamic tag expansion
  • ContextMerge: scalable implementation

55
Further Challenges
  • Meaningful common benchmark
  • Incremental maintenance for high dynamics
  • Extend to ratings, user weights, item weights,
    ...
  • Extend to non-tags (like image features)
  • Automatic query parameterization
  • Meaningful explanations of results
  • Exploit dynamics (hot topics, evolving groups, ...)

Social-Aware Search & Recommendations at planet
scale
56
Thank you.
  • Questions?