Topical search in the Twitter OSN

1
Topical search in the Twitter OSN
  • Saptarshi Ghosh

Collaborators: Naveen Sharma, Parantapa
Bhattacharya, Niloy Ganguly (IITKGP); Muhammad
Bilal Zafar, Krishna Gummadi (MPI-SWS)
2
Topical search in Twitter
  • Twitter has emerged as an important source of
    information (real-time news)
  • Search for breaking news and trending topics
  • Topical search
  • Searching for topical experts
  • Searching for information on specific topics
  • Primary requirement: identify the topical
    expertise of users

3
Profile of a Twitter user
4
Example tweets
5
Prior approaches to find topic experts
  • Research studies
  • Pal et al. (WSDM 2011) use 15 features from
    tweets, the network, etc. to identify topical experts
  • Weng et al. (WSDM 2010) use an ML approach
  • Application systems
  • Twitter Who To Follow (WTF), Wefollow, etc.
  • Methodology is not fully public, but is reported
    to utilize several features

6
Prior approaches use features extracted from
  • User profiles
  • Screen-name, bio, etc.
  • Tweets posted by a user
  • Hashtags, others retweeting a given user, etc.
  • Social graph of a user
  • Number of followers, PageRank, etc.

7
Problems with prior approaches
  • User profiles (screen-name, bio, etc.)
  • Bio often does not give meaningful information
  • Tweets posted by a user
  • Tweets mostly contain day-to-day conversation
  • Social graph of a user (number of followers,
    PageRank)
  • Helps to identify authoritative users, but
  • Does not provide topical information

8
We propose
  • Use a completely different feature to infer
    topics of expertise for an individual Twitter
    user
  • Utilize social annotations
  • How does the Twitter crowd describe a user?
  • Social annotations obtained through Twitter Lists
  • Approach essentially relies on crowdsourcing

9
Twitter Lists
  • Primarily an organizational feature
  • Used to organize the people one is following
  • Create a named List and add an optional List
    description
  • Add related users to the List
  • Tweets posted by these users will be grouped
    together as a separate stream

10
How Lists work?
11
Using Lists to infer topics for users
  • If U is an expert / authority in a certain topic
  • U is likely to be included in several Lists
  • List names / descriptions provide valuable
    semantic cues to the topics of expertise of U

12
Inferring topical attributes of users
13
Dataset
  • Collected Lists of 55 million Twitter users who
    joined before or in 2009
  • 88 million Lists collected in total
  • All studies consider 1.3 million users who are
    included in 10 or more Lists
  • Most List names / descriptions are in English, but
    a significant fraction are also in French, Portuguese, etc.
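Collecting the Lists that contain each user amounts to inverting List membership. A minimal Python sketch, assuming each List is available as a hypothetical `(name, description, members)` tuple (the slides do not describe the data format) and exposing the 10-List cutoff as a parameter:

```python
from collections import defaultdict


def lists_by_member(twitter_lists, min_lists=10):
    """Invert List records into {user: [List name + description, ...]}.

    `twitter_lists` is an iterable of (name, description, members) tuples;
    this record shape is a hypothetical stand-in. Users included in fewer
    than `min_lists` Lists are dropped.
    """
    texts = defaultdict(list)
    for name, description, members in twitter_lists:
        text = f"{name} {description}".strip()
        for user in set(members):  # count each List at most once per user
            texts[user].append(text)
    return {user: t for user, t in texts.items() if len(t) >= min_lists}


lists = [("politics", "US politicians", ["a", "b"]),
         ("news", "", ["a"]),
         ("tech", "", ["b"])]
print(lists_by_member(lists, min_lists=2))
```

The per-user text lists returned here are what the next step treats as that user's topic document.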

14
Mining Lists to infer expertise
  • Collect Lists containing a given user U
  • List names / descriptions collected into a topic
    document for the given user
  • Identify U's topics from the document
  • Ignore domain-specific stopwords
  • Identify nouns and adjectives
  • Unify similar words based on edit-distance, e.g.,
    journalists and jornalistas, politicians and
    politicos (not unified by stemming)
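The edit-distance unification step can be illustrated as below. This is a minimal sketch, not the authors' implementation: the helper names `levenshtein` and `unify_similar` are hypothetical, and the merging rule (edit distance divided by the longer word's length at most 0.3, both words at least 6 characters) is an assumed threshold chosen so that the slide's examples merge; the slides do not state the actual rule.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def unify_similar(words, max_ratio=0.3, min_len=6):
    """Map each word to a canonical representative (hypothetical thresholds)."""
    canonical = {}
    reps = []
    for w in words:
        if w in canonical:
            continue
        for r in reps:
            if (len(w) >= min_len and len(r) >= min_len
                    and levenshtein(w, r) / max(len(w), len(r)) <= max_ratio):
                canonical[w] = r  # merge into an existing representative
                break
        else:
            reps.append(w)        # no close match: w becomes a representative
            canonical[w] = w
    return canonical


words = ["journalists", "jornalistas", "politicians", "politicos", "music"]
print(unify_similar(words))
```

Here "journalists"/"jornalistas" differ by 2 edits and "politicians"/"politicos" by 3, so both pairs merge, which stemming alone would not do. The minimum length guards against merging short, unrelated words that happen to be one edit apart.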

15
Mining Lists to infer expertise
  • Unigrams and bigrams considered as topics
  • Extracted from topic document of U
  • Topics for user U
  • Frequencies of the topics in the document
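Counting unigram and bigram topics in a topic document might look like the following sketch. The stopword set is a hypothetical stand-in for the domain-specific stopwords mentioned above; POS filtering (keeping nouns and adjectives) and edit-distance unification are omitted for brevity.

```python
import re
from collections import Counter

# Hypothetical stopwords: common English words plus words that carry no topical
# signal in List names. The actual domain-specific list is not given on the slides.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "my",
             "list", "lists", "twitter", "people", "tweeps"}


def topic_frequencies(list_texts):
    """Count unigram and bigram topics in a user's topic document.

    `list_texts` holds the names/descriptions of the Lists containing the user.
    A bigram is counted only if both adjacent words are non-stopwords.
    """
    counts = Counter()
    for text in list_texts:
        # Runs of Unicode letters/digits; "pop-culture" -> ["pop", "culture"].
        words = re.findall(r"[^\W_]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
        counts.update(f"{a} {b}" for a, b in zip(words, words[1:])
                      if a not in STOPWORDS and b not in STOPWORDS)
    return counts


docs = ["Politics", "US politics", "politics and government", "pop-culture"]
print(topic_frequencies(docs).most_common(3))
```

Skipping bigrams that span a stopword avoids artificial pairs such as "politics government" from "politics and government".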

16
Topics inferred from Lists
politics, senator, congress, government,
republicans, Iowa, gop, conservative
politics, senate, government, congress,
democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy, funny,
music, hollywood, pop culture
linux, tech, open, software, libre, gnu,
computer, developer, ubuntu, unix
17
Lists vs. other features
Profile bio
love, daily, people, time, GUI, movie, video,
life, happy, game, cool
Most common words from tweets
Most common words from Lists
celeb, actor, famous, movie, stars, comedy,
music, Hollywood, pop culture
18
Lists vs. other features
Profile bio
Fallon, happy, love, fun, video, song, game,
hope, fjoln, fallonmono
Most common words from tweets
Most common words from Lists
celeb, funny, humor, music, movies, laugh,
comics, television, entertainers
19
Evaluation of inferred topics 1
  • Evaluated through user-survey
  • Evaluator shown top 30 topics for a chosen user
  • Are the inferred attributes (i) accurate, (ii)
    informative?
  • Binary response for both questions
  • More than 93% of evaluators judged the topics to be
    both accurate and informative
  • The few negative judgments were a result of
    subjectivity

20
Evaluation of inferred topics 2
  • Comparison with topics identified by Twitter WTF
  • Obtained top 20 WTF results for about 200 queries
    → 3495 distinct users
  • Topics inferred by us from Lists include the
    query-topic for 2916 users (83.4%)
  • For the remaining users
  • Case 1: inferred topics include semantically
    very similar words, but not the exact query-word
    (18%)
  • Case 2: wrong results by WTF, unrelated to the query
    (58%)

21
Comparison with Twitter WTF
Case 1
  • Restaurant dineLA for query "dining"
  • Inferred topics: food, restaurant, recipes, los
    angeles
  • Space explorer HubbleHugger77 for query "hubble"
  • Inferred topics: science, tech, space,
    cosmology, nasa
Case 2
  • Comedian jimmyfallon for query "astrophysicist"
  • Inferred topics: celebs, comedy, humor, actor
  • Web developer ScreenOrigami for query "origami"
  • Inferred topics: webdesign, html, designers
22
Who-is-who service
  • Developed a Who-is-Who service for Twitter
  • Shows a word cloud of the major topics for a user
  • http://twitter-app.mpi-sws.org/who-is-who/

"Inferring Who-is-who in the Twitter Social
Network", WOSN 2012 (highest-rated paper in the
workshop)
23
Identifying topical experts
24
Topical experts in Twitter
  • 400 million tweets posted daily
  • Quality of tweets posted by different users varies
    widely
  • News, pointless babble, conversational tweets,
    spam, etc.
  • Challenge: find topical experts
  • Sources of authoritative information on specific
    topics

25
Basic methodology
  • Given a query (topic)
  • Identify experts on the topic using Lists
  • Discussed earlier
  • Rank identified experts w.r.t. expertise on the
    given topic
  • Need a suitable ranking algorithm
  • Commonly used ranking metrics, such as number of
    followers or PageRank, do not consider the topic

26
Ranking experts
  • Two components of ranking user U w.r.t. query Q:
    relevance of U to Q, and popularity of U
  • Relevance of user to query
  • Cover density ranking between topic document T_U
    of user U and Q
  • Cover density ranking is preferred for short queries
  • Popularity of user: number of Lists including the
    user

$$\mathrm{score}(U, Q) = \mathrm{TopicRelevance}(T_U, Q) \times \log(\#\,\text{Lists including } U)$$
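A simplified sketch of this ranking follows, under stated assumptions. It uses a bare-bones form of cover density ranking as described by Clarke, Cormack and Tudhope: a cover is a minimal span of the topic document containing every query term, and each cover of length L contributes min(1, K/L), with K = 16 assumed here; the partial-match coordination levels of the full method are ignored. The topic document is treated as one whitespace-tokenized sequence, and the natural logarithm is assumed since the slide does not give the base. The function names are hypothetical.

```python
import math


def covers(doc_tokens, query_terms):
    """Minimal spans (start, end) of doc_tokens that contain every query term."""
    terms = set(query_terms)
    last_seen = {}      # query term -> most recent position
    result = []
    prev_start = -1
    for j, tok in enumerate(doc_tokens):
        if tok not in terms:
            continue
        last_seen[tok] = j
        if len(last_seen) == len(terms):
            start = min(last_seen.values())
            # Start positions never decrease as j grows, so (start, j) is minimal
            # only if it does not contain the previous candidate span.
            if start > prev_start:
                result.append((start, j))
                prev_start = start
    return result


def cover_density(doc_tokens, query_terms, k=16):
    """Simplified cover density score: a cover of length L adds min(1, k / L)."""
    return sum(min(1.0, k / (end - start + 1))
               for start, end in covers(doc_tokens, query_terms))


def expert_score(topic_doc, query, num_lists):
    """Topic relevance of T_U to Q, times log(#Lists including U)."""
    relevance = cover_density(topic_doc.lower().split(), query.lower().split())
    return relevance * math.log(num_lists)


doc = "politics senate politics government us politics"
print(expert_score(doc, "politics", 120))
```

The log damps popularity so that a user in thousands of Lists does not automatically outrank a more topically focused user in a few hundred.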
27
Cognos
  • Search system for topical experts in Twitter
  • Publicly deployed at
  • http://twitter-app.mpi-sws.org/whom-to-follow/

"Cognos: Crowdsourcing Search for Topic Experts in
Microblogs", ACM International SIGIR Conference
2012
28
Cognos results for politics
29
Cognos results for stem cell
30
Cognos results for earthquake
31
Evaluation of Cognos
  • System evaluated in-the-wild
  • People were asked to try the system and give
    feedback
  • Evaluators were students and researchers from the
    home institutes of the authors
  • Advantage: many varied queries were tried
  • Disadvantage: subjectivity in relevance judgments

32
User-evaluation of Cognos
33
Sample queries for evaluation
34
Evaluation results
  • Overall 2136 relevance judgments over 55 queries
  • 1680 judged relevant (78.7%)
  • Large amount of subjectivity in evaluations
  • Same result for same query received both relevant
    and non-relevant judgments
  • E.g., for query "cloud computing", Werner Vogels
    got 4 relevant judgments and 6 non-relevant
    judgments

35
Cognos vs Twitter Who-to-follow
  • Evaluator shown top 10 results by both systems
  • Result-sets anonymized
  • Evaluator judges which is better / both good /
    both bad
  • Queries chosen by evaluators themselves
  • 27 distinct queries were asked at least twice
  • In total, these queries were asked 93 times
  • Judgment by majority voting
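The per-query majority vote could be computed as in the sketch below. The verdict labels, the two-judgment minimum, and reporting "tie" when no verdict has a strict majority are assumptions for illustration; the slides do not say how split votes were resolved.

```python
from collections import Counter, defaultdict


def majority_verdicts(judgments, min_votes=2):
    """Map each query to its strict-majority verdict, or "tie" if there is none.

    judgments: iterable of (query, verdict) pairs, where verdict is one of the
    hypothetical labels "cognos", "wtf", "both good", "both bad".
    Queries judged fewer than `min_votes` times are skipped.
    """
    votes = defaultdict(Counter)
    for query, verdict in judgments:
        votes[query][verdict] += 1
    result = {}
    for query, tally in votes.items():
        total = sum(tally.values())
        if total < min_votes:
            continue
        verdict, count = tally.most_common(1)[0]
        result[query] = verdict if 2 * count > total else "tie"
    return result


judgments = [("linux", "cognos"), ("linux", "cognos"), ("linux", "wtf"),
             ("music", "wtf"), ("music", "cognos"), ("dell", "wtf")]
print(majority_verdicts(judgments))
```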

37
Cognos vs Twitter WTF
  • Cognos judged better on 12 queries
  • Computer science, Linux, mac, Apple, ipad, India,
    internet, windows phone, photography, political
    journalist
  • Twitter WTF judged better on 11 queries
  • Music, Sachin Tendulkar, Angelina Jolie, Harry
    Potter, metallica, cloud computing, IIT Kharagpur
  • Mostly names of individuals or organizations
  • Tie on 4 queries
  • Microsoft, Dell, Kolkata, Sanskrit as an official
    language

38
Topical content search
39
Challenges in topical content search
  • Services today are limited to keyword search
  • Search for "politics" → get only tweets that
    contain the word "politics"
  • Knowing which keywords to search for is itself
    an issue
  • Individual tweets are too short to deduce topics
  • Scalability: 400M tweets posted per day
  • Tweets may contain spam / rumors / phishing URLs

40
Our approach
  • Look at tweets posted by a selected set of
    topical experts
  • Infer the topic of tweets from the tweeters'
    expertise
  • A large fraction of tweets posted by experts is
    only day-to-day conversation
  • Solution: if multiple experts on a topic tweet
    about something, it is most likely related to the
    topic

41
Sampling Tweets from Experts
  • We capture all tweets from 585K topical experts
  • Identified through Lists
  • Expertise in a wide variety of topics
  • The experts generate 1.46 million tweets per day
  • 0.268% of all tweets on Twitter → scalable
  • Trustworthiness
  • Experts not likely to post spam / phishing URLs
  • Less chance of rumors in what is posted by
    several experts

42
Methodology at a Glance
  • Gather tweets from experts on given topic
  • Group tweets on the same news-story
  • We use a group of hashtags to represent a
    news-story
  • Multi-level clustering (each cluster is a news-story)
  • Cluster tweets based on the hashtags they contain
  • Cluster hashtags based on co-occurrence
  • Rank news-stories by popularity
  • Number of distinct experts tweeting on the story
  • Number of tweets on the story
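The pipeline above can be sketched in Python as an illustrative approximation, not the deployed system. Assumptions: tweets are hypothetical `(user, hashtags)` pairs; hashtags that co-occur in at least `min_cooccur` expert tweets are merged with union-find, so each story is a connected component of the hashtag co-occurrence graph (the slides do not name the clustering algorithm); a tweet counts toward every story its hashtags belong to; stories are ranked by distinct experts, then by tweet count.

```python
from collections import Counter
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def rank_stories(tweets, topic_experts, min_cooccur=2):
    """Group expert tweets into hashtag-based stories and rank them.

    tweets: list of (user, set_of_hashtags); topic_experts: set of user ids.
    Returns [(sorted hashtags, #distinct experts, #tweets), ...], most popular first.
    """
    # Step 1: keep only tweets by experts on the topic that carry hashtags.
    expert_tweets = [(u, tags) for u, tags in tweets if u in topic_experts and tags]

    # Step 2: count how often each pair of hashtags appears in the same tweet.
    pair_counts = Counter()
    for _, tags in expert_tweets:
        pair_counts.update(combinations(sorted(tags), 2))

    # Step 3: merge frequently co-occurring hashtags (connected components).
    parent = {t: t for _, tags in expert_tweets for t in tags}
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            parent[_find(parent, a)] = _find(parent, b)

    # Step 4: attach each tweet to every story its hashtags belong to.
    stories = {}
    for user, tags in expert_tweets:
        for root in {_find(parent, t) for t in tags}:
            s = stories.setdefault(root, {"experts": set(), "tweets": 0})
            s["experts"].add(user)
            s["tweets"] += 1
    members = {}
    for t in parent:
        members.setdefault(_find(parent, t), []).append(t)

    # Step 5: rank by distinct experts, then by number of tweets.
    ranked = sorted(stories.items(),
                    key=lambda kv: (len(kv[1]["experts"]), kv[1]["tweets"]),
                    reverse=True)
    return [(sorted(members[root]), len(s["experts"]), s["tweets"])
            for root, s in ranked]


tweets = [("alice", {"#sandy", "#hurricane"}), ("bob", {"#sandy", "#hurricane"}),
          ("carol", {"#hurricane"}), ("alice", {"#debate"}), ("dave", {"#sandy"})]
print(rank_stories(tweets, {"alice", "bob", "carol"}))
```

Ranking by distinct experts first reflects the idea that a story many experts tweet about is most likely on-topic, while raw tweet count only breaks ties.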

43
Results for the last week on Politics (a popular
topic)
44
Hashtags that co-occur frequently are grouped
together.
Related tweets are grouped together by common
hashtags.
The most popular tweet in the story is shown.
45
Our system especially excels for niche topics.
46
Evaluation Relevance
  • Evaluated using human feedback
  • Used Amazon Mechanical Turk for user evaluation
  • Evaluated top 10 clusters for 20 topics
  • Users judged whether the tweet shown was
    relevant to the given topic
  • Options: Relevant / Not Relevant / Can't Say

47
Evaluating Tweet Relevance
  • We obtained 3150 judgments
  • 80% of tweets were marked relevant by majority
    judgment
  • Non-relevant results were primarily due to
  • Global events that were discussed by experts
    across all topics, e.g., Hurricane Sandy in the
    USA
  • Sometimes, topic is too specific and several
    experts tweet on a broader topic (e.g., baseball
    and ESPN Sports Update)

48
Effect of global events
  • Experts on all topics tweeted about Hurricane Sandy
  • Most of these got negative judgments

49
Diversity of topics in Twitter
50
Topics in Twitter
  • Discovering thousands of experts on diverse
    topics → characterizing the Twitter platform as a
    whole
  • On what topics is expert content available in
    Twitter?
  • Popular view: only a few topics, such as politics,
    sports, music, celebs, etc.
  • We find lots of niche topics along with the
    popular ones

51
Topics in Twitter: major topics to niche ones
what Twitter is mostly known for
wide variety of niche topics
52
Thank You
  • Contact: sghosh_at_cs.becs.ac.in