Topical search in Twitter

Provided by: Saptars6
1
Topical search in Twitter
  • Complex Network Research Group
  • Department of CSE, IIT Kharagpur

2
Topical search on Twitter
  • Twitter has emerged as an important source of
    information and real-time news
  • Most common search on Twitter: search for
    trending topics and breaking news
  • Topical search:
  • Identifying topical attributes / expertise of
    users
  • Searching for topical experts
  • Searching for information on specific topics

3
Prior approaches to find topic experts
  • Research studies
  • Pal et al. (WSDM 2011): uses 15 features from
    tweets, network, ... to identify topical experts
  • Weng et al. (WSDM 2010): uses an ML approach
  • Application systems
  • Twitter Who To Follow (WTF), Wefollow, ...
  • Methodology not fully public, but reported to
    utilize several features

4
Prior approaches use features extracted from
  • User profiles
  • Screen-name, bio, ...
  • Tweets posted by a user
  • Hashtags, others retweeting a given user, ...
  • Social graph of a user
  • Followers, PageRank, ...

5
Problems with prior approaches
  • User profiles: screen-name, bio, ...
  • Bio often does not give meaningful information
  • Information in user profiles is mostly unvetted
  • Tweets posted by a user
  • Tweets mostly contain day-to-day conversation
  • Social graph of a user: followers, PageRank
  • Does not provide topical information

6
We propose
  • Use a different way to infer topics of expertise
    for an individual Twitter user
  • Utilize social annotations
  • How does the Twitter crowd describe a user?
  • Social annotations obtained through Twitter Lists
  • Approach essentially relies on crowdsourcing

7
Twitter Lists
  • A feature used to organize the people one is
    following on Twitter
  • Create a named list, add an optional List
    description
  • Add related users to the List
  • Tweets posted by these users will be grouped
    together as a separate stream

8
How Lists work
9
Using Lists to infer topics for users
  • If U is an expert / authority in a certain topic
  • U is likely to be included in several Lists
  • List names / descriptions provide valuable
    semantic cues to the topics of expertise of U

10
Dataset
  • Collected Lists of 55 million Twitter users who
    joined before or in 2009
  • 88 million Lists collected in total
  • All studies consider 1.3 million users who are
    included in 10 or more Lists
  • Most List names / descriptions are in English, but a
    significant fraction are also in French, Portuguese, ...

11
Inferring topical attributes of users
12
Mining Lists to infer expertise
  • Collect Lists containing a given user U
  • List names / descriptions collected into a
    document for the given user
  • Identify U's topics from the document
  • Handle CamelCase words, case-folding
  • Ignore domain-specific stopwords
  • Identify nouns and adjectives
  • Unify similar words based on edit-distance, e.g.,
    journalists and jornalistas, politicians and
    politicos (not unified by stemming)
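
The edit-distance unification step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the 0.3 length-relative threshold are assumptions, since the slides do not state which threshold was used.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def unify(words, max_ratio=0.3):
    """Map each word to the first earlier word that is close enough.

    max_ratio is a hypothetical threshold: two words are unified when their
    edit distance is at most max_ratio times the longer word's length.
    """
    canonical = []
    mapping = {}
    for word in words:
        for rep in canonical:
            if edit_distance(word, rep) <= max_ratio * max(len(word), len(rep)):
                mapping[word] = rep
                break
        else:
            canonical.append(word)
            mapping[word] = word
    return mapping


mapping = unify(["journalists", "jornalistas", "politicians", "politicos", "music"])
print(mapping)
```

A threshold relative to word length keeps short, unrelated words from being merged, but a fixed ratio will still over-merge some near-identical terms (for example "politics" and "politicos"), so a real system would need additional checks.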

13
Mining Lists to infer expertise
  • Unigrams and bigrams considered as topics
  • Result: topics for U along with their frequencies
    in the document
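
A rough sketch of the mining pipeline described on the last two slides: CamelCase splitting, case-folding, stopword removal, and counting unigrams and bigrams. The stopword list and function names are hypothetical, and the part-of-speech filtering (nouns and adjectives) is omitted to keep the example dependency-free.

```python
import re
from collections import Counter

# Hypothetical domain-specific stopwords; the slides do not list the real ones.
STOPWORDS = {"twitter", "list", "lists", "my", "the", "and", "of", "on", "i", "follow"}


def tokenize(text):
    """Split CamelCase words, case-fold, and drop stopwords."""
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)  # "PopCulture" -> "Pop Culture"
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def infer_topics(list_texts):
    """Count unigrams and bigrams over all List names / descriptions of a user."""
    counts = Counter()
    for text in list_texts:
        tokens = tokenize(text)
        counts.update(tokens)
        # Simplification: bigrams are formed after stopword removal.
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


topics = infer_topics(["PopCulture celebs", "Celebs I follow", "pop culture"])
print(topics.most_common(4))
```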

14
Topics inferred from Lists
politics, senator, congress, government,
republicans, Iowa, gop, conservative
politics, senate, government, congress,
democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy, funny,
music, hollywood, pop culture
linux, tech, open, software, libre, gnu,
computer, developer, ubuntu, unix
15
Lists vs. other features
Profile bio: (not captured in transcript)
Most common words from tweets: love, daily, people, time, GUI, movie, video,
life, happy, game, cool
Most common words from Lists: celeb, actor, famous, movie, stars, comedy,
music, Hollywood, pop culture
16
Lists vs. other features
Profile bio: (not captured in transcript)
Most common words from tweets: Fallon, happy, love, fun, video, song, game,
hope, fjoln, fallonmono
Most common words from Lists: celeb, funny, humor, music, movies, laugh,
comics, television, entertainers
17
Who-is-who service
  • Developed a Who-is-Who service for Twitter
  • Shows word-cloud for major topics for a user
  • http://twitter-app.mpi-sws.org/who-is-who/

Inferring Who-is-who in the Twitter Social
Network, WOSN 2012 (Highest rated paper in
workshop)
18
Identifying topical experts
19
Topical experts in Twitter
  • 400 million tweets posted daily
  • Quality of tweets posted by different users varies
    widely
  • News, pointless babble, conversational tweets,
    spam, ...
  • Challenge: to find topical experts
  • Sources of authoritative information on specific
    topics

20
Basic methodology
  • Given a query (topic)
  • Identify experts on the topic using Lists
  • Discussed earlier
  • Rank identified experts w.r.t. given topic
  • Need ranking algorithm
  • Additional challenge: keeping the system
    up-to-date in the face of thousands of users joining
    Twitter daily

21
Ranking experts
  • Used a ranking scheme solely based on Lists
  • Two components of ranking user U w.r.t. query Q
  • Relevance of user to query: cover density
    ranking between topic document T_U of user and Q
  • Popularity of user: number of Lists including
    the user
  • Cover Density ranking preferred for short queries

Topic relevance(T_U, Q) × log(number of Lists including U)
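
A minimal sketch of this two-component score. The relevance function here is a deliberately simple stand-in (the fraction of query terms found in the user's topic document), not cover density ranking. The natural logarithm, the data layout, and all names are assumptions made for illustration.

```python
import math


def relevance(topic_doc, query):
    """Stand-in for cover density ranking: share of query terms in the topic document."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(1 for t in terms if t in topic_doc) / len(terms)


def score(topic_doc, num_lists, query):
    """Topic relevance multiplied by the log of the number of Lists including the user."""
    if num_lists <= 0:
        return 0.0
    return relevance(topic_doc, query) * math.log(num_lists)


def rank_experts(users, query):
    """Return user names with a non-zero score, best first."""
    scored = [(score(u["topics"], u["lists"], query), name) for name, u in users.items()]
    return [name for s, name in sorted(scored, reverse=True) if s > 0]


# Hypothetical users: topic document (set of topics) and List count.
users = {
    "alice": {"topics": {"linux", "ubuntu"}, "lists": 1000},
    "bob": {"topics": {"linux"}, "lists": 20},
    "carol": {"topics": {"music"}, "lists": 5000},
}
print(rank_experts(users, "linux"))
```

The logarithm damps the popularity term, so a very widely listed but off-topic account (carol) cannot outrank relevant users.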
22
Cognos
  • Search system for topical experts in Twitter
  • Publicly deployed at
  • http://twitter-app.mpi-sws.org/whom-to-follow/

Cognos: Crowdsourcing Search for Topic Experts in
Microblogs, ACM SIGIR 2012
23
Cognos results for politics
24
Cognos results for stem cell
25
Evaluation of Cognos - 1
  • Competes favorably with prior research attempts
    to identify topical experts (Pal et al. WSDM
    2011)

26
Evaluation of Cognos - 2
  • Cognos compared with Twitter WTF
  • Evaluator shown top 10 results by both systems
  • Result-sets anonymized
  • Evaluator judges which is better / both good /
    both bad
  • Queries chosen by evaluators themselves
  • 27 distinct queries were asked at least twice
  • In total, asked 93 times
  • Judgment by majority voting
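
The per-query majority vote can be sketched as below. The verdict labels and the rule that equal top counts produce a tie are assumptions; the slides do not say how ties were decided.

```python
from collections import Counter


def majority_verdict(votes):
    """Return the most frequent verdict, or "tie" when the top two counts are equal."""
    if not votes:
        raise ValueError("at least one judgment is required")
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "tie"
    return counts[0][0]


# Hypothetical judgments for two queries, each asked at least twice.
judgments = {
    "linux": ["cognos", "cognos", "wtf"],
    "kolkata": ["cognos", "wtf"],
}
verdicts = {query: majority_verdict(votes) for query, votes in judgments.items()}
print(verdicts)
```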

27
(No Transcript)
28
Cognos vs Twitter WTF
  • Cognos judged better on 12 queries
  • Computer science, Linux, mac, Apple, ipad, India,
    internet, windows phone, photography, political
    journalist
  • Twitter WTF judged better on 11 queries
  • Music, Sachin Tendulkar, Angelina Jolie, Harry
    Potter, metallica, cloud computing, IIT Kharagpur
  • Mostly names of individuals or organizations
  • Tie on 4 queries
  • Microsoft, Dell, Kolkata, Sanskrit as an official
    language

29
Cognos vs Twitter WTF
  • Low overlap between top 10 results
  • In spite of the same topic being inferred for 83%
    of experts
  • Major differences are due to List-based ranking
  • Top Twitter WTF results mostly business
    accounts
  • Top Cognos results mostly personal accounts

30
(No Transcript)
31
Keeping system up-to-date
  • Any search / recommendation system on OSN
    platform needs to be kept up-to-date
  • Thousands of new users join every day
  • Need efficient way of discovering topical experts
  • Can a brute-force approach be used?
  • Periodically crawl data (profile, Lists) of all
    users

32
Scalability problem
  • 200 million new users joined Twitter during 9
    months in 2011 → about 740K new users join daily
  • Lower-bound estimate: 1480K API calls per day
    required to crawl their profiles and Lists
  • Twitter allows only 3.6K API calls per day per IP
  • 480K API calls per day from a whitelisted IP
  • Plus, 465 million existing users
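
The estimates on this slide can be reproduced with a few lines of arithmetic. The 30-day month and the two API calls per user (one for the profile, one for the Lists) are assumptions chosen because they match the slide's figures.

```python
import math

new_users = 200_000_000          # joined during 9 months in 2011
days = 9 * 30                    # assumption: 30-day months
new_per_day = new_users / days   # about 740K new users per day

calls_per_user = 2               # assumption: one profile call + one Lists call
calls_needed = new_per_day * calls_per_user  # about 1480K calls per day

limit_per_ip = 3_600             # API calls per day per ordinary IP
limit_whitelisted = 480_000      # API calls per day per whitelisted IP

ips_needed = math.ceil(calls_needed / limit_per_ip)
whitelisted_needed = math.ceil(calls_needed / limit_whitelisted)

print(f"new users per day:      {new_per_day:,.0f}")
print(f"API calls per day:      {calls_needed:,.0f}")
print(f"ordinary IPs needed:    {ips_needed}")
print(f"whitelisted IPs needed: {whitelisted_needed}")
```

Under these assumptions, crawling only the newly joined users would need several hundred ordinary IPs or a handful of whitelisted ones, before counting the existing user base.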

33
How many experts in Twitter?
  • Only 1% listed 10 or more times
  • Only 0.12% listed 100 or more times
  • If experts can be identified efficiently,
    possible to crawl their Lists

34
Identifying experts efficiently
  • Hubs: users who follow many experts and add them
    to Lists
  • Identified top hubs in social network using HITS
  • Crawled Lists created by top 1 million hubs
  • Top 1M hubs listed 4.1M users
  • 2.06M users included in 10 or more Lists (50%)
  • Discovered 65% of the estimated number of experts
    listed 100 or more times
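
For readers unfamiliar with HITS, here is a compact power-iteration sketch on a toy graph. The edge list, the node names, and the choice of the follow graph as input are illustrative assumptions; the slides do not describe the exact graph or parameters used.

```python
def hits(edges, iterations=50):
    """Compute HITS hub and authority scores by power iteration.

    edges is a list of (source, target) pairs, here read as "source follows target".
    """
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing at it.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of authority scores of the nodes it points at.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Toy graph: h1 follows three experts, h2 follows one, u follows h1.
edges = [("h1", "e1"), ("h1", "e2"), ("h1", "e3"), ("h2", "e1"), ("u", "h1")]
hub, auth = hits(edges)
top_hubs = sorted(hub, key=hub.get, reverse=True)
print(top_hubs[:2])
```

Crawling only the Lists created by the top-scoring hubs is what keeps the number of API calls manageable.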

35
Identifying experts efficiently
  • More than 42% of the users listed by top hubs
    have joined Twitter after 2009
  • Discovered several popular experts who joined
    within the duration of the crawl
  • All experts reported by Pal et al. discovered
  • Discovered all Twitter WTF top 20 results for 50%
    of the queries, 15 or more for 80% of the queries

36
Topical search in Twitter
37
Looking for Tweets by Topic
  • Services today are limited to keyword search
  • Knowing which keywords to search for is itself
    an issue
  • Keyword search is not context aware
  • Tweets are too small to deduce topics
  • Topic analysis of 400M tweets/day is a challenge

38
Challenges
  • Some tweets are more important than others
  • Millions of tweets are posted on popular topics
  • Only some are relevant to the context intended
  • Tweets may contain wrong or misleading info
  • Twitter has a large population of spammers
  • Twitter is also a potent source of rumors
  • Some tweets are outright malicious

39
Our Approach to the Issues
  • Scalability
  • We only look at tweets from a small subset of
    users who are experts on different topics
  • Topic deduction
  • We map users' expertise topics to tweets/hashtags,
    instead of the other way round
  • Trustworthiness
  • Our source of tweets is a small subset of users
  • It is practical to vet their expertise and
    reputation

40
Advantages of list-based methodology
  • 600K experts on 36K distinct topics

41
Topical Diversity of Expert Sample
CSCW14
42
Popular Topics
43
Niche Topics
44
Challenges in Used Approach
  • We assign topics to tweets/hashtags
  • Inferring tweet topics from tweeter expertise
  • Experts can have multiple topics of expertise
  • Experts do tweet about topics beyond their
    expertise
  • Solution: if multiple experts on a subject tweet
    about something, it is most likely related to the
    topic.

45
Sampling Tweets from Experts
  • We capture all tweets from 585K topical experts
  • This is a set we obtained from our previous study
  • This is about 0.1% of the whole Twitter population
  • The experts generate 1.46 million tweets per day
  • This is 0.268% of all tweets on Twitter
  • Expertise in diverse topics (36K)
  • Our topics of expertise are crowdsourced
  • We will have more topics as more users show
    interest

46
Methodology at a Glance
  • Given a topic, we gather tweets from experts
  • We use hashtags to represent subjects
  • Clustering Tweets by similar hashtags
  • A cluster represents information on related
    subjects
  • Ranking clusters by popularity
  • Number of unique experts tweeting on the subject
  • Number of unique tweets on the subject
  • Ranking tweets by authority
  • Tweets from the highest-ranked user are shown first
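
The grouping and ranking steps above can be sketched as follows. This is a simplification under stated assumptions: tweets are grouped by exact (case-folded) hashtag rather than by hashtag similarity, the two popularity signals are combined lexicographically, and the tweet fields and authority scores are hypothetical.

```python
from collections import defaultdict


def cluster_and_rank(tweets, authority):
    """Group expert tweets by hashtag, rank clusters by popularity and tweets by authority."""
    clusters = defaultdict(list)
    for tweet in tweets:
        for tag in tweet["hashtags"]:
            clusters[tag.lower()].append(tweet)

    def popularity(item):
        _, members = item
        unique_experts = {t["user"] for t in members}
        unique_tweets = {t["text"] for t in members}
        # Assumption: compare by expert count first, then by tweet count.
        return (len(unique_experts), len(unique_tweets))

    ranked = sorted(clusters.items(), key=popularity, reverse=True)
    return [
        (tag, sorted(members, key=lambda t: authority.get(t["user"], 0), reverse=True))
        for tag, members in ranked
    ]


# Hypothetical tweets from experts on one topic, with per-user authority scores.
tweets = [
    {"user": "a", "text": "Debate tonight #Election", "hashtags": ["Election"]},
    {"user": "b", "text": "Polls open #election", "hashtags": ["election"]},
    {"user": "a", "text": "New budget #budget", "hashtags": ["budget"]},
]
authority = {"a": 10, "b": 50}
result = cluster_and_rank(tweets, authority)
for tag, members in result:
    print(tag, [t["user"] for t in members])
```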

47
What-is-happening on Twitter
  • twitter-app.mpi-sws.org/what-is-happening/

Topical search in Microblogs with Cognoscenti,
Or The Wisdom of Crowdsourced Experts,
48
Results for the last week on Politics (a popular
topic)
49
Related tweets are grouped together by common
hashtags.
The number of experts tweeting on the subject and the
number of tweets on the subject decide the ranking.
The most popular tweet from the
most authoritative user represents the group.
50
Our system especially excels for niche topics.
51
Evaluation: Relevance
  • We used Amazon Mechanical Turk for user
    evaluation
  • We chose to evaluate 20 topics
  • We picked top 10 tweets and hashtags
  • We picked results for all 3 time groups
  • Users have to judge if the tweet/hashtag was
    relevant to the given topic
  • Options are Relevant / Not Relevant / Can't Say
  • We chose master workers only
  • Every tweet/hashtag was evaluated by at least 4
    users

52
Evaluating Tweet Relevance
  • We obtained 3150 judgments
  • 76% of which were Relevant
  • 22% Not Relevant, 2% Can't Say
  • 80% of the tweets were marked relevant by
    majority judgment

53
Dissecting Negative Judgments
  • iPhone was the topic which received the most negative
    results
  • Experts on iPhone were generally tweeting on the
    overall topic (such as Android, tablets, ...)
  • Last week time group had most positive results
  • Scarcity of information led to bad ranking

54
Evaluating Hashtag Relevance
  • Total 3200 judgments
  • 62.3% were Relevant
  • Much less than tweets (76% were marked relevant)
  • Relevance of hashtags is very context sensitive

55
Perspectival relevance
  • The generic hashtag #sandy is very relevant to
    the topics in the context of the tweet.
  • These got negative judgments when shown without
    the tweets.

56
Generic Hashtags
  • Some hashtags are generic, but our service brings
    out their specificity with respect to the topic.
  • These hashtags received negative judgments when
    shown without the context of the tweet.

57
Summary
  • Simple Core Observation
  • Users curate experts
  • Services
  • who-is-who (WOSN12, CCR12)
  • whom-to-follow (SIGIR12)
  • what-is-happening (in-submission)
  • Sample-stream (CIKM13, CSCW14)

58
Complex Network Research Group
59
Thank You
  • Contact: niloy_at_cse.iitkgp.ernet.in
  • Complex Network Research Group (CNeRG)
  • CSE, IIT Kharagpur, India
  • http://cse.iitkgp.ac.in/resgrp/cnerg/