Topical search in Twitter - PowerPoint PPT Presentation

Provided by: Saptars6

1
Topical search in Twitter

Complex Network Research Group
Department of CSE, IIT Kharagpur

2
Topical search on Twitter

Twitter has emerged as an important source of
information and real-time news
Most common search in Twitter: search for
trending topics and breaking news
Topical search
Identifying topical attributes / expertise of
users
Searching for topical experts
Searching for information on specific topics

3
Prior approaches to find topic experts

Research studies
Pal et al. (WSDM 2011) use 15 features from
tweets, network, ... to identify topical experts
Weng et al. (WSDM 2010) use an ML approach
Application systems
Twitter Who To Follow (WTF), Wefollow, ...
Methodology not fully public, but reported to
utilize several features

4
Prior approaches use features extracted from

User profiles
Screen-name, bio, ...
Tweets posted by a user
Hashtags, others retweeting a given user, ...
Social graph of a user
Followers, PageRank, ...

5
Problems with prior approaches

User profiles: screen-name, bio, ...
Bio often does not give meaningful information
Information in users' profiles is mostly unvetted
Tweets posted by a user
Tweets mostly contain day-to-day conversation
Social graph of a user: followers, PageRank
Does not provide topical information

6
We propose

Use a different way to infer topics of expertise
for an individual Twitter user
Utilize social annotations
How does the Twitter crowd describe a user?
Social annotations obtained through Twitter Lists
Approach essentially relies on crowdsourcing

7
Twitter Lists

A feature used to organize the people one is
following on Twitter
Create a named list, add an optional List
description
Add related users to the List
Tweets posted by these users will be grouped
together as a separate stream

8
How Lists work?
9
Using Lists to infer topics for users

If U is an expert / authority in a certain topic
U is likely to be included in several Lists
List names / descriptions provide valuable
semantic cues to the topics of expertise of U

10
Dataset

Collected Lists of 55 million Twitter users who
joined before or in 2009
88 million Lists collected in total
All studies consider 1.3 million users who are
included in 10 or more Lists
Most List names / descriptions in English, but
significant fraction also in French, Portuguese, ...

11
Inferring topical attributes of users
12
Mining Lists to infer expertise

Collect Lists containing a given user U
List names / descriptions collected into a
document for the given user
Identify U's topics from the document
Handle CamelCase words, case-folding
Ignore domain-specific stopwords
Identify nouns and adjectives
Unify similar words based on edit-distance, e.g.,
journalists and jornalistas, politicians and
politicos (not unified by stemming)
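
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stopword set, the `max_ratio` threshold, and all function names are assumptions made for the example, and the noun/adjective filtering step is omitted because it needs a part-of-speech tagger.

```python
import re

# Hypothetical stopword set; the slides do not give the actual list.
STOPWORDS = {"twitter", "list", "lists", "my", "the", "and", "of", "follow"}

def split_camel_case(text):
    """Insert a space at every lowercase-to-uppercase boundary."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)

def tokenize(list_name):
    """Split CamelCase, case-fold, and drop stopwords."""
    words = re.findall(r"[a-z]+", split_camel_case(list_name).lower())
    return [w for w in words if w not in STOPWORDS]

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similar(a, b, max_ratio=0.4):
    """Assumed rule: unify words whose distance is small relative to their length."""
    return edit_distance(a, b) <= max_ratio * max(len(a), len(b))

print(tokenize("My TechNews list"))
print(edit_distance("journalists", "jornalistas"))
print(similar("politicians", "politicos"))
```

A length-relative threshold is only one possible choice; a threshold that is too loose would also merge unrelated short words, so it would need tuning on real List names.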

13
Mining Lists to infer expertise

Unigrams and bigrams considered as topics
Result: Topics for U along with their frequencies
in the document
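
A small sketch of the unigram and bigram counting, assuming List names have already been cleaned into space-separated lowercase words. The function name `topic_frequencies` and the sample List names are invented for illustration.

```python
from collections import Counter

def topic_frequencies(list_names):
    """Count unigrams and bigrams across the names of Lists containing a user."""
    counts = Counter()
    for name in list_names:
        tokens = name.lower().split()
        counts.update(tokens)                                   # unigrams
        counts.update(" ".join(p) for p in zip(tokens, tokens[1:]))  # bigrams
    return counts

# Hypothetical Lists that include some user U
lists_containing_u = ["pop culture", "celebs", "pop culture news", "famous celebs"]
topics = topic_frequencies(lists_containing_u)
print(topics.most_common(5))
```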

14
Topics inferred from Lists
politics, senator, congress, government,
republicans, Iowa, gop, conservative
politics, senate, government, congress,
democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy, funny,
music, hollywood, pop culture
linux, tech, open, software, libre, gnu,
computer, developer, ubuntu, unix
15
Lists vs. other features
Profile bio
Most common words from tweets
love, daily, people, time, GUI, movie, video,
life, happy, game, cool
Most common words from Lists
celeb, actor, famous, movie, stars, comedy,
music, Hollywood, pop culture
16
Lists vs. other features
Profile bio
Most common words from tweets
Fallon, happy, love, fun, video, song, game,
hope, fjoln, fallonmono
Most common words from Lists
celeb, funny, humor, music, movies, laugh,
comics, television, entertainers
17
Who-is-who service

Developed a Who-is-Who service for Twitter
Shows word-cloud for major topics for a user
http://twitter-app.mpi-sws.org/who-is-who/

Inferring Who-is-who in the Twitter Social
Network, WOSN 2012 (Highest rated paper in
workshop)
18
Identifying topical experts
19
Topical experts in Twitter

400 million tweets posted daily
Quality of tweets posted by different users varies
widely
News, pointless babble, conversational tweets,
spam, ...
Challenge: to find topical experts
Sources of authoritative information on specific
topics

20
Basic methodology

Given a query (topic)
Identify experts on the topic using Lists
Discussed earlier
Rank identified experts w.r.t. given topic
Need ranking algorithm
Additional challenge: keeping the system
up-to-date in face of thousands of users joining
Twitter daily

21
Ranking experts

Used a ranking scheme solely based on Lists
Two components of ranking user U w.r.t. query Q
Relevance of user to query: cover density
ranking between topic document TU of user and Q
Popularity of user: number of Lists including
the user
Cover Density ranking preferred for short queries

Topic relevance(TU, Q) × log(#Lists including U)
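
A minimal sketch of how the two ranking components might be combined, assuming they are multiplied and that the logarithm is natural (the slide does not state the base). The `relevance` function is a deliberately simple stand-in, not cover density ranking; the user names and numbers are invented.

```python
import math

def relevance(topic_doc, query):
    """Stand-in relevance (NOT cover density ranking): share of the user's
    topic mentions that match the query terms; zero if any term is missing."""
    terms = query.lower().split()
    total = sum(topic_doc.values())
    if total == 0 or not all(t in topic_doc for t in terms):
        return 0.0
    return sum(topic_doc[t] for t in terms) / total

def score(topic_doc, num_lists, query):
    """Relevance multiplied by log of the number of Lists including the user."""
    if num_lists < 1:
        return 0.0
    return relevance(topic_doc, query) * math.log(num_lists)

# Hypothetical users: topic frequencies mined from Lists, and List counts
users = {
    "alice": ({"politics": 30, "senate": 10}, 500),
    "bob": ({"politics": 5, "music": 45}, 5000),
}
ranked = sorted(users, key=lambda u: score(*users[u], "politics"), reverse=True)
print(ranked)
```

The logarithm damps raw popularity, so a user who appears in ten times as many Lists does not automatically outrank a more topically focused one.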
22
Cognos

Search system for topical experts in Twitter
Publicly deployed at
http://twitter-app.mpi-sws.org/whom-to-follow/

Cognos: Crowdsourcing Search for Topic Experts in
Microblogs, ACM SIGIR 2012
23
Cognos results for politics
24
Cognos results for stem cell
25
Evaluation of Cognos - 1

Competes favorably with prior research attempts
to identify topical experts (Pal et al. WSDM
2011)

26
Evaluation of Cognos - 2

Cognos compared with Twitter WTF
Evaluator shown top 10 results by both systems
Result-sets anonymized
Evaluator judges which is better / both good /
both bad
Queries chosen by evaluators themselves
27 distinct queries were asked at least twice
In total, asked 93 times
Judgment by majority voting
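
As an illustration of the majority-voting step, the sketch below picks the most frequent verdict for one query. How the study resolved level votes is not stated in the slides, so returning "tie" here is an assumption, as are the verdict labels.

```python
from collections import Counter

def majority_verdict(votes):
    """Return the most common verdict, or 'tie' when the top two are level."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]

print(majority_verdict(["cognos", "cognos", "wtf"]))
print(majority_verdict(["cognos", "wtf"]))
```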

28
Cognos vs Twitter WTF

Cognos judged better on 12 queries
Computer science, Linux, mac, Apple, ipad, India,
internet, windows phone, photography, political
journalist
Twitter WTF judged better on 11 queries
Music, Sachin Tendulkar, Angelina Jolie, Harry
Potter, metallica, cloud computing, IIT Kharagpur
Mostly names of individuals or organizations
Tie on 4 queries
Microsoft, Dell, Kolkata, Sanskrit as an official
language

29
Cognos vs Twitter WTF

Low overlap between top 10 results
In spite of the same topic being inferred for 83%
of experts
Major differences are due to List-based ranking
Top Twitter WTF results: mostly business
accounts
Top Cognos results: mostly personal accounts

31
Keeping system up-to-date

Any search / recommendation system on OSN
platform needs to be kept up-to-date
Thousands of new users join every day
Need efficient way of discovering topical experts
Can a brute-force approach be used?
Periodically crawl data (profile, Lists) of all
users

32
Scalability problem

200 million new users joined Twitter during 9
months in 2011 => 740K new users join daily
Lower-bound estimate: 1480K API calls per day
required to crawl their profiles and Lists
Twitter allows only 3.6K API calls per day per IP
480K API calls per day from whitelisted IP
Plus, 465 million users already on Twitter
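
The arithmetic behind these estimates can be reproduced as below. The 30-day month and the two API calls per user (one for the profile, one for Lists) are assumptions chosen because they match the figures on the slide.

```python
import math

new_users = 200_000_000
days = 9 * 30                          # assumption: 30-day months
users_per_day = new_users / days       # roughly 740K

calls_per_user = 2                     # assumption: profile call + Lists call
calls_per_day = users_per_day * calls_per_user   # roughly 1480K

# Number of IPs needed just to keep up with new users
regular_ips = math.ceil(calls_per_day / 3_600)
whitelisted_ips = math.ceil(calls_per_day / 480_000)

print(round(users_per_day), round(calls_per_day), regular_ips, whitelisted_ips)
```

Even under this lower bound, several whitelisted IPs would be needed for new users alone, before re-crawling any of the existing accounts.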

33
How many experts in Twitter?

Only 1% listed 10 or more times
Only 0.12% listed 100 or more times
If experts can be identified efficiently,
possible to crawl their Lists

34
Identifying experts efficiently

Hubs: users who follow many experts and add them
to Lists
Identified top hubs in social network using HITS
Crawled Lists created by top 1 million hubs
Top 1M hubs listed 4.1M users
2.06M users included in 10 or more Lists (50%)
Discovered 65% of the estimated number of experts
listed 100 or more times
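
For readers unfamiliar with HITS, here is a compact power-iteration sketch on a toy graph. It assumes an edge (a, b) means that a follows or lists b; the slides do not say exactly how the graph was built, and the node names are invented.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (source, target) edges."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes pointing to it
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of authority scores of nodes it points to
        new_hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth

# Toy graph: h1 and h2 act as hubs, e1..e3 as experts
edges = [("h1", "e1"), ("h1", "e2"), ("h1", "e3"), ("h2", "e1"), ("u", "h2")]
hub, auth = hits(edges)
top_hubs = sorted(hub, key=hub.get, reverse=True)
print(top_hubs[:2])
```

Crawling only the Lists created by the highest-scoring hubs is what keeps the number of API calls manageable.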

35
Identifying experts efficiently

More than 42% of the users listed by top hubs
have joined Twitter after 2009
Discovered several popular experts who joined
within the duration of the crawl
All experts reported by Pal et al. discovered
Discovered all Twitter WTF top 20 results for 50%
of the queries, 15 or more for 80% of the queries

36
Topical search in Twitter
37
Looking for Tweets by Topic

Services today are limited to keyword search
Knowing which keywords to search for is itself
an issue
Keyword search is not context aware
Tweets are too small to deduce topics
Topic analysis of 400M tweets/day is a challenge

38
Challenges

Some tweets are more important than others
Millions of tweets are posted on popular topics
Only some are relevant to the context intended
Tweets may contain wrong or misleading info
Twitter has a large population of spammers
Twitter is also a potent source of rumors
Some tweets are outright malicious

39
Our Approach to the Issues

Scalability
We only look at tweets from a small subset of
users who are experts on different topics
Topic deduction
We map user expertise topics to tweets/hashtags,
instead of the other way round
Trustworthiness
Our source of tweets is a small subset of users
It is practical to vet their expertise and
reputation

40
Advantages of list-based methodology

600K experts on 36K distinct topics

41
Topical Diversity of Expert Sample
CSCW'14
42
Popular Topics
43
Niche Topics
44
Challenges in Used Approach

We assign topics to tweets/hashtags
Inferring tweet topics from tweeter expertise
Experts can have multiple topics of expertise
Experts do tweet about topics beyond their
expertise
Solution: If multiple experts on a subject tweet
about something, it is most likely related to the
topic.
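
One way to read this rule is as a vote among experts: a hashtag inherits a topic only when several distinct experts on that topic have used it. The sketch below assumes a threshold of two experts; the threshold, the function name, and the sample data are all illustrative.

```python
from collections import defaultdict

def hashtag_topics(tweets, expertise, min_experts=2):
    """tweets: (user, hashtag) pairs; expertise: user -> set of topics.
    A hashtag gets a topic when at least min_experts distinct experts
    on that topic have tweeted it."""
    experts_by_pair = defaultdict(set)
    for user, hashtag in tweets:
        for topic in expertise.get(user, ()):
            experts_by_pair[(hashtag, topic)].add(user)
    result = defaultdict(set)
    for (hashtag, topic), users in experts_by_pair.items():
        if len(users) >= min_experts:
            result[hashtag].add(topic)
    return dict(result)

expertise = {"a": {"politics"}, "b": {"politics", "music"}, "c": {"music"}}
tweets = [("a", "#election"), ("b", "#election"),
          ("b", "#grammys"), ("c", "#grammys"), ("a", "#lunch")]
inferred = hashtag_topics(tweets, expertise)
print(inferred)
```

Here "#lunch" receives no topic because only one expert used it, which is how off-topic tweets by a single expert get filtered out.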

45
Sampling Tweets from Experts

We capture all tweets from 585K topical experts
This is a set we obtained from our previous study
This is about 0.1% of the whole Twitter population
The experts generate 1.46 million tweets per day
This is 0.268% of all tweets on Twitter
Expertise in diverse topics (36K)
Our topics of expertise are crowdsourced
We will have more topics as more users show
interest

46
Methodology at a Glance

Given a topic, we gather tweets from experts
We use hashtags to represent subjects
Clustering Tweets by similar hashtags
A cluster represents information on related
subjects
Ranking clusters by popularity
Number of unique experts tweeting on the subject
Number of unique tweets on the subject
Ranking tweets by authority
Tweets from the highest-ranked user are shown first
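
The pipeline above might be sketched as follows. Two simplifications are assumed: each cluster is a single case-folded hashtag (the slides group "similar" hashtags but do not say how), and the two popularity counts are compared lexicographically, experts first. Field names and sample data are invented.

```python
from collections import defaultdict

def rank_clusters(tweets, user_rank):
    """tweets: dicts with 'user', 'text', 'hashtags'.
    user_rank: user -> authority score (higher is better)."""
    clusters = defaultdict(list)
    for tweet in tweets:
        for tag in tweet["hashtags"]:
            clusters[tag.lower()].append(tweet)

    def popularity(item):
        _, members = item
        experts = {t["user"] for t in members}   # unique experts on the subject
        texts = {t["text"] for t in members}     # unique tweets on the subject
        return (len(experts), len(texts))

    ranked = sorted(clusters.items(), key=popularity, reverse=True)
    # Within each cluster, tweets by the most authoritative user come first
    return [
        (tag, sorted(members, key=lambda t: user_rank.get(t["user"], 0.0), reverse=True))
        for tag, members in ranked
    ]

tweets = [
    {"user": "a", "text": "t1", "hashtags": ["#Election"]},
    {"user": "b", "text": "t2", "hashtags": ["#election"]},
    {"user": "c", "text": "t3", "hashtags": ["#budget"]},
]
user_rank = {"a": 1.0, "b": 3.0, "c": 2.0}
result = rank_clusters(tweets, user_rank)
print([(tag, [t["user"] for t in members]) for tag, members in result])
```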

47
What-is-happening on Twitter

twitter-app.mpi-sws.org/what-is-happening/

Topical search in Microblogs with Cognoscenti,
Or The Wisdom of Crowdsourced Experts,
48
Results for the last week on Politics (a popular
topic)
49
Related tweets are grouped together by common
hashtags.
The number of experts tweeting on the subject and the
number of tweets on the subject decide the ranking.
The most popular tweet from the
most authoritative user represents the group.
50
Our system especially excels for niche topics.
51
Evaluation Relevance

We used Amazon Mechanical Turk for user
evaluation
We chose to evaluate 20 topics
We picked top 10 tweets and hashtags
We picked results for all 3 time groups
Users have to judge if the tweet/hashtag was
relevant to the given topic
Options are Relevant/Not Relevant/Can't Say
We chose master workers only
Every tweet/hashtag was evaluated by at least 4
users

52
Evaluating Tweet Relevance

We obtained 3150 judgments
76% of which were Relevant
22% Not Relevant, 2% Can't Say
80% of the Tweets were marked relevant by
majority judgment

53
Dissecting Negative Judgments

iPhone was the topic which received most negative
results
Experts on iPhone were generally tweeting on the
overall topic (such as androids, tablets, ...)
Last week time group had most positive results
Scarcity of information led to bad ranking

54
Evaluating Hashtag Relevance

Total 3200 judgments
62.3% were Relevant
Much less than tweets (76% were marked relevant)
Relevance of hashtags is very context sensitive

55
Perspectival relevance

The generic hashtag #sandy is very relevant to
the topics in the context of the tweet.
These got negative judgments when shown without
the tweets.

56
Generic Hashtags

Some hashtags are generic, but our service brings
out their specificity with respect to the topic.
These hashtags received negative judgments when
shown without the context of the tweet.

57
Summary