Title: Topical search in Twitter
1 Topical search in Twitter
- Complex Network Research Group
- Department of CSE, IIT Kharagpur
2 Topical search on Twitter
- Twitter has emerged as an important source of information and real-time news
- Most common search on Twitter: search for trending topics and breaking news
- Topical search
- Identifying topical attributes / expertise of users
- Searching for topical experts
- Searching for information on specific topics
3 Prior approaches to find topic experts
- Research studies
- Pal et al. (WSDM 2011) use 15 features from tweets, network, ... to identify topical experts
- Weng et al. (WSDM 2010) use an ML approach
- Application systems
- Twitter Who To Follow (WTF), Wefollow, ...
- Methodology not fully public, but reported to utilize several features
4 Prior approaches use features extracted from
- User profiles
- Screen-name, bio, ...
- Tweets posted by a user
- Hashtags, others retweeting a given user, ...
- Social graph of a user
- Followers, PageRank, ...
5 Problems with prior approaches
- User profiles: screen-name, bio, ...
- Bio often does not give meaningful information
- Information in users' profiles is mostly unvetted
- Tweets posted by a user
- Tweets mostly contain day-to-day conversation
- Social graph of a user: followers, PageRank
- Does not provide topical information
6 We propose
- Use a different way to infer topics of expertise for an individual Twitter user
- Utilize social annotations
- How does the Twitter crowd describe a user?
- Social annotations obtained through Twitter Lists
- Approach essentially relies on crowdsourcing
7 Twitter Lists
- A feature used to organize the people one is following on Twitter
- Create a named List, add an optional List description
- Add related users to the List
- Tweets posted by these users will be grouped together as a separate stream
8 How Lists work?
9 Using Lists to infer topics for users
- If U is an expert / authority in a certain topic
- U is likely to be included in several Lists
- List names / descriptions provide valuable semantic cues to the topics of expertise of U
10 Dataset
- Collected Lists of 55 million Twitter users who joined before or in 2009
- 88 million Lists collected in total
- All studies consider 1.3 million users who are included in 10 or more Lists
- Most List names / descriptions are in English, but a significant fraction are also in French, Portuguese, ...
11 Inferring topical attributes of users
12 Mining Lists to infer expertise
- Collect Lists containing a given user U
- List names / descriptions collected into a document for the given user
- Identify U's topics from the document
- Handle CamelCase words, case-folding
- Ignore domain-specific stopwords
- Identify nouns and adjectives
- Unify similar words based on edit-distance, e.g., journalists and jornalistas, politicians and politicos (not unified by stemming)
13 Mining Lists to infer expertise
- Unigrams and bigrams considered as topics
- Result: topics for U along with their frequencies in the document
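The mining steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the stopword list, the edit-distance threshold (`max_dist`), the minimum word length (`min_len`), and all function names are assumptions, and the noun/adjective filtering step is omitted because it needs a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical stopword list; the actual domain-specific list is not given.
STOPWORDS = {"my", "list", "lists", "twitter", "follow", "people", "the", "of", "and"}


def tokenize(list_name):
    """Split CamelCase words, case-fold, and drop stopwords."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", list_name)  # TechNews -> Tech News
    words = re.findall(r"[^\W\d_]+", spaced.lower())         # runs of letters only
    return [w for w in words if w not in STOPWORDS]


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similar(a, b, max_dist=2, min_len=6):
    """Assumed rule: long words within a small edit distance name the same topic."""
    return min(len(a), len(b)) >= min_len and edit_distance(a, b) <= max_dist


def infer_topics(list_names):
    """Return a Counter of unigram and bigram topics with their frequencies."""
    counts = Counter()
    for name in list_names:
        tokens = tokenize(name)
        counts.update(tokens)
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    merged = Counter()
    for term, freq in counts.most_common():
        target = term
        if " " not in term:  # only unigrams are unified
            for seen in merged:
                if " " not in seen and similar(seen, term):
                    target = seen
                    break
        merged[target] += freq
    return merged


lists_containing_u = ["TechNews", "tech people", "Linux-geeks", "linux",
                      "journalists", "jornalistas"]
print(infer_topics(lists_containing_u).most_common(5))
```

With these assumed thresholds, "journalists" and "jornalistas" (edit distance 2) are counted as one topic, while short words such as "tech" and "news" are never merged.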
14 Topics inferred from Lists
politics, senator, congress, government, republicans, Iowa, gop, conservative
politics, senate, government, congress, democrats, Missouri, progressive, women
celebs, actors, famous, movies, comedy, funny, music, hollywood, pop culture
linux, tech, open, software, libre, gnu, computer, developer, ubuntu, unix
15 Lists vs. other features
Profile bio (not transcribed)
Most common words from tweets: love, daily, people, time, GUI, movie, video, life, happy, game, cool
Most common words from Lists: celeb, actor, famous, movie, stars, comedy, music, Hollywood, pop culture
16 Lists vs. other features
Profile bio (not transcribed)
Most common words from tweets: Fallon, happy, love, fun, video, song, game, hope, fjoln, fallonmono
Most common words from Lists: celeb, funny, humor, music, movies, laugh, comics, television, entertainers
17 Who-is-who service
- Developed a Who-is-Who service for Twitter
- Shows word-cloud for major topics for a user
- http://twitter-app.mpi-sws.org/who-is-who/
Inferring Who-is-who in the Twitter Social Network, WOSN 2012 (Highest rated paper in workshop)
18 Identifying topical experts
19 Topical experts in Twitter
- 400 million tweets posted daily
- Quality of tweets posted by different users varies widely
- News, pointless babble, conversational tweets, spam, ...
- Challenge: to find topical experts
- Sources of authoritative information on specific topics
20 Basic methodology
- Given a query (topic)
- Identify experts on the topic using Lists
- Discussed earlier
- Rank identified experts w.r.t. given topic
- Need ranking algorithm
- Additional challenge: keeping the system up-to-date in face of thousands of users joining Twitter daily
21 Ranking experts
- Used a ranking scheme solely based on Lists
- Two components of ranking user U w.r.t. query Q
- Relevance of user to query: cover density ranking between topic document T_U of user and Q
- Popularity of user: number of Lists including the user
- Cover Density ranking preferred for short queries
Score(U, Q) = Topic relevance(T_U, Q) × log(#Lists including U)
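The way the two components combine can be sketched as follows. This is an illustrative sketch only: the relevance function here is a simple query-term overlap standing in for cover density ranking (which is more involved), and the function names, the input layout, and the sample users are all hypothetical.

```python
import math


def topic_relevance(topic_doc, query):
    """Stand-in relevance: fraction of query terms found in the topic document.

    Assumption: the slides use cover density ranking; this simpler overlap
    measure is swapped in only to keep the sketch short.
    """
    query_terms = query.lower().split()
    if not query_terms:
        return 0.0
    doc_terms = set(topic_doc)
    return sum(term in doc_terms for term in query_terms) / len(query_terms)


def rank_experts(users, query):
    """users maps a name to (topic terms, number of Lists including the user)."""
    scored = []
    for name, (topic_doc, n_lists) in users.items():
        if n_lists < 1:
            continue  # log is undefined for users who appear in no List
        score = topic_relevance(topic_doc, query) * math.log(n_lists)
        if score > 0:
            scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]


# Hypothetical users and List counts, for illustration only.
users = {
    "alice": (["linux", "ubuntu", "software"], 500),
    "bob": (["linux", "tech"], 50),
    "carol": (["music", "pop"], 9000),
}
print(rank_experts(users, "linux"))
```

The logarithm damps popularity, so a user on very many Lists (like the hypothetical "carol") still scores zero for a query that does not match their topic document.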
22 Cognos
- Search system for topical experts in Twitter
- Publicly deployed at
- http://twitter-app.mpi-sws.org/whom-to-follow/
Cognos: Crowdsourcing Search for Topic Experts in Microblogs, ACM SIGIR 2012
23 Cognos results for politics
24 Cognos results for stem cell
25 Evaluation of Cognos - 1
- Competes favorably with prior research attempts to identify topical experts (Pal et al. WSDM 2011)
26 Evaluation of Cognos - 2
- Cognos compared with Twitter WTF
- Evaluator shown top 10 results by both systems
- Result-sets anonymized
- Evaluator judges which is better / both good / both bad
- Queries chosen by evaluators themselves
- 27 distinct queries were asked at least twice
- In total, asked 93 times
- Judgment by majority voting
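Majority voting over the evaluators' judgments for one query could look like the sketch below. The label strings and the rule that equal top counts yield a tie are assumptions; the slides do not say how ties were determined.

```python
from collections import Counter


def majority_verdict(judgments):
    """Return the most frequent label, 'tie' if the top two labels are
    equally frequent, or None when there are no judgments."""
    ranked = Counter(judgments).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


# Hypothetical judgments for a single query asked three times.
print(majority_verdict(["cognos", "cognos", "wtf"]))
```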
27 (No Transcript)
28 Cognos vs Twitter WTF
- Cognos judged better on 12 queries
- Computer science, Linux, mac, Apple, ipad, India, internet, windows phone, photography, political journalist
- Twitter WTF judged better on 11 queries
- Music, Sachin Tendulkar, Angelina Jolie, Harry Potter, metallica, cloud computing, IIT Kharagpur
- Mostly names of individuals or organizations
- Tie on 4 queries
- Microsoft, Dell, Kolkata, Sanskrit as an official language
29 Cognos vs Twitter WTF
- Low overlap between top 10 results
- In spite of same topic being inferred for 83 experts
- Major differences are due to List-based ranking
- Top Twitter WTF results: mostly business accounts
- Top Cognos results: mostly personal accounts
30 (No Transcript)
31 Keeping system up-to-date
- Any search / recommendation system on an OSN platform needs to be kept up-to-date
- Thousands of new users join every day
- Need efficient way of discovering topical experts
- Can a brute-force approach be used?
- Periodically crawl data (profile, Lists) of all users
32 Scalability problem
- 200 million new users joined Twitter during 9 months in 2011 => about 740K new users join daily
- Lower-bound estimate: 1480K API calls per day required to crawl their profiles and Lists
- Twitter allows only 3.6K API calls per day per IP
- 480K API calls per day from whitelisted IP
- Plus, 465 million users already on Twitter
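The arithmetic behind these estimates can be reproduced as below. Two inputs are assumptions that happen to match the slide's figures: 30-day months, and two API calls per new user (one for the profile, one for the Lists). The function name is hypothetical.

```python
import math


def crawl_budget(new_users, days, calls_per_user, daily_limit):
    """Return (new users per day, API calls per day, IPs needed at daily_limit)."""
    users_per_day = new_users / days
    calls_per_day = users_per_day * calls_per_user
    ips_needed = math.ceil(calls_per_day / daily_limit)
    return users_per_day, calls_per_day, ips_needed


NEW_USERS = 200_000_000  # from the slide
DAYS = 9 * 30            # assumption: 30-day months
CALLS_PER_USER = 2       # assumption: profile + Lists

for label, limit in [("regular IP", 3_600), ("whitelisted IP", 480_000)]:
    users, calls, ips = crawl_budget(NEW_USERS, DAYS, CALLS_PER_USER, limit)
    print(f"{label}: {users:,.0f} users/day, {calls:,.0f} calls/day, {ips} IPs")
```

Under these assumptions, keeping up with new users alone would take about 412 regular IPs or 4 whitelisted ones, before counting the users already on the platform.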
33 How many experts in Twitter?
- Only 1% listed 10 or more times
- Only 0.12% listed 100 or more times
- If experts can be identified efficiently, possible to crawl their Lists
34 Identifying experts efficiently
- Hubs: users who follow many experts and add them to Lists
- Identified top hubs in social network using HITS
- Crawled Lists created by top 1 million hubs
- Top 1M hubs listed 4.1M users
- 2.06M users included in 10 or more Lists (50%)
- Discovered 65% of the estimated number of experts listed 100 or more times
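For readers unfamiliar with HITS, a minimal power-iteration sketch follows. It is the textbook algorithm on a toy graph, not the authors' implementation; treating an edge as "source follows or lists target", the iteration count, and the node names are assumptions.

```python
import math


def hits(edges, iterations=50):
    """Textbook HITS: returns (hub scores, authority scores) as dicts."""
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing at it.
        auth = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of authority scores of the nodes it points at.
        hub = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both vectors so the scores do not grow without bound.
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Toy graph: an edge (a, b) means "a follows or lists b" (assumed semantics).
edges = [("h1", "e1"), ("h1", "e2"), ("h1", "e3"), ("h2", "e1"), ("h3", "e4")]
hub, auth = hits(edges)
top_hubs = sorted(hub, key=hub.get, reverse=True)[:2]
print(top_hubs)
```

Crawling only the Lists created by the top-scoring hubs is what keeps the number of API calls manageable.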
35 Identifying experts efficiently
- More than 42% of the users listed by top hubs have joined Twitter after 2009
- Discovered several popular experts who joined within the duration of the crawl
- All experts reported by Pal et al. discovered
- Discovered all Twitter WTF top 20 results for 50% of the queries, 15 or more for 80% of the queries
36 Topical search in Twitter
37 Looking for Tweets by Topic
- Services today are limited to keyword search
- Knowing which keywords to search for is itself an issue
- Keyword search is not context aware
- Tweets are too small to deduce topics
- Topic analysis of 400M tweets/day is a challenge
38 Challenges
- Some tweets are more important than others
- Millions of tweets are posted on popular topics
- Only some are relevant to the context intended
- Tweets may contain wrong or misleading info
- Twitter has a large population of spammers
- Twitter is also a potent source of rumors
- Some tweets are outright malicious
39 Our Approach to the Issues
- Scalability
- We only look at tweets from a small subset of users who are experts on different topics
- Topic deduction
- We map user expertise topics to tweets/hashtags, instead of the other way round
- Trustworthiness
- Our source of tweets is a small subset of users
- It is practical to vet their expertise and reputation
40 Advantages of list-based methodology
- 600K experts on 36K distinct topics
41 Topical Diversity of Expert Sample
CSCW'14
42 Popular Topics
43 Niche Topics
44 Challenges in Used Approach
- We assign topics to tweets/hashtags
- Inferring tweet topics from tweeter expertise
- Experts can have multiple topics of expertise
- Experts do tweet about topics beyond their expertise
- Solution: if multiple experts on a subject tweet about something, it is most likely related to the topic.
45 Sampling Tweets from Experts
- We capture all tweets from 585K topical experts
- This is a set we obtained from our previous study
- This is about 0.1% of the whole Twitter population
- The experts generate 1.46 million tweets per day
- This is 0.268% of all tweets on Twitter
- Expertise in diverse topics (36K)
- Our topics of expertise are crowdsourced
- We will have more topics as more users show interest
46 Methodology at a Glance
- Given a topic, we gather tweets from experts
- We use hashtags to represent subjects
- Clustering tweets by similar hashtags
- A cluster represents information on related subjects
- Ranking clusters by popularity
- Number of unique experts tweeting on the subject
- Number of unique tweets on the subject
- Ranking tweets by authority
- Tweets from the highest-ranked user are shown first
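The steps above can be sketched roughly as follows. Several choices here are assumptions rather than the deployed system: hashtags are treated as "similar" only when they match after case-folding, the two popularity counts are compared as a tuple (experts first, then tweets), and the `authority` scores, the `min_experts` filter, and the sample tweets are hypothetical.

```python
from collections import defaultdict


def rank_hashtag_clusters(tweets, authority, min_experts=1):
    """tweets: list of (user, text, hashtags); authority: user -> score.

    Returns [(hashtag, [(user, text), ...]), ...] with clusters ordered by
    popularity and tweets inside each cluster ordered by user authority.
    """
    clusters = defaultdict(list)
    for user, text, hashtags in tweets:
        for tag in {h.lower() for h in hashtags}:  # case-fold, dedupe per tweet
            clusters[tag].append((user, text))

    def popularity(item):
        _, members = item
        experts = {user for user, _ in members}
        texts = {text for _, text in members}
        return (len(experts), len(texts))

    kept = [item for item in clusters.items() if popularity(item)[0] >= min_experts]
    ranked = sorted(kept, key=popularity, reverse=True)
    return [
        (tag, sorted(members, key=lambda m: authority.get(m[0], 0), reverse=True))
        for tag, members in ranked
    ]


# Hypothetical expert tweets on the topic "politics".
tweets = [
    ("a", "Vote today #Election", ["Election"]),
    ("b", "Polls are open #election", ["election"]),
    ("a", "Budget talk #budget", ["budget"]),
]
authority = {"a": 10, "b": 500}
for tag, members in rank_hashtag_clusters(tweets, authority):
    print(tag, members[0])
```

Raising `min_experts` above 1 drops subjects that only a single expert tweeted about, which reflects the idea that agreement among several experts signals relevance to the topic.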
47 What-is-happening on Twitter
- twitter-app.mpi-sws.org/what-is-happening/
Topical search in Microblogs with Cognoscenti,
Or The Wisdom of Crowdsourced Experts,
48 Results for the last week on Politics (a popular topic)
49 Related tweets are grouped together by common hashtags.
Number of experts tweeting on the subject and the number of tweets on the subject decide ranking.
The most popular tweet from the most authoritative user represents the group.
50 Our system especially excels for niche topics.
51 Evaluation: Relevance
- We used Amazon Mechanical Turk for user evaluation
- We chose to evaluate 20 topics
- We picked top 10 tweets and hashtags
- We picked results for all 3 time groups
- Users have to judge if the tweet/hashtag was relevant to the given topic
- Options are Relevant / Not Relevant / Can't Say
- We chose master workers only
- Every tweet/hashtag was evaluated by at least 4 users
52 Evaluating Tweet Relevance
- We obtained 3150 judgments
- 76% of which were Relevant
- 22% Not Relevant, 2% Can't Say
- 80% of the tweets were marked relevant by majority judgment
53 Dissecting Negative Judgments
- iPhone was the topic which received the most negative results
- Experts on iPhone were generally tweeting on the overall topic (such as androids, tablets, ...)
- Last week time group had most positive results
- Scarcity of information led to bad ranking
54 Evaluating Hashtag Relevance
- Total 3200 judgments
- 62.3% were Relevant
- Much less than tweets (76% were marked relevant)
- Relevance of hashtags is very context sensitive
55 Perspectival relevance
- The generic hashtag #sandy is very relevant to the topics in context of the tweet.
- These got negative judgments when shown without the tweets.
56 Generic Hashtags
- Some hashtags are generic, but our service brings out their specificity with respect to the topic.
- These hashtags received negative judgments when shown without the context of the tweet.
57 Summary
- Simple Core Observation
- Users curate experts
- Services
- who-is-who (WOSN'12, CCR'12)
- whom-to-follow (SIGIR'12)
- what-is-happening (in submission)
- Sample-stream (CIKM'13, CSCW'14)
58 Complex Network Research Group
59 Thank You
- Contact: niloy_at_cse.iitkgp.ernet.in
- Complex Network Research Group (CNeRG)
- CSE, IIT Kharagpur, India
- http://cse.iitkgp.ac.in/resgrp/cnerg/