Diversity in search: what, how, and what for?

Title: Data mining, interactive semantic structuring, and collaboration: A diversity-aware method for sense-making in search
Author: Bettina Berendt

1
Diversity in search: what, how, and
what for?
  • Bettina Berendt
  • Dept. Computer Science,
  • KU Leuven

2
Thanks to
  • Sebastian Kolbe-Nusser
  • Anett Kralisch
  • Siegfried Nijssen
  • Ilija Subašic
  • Mathias Verbeke
  • Hugo Zaragoza
  • ...

3
Diversity in natural language
  • diverse (s2), various
  • distinctly dissimilar or unlike
  • ..., diversity (s1), ..., variety
  • noticeable heterogeneity
  • (Wordnet)
  • the fact that members of a set are different
    from one another

4
Why is diversity interesting for search?
  • People like to see a range of different,
    non-redundant things/views/etc.
  • Different people search differently.
  • → How?
  • → When / under what conditions?
  • → (What) can we do?

5
What is diverse?
  • Documents
  • the relevance of a document must be determined
    considering the documents appearing before it
    (Goffman, 1964)
  • E.g. MMR (Carbonell & Goldstein, 1998)
  • Many further developments, e.g. for images
  • Presentation choices, e.g. re-ranking or
    clustering?
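
A minimal sketch of greedy MMR (Maximal Marginal Relevance) re-ranking, to make the idea above concrete: a document's score is its relevance to the query, discounted by its similarity to documents already ranked. The function name `mmr_rerank`, the trade-off parameter `lam`, and the toy similarity values are illustrative assumptions and are not taken from the slides.

```python
def mmr_rerank(query_sim, doc_sim, lam=0.5, k=None):
    """Greedy MMR: repeatedly pick the document that maximises
    lam * relevance - (1 - lam) * (max similarity to already-picked docs).

    query_sim[i]  : similarity of document i to the query
    doc_sim[i][j] : similarity between documents i and j
    Returns the document indices in their new rank order.
    """
    n = len(query_sim)
    k = n if k is None else min(k, n)
    selected, remaining = [], list(range(n))

    while remaining and len(selected) < k:
        def score(i):
            # Redundancy is 0 for the first pick (nothing ranked yet).
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): documents 0 and 1 are near-duplicates,
# document 2 is less relevant but different from both.
query_sim = [0.9, 0.85, 0.5]
doc_sim = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse_order = mmr_rerank(query_sim, doc_sim, lam=0.5)
relevance_only = mmr_rerank(query_sim, doc_sim, lam=1.0)
```

With `lam=1.0` the ranking falls back to pure relevance; lower values push near-duplicates down the list.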

6
What is diverse?
  • Documents
  • People
  • The term diversity is a form of euphemistic
    shorthand to describe differences in racial or
    ethnic classifications, age, gender, religion,
    philosophy, physical abilities, socioeconomic
    background, sexual orientation, gender identity,
    intelligence, mental health, physical health,
    genetic attributes, behavior, attractiveness,
    place of origin, cultural values, or political
    view as well as other identifying features.
  • http://en.wikipedia.org/wiki/Diversity_(politics)

7
What is diverse?
  • Documents
  • People
  • Knowledge and its articulations
  • (= documents in a wider sense?!)
  • Knowledge and its articulations are strongly
    influenced by diversity in, e.g., cultural
    backgrounds, schools of thought, geographical
    contexts.
  • LivingKnowledge will study the effect of
    diversity and time on opinions and bias.
  • The goal is to improve navigation and search
    in very large multimodal datasets (e.g., the Web
    itself).

8
How we got here
  • The impact of language and culture on Web usage
    behaviour
  • Diversity of users

9
How we got here
  • The impact of language and culture on Web usage
    behaviour
  • Tools for sense-making in literature search
  • Diversity of users / Diversity of documents

10
How we got here
  • The impact of language and culture on Web usage
    behaviour
  • Tools for sense-making in literature search
  • PORPOISE, STORIES: tools for graphical news
    summarization and understanding
  • Diversity of users / Diversity of documents

11
How we got here
  • The impact of language and culture on Web usage
    behaviour
  • Collaborative re-use of literature search results
  • Tools for sense-making in literature search
  • PORPOISE, STORIES: tools for graphical news
    summarization and understanding
  • Diversity of users / Diversity of diversity? /
    Diversity of documents

12
Why this talk?
  • The impact of language and culture on Web usage
    behaviour
  • Collaborative re-use of literature search results
  • Tools for sense-making in literature search
  • PORPOISE, STORIES: tools for graphical news
    summarization and understanding
  • Diversity of users / Diversity of diversity? /
    Diversity of documents

13
Why this talk?
  • The impact of language and culture on Web usage
    behaviour
  • Collaborative re-use of literature search results
  • Tools for sense-making in literature search
  • PORPOISE, STORIES: tools for graphical news
    summarization and understanding
  • e.g. Information Retrieval J. 2009, Proceedings
    Living Web WS@ISWC 2009, Inf. Processing &
    Management 2010; e.g. Knowledge and Information
    Systems J. 2009
  • Towards an integrated understanding of diversity

14
The impact of linguistic diversity on Web usage
and thereby on the Web
  • Or:
  • Why are non-English languages under-represented
    on the Web?
  • A web-analysis approach asking for underlying
    cognitive-linguistic, behavioural, and attitude
    factors

15
A simple expectation of how much content exists
in which language
16
But: dynamics of content creation, link setting,
link following, attitudes, and use
17
But: dynamics of content creation, link setting,
link following, attitudes, and use
  • People create less content
  • People link less to content
  • People use links less
  • People think the content is bad ... and use it
    less

18
But: dynamics of content creation, link setting,
link following, attitudes, and use
→ Under-representation!
19
Underlying data and methods
  • Database of countries and official languages
  • Distribution comparisons between
  • worldwide proportions of native speakers of
    different languages
  • worldwide distribution of servers registered by
    country
  • crawler analysis of links to a multilingual site
    S
  • log analysis assigning each session a native
    language
  • log analysis of
  • (user native language) (S-entry-page language)
  • Questionnaire/TAM analysis of native and
    non-native users of S
  • usability, ease of use, competence in English,
    beliefs about availability of content in native
    language

20
Some questions
  • Does one find such dynamics also in search
    engines?
  • What factors stop or reverse such
    language-marginalisation trends?
  • Critical mass?
  • Laws?
  • Volunteers?
  • Did / can Web 2.0/3.0 change this?
  • (When) is it better to work without pre-defined
    labels for users?

21
→ Part 2: An approach that ...
  • Does one find such dynamics also in search
    engines?
  • What factors stop or reverse such
    language-marginalisation trends?
  • Critical mass?
  • Laws?
  • Volunteers?
  • Did / can Web 2.0/3.0 change this?
  • (When) is it better to work without pre-defined
    labels for users?

22
Motivation (1): Diversity of people is ...
  • Speaking different languages (etc.) →
    localisation / internationalisation
  • Having different abilities → accessibility
  • Liking different things → collaborative filtering
  • Structuring the world in different ways → ?

23
Motivation (2): Diversity-aware applications ...
  • Must have a (formal) notion of diversity
  • Can follow a
  • personalization approach
  • → adapt to the user's value on the diversity
    variable(s)
  • → transparently? Is this paternalistic?
  • customization approach
  • → show the space of diversity
  • → allow choice / raise awareness / semi-automatic!

24
Measuring grouping diversity
  • Diversity = 1 − similarity = 1 − normalized
    mutual information (NMI)

[Figure: example groupings, including one by colour; values shown: NMI = 0 and NMI = 0.35]
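
A minimal sketch of how the diversity of two groupings could be computed as 1 − NMI. The slide does not say which normalisation of mutual information is meant; dividing by the arithmetic mean of the two entropies is an assumption made here, and the function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of group labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalised mutual information of two groupings of the same items.

    a[i] and b[i] are the group labels given to item i.
    Normalisation by the mean entropy is an assumption, not from the slides.
    """
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both groupings put everything in one group
    return mi / ((h_a + h_b) / 2)


def grouping_diversity(a, b):
    """Diversity = 1 - similarity = 1 - NMI."""
    return 1.0 - nmi(a, b)


# Same partition under different labels vs. two unrelated partitions.
same = grouping_diversity([0, 0, 1, 1], ["x", "x", "y", "y"])
unrelated = grouping_diversity([0, 0, 1, 1], [0, 1, 0, 1])
```

Because only co-membership matters, renaming the groups leaves the value unchanged.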
25
Measuring user diversity
  • How similarly do two users group documents?
  • For each query q, consider their groupings gr
  • For various queries: aggregate
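
A minimal sketch of the aggregation step, assuming the per-query diversities are simply averaged over the queries both users have grouped; the slide does not specify the aggregate. The per-query measure is passed in as a function. The default `pair_disagreement` (one minus the Rand index) is only a simple stand-in so that the example runs on its own; it is not the measure used in the talk.

```python
from itertools import combinations
from statistics import mean


def pair_disagreement(g1, g2):
    """Stand-in per-query diversity: share of document pairs that one
    grouping keeps together and the other separates (1 - Rand index).

    g1, g2: dict mapping document id -> group label.
    """
    docs = sorted(set(g1) & set(g2))
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    differ = sum((g1[a] == g1[b]) != (g2[a] == g2[b]) for a, b in pairs)
    return differ / len(pairs)


def user_diversity(groupings_a, groupings_b, gdiv_q=pair_disagreement):
    """Aggregate diversity of two users over their shared queries.

    groupings_x: dict mapping query -> {document id -> group label}.
    Averaging is an assumption; other aggregates are possible.
    """
    shared = sorted(set(groupings_a) & set(groupings_b))
    if not shared:
        raise ValueError("the two users have no query in common")
    return mean(gdiv_q(groupings_a[q], groupings_b[q]) for q in shared)


# Hypothetical groupings by two users for two queries.
user_a = {
    "q1": {"d1": "x", "d2": "x", "d3": "y"},
    "q2": {"d1": "x", "d2": "y"},
}
user_b = {
    "q1": {"d1": "p", "d2": "p", "d3": "q"},  # same partition as user A
    "q2": {"d1": "p", "d2": "p"},             # different partition
}
overall = user_diversity(user_a, user_b)
```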

26
... and now the application domain
... that's only the 1st step!
27
Workflow
  • Query
  • Automatic clustering
  • Manual regrouping
  • Re-use
  • Learn + present way(s) of grouping
  • Transfer the constructed concepts

28
Concepts
  • Extension
  • the instances in a group
  • Intension
  • Ideally: squares vs. circles
  • Pragmatically: defined via a classifier

29
Step 1: Retrieve
  • CiteseerX via OAI
  • Output: set of document IDs, document details,
    and their texts

30
Step 2: Cluster
  • the classic bibliometric solution:
    CiteseerCluster
  • Similarity measure: co-citation, bibliographic
    coupling, word or LSA similarity, combinations
  • Clustering algorithm: k-means, hierarchical
  • Damilicious: phrases → Lingo
  • How to choose the best?
  • Experiments: Lingo better than k-means at
    reconstruction and extension-over-time
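
A minimal sketch of the two citation-based similarity counts named above, in their usual textbook form: co-citation counts the documents that cite both papers, and coupling counts the references the two papers share. The data layout (a dict from paper id to the set of ids it cites) and the toy citation graph are assumptions for illustration; how CiteseerCluster weights or combines these counts is not stated in the slides.

```python
def co_citation(cites, a, b):
    """Number of documents whose reference lists contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def bibliographic_coupling(cites, a, b):
    """Number of references that a and b have in common."""
    return len(set(cites.get(a, ())) & set(cites.get(b, ())))


# Hypothetical citation graph: paper id -> set of cited paper ids.
cites = {
    "p1": {"p3", "p4"},
    "p2": {"p3", "p4", "p5"},
    "p3": set(),
    "p4": set(),
    "p5": set(),
}

cocited = co_citation(cites, "p3", "p4")             # cited together by p1 and p2
coupled = bibliographic_coupling(cites, "p1", "p2")  # share p3 and p4
```

Either count can be turned into a pairwise similarity matrix and handed to a clustering algorithm such as k-means or hierarchical clustering.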

31
Step 3 (a): Re-organise work on document groups
32
Step 3 (b): Visualising document groups
33
Steps 4+5: Re-use
  • Basic idea:
  • learn a classifier from the final grouping (Lingo
    phrases)
  • apply the classifier to a new search result
  • → re-use semantics
  • Whose grouping?
  • One's own
  • Somebody else's
  • Which search result?
  • the same (same query, structuring by somebody
    else)
  • more of the same (same query, later time →
    more docs)
  • related (... measured how? ...)
  • arbitrary
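
A minimal sketch of the re-use idea: learn a classifier from a finished grouping, then apply it to documents from a new search result. The slides do not say which classifier is used, and Lingo phrases are not reproduced here; a nearest-centroid classifier over plain word counts is swapped in as a simple stand-in. All names and the example texts are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens (a stand-in for Lingo phrases)."""
    return re.findall(r"[a-z]+", text.lower())


def learn_groups(grouping):
    """Learn one word-count centroid per group.

    grouping: dict mapping group label -> list of document texts.
    """
    return {
        label: Counter(w for doc in docs for w in tokenize(doc))
        for label, docs in grouping.items()
    }


def cosine(a, b):
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(model, text, fallback="Other topics"):
    """Assign a new document to the most similar learned group,
    or to the remainder group if it matches none."""
    vec = Counter(tokenize(text))
    best_label, best_sim = fallback, 0.0
    for label, centroid in model.items():
        sim = cosine(vec, centroid)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical final grouping from one user, re-used on new results.
model = learn_groups({
    "clustering": ["document clustering with hierarchical methods"],
    "ranking": ["ranking search results by relevance"],
})
new_results = ["hierarchical clustering", "relevance ranking for search"]
regrouped = [classify(model, text) for text in new_results]
```

The same `model` can come from one's own grouping or from somebody else's, which is what makes the learned semantics transferable.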

34
Visualising user diversity (1)
  • Simulated users with different strategies
  • U0: did not change anything (System)
  • U1: tried to produce a better fit of the document
    groups to the cluster intensions; 5 regroupings
  • U2: attempted to move everything that did not fit
    well into the remainder group "Other topics",
    better fit; 10 regroupings
  • U3: attempted to move everything from "Other
    topics" into matching real groups; 5 regroupings
  • U4: regrouping by author and institution; 5
    regroupings
  • → 5×5 matrix of diversities gdiv(A,B,q)
  • → multidimensional scaling
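
A minimal sketch of how a matrix of pairwise diversities can be turned into 2-D coordinates for plotting. The slides do not say which MDS variant was used; classical (Torgerson) MDS with power iteration is an assumption chosen because it needs only the standard library. The function name and parameters are illustrative, and the input is assumed to be symmetric with a zero diagonal.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS: embed n items so that Euclidean distances
    approximate the given dissimilarities.

    dist: symmetric n x n matrix (list of lists) with zero diagonal.
    Returns a list of n coordinate lists of length `dims`.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive eigenvalue left; remaining axes stay 0
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Sanity check on toy data: distances between the corners of a 3 x 1
# rectangle should be reproduced by the embedding.
corners = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
toy_dist = [[math.dist(p, q) for q in corners] for p in corners]
layout = classical_mds(toy_dist)
```

For a 5×5 matrix of gdiv values the call is the same; since such diversities need not be Euclidean, the resulting layout is only an approximation.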

35
Visualising user diversity (2)
Web mining
  • aggregated
  • using gdiv(A,B)

36
Evaluating the application
  • Clustering only: does it generate meaningful
    document groups?
  • yes (tradition in bibliometrics), but data?
  • Small expert evaluation of CiteseerCluster
  • Clustering + regrouping:
  • End-user experiment with CiteseerCluster
  • 5-person formative user study of Damilicious

37
The Damilicious tool: Summary and (some) open
questions
  • A tool that helps users in sense-making,
    exploring diversity, and re-using semantics
  • diversity measures when queries and result sets
    are different?
  • how to best present diversity?
  • How to integrate into an environment supporting
    user and community contexts?
  • Incentives to use the functionalities?
  • how to find the best balance between similarity
    and diversity?
  • which measures of grouping diversity are most
    meaningful?
  • Extensional?
  • Intensional? Structure-based? Hybrid? (cf.
    ontology matching)
  • which other sources of user diversity?
  • Diversity and relevance: can we learn from
    user-dependent relevance judgements?

38
Some lessons learned (or questions raised?)
  • We need to embrace diversity.
  • We need to take into account
  • The diversity of documents / knowledge
  • The diversity of people
  • The diversity of diversity.
  • We need to be clear about what we mean.
  • We need to ask whether / when striving for
    diversity is in itself A Good Thing.
  • We need to ask whether / when raising awareness
    of diversity is in itself A Good Thing.

Thanks!