Study Goal - PowerPoint PPT Presentation

2
Study Goal

Measure the effect that different labels for
clusters of news documents have on users
browsing collections of news stories.

Michael Cole Measuring label effect 3.22.05
3
Practical Benefit

The practical benefit of this study is to
validate an instrument to rank the performance of
automated labeling algorithms.
Better performing algorithms produce more
effective labels. If high performance labeling
algorithms can be identified, the effectiveness
of browsers for very large document collections
may be significantly improved.

Michael Cole Learning cluster labels 12.11.04
4
Basic Approach

Vary the labels while the other representational
properties are held constant

5
Interface
7
Labeling is a Representation Problem
8
Theory

The user recognizes some semantic similarity
between the label and a description of their task
(Polson & Lewis, 1990).
Here the task understanding includes the user's
general knowledge of words that are germane to
the interest that drives the browsing experience.

9
Tasks

Difficult Problem
Browsing tasks are all quite similar in that they
rely on a user interest in a topic area. So
browsing for information on sports is a similar
task to browsing for business information.
Yet the user's familiarity with the topic
is likely to affect the number of
words that might be recognized as semantically
similar.

10
Tasks
The goal, however, is to find good labeling
algorithms across all, or many, classes of users,
so familiarity is ignored in evaluating the main
results of this study. A pretest questionnaire
on general news familiarity is
administered, as it may be useful in interpreting
results later.
11
Learning Effects
Since only the labels are varied, there is a
possibility that a subject will remember the
labels from the previous treatment. To mitigate
this learning effect, a Latin Square Design is
used to assign the treatments (Tague-Sutcliffe,
1997).
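
A Latin square ordering of treatments can be sketched as follows. This is a minimal illustration, not the assignment actually used in the study: the cyclic construction, the function names, and the treatment names ("human", "auto_a", "auto_b") are assumptions made for the example.

```python
def latin_square(treatments):
    """Build a cyclic Latin square: each treatment appears exactly once
    in every row (subject group) and every column (presentation position)."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]


def order_for_subject(subject_index, square):
    """Assign a subject to a row of the square by cycling through the rows."""
    return square[subject_index % len(square)]


# Hypothetical names for the three label sets compared in the study.
square = latin_square(["human", "auto_a", "auto_b"])
for subject in range(6):
    print(subject, order_for_subject(subject, square))
```

Because every label set occupies every position equally often across subjects, carry-over from an earlier treatment is spread evenly over the treatments rather than favoring one of them.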
12
Measurement

Label effectiveness is measured by whether the
clusters a subject selects contain, in aggregate,
the highest percentage of relevant documents that
could be obtained given the test procedure
(selecting three clusters).
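
One way to compute such a score is sketched below. It is an illustration under assumptions, not the study's scoring code: the function name and the input mapping are hypothetical, clusters are assumed not to share documents, and the score is expressed as a ratio to the best result any three clusters could give.

```python
def selection_score(relevant_per_cluster, selected, k=3):
    """Relevant documents in the selected clusters, as a fraction of the
    most that any k clusters could contain (clusters assumed disjoint)."""
    achieved = sum(relevant_per_cluster[c] for c in selected)
    best = sum(sorted(relevant_per_cluster.values(), reverse=True)[:k])
    return achieved / best if best else 0.0


# Hypothetical counts of relevant documents in four clusters.
counts = {"a": 5, "b": 3, "c": 2, "d": 0}
print(selection_score(counts, ["a", "b", "d"]))  # 8 of a possible 10
```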

13
Linear Model

A linear model is adopted for this study. The
rationale is based on the assumption that
scanning the interface and selecting cluster
representations are independent of one another.
There is no sequential process that takes place.

14
Cluster Labeling

Not much previous work
All of the work has concentrated on creating
labels strictly from the cluster contents.
Lin, Chen, and Nunamaker (1999)
Schweighofer, Rauber, and Dittenbach (2001)
Popescul and Ungar (2000)

15
Corpus

NIST TDT-3 collection
More than 37,000 English documents from six news
sources, including newswires and transcriptions of
radio and TV broadcasts.
113 human-coded topic clusters with more than
7,400 stories.
About 8,000 documents explicitly coded as negative
examples for some topic.

16
Corpus Cluster Example
0003. Pinochet Trial

Seminal Event
WHAT: Pinochet, who ruled Chile from 1973-1990, is
arrested on charges of genocide and torture
during his reign.
WHO: Former Chilean dictator General Augusto
Pinochet; Judge Baltasar Garzon ("Superjudge")
WHERE: Pinochet is arrested and held in London,
then later extradited to Spain.
WHEN: The arrest occurs on 10/16/98; court
negotiations last the rest of the year.

Topic Explication
Pinochet was arrested in a London hospital on a
warrant issued by Spanish Judge Baltasar Garzon.
Pinochet appealed his arrest and a London court
agreed, but the decision was overturned by
Britain's highest court. After much legal
wrangling over the site of the trial, the British
Courts ruled that Spain should proceed with the
extradition request; Pinochet continues to fight
it.

On topic: stories covering any angle of the legal
process surrounding this trial (including
Pinochet's initial arrest in October, his appeals,
British Court rulings, reactions of world leaders
and Chilean citizens to the trial, etc.). Stories
about Pinochet's reign or legacy are not on topic
unless they explicitly discuss this trial.

Rule of Interpretation
Rule 3: Legal/Criminal Cases
17
Popescul & Ungar (2000)

Chi-square test for common terms
1. Generate a tree with bags of words at each
node.
2. Calculate the confidence that a term in a root
has a similar frequency in all subtree nodes; if
it does, it has no distinguishing power.
Frequent Predictive words
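
The chi-square step can be illustrated with a short sketch. It assumes a plain Pearson chi-square test of homogeneity on term counts in the child nodes, which may differ in detail from what Popescul & Ungar implemented; the function names and the example counts are hypothetical, and the caller supplies the critical value for the chosen significance level.

```python
def chi_square_statistic(term_counts, node_sizes):
    """Pearson chi-square statistic for the hypothesis that a term occurs
    at the same rate in every child node.

    term_counts[i] is the term's count in node i and node_sizes[i] is the
    total number of word tokens in node i."""
    total_term = sum(term_counts)
    total_words = sum(node_sizes)
    statistic = 0.0
    for count, size in zip(term_counts, node_sizes):
        expected_term = total_term * size / total_words
        expected_other = size - expected_term
        if expected_term > 0:
            statistic += (count - expected_term) ** 2 / expected_term
        if expected_other > 0:
            statistic += ((size - count) - expected_other) ** 2 / expected_other
    return statistic


def is_common_term(term_counts, node_sizes, critical_value):
    """A term whose statistic stays below the critical value is treated as
    equally frequent everywhere, so it cannot distinguish the child nodes."""
    return chi_square_statistic(term_counts, node_sizes) < critical_value


# 3.841 is the chi-square critical value for 1 degree of freedom
# (two child nodes) at the 0.05 level.
CRITICAL_DF1_P05 = 3.841
print(is_common_term([10, 10], [100, 100], CRITICAL_DF1_P05))  # evenly spread
print(is_common_term([20, 0], [100, 100], CRITICAL_DF1_P05))   # concentrated
```

Terms flagged as common would be kept out of the child labels, leaving the terms whose frequency differs between nodes as label candidates.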

18
Popescul & Ungar results
FP: frequent predictive product
least/most FP: blend of highest and lowest FP scores
19
Subjects & Procedure

Drawn from a convenience sample of undergraduates
and graduates at a large Northeastern university.
Fill out pre-test questionnaire.
Perform assigned interface tasks.
Fill out intermediate questionnaire asking about
experience with interface.
Repeat with another interface and task set
(twice).

20
Measuring the Effect of Labels

Hypothesis 1: Automatically generated labels will
be associated with reduced usability and
effectiveness compared to 'gold standard' labels.
Hypothesis 2: Different labeling algorithms will
be associated with differences in browsing
effectiveness.

21
Results
22
Results
23
Results
24
Results
25
ANOVA
26
Conclusions
The human-assigned labels that defined the topic
clusters were clearly the most effective
labels. The other two label sets were clearly
distinguishable from the human-assigned labels
and from each other. The instrument is a
promising approach to providing an objective
ranking of the quality of labels and, therefore,
of automated labeling algorithms.
27
References
Popescul, A., and Ungar, L. (2000). Automatic
Labeling of Document Clusters. Unpublished.
Retrieved September 23, 2004 from
http://www.citeseer.com/

Tague-Sutcliffe, Jean. "The Pragmatics of
Information Retrieval Experimentation, Revisited."
In Readings in Information Retrieval, ed. Karen
Sparck Jones and Peter Willett, 205-216. San
Francisco, CA: Morgan Kaufmann, 1997. Originally
published in Information Processing & Management
28 (1992): 467-490.

Soto, R. (1999). Learning and performing by
exploration: label quality measured by latent
semantic analysis. Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems:
The CHI Is the Limit, pp. 418-425, May 15-20,
1999, Pittsburgh, Pennsylvania, United States.
