CMU TDT Report TIDES PI Meeting 2002

Provided by: cjin
1
CMU TDT Report TIDES PI Meeting 2002
  • The CMU TDT Team
  • Jaime Carbonell, Yiming Yang, Ralf Brown, Jian
    Zhang, Nianli Ma, Chun Jin
  • Language Technologies Institute, CMU

2
Time Line for TDT Activities
  • Restarted TDT: Summer 2001
  • Tasks: FSD, SLD, Detection
  • New techniques: Nov 2001 to present
  • Topic-conditional novelty (FSD)
  • Situated NEs (all tasks)
  • Source-conditional interpolated training (SLD)
  • Evaluations
  • TDT: Oct 2001, July 2002
  • New FSD (internal): July 2002 (KDD Conference)

3
2002 Dry Run Results: DET
1. Using our Mandarin-to-English EBMT, with our boundaries replaced by
   the SYSTRAN boundaries.
2. Using our dictionary-based Arabic-to-English translation, with our
   own boundaries, so the evaluation boundaries and the boundaries in
   our results do not match.
3. Using our dictionary-based Arabic-to-English translation, with our
   boundaries replaced by the SYSTRAN boundaries.

4
Baseline FSD Method
  • (Unconditional) Dissimilarity with Past
  • Decision threshold on most-similar story
  • (Linear) temporal decay
  • Length-filter (for teasers)
  • Cosine similarity with standard weights
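
The baseline above can be sketched roughly as follows. This is a minimal illustration, not the CMU system: the function names, the threshold, the decay window, and the minimum length are hypothetical values chosen for the example, and raw term counts stand in for the "standard weights" (presumably a TF-IDF variant), which the slides do not spell out.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term counts (assumption: stands in for the real term weights)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_first_stories(stories, threshold=0.3, window=100, min_tokens=5):
    """Flag each story (in arrival order) as a first story or not.

    threshold, window and min_tokens are illustrative values only.
    """
    past = []   # (arrival index, vector) of stories kept for comparison
    flags = []
    for i, text in enumerate(stories):
        vec = tf_vector(text)
        # Length filter: very short stories (teasers) are never flagged
        # and are not stored for later comparisons.
        if sum(vec.values()) < min_tokens:
            flags.append(False)
            continue
        best = 0.0
        for j, old in past:
            # Linear temporal decay: older stories count less.
            decay = max(0.0, 1.0 - (i - j) / window)
            best = max(best, decay * cosine(vec, old))
        # Decision threshold on the most similar past story.
        flags.append(best < threshold)
        past.append((i, vec))
    return flags
```

Here the decay is measured in story positions rather than wall-clock time, which is one simple choice among several.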

5
2002 Dry Run Results FSD
6
2002 Dry Run DET CMU-FSD
7
FSD Observations
  • Baselines are comparable across sites (cost .7)
  • Events-vs-topics issue (e.g. Asia crisis)
  • A few mislabeled stories wreak havoc for FSD
  • Eager auto-segmentation is a problem (misses)
  • Recommendations for TDT labeling
  • FSD on true events, or events within topic(s)
  • Change the auto-segmentation optimality criterion?
  • Recommendations for TDT researchers
  • Keep working hard on FSD: not cracked yet

8
New FSD Directions
  • Topic-conditional models
  • E.g. airplane, investigation, FAA, FBI,
    casualties → topic, not event
  • TWA 800, March 12, 1997 → event
  • First categorize into topic, then use
    maximally-discriminative terms within topic
  • Rely on situated named entities
  • E.g. Arcan as victim, Sharon as peacemaker
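
The two-level idea above (categorize into a topic first, then compare only on terms that discriminate within that topic) might look like the sketch below. It rests on several assumptions: `TOPIC_TERMS` is a hypothetical hand-written vocabulary, topic assignment is a simple keyword-overlap count, and removing the topic's own terms is a crude stand-in for choosing maximally discriminative terms. None of this is taken from the CMU implementation.

```python
import math
from collections import Counter

# Hypothetical topic vocabularies; a real system would learn these from
# labeled training data rather than list them by hand.
TOPIC_TERMS = {
    "airplane_accident": {"airplane", "plane", "crash", "investigation",
                          "faa", "casualties"},
    "bombing": {"bomb", "bombing", "explosion", "blast", "fbi"},
}


def assign_topic(tokens):
    """Pick the topic whose vocabulary overlaps most with the story (None if no overlap)."""
    best_topic, best_count = None, 0
    for topic, terms in TOPIC_TERMS.items():
        count = sum(1 for t in tokens if t in terms)
        if count > best_count:
            best_topic, best_count = topic, count
    return best_topic


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_conditional_fsd(stories, threshold=0.3):
    """Flag first stories by comparing each story only with past stories of its topic."""
    history_by_topic = {}   # topic -> vectors of earlier stories in that topic
    flags = []
    for text in stories:
        tokens = text.lower().split()
        topic = assign_topic(tokens)
        # Treat the topic's own vocabulary as topic-specific stop words, so
        # the comparison rests on what distinguishes events within the topic.
        stop_words = TOPIC_TERMS.get(topic, set())
        vec = Counter(t for t in tokens if t not in stop_words)
        history = history_by_topic.setdefault(topic, [])
        best = max((cosine(vec, old) for old in history), default=0.0)
        flags.append(best < threshold)
        history.append(vec)
    return flags
```

With topic words such as "plane" and "crash" removed, two different crashes no longer look alike merely because they share the topic vocabulary.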

9
Broad Topics vs Events
10
Two-level Scheme for FSD
11
Confusability between Intra-topic Events
  • Panels: AIRPLANE ACCIDENTS and BOMBINGS
  • Each data point in the matrix is the similarity
    between the two corresponding documents.
  • Documents are sorted by event as the first key
    and by time of arrival as the second key, so the
    diagonal sub-matrices are intra-event document
    similarities, while the off-diagonal sub-matrices
    are inter-event document similarities.
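
A small sketch of how such a matrix could be built is given below, assuming each document is a hypothetical `(event_id, arrival_time, text)` tuple and using plain term-count cosine similarity. The slides do not state the actual similarity weighting, so this is illustrative only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sorted_similarity_matrix(docs):
    """Sort docs by (event, arrival time) and return (ordered docs, similarity matrix)."""
    ordered = sorted(docs, key=lambda d: (d[0], d[1]))
    vectors = [Counter(d[2].lower().split()) for d in ordered]
    matrix = [[cosine(a, b) for b in vectors] for a in vectors]
    return ordered, matrix


def mean_intra_inter(ordered, matrix):
    """Average similarity inside diagonal blocks (same event) vs off-diagonal blocks."""
    intra, inter = [], []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            same_event = ordered[i][0] == ordered[j][0]
            (intra if same_event else inter).append(matrix[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)
```

A large gap between the two averages suggests events that are easy to tell apart; a small gap corresponds to the confusability the slide illustrates.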

12
Measuring Effectiveness of NEs
1. f denotes a named entity; S_k denotes the k-th of the seven
   types of NEs.
2. We use the effectiveness of each type of NE to measure how well
   it can differentiate intra-topic events.
13
Effectiveness of Named Entities
14
Experimental Design
  • Baseline: conventional FSD
  • Simple case: two-level FSD with perfect topic
    labels
  • Ideal case: two-level FSD with perfect topic
    labels, weighted NEs, and topic-specific stop
    words removed
  • Real case: the same as the ideal case, except using
    system-predicted topic labels

15
Data Description
  • Broadcast News, published by Primary Source
    Media (PSM)
  • 261,209 transcripts for news articles from ABC,
    CNN, NPR and MSNBC in the period from 1992 to
    1998.
  • Document structure: each document (story) is
    composed of several fields, such as Title, Topic,
    Keywords, Date, Abstract and Body.
  • (Training) topic labels provided by PSM (4
    topics)
  • Airplane accidents, bombings, tornados,
    hijackings
  • CMU students labeled 36 events within 4 topics
    (divided into 50% training and 50% test)

16
Results for Topic-Conditioned FSD
17
Confusability Reduction (5 events within topic
airplane accident in test data)
  • NOTE
  • These graphs contain only test data (5 events
    for the topic airplane accidents).
  • The left graph is the Baseline, and the right one
    is the Ideal Case.

18
Topic-Conditioned Approach to First Story
Detection for TDT