Resolution of References to Document Elements in Meeting Dialogues

1
Resolution of References to Document Elements in
Meeting Dialogues
  • Andrei Popescu-Belis, ISSCO/TIM/ETI, University of Geneva, (IM)2.MDM
  • Denis Lalanne, DIVA/DIUF, University of Fribourg, (IM)2.DI

2
Purpose of research
  • Context: meeting processing, storage and
    retrieval
  • Provide a two-way connection between the meeting
    transcript and the meeting documents
      • What was said about a given document?
          • query: retrieve the episodes where a given
            document was discussed
      • Which document was discussed at a specific time?
          • query: retrieve the documents discussed in the
            episode that is currently viewed
      • include document sub-parts: chapters, sections,
        figures
  • Enhanced access to the archive, for people who
      • missed a meeting and want to know what happened
      • attended a meeting and want to review details

3
Document/Speech alignment methods

4
Aligned browsing of meeting transcript and
documents
  • Option 1: display all articles discussed in a
    given episode
  • Option 2: hyperlink referring expressions (words)
    to the related document elements
  • Option 3: combine options 1 and 2

TQB
5
Ref2doc: Annotation of Meeting Transcripts
(SDA.XML)
  • <dialog>
  • <channel id="1" name="Dalila">
  • ...
  • <er id="12">The title</er> suggests that the
    issue
  • </channel>
  • ...
  • <ref2doc>
  • ...
  • <ref er-id="12"
  • doc-file="LeMonde030404.Logic.xml"
  • doc-id="//Article[@ID='3']/Title"/>
  • ...
  • </ref2doc>
  • </dialog>

Callouts on the slide:
  • <er>: referring expression (uttered by Dalila)
  • doc-file: document referred to (XML logical structure)
  • doc-id: document element (XPath)
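
The annotation format above can be read with a few lines of Python. This is only a minimal sketch, assuming the element and attribute names shown on the slide (`channel`, `er`, `ref`, `er-id`, `doc-file`, `doc-id`); the `SAMPLE` string and the function names `load_ref2doc` and `index_by_document` are hypothetical and not part of the Ref2doc tools.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample, rebuilt from the fragment shown on the slide.
SAMPLE = """<dialog>
  <channel id="1" name="Dalila">
    <er id="12">The title</er> suggests that the issue
  </channel>
  <ref2doc>
    <ref er-id="12"
         doc-file="LeMonde030404.Logic.xml"
         doc-id="//Article[@ID='3']/Title"/>
  </ref2doc>
</dialog>"""


def load_ref2doc(xml_text):
    """Return one record per <ref>, joined with the text of its <er>."""
    root = ET.fromstring(xml_text)
    # Map each referring expression id to (speaker, expression text).
    expressions = {}
    for channel in root.iter("channel"):
        for er in channel.iter("er"):
            expressions[er.get("id")] = (channel.get("name"),
                                         (er.text or "").strip())
    links = []
    for ref in root.iter("ref"):
        er_id = ref.get("er-id")
        speaker, text = expressions.get(er_id, (None, None))
        links.append({
            "er_id": er_id,
            "speaker": speaker,
            "text": text,
            "doc_file": ref.get("doc-file"),
            "xpath": ref.get("doc-id"),
        })
    return links


def index_by_document(links):
    """Reverse direction: document element -> ids of the REs that mention it."""
    index = defaultdict(list)
    for link in links:
        index[(link["doc_file"], link["xpath"])].append(link["er_id"])
    return dict(index)


links = load_ref2doc(SAMPLE)
for link in links:
    print(link)
```

Reading the links in both directions (expression to element, element to expressions) is what gives the two-way connection between transcript and documents.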
6
Reference-based transcript-to-document alignment
[Diagram: pipeline in three numbered steps (1, 2, 3), detailed on the following slides]
7
Ref2doc annotation: data and ground truth
  • DIVA/University of Fribourg
      • press-review corpus (>25 meetings, 15 min each)
      • 22 meetings (6 hours) used here; 30 documents
  • Manual processing
      • ground truth annotation for training and
        evaluation
      • transcription, RE annotation (<er>), ref2doc
        annotation (<ref doc-file doc-id>), document
        structuring (XML)
      • total of 437 REs
  • Study of inter-annotator agreement
      • 3 annotators on 1/4 of the data
      • before discussion
          • 96% for document assignment (3 errors)
          • 90% for document element assignment (9 errors)
      • after discussion
          • 100% for documents, 97% for document elements

8
Extraction of semantic abstractions from
electronic documents (step 1)
[Excerpt of a raw PDF object: 108 1 obj << /Filter /FlateDecode /Length 109 1 R >> stream (compressed binary data) endstream endobj]
  • PDF preserves layout and content
  • Poorly structured: the priority is layout
    preservation (PostScript)
  • Goal: reverse engineer the document to extract
    content and structures

[Diagram labels: Content extraction; Layout structure extraction; TIFF; XML (Content, Layout, Logic)]
9
A particular abstraction: Logical Structure of
Documents (step 1)
  • Newspaper → Date, Name, MasterArticle,
    Highlight, Article, Other, Filename
  • MasterArticle → Title, Subheading?, Summary,
    Author, Source?, Content?, Reference?, Other,
    JournalArticle
  • Article → Title, Subtitle?, Source?, Content,
    Author, Summary, Reference, Other?
  • JournalArticle → Title, Source?, Summary,
    Content?, Reference
  • Highlight → Title, Subtitle, Reference

10
Method for the detection of REs referring to
documents (step 2)
  • Referring expressions (REs)
      • specify the documents and document elements
        that the speaker is talking about
  • Patterns
      • used to construct a grammar for the detection of
        REs (25 rules with left and right contexts)

11
Results: RE detection
  • Grammar based on linguistic analysis
      • R = 0.65, P = 0.85, f = 0.74
  • Observations
      • tried: all pronouns refer to documents
          • R = 0.71, P = 0.52, f = 0.60 (worse)
      • tried: all indexicals refer to documents
          • R = 0.70, P = 0.46, f = 0.56 (worse)
      • however, two indexicals (celui-ci and
        celui-là) always refer to documents: rule added
  • Best score
      • R = 0.675, P = 0.875, f = 0.762
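
The f values on this slide appear consistent with the balanced F-measure, the harmonic mean of precision and recall: f = 2PR / (P + R). The slide does not state the formula, so that reading is an assumption; the sketch below recomputes f from each reported (R, P) pair, and the function name `f_measure` and the row labels are illustrative only.

```python
def f_measure(recall, precision):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, R, P, f as reported on the slide)
reported = [
    ("grammar", 0.65, 0.85, 0.74),
    ("all pronouns", 0.71, 0.52, 0.60),
    ("all indexicals", 0.70, 0.46, 0.56),
    ("best score", 0.675, 0.875, 0.762),
]

for label, r, p, f_slide in reported:
    print(f"{label}: computed f = {f_measure(r, p):.3f}, slide f = {f_slide}")
```

Under this assumption the recomputed values agree with the reported ones to within rounding, which also shows why treating every pronoun or indexical as an RE hurts: the small gain in recall is outweighed by the loss in precision.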

12
RE resolution: algorithm based on anaphora
tracking (step 3)
  • Definitions
      • Co-reference: two REs that refer to the same
        document entity
      • Anaphora: an anaphoric RE refers to an
        antecedent RE
  • Method: scans the meeting transcript linearly
    (regardless of channels/speakers)
      • for each RE, update <current document> and
        <current document element>
  • Document assignment
      • if <current RE> includes a newspaper name → refers
        to that newspaper
          • referent becomes <current document>
      • otherwise (anaphor) → refers to <current
        document>
  • Document element assignment
      • is <current RE> anaphoric? (list of typical
        anaphors)
          • yes → refers to <current document element>
          • no → match <current RE> to each candidate
            element
              • use words of the RE and its context vs. words
                of the document: weighted match
              • best matching article → referent of the RE →
                becomes <current document element>
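
The tracking loop described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Element` class, the `ANAPHORS` list (English stand-ins for the "list of typical anaphors"), and the unweighted word-overlap score are all hypothetical simplifications, and every document is assumed to have at least one candidate element.

```python
import re
from dataclasses import dataclass


@dataclass
class Element:
    elem_id: str
    title: str


# Hypothetical stand-in for the "list of typical anaphors".
ANAPHORS = {"it", "this article", "the article"}


def words(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def resolve(expressions, documents):
    """expressions: list of (RE text, context text), in transcript order.
    documents: dict mapping a newspaper name to its candidate elements.
    Returns one (document, element id) pair per RE."""
    current_doc = None
    current_elem = None
    resolved = []
    for text, context in expressions:
        lowered = text.lower()
        # Document assignment: a newspaper name in the RE switches documents;
        # otherwise the RE keeps referring to the current document.
        named = next((n for n in documents if n.lower() in lowered), None)
        if named is not None:
            current_doc = named
        # Element assignment: anaphors keep the current element;
        # other REs are matched against every candidate element.
        is_anaphor = lowered.strip() in ANAPHORS
        if not (is_anaphor and current_elem is not None) and current_doc is not None:
            query = words(text) | words(context)
            # Simplification: plain overlap with the title; ties go to the
            # first candidate. The slides describe a weighted match instead.
            current_elem = max(documents[current_doc],
                               key=lambda e: len(query & words(e.title)))
        resolved.append((current_doc,
                         current_elem.elem_id if current_elem else None))
    return resolved


docs = {"Le Monde": [Element("a1", "Justice : les surprises du procès Elf"),
                     Element("a2", "Economie : la croissance ralentit")]}
res = [("the first article of Le Monde", "on the procès Elf"),
       ("it", "is long"),
       ("the other article", "la croissance ralentit")]
print(resolve(res, docs))
```

Because each RE inherits state from the previous one, a single wrong or missed RE can propagate to the anaphors that follow it.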

13
Results: accuracy of RE resolution
  • Best results
      • RE → document
          • 97% (428 REs out of 437) or
          • 93% (meetings involving several newspapers)
      • RE → document element
          • 67% (303 REs out of 437)
  • Baseline scores (simple methods, for comparison)
      • RE → document
          • most frequent doc.: 82% (all meetings) or 50%
            (meetings involving several newspapers)
      • RE → document element
          • entire front page: 16%
          • main article (MasterArticle ID='1'): 18%

14
Score-based analysis of features
  • Right/left contexts
      • only the words after the RE (right context) are
        considered for matching
      • words before the RE (left context) are irrelevant
        (they do not increase the score)
      • without right context: 40% accuracy only
      • optimal number of words to look at: 10
  • Weights
      • match between RE and title of an article >
        match between the right context and title >>
        matches with content of the article
      • values: 15 > 10 >> 1
  • Relevance of anaphora tracking
      • when disabled, RE → document element
        identification reaches 60% accuracy only
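
A weighted match using the values reported on this slide (15, 10, 1, and a right context of 10 words) might look like the sketch below. How matches are actually counted is not given in the slides; counting distinct shared words is an assumption, and the names `match_score` and `words` are hypothetical.

```python
import re

W_RE_TITLE = 15           # RE word matches a title word
W_CONTEXT_TITLE = 10      # right-context word matches a title word
W_CONTENT = 1             # any query word matches the article content
RIGHT_CONTEXT_WORDS = 10  # reported optimal size of the right context


def words(text):
    """Lowercased word tokens, in order."""
    return re.findall(r"\w+", text.lower())


def match_score(re_text, right_context, title, content):
    """Score one candidate article for one referring expression.
    The left context is ignored, as the slide reports it does not help."""
    re_words = set(words(re_text))
    ctx_words = set(words(right_context)[:RIGHT_CONTEXT_WORDS])
    title_words = set(words(title))
    content_words = set(words(content))
    score = W_RE_TITLE * len(re_words & title_words)
    score += W_CONTEXT_TITLE * len(ctx_words & title_words)
    score += W_CONTENT * len((re_words | ctx_words) & content_words)
    return score


print(match_score("the article on Elf", "about the trial",
                  "Le procès Elf", "the trial began"))
```

With weights this far apart, one title match outweighs many content matches, which fits the ordering on the slide: titles are short and distinctive, whereas article bodies share many common words.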

15
Combined results: RE detection + recognition
(steps 2 + 3)
  • Results with best configurations
      • 60% document accuracy (265 REs out of 437)
      • 32% document element accuracy (141 REs out of
        437)
  • Accuracies of detection and recognition do not
    combine linearly
      • if they did: 73% and 50% accuracy, respectively
  • Tentative explanation
      • contextual algorithm: each RE resolution depends
        on the correct resolution of the previous one
      • => when too many REs are missed (R = 0.67),
        especially pronouns, the algorithm loses track of
        the current document element

16
Relation to other types of document/speech
alignment
  • Citation-based alignment (Mekhaldi, Lalanne,
    Ingold 2004)
      • lexicographic match between terms in documents
        and transcripts
  • Thematic alignment (Mekhaldi, Lalanne, Ingold
    2004)
      • semantic similarity between sections of documents
        (sentences, paragraphs, logical blocks, etc.) and
        units of the dialog structure (utterances, turns,
        and thematic episodes)
      • algorithm based on similarity metrics between
        bags of weighted words
  • Results for thematic alignment
      • matching spoken utterances with document logical
        blocks
          • cosine metric: recall 0.84, precision 0.77
      • matching speech turns with document logical
        blocks
          • cosine metric: recall 0.84, precision 0.85
      • alignment of spoken utterances to document
        sentences
          • Jaccard metric: recall 0.83, precision 0.76
  • Reference-based alignment is complementary to
    other methods
      • integration → increased robustness of
        document-to-speech alignment
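
The two similarity metrics named on this slide can be sketched in a few lines. The weighting scheme used in the thematic alignment work is not given here, so raw term counts stand in for the word weights; that choice, and the helper names `bag`, `cosine`, and `jaccard`, are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bag(text):
    """Bag of words with raw counts as weights (an assumed weighting)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two weighted bags of words."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Jaccard similarity between the word sets of two bags."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)


utterance = bag("Voilà, alors les surprises du procès Elf")
title = bag("Justice : les surprises du procès Elf")
print(cosine(utterance, title), jaccard(utterance, title))
```

An alignment step would then link each speech unit to the document unit with the highest similarity, typically above some threshold.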

17
Document
<block id="77">Justice : les surprises du procès Elf</block>
<block id="78">LA JUSTICE marque des points au procès Elf, dont la troisième semaine devait s'ouvrir, lundi 31 mars, </block>
<block id="79">Les deux premières semaines d'audience ont profondément déstabilisé les systèmes de défense des principaux prévenus : M. Le Floch-Prigent, mais aussi Alfred Sirven, l'ancien directeur des affaires générales, et André Tarallo, l'ex-« M. Afrique » d'Elf</block>
[Document structure levels: Physical; Logical (paragraphs); Syntactical (sentences); Thematic]

Speech
<block id="8" ST="10.64" ET="11.81">Didier euh.. un point sur la justice.</block>
<block id="9" ST="11.81" ET="12.7">Voilà, alors les surprises du procès Elf</block>
<block id="11" ST="14.7" ET="18.1">Euh.. ce procès juge une affaire de détournement de fonds, certainement l'une des plus spectaculaires de nos jours.</block>
<block id="12" ST="18.3" ET="22.8">Euh.. durant les deux premières ...</block>
[Speech structure levels: Thematic; Turns; Syntactical (utterances)]
18
Document Inquisitor
TQB
FriDoc
FaericWorld
19
Related publications
  • Lalanne D. and Ingold R. 2005. Structuring
    Multimedia Archives with Static Documents. ERCIM
    News No. 62, "Multimedia Informatics", July 2005.
  • Popescu-Belis A. and Lalanne D. 2004.
    Ref2doc: Reference Resolution over a Restricted
    Domain. In ACL 2004 Workshop on Reference
    Resolution and its Applications, Barcelona,
    p.71-78.
  • Popescu-Belis A. 2003. Evaluation-driven design
    of a robust reference resolution system. Natural
    Language Engineering, 9(3):281-306.
  • Lalanne D., Ingold R., von Rotz D., Behera A.,
    Mekhaldi D. and Popescu-Belis A. 2005. Using
    Static Documents as Structured and Thematic
    Interfaces to Multimedia Meeting Archives. In
    Bengio S. and Bourlard H., eds., Machine Learning
    for Multimodal Interaction, LNCS 3361,
    Springer-Verlag, Berlin/Heidelberg, p.87-100.
  • Hadjar K., Rigamonti M., Lalanne D. and
    Ingold R. 2004. Xed: a new tool for
    extracting hidden structures from electronic
    documents. In Workshop on Document Image Analysis
    for Libraries, Palo Alto, CA.
  • Mekhaldi D., Lalanne D. and Ingold R.
    2003. Thematic alignment of recorded speech with
    documents. In ACM DocEng 2003, Grenoble, France.
  • Lalanne D., Mekhaldi D. and Ingold R.
    2004. Talking about documents: revealing a
    missing link to multimedia meeting archives. In
    Document Recognition and Retrieval XI,
    IS&T/SPIE's Annual Symposium on Electronic
    Imaging, San Jose, CA.
  • Popescu-Belis A., Georgescul M., Clark A. and
    Armstrong S. 2004. Building and using a corpus of
    shallow dialogue annotated meetings. In LREC
    2004, Lisbon, p.1451-1454.
  • Popescu-Belis A., Clark A., Georgescul M.,
    Zufferey S. and Lalanne D. 2005. Shallow Dialogue
    Processing Using Machine Learning Algorithms (or
    not). In Bengio S. and Bourlard H., eds., Machine
    Learning for Multimodal Interaction, LNCS 3361,
    Springer-Verlag, Berlin/Heidelberg, p.277-290.

20
Evaluation metrics: RE detection
  • Compare the correct REs with those found
    automatically
      • precision and recall
  • Two difficulties
      • tolerance on the RE boundaries
          • given our goal of doc/speech alignment
          • acceptable if system_RE ? correct_RE
      • consider embedded REs
          • e.g. "the title of the first article"

21
RE resolution: algorithm based on anaphora
tracking (2)
  • In addition
      • some guesses for the first RE of a meeting
          • cannot be anaphoric
  • Parameters
      • weights of the various matches: 9 pairs
          • {RE_word, left_context_word, right_context_word}
            × {title_or_subtitle_word, author_word,
            contents_word}
      • size of the left and right context of the RE
          • number of preceding and following utterances
22
Alignments Grouping
[Diagram labels: Levels grouping; Speech; Thematic; Final framework; Merging; Quotations; References; Document; Document/speech alignment]