Title: David Farwell, Stephen Helmreich
Columbia, CRL/NMSU, ISI/USC, LTI/CMU, MITRE, UMIACS/UMD
- David Farwell, Stephen Helmreich (Computing Research Laboratory/New Mexico State University)
- Lori Levin, Teruko Mitamura (Language Technologies Institute/Carnegie Mellon University)
- Bonnie Dorr, Rebecca Green (Institute for Advanced Computer Studies/University of Maryland)
- Eduard Hovy (Information Sciences Institute/University of Southern California)
- Keith Miller, Florence Reeder (MITRE Corporation)
- Owen Rambow, Nizar Habash (Columbia University)
- What we annotate
- multiple comparable bilingual text corpora
- parallel text corpora
- multiple translations of texts
- Genre - newspaper texts / DARPA corpus
- Goals
- common representation (interlingua)
- common methodology and tools
- observe and catalogue different surface realizations of the same meaning across and within languages
- Annotation Process
- Text is syntactically parsed (Connexor / IL0)
- Reviewed and corrected (TrEd)
- Annotation to IL1 (Tiamat)
- Content words annotated for sense (Omega)
- Arguments annotated for thematic role (LCS)
- 2 English translations of 6 articles
- Arabic, French, Hindi, Japanese, Korean, Spanish
- 12 annotators, 2 at each site
- Total 144 annotated texts to IL1 level
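The IL1 step above attaches a sense to each content word and a thematic role to each argument of a parsed tree. A minimal sketch of what such an annotated node could look like follows. The class name, field names, and the sense and role labels are hypothetical illustrations, not the Tiamat, Omega, or LCS formats, which these slides do not show.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IL1Node:
    """One node of a dependency tree after IL1 annotation (hypothetical schema)."""
    word: str
    sense: Optional[str] = None       # concept chosen for a content word (made-up labels)
    theta_role: Optional[str] = None  # thematic role relative to the parent (made-up labels)
    children: List["IL1Node"] = field(default_factory=list)


def unannotated_content_words(node, content_words):
    """Collect content words in the tree that still lack a sense label."""
    missing = []
    if node.word in content_words and node.sense is None:
        missing.append(node.word)
    for child in node.children:
        missing.extend(unannotated_content_words(child, content_words))
    return missing


# Toy tree for "company bought shares"; the labels are illustrative only.
tree = IL1Node("bought", sense="BUY", children=[
    IL1Node("company", sense="CORPORATION", theta_role="AGENT"),
    IL1Node("shares", theta_role="THEME"),  # sense not yet assigned
])

missing = unannotated_content_words(tree, {"bought", "company", "shares"})
print(missing)
```

A check like this is one way an annotation tool might flag incomplete trees before a text is counted as annotated to the IL1 level.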
- Results (agreement, time)
- Tools (Tiamat)
- Manuals (IL0 for 7 languages, IL1)
- Inter-annotator agreement: kappa .83 (mK), .66 (wn), .59 (theta-roles)
- Annotation time: 4 hours/annotator/text, 250 words/text, 2 annotators/text; approx. 2 person-years for 100K at IL1
- Next step: merge IL1 representations and develop transformation algorithms to produce IL2
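The agreement and effort figures above can be illustrated with a short sketch. Two assumptions are made here that the slides do not state: the kappa shown is the two-annotator (Cohen) variant, which may differ from the statistic actually used across 12 annotators, and "100K" is read as 100,000 words with a person-year taken as roughly 1,600 working hours.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Effort estimate from the figures on the slide.
HOURS_PER_ANNOTATOR_PER_TEXT = 4
WORDS_PER_TEXT = 250
ANNOTATORS_PER_TEXT = 2
CORPUS_WORDS = 100_000          # assumption: "100K" means words
HOURS_PER_PERSON_YEAR = 1600    # assumption: not given in the slides

texts = CORPUS_WORDS // WORDS_PER_TEXT
person_hours = texts * HOURS_PER_ANNOTATOR_PER_TEXT * ANNOTATORS_PER_TEXT
person_years = person_hours / HOURS_PER_PERSON_YEAR

print(texts, person_hours, person_years)
```

Under these assumptions, 400 texts at 8 person-hours each come to 3,200 person-hours, which is consistent with the "approx. 2 person-years" quoted above. With a different hours-per-year figure the estimate shifts accordingly.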