Title: Models and Tools for Collaborative Annotation
1 Models and Tools for Collaborative Annotation
- Xiaoyi Ma, Haejoong Lee, Steven Bird and Kazuaki Maeda
- Linguistic Data Consortium
- University of Pennsylvania
2 Outline
- Background: AG, AGTK
- Collaborative annotation with AGTK
- TableTrans with column locking: a real-world example of collaborative annotation with AGTK
- Efficient AG Query
- Conclusion
3 Background: Annotation Graph
- Annotation Graphs (AG) provide a comprehensive
formal framework for constructing, maintaining
and searching linguistic annotations
4 Background: AGTK
- AGTK provides software infrastructure allowing developers to quickly create special-purpose annotation tools using common components
- AGTK consists of three parts
  - The annotation graph library
  - The I/O library
  - Wrappers providing interfaces for scripting languages
5 Collaborative Annotation
- Multiple annotators/sites involved in a single annotation project
- Control access to different regions and types of annotation
- Log modifications
- Track the quality checks that have been made
- Multi-pass annotation
  - Different people work on different passes
  - Only one person edits an annotation file at any given time
- Version control software and database servers
  - Difficult to incorporate their functionality into existing annotation tools and for end users
6 Collaborative Annotation with AGTK
- Exploiting the annotations
  - Store management information with annotations
- Exploiting the database
  - Server manages collaboration
- Exploiting the query language
  - Precompute Kleene star to solve the arc tracing problem
7 1. Exploiting the Annotations
Comments: dispute settled by a third party
Word: Philadelphia
Last modified by: xma
Quality control: 3
Complete date: 2002-05-14-1510
8 1. Exploiting the Annotations
- Management information
  - Accessed through the same API used for annotation label data
  - Can reside only on the server, with an option allowing only relevant fields to be exported
  - Collaborative parties can agree on additional fields for managing their joint work
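The points above can be sketched as follows: management information is stored as ordinary features next to the label data, read and written through the same calls, and filtered out on export. The `Annotation` class, `export_features`, and the field names in `MANAGEMENT_FIELDS` are hypothetical and do not reproduce the AGTK API; the values are the ones from the example record in this deck.

```python
# Hypothetical names for the management fields a project has agreed on.
MANAGEMENT_FIELDS = {
    "comments", "last_modified_by", "quality_control", "complete_date",
}


class Annotation:
    """Illustrative annotation with a flat feature map (not the AGTK API)."""

    def __init__(self, ann_type):
        self.type = ann_type
        self.features = {}

    def set_feature(self, name, value):
        self.features[name] = value

    def get_feature(self, name):
        return self.features.get(name)


def export_features(annotation, include_management=False):
    """Return the features to export; management fields are dropped by default."""
    return {
        name: value
        for name, value in annotation.features.items()
        if include_management or name not in MANAGEMENT_FIELDS
    }


ann = Annotation("word")
# Label data and management data go through the same set_feature call.
ann.set_feature("word", "Philadelphia")
ann.set_feature("comments", "dispute settled by a third party")
ann.set_feature("last_modified_by", "xma")
ann.set_feature("quality_control", "3")
ann.set_feature("complete_date", "2002-05-14-1510")

public = export_features(ann)                          # label data only
full = export_features(ann, include_management=True)   # server-side view
```

Adding a new management field agreed on by the collaborating parties only requires extending `MANAGEMENT_FIELDS`; no new accessor is needed.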
9 2. Exploiting the Database
- Users/groups may be granted different levels of access to the server
- Users may be assigned to different groups
- Updates can occur at various levels of granularity
- Annotations can be queried in SQL or in a customized query language
- Queries can cross corpora
- Question: How to store annotation data in a relational database?
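One possible answer to the storage question, given only as a sketch: one table for nodes, one for arcs, and one for feature-value pairs. The table and column names below are assumptions and are not the schema used by AGTK. The experiment later in this deck uses PostgreSQL; Python's built-in `sqlite3` is used here only so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical three-table layout: nodes, typed arcs, and arc features.
conn.executescript("""
CREATE TABLE node (
    node_id     TEXT PRIMARY KEY,
    ag_id       TEXT NOT NULL,   -- which annotation graph the node belongs to
    time_offset REAL             -- NULL when the node is not anchored in time
);
CREATE TABLE arc (
    arc_id     TEXT PRIMARY KEY,
    ag_id      TEXT NOT NULL,
    type       TEXT NOT NULL,
    start_node TEXT NOT NULL REFERENCES node(node_id),
    end_node   TEXT NOT NULL REFERENCES node(node_id)
);
CREATE TABLE feature (
    arc_id TEXT NOT NULL REFERENCES arc(arc_id),
    name   TEXT NOT NULL,
    value  TEXT,
    PRIMARY KEY (arc_id, name)
);
""")

conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [("n0", "ag1", 0.0), ("n1", "ag1", 0.5)],
)
conn.execute("INSERT INTO arc VALUES ('a1', 'ag1', 'word', 'n0', 'n1')")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?)",
    [("a1", "word", "Philadelphia"), ("a1", "last_modified_by", "xma")],
)

# Label data and management data live in the same feature table,
# so one query can combine them: words last modified by a given user.
rows = conn.execute("""
    SELECT a.arc_id, w.value
    FROM arc a
    JOIN feature w ON w.arc_id = a.arc_id AND w.name = 'word'
    JOIN feature m ON m.arc_id = a.arc_id AND m.name = 'last_modified_by'
    WHERE a.type = 'word' AND m.value = 'xma'
""").fetchall()
```

Keeping the graph identifier (`ag_id`) as a plain column rather than a separate database per corpus is what lets a single query range over several corpora.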
10 AG Object Model
11 AG Database API
Need to go further than simple load/store
12 3. Exploiting the Query Language
- Flexible management of annotation data requires a query language
- Queries can operate across corpora
- Problem: AG queries with path expressions cannot be translated to SQL
13 Collaborative Annotation with TableTrans: Context
- Research by Robert Seyfarth et al. at UPENN
- Social behavior and vocal communication in nonhuman primates
- Enter into spreadsheet fields
  - Recording offsets
  - Tape number
  - Date
  - Time
- Code observations for regions of interest of a signal
  - Location
  - Animal id
  - Group id
  - Call type
  - Signal quality
14 The Traditional Process
Field recordings
First pass by annotator
Further annotation by specialist
15 Annotating with TableTrans
16 The Process We Need to Support
[Figure: process diagram; labels: SQL, SQL]
17 TableTrans with Column Locking
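The deck does not describe how TableTrans implements column locking, so the following is only an illustrative in-memory model of the idea: each column of the annotation table can be locked by one user, and a cell can be edited only by the user holding the lock on its column. The class `ColumnLockedTable`, its methods, the user names, and the cell values are all hypothetical.

```python
class ColumnLockedTable:
    """Illustrative model of per-column locks (not the TableTrans code)."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []
        self.locks = {}  # column name -> user currently holding the lock

    def add_row(self):
        self.rows.append({column: "" for column in self.columns})
        return len(self.rows) - 1

    def lock(self, column, user):
        holder = self.locks.get(column)
        if holder is not None and holder != user:
            raise PermissionError(f"{column!r} is locked by {holder}")
        self.locks[column] = user

    def unlock(self, column, user):
        # Only the current holder can release a lock.
        if self.locks.get(column) == user:
            del self.locks[column]

    def set_cell(self, row, column, value, user):
        if self.locks.get(column) != user:
            raise PermissionError(
                f"{user} does not hold the lock on {column!r}")
        self.rows[row][column] = value


table = ColumnLockedTable(["animal_id", "call_type", "signal_quality"])
row = table.add_row()

# First-pass annotator and specialist work on different columns of one row.
table.lock("animal_id", "annotator")
table.lock("call_type", "specialist")
table.set_cell(row, "animal_id", "A12", "annotator")
table.set_cell(row, "call_type", "grunt", "specialist")

try:
    table.set_cell(row, "call_type", "bark", "annotator")
except PermissionError as err:
    print("rejected:", err)
```

In this model two people can work on the same rows at the same time as long as they hold locks on different columns, which is one way to relax the one-editor-per-file restriction of the traditional process.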
18 Efficient AG Queries
- (AG SQL) is unsuitable for annotation graphs
- Solution
- (AG SQL Precomputed K)
- Experiment
- Result
19 (AG SQL) Insufficient for AG Queries
- Too verbose to express AG queries
- Arc tracing problem
- Efficiency
20 2. AG SQL Precomputed K
- Pre-computing K, transitive closure of annotations
- Problem: combinatorial explosion
- Solution: restrict K to types
- Intuition: whenever we do K, we know in advance what the annotation types are
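A minimal sketch of the type-restricted precomputation, under the assumption that K records which node pairs are connected by a chain of consecutive arcs of one annotation type. The function name `precompute_k` and the tuple layout of the arcs are hypothetical. The sketch stores chains of one or more arcs; the zero-length case of a Kleene star (a node reaching itself) is left to the query.

```python
from collections import defaultdict


def precompute_k(arcs, arc_type):
    """Return all (start_node, end_node) pairs linked by a chain of one or
    more consecutive arcs of the given type.

    `arcs` is an iterable of (start_node, end_node, type) tuples.
    """
    successors = defaultdict(set)
    for start, end, typ in arcs:
        if typ == arc_type:
            successors[start].add(end)

    closure = set()
    for origin in list(successors):
        # Depth-first walk from each node that starts an arc of this type.
        stack = list(successors[origin])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((origin, node))
            stack.extend(successors.get(node, ()))
    return closure


arcs = [
    ("n0", "n3", "word"),
    ("n0", "n1", "phn"),
    ("n1", "n2", "phn"),
    ("n2", "n3", "phn"),
]

k_phn = precompute_k(arcs, "phn")    # 6 pairs for a chain of 3 phn arcs
k_word = precompute_k(arcs, "word")  # 1 pair: the word arc itself
```

A chain of n arcs contributes n(n+1)/2 pairs, so the table grows quadratically along each chain. Computing it separately per type, and only for the types that queries actually trace, avoids mixing chains of different types into one much larger closure.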
21 3. Experiment with TIMIT
- Part of the TIMIT corpus used
- PostgreSQL used as the relational database
- Hardware: PII 500 MHz running Linux
22 Example Query
- Query: find word arcs whose phonetic transcription starts with an "hv" and contains a "dcl"
23 SQL Translation
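The translation shown on the original slide is not preserved in this transcript. As a stand-in, here is a hedged sketch of how the example query could be written in SQL against a precomputed K table for phonetic arcs. The schema (`arc`, `k_phn`), the sample data, and the use of SQLite in place of the PostgreSQL setup from the experiment are all assumptions made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE arc (
    arc_id TEXT PRIMARY KEY, type TEXT, label TEXT,
    start_node TEXT, end_node TEXT
);
CREATE TABLE k_phn (start_node TEXT, end_node TEXT);
""")

# Two words, "she" and "had", each spanning a chain of phonetic arcs.
conn.executemany("INSERT INTO arc VALUES (?, ?, ?, ?, ?)", [
    ("w1", "word", "she", "n0", "n2"),
    ("w2", "word", "had", "n2", "n5"),
    ("p1", "phn", "sh",  "n0", "n1"),
    ("p2", "phn", "iy",  "n1", "n2"),
    ("p3", "phn", "hv",  "n2", "n3"),
    ("p4", "phn", "ae",  "n3", "n4"),
    ("p5", "phn", "dcl", "n4", "n5"),
])

# Precompute K for phn arcs only: node pairs linked by one or more phn arcs.
conn.execute("""
WITH RECURSIVE reach(start_node, end_node) AS (
    SELECT start_node, end_node FROM arc WHERE type = 'phn'
    UNION
    SELECT r.start_node, a.end_node
    FROM reach r JOIN arc a ON a.start_node = r.end_node
    WHERE a.type = 'phn'
)
INSERT INTO k_phn SELECT start_node, end_node FROM reach
""")

# Word arcs whose phonetic transcription starts with "hv" and contains "dcl".
# Path tracing is replaced by lookups in k_phn, so the query has a fixed shape.
rows = conn.execute("""
SELECT DISTINCT w.arc_id, w.label
FROM arc w
JOIN arc first ON first.type = 'phn' AND first.label = 'hv'
              AND first.start_node = w.start_node
JOIN arc later ON later.type = 'phn' AND later.label = 'dcl'
WHERE w.type = 'word'
  AND (later.start_node = first.end_node
       OR EXISTS (SELECT 1 FROM k_phn k
                  WHERE k.start_node = first.end_node
                    AND k.end_node = later.start_node))
  AND (later.end_node = w.end_node
       OR EXISTS (SELECT 1 FROM k_phn k
                  WHERE k.start_node = later.end_node
                    AND k.end_node = w.end_node))
""").fetchall()
```

The recursive step runs once, when K is built; the query itself uses only joins and `EXISTS` lookups, which a relational engine can index on `(start_node, end_node)`.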
24 4. Results with Precomputed K
- AG queries can be efficiently executed in SQL
25 The Process We Can Support
[Figure: process diagram; labels: SQL, SQL]
26 Collaborative Annotation with AGTK
- Exploiting the annotations
  - Store management information with annotations
- Exploiting the database
  - Annotation graphs can be stored in a relational database and accessed remotely
  - AGTK makes it easy for developers to create annotation tools that store all their data on a server
- Exploiting the query language
  - AG queries can be efficiently executed by using a precomputed K table
27 Conclusion
- AGTK supports Collaborative Annotation
- Open Source
- C/Tcl/Python/Java
- Documentation, source and binary distributions
- Download from agtk.sf.net
- Credits
- NSF Grants 9978056 and 9980009