Models and Tools for Collaborative Annotation


1
Models and Tools for Collaborative Annotation
  • Xiaoyi Ma, Haejoong Lee,
  • Steven Bird and Kazuaki Maeda
  • Linguistic Data Consortium
  • University of Pennsylvania

2
Outline
  • Background: AG, AGTK
  • Collaborative annotation with AGTK
  • TableTrans with column locking: a real-world
    example of collaborative annotation with AGTK
  • Efficient AG Query
  • Conclusion

3
Background: Annotation Graph
  • Annotation Graphs (AG) provide a comprehensive
    formal framework for constructing, maintaining
    and searching linguistic annotations
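
As a rough illustration of the idea, an annotation graph can be pictured as a set of typed, labeled arcs between nodes that may carry time offsets. The sketch below is a minimal stand-in written for this transcript; the class and field names (Node, Arc, AnnotationGraph, by_type) are hypothetical and are not the AGTK API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    """An anchor point in the signal; the time offset may be unknown."""
    node_id: str
    offset: Optional[float] = None  # seconds into the recording


@dataclass
class Arc:
    """One annotation: a typed span between two nodes with feature values."""
    start: Node
    end: Node
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    arcs: list = field(default_factory=list)

    def add(self, start, end, type, **features):
        """Create an arc, store it, and return it."""
        arc = Arc(start, end, type, dict(features))
        self.arcs.append(arc)
        return arc

    def by_type(self, type):
        """Return all arcs of one annotation type, in insertion order."""
        return [a for a in self.arcs if a.type == type]


# Toy data: one word arc spanning two phone arcs.
n0, n1, n2 = Node("n0", 0.00), Node("n1", 0.12), Node("n2", 0.45)
ag = AnnotationGraph()
ag.add(n0, n2, "word", label="she")
ag.add(n0, n1, "phone", label="sh")
ag.add(n1, n2, "phone", label="iy")
phones = [a.features["label"] for a in ag.by_type("phone")]
```

Word and phone arcs share nodes here, which is what lets a query relate one layer of annotation to another.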

4
Background: AGTK
  • AGTK provides software infrastructure allowing
    developers to quickly create special-purpose
    annotation tools using common components
  • AGTK consists of three parts:
  • The annotation graph library
  • The I/O library
  • Wrappers providing interfaces for scripting
    languages

5
Collaborative Annotation
  • Multiple annotators/sites involved in a single
    annotation project
  • Control access to different regions and types of
    annotation
  • Log modifications
  • Track the quality checks that have been made
  • Multi-pass annotation
  • Different people work on different passes
  • Only one person edits an annotation file at any
    given time
  • Version control software and database servers
  • Difficult to incorporate their functionality into
    existing annotation tools, and difficult for end users

6
Collaborative Annotation with AGTK
  • Exploiting the annotations
  • store management information with annotations
  • Exploiting the database
  • server manages collaboration
  • Exploiting the query language
  • precompute Kleene Star to solve arc tracing
    problem

7
1. Exploiting the Annotations
Comments: dispute settled by a third party
Word: Philadelphia
Last modified by: xma
Quality control: 3
Complete date: 2002-05-14-1510
8
1. Exploiting the Annotations
  • Management information
  • Accessed by the same API for annotation label
    data
  • Can reside only on server, with option allowing
    only relevant fields to be exported
  • Collaborative parties can agree on additional
    fields for managing their joint work
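
A minimal sketch of this idea, assuming annotations are held as plain feature dictionaries: management fields sit beside the label data, are read through the same accessor, and can be filtered on export. The field keys reuse the example values from the previous slide; the key spellings and the function names (get_feature, export) are hypothetical, not AGTK calls.

```python
# Fields that exist to manage the project rather than to describe the signal.
MANAGEMENT_FIELDS = {
    "comments", "last_modified_by", "quality_control", "complete_date",
}

annotation = {
    "word": "Philadelphia",
    "comments": "dispute settled by a third party",
    "last_modified_by": "xma",
    "quality_control": "3",
    "complete_date": "2002-05-14-1510",
}


def get_feature(ann, name):
    """One accessor serves label data and management data alike."""
    return ann.get(name)


def export(ann, keep=()):
    """Copy an annotation, dropping management fields not listed in keep."""
    return {
        name: value
        for name, value in ann.items()
        if name not in MANAGEMENT_FIELDS or name in keep
    }


exported = export(annotation, keep=("quality_control",))
```

Collaborating parties that agree on an additional field would only need to add its name to the management set.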

9
2. Exploiting the Database
  • Users/groups may be granted different levels of
    access to the server
  • Users may be assigned to different groups
  • Updates can occur at various levels of
    granularity
  • Annotations can be queried in SQL or in a
    customized query language
  • Queries can cross corpora
  • Question: How to store annotation data in a
    relational database?
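
One possible answer to the storage question can be sketched as three tables: nodes, arcs, and per-arc features. This layout is an assumption made for illustration and is not the AGTK schema; it uses Python's built-in sqlite3 module so that it runs standalone, whereas the experiment later in the talk used PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id     TEXT PRIMARY KEY,
    time_offset REAL
);
CREATE TABLE arc (
    arc_id   INTEGER PRIMARY KEY,
    corpus   TEXT NOT NULL,
    type     TEXT NOT NULL,
    start_id TEXT NOT NULL REFERENCES node(node_id),
    end_id   TEXT NOT NULL REFERENCES node(node_id)
);
CREATE TABLE feature (
    arc_id INTEGER NOT NULL REFERENCES arc(arc_id),
    name   TEXT NOT NULL,
    value  TEXT,
    PRIMARY KEY (arc_id, name)
);
""")

# Store one word annotation with a label and a management field.
conn.executemany("INSERT INTO node VALUES (?, ?)", [("n0", 0.0), ("n1", 0.5)])
conn.execute("INSERT INTO arc VALUES (1, 'demo', 'word', 'n0', 'n1')")
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, ?)",
    [(1, "label", "Philadelphia"), (1, "last_modified_by", "xma")],
)

# A fine-grained update: touch a single field of a single annotation.
conn.execute(
    "INSERT OR REPLACE INTO feature VALUES (?, ?, ?)",
    (1, "quality_control", "3"),
)

rows = conn.execute("""
    SELECT a.type, f.name, f.value
    FROM arc a JOIN feature f ON f.arc_id = a.arc_id
    WHERE a.corpus = 'demo'
    ORDER BY f.name
""").fetchall()
```

Keeping features in their own table is what makes single-field updates possible, and the corpus column lets one query span several corpora.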

10
AG Object Model
11
AG Database API
Need to go further than simple load/store
12
3. Exploiting the Query Language
  • Flexible management of annotation data requires a
    query language
  • Queries can operate across corpora
  • Problem: AG queries with path expressions can't
    be translated to SQL

13
Collaborative Annotation with TableTrans: Context
  • Research by Robert Seyfarth et al. at UPENN
  • Social behavior and vocal communication in
    nonhuman primates
  • Enter into spreadsheet fields
  • Recording offsets
  • Tape number
  • Date
  • Time
  • Code observation for regions of interest of a
    signal
  • Location
  • Animal id
  • Group id
  • Call type
  • Signal quality

14
The Traditional Process
Field recordings
First pass by annotator
Further annotation by specialist
15
Annotating with TableTrans
16
The Process We Need to Support
[Figure: process diagram; two labels read SQL]
17
TableTrans with Column Locking
18
Efficient AG Queries
  • (AG SQL) is unsuitable for annotation graphs
  • Solution
  • (AG SQL Precomputed K)
  • Experiment
  • Result

19
(AG SQL) Insufficient for AG Queries
  • Too verbose to express AG queries
  • Arc tracing problem
  • Efficiency

20
2. AG SQL Precomputed K
  • Precomputing K, the transitive closure of
    annotations
  • Problem: combinatorial explosion
  • Solution: restrict K to types
  • Intuition: whenever we do K, we know in advance
    what the annotation types are
21
3. Experiment with TIMIT
  • Part of TIMIT corpus used
  • PostgreSQL used for relational database
  • Hardware: PII 500 MHz running Linux

22
Example Query
  • Query: find word arcs whose phonetic
    transcription starts with hv and contains dcl
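
To make the query concrete, here is one way it might be written against a precomputed K table. This is a sketch under assumptions, not the SQL translation shown in the talk: the table layout (arc, k), the integer node ids, and the two-word toy data are invented for illustration, and sqlite3 stands in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE arc (arc_id INTEGER PRIMARY KEY, type TEXT, label TEXT,
                  start_id INTEGER, end_id INTEGER);
CREATE TABLE k (start_id INTEGER, end_id INTEGER,
                PRIMARY KEY (start_id, end_id));
""")
toy_arcs = [
    ("word", "she", 0, 2), ("word", "had", 2, 5),
    ("phone", "sh", 0, 1), ("phone", "iy", 1, 2),
    ("phone", "hv", 2, 3), ("phone", "ae", 3, 4), ("phone", "dcl", 4, 5),
]
conn.executemany(
    "INSERT INTO arc (type, label, start_id, end_id) VALUES (?, ?, ?, ?)",
    toy_arcs,
)

# Precompute K for phone arcs: seed with single arcs, then join K with
# itself until no new node pair appears.
conn.execute(
    "INSERT INTO k SELECT DISTINCT start_id, end_id FROM arc WHERE type = 'phone'"
)
size = 0
while True:
    conn.execute("""
        INSERT OR IGNORE INTO k
        SELECT a.start_id, b.end_id FROM k a JOIN k b ON a.end_id = b.start_id
    """)
    new_size = conn.execute("SELECT COUNT(*) FROM k").fetchone()[0]
    if new_size == size:
        break
    size = new_size

# Word arcs whose phone path starts with hv and contains dcl. Each
# "reachable by some chain of phones" test is a single lookup in k.
query = """
SELECT DISTINCT w.label
FROM arc w
JOIN arc p1 ON p1.type = 'phone' AND p1.label = 'hv'
           AND p1.start_id = w.start_id
JOIN arc p2 ON p2.type = 'phone' AND p2.label = 'dcl'
WHERE w.type = 'word'
  AND (p1.end_id = w.end_id OR EXISTS (
        SELECT 1 FROM k WHERE k.start_id = p1.end_id AND k.end_id = w.end_id))
  AND (p2.start_id = w.start_id OR EXISTS (
        SELECT 1 FROM k WHERE k.start_id = w.start_id AND k.end_id = p2.start_id))
  AND (p2.end_id = w.end_id OR EXISTS (
        SELECT 1 FROM k WHERE k.start_id = p2.end_id AND k.end_id = w.end_id))
"""
matches = [row[0] for row in conn.execute(query)]
```

Without the k table, each of the three reachability tests would need an unbounded chain of self-joins on arc, which is the arc tracing problem named earlier.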

23
SQL Translation
24
4. Results with Precomputed K
  • AG Query can be efficiently executed in SQL

25
The Process We Can Support
[Figure: process diagram; two labels read SQL]
26
Collaborative Annotation with AGTK
  • Exploiting the annotations
  • Store management information with annotations
  • Exploiting the database
  • Annotation graphs can be stored in a relational
    database and accessed remotely
  • AGTK makes it easy for developers to create
    annotation tools that store all their data in a
    server
  • Exploiting the query language
  • AG queries can be efficiently executed by using
    a precomputed K table

27
Conclusion
  • AGTK supports Collaborative Annotation
  • Open Source
  • C/Tcl/Python/Java
  • Documentation, source and binary distributions
  • Download from agtk.sf.net
  • Credits
  • NSF Grants 9978056 and 9980009