Automatic Functor Assignment AFA in the Prague Dependency Treebank

Provided by: zdeneka


1
Automatic Functor Assignment (AFA) in the
Prague Dependency Treebank
  • PDT
    • a long-term research project
    • at the Institute of Formal and Applied Linguistics
    • aimed at the complex annotation of a part of the Czech National Corpus
    • annotation scheme with 3 levels

Figure: AFA's position within the PDT (raw text → morphologically tagged text → analytic tree structures (ATS) → tectogrammatical tree structures (TGTS))
2
Problem analysis, Data preprocessing
  • Motivation
    • to reduce the huge amount of human work involved in the development of the PDT
  • Problem statement
    • to assign a functor to every node in a TGTS
  • Initial situation
    • no AFA system with reasonable coverage existed
    • human annotators rely mostly on their knowledge of the language, not on formal rules
    • annotators take the context of the whole sentence into account
    • a certain amount of manually annotated TGTSs is available
  • What is the minimal amount of information sufficient to decide on the functor?
  • Problem reformulation
    • AFA → classification of symbolic vectors into 53 classes
    • available material: 18 files (up to 50 sentences each)

Figure: feature selection and feature extraction yield vectors with 12 symbolic attributes
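A minimal sketch of the feature-extraction step, assuming a heavily simplified tree node. The slides do not list the 12 attributes, so the `Node` class, the `extract_features` function, and the six attributes used here (lemma, part of speech, analytic function, preposition, and the governor's lemma and part of speech) are hypothetical illustrations, not the original attribute set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical, heavily simplified node of a tectogrammatical tree."""
    lemma: str
    pos: str                         # part of speech from the morphological tag
    afun: str                        # analytic function taken over from the ATS
    prep: str = ""                   # governing preposition, if any
    parent: Optional["Node"] = None  # governing node; None for the root


def extract_features(node: Node) -> Tuple[str, ...]:
    """Map a node to a fixed-length vector of symbolic attributes.

    Illustrative attribute set: 6 attributes, not the 12 of the original system.
    """
    gov = node.parent
    return (
        node.lemma,
        node.pos,
        node.afun,
        node.prep,
        gov.lemma if gov is not None else "<root>",  # lemma of the governor
        gov.pos if gov is not None else "<root>",    # part of speech of the governor
    )


# Example: "jet do Prahy" (to go to Prague)
verb = Node("jet", "V", "Pred")
noun = Node("Praha", "N", "Adv", prep="do", parent=verb)
print(extract_features(noun))
```

Once every node is reduced to such a tuple, AFA becomes an ordinary classification problem: each vector is mapped to one of the 53 functor classes.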

3
Components of the proposed AFA system
  • Symbiosis of 4 different approaches
    • 7 Rule-based Methods (RBMs)
    • 3 Dictionary-based Methods (DBMs)
    • Nearest vector (similarity)
    • Machine learning (Quinlan's C4.5, Sašo Džeroski)
  • Implementation
    • a set of small programs for preprocessing and format conversion, dictionary mining, functor assignment, and performance evaluation
    • Linux filters, Perl, SQL
    • assigners are applied in a strictly pipelined fashion
    • Data Flow Diagram
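The pipelined combination of assigners can be sketched as follows. This is a minimal illustration, not the original Perl/SQL implementation: the rule, the dictionary key `(preposition, lemma)`, the attribute-overlap similarity, and all function names are hypothetical, and 6-attribute toy vectors stand in for the real 12-attribute ones. It also assumes that "strictly pipelined" means each assigner only labels the nodes its predecessors left unassigned.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Tuple[str, ...]
# An assigner returns a functor, or None if it declines to decide.
Assigner = Callable[[Vector], Optional[str]]


def subject_rule(vec: Vector) -> Optional[str]:
    # Hypothetical rule-based method: a subject governed by a verb becomes ACT
    # (a heuristic that fails, e.g., in passive clauses).
    _lemma, _pos, afun, _prep, _gov_lemma, gov_pos = vec
    return "ACT" if afun == "Sb" and gov_pos == "V" else None


def dictionary_assigner(table: Dict[Tuple[str, str], str]) -> Assigner:
    # Hypothetical dictionary-based method: look up (preposition, lemma)
    # in a table mined from manually annotated trees.
    def assign(vec: Vector) -> Optional[str]:
        return table.get((vec[3], vec[0]))
    return assign


def nearest_vector_assigner(train: Sequence[Tuple[Vector, str]]) -> Assigner:
    # Nearest vector: copy the functor of the training vector that shares the
    # most attribute values with the input (ties go to the earliest one).
    def assign(vec: Vector) -> Optional[str]:
        if not train:
            return None
        _, functor = max(train, key=lambda pair: sum(a == b for a, b in zip(pair[0], vec)))
        return functor
    return assign


def run_pipeline(vectors: Sequence[Vector], assigners: Sequence[Assigner]) -> List[Optional[str]]:
    # Each assigner only sees the nodes left unassigned by its predecessors.
    functors: List[Optional[str]] = [None] * len(vectors)
    for assigner in assigners:
        for i, vec in enumerate(vectors):
            if functors[i] is None:
                functors[i] = assigner(vec)
    return functors


train = [(("Praha", "N", "Adv", "do", "jet", "V"), "DIR3"),
         (("pátek", "N", "Adv", "do", "čekat", "V"), "TTILL")]
pipeline = [subject_rule,
            dictionary_assigner({("v", "Praha"): "LOC"}),
            nearest_vector_assigner(train)]
vectors = [("Petr", "N", "Sb", "", "přijít", "V"),
           ("Praha", "N", "Adv", "v", "bydlet", "V"),
           ("Brno", "N", "Adv", "do", "jet", "V")]
print(run_pipeline(vectors, pipeline))
```

Ordering matters in this design: cautious, high-precision assigners (rules, dictionaries) go first, and a method that always answers, such as the nearest vector, is placed last to fill whatever remains.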

4
Performance evaluation
  • Detailed evaluation of several quantities for each assigner in a sequence
  • Several sequences of assigners were tested
    • e.g., a sequence of RBMs
  • Comparison of different sequences of assigners
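The slides do not name the evaluated quantities. The sketch below assumes three plausible ones for each assigner in a sequence: the number of nodes it newly assigned, its precision on those nodes, and the cumulative coverage reached so far. The function `evaluate_sequence`, the two toy assigners, and the 3-attribute vectors are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Tuple[str, ...]
Assigner = Callable[[Vector], Optional[str]]


def evaluate_sequence(vectors: Sequence[Vector], gold: Sequence[str],
                      assigners: Sequence[Assigner]) -> List[Dict[str, object]]:
    """For each assigner in the pipeline, report how many nodes it assigned,
    its precision on those nodes, and the cumulative coverage so far."""
    assigned: List[Optional[str]] = [None] * len(vectors)
    report: List[Dict[str, object]] = []
    for assigner in assigners:
        newly, correct = 0, 0
        for i, vec in enumerate(vectors):
            if assigned[i] is not None:
                continue  # already decided by an earlier assigner
            functor = assigner(vec)
            if functor is not None:
                assigned[i] = functor
                newly += 1
                correct += int(functor == gold[i])
        covered = sum(f is not None for f in assigned)
        report.append({
            "assigner": getattr(assigner, "__name__", repr(assigner)),
            "assigned": newly,
            "precision": correct / newly if newly else None,
            "coverage": covered / len(vectors) if vectors else 0.0,
        })
    return report


def subject_to_act(vec: Vector) -> Optional[str]:
    # Hypothetical rule: subjects become actors.
    return "ACT" if vec[2] == "Sb" else None


def default_pat(vec: Vector) -> Optional[str]:
    # Hypothetical fallback: every remaining node becomes a patient.
    return "PAT"


vectors = [("Petr", "N", "Sb"), ("kniha", "N", "Obj"), ("včera", "D", "Adv")]
gold = ["ACT", "PAT", "TWHEN"]
for row in evaluate_sequence(vectors, gold, [subject_to_act, default_pat]):
    print(row)
```

Here the rule assigns one node with precision 1.0, and the fallback raises coverage to 1.0 at precision 0.5. Comparing such rows across different orderings of the same assigners is one way to compare sequences.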

5
Further work
  • Machine learning - searching for new regularities
  • Improvement of dictionaries
  • Tectogrammatical annotation of verb valency
    frames
  • Categorial grammars

Talks, Publications (language, fuzzy sets)
Z Fuzzy Controller as a Tool for Traffic Simulation, Mendel 1999
Z Introduction to the PDT, Faculty of Arts, Ljubljana, 2000
Z Constrained Fuzzy Arithmetic: Engineer's View, CMP Research Report
Z AFA in the PDT, seminar at the IFAL, 2000
Z AFA in the PDT, TSD 2000
Z Comp. Problems of CFA, CMP seminar
S. Džeroski, Z ML approach to AFA in the PDT, 5th TELRI seminar, 2000
M. Navara, Z Comp. Problems of CFA, ISCI 2000
S. Džeroski, Z ML approach to AFA in the PDT, ACL, 2001
M. Navara, Z How to make CFA efficient, Soft Computing 2001
Stranáková, Skoumalová, Panevová, Z Tectogram. annotation of verb. val. frames, TSD 2001
M. de Cock, Z Representing Ling. Hedges by L-Fuzzy Modifiers, CIMCA 2001