Title: Automatic Functor Assignment (AFA) in the Prague Dependency Treebank
1 Automatic Functor Assignment (AFA) in the Prague Dependency Treebank
- PDT
  - a long-term research project
  - at the Institute of Formal and Applied Linguistics
  - aimed at a complex annotation of a part of the Czech National Corpus
  - annotation scheme: 3 levels
AFA's position within the PDT:
Raw text -> Morphologically tagged text -> Analytic tree structures (ATS) -> Tectogrammatical tree structures (TGTS)
2 Problem analysis, Data preprocessing
- Motivation
  - to reduce the huge amount of human work involved in the development of the PDT
- Problem statement
  - to assign a functor to every node in a TGTS
- Initial situation
  - no AFA system with a reasonable cover existed
  - human annotators mostly rely on their knowledge of the language, not on formal rules
  - annotators take the whole-sentence context into account
  - a certain amount of manually annotated TGTSs is available
- What is the minimal amount of information that is sufficient to decide on the functor?
- Problem reformulation
  - AFA: to classify symbolic vectors into 53 classes
- Available material
  - 18 files (up to 50 sentences in each)
  - vectors with 12 symbolic attributes
  - feature selection, feature extraction
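The reformulation above (tree nodes turned into symbolic vectors that are then classified) can be illustrated with a minimal sketch. The slides do not list the 12 attributes, so the five attributes used here (lemma, part of speech and case of the node, lemma and part of speech of its governing node) and the names `TreeNode` and `to_vector` are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TreeNode:
    """A hypothetical, heavily simplified tree node."""
    lemma: str
    pos: str
    case: str
    parent: Optional["TreeNode"] = None


def to_vector(node: TreeNode) -> Tuple[str, ...]:
    """Flatten a node and its governing node into a tuple of symbolic values.

    Missing values are encoded by the symbol "NONE", so every vector
    has the same length.
    """
    parent = node.parent
    return (
        node.lemma,
        node.pos,
        node.case,
        parent.lemma if parent else "NONE",
        parent.pos if parent else "NONE",
    )


# Toy data, not taken from the PDT.
verb = TreeNode("spát", "verb", "NONE")
noun = TreeNode("pes", "noun", "nominative", parent=verb)
vector = to_vector(noun)
```

A classifier would then map each such vector to one of the 53 functors.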
3 Components of the proposed AFA system
- Symbiosis of 4 different approaches
  - 7 Rule-based Methods (RBMs)
  - 3 Dictionary-based Methods (DBMs)
  - Nearest vector (similarity)
  - Machine learning (Quinlan's C4.5, Sašo Džeroski)
- Implementation
  - a set of small programs for preprocessing and format conversions, dictionary mining, functor assignment, and performance evaluation
  - Linux filters, Perl, SQL
  - assigners are applied in a strictly pipelined fashion
- Data Flow Diagram
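The strictly pipelined application of assigners can be sketched as follows. This is not the original implementation (which used Linux filters, Perl, and SQL). It is a Python illustration under stated assumptions: the attribute names, the single rule, the dictionary content, and the similarity measure (number of matching attribute values) are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Node = Dict[str, str]                       # symbolic attributes of one node (names are hypothetical)
Assigner = Callable[[Node], Optional[str]]  # returns a functor, or None if undecided


def rule_verb_root(node: Node) -> Optional[str]:
    """Hypothetical rule: a verb directly under the root is the predicate."""
    if node.get("pos") == "verb" and node.get("parent_pos") == "root":
        return "PRED"
    return None


def make_dictionary_assigner(table: Dict[str, str]) -> Assigner:
    """Look the lemma up in a dictionary mined from annotated trees."""
    def assign(node: Node) -> Optional[str]:
        return table.get(node.get("lemma", ""))
    return assign


def make_nearest_vector_assigner(training: List[Tuple[Node, str]]) -> Assigner:
    """Copy the functor of the training vector that shares the most attribute values."""
    def similarity(a: Node, b: Node) -> int:
        return sum(1 for key in a if key in b and a[key] == b[key])

    def assign(node: Node) -> Optional[str]:
        if not training:
            return None
        _, functor = max(training, key=lambda pair: similarity(node, pair[0]))
        return functor
    return assign


def run_pipeline(nodes: List[Node], assigners: List[Assigner]) -> List[Optional[str]]:
    """Apply assigners in order; each one only handles nodes still lacking a functor."""
    result: List[Optional[str]] = [None] * len(nodes)
    for assigner in assigners:
        for i, node in enumerate(nodes):
            if result[i] is None:
                result[i] = assigner(node)
    return result


# Toy data, not taken from the PDT.
nodes = [
    {"lemma": "spát", "pos": "verb", "parent_pos": "root"},
    {"lemma": "velmi", "pos": "adverb", "parent_pos": "verb"},
    {"lemma": "pes", "pos": "noun", "parent_pos": "verb"},
]
training = [({"lemma": "kočka", "pos": "noun", "parent_pos": "verb"}, "ACT")]
pipeline = [
    rule_verb_root,
    make_dictionary_assigner({"velmi": "EXT"}),
    make_nearest_vector_assigner(training),
]
functors = run_pipeline(nodes, pipeline)
```

Because an earlier assigner's decision is never overwritten, the order of the assigners matters: in this sketch the more selective methods run first and the nearest-vector method acts as a fallback.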
4 Performance evaluation
- Detailed evaluation of several quantities for each assigner in a sequence
- Several sequences of assigners were tested
  - e.g., a sequence of RBMs
- Comparison of different sequences of assigners
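The slides do not say which quantities were measured. As an illustration only, the sketch below assumes two of them: cover (the share of nodes that received any functor) and precision (the share of assigned functors that match the manual annotation). The function name `evaluate` and the toy data are hypothetical.

```python
from typing import Dict, List, Optional


def evaluate(assigned: List[Optional[str]], gold: List[str]) -> Dict[str, float]:
    """Compare automatically assigned functors with manually annotated ones.

    `assigned` holds None for nodes that no assigner could decide.
    """
    if len(assigned) != len(gold):
        raise ValueError("assigned and gold must have the same length")
    total = len(gold)
    covered = sum(1 for functor in assigned if functor is not None)
    correct = sum(
        1 for functor, reference in zip(assigned, gold)
        if functor is not None and functor == reference
    )
    return {
        "cover": covered / total if total else 0.0,
        "precision": correct / covered if covered else 0.0,
    }


# Toy data: three of four nodes were assigned, two of them correctly.
scores = evaluate(["PRED", None, "ACT", "PAT"], ["PRED", "EXT", "ACT", "ACT"])
```

Computing these numbers after each assigner in a sequence would show how much each step adds to the cover and what it costs in precision.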
5 Further work
- Machine learning
  - searching for new regularities
- Improvement of dictionaries
- Tectogrammatical annotation of verb valency frames
- Categorial grammars
Talks, Publications
Topics: language, fuzzy sets
- Z: Fuzzy Controller as a Tool for Traffic Simulation. Mendel 1999
- Z: Introduction to the PDT, Faculty of Arts, Ljubljana, 2000
- Z: Constrained Fuzzy Arithmetic: Engineer's View. CMP Research Rep.
- Z: AFA in the PDT, seminar at the IFAL, 2000
- Z: AFA in the PDT, TSD 2000
- Z: Comp. Problems of CFA, CMP seminar
- S. Džeroski, Z: ML approach to AFA in the PDT, 5th TELRI seminar, 2000
- M. Navara, Z: Comp. Problems of CFA, ISCI 2000
- S. Džeroski, Z: ML approach to AFA in the PDT, ACL, 2001
- M. Navara, Z: How to make CFA efficient, Soft Computing 2001
- Straňáková, Skoumalová, Panevová, Z: Tectogram. annotation of verb. val. frames, TSD 2001
- M. de Cock, Z: Representing Ling. Hedges by L-Fuzzy Modifiers, CIMCA 2001