Title: An Intuitive Representation of Human Languages for Translation
- Gábor Prószéky
- MorphoLogic
- Faculty of Information Technology, Pázmány University
Kalmár Workshop Szeged, October 1-2, 2003
2. Contents
- Some words on Prof. Kalmár's activity in computational linguistics
- Problems of human language description with formal tools
- A new representation with patterns
- Introduction to machine translation methods
- Application of patterns to translation
Kalmár Workshop 2003
Gábor Prószéky An Intuitive Representation of
Human Languages for Translation
3. Kalmár languages
- Kalmár's paper in formal language theory: "An Intuitive Representation of Context-Free Languages"
- Kalmár's activity in machine translation (conference in 1962): "Representation of Languages with the Help of Mathematical Structures"
4. Linguistic representation problems of the 60s
- Dependency structure
- Constituent structure
- X-bar theory: X → (P) X (Q)
- Related structures
- Using transformations
5. Structured symbols
- Linguistic categories: atomic symbols
- Not enough subcategorization
- Semantic features: alive, ...
- Syntactic features: countable, ...
- Rule sets instead of rules
- ID/LP
6. Feature structures
- DAGs
- Unification problems
- Feature geometry, typed features
- LFG, GPSG, HPSG
- Parsing: CF skeleton with features, or feature structures only?
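The unification step behind these feature structures can be sketched as follows. This is a minimal illustration, assuming feature structures are plain nested dictionaries with atomic string values; it handles trees only, without the shared substructures (reentrancy) that full DAG unification needs. The feature names and the `unify` helper are hypothetical.

```python
from typing import Optional


def unify(a: dict, b: dict) -> Optional[dict]:
    """Return the unification of two feature structures, or None on a clash."""
    result = dict(a)  # shallow copy, so the inputs are never modified
    for feature, b_value in b.items():
        if feature not in result:
            # Information present only in b is simply added.
            result[feature] = b_value
            continue
        a_value = result[feature]
        if isinstance(a_value, dict) and isinstance(b_value, dict):
            # Complex values are unified recursively.
            sub = unify(a_value, b_value)
            if sub is None:
                return None
            result[feature] = sub
        elif a_value != b_value:
            # Two different atomic values cannot be reconciled.
            return None
    return result


noun = {"POS": "n", "AGR": {"NUM": "sg"}}
subject_slot = {"AGR": {"NUM": "sg", "PER": "3"}}
plural_slot = {"AGR": {"NUM": "pl"}}

print(unify(noun, subject_slot))  # compatible: the information is merged
print(unify(noun, plural_slot))   # NUM clash: unification fails
```

A clash returns `None` rather than raising, which keeps the failing case cheap for a parser that tries many candidate combinations.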
7. Complexity of NL grammars
- RG/FSA: not enough
- CF/RTN: not enough
- CS: ?
- Type 0/ATN: Turing machine
- Transformations and metarules
- Arguments for and against
8. NL grammar formalisms
- Competence and performance?
- Kornai number (left-recursion, center-embedding, "respectively" construction)
- Gradually from unrestricted to regular
- (i) a^n b^n -> a*b* (n is lost!)
- (ii) a^n b^n -> {ε, ab, aabb, aaabbb}
- Finitization by length
- No structure in FSA; finite systems, however, can produce structural output
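The two ways of making a^n b^n finite-state can be compared in a few lines. This is a sketch under two assumptions: that the regular approximation in (i) is a*b*, and that the length bound in (ii) is 6, which yields exactly the four strings listed. The function name is hypothetical.

```python
import re
from typing import List


def anbn_up_to(max_len: int) -> List[str]:
    """All strings a^n b^n whose total length does not exceed max_len."""
    return ["a" * n + "b" * n for n in range(max_len // 2 + 1)]


# (i) Regular approximation: accepts every a^n b^n, but the count n is lost.
REGULAR_APPROX = re.compile(r"a*b*")

# (ii) Finitization by length: a finite list that keeps the a/b balance.
FINITE_SET = set(anbn_up_to(6))

print(anbn_up_to(6))                            # ['', 'ab', 'aabb', 'aaabbb']
print(bool(REGULAR_APPROX.fullmatch("aab")))    # True: over-generation
print("aab" in FINITE_SET)                      # False: balance is preserved
```

The finite set rejects the unbalanced string that the regular approximation lets through, at the price of rejecting every balanced string longer than the bound.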
9. Syntax and semantics
- Logical representations (e.g. λx.dog(x), λx.run(x))
- World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF)
- Categorial grammar: early logical representations of syntax (Kalmár)
- DCG: interpretation representation
- Rule-to-rule hypothesis
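The rule-to-rule hypothesis pairs every syntactic rule with one semantic rule, so that a meaning is built as each constituent is built. A toy sketch of that pairing, with a hypothetical lexicon, a hypothetical one-rule grammar and a tiny model of the world:

```python
RUNNERS = {"rex"}  # toy model: the set of individuals that run

# Lexicon: word -> (syntactic category, meaning)
LEXICON = {
    "rex": ("NP", "rex"),
    "ann": ("NP", "ann"),
    "runs": ("VP", lambda x: x in RUNNERS),  # one-place predicate
}

# Each syntactic rule (left) carries its own semantic rule (right).
RULES = {
    ("NP", "VP"): ("S", lambda np, vp: vp(np)),
}


def interpret(first: str, second: str):
    """Parse a two-word sentence and compute its meaning in the same step."""
    cat1, sem1 = LEXICON[first]
    cat2, sem2 = LEXICON[second]
    parent, combine = RULES[(cat1, cat2)]
    return parent, combine(sem1, sem2)


print(interpret("rex", "runs"))  # ('S', True)
print(interpret("ann", "runs"))  # ('S', False)
```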
10. Conflict handling
- Lexicon meets syntax: who is right?
- Lexicon: off-line info coming from past experiences
- Which is more important in a specific situation?
11. Open classes
- Open vs. closed classes, that is, features can or cannot be overridden
- Proper names, jabbers, folk etymology, loanwords, ...
- Grammar of closed classes: minimal grammar
12. Finite morphology
- Finite patterns
- Finite number of entries
- Descriptions assigned to entries
- Finite open vs. infinite closed
- Underspecified entries for guessing
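One way to read "underspecified entries for guessing" is a finite lexicon whose suffix patterns also accept unknown stems. The sketch below assumes that reading; the stems, suffixes and description labels are hypothetical English data, not taken from any real morphological lexicon.

```python
from typing import Dict, List, Tuple

# Fully specified entries: a finite list of known stems and their category.
STEMS: Dict[str, str] = {"dog": "N", "walk": "V"}

# Finite suffix patterns: (suffix, category of the stem, description).
SUFFIXES: List[Tuple[str, str, str]] = [
    ("s", "N", "N+PL"),
    ("ed", "V", "V+PAST"),
    ("s", "V", "V+3SG"),
]


def analyse(word: str) -> List[str]:
    """Return every description of a word; guess when the stem is unknown."""
    analyses = []
    if word in STEMS:
        analyses.append(STEMS[word])
    for suffix, category, description in SUFFIXES:
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        stem = word[: -len(suffix)]
        if STEMS.get(stem) == category:
            analyses.append(f"{stem}[{description}]")
        elif stem not in STEMS:
            # Underspecified entry: an unknown stem may fill the slot,
            # and the "?" marks the analysis as a guess.
            analyses.append(f"{stem}?[{description}]")
    return analyses


print(analyse("dogs"))   # ['dog[N+PL]']
print(analyse("toves"))  # ['tove?[N+PL]', 'tove?[V+3SG]']
```

A known stem blocks the guesses, so "dogs" gets one analysis, while the unknown "toves" keeps both readings open for the syntax to decide.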
13. Finite syntax
- Item and arrangement (as in morphology)
- Arrangement describes a rather free constituent order
- Metawords in a meta-dictionary, e.g. (Det (Adj (N))) → DAN
- Cascades without loop
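A loop-free cascade of meta-dictionaries might look like the sketch below. Only the metaword DAN comes from the slide; the other entries, the two-level layout and the longest-match policy are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

MetaDictionary = Dict[Tuple[str, ...], str]

# Each level is a finite meta-dictionary: tag sequence -> metaword.
CASCADE: List[MetaDictionary] = [
    {("Det", "Adj", "N"): "DAN", ("Det", "N"): "DN"},
    {("P", "DAN"): "PP", ("P", "DN"): "PP"},
]


def apply_level(tags: List[str], level: MetaDictionary) -> List[str]:
    """Rewrite the tag sequence once, left to right, longest match first."""
    out: List[str] = []
    lengths = sorted({len(key) for key in level}, reverse=True)
    i = 0
    while i < len(tags):
        for n in lengths:
            key = tuple(tags[i : i + n])
            if key in level:
                out.append(level[key])
                i += len(key)
                break
        else:
            # No entry starts here: copy the tag unchanged.
            out.append(tags[i])
            i += 1
    return out


def run_cascade(tags: List[str]) -> List[str]:
    # Every level is applied exactly once, so the cascade contains no loop.
    for level in CASCADE:
        tags = apply_level(tags, level)
    return tags


print(run_cascade(["P", "Det", "Adj", "N", "V"]))  # ['PP', 'V']
```

Because the number of levels is fixed, nesting depth is bounded by construction, which is what keeps the whole system finite-state.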
14. The plastic box
- John is a boy.
- John is a noun.
- Go is a verb.
- Go is a verb.
- ? is a sign.
- ? is a sign.
- □ is a □. (where □ is a plastic box)
15. Real examples
- (a) Unusual use: "Go is a verb." POS np vs. POS v
- (b) Metaphor: "My car drinks a lot." ANIMATE + vs. ANIMATE -
- (c) Unknown entry: "Kalmár is a family name." POS np
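The three cases can be sketched as one resolution step in which the features demanded by the syntactic context meet the features stored in the lexicon. The policy below (context wins for open classes, closed classes cannot be overridden) is one possible reading of the slides; the lexicon, the closed-class list and the status labels are hypothetical.

```python
from typing import Dict, Optional, Tuple

CLOSED_CLASSES = {"det", "prep", "conj"}  # assumed list of closed categories

LEXICON: Dict[str, Dict[str, str]] = {
    "go": {"POS": "v"},
    "car": {"POS": "n", "ANIMATE": "-"},
    "the": {"POS": "det"},
}


def resolve(word: str, context: Dict[str, str]) -> Tuple[Optional[Dict[str, str]], str]:
    """Combine lexical (off-line) features with contextual (on-line) ones."""
    lexical = LEXICON.get(word)
    if lexical is None:
        # (c) Unknown entry: the context supplies the whole description.
        return dict(context), "unknown entry"
    clash = any(f in lexical and lexical[f] != v for f, v in context.items())
    if lexical["POS"] in CLOSED_CLASSES:
        # Features of closed classes cannot be overridden.
        return (None, "rejected") if clash else (dict(lexical), "consistent")
    # Open classes: contextual features override the lexical ones.
    merged = {**lexical, **context}
    return merged, ("overridden" if clash else "consistent")


print(resolve("go", {"POS": "np"}))       # (a) unusual use
print(resolve("car", {"ANIMATE": "+"}))   # (b) metaphor
print(resolve("Kalmár", {"POS": "np"}))   # (c) unknown entry
print(resolve("the", {"POS": "np"}))      # closed class: no override
```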
16. Linguistic frames
- Psychology: Gestalt
- Morphologically complex structures are treated as frames by humans
- Frames in AI: shopping, walking, ...
- As high-level parsing relates to detailed
on-line analysis
17. Translation of human languages
- old problems (50s)
- direct (60s)
- interlingual (70s)
- transfer (80s)
- examples (90s)
18. Patterns: general linguistic information in lexicalized form
- Short, fully specified patterns are lexical entries
- Longer, fully specified entries are multi-word expressions
- Partially underspecified patterns are collocations, phrasal verbs, idioms
- Totally underspecified patterns are linguistic rules
- Pattern/interpretation pairs: Translation Description Language
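The four-way classification above depends only on how many slots of a pattern are specified. A small sketch, assuming a hypothetical encoding in which a string is a fully specified word and `None` is an underspecified slot, and reading "short" as a single word:

```python
from typing import List, Optional

Pattern = List[Optional[str]]


def classify(pattern: Pattern) -> str:
    """Name the kind of linguistic object a pattern stands for."""
    if not pattern:
        raise ValueError("a pattern needs at least one slot")
    specified = sum(slot is not None for slot in pattern)
    if specified == len(pattern):
        return "lexical entry" if len(pattern) == 1 else "multi-word expression"
    if specified == 0:
        return "linguistic rule"
    return "collocation, phrasal verb or idiom"


print(classify(["dog"]))                  # lexical entry
print(classify(["by", "and", "large"]))   # multi-word expression
print(classify(["give", None, "up"]))     # collocation, phrasal verb or idiom
print(classify([None, None]))             # linguistic rule
```

In a real pattern set the underspecified slots would carry category and feature constraints rather than being completely empty; `None` is used here only to keep the distinction visible.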
19. The MetaMorpho principles
- No single words, only contextual expressions (in the form of patterns)
- Pattern pairs: input/interpretation structure pairs
- Single pass: no separate transfer steps
- Target structure generation: a by-product of parsing
20. Jabberwocky
- Twas brillig, and the slighty toves
  Did gyre and gimble in the wabe
  All mimsy were the borogroves,
  And the mone raths outgrabe.
- Twas □, and the □ □s
  Did □ and □ in the □
  All □ were the □s,
  And the □ □s □.
22. Translation rules for Jabberwocky
- twas □ → □ volt
- , and → , és
- the □ □s did □ → a □ □ok □tak
- and → és
- in the □ → a □ban
- all → teljesen
- □ were the □s → □k voltak az □ok
- the □ □s □ → a □ □ok □tek
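Read as source/target pattern pairs whose open slots accept unknown words, rules of this kind can be tried out with ordinary regular expressions. The sketch below is a stand-in for illustration, not the MetaMorpho parser: it applies the pairs one after another instead of in a single parse, merges the "did" and coordination rules into one pattern so that both verbs get a suffix, ignores vowel harmony and the a/az alternation, and copies the unknown stems unchanged instead of respelling them as the Hungarian version in the deck does.

```python
import re
from typing import List, Tuple

# (source pattern, target pattern); "(\w+)" marks a slot for an unknown word.
PATTERN_PAIRS: List[Tuple[str, str]] = [
    (r"\btwas (\w+)", r"\1 volt"),
    (r"\bthe (\w+) (\w+)s did (\w+) and (\w+)", r"a \1 \2ok \3tak és \4tek"),
    (r"\bin the (\w+)", r"a \1ban"),
    (r"(\w+) were the (\w+)s\b", r"\1k voltak a \2ok"),
    (r"\ball\b", "teljesen"),
    (r"\bthe (\w+) (\w+)s (\w+)", r"a \1 \2ok \3tek"),
    (r"\band\b", "és"),
]


def translate(text: str) -> str:
    """Lowercase, flatten line breaks, then apply every pattern pair in order."""
    result = " ".join(text.lower().split())
    for source, target in PATTERN_PAIRS:
        result = re.sub(source, target, result)
    return result


POEM = (
    "Twas brillig, and the slighty toves\n"
    "Did gyre and gimble in the wabe\n"
    "All mimsy were the borogroves,\n"
    "And the mone raths outgrabe."
)

print(translate(POEM))
```

None of the content words is in any dictionary, yet every one of them ends up in the right position with a plausible Hungarian suffix, because the patterns carry all the grammatical information.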
- Twas □, and the □ □s
  Did □ and □ in the □
  All □ were the □s,
  And the □ □s □.
- □ volt, és a □ □ok
  □tak és □tek a □ben
  teljesen □k voltak a □ok
  és a □ □ok □tek.
24. Translation of Jabberwocky
- Dzsebervoki
  Brillig volt, és a szlájti tóvok
  gájertak és gimbeltek a vébben
  teljesen mimszik voltak a borogróvok
  és a món rátok autgrébtek.
25. An intuitive representation...
- X-bar based structures
- Feature-based descriptions
- Metarules (used off-line)
- Rule-to-rule principle
- Lexicon should be finite but open
- Closed classes belong to the minimal grammar
- Minimal grammar describes basically linguistic
elements
26. An intuitive representation... (contd)
- Linguistic constructions can be described by finite patterns
- A huge finite description set is used rather than a limited infinite grammar
- In case of conflict, lexical information either is redundant or contradicts the actual description
- Known constructions need no real-time analysis (Gestalt, frame)
27. An intuitive representation... (contd)
- Broken frames are analyzed in real time
- A structural (source/target) pattern pair is assigned to every frame to be translated
- The target structure is computed while parsing the source structure