Title: Multilingual Generation of Controlled Languages
1Multilingual Generation of Controlled Languages
- Richard Power (ITRI)
- Donia Scott (ITRI)
- Anthony Hartley (CTS)
ITRI: Information Technology Research Institute, University of Brighton, UK
CTS: Centre for Translation Studies, University of Leeds, UK
2Background
- Since 1993, NLG projects at ITRI have focussed on the problem of producing technical documentation in multiple languages (Drafter, CLIME, PILLS, CLEF).
- A typical application is PILLS, in the pharmaceutical domain, where for example patient information leaflets are produced in around 150 languages and revised often.
- ITRI introduced the WYSIWYM (What You See Is What You Meant) method for editing knowledge for NLG. A similar idea is used in XRCE's MDA (Multilingual Document Authoring) approach.
- The talk describes current work on widening the coverage of WYSIWYM so that it can edit complete patient information leaflets.
3Overview
- Problem: how to produce documents in CLs
- Approach: create a direct manipulation CL editor by analogy with a drawing tool
- Examples of how such an editor might work
- Snapshots of prototypes
- Advantages and disadvantages
- Future developments
4Methods for controlling language (1)
- A trained author writes a text, trying to comply with the rules of a CL.
- Tools for checking terminology, grammar, and style identify non-compliant sentences and may generate possible alternatives.
- If versions in other languages are needed, an MT system should make fewer mistakes if the source text is in a CL.
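The checking step can be sketched as follows. This is a minimal Python illustration only: the approved vocabulary and the ten-word sentence limit are hypothetical rules invented for the example, not rules of any real controlled language, and the function names are likewise made up.

```python
import re

# Hypothetical CL rules, invented for illustration only.
APPROVED_WORDS = {"the", "a", "patient", "swallows", "tablet", "with", "water"}
MAX_WORDS = 10


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = [f"unapproved word: {w}" for w in words if w not in APPROVED_WORDS]
    if len(words) > MAX_WORDS:
        problems.append(f"too long: {len(words)} words (limit {MAX_WORDS})")
    return problems


def non_compliant(text):
    """Map each non-compliant sentence in the text to its violations."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    report = {}
    for sentence in sentences:
        problems = check_sentence(sentence)
        if problems:
            report[sentence] = problems
    return report
```

A checker of this kind only flags problems after the text has been written; the author still has to know the rules well enough to repair the sentence.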
5Methods for controlling language (2)
- The content of a document is already encoded in a formal knowledge base.
- A language generation tool generates text from this encoding of content, using a grammar and lexicon which guarantees compliance with a CL (Danlos et al., 2000).
- Versions in other languages can be generated from the same knowledge base; no interpretation is required.
6Methods for controlling language (3)
- The author creates the text through a direct manipulation interface in which all options are generated by the program. These options guarantee compliance with a CL.
- Editing options are linked to features in an underlying interlingua, so that as well as creating a text, the author implicitly creates a formal encoding of the content.
- Versions in other languages can be generated from the same formal encoding; no interpretation is required.
7Xfig editor for controlled drawings
Can we develop a CL editor by analogy with a
drawing tool?
8Complex nested drawings using Xfig
9Constraints of a drawing editor
- The author can create instances of a number of predefined patterns (rectangle, oval, etc.).
- Instances can be configured by changing a set of predefined features (colour, size, line thickness, etc.).
- Instances can be located at various points in the drawing (depending on grid setting).
10Controlled character editing
Text editor Text editor Text editor
The author can create instances of predefined
patterns (letters, punctuation marks), configure
them by predefined parameters (font, bold, size,
colour, etc.), and place them at permitted
locations.
11General requirements for editing tool
- The tool allows users to create instances of predefined types, and to place them at constrained locations.
- Once created, instances can be configured by varying a predefined set of parameters.
- Instances can also be deleted, or cut, or copied, or pasted into other locations.
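These three requirements can be sketched as a small class. The sketch below is in Python and is only an illustration under stated assumptions: the type table, parameter values, defaults, and the `Editor` class itself are hypothetical and do not describe Xfig or any ITRI program.

```python
from copy import deepcopy

# Predefined types and the parameter values each one allows (illustrative).
TYPES = {
    "rectangle": {"colour": {"black", "green"}, "line_thickness": {1, 2, 3}},
    "oval": {"colour": {"black", "green"}, "line_thickness": {1, 2, 3}},
}
DEFAULTS = {"colour": "black", "line_thickness": 1}


class Editor:
    def __init__(self, locations):
        self.allowed = set(locations)  # constrained locations, e.g. grid points
        self.instances = {}            # location -> instance
        self.clipboard = None

    def create(self, type_name, location):
        if type_name not in TYPES:
            raise ValueError(f"unknown type: {type_name}")
        if location not in self.allowed:
            raise ValueError(f"location not permitted: {location}")
        self.instances[location] = {"type": type_name, **DEFAULTS}

    def configure(self, location, parameter, value):
        instance = self.instances[location]
        if value not in TYPES[instance["type"]].get(parameter, set()):
            raise ValueError(f"{parameter}={value!r} is not a predefined option")
        instance[parameter] = value

    def delete(self, location):
        del self.instances[location]

    def copy(self, location):
        self.clipboard = deepcopy(self.instances[location])

    def cut(self, location):
        self.copy(location)
        self.delete(location)

    def paste(self, location):
        if self.clipboard is None:
            raise ValueError("clipboard is empty")
        if location not in self.allowed:
            raise ValueError(f"location not permitted: {location}")
        self.instances[location] = deepcopy(self.clipboard)
```

Because every operation is validated against the predefined tables, the user cannot produce an instance outside the permitted set, which is the property a CL editor needs.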
12Editing tool for controlled languages
- Author can create instances of patterns based on verbs, nouns etc. (e.g., sentences, noun phrases).
- Once created, instances can be configured by varying parameters like tense, polarity, and number, or by introducing modifiers. They can also be deleted or cut/copied/pasted.
- However, what counts as a location within a linguistic pattern (e.g., a sentence)?
13Location in a CL editor
- Text editor: location is a point within the character sequence.
- Drawing tool: location is an area within a two-dimensional grid.
- Controlled Language editor: since we are editing linguistic form rather than a character sequence, location might be defined as a node within a hierarchical structure?
14Editing a hierarchical structure (Step 1)
Some specialised drawing tools edit hierarchical
structures. In this example, the aim is to
configure a house. The first step is to choose a
basic house pattern.
In a hierarchical structure, locations are points
within an existing pattern where appropriate
constituents may be added.
15Step 2 Selecting a constituent (door)
Once a pattern has been selected, it can be
reconfigured. Having chosen the one-door
one-window pattern we can for example add a
garage.
Instead of reconfiguring the basic house pattern,
the author can click on a location where a
constituent must be added.
16Step 3 Choosing a basic door pattern
Highlighting in red shows which part of the
current design has been selected for adding a new
constituent, or for reconfiguring an existing one.
Having selected a location, the user is presented
with a set of suitable options. Each option is a
basic pattern which can be configured later.
17Step 4 Configuring the door pattern
- Three configuration parameters can be varied:
  - Cross on window
  - Letter box
  - Cat flap
Having chosen a basic door pattern, the user can
reconfigure it, for instance by adding a letter
box.
18Step 5 Selecting a constituent (window)
The configuration options change once the letter
box has been added. The options for varying the
other parameters (window cross, cat flap) now
include the letter box.
Satisfied with the door, the user selects the
other location where a new constituent can be
added.
19Step 6 Choosing a basic window pattern
The window location is now highlighted in red, to
show that it has been selected.
Once a basic window pattern has been selected,
the design will be potentially complete, because
all empty locations will be filled.
20Result Completed design for a house
To simplify, we assume there are no configuration
options for windows.
Editing could stop here. Alternatively, the user
could change the design by further operations
(delete window, reconfigure house, etc.).
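The house example can be sketched as a tiny hierarchical structure with empty locations. The Python below is a minimal illustration: the door and window pattern names are hypothetical (the slides show them only as pictures), and the function names are invented for the sketch.

```python
# Options offered at each kind of location (pattern names are hypothetical).
OPTIONS = {
    "door": ["plain door", "panelled door"],
    "window": ["sash window", "casement window"],
}
# The three door parameters named in the configuration step.
DOOR_PARAMETERS = ("window_cross", "letter_box", "cat_flap")


def new_house():
    """A basic one-door one-window house: both locations start empty."""
    return {"door": None, "window": None}


def options_for(location):
    """Only patterns suitable for the selected location are offered."""
    return OPTIONS[location]


def fill(house, location, pattern):
    if pattern not in OPTIONS[location]:
        raise ValueError(f"{pattern!r} is not an option for {location}")
    node = {"pattern": pattern}
    if location == "door":
        # Parameters start at default values and can be varied later.
        node.update({p: False for p in DOOR_PARAMETERS})
    house[location] = node


def configure_door(house, parameter, value):
    if parameter not in DOOR_PARAMETERS:
        raise ValueError(f"unknown door parameter: {parameter}")
    house["door"][parameter] = value


def is_complete(house):
    """The design is potentially complete once no location is empty."""
    return all(node is not None for node in house.values())
```

To simplify, as in the slides, windows have no configuration options in this sketch.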
21Editing a CL sentence (Step 1)
Options:
- Someone asks someone something
- Something attacks something
- - - etc. - - -
- Someone reads something
- Someone swallows something
- - - etc. - - -
Document:
Something is the case
The Document pane shows an anchor, a generic
phrase in square brackets. This represents a
location where a specific event pattern may be
inserted. The pattern is selected from a list of
options.
22Step 2 Selecting a constituent (agent)
Options:
- Someone might swallow something
- Someone must swallow something
- Someone does not swallow something
- Someone swallowed something
- Someone will swallow something
- Someone swallows something somewhere
- Someone swallows something in some way
- - - etc. - - -
Document:
Someone swallows something
Having selected the swallow pattern with its
parameters defaulted (e.g., present tense), we
can choose from configuration options.
Alternatively we can select a location within the
pattern, such as the agent role.
23Step 3 Choosing a basic agent pattern
Options:
- a doctor
- a man
- a patient
- a pharmacist
- a woman
- - - etc. - - -
Document:
Someone swallows something
The location corresponding to the unspecified
agent is highlighted in red. As in the house
editor, options are offered only if they are
suitable for the location. The suitable options
in this case are noun phrases referring to agents.
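Filtering options by location can be sketched with a toy ontology. In the Python below, the concept table, the role table, and the treatment of "water" as a mass noun are assumptions made for the example; only the listed words come from the slides.

```python
# Toy ontology: each concept belongs to one class (illustrative entries).
CONCEPT_CLASS = {
    "doctor": "person", "man": "person", "patient": "person",
    "pharmacist": "person", "woman": "person",
    "capsule": "thing", "tablet": "thing", "water": "thing",
}
MASS_NOUNS = {"water"}  # shown without an article

# Class required at each location of an event pattern.
ROLE_CLASS = {"swallow": {"agent": "person", "object": "thing"}}


def noun_phrase(concept):
    """Default presentation of a concept: indefinite singular."""
    return concept if concept in MASS_NOUNS else f"a {concept}"


def options_for(event, role):
    """Offer only the concepts whose class suits the selected location."""
    required = ROLE_CLASS[event][role]
    return [noun_phrase(c) for c in sorted(CONCEPT_CLASS)
            if CONCEPT_CLASS[c] == required]
```

Selecting the agent location of the swallow pattern then yields only person phrases, while the object location yields only things.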
24Step 4 Configuring the agent pattern
Options:
- patients
- the patient
- a some kind of patient
- a patient who does something
- - - etc. - - -
Document:
A patient swallows something
The configuration options for nominals vary
parameters corresponding to singular vs. plural,
definite vs. indefinite, and potential modifiers
(e.g., adjective, relative clause).
25Step 5 Selecting a constituent (object)
Options:
- the patients
- a patient
- the some kind of patient
- the patient who does something
- - - etc. - - -
Document:
The patient swallows something
Assuming the user does not want to configure the
agent any more, the next step is to select the
object location.
26Step 6 Choosing a basic object pattern
Options:
- a button
- a capsule
- a cream
- - - etc. - - -
- a medicine
- a tablet
- water
- - - etc. - - -
Document:
The patient swallows something
Once an object pattern has been selected, the
sentence is potentially complete, although it can
be configured further if desired.
27Result Completed event
Options:
- tablets
- the tablet
- a some kind of tablet
- a tablet which does something
- - - etc. - - -
Document:
The patient swallows a tablet
28What are we really editing?
Drawing editor
Underlying formal encoding:
- HEIGHT 3.0 in
- WIDTH 2.0 in
- LINE THICKNESS 1
- LINE COLOUR black
- FILL COLOUR green
Presentational form: [figure]
29What are we really editing?
Text editor
Underlying formal encoding:
84 104 101 32 112 97 116 105 101 110 116
Presentational forms:
- The patient
- The patient
- The patient
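For the text editor, the relation between presentational form and underlying encoding can be checked directly. This short Python sketch uses the built-in `ord` and `chr`; the function names `encode` and `decode` are chosen for the example.

```python
def encode(text):
    """Presentational form -> underlying character codes."""
    return [ord(ch) for ch in text]


def decode(codes):
    """Underlying character codes -> presentational form (font aside)."""
    return "".join(chr(code) for code in codes)
```

Font, size, and colour are separate parameters, which is why one sequence of codes can have several presentational forms.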
30What are we really editing?
Controlled English editor
Underlying formal encoding:
- CATEGORY nominal
- HEAD NOUN patient
- DETERMINER the
- NUMBER singular
- MODIFIERS none
Presentational form:
the patient
31What are we really editing?
Controlled interlingua editor
Underlying formal encoding:
- CLASS person
- CONCEPT patient
- IDENTIFIABLE yes
- NUMBER single
- QUALIFIERS none
Presentational forms:
- the patient
- il paziente
- o paciente
- patienten
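A minimal Python sketch of generating several presentational forms from one interlingual encoding follows. The language codes are an assumption (the slide does not name the languages; "patienten" is treated here as Swedish), and the article rules are deliberately naive: they cover only the definite singular case shown.

```python
# Noun for each concept in each language (language labels are assumed).
NOUNS = {
    "patient": {"en": "patient", "it": "paziente", "pt": "paciente", "sv": "patient"},
}

# Naive definite-singular rules, sufficient only for this example.
DEFINITE = {
    "en": lambda noun: f"the {noun}",
    "it": lambda noun: f"il {noun}",   # masculine singular only
    "pt": lambda noun: f"o {noun}",    # masculine singular only
    "sv": lambda noun: f"{noun}en",    # common-gender suffix only
}


def realise(spec, language):
    """Generate one presentational form from the interlingual encoding."""
    features = (spec["IDENTIFIABLE"], spec["NUMBER"], spec["QUALIFIERS"])
    if features != ("yes", "single", "none"):
        raise NotImplementedError("sketch covers definite singular nominals only")
    return DEFINITE[language](NOUNS[spec["CONCEPT"]][language])


def realise_all(spec):
    """One encoding, one form per language; no interpretation step."""
    return {language: realise(spec, language) for language in DEFINITE}
```

The author edits only the encoding; each language needs its own lexicon and rules, but no translation of the text itself.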
32Choosing an event concept
Underlying node:
- CLASS event
- CONCEPT
- MODALITY
- POLARITY
- TIME
- QUALIFIERS
Document:
Something is the case
Specific patterns subsumed by the generic type event:
- ask(person,person,fact)
- attack(thing,thing)
- - - etc. - - -
- read(person,thing)
- swallow(person,thing)
- - - etc. - - -
Anchors in the feedback text correspond to
generic types in the ontology (e.g., event),
which subsume a set of specific conceptual
patterns from which users may choose.
33Presenting event patterns
Options:
- Someone asks someone something
- Something attacks something
- - - etc. - - -
- Someone reads something
- Someone swallows something
- - - etc. - - -
Document:
Something is the case
To present the options, a sentence pattern is
generated for each event pattern specified by the
ontology.
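Generating one sentence pattern per event pattern can be sketched as below. This Python example assumes a dictionary representation of the ontology, a generic phrase for each class, and a naive third-person ending; all three are simplifications introduced for the illustration.

```python
# Event patterns from the ontology: concept -> classes of its arguments.
EVENT_PATTERNS = {
    "ask": ("person", "person", "fact"),
    "attack": ("thing", "thing"),
    "read": ("person", "thing"),
    "swallow": ("person", "thing"),
}

# Generic phrase shown for an unspecified argument of each class.
GENERIC_PHRASE = {"person": "someone", "thing": "something", "fact": "something"}


def sentence_pattern(concept):
    """Generate the option text for one event pattern."""
    first, *rest = (GENERIC_PHRASE[cls] for cls in EVENT_PATTERNS[concept])
    verb = concept + "s"  # naive third-person singular, fine for these verbs
    return " ".join([first.capitalize(), verb, *rest])


def options():
    """One sentence pattern for each event pattern in the ontology."""
    return [sentence_pattern(concept) for concept in EVENT_PATTERNS]
```

Since the option texts are generated from the ontology, adding a new event pattern automatically adds a new menu entry.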
34Configuring an event
The heavy border on the rectangle means that this
node is currently selected.
Document:
Someone swallows something
Event node:
- CLASS event
- CONCEPT swallow
- MODALITY none (possible, obligatory)
- POLARITY positive (negative)
- TIME present (past, future)
- QUALIFIERS none (place, manner)
- ARG1:
  - CLASS person
  - CONCEPT
  - IDENTIFIABLE
  - NUMBER
  - QUALIFIERS
- ARG2:
  - CLASS thing
  - CONCEPT
  - IDENTIFIABLE
  - NUMBER
  - QUALIFIERS
When a pattern is chosen, its configuration
parameters are initially set to default values.
Configuration options are computed from the
alternative values for each parameter (shown here
in brackets).
35Presenting configuration options
Options:
- Someone might swallow something
- Someone must swallow something
- Someone does not swallow something
- Someone swallowed something
- Someone will swallow something
- Someone swallows something somewhere
- Someone swallows something in some way
- - - etc. - - -
Document:
Someone swallows something
Each configuration option is generated from an
event pattern which is identical to the current
pattern except that one parameter is varied.
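The one-parameter-at-a-time rule can be sketched in Python as follows. The parameter names and values are those shown for the swallow event; the verb-form table and the restriction to a single non-default parameter are simplifications made for this sketch, not a description of the actual generator.

```python
# Each parameter's possible values; the first value is the default.
DOMAINS = {
    "MODALITY": ["none", "possible", "obligatory"],
    "POLARITY": ["positive", "negative"],
    "TIME": ["present", "past", "future"],
    "QUALIFIERS": ["none", "place", "manner"],
}
DEFAULTS = {parameter: values[0] for parameter, values in DOMAINS.items()}

# Hard-coded forms for "swallow"; a real generator would use a grammar.
VERB_FORM = {
    ("MODALITY", "possible"): "might swallow",
    ("MODALITY", "obligatory"): "must swallow",
    ("POLARITY", "negative"): "does not swallow",
    ("TIME", "past"): "swallowed",
    ("TIME", "future"): "will swallow",
}
QUALIFIER_PHRASE = {"place": "somewhere", "manner": "in some way"}


def realise(pattern):
    """Generate a sentence; handles at most one non-default parameter."""
    changed = [(k, v) for k, v in pattern.items() if v != DEFAULTS[k]]
    if len(changed) > 1:
        raise NotImplementedError("sketch covers one non-default parameter")
    verb, tail = "swallows", ""
    for parameter, value in changed:
        if parameter == "QUALIFIERS":
            tail = " " + QUALIFIER_PHRASE[value]
        else:
            verb = VERB_FORM[(parameter, value)]
    return f"Someone {verb} something{tail}"


def configuration_options(current):
    """One option per pattern that differs from `current` in one parameter."""
    return [realise({**current, parameter: value})
            for parameter, domain in DOMAINS.items()
            for value in domain
            if value != current[parameter]]
```

Starting from the default pattern, this yields one option for every alternative value shown in brackets.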
36Choosing an agent concept
Document:
Someone swallows something
Event node:
- CLASS event
- CONCEPT swallow
- MODALITY none
- POLARITY positive
- TIME present
- QUALIFIERS none
- ARG1:
  - CLASS person
  - CONCEPT
  - IDENTIFIABLE
  - NUMBER
  - QUALIFIERS
- ARG2:
  - CLASS thing
  - CONCEPT
  - IDENTIFIABLE
  - NUMBER
  - QUALIFIERS
Specific concepts subsumed by the generic type person:
- doctor
- man
- patient
- pharmacist
- woman
- - - etc. - - -
37Presenting agent patterns
Options:
- a doctor
- a man
- a patient
- a pharmacist
- a woman
- - - etc. - - -
Document:
Someone swallows something
38Configuring a person/object
Document:
A patient swallows something
Event node:
- CLASS event
- CONCEPT swallow
- MODALITY none
- POLARITY positive
- TIME present
- QUALIFIERS none
- ARG1:
  - CLASS person
  - CONCEPT patient
  - IDENTIFIABLE no (yes)
  - NUMBER single (multiple)
  - QUALIFIERS none (property, event)
- ARG2:
  - CLASS thing
  - CONCEPT
  - IDENTIFIABLE
  - NUMBER
  - QUALIFIERS
39Presenting the configuration options
Options:
- patients
- the patient
- a some kind of patient
- a patient who does something
- - - etc. - - -
Document:
A patient swallows something
40Result of configuring operation
Document:
The patient swallows something
Event node:
- CLASS event
- CONCEPT swallow
- MODALITY none
- POLARITY positive
- TIME present
- QUALIFIERS none
- ARG1:
  - CLASS person
  - CONCEPT patient
  - IDENTIFIABLE yes (no)
  - NUMBER single (multiple)
  - QUALIFIERS none (property, event)
- ARG2:
  - CLASS thing
  - CONCEPT
  - IDENTIFIABLE
  - NUMBER
  - QUALIFIERS
41Presenting new configuration options
Options:
- the patients
- a patient
- the some kind of patient
- the patient who does something
- - - etc. - - -
Document:
The patient swallows something
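The two option lists for the patient nominal (before and after it is made identifiable) can be reproduced with a short Python sketch. The realisation rules below are naive assumptions written for this example: pluralisation simply adds "s", and the relative pronoun is fixed to "who", so the sketch suits person concepts only.

```python
# Parameter domains for a nominal, in the order the options are listed.
DOMAINS = {
    "NUMBER": ["single", "multiple"],
    "IDENTIFIABLE": ["no", "yes"],
    "QUALIFIERS": ["none", "property", "event"],
}


def realise(concept, node):
    """Generate a noun phrase for a person concept from its parameters."""
    plural = node["NUMBER"] == "multiple"
    phrase = concept + ("s" if plural else "")  # naive pluralisation
    if node["QUALIFIERS"] == "property":
        phrase = f"some kind of {phrase}"
    elif node["QUALIFIERS"] == "event":
        phrase = f"{phrase} who {'do' if plural else 'does'} something"
    if node["IDENTIFIABLE"] == "yes":
        return f"the {phrase}"
    return phrase if plural else f"a {phrase}"


def configuration_options(concept, current):
    """Vary exactly one parameter of the current node per option."""
    return [realise(concept, {**current, parameter: value})
            for parameter, domain in DOMAINS.items()
            for value in domain
            if value != current[parameter]]
```

Recomputing the options from the current node is what makes the menu change after each configuring operation.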
42Implementing the CL editor
- So far, two programs have been implemented:
  - Editing patient information leaflets in English and Italian, using language-specific syntactic structure as the underlying representation. The English and Italian versions must be produced separately.
  - The same, using an interlingual semantic structure as the underlying representation. A single underlying representation is sufficient for both languages, so the author only needs to create one version.
(No attempt has been made yet to comply with the rules of any particular controlled language.)
47Advantages of CL editing
- The author need not learn the rules of a CL. Compliance is guaranteed by the options offered by the program.
- If the underlying representation is a semantic interlingua, equivalent versions can be generated in other languages.
- If the content of a document changes, the author can use CL editing to modify the underlying representation, and then regenerate documents in all the required languages.
48Disadvantages of CL editing
- Within the limits of a CL, there are stylistic options which a human author can probably control better than a program.
- An experienced author can create a CL document more quickly by typing into a text editor than by selecting options from menus.
- While CL editing brings the added benefit of reliable generation in other languages, authors (and their bosses) may not perceive this as sufficient compensation.
49Future developments
- Evaluating the user interface (some pilot studies already under way)
- Using CL editing to supplement and correct semantic models derived using information extraction from legacy documents
- Allowing some control over stylistic options