Natural Language Generation An Introductory Tour - PowerPoint PPT Presentation

About This Presentation
Title:

Natural Language Generation An Introductory Tour

Description:

Natural Language Generation An Introductory Tour Anupam Basu Dept. of Computer Science & Engineering IIT Kharagpur Language Technology What is NLG? – PowerPoint PPT presentation

Slides: 85
Provided by: facwebIit8

Transcript and Presenter's Notes

Title: Natural Language Generation An Introductory Tour


1
Natural Language Generation: An Introductory Tour
  • Anupam Basu
  • Dept. of Computer Science & Engineering
  • IIT Kharagpur

2
Language Technology
[Diagram relating Meaning, Text, and Speech]
3
What is NLG?
  • Thought / conceptualization of the world
  • → Expression

The block c is on block a.
The block a is under block c.
The block b is by the side of a.
The block b is on the right of a.
The block b has its top free.
The block b is alone.
4
Conceptualization
  • Some intermediate form of representation

ON(C, A), ON(A, TABLE), ON(B, TABLE), RIGHT_OF(B, A)
What to say?
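
The step from an intermediate representation to text can be sketched in a few lines of Python. This is only an illustration: the facts are the ones on the slide, while the wording patterns and the helper names (PATTERNS, describe, verbalize) are assumptions made for the example. It verbalizes every fact, so it deliberately leaves the "What to say?" question open.

```python
# Facts from the slide, as (predicate, arg1, arg2) tuples.
FACTS = [
    ("ON", "C", "A"),
    ("ON", "A", "TABLE"),
    ("ON", "B", "TABLE"),
    ("RIGHT_OF", "B", "A"),
]

# Hypothetical wording for each predicate; not taken from the slides.
PATTERNS = {
    "ON": "{x} is on {y}",
    "RIGHT_OF": "{x} is on the right of {y}",
}


def describe(entity):
    """Name an entity: blocks become 'block a', the table 'the table'."""
    return "the table" if entity == "TABLE" else f"block {entity.lower()}"


def verbalize(fact):
    """Turn one fact into a capitalized sentence ending in a period."""
    pred, x, y = fact
    text = PATTERNS[pred].format(x=describe(x), y=describe(y))
    return text[0].upper() + text[1:] + "."


sentences = [verbalize(f) for f in FACTS]
for s in sentences:
    print(s)
```

A real generator would first select which of these facts are worth expressing and only then decide how to word them.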
5
Conceptualization
[Semantic network diagram with nodes Block, A, B, C and links Is_a, On, Right_of]
What to say?
6
What to say? How to say?
  • Natural language generation is the process of
    deliberately constructing a natural language text
    in order to meet specified communicative goals.
  • McDonald 1992

7
Some of the Applications
  • Machine Translation
  • Question Answering
  • Dialogue Systems
  • Text Summarization
  • Report Generation

8
Thought / Concept → Expression
  • Objective: produce understandable and appropriate texts in
    human languages
  • Input: some underlying non-linguistic representation of
    information
  • Knowledge sources required: knowledge of language and of the
    domain

9
Involved Expertise
  • Knowledge of Domain: what to say, relevance
  • Knowledge of Language: lexicon, grammar, semantics
  • Strategic Rhetorical Knowledge: how to achieve goals, text
    types, style
  • Sociolinguistic and Psychological Factors: habits and
    constraints of the end user as an information processor

10
Asking for a pen
  • have(X, z)
  • not have(Y, z)
  • want(have(Y, z))
  • ask(give(X, z, Y))
  • Could you please give me a pen?

Situation, Goal: Why?
Conceptualization: What?
Expression: How?
11
Some Examples
12
Example System 1: FoG
  • Function: produces textual weather reports in English and
    French
  • Input: graphical/numerical weather depiction
  • User: Environment Canada (Canadian Weather Service)
  • Developer: CoGenTex
  • Status: fielded, in operational use since 1992

13
FoG Input
14
FoG Output
15
Example System 2: STOP
  • Function: produces a personalised smoking-cessation leaflet
  • Input: questionnaire about smoking attitudes, beliefs,
    history
  • User: NHS (British Health Service)
  • Developer: University of Aberdeen
  • Status: undergoing clinical evaluation to determine its
    effectiveness

16
STOP Input
17
STOP Output
  • Dear Ms Cameron
  • Thank you for taking the trouble to return the
    smoking questionnaire that we sent you. It
    appears from your answers that although you're
    not planning to stop smoking in the near future,
    you would like to stop if it was easy. You think
    it would be difficult to stop because smoking
    helps you cope with stress, it is something to do
    when you are bored, and smoking stops you putting
    on weight. However, you have reasons to be
    confident of success if you did try to stop, and
    there are ways of coping with the difficulties.

18
Approaches
19
Template-based generation
  • Most common technique
  • In simplest form, words fill in slots
  • The train from <Source> to <Destination> will leave
    platform <number> at <time> hours
  • Most common sort of NLG found in commercial
    systems
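
A minimal sketch of slot filling, using Python's standard string.Template. The template wording follows the slide; the slot names and the sample values (Kharagpur, Howrah, platform 3, 1430) are made up for illustration.

```python
from string import Template

# The slide's train announcement, with named slots.
TRAIN_TEMPLATE = Template(
    "The train from $source to $destination will leave "
    "platform $platform at $time hours."
)


def announce(source, destination, platform, time):
    """Fill every slot; substitute() raises KeyError if one is missing."""
    return TRAIN_TEMPLATE.substitute(
        source=source, destination=destination, platform=platform, time=time
    )


# Illustrative values only.
message = announce("Kharagpur", "Howrah", 3, "1430")
print(message)
```

The sketch also shows the limitation: every announcement has exactly the same shape, and any new kind of message needs a new hand-written template.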

20
Pros and Cons
  • Pros
  • Conceptually simple
  • No specialized knowledge needed
  • Can be tailored to a domain with good performance
  • Cons
  • Not general
  • No variation in style: monotonous
  • Not scalable

21
Modern Approaches
  • Rule Based approach
  • Machine Learning Approach

22
Some Critical Issues
23
Context Sensitivity in Connected Sentences
  • X-town was a blooming city. Yet, when the
    hooligans started to invade the place, __________
    . The place was not livable any more.
  • the place was abandoned by its population
  • the place was abandoned by them
  • the city was abandoned by its population
  • it was abandoned by its population
  • its population abandoned it.

24
Referencing
  • John is Jane's friend. He loves to swim with
    his dog in the pool. It is really lovely.
  • I am taking the Shatabdi Express tomorrow. It
    is a much better train than the Rajdhani Express.
    It has a nice restaurant car, while the other has
    nice seats.

25
Referencing
  • John stole the book from Mary, but he was caught.
  • John stole the book from Mary, but the fool was
    caught.

26
Aggregation
  • The dress was cheap.
  • The dress was beautiful.
  • The dress was cheap and beautiful
  • The dress was cheap yet beautiful
  • I found the boy. The boy was lost.
  • I found the boy who was lost
  • I found the lost boy.
  • Sita bought a story book. Geeta bought a story
    book.
  • ???? Sita and Geeta bought a story book.
  • ???? Sita bought a story book and Geeta also
    bought a story book

27
Choice of words (Lexicalization)
  • The bus was in time. The journey was fine. The
    seats were bad.
  • The bus was in perfect time. The journey was
    fantastic. The seats were awful.
  • The bus was in perfect time. The journey was
    fantastic. However, the seats were not that good.

28
General Architecture
29
Component Tasks in NLG
  • Content Planning (Macroplanner)
  • Document Structuring
  • Sentence Planner (Microplanning)
  • Aggregation, Lexicalization, Referring Expression Generation
  • Surface Form Realization
  • Linguistic Realization, Structure Realization

30
A Pipelined Architecture
Microplanning → Text Specification → Surface Realization
31
An Example
  • Consider two assertions:
  • has(Hotel_Bliss, food(bad))
  • has(Hotel_Bliss, ambience(good))
  • Content Planning selects information ordering:
  • Hotel Bliss has bad food but its ambience is
    good
  • Hotel Bliss has good ambience but its food is
    bad

32
has (Hotel_Bliss, food (bad))
  • Sentence Planning
  • choose syntactic templates
  • choose lexicon
  • bad or awful
  • food or cuisine
  • good or excellent
  • Aggregate the two propositions
  • Generate referring expressions
  • It or this restaurant
  • Ordering
  • A big red ball OR A red big ball

[Diagram: syntactic template for Have, with labels Entity, Feature, Modifier, Subj, Obj]
33
  • Realization
  • correct verb inflection: Have → Has
  • may require noun inflection (not in this
    case)
  • Articles required? Where?
  • Conversion into final string
  • Capitalization and Punctuation
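
The three stages of the Hotel Bliss example can be sketched as three small Python functions chained in a pipeline. This is an illustrative toy, not the architecture of any particular system: the function names, the dictionary used as a text specification, and the fixed "but" aggregation are all assumptions.

```python
# Input assertions from the slide, as (entity, feature, value) triples.
ASSERTIONS = [
    ("Hotel_Bliss", "food", "bad"),
    ("Hotel_Bliss", "ambience", "good"),
]


def plan_content(assertions, first_feature):
    """Content planning: order the assertions, first_feature first."""
    return sorted(assertions, key=lambda a: a[1] != first_feature)


def plan_sentence(ordered):
    """Sentence planning: aggregate the two propositions and refer to
    the entity the second time with the possessive 'its'."""
    (entity, feat1, val1), (_, feat2, val2) = ordered
    return {
        "subject": entity.replace("_", " "),
        "verb": "have",
        "object": f"{val1} {feat1}",
        "contrast": f"its {feat2} is {val2}",
    }


def realize(spec):
    """Realization: inflect the verb for a third-person singular
    subject, linearize, and add final punctuation."""
    verb = {"have": "has"}[spec["verb"]]
    return f"{spec['subject']} {verb} {spec['object']} but {spec['contrast']}."


food_first = realize(plan_sentence(plan_content(ASSERTIONS, "food")))
ambience_first = realize(plan_sentence(plan_content(ASSERTIONS, "ambience")))
print(food_first)
print(ambience_first)
```

Changing only the content-planning decision (which feature comes first) yields the two orderings shown on the earlier slide, while the later stages stay untouched.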

34
Content Planning
  • What to say
  • Data collection
  • Making domain specific inferences
  • Content selection
  • Proposition formulation
  • Each proposition → a clause
  • Text structuring
  • Sequential ordering of propositions
  • Specifying Rhetorical Relations

35
Content Planning Approaches
  • Schema based (McKeown 1985)
  • Specify what information, in which order
  • The schema is traversed to generate discourse
    plan
  • Application of operators (similar to Rule Based
    approach) (Hovy 93)
  • The discourse plan is generated dynamically
  • Output is Content Plan Tree

36
Discourse
[Content plan tree diagram with labels: Detailed view, Group nodes, Demograph, Summary, Name, Age, Care, Blood Sugar]
37
Content Plan
  • Plan Tree Generation
  • Ordering of Group nodes
  • Propositions
  • Rhetorical relations between leaf nodes
  • Paragraph and sentence boundaries

38
Rhetorical Relations
[RST tree diagram with relations ENABLEMENT, MOTIVATION, MOTIVATION, EVIDENCE over the text spans "You should ...", "I'm in ...", "You can get ...", "The show ...", "It got a ..."]
39
Rhetorical Relations
  • Three basic rhetorical relationships
  • SEQUENCE
  • ELABORATION
  • CONTRAST
  • Others like
  • Justification
  • Inference

40
Nucleus and Satellites
[RST diagram with relations Contrast and Elaboration over the spans "I drive my Maruti 800", "I love to collect classic cars", "My favourite car is Toyota Innova"; N marks the nucleus]
41
Target Text
  • The month was cooler and drier than average, with
    the average number of rain days, but the total
    rain for the year so far is well below average.
    Although there was rain on every day for 8 days
    from 11th to 18th, rainfall amounts were mostly
    small.

42
Document Structuring in WeatherReporter
  • The Message Set
  • MonthlyTempMsg ("cooler than average")
  • MonthlyRainfallMsg ("drier than average")
  • RainyDaysMsg ("average number of rain days")
  • RainSoFarMsg ("well below average")
  • RainSpellMsg ("8 days from 11th to 18th")
  • RainAmountsMsg ("amounts mostly small")

43
Document Structuring in WeatherReporter
[Document plan tree diagram; only the node label MonthlyTmpMsg survives]
44
Some Common RST Relationships
  • Elaboration: the satellite presents more details
    about the content of the nucleus
  • Contrast: the nuclei present things which are
    similar in some respects but different in some
    other relevant way.
  • Multinuclear: no distinction between N and S
  • Purpose: S presents the goal of performing the
    activity presented in the nucleus
  • Condition: S presents something that must occur
    before the situation presented in N can occur
  • Result: N results from S

45
Planning Approach
[Plan tree diagram for the goal "Save Document". Actions: Choose Save option, Select Folder, Type Filename, Click Save Button. States: A dialog box displayed, Dialog box closed, The system saves the document]
46
Planning Operator
  • Name: Expand Purpose
  • Effect
  • (COMPETENT hearer (DO-ACTION ?action))
  • Constraints
  • (AND (get_all_substeps ?action ?subaction)
  •      (NOT (singular list ?subaction)))
  • Nucleus
  • (COMPETENT hearer (DO-SEQUENCE ?subaction))
  • Satellite
  • ((RST-PURPOSE (INFORM hearer (DO ?action))))

47
Expand Subactions
  • Effect
  • (COMPETENT hearer (DO-SEQUENCE ?actions))
  • Constraints
  • NIL
  • Nucleus
  • (for each ?actions (RST-SEQUENCE
  • (COMPETENT hearer (DO-ACTION ?actions))))
  • Satellites
  • NIL
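
The two operators can be approximated in Python as functions that build a small discourse-plan tree. This is a loose sketch under stated assumptions: the SUBSTEPS task model, the dictionary-based tree, and the render function are invented for illustration, and the state descriptions (dialog box opening, document saved) are left out.

```python
# Hypothetical task model, loosely following the "Save Document" slide.
SUBSTEPS = {
    "save the document": [
        "choose the Save option",
        "select the folder",
        "type the file name",
        "click the Save button",
    ],
}


def expand_subactions(actions):
    """Expand Subactions: one SEQUENCE node over the individual actions."""
    return {"relation": "SEQUENCE", "children": list(actions)}


def expand_purpose(action):
    """Expand Purpose: applies only if the action has several substeps.
    Nucleus: the sequence of substeps. Satellite: the action's purpose."""
    substeps = SUBSTEPS.get(action, [])
    if len(substeps) <= 1:
        return None  # constraint not satisfied, operator does not apply
    return {
        "relation": "PURPOSE",
        "satellite": action,
        "nucleus": expand_subactions(substeps),
    }


def render(plan):
    """Linearize the plan as a purpose line plus numbered steps."""
    lines = [f"To {plan['satellite']}:"]
    for i, step in enumerate(plan["nucleus"]["children"], start=1):
        lines.append(f"{i}. {step[0].upper()}{step[1:]}")
    return "\n".join(lines)


plan = expand_purpose("save the document")
instructions = render(plan)
print(instructions)
```

Because the plan is built by applying operators to the goal, a different task model produces a different discourse plan without any change to the operators themselves.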

48
[Plan tree diagram with relations Purpose, Sequence, Result and nodes Choose Save, Dialog Box Opens, Choose Folder]
49
Discourse
  • To save a file
  • 1. Choose save option from file menu
  • A dialog box will appear
  • 2. Choose the folder
  • 3. Type the file name
  • 4. Click the Save button
  • The system will save the document

50
Example Content Plan Tree
51
Rhetorical Relations Difficult to infer
  • John abused the duck
  • The duck buzzed John
  • John abused the duck that had buzzed him
  • The duck buzzed John who had abused it
  • The duck buzzed John and he abused it
  • John abused the duck and it buzzed him

52
On Clause Aggregation
53
Benefits of Aggregation
  • Conciseness
  • Same information with fewer words
  • Cohesion
  • We want a semantic unit not a jumble of
    disconnected phrases
  • Fluency
  • Less effort to read
  • Unambiguous and according to communication conventions

54
Complex interactions
  • Aggregation adds to fluency
  • The patient was admitted on Monday and released
    on Friday.
  • Someone ate apples. Someone ate oranges
  • Someone, who ate apples also ate oranges

55
Aggregation Operators
Category | Operators | Resources | Surface markers
Interpretive | Summarization, Inference | Common sense knowledge, Ontology |
Referential | Ref. expr. generation, Quantified expression | Ontology, Discourse | Each, all, both, some
Syntactic | Paratactic, Hypotactic | Syntactic rules, Lexicon | And, with, who, which
Lexical | Paraphrasing | Lexicon |
56
Interpretive
  • John punched Mary
  • Mary kicked John
  • John kicked Mary
  • => John fought with Mary
  • Not always meaning preserving
  • Note use of Ontology
  • John kicked Mary. John punched Mary.
  • => John fights with Mary

57
Referential Aggregation
  • Reference Expression generation
  • The patient is Mary. (name)
  • The patient is female. (gender)
  • The patient is 80 years old. (age)
  • The patient has hypertension. (med. history)
  • The patient is Mary. She is an 80 year old
    female. She has hypertension.
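
As a rough sketch of the idea, the attribute-by-attribute sentences can be folded into three sentences that mention the name once and use a pronoun afterwards. The record layout, the pronoun table, and the helper article_for_age are assumptions made for this example; the article rule is only meant for plausible human ages.

```python
# Attribute-value record corresponding to the slide's four sentences.
patient = {
    "name": "Mary",
    "gender": "female",
    "age": 80,
    "history": "hypertension",
}

# Assumed pronoun table.
PRONOUNS = {"female": "She", "male": "He"}


def article_for_age(age):
    """'an' before eight, eleven, eighteen and eighty-something."""
    s = str(age)
    return "an" if s.startswith("8") or s in ("11", "18") else "a"


def describe_patient(rec):
    """Introduce the patient by name, then refer back with a pronoun."""
    pronoun = PRONOUNS[rec["gender"]]
    age_np = f"{article_for_age(rec['age'])} {rec['age']} year old {rec['gender']}"
    return " ".join([
        f"The patient is {rec['name']}.",
        f"{pronoun} is {age_np}.",
        f"{pronoun} has {rec['history']}.",
    ])


text = describe_patient(patient)
print(text)
```

How many attributes to pack into one sentence is fixed by hand here; that is exactly the open question the slide raises.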

How much info in one sentence?
58
Reference (Quantification)
  • John is doing well
  • Mary is doing well → All the patients are
    doing well
  • Note the use of background knowledge
  • The patient's left arm
  • The patient's right arm → Each arm
  • Note the use of Ontology

59
Syntactic Aggregation
  • Paratactic: entities are of equal syntactic
    status
  • Ram likes Sita and Geeta
  • Main operator is co-ordinating conjunction
  • Hypotactic: unequal status
  • NP modified by a PP
  • Ram likes Sita, who is a nurse

60
Lexical Aggregation
  • In hypotactic aggregation, the satellite
    propositions are modified.
  • The Maths score was 99.8
  • 99.8 is a record high score
  • The Maths score was 99.8, a record high score
    (apposition modification)
  • The Maths score was a record high score of 99.8
  • A dog used by police → A police dog
  • Rise sharply → shoot
  • Drop sharply → plunge

61
Rhetorical Relations and Hypotactics
  • Use of cue operators
  • RR: Concession
  • He was fine. He just had an accident.
  • Although he had an accident he was fine
  • RR: Evidence
  • My car is not Indian. My car is a Toyota.
  • My car is not Indian because it is a Toyota
  • RR: Elaboration
  • My car is not Indian. My car is expensive.
  • My expensive car is not Indian

62
Hypotactic Operators
  • If propositions do not share any common entity,
    the operator can simply join using cue phrase
  • N: Tom is feeling cold. S: The window is open.
    (Cause)
  • Tom is feeling cold because the window is open
  • If the linked propositions share common entities,
    the internals of the linked propositions undergo
    modifications
  • N: The child stopped hunger. S: The child ate an
    apple. (Purpose)
  • To stop hunger, the child ate an apple.

63
  • Two stage transformation
  • RR: Evidence
  • N: Tom was hungry
  • S: Tom did not eat dinner
  • Replace Tom in N by he
  • Apply Rule 1
  • Because Tom did not eat dinner, he was hungry
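
Both cases, joining with a cue phrase when no entity is shared and the two-stage transformation when one is, can be sketched with plain string operations. The function names are invented, clauses are passed without their final period, and the slides do not say what "Rule 1" is, so reading it as "front the satellite with a cue word" is an assumption.

```python
def join_with_cue(nucleus, satellite, cue="because"):
    """No shared entity: keep both clauses intact and link them with a
    cue phrase, lower-casing the first letter of the satellite."""
    return f"{nucleus} {cue} {satellite[0].lower()}{satellite[1:]}."


def evidence_with_pronoun(nucleus, satellite, entity, pronoun):
    """Shared entity, two stages:
    1. replace the first mention of the entity in the nucleus by a pronoun;
    2. front the satellite with the cue word 'Because'."""
    reduced = nucleus.replace(entity, pronoun, 1)
    return f"Because {satellite}, {reduced}."


cause = join_with_cue("Tom is feeling cold", "The window is open")
evidence = evidence_with_pronoun(
    "Tom was hungry", "Tom did not eat dinner", "Tom", "he"
)
print(cause)
print(evidence)
```

String replacement stands in for a real referring-expression step; it would fail on anything beyond these simple clauses.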

64
Corpus study to Rules
Example: RR Purpose. N: Lift the cover. S: Install battery.
Construction | Corpus figure | Example
To-infinitive | 59.6 | To install battery, lift the cover
For-nominalization | 7.5 | Lift the cover for battery installation
For-gerund | 2.5 | Lift the cover for installing battery
By-purpose | 10 | Install battery by lifting cover
So-that purpose | 8.4 | Lift cover so battery can be installed
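
One simple way to turn such corpus figures into a rule is to prefer the most frequent construction. The sketch below does only that; the figures are the ones on the slide (their unit is not stated there), and the names CORPUS_FIGURE, rank_constructions and realize_to_infinitive are assumptions for this example.

```python
# Figures from the slide for realizing a Purpose relation.
CORPUS_FIGURE = {
    "to-infinitive": 59.6,
    "for-nominalization": 7.5,
    "for-gerund": 2.5,
    "by-purpose": 10.0,
    "so-that": 8.4,
}


def rank_constructions(figures):
    """Order constructions from most to least frequent."""
    return sorted(figures, key=figures.get, reverse=True)


def realize_to_infinitive(nucleus, satellite):
    """'To <satellite>, <nucleus>', with both clauses lower-cased at
    the start. The other constructions need morphological changes
    (gerunds, nominalizations) and are not attempted here."""
    s = satellite[0].lower() + satellite[1:]
    n = nucleus[0].lower() + nucleus[1:]
    return f"To {s}, {n}"


ranking = rank_constructions(CORPUS_FIGURE)
sentence = realize_to_infinitive("Lift the cover", "Install battery")
print(ranking)
print(sentence)
```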
65
Syntactic constructions for realizing Elaboration relations
Construction | Verbosity | M-direction | Examples
R-Clause | Short | Before | An apple which weighs 3 oz
Reduced R-Clause | Shorter | Before | An apple weighing 3 oz
PP | Shorter | Before | An apple in the basket
Apposition | Shortest | Before | An apple, a small fruit
Prenominalization | Shortest | After | A 3 oz apple
Adjective | Shortest | After | A dark red apple
66
Lexical Constraints
  • Except for R-clause and Reduced R-clause,
    transforming a proposition into an apposition, an
    adjective or a PP requires that the satellite
    proposition be of a specific syntactic type (a
    noun, an adj or a PP respectively).
  • N: Jack is a runner.
  • S: Jack is fast.
  • Jack is a fast runner
  • Fast and runner have a possible qualifying
    relationship.
  • Qualia Structure (Pustejovsky 91)

67
Constraints
  • Linear Ordering
  • Paratactic
  • Years 1998, 1999 and 2000
  • Not: Years 1999, 1998 and 2000
  • Hypotactic
  • Uncommon orderings between premodifiers create
    disfluencies
  • A happy old man vs. An old happy man

68
Linear Ordering and Scope of Modifiers
  • Problem when multiple modifiers modify the same
    noun
  • Decide the order
  • Avoid ambiguity
  • Ms. Jones is a patient of Dr. Smith, undergoing
    heart surgery
  • Old men and women should board first
  • Women and old men should board first

69
Linear Ordering of Modifiers
  • A simplex NP is a maximal noun phrase that
    includes pre-modifiers such as determiners and
    possessives, but not post-nominals such as PPs
    and R-Cls.
  • A POS tagger along with a finite-state (FS) grammar can be
    used to extract simplex NPs.
  • A morphology module transforms plurals of nouns,
    comparative and superlative adjectives into their
    base form for frequency count.
  • Regular expression filter to remove
    concatenations of NPs
  • Takeover bid last week
  • Metformin 500 milligrams

70
Three stages of subsequent analysis
  • Direct Evidence
  • Modifier sequences are transformed into ordered
    pairs
  • Well known traditional brand name drug
  • Well known < traditional
  • Well known < brand name
  • traditional < brand name
  • Three possibilities
  • A < B, B < A, or no order between A and B

71
  • For n modifiers: nC2 ordered pairs
  • Form a w × w matrix, where w is the number of
    distinct modifiers.
  • Find Count[A,B] and Count[B,A]
  • For a small corpus, a binomial distribution of one
    following the other is observed.
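
The pair-counting step can be sketched with the standard library. The first modifier sequence is the slide's example; the second is a made-up extra observation so that the counts differ. The decision rule here is a plain majority comparison, which is a simplification: the slide mentions a binomial model for small corpora, and that test is not implemented.

```python
from collections import Counter
from itertools import combinations

SEQUENCES = [
    ["well known", "traditional", "brand name"],  # from the slide
    ["traditional", "brand name"],                # made-up observation
]


def ordered_pairs(modifiers):
    """All (earlier, later) pairs of one sequence: n choose 2 pairs."""
    return list(combinations(modifiers, 2))


def count_pairs(sequences):
    """Sparse version of the w x w matrix: Count[(A, B)]."""
    counts = Counter()
    for seq in sequences:
        counts.update(ordered_pairs(seq))
    return counts


def direct_order(counts, a, b):
    """'<' if a precedes b more often, '>' if the reverse, else no order."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    if ab > ba:
        return "<"
    if ba > ab:
        return ">"
    return "no order"


counts = count_pairs(SEQUENCES)
print(ordered_pairs(SEQUENCES[0]))
print(direct_order(counts, "well known", "traditional"))
```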

72
  • Transitivity
  • Again from corpus
  • A < B and B < C → A < C
  • Long, boring and strenuous stretch
  • Long strenuous lecture
  • Clustering: formation of equivalence classes of
    words with same ordering with respect to other
    premodifiers
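
Transitivity amounts to taking the transitive closure of the observed ordering pairs. A small sketch, assuming the pairs are stored as (earlier, later) tuples; the input pairs are read off the slide's "long, boring and strenuous" example, and the clustering stage is not covered.

```python
def transitive_closure(pairs):
    """Add (a, d) whenever (a, b) and (b, d) are already known.
    Repeats until nothing new can be inferred."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Orderings taken as directly observed in this toy example.
observed = {("long", "boring"), ("boring", "strenuous")}
closure = transitive_closure(observed)
print(sorted(closure))
```

The inferred pair ("long", "strenuous") matches the order seen in "long strenuous lecture".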

73
  • John is a 74 year old hypertensive diabetic white
    male patient with a swollen mass in the left
    groin
  • John is a diabetic male white 74 year old
    hypertensive patient with a red swollen mass in
    the left groin

74
Other Constraints
  • For conjunctions
  • John ate an apple and an orange (NP and NP)
  • John ate in the morning and in the evening (PP
    and PP)
  • X John ate an apple and in the evening (NP and
    PP)
  • Moral: Same syntactic category?
  • John and a hammer broke the window ???
  • He is Nobel Prize winner and at the peak of his
    career.
  • Others: Adj phrase attachment, PP attachment etc.

75
Conjunctions
76
Three interesting types
  • John ate fish on Monday and rice on Tuesday
    (non-constituent coordination)
  • John ate fish and Bill rice (gapping)
  • Right node raising
  • John caught and Mary killed the spider

77
A Naïve Algorithm
  • Group propositions and order them according to
    similarities
  • 1. I sold English books on Monday
  • 2. I sold Hindi books on Wednesday
  • 3. I sold onion on Monday
  • 4. I sold Bengali books on Monday
  • ((1,3,4),2) OR ((1,4),3,2) OR ...

78
  • 2. Identify recurring elements
  • 3. Determine sentence boundary
  • 4. Delete redundant elements
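
A minimal sketch of the naïve algorithm on the four "I sold ..." propositions. It assumes each proposition is already split into (subject, verb, object, day), groups those that share everything except the object, and deletes the repeated elements by conjoining the objects. The tuple layout and function names are assumptions; the grouping chosen corresponds to ((1,3,4),2).

```python
# The four propositions from the slide, in their original order.
PROPOSITIONS = [
    ("I", "sold", "English books", "Monday"),
    ("I", "sold", "Hindi books", "Wednesday"),
    ("I", "sold", "onion", "Monday"),
    ("I", "sold", "Bengali books", "Monday"),
]


def conjoin(items):
    """'a', 'a and b', or 'a, b and c'."""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def aggregate(props):
    """Steps 1-2: group propositions that share subject, verb and day.
    Steps 3-4: one sentence per group, recurring elements stated once."""
    groups = {}
    for subj, verb, obj, day in props:
        groups.setdefault((subj, verb, day), []).append(obj)
    return [
        f"{subj} {verb} {conjoin(objs)} on {day}."
        for (subj, verb, day), objs in groups.items()
    ]


sentences = aggregate(PROPOSITIONS)
for s in sentences:
    print(s)
```

The grouping is purely structural, so nothing stops it from conjoining elements that should not be conjoined on semantic grounds.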

79
Still Funny Scenarios
  • The baker baked. The bread baked.
  • → The baker and the bread baked.
  • I don't drink. I don't chew tobacco.
  • → I don't drink and chew tobacco.
  • What should the constraints be?

80
Morphological Synthesis
  • Inflections depending on tense, aspect, mood,
    case, gender, number, person and familiarity.
  • A typical Bengali verb has 63 different inflected
    forms (120 if we consider the causative
    derivations)
  • Exceptions

81
Synthesis Approach
  • Classification of words based on syllable
    structure: 19 classes for Bengali verbs
  • Paradigm tables for each of the classes
  • Table-driven modification of the words
  • Exceptions treated separately.

82

Noun Morphology Synthesis
  • Different rules are used to inflect qualifiers
    and headwords
  • The rule to inflect proper noun as a headword in
    a particular SSU
  • IF (headword type = proper noun AND the SSU to
    which the headword belongs = kAke AND the last
    character of root word = a),
  • THEN
  • Rule1: headword = headword + ke
  • rAma → rAmake
  • IF (Verb1 = verb2 AND the Conjunction = Ebong
    AND SSU2 to which the headword belongs =
    kakhana AND the last character of root word =
    a)
  • THEN
  • Rule1: headword = headword - a.
  • Rule2: headword = headword + o.
  • Aaem gfkal bl /K/leClam ybL Aajo /Klb.
  • Headword: Aaj + o
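
The first rule can be written as a small condition-action function. This is a sketch only: the function name, the argument names, and the string labels for word type and SSU are assumptions, and the words are in the same Roman transliteration the slide uses.

```python
def inflect_headword(root, word_type, ssu_type):
    """Rule 1 from the slide: a proper-noun headword in a 'kAke' SSU
    whose root ends in 'a' takes the suffix 'ke'. Any other case is
    returned unchanged here, standing in for the remaining rules."""
    if word_type == "proper_noun" and ssu_type == "kAke" and root.endswith("a"):
        return root + "ke"
    return root


inflected = inflect_headword("rAma", "proper_noun", "kAke")
print(inflected)
```

A fuller synthesizer would hold many such rules, plus the paradigm tables and exception lists mentioned on the earlier slides.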

83

Verb Morphology Synthesis
  • Depends upon TAM (tense, aspect, mood) option. Category
    Identification, Table Lookup
  • Category Identification: structure of root verb
    X VC, where X = any character, V = vowel,
    C = consonant and Ø, a, A, oYA.

84
  • Table Look Up
  • The Table Lookup Stage
  • i) Pr → Present
  • ii) Pa → Past
  • iii) Sim → Simple
  • iv) Per → Perfect
  • v) Co → Continuous
  • vi) Ind → Indicative
  • vii) Neg → Negation.

85
Questions?