Causality Knowledge Extraction based on A Single Sentence from Thai Textual Data

1
Causality Knowledge Extraction based on A Single
Sentence from Thai Textual Data
  • Chaveevan Pechsiri
  • Dhurakij Pundit University
  • Assoc. Prof. Dr. Asanee Kawtrakul
  • NAiST Laboratory, Kasetsart University
  • SNLP 2007
  • 14 December, 2007

2
Outline
  • Motivation
  • Introduction
  • Related work
  • Crucial Problems
  • System Overview
  • Evaluation
  • Conclusion

3
Motivation
  • Most knowledge is spread throughout text.
  • Instead of reading a huge number of reports, we need
    an automatic knowledge extraction system that gains
    causality knowledge from text for problem diagnosis,
    decision support, or question answering systems.

4
Introduction
  • What is knowledge?
  • Knowledge is the awareness and understanding of
    facts, truths or information gained in the form
    of experience or learning. (Wikipedia
    encyclopedia, 2006)
  • The information, understanding, and skills that
    you gain through education or experience (Oxford
    Advanced Learner's Dictionary, 2000)
  • Knowledge types (Jana Trnková, Wolfgang
    Theilmann, 2004)
  • Orientation knowledge (know what a topic is
    about)
  • Action knowledge (know how)
  • Explanation knowledge (know why something is
    the way it is)
  • Reference knowledge (know where to find
    additional information).

5
Introduction
  • What is causality?
  • Causality refers to the set of all particular "causal" or
    "cause-and-effect" relations (Wikipedia
    Encyclopedia, http://en.wikipedia.org/wiki/Main_Page)
  • The relationship between something that happens
    and the reason for it happening (Oxford Advanced
    Learner's Dictionary, 2000)

6
Causality Knowledge
  • Inter-causal EDU (20)
  • If aphids suck sap from a plant, the leaves will turn
    yellow and the flowers will start to drop.
  • Plant leaves shrink because the aphids destroy
    the plant.
  • Intra-causal EDU (7)
  • Earthquake generates tsunami. (NP1 V NP2)
  • Bird flu is caused by virus H5N1. (NP1 cue NP2)
  • Leaves have black spots from bacteria.
    (NP1 V NP2 Prep NP3)

7
Related Work
8
Causal Verb (linking verb)
9
Coordinating conjunction
10
Example of using a causal verb in a lexico-syntactic
pattern
  • Earthquake generates tsunami. (causality)
  • Fungus in peanut produces alpha toxin. (causality)
  • A manufacturer produces clothes. (non causality)

11
Crucial Problems
  • How to identify causality within one sentence
  • Implicit noun phrase as zero anaphora

12
How to identify causality
  • By using causal verb (linking verb)
  • List of causal verbs from Girju, 2002 (1)
  • Fungus in peanut produces alpha toxin.
  • Cue phrase set (Chang and Choi, 2004) (2)
  • Bird flu is caused by virus H5N1.
  • General verb + information + preposition phrase
  • Verb + preposition phrase

13
Causal Verb
  • General verb + information + preposition phrase
  • General verb: be, have, get
  • Information: scar, spot, mark, scratch, defect,
    disease, ...
  • Preposition: from, with
  • NP1 Verb NP2 Prep NP3
  • For example
  • be + disease = "get disease"
  • A kid gets disease from virus H5N1.

14
How to identify causality
  • NP1 Verb NP2 Prep NP3
  • Ex. 1. Plant / is / disease / from / fungi
  • 2. Disease / occurs / from / virus
  • 3. Kid / dies / with / the Bird flu disease
  • 4. Kid / gets / disease / from / touching the
    infected chicken
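
The sentence pattern in these examples can be sketched as a small matcher over a pre-chunked EDU. This is only an illustration: the chunk format (a list of (tag, text) pairs), the tag names, and the function name `match_pattern` are assumptions, and the slides work on Thai text rather than English glosses. NP2 is treated as optional because examples 2 and 3 have none.

```python
from typing import Optional

PREPOSITIONS = {"from", "with"}  # the two prepositions listed on the slides


def match_pattern(chunks: list[tuple[str, str]]) -> Optional[dict[str, Optional[str]]]:
    """Return the pattern slots if the EDU fits NP1 Verb (NP2) Prep NP3, else None."""
    tags = [tag for tag, _ in chunks]
    words = [text for _, text in chunks]
    if tags == ["NP", "V", "NP", "P", "NP"]:
        np1, verb, np2, prep, np3 = words
    elif tags == ["NP", "V", "P", "NP"]:
        np1, verb, prep, np3 = words
        np2 = None  # "Verb + preposition phrase" variant
    else:
        return None
    if prep not in PREPOSITIONS:
        return None
    return {"NP1": np1, "Verb": verb, "NP2": np2, "Prep": prep, "NP3": np3}


# Example 1 from the slide, using its English glosses.
edu = [("NP", "plant"), ("V", "is"), ("NP", "disease"), ("P", "from"), ("NP", "fungi")]
slots = match_pattern(edu)
```

A match only says that the EDU has the right shape. Whether it is causal still has to be decided from the slot values.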

15
Problems of using causal verb
  • Verb ambiguity
  • Causality
  • Plant leaf / has / brown spots / from / fungi
  • The patient / dies / with / cancer
  • Non causality
  • Plant leaf / has / brown spots / from / the leaf base
  • The patient / dies / with / suspicion

16
Zero Anaphora Problem
  • For example
  • The Bird flu disease / is / an important disease.
    Ø / occurs / from / H5N1 virus.
  • where Ø is a zero anaphor referring to the Bird flu
    disease.

17
Zero Anaphora
[Thai example text: Suchad Sarataphan, "Microfilariasis in horse", 282-291]
Ø = zero anaphora
18
System Overview
  • Input: Text
  • Corpus Preparation (WordNet, Lexitron, Plant encyclopedia)
  • Causality learning → Learnt model
  • Causality extraction → Knowledge base (cause-effect relation)
19
Corpus Preparation
  • Word segmentation (Sudprasert and Kawtrakul, 2003)
  • Named entity determination (Chanlekha and
    Kawtrakul, 2004)
  • EDU segmentation (Charoensuk et al., 2005)
  • An EDU (Elementary Discourse Unit) is the minimal
    building block of a discourse tree (Mann and
    Thompson, 1988, p. 244), e.g. a simple sentence
    or clause.

20
Corpus Preparation
  • Manual feature annotation for learning (with
    reference to WordNet, the Plant encyclopedia, and
    the Lexitron dictionary)
  • <EDU type=causality>
  • <NP1 concept=plant organ1>plant leaf</NP1>
  • <Verb=have>has</Verb>
  • <NP2 concept=symptom1>brown spots</NP2>
  • <Preposition=from>from</Preposition>
  • <NP3 concept=fungus1>fungi</NP3>
  • </EDU>
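
An annotated EDU of this kind boils down to five features plus a class label. The sketch below shows one possible in-memory form. The class name `AnnotatedEDU` and its field names are assumptions made for illustration, not the authors' data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedEDU:
    """One training example: five pattern features and the target class."""
    np1: str    # concept of NP1, e.g. "plant organ"
    verb: str   # general verb, e.g. "have"
    np2: str    # concept of NP2, e.g. "symptom"
    prep: str   # preposition, e.g. "from"
    np3: str    # concept of NP3, e.g. "fungus"
    label: str  # "causality" or "non-causality"

    def features(self) -> tuple[str, str, str, str, str]:
        """Return the five features in pattern order, without the label."""
        return (self.np1, self.verb, self.np2, self.prep, self.np3)


# The annotated example from the slide, with concepts written in English.
example = AnnotatedEDU("plant organ", "have", "symptom", "from", "fungus", "causality")
```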

21
Causality Learning
  • ID3 (Mitchell T.M., 1997)
  • SVM (Cristianini and Shawe-Taylor, 2000)

22
ID3
ID3 uses a statistical property called
information gain, defined below with the entropy
measure, to quantify how well a given attribute
A (e.g. NP1, Verb, NP2, Preposition, NP3)
separates the collected examples S according to
their target classification:
\[ Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v) \]
\[ Entropy(S) = \sum_{i=1}^{c} -p_i \log_2 p_i \]
where Values(A) is the set of values of A, S_v is
the subset of S for which A has value v, the
entropy specifies the minimum number of bits of
information needed to encode the classification
of an arbitrary member of S (Charniak E., 1993),
c is the number of different values of the target
attribute, and p_i is the proportion of S
belonging to class i.
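
Entropy and information gain can be computed directly from labelled examples. The following is a minimal sketch on a four-row toy set built from the verb-ambiguity glosses. The toy rows and the function names are assumptions for illustration, not the authors' corpus or code.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target="label"):
    """Gain(S, A) = Entropy(S) minus the weighted entropy of each subset S_v."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Toy data (illustrative only): the verb alone cannot separate the classes.
toy = [
    {"Verb": "have", "NP3": "fungus", "label": "causality"},
    {"Verb": "have", "NP3": "leaf base", "label": "non-causality"},
    {"Verb": "die", "NP3": "cancer", "label": "causality"},
    {"Verb": "die", "NP3": "suspicion", "label": "non-causality"},
]
gain_verb = information_gain(toy, "Verb")
gain_np3 = information_gain(toy, "NP3")
```

On this toy set the gain of Verb is zero while the gain of NP3 is one bit, which mirrors the verb ambiguity problem: the verb alone does not decide causality.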
23
ID3
[Decision tree figure, root attribute NP3]
  • NP3 = pathogen → prep = from → verb = be → Causality
  • NP3 = food poisoning → prep = with → verb = infect → Causality
  • NP3 = contraction → prep = from → verb = have → Causality
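
The three branches of this tree can be written as a nested lookup. This is a sketch, assuming the branch order NP3, then preposition, then verb as read from the slide. The default class for unseen combinations is an assumption, since the slide shows only causal leaves.

```python
# Each path: NP3 concept -> preposition -> verb -> class.
TREE = {
    "pathogen": {"from": {"be": "causality"}},
    "food poisoning": {"with": {"infect": "causality"}},
    "contraction": {"from": {"have": "causality"}},
}


def classify(np3, prep, verb, default="non-causality"):
    """Walk the tree; fall back to `default` (an assumption) when a branch is missing."""
    return TREE.get(np3, {}).get(prep, {}).get(verb, default)


label = classify("pathogen", "from", "be")
```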
24
ID3
  • Rule mining using the Weka tool
25
ID3
  • Rule generalization
  • Some rules share the same general concept and
    can be combined into one rule, as in the
    following example
  • R1: IF <NP1> <Verb=be> <NP2> <Prep=from>
    <NP3=fungi> THEN causality
  • R2: IF <NP1> <Verb=be> <NP2> <Prep=from>
    <NP3=bacteria> THEN causality
  • R3: IF <NP1> <Verb=be> <NP2> <Prep=from>
    <NP3=pathogen> THEN causality
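
One way to combine rules like R1 to R3 is to replace the NP3 concept with a parent concept and drop duplicates. The sketch below assumes a hand-written `HYPERNYM` table. The slides take concepts from WordNet, and the exact generalization procedure is not given, so this is only an illustration.

```python
# Hypothetical is-a links for illustration.
HYPERNYM = {"fungi": "pathogen", "bacteria": "pathogen"}


def generalize(rules):
    """Map each rule's NP3 concept to its parent concept and merge duplicates."""
    merged = set()
    for verb, prep, np3, label in rules:
        merged.add((verb, prep, HYPERNYM.get(np3, np3), label))
    return sorted(merged)


rules = [
    ("be", "from", "fungi", "causality"),     # R1
    ("be", "from", "bacteria", "causality"),  # R2
    ("be", "from", "pathogen", "causality"),  # R3
]
general = generalize(rules)
```

Under these assumptions the three rules collapse into the single pathogen rule.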

26
ID3
  • Verifying rules
  • The testing corpus of 2000 EDUs from the
    agricultural and health news domains contains 102
    EDUs of the specified sentence pattern, of which
    only 87 EDUs express causality, within 20 causal
    verb rules.

27
SVM
The following linear function f(x) of the
input x = (x_1, ..., x_n) assigns x to the positive
class if f(x) ≥ 0, and otherwise to the negative
class (f(x) < 0):
\[ f(x) = \langle w \cdot x \rangle + b = \sum_{i=1}^{n} w_i x_i + b \]
(where each x_i is one of the five features NP1, Verb,
NP2, Preposition, and NP3 of the specified
sentence pattern from the annotated corpus, w is
the weight vector, and b is the bias)
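
The decision rule can be sketched in a few lines. The one-hot encoding of the categorical features, the small vocabulary, and the weight and bias values are all assumptions chosen for illustration. They are not learnt values from the authors' corpus.

```python
def encode(edu, vocabulary):
    """One-hot encode the (slot, value) pairs of an EDU against a fixed vocabulary."""
    return [1.0 if edu.get(slot) == value else 0.0 for slot, value in vocabulary]


def f(x, w, b):
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, w, b):
    """Positive class when f(x) >= 0, negative class otherwise."""
    return "causality" if f(x, w, b) >= 0 else "non-causality"


vocabulary = [("Verb", "have"), ("Prep", "from"), ("NP3", "fungus"), ("NP3", "leaf base")]
w = [0.2, 0.3, 1.0, -1.5]  # illustrative weights, not learnt values
b = -0.6                   # illustrative bias

causal = encode({"Verb": "have", "Prep": "from", "NP3": "fungus"}, vocabulary)
non_causal = encode({"Verb": "have", "Prep": "from", "NP3": "leaf base"}, vocabulary)
```

With these made-up weights, the same verb and preposition lead to different classes depending on NP3, which is the behaviour the verb-ambiguity examples call for.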
29
Causality Extraction
  • Causality identification
  • Use causal verb rules from ID3
  • Use weight vectors with the bias from SVM
  • Solving zero anaphora
  • Using the heuristic rule (Ching-Long Yeh and
    Chris Mellish, 1997)
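
The slides cite a heuristic rule without spelling it out. The sketch below is a deliberately simplified stand-in, not that rule: it assumes each EDU is a dict with an optional `subject` and fills a missing subject with the most recent explicit one.

```python
def resolve_zero_anaphora(edus):
    """Fill a missing subject (None) with the most recent explicit subject."""
    resolved = []
    last_subject = None
    for edu in edus:
        if edu["subject"] is not None:
            last_subject = edu["subject"]
        # Copy the EDU so the input list is left unchanged.
        resolved.append({**edu, "subject": last_subject})
    return resolved


# The bird flu example, using English glosses.
edus = [
    {"subject": "bird flu disease", "verb": "is", "rest": "an important disease"},
    {"subject": None, "verb": "occurs", "rest": "from H5N1 virus"},
]
resolved = resolve_zero_anaphora(edus)
```

After resolution the second EDU has an explicit NP1, so it can be matched against the causal verb pattern.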

30
Evaluation
  • 2000 EDUs from agricultural and health news are
    used for training and 2000 EDUs for testing,
    evaluated by precision and recall.
  • The result is then evaluated by experts with max
    win voting.
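
The evaluation can be sketched as follows. Two things are assumptions here: that "max win voting" means a simple majority vote among the experts, and the toy votes and predictions, which are invented for illustration and are not the reported results.

```python
from collections import Counter


def majority_vote(judgements):
    """Return the label chosen by most experts (assumes an odd number of experts)."""
    return Counter(judgements).most_common(1)[0][0]


def precision_recall(predicted, gold, positive="causality"):
    """Precision and recall of the positive class."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy example: three experts judge four EDUs.
expert_votes = [
    ["causality", "causality", "non-causality"],
    ["non-causality", "non-causality", "non-causality"],
    ["causality", "causality", "causality"],
    ["causality", "non-causality", "causality"],
]
gold = [majority_vote(votes) for votes in expert_votes]
predicted = ["causality", "causality", "non-causality", "causality"]
precision, recall = precision_recall(predicted, gold)
```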

31
Evaluation
32
Discussion
  • The precision of extraction with SVM is higher
    than with ID3 because ID3 is based on feature
    occurrences, which do not affect SVM.
  • The recall of 73% can be increased if we use a
    larger corpus.

33
Conclusion
  • Our model will be very beneficial for causal
    question answering and causal generalization for
    knowledge discovery.

34
Future work
  • Knowledge generalization

35
References
  • Daniel Marcu. 1997. The Rhetorical Parsing of
    Natural Language Texts. In Proc. of the 35th
    Annual Meeting of the Association for
    Computational Linguistics (ACL97/EACL97),
    Madrid, Spain.
  • Roxana Girju and Dan Moldovan. 2002. Mining
    Answers for Question Answering. In Proc. of AAAI
    Symposium on Mining Answers from Texts and
    Knowledge Bases.
  • Du-Seong Chang and Key-Sun Choi. 2004. Causal
    Relation Extraction Using Cue Phrase and Lexical
    Pair Probabilities. IJCNLP 2004, Hainan Island,
    China.
  • Takashi Inui, K. Inui and Y. Matsumoto. 2004.
    Acquiring causal knowledge from text using the
    connective markers. Journal of the Information
    Processing Society of Japan, 45(3).
  • Jirawan Charoensuk, Tana Sukvakree and Asanee
    Kawtrakul. 2005. Elementary Discourse Unit
    Segmentation for Thai using Discourse Cue and
    Syntactic Information. NCSEC 2005, Thailand.

36
Thank you