Title: Causality Knowledge Extraction based on A Single Sentence from Thai Textual Data
1Causality Knowledge Extraction based on A Single
Sentence from Thai Textual Data
- Chaveevan Pechsiri
- Dhurakij Pundit University
- Assoc. Prof. Dr. Asanee Kawtrakul
- NAiST Laboratory, Kasetsart University
- SNLP 2007
- 14 December, 2007
2Outline
- Motivation
- Introduction
- Related work
- Crucial Problems
- System Overview
- Evaluation
- Conclusion
3Motivation
- Most knowledge is spread throughout text.
- Instead of reading a huge number of reports, we need an automatic
knowledge extraction system that gathers causality knowledge from
text for problem diagnosis, decision support, or question
answering systems.
4Introduction
- What is knowledge?
- Knowledge is the awareness and understanding of
facts, truths or information gained in the form
of experience or learning. (Wikipedia
encyclopedia, 2006)
- The information, understanding, and skills that
you gain through education or experience (Oxford
Advanced Learner's Dictionary, 2000)
- Knowledge types (Jana Trnková, Wolfgang
Theilmann, 2004)
- Orientation knowledge (know what a topic is
about)
- Action knowledge (know how)
- Explanation knowledge (know why something is
the way it is)
- Reference knowledge (know where to find
additional information)
5Introduction
- What is causality?
- Causality refers to the set of all particular "causal" or
"cause-and-effect" relations (Wikipedia
Encyclopedia, http://en.wikipedia.org/wiki/Main_Page)
- The relationship between something that happens
and the reason for it happening (Oxford Advanced
Learner's Dictionary, 2000)
6Causality Knowledge
- Inter-causal EDU (20)
- If aphids suck sap from a plant, the leaves will turn
yellow and the flowers start to drop.
- Plant leaves shrink because the aphids destroy
the plant.
- Intra-causal EDU (7)
- Earthquake generates Tsunami. (NP1 V NP2)
- Bird Flu is caused by virus H5N1. (NP1 cue NP2)
- Leaves have black spots from bacteria. (NP1 V NP2 Prep NP3)
7Related Work
8Causal Verb (linking verb)
9Coordinating conjunction
10Example of using a causal verb in a lexico-syntactic
pattern
- Earthquake generates Tsunami. (causality)
- Fungus in peanut produces alpha toxin. (causality)
- A manufacturer produces clothes. (non causality)
11Crucial Problems
- How to identify causality within one sentence
- Implicit noun phrase as zero anaphora
12How to identify causality
- By using a causal verb (linking verb)
- List of causal verbs from Girju, 2002 (1)
- Fungus in peanut produces alpha toxin.
- Cue phrase set (Chang and Choi, 2004) (2)
- Bird Flu is caused by virus H5N1.
- General verb + information + preposition phrase
- Verb + preposition phrase
13Causal Verb
- General verb + information + preposition phrase
- General verb: be, have, get
- Information: scar, spot, mark, scratch, defect, disease, ...
- Preposition: from, with
- NP1 Verb NP2 Prep NP3
- For example
- be + disease: get disease
- A kid gets disease from virus H5N1.
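The "general verb + information + preposition phrase" pattern above can be sketched as a simple candidate matcher. This is a minimal illustration, not the authors' implementation: the function name `match_candidate`, the 5-tuple input layout, and the assumption that the EDU is already chunked and lemmatized are all hypothetical, and the word lists contain only the items named on the slide.

```python
# Word lists copied from the slide; a real system would need larger lexicons.
GENERAL_VERBS = {"be", "have", "get"}
INFORMATION_NOUNS = {"scar", "spot", "mark", "scratch", "defect", "disease"}
CAUSAL_PREPOSITIONS = {"from", "with"}


def match_candidate(edu):
    """Return (effect, cause) if the EDU fits NP1 Verb NP2 Prep NP3, else None.

    `edu` is a hypothetical 5-tuple of already chunked, lemmatized strings:
    (np1, verb, np2, prep, np3).
    """
    if len(edu) != 5:
        return None
    np1, verb, np2, prep, np3 = edu
    if not np2:
        return None
    head = np2.split()[-1]  # assumption: the last word of NP2 is its head noun
    if (verb in GENERAL_VERBS
            and head in INFORMATION_NOUNS
            and prep in CAUSAL_PREPOSITIONS):
        return (f"{np1} {verb} {np2}", np3)
    return None


candidate = match_candidate(("kid", "get", "disease", "from", "virus H5N1"))
```

A match only marks the EDU as a causality candidate; a surface pattern of this kind cannot by itself separate causal from non-causal readings of the same verb.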
14How to identify causality
- NP1 Verb NP2 Prep NP3
- Ex. (word-by-word glosses of the Thai examples)
- 1. Plant / is / disease / from / fungi
- 2. Disease / occurs / from / virus
- 3. Kid / dies / with / the Bird flu disease
- 4. Kid / gets / disease / from / touching the
infected chicken
15Problems of using causal verb
- Verb ambiguity
- Causality
- Plant leaf / has / brown spots / from / fungi
- The patient / dies / with / cancer
- Non causality
- Plant leaf / has / brown spots / from / the leaf base
- Patient / dies / with / suspicion
16Zero Anaphora Problem
- For example
- The Bird flu disease / is / an important disease. Φ / occurs /
from / H5N1 virus.
- where Φ is a zero anaphor standing for "the Bird flu disease".
17 Zero Anaphora
[Example: Thai passage by Suchad Sarataphan, "Microfilariasis in horse" (282-291), with each zero anaphor marked in the text; the Thai text is not recoverable.]
18System Overview
- Text
- Corpus Preparation (WordNet, Lexitron, Plant encyclopedia)
- Causality learning
- Learnt model
- Causality extraction
- Cause-effect relation
- Knowledge base
19Corpus Preparation
- Word segmentation (Sudprasert and Kawtrakul, 2003)
- Named entity determination (Chanlekha and
Kawtrakul, 2004)
- EDU segmentation (Chareonsuk et al., 2005)
- An EDU (Elementary Discourse Unit) is a minimal
building block of a discourse tree (Mann and
Thompson, 1988, p. 244): a simple sentence or clause.
20Corpus Preparation
- Manual feature annotation (with reference to WordNet,
the Plant encyclopedia, and the Lexitron dictionary)
for learning
- <EDU type=causality>
- <NP1 concept=plant organ1>plant leaf</NP1>
- <Verb have>has</Verb>
- <NP2 concept=symptom1>brown spots</NP2>
- <Preposition from>from</Preposition>
- <NP3 concept=fungus1>fungi</NP3>
- </EDU>
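One way to hold an annotated EDU in memory before learning is a small record with the five features and the class label. This is only a sketch: the class name `AnnotatedEDU`, its field names, and the label strings are assumptions made for illustration, since the slides show the annotation markup but not how it is stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedEDU:
    """Hypothetical container for one manually annotated EDU."""
    np1_concept: str
    verb: str
    np2_concept: str
    preposition: str
    np3_concept: str
    label: str  # assumed values: "causality" or "non-causality"

    def features(self):
        """Return the five attributes used for learning as a dict."""
        return {
            "NP1": self.np1_concept,
            "Verb": self.verb,
            "NP2": self.np2_concept,
            "Preposition": self.preposition,
            "NP3": self.np3_concept,
        }


# The annotated example from the slide, with concept names as shown there.
edu = AnnotatedEDU("plant organ", "have", "symptom", "from", "fungus", "causality")
```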
21Causality Learning
- ID3 (Mitchell T.M., 1997)
- SVM (Cristianini and Shawe-Taylor, 2000)
22ID3
ID3 uses a statistical property called
information gain, defined below together with
the entropy measure, to measure how well
a given attribute (A, e.g. NP1, Verb, NP2,
Preposition, NP3) separates the collected
examples (S) according to their target
classification.
$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)$$
$$Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where Values(A) is the set of values of attribute A,
$S_v$ is the subset of S for which A has value v,
the entropy specifies the
minimum number of bits of information needed to
encode the classification of an arbitrary member
of S (Charniak E., 1993), c is the number of different
values of the target attribute, and $p_i$ is the
proportion of S belonging to class i.
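The entropy and information gain described above can be computed directly, following the standard definitions in Mitchell (1997). The sketch below is illustrative only: the function names, the dict-based example format, and the four toy examples (built from the verb-ambiguity sentences earlier in the slides) are assumptions, not the authors' data or code.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions p_i."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, attribute, target="label"):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [ex[target] for ex in examples]
    gain = entropy(labels)
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain


# Toy examples only, for illustration.
toy = [
    {"Verb": "have", "NP3": "fungi", "label": "causality"},
    {"Verb": "have", "NP3": "leaf base", "label": "non-causality"},
    {"Verb": "die", "NP3": "cancer", "label": "causality"},
    {"Verb": "die", "NP3": "suspicion", "label": "non-causality"},
]
gain_np3 = information_gain(toy, "NP3")
gain_verb = information_gain(toy, "Verb")
```

On this toy set the verb alone does not separate the classes while NP3 does, which mirrors the verb-ambiguity problem: the attribute with the higher gain is the one ID3 would test first.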
23ID3
[Figure: example ID3 decision tree. The root NP3 branches on pathogen, food poisoning, and contraction; each branch then tests prep (from, with, from) and verb (be, infect, have) and ends in a Causality leaf.]
24ID3
- Rule mining using the Weka tool
25ID3
- Rule generalization and verification
- Some rules share the same general
concept and can be combined into one rule, as in
the following example
- R1: IF <NP1> <Verb=be> <NP2> <Prep=from> <NP3=fungi> THEN causality
- R2: IF <NP1> <Verb=be> <NP2> <Prep=from> <NP3=bacteria> THEN causality
- R3: IF <NP1> <Verb=be> <NP2> <Prep=from> <NP3=pathogen> THEN causality
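A minimal sketch of how rules such as R1 and R2 might be merged into R3, assuming a concept hierarchy is available. The `HYPERNYM` table and the `generalize` function are hypothetical stand-ins: the slides state that annotation refers to WordNet but do not describe the merging procedure itself.

```python
# Hypothetical hypernym table standing in for a WordNet lookup.
HYPERNYM = {"fungi": "pathogen", "bacteria": "pathogen"}


def generalize(rules):
    """Merge (verb, prep, np3) rules whose NP3 values share a hypernym.

    A rule is lifted to the hypernym only when at least two distinct
    specific rules collapse onto it; otherwise it is kept unchanged.
    """
    groups = {}
    for verb, prep, np3 in rules:
        concept = HYPERNYM.get(np3, np3)
        groups.setdefault((verb, prep, concept), []).append(np3)
    merged = []
    for (verb, prep, concept), members in groups.items():
        np3 = concept if len(set(members)) > 1 else members[0]
        merged.append((verb, prep, np3))
    return merged


r1 = ("be", "from", "fungi")
r2 = ("be", "from", "bacteria")
combined = generalize([r1, r2])
```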
26ID3
- Verifying rules
- The test corpus of 2000 EDUs from the agricultural and health
news domains contains 102 EDUs matching the
specified sentence pattern, of which only 87
are causal, covered by 20 causal verb rules.
27SVM
The following linear function, f(x), of the
input x = (x1, ..., xn) assigns x to the positive class
if f(x) ≥ 0, and otherwise to the negative class
if f(x) < 0:
$$f(x) = \langle w \cdot x \rangle + b = \sum_{i=1}^{n} w_i x_i + b$$
(where xi is each of the five features NP1, Verb,
NP2, Preposition, and NP3 of the specified
sentence pattern from the annotated corpus, w is the
weight vector, and b is the bias)
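Applying a learnt linear SVM comes down to one dot product plus a bias. The sketch below assumes each of the five features has already been mapped to a number; that encoding, the function names, and the weight and bias values are invented for illustration and are not taken from the slides.

```python
def decision_value(weights, bias, x):
    """f(x) = sum_i w_i * x_i + b for a linear classifier."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Positive class (causality) if f(x) >= 0, else negative class."""
    return "causality" if decision_value(weights, bias, x) >= 0 else "non-causality"


# Hypothetical weights and bias, one weight per feature
# (NP1, Verb, NP2, Preposition, NP3); real values come from training.
w = [0.0, 0.5, 0.5, 0.5, 1.0]
b = -1.5
label_a = classify(w, b, [1, 1, 1, 1, 1])
label_b = classify(w, b, [1, 1, 0, 1, 0])
```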
29Causality Extraction
- Causality identification
- Use causal verb rules from ID3
- Use weight vectors with the bias from SVM
- Solving zero anaphora
- Using the heuristic rule (Ching-Long Yeh and
Chris Mellish, 1997)
30Evaluation
- 2000 EDUs from agricultural and health news are used
for training and 2000 EDUs for testing; evaluation is based on
precision and recall.
- The results are then evaluated by experts with max
win voting.
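The evaluation steps above can be sketched as follows: expert judgements are combined by majority ("max win") voting into a reference label per EDU, and precision and recall are computed against it. The function names, label strings, and the toy judgements are assumptions for illustration; they are not the experimental data.

```python
from collections import Counter


def max_win_vote(votes):
    """Return the label given by most judges (ties go to the first label seen)."""
    return Counter(votes).most_common(1)[0][0]


def precision_recall(predicted, gold, positive="causality"):
    """Precision and recall of the positive class over paired label lists."""
    pairs = list(zip(predicted, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy judgements from three hypothetical experts for four EDUs.
judgements = [
    ["causality", "causality", "non-causality"],
    ["non-causality", "non-causality", "causality"],
    ["causality", "causality", "causality"],
    ["causality", "non-causality", "causality"],
]
gold = [max_win_vote(v) for v in judgements]
system = ["causality", "causality", "causality", "non-causality"]
precision, recall = precision_recall(system, gold)
```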
31Evaluation
32Discussion
- The precision of extraction with SVM is higher than with ID3
because ID3 depends on feature occurrences, which do not
affect SVM.
- The 73% recall can be increased if we use a
larger corpus.
33Conclusion
- Our model will be beneficial for causal
question answering and causal generalization for
knowledge discovery.
34Future work
35References
- Daniel Marcu. 1997. The Rhetorical Parsing of
Natural Language Texts. In Proc. of the 35th
Annual Meeting of the Association for
Computational Linguistics (ACL97/EACL97),
Madrid, Spain.
- Roxana Girju and Dan Moldovan. 2002. Mining
Answers for Question Answering. In Proc. of AAAI
Symposium on Mining Answers from Texts and
Knowledge Bases.
- Du-Seong Chang and Key-Sun Choi. 2004. Causal
Relation Extraction Using Cue Phrase and Lexical
Pair Probabilities. IJCNLP 2004, Hainan Island,
China.
- Takashi Inui, K. Inui and Y. Matsumoto. 2004.
Acquiring causal knowledge from text using the
connective markers. Journal of the Information
Processing Society of Japan, 45(3).
- Jirawan Chareonsuk, Tana Sukvakree and Asanee
Kawtrakul. 2005. Elementary Discourse Unit
Segmentation for Thai using Discourse Cue and
Syntactic Information. NCSEC 2005, Thailand.
36Thank you