1
The Descent of Hierarchy, and Selection in Relational Semantics*
  • Barbara Rosario, Marti Hearst, Charles Fillmore
  • UC Berkeley

* with apologies to Charles Darwin
2
Noun Compounds (NCs)
  • Technical text is rich with NCs
  • Open-labeled long-term study of the
    subcutaneous sumatriptan efficacy and
    tolerability in acute migraine treatment.
  • Any sequence of nouns that itself functions as a
    noun
  • asthma hospitalizations
  • health care personnel hand wash

3
NCs: 3 computational tasks
  • Identification
  • Syntactic analysis (attachments)
  • [Baseline [headache frequency]]
  • [[Tension headache] patient]
  • Our goal: semantic analysis
  • Headache treatment → treatment for headache
  • Corticosteroid treatment → treatment that uses
    corticosteroid

4
Descent of Hierarchy
  • Idea
  • Use the top levels of a lexical hierarchy to
    identify semantic relations
  • Hypothesis
  • A particular semantic relation holds between all
    2-word NCs that can be categorized by a lexical
    category pair.

5
Outline
  • Related work
  • Linguistic motivation
  • Lexical Hierarchy: MeSH
  • Labeling NC relations
  • Method and results
  • Discussion of ambiguity

6
Related work (Semantic analysis of NCs)
  • Rule-based
  • Finin (1980)
  • Detailed AI analysis, hand-coded
  • Vanderwende (1994)
  • automatically extracts semantic information from
    an on-line dictionary, manipulates a set of
    handwritten rules. 13 classes,
    52% accuracy
  • Probabilistic
  • Lauer (1995)
  • probabilistic model, 8 classes, 47% accuracy
  • Lapata (2000)
  • classifies nominalizations into subject/object.
    2 classes, 80% accuracy

7
Related work (Semantic analysis of NCs)
  • Lexical Hierarchy
  • Barrett et al. (2001)
  • WordNet, heuristics to classify an NC given its
    similarity to a known NC
  • Rosario and Hearst (2001)
  • MeSH, Neural Network. 18 classes, 60% accuracy
  • Relations pre-defined

8
Linguistic Motivation
  • Semantics of the NCs head-modifier relationship
  • Head noun has argument structure
  • Meaning of the head noun determines what kinds of
    things can be done to it, what it is made of,
    what it is a part of

9
Linguistic Motivation (cont.)
  • Material + Cutlery → Made of
  • steel knife, plastic fork, wooden spoon
  • Food + Cutlery → Used on
  • meat knife, dessert spoon, salad fork
  • Profession + Cutlery → Used by
  • chef's knife, butcher's knife

10
Outline
  • Related work
  • Linguistic motivation
  • Lexical Hierarchy: MeSH
  • Labeling NC relations
  • Method and results
  • Discussion of ambiguity

11
The Lexical Hierarchy: MeSH
  • Tree Structures
  • 1. Anatomy [A]
  • 2. Organisms [B]
  • 3. Diseases [C]
  • 4. Chemicals and Drugs [D]
  • 5. Analytical, Diagnostic and Therapeutic
    Techniques and Equipment [E]
  • 6. Psychiatry and Psychology [F]
  • 7. Biological Sciences [G]
  • 8. Physical Sciences [H]
  • 9. Anthropology, Education, Sociology and
    Social Phenomena [I]
  • 10. Technology and Food and Beverages [J]
  • 11. Humanities [K]
  • 12. Information Science [L]
  • 13. Persons [M]
  • 14. Health Care [N]
  • 15. Geographic Locations [Z]

12
The Lexical Hierarchy: MeSH
  • 1. Anatomy [A]
      • Body Regions [A01]
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]
  • 3. [C]
  • 4. [D]
  • 5. [E]
  • 6. [F]
  • 7. [G]
  • 8. Physical Sciences [H]
  • 9. [I]
  • 10. [J]
  • 11. [K]
  • 12. [L]
  • 13. [M]

17
Descending the Hierarchy
  • 1. Anatomy [A] (homogeneous)
      • Body Regions [A01]
          • Abdomen [A01.047]
          • Back [A01.176]
          • Breast [A01.236]
          • Extremities [A01.378]
          • Head [A01.456]
          • Neck [A01.598]
          • …
      • Musculoskeletal System [A02]
      • Digestive System [A03]
      • Respiratory System [A04]
      • Urogenital System [A05]
  • 2. [B]
  • 3. [C]
  • 4. [D]
  • 5. [E]
  • 6. [F]
  • 7. [G]
  • 8. Physical Sciences [H] (heterogeneous)
      • Electronics
          • Amplifiers
          • Electronics, Medical
          • Transducers
      • Astronomy
      • Nature
      • Time
      • Weights and Measures
          • Calibration
          • Metric System
          • Reference Standard
      • …
  • 9. [I]
  • 10. [J]
  • 11. [K]
  • 12. [L]
  • 13. [M]

18
Mapping Nouns to MeSH Concepts
  • headache recurrence
  • C23.888.592.612.441 C23.550.291.937
  • headache pain
  • C23.888.592.612.441 G11.561.796.444
  • breast cancer cells
  • A01.236 C04 A11

19
Levels of Description
  • headache pain
  • Level 0: C23 G11
  • Level 1: C23.888 G11.561
  • Level 2: C23.888.592 G11.561.796
  • Original: C23.888.592.612.441 G11.561.796.444
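
The levels of description above amount to keeping a prefix of each noun's MeSH code. A minimal sketch in Python, not the authors' code: the names `LEXICON`, `truncate_code`, and `describe` are made up for this illustration, and the lexicon holds only the three noun-to-code mappings shown on the slides.

```python
# Toy lexicon: only the noun-to-code mappings that appear on the slides.
LEXICON = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
}


def truncate_code(code: str, level: int) -> str:
    """Keep the first level + 1 dot-separated components of a MeSH code."""
    return ".".join(code.split(".")[: level + 1])


def describe(nc: str, level: int) -> tuple:
    """Map each noun of a compound to its MeSH code, truncated to `level`."""
    return tuple(truncate_code(LEXICON[noun], level) for noun in nc.split())


for level in range(3):
    print(level, describe("headache pain", level))
```

A code that is already shorter than the requested level is returned unchanged, so the same function works for nouns that sit near the top of the hierarchy.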

20
Outline
  • Related work
  • Linguistic motivation
  • Lexical Hierarchy: MeSH
  • Labeling NC relations
  • Method and results
  • Discussion of ambiguity

21
Descent of Hierarchy
  • Idea
  • Words falling in homogeneous MeSH subhierarchies
    behave similarly with respect to relation
    assignment
  • Hypothesis
  • A particular semantic relation holds between all
    2-word NCs that can be categorized by a MeSH
    category pair (CP)

22
Grouping the NCs
  • CP A02 C04 (Musculoskeletal System, Neoplasms)
  • skull tumors, bone cysts, bone metastases, skull
    osteosarcoma
  • CP C04 M01 (Neoplasms, Person)
  • leukemia survivor, lymphoma patients, cancer
    physician, cancer nurses
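
The grouping on this slide can be sketched as a dictionary keyed by the pair of level-0 codes. This is an illustration under stated assumptions: `LEVEL0` and `group_by_cp` are hypothetical names, the noun-to-code entries are read off the two CPs on this slide, and each noun is treated as having a single MeSH sense.

```python
from collections import defaultdict

# Toy lexicon with level-0 codes only, inferred from the two CPs on the slide.
LEVEL0 = {
    "skull": "A02", "bone": "A02",
    "tumors": "C04", "cysts": "C04", "metastases": "C04", "osteosarcoma": "C04",
    "leukemia": "C04", "lymphoma": "C04", "cancer": "C04",
    "survivor": "M01", "patients": "M01", "physician": "M01", "nurses": "M01",
}


def group_by_cp(ncs):
    """Group two-word NCs by the (modifier, head) pair of level-0 codes."""
    groups = defaultdict(list)
    for nc in ncs:
        modifier, head = nc.split()
        groups[(LEVEL0[modifier], LEVEL0[head])].append(nc)
    return dict(groups)


groups = group_by_cp([
    "skull tumors", "bone cysts", "bone metastases", "skull osteosarcoma",
    "leukemia survivor", "lymphoma patients", "cancer physician", "cancer nurses",
])
for cp, members in groups.items():
    print(cp, members)
```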

23
Distribution of Category Pairs
24
Collection
  • 70,000 NCs extracted from titles and abstracts
    of Medline
  • 2,627 CPs at level 0 (with at least 10 unique
    NCs)
  • We analyzed
  • 250 CPs with Anatomy (A)
  • 21 CPs with Natural Science (H01)
  • 3 CPs with Neoplasm (C04)
  • This represents 10% of total CPs and 20% of total
    NCs

25

Classification Method
  • For each CP
  • Divide its NCs into training-testing sets
  • Training: inspect NCs by hand
  • Start from level 0 for both nouns (0, 0)
  • While NCs are not all similar
  • descend one level of the hierarchy
  • Repeat until all NCs for that CP are similar
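
The loop above can be sketched as a small recursive procedure. Two simplifications are assumptions of this sketch, not part of the method as presented: the manual "are the NCs similar?" judgment is stood in for by a hand-assigned relation label on each training NC, and only the second noun is ever descended. The names `descend` and `TRAINING` are hypothetical; the codes are the ones shown on the slides.

```python
from collections import defaultdict


def truncate_code(code, level):
    """Keep the first level + 1 dot-separated components of a MeSH code."""
    return ".".join(code.split(".")[: level + 1])


def descend(ncs, level1=0, level2=0, max_level=3):
    """Return {category pair: labels} for groups whose NCs share one label.

    ncs is a list of (modifier_code, head_code, hand_label) triples.
    """
    groups = defaultdict(list)
    for code1, code2, label in ncs:
        cp = (truncate_code(code1, level1), truncate_code(code2, level2))
        groups[cp].append((code1, code2, label))

    decisions = {}
    for cp, members in groups.items():
        labels = {label for _, _, label in members}
        if len(labels) == 1 or level2 >= max_level:
            decisions[cp] = sorted(labels)  # homogeneous, or depth limit reached
        else:
            # NCs under this CP differ: descend one level for the second noun.
            decisions.update(descend(members, level1, level2 + 1, max_level))
    return decisions


# Hand-labeled training NCs, using only codes that appear on the slides.
TRAINING = [
    ("A02", "C04", "location of disease"),              # skull tumors
    ("A02", "C04", "location of disease"),              # bone cysts
    ("C04", "M01.643", "person afflicted by disease"),  # leukemia survivor
    ("C04", "M01.643", "person afflicted by disease"),  # lymphoma patients
    ("C04", "M01.526", "person who treats disease"),    # cancer physician
    ("C04", "M01.526", "person who treats disease"),    # cancer nurses
]

decisions = descend(TRAINING)
for cp, labels in decisions.items():
    print(cp, labels)
```

On this toy input the CP (A02, C04) is accepted at level 0, while (C04, M01) is split into (C04, M01.643) and (C04, M01.526).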

28
Using the CPs for classification
  • CP A02 C04 (Musculoskeletal System, Neoplasms)
  • skull tumors, bone cysts, bone metastases, skull
    osteosarcoma
  • Similar NCs
  • All NCs under the CP A02 C04 have the same
    semantic relationship
  • Location of disease? Disease in Anatomy?
  • Add CP A02 C04 to the list of classification
    decisions

Classification decisions: A02 C04
30
Using the CPs for classification
  • CP B06 B06 (Plants, Plants)
  • eucalyptus trees, apple fruits, rice grains,
    potato plants
  • Similar
  • Same relationship
  • Add CP B06 B06

Classification decisions: A02 C04; B06 B06
31
Using the CPs for classification
  • CP C04 M01 (Neoplasms, Person)
  • leukemia survivor, lymphoma patients, cancer
    physician, cancer nurses
  • Person afflicted by Disease? Person who treats
    Disease?
  • Too different!
  • Second noun needs to be more specific: descend
    one level for the second noun (Person)

Classification decisions: A02 C04; B06 B06
32
Using the CPs for classification
  • CP C04 M01 (Neoplasms, Person)
  • leukemia survivor, lymphoma patients, cancer
    physician, cancer nurses
  • → Too different!
  • CP C04 M01.643 (Neoplasms, Patients)
  • leukemia survivor, lymphoma patients
  • Person afflicted by Disease
  • CP C04 M01.526 (Neoplasms, Occupational Groups)
  • cancer physician, cancer nurses
  • Person who treats Disease
  • OK

33
Classification Decisions
  • A02 C04
  • B06 B06
  • C04 M01
  • C04 M01.643
  • C04 M01.526
  • A01 H01
  • A01 H01.770
  • A01 H01.671
  • A01 H01.671.538
  • A01 H01.671.868
  • A01 M01
  • A01 M01.643
  • A01 M01.526
  • A01 M01.898

35
Classification Decisions and Relations (future work)
  • A02 C04 → Location of Disease
  • B06 B06 → Kind of Plants
  • C04 M01
  • C04 M01.643 → Person afflicted by Disease
  • C04 M01.526 → Person who treats Disease
  • A01 H01
  • A01 H01.770
  • A01 H01.671
  • A01 H01.671.538
  • A01 H01.671.868
  • A01 M01
  • A01 M01.643 → Person afflicted by Disease
  • A01 M01.526
  • A01 M01.898
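
Once decisions carry relation labels, a new NC could be classified by finding the most specific decision whose codes are prefixes of the NC's codes. The sketch below is one possible reading, not the authors' implementation: `RELATIONS`, `matches`, and `lookup_relation` are hypothetical names, the table holds only the first four labeled decisions from this slide, and the deeper code A02.835 is used purely to exercise prefix matching.

```python
# Subset of the labeled decisions listed on the slide.
RELATIONS = {
    ("A02", "C04"): "Location of Disease",
    ("B06", "B06"): "Kind of Plants",
    ("C04", "M01.643"): "Person afflicted by Disease",
    ("C04", "M01.526"): "Person who treats Disease",
}


def matches(code, prefix):
    """True if `prefix` equals `code` or is an ancestor of it in the tree."""
    return code == prefix or code.startswith(prefix + ".")


def lookup_relation(code1, code2):
    """Return the relation of the most specific matching decision, or None."""
    candidates = [
        (len(p1) + len(p2), relation)
        for (p1, p2), relation in RELATIONS.items()
        if matches(code1, p1) and matches(code2, p2)
    ]
    return max(candidates)[1] if candidates else None


print(lookup_relation("A02.835", "C04"))   # deeper modifier code, same decision
print(lookup_relation("C04", "M01.643"))
print(lookup_relation("A01.236", "C04"))   # no decision covers this pair
```

Matching on whole components (`prefix + "."`) keeps a code such as A021 from being treated as a descendant of A02.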

36
Classification Decision Levels
  • Anatomy: 250 CPs
  • 187 (75%) remain first level
  • 56 (22%) descend one level
  • 7 (3%) descend two levels
  • Natural Science (H01): 21 CPs
  • 1 (4%) remain first level
  • 8 (39%) descend one level
  • 12 (57%) descend two levels
  • Neoplasms (C04): 3 CPs
  • 3 (100%) descend one level

37
Evaluation
  • Test the decisions on testing set
  • Count how many NCs that fall in the groups
    defined in the classification decisions are
    similar to each other
  • Accuracy
  • Anatomy: 91% accurate
  • Natural Science: 79%
  • Neoplasm: 100%
  • Total Accuracy: 90.8%
  • Generalization: our 415 classification decisions
    cover 46,000 possible CP pairs
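
One way to read the counting step is as agreement with the dominant relation inside each decision group. The sketch below assumes that reading, which the slides do not spell out. `group_accuracy` is a hypothetical name and the six test items are invented toy data, not results from the study.

```python
from collections import Counter, defaultdict


def group_accuracy(test_ncs):
    """Fraction of test NCs that share the majority label of their group.

    test_ncs is a list of (decision_cp, hand_label) pairs.
    """
    by_cp = defaultdict(list)
    for cp, label in test_ncs:
        by_cp[cp].append(label)
    agreeing = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cp.values()
    )
    return agreeing / len(test_ncs)


# Invented toy test set: one NC in the first group disagrees with the rest.
TOY_TEST = [
    ("A02 C04", "location of disease"),
    ("A02 C04", "location of disease"),
    ("A02 C04", "location of disease"),
    ("A02 C04", "other"),
    ("C04 M01.643", "person afflicted by disease"),
    ("C04 M01.643", "person afflicted by disease"),
]

accuracy = group_accuracy(TOY_TEST)
print(round(accuracy, 3))
```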

38
Outline
  • Related work
  • Linguistic motivation
  • Lexical Hierarchy: MeSH
  • Labeling NC relations
  • Method and results
  • Discussion of ambiguity

39
Ambiguity: Two Types
  • Lexical ambiguity
  • mortality
  • state of being mortal
  • death rate
  • Relationship ambiguity
  • bacteria mortality
  • death of bacteria
  • death caused by bacteria

40
Lexical Ambiguity vs. Multiple MeSH Senses
  • Lexical ambiguity is different from multiple MeSH
    senses
  • Ex.: Mortality has 4 senses
  • Public Health (G) → Data Collection → Vital
    Statistics → Mortality
  • Investigative Techniques (E) → Data Collection →
    Vital Statistics → Mortality
  • Information Science (L) → Data Collection → Vital
    Statistics → Mortality
  • Population Characteristics (N) → Demography →
    Vital Statistics → Mortality
  • On average, there are 1.5 MeSH senses per word
    for the nouns in our collection

42
Four Cases
  • Only one possible relationship
      • Single MeSH sense: abdomen radiography,
        aciclovir treatment
      • Multiple MeSH senses: alcoholism treatment
  • Multiple relationships (ambiguity of relationship)
      • Single MeSH sense: hospital databases,
        education efforts, kidney metabolism
      • Multiple MeSH senses: bacteria mortality
      • Most problematic cases, but rare!
43
Conclusions
  • Very simple method for assigning semantic
    relations to two-word technical NCs
  • 90.8% accuracy
  • Grouping the NCs with respect to their semantic
    descriptors
  • Lexical resource (MeSH) useful for this task
  • Using the upper levels of the lexical hierarchy
    gives an accurate classification and thereby
    reduces the space of the problem

44
Future work
  • Analyze full spectrum of hierarchy
  • NCs with > 2 terms
  • growth hormone deficiency
  • Other syntactic structures
  • Non-biomedical words
  • Other ontologies (e.g., WordNet)?

45
And given enough data
  • skull character
  • jaw depression
  • nose resuscitation
  • cadaver motion

46
Thanks!
For more information: http://bailando.sims.berkeley.edu/lindi/