Text Mining in Animal Health Surveillance - PowerPoint PPT Presentation

1
Text Mining in Animal Health Surveillance
  • John Berezowski
  • Clarissa Snyder
  • Lindsay McLarty
  • Food Safety Division
  • Alberta Agriculture, Food and Rural Development

2
Text Mining In Public Health
  • Knowledge management
  • Classification of journal articles to manage and
    search databases
  • Classification of hospital records to allow data
    mining of hospital databases to discover
    knowledge
  • Classification of medical records for real-time
    surveillance
  • Free-text emergency room chief complaints
    classified into syndromes, e.g. GI or influenza-like
    illness

3
Purpose
  • Canada-Alberta BSE Surveillance Program (CABSESP)
  • Alberta veterinarians participate in BSE
    surveillance
  • Submit cattle samples for BSE testing
  • Dead or euthanized cattle
  • Examine cattle prior to sampling
  • Provide data about the farmers and animals tested
  • Purpose: maximize information about the cattle tested
  • Especially why cattle were sick, dead or sampled
  • Assist the CFIA in identifying Clinical Suspects

4
Purpose
  • Large sample (July 2004 to July 2006)
  • 35,720 Alberta cattle tested by AAFRD
  • Approximately 25,000 more tested by the CFIA
  • 9,117 farms
  • 141 veterinary clinics (293 veterinarians)
  • Purpose: evaluate the utility of BSE submission form
    data for other surveillance purposes

5
Submission Form Data
  • Farmer ID, date, location, number on farm
  • Purebred (y/n), breed, age, sex, body condition
    score (BCS), post-mortem (PM, y/n)
  • Diseased, Distressed, Down, Dead, Neuro
  • Clinical signs in free text format
  • Presumptive diagnosis in free text format
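The form fields listed above can be held in a simple typed record before any text mining is done. This is a minimal sketch, not the program's actual schema: the class name `BseSubmission`, every field name, the types and the placeholder values are assumptions for illustration. Only the two free-text fields (here taken from the Example Submission slide) need text mining; the rest are already structured.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BseSubmission:
    """Hypothetical record for one submission form (field names are illustrative)."""
    farmer_id: str
    submitted: date
    location: str
    number_on_farm: int
    purebred: bool
    breed: str
    age_months: Optional[int]   # assumed to be recorded in months
    sex: str
    bcs: Optional[float]        # body condition score
    post_mortem: bool           # PM (y/n)
    status: str                 # assumed single choice: Diseased, Distressed, Down, Dead or Neuro
    clinical_signs: str         # free text
    presumptive_diagnosis: str  # free text

    def free_text(self) -> str:
        """Join the two free-text fields; this is the input for text mining."""
        return f"{self.clinical_signs} {self.presumptive_diagnosis}"


# Placeholder values; only the free text comes from the example submission.
example = BseSubmission(
    farmer_id="F0001", submitted=date(2005, 1, 1), location="AB",
    number_on_farm=100, purebred=False, breed="unknown", age_months=None,
    sex="F", bcs=None, post_mortem=True, status="Dead",
    clinical_signs="Cow was in dry lot. Went off feed, coughing and labored breathing",
    presumptive_diagnosis=("PM findings- traumatic pericarditis and abscess from "
                           "hardware between reticulum and diaphragm"),
)
print(example.free_text())
```

Keeping the structured fields separate from the free text makes it easy to combine text-derived categories with age, sex or PM status later.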

6
Example Submission
  • Clinical signs: "Cow was in dry lot. Went off
    feed, coughing and labored breathing"
  • Presumptive diagnosis: "PM findings- traumatic
    pericarditis and abscess from hardware between
    reticulum and diaphragm"
  • Text mining tools are needed to extract information
    from the free-text fields

7
Text Mining Definition
  • Based on data mining definitions
  • Knowledge discovery in text
  • Semi-automated or automated discovery of trends and
    patterns across large volumes of text
  • Computer applications that aim to aid in making
    sense of large volumes of text

8
Text Mining: Our Context
  • Classify cattle with respect to certain concepts
  • Etiologies: Johne's, AIP, hepatic lipidosis,
    LDA, IBR, unknown, etc.
  • Descriptors: acute, chronic, emaciated, lame,
    autolyzed, blind, ataxic, etc.
  • Clinical presentation/syndromes: respiratory, GI,
    repro, etc.
  • Use classifications to better describe the cattle
    sampled and look for associations or trends
    within the samples

9
Named Entity Recognition
  • Identify terms in text
    - Term: a textual representation of a concept
  • Classify terms
    - Noun vs. verb vs. adjective, preposition, etc.
    - Etiology vs. descriptor; descriptor of the animal
      (pregnant) vs. descriptor of a clinical sign (chronic)
  • Map terms to concepts in an ontology
    - Associate each term with one or more concepts

[Figure: the terms "Bleeding", "Bled" and "Hemorrhage" all map to the concept of hemorrhage]
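The term-to-concept mapping in the Bleeding / Bled / Hemorrhage example can be sketched as a lookup table from surface terms to concept labels. This is a minimal illustration, assuming a hand-built dictionary (`TERM_TO_CONCEPTS` and `recognize` are hypothetical names) and simple lower-case word matching rather than any particular NER tool; note that one term may map to more than one concept.

```python
import re
from typing import Dict, Set

# Hypothetical dictionary: each surface term maps to one or more concept labels.
TERM_TO_CONCEPTS: Dict[str, Set[str]] = {
    "bleeding": {"hemorrhage"},
    "bled": {"hemorrhage"},
    "hemorrhage": {"hemorrhage"},
    "bloat": {"nutritional bloat", "bloated abdomen"},  # ambiguous term
}


def recognize(text: str) -> Dict[str, Set[str]]:
    """Return {term: concepts} for every dictionary term found in the text."""
    found: Dict[str, Set[str]] = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in TERM_TO_CONCEPTS:
            found[token] = TERM_TO_CONCEPTS[token]
    return found


print(recognize("Cow bled from the nose; severe hemorrhage at PM"))
# {'bled': {'hemorrhage'}, 'hemorrhage': {'hemorrhage'}}
```

Exact word lookup like this misses misspellings and variant forms, which is why the term variation and ambiguity problems on the following slides matter.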
10
Problems With Our Data
  • No suitable ontology
  • What's an ontology?
  • A model that links concept labels to their
    textual representations and defines or describes
    the relationships between concepts
  • Machine readable descriptions of concepts and
    their relationships
  • Examples: dictionaries, SNOMED/SNOVET
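A machine-readable ontology of the kind described above links each concept label to its textual representations and to related concepts. The sketch below is a hypothetical fragment, assuming a single "is-a" relationship per concept; the `Concept` class, the `ONTOLOGY` table and the listed terms are illustrative and are not taken from SNOMED or SNOVET.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    label: str                                     # concept label
    terms: Set[str] = field(default_factory=set)  # textual representations
    parent: Optional[str] = None                   # "is-a" link to a broader concept


# Hypothetical fragment; category names follow the slides, terms are illustrative.
ONTOLOGY: Dict[str, Concept] = {
    c.label: c
    for c in [
        Concept("Neurologic", {"neurologic", "neurological", "neuro"}),
        Concept("Ataxia", {"ataxia", "ataxic"}, parent="Neurologic"),
        Concept("Paresis/Paralysis", {"paresis", "paretic", "paralysis"}, parent="Neurologic"),
    ]
}


def concepts_for_term(term: str) -> List[str]:
    """Labels of every concept whose textual representations include the term."""
    term = term.lower()
    return [c.label for c in ONTOLOGY.values() if term in c.terms]


def ancestors(label: str) -> List[str]:
    """Follow is-a links upward: the concept itself, then each broader concept."""
    chain: List[str] = []
    current: Optional[str] = label
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].parent
    return chain


print(concepts_for_term("Ataxic"))  # ['Ataxia']
print(ancestors("Ataxia"))          # ['Ataxia', 'Neurologic']
```

The relationships are what let a term found in text ("ataxic") roll up to a broader category ("Neurologic") used for classification.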

11
Problems With Our Data
  • Terms are formal (vet/med) or unusual
  • Nephritis, peritonitis, cancer eye, lump
    jaw, corkscrew claw, downer, fatty
    liver, hardware, found dead
  • Many are specific to food animal practitioners

12
Problems With Our Data
  • Term Variation
  • A single concept is expressed in a number of
    different ways (synonyms)
  • Probability of two experts using the same term to
    refer to the same concept is less than 20%¹
  • Arthritis: arthritis, arthritic, osteoarthritis,
    polyarthritis, septic-arthritis
  • ¹ Grefenstette G. 1994
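One common way to absorb variants like the arthritis forms above is to match a truncated stem with wildcards, so that one dictionary entry covers many spellings. This sketch uses Python's standard `fnmatch` module with a hypothetical pattern list (`ARTHRITIS_PATTERNS`); it is not a description of how WordStat stores its dictionaries.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# Hypothetical dictionary entry: '*' matches any run of characters,
# so one stem pattern covers many spellings and compound forms.
ARTHRITIS_PATTERNS = ["*ARTHRIT*"]


def matches_concept(word: str, patterns: Iterable[str]) -> bool:
    """True if the upper-cased word matches any of the wildcard patterns."""
    return any(fnmatchcase(word.upper(), p) for p in patterns)


variants = ["arthritis", "arthritic", "osteoarthritis", "polyarthritis",
            "septic-arthritis", "lame"]
print([w for w in variants if matches_concept(w, ARTHRITIS_PATTERNS)])
# ['arthritis', 'arthritic', 'osteoarthritis', 'polyarthritis', 'septic-arthritis']
```

A leading wildcard trades precision for recall, so each pattern still needs checking against the actual word list.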

13
Problems With Our Data
  • Term Ambiguity
  • The same term is used to refer to multiple
    concepts
  • Multiple meanings for the same term
  • Bloat: nutritional (feedlot, pasture), or
    bloated abdomen (perforated ulcer)
  • Prolapse: vagina, uterus, rectum, vaginal fat,
    intestinal
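An ambiguous term such as "prolapse" can sometimes be resolved from nearby words. The sketch below assumes a hypothetical rule table (`PROLAPSE_SITES`) and a fixed window of neighbouring tokens; it illustrates simple context-window disambiguation, not the method used in this project.

```python
import re
from typing import Set

# Hypothetical context rules: a site word near "prolapse" selects the sense.
PROLAPSE_SITES = {
    "vagina": "vaginal prolapse", "vaginal": "vaginal prolapse",
    "uterus": "uterine prolapse", "uterine": "uterine prolapse",
    "rectum": "rectal prolapse", "rectal": "rectal prolapse",
    "intestinal": "intestinal prolapse",
}


def prolapse_senses(text: str, window: int = 2) -> Set[str]:
    """Resolve each 'prolaps*' token using site words within `window` tokens of it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    senses: Set[str] = set()
    for i, token in enumerate(tokens):
        if not token.startswith("prolaps"):
            continue
        context = tokens[max(0, i - window): i + window + 1]
        found = {PROLAPSE_SITES[w] for w in context if w in PROLAPSE_SITES}
        # Fall back to an unspecified sense when no site word is nearby.
        senses |= found or {"prolapse (site unspecified)"}
    return senses


print(prolapse_senses("Uterine prolapse after calving"))  # {'uterine prolapse'}
print(prolapse_senses("Prolapsed rectum, down"))          # {'rectal prolapse'}
```

A fixed window is crude: "prolapsed vaginal fat" would still be read as a vaginal prolapse here, which is exactly the kind of ambiguity the slide describes.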

14
Problems With Our Data
  • No sentence structure
  • Old age, arthritis, no teeth
  • Stifle, bilateral, degenerative, arthritis
  • Pelvic injury, post calving, crippled
  • Down, tumor on R shoulder, losing condition
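Because entries like these are telegraphic lists rather than sentences, tools that expect sentence structure have little to work with. One simple assumption is to treat each comma-, semicolon- or period-delimited fragment as a separate phrase; the `fragments` helper below is a hypothetical illustration of that idea.

```python
import re
from typing import List


def fragments(text: str) -> List[str]:
    """Split telegraphic free text on commas, semicolons and periods into trimmed phrases."""
    return [part.strip() for part in re.split(r"[,;.]", text) if part.strip()]


print(fragments("Down, tumor on R shoulder, losing condition"))
# ['Down', 'tumor on R shoulder', 'losing condition']
```

Each fragment can then be matched against the dictionary on its own, which keeps a word such as "bilateral" tied to the phrase it appeared in.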

15
Build Our Ontology
  • From the text fields on the submission forms
  • Designed to meet our classification needs
  • Identify Potential Clinical Suspects
  • Classify BSE submissions into clinical syndromes

16
Clinical Suspect
[Flowchart: Alive (Yes), Refractory to treatment (Yes), Progressive neuro signs OR progressive behavior change (Yes), Over 30 months (Yes), Rule outs (No) → Clinical Suspect]
  • Clinical Suspect = Alive AND (Refractory to tx) AND (Progressive
    Behavior Change OR Progressive Neuro Change) AND
    (No Rule Outs) AND (Over 30 months of Age)
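The Clinical Suspect rule above is a plain Boolean expression, so it translates directly into code. This sketch assumes each flag has already been derived, for example from the text-classifier categories (Chronic, Neurologic, Behavioral) and the rule-out list; the `CowFlags` record and its field names are hypothetical, and "over 30 months" is read as strictly greater than 30.

```python
from dataclasses import dataclass


@dataclass
class CowFlags:
    """Hypothetical per-animal flags, e.g. derived from the text classifier."""
    alive: bool
    refractory_to_treatment: bool
    progressive_behavior_change: bool
    progressive_neuro_signs: bool
    has_rule_outs: bool
    age_months: int


def is_clinical_suspect(cow: CowFlags) -> bool:
    """Alive AND refractory AND (behavior OR neuro) AND no rule outs AND over 30 months."""
    return (
        cow.alive
        and cow.refractory_to_treatment
        and (cow.progressive_behavior_change or cow.progressive_neuro_signs)
        and not cow.has_rule_outs
        and cow.age_months > 30
    )


print(is_clinical_suspect(CowFlags(True, True, False, True, False, 48)))  # True
print(is_clinical_suspect(CowFlags(True, True, False, True, True, 48)))   # False: rule out present
```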
17
Clinical Suspect
18
Ontology
  • Chronic (refractory to Tx)
  • Neurologic
  • Behavioral
  • Rule outs: Lame; Skin/Ocular/Mammary; Cardiovascular;
    Sudden Death; GI; Infectious Dz; Repro;
    Edema/Swelling/Neoplasia; Respiratory; Trauma;
    Urologic; Anorexia/Wt loss

19
Method
  • Text Mining Software
  • WordStat and SimStat (Provalis Research,
    Quebec City, PQ)
  • Spell-checked the text fields
  • Identified all words in the text fields
  • 292,537 words in total, 7,266 unique
  • Manually sorted words into ontology categories
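The word inventory in the study was produced with WordStat. As a rough stand-in, the same two figures (total words and unique words) can be counted in Python; the tokenizer regular expression and the `word_stats` helper are assumptions and will not reproduce WordStat's counts exactly.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed tokenizer: runs of letters, allowing internal hyphens or apostrophes.
WORD = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def word_stats(texts: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (total words, unique words, per-word counts) over all free-text fields."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(WORD.findall(text.lower()))
    return sum(counts.values()), len(counts), counts


fields = [
    "Cow was in dry lot. Went off feed, coughing and labored breathing",
    "PM findings- traumatic pericarditis and abscess from hardware between reticulum and diaphragm",
]
total, unique, counts = word_stats(fields)
print(total, unique, counts.most_common(1))  # 24 22 [('and', 3)]
```

The resulting frequency list is the raw material that was then sorted by hand into ontology categories.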

20
Chronic
  • ADVANCED, DOWNHIL, CHONIC, DURATION, CHRINIC,
    AWHILE, CHRONCI, POOR_DOER, CRONIC, DECLIN,
    DBILIT, EMACIAT, DAYS_AGO

21
Neurological
  • Ataxia
  • Neurological
  • Paresis/Paralysis
  • Hyperesthesia
  • Hypermetria
  • Locomotor deficits

22
Neurological
  • Ataxia
  • ATAX, ATXIA, ATXIC, ATACHIA, ATAXIA, TAXIA,
    etc
  • CNS
  • CN, MENINGITIS, MENINGOMA, etc.
  • Neurological
  • CONVULS, HEAD_PRESS, HEPATOENCEPHALOPATHY,
    NURO, NEUR, etc
  • Paresis/Paralysis
  • PARLAYSIS, PARLYSIS, PARYALYZED, PARAPARESIS,
    PAREISIS, PARES, PARETIC, etc
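In the study these misspelled variants were sorted into categories by hand. A string-similarity measure could shortlist candidates for that manual review; the sketch below swaps in Python's standard `difflib.get_close_matches`, with a hypothetical list of canonical spellings and an assumed cutoff of 0.7.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical canonical spellings for a few Neurologic terms.
CANONICAL = ["ATAXIA", "PARESIS", "PARALYSIS", "MENINGITIS"]


def normalize(word: str, cutoff: float = 0.7) -> Optional[str]:
    """Closest canonical term by difflib similarity ratio, or None below the cutoff."""
    matches = get_close_matches(word.upper(), CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ["ATXIA", "TAXIA", "ATACHIA", "PARLYSIS", "PAREISIS", "BAWLING"]:
    print(raw, "->", normalize(raw))
```

The cutoff is a judgement call: too low and unrelated words are pulled in, too high and heavily misspelled forms are missed, so suggestions still need a human check.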

23
Behavioral
  • Behavioral
  • Hyperexcitable

24
Behavioral
  • Behavioral
  • EHAV, APPREHENS, AVOID, BALKING, BAWLING,
    BELIGER, BELLIGER, BELLOW, BIZARRE,
    COMPULSIVELY, CRAZY, DELIROUS, etc.
  • Hyperexcitable
  • ANXIETY, ANXIOUS, CHARG, CHASE, EXCITEABLE,
    HYPERALERT, HYPEREXC, HYPEREXCITABLE,
    HYPERSENSITIV, IRRITA, etc.

25
Example Submission
  • Clinical signs: "Cow was in dry lot. Went off
    feed, coughing and labored breathing"
  • Presumptive diagnosis: "PM findings- traumatic
    pericarditis and abscess from hardware between
    reticulum and diaphragm"

26
Classifying Submissions
  • Cow was in dry lot. Went off feed, coughing and
    labored breathing

  • Categories: Anorexia, Respiratory
27
Classifying Submissions
  • PM findings- traumatic pericarditis and abscess
    from hardware between reticulum and diaphragm

  • Categories: GI, Cardiovascular, Trauma
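The two classification examples can be reproduced with a small multi-label keyword classifier: each category has a list of patterns, and a submission receives every category with at least one match. The `CATEGORY_PATTERNS` table below is a hypothetical fragment written for these two examples (including the assumption that "hardware" counts toward both GI and Trauma), not the project's ontology.

```python
import re
from typing import Dict, List, Set

# Hypothetical keyword patterns per category (a tiny fragment of an ontology).
CATEGORY_PATTERNS: Dict[str, List[str]] = {
    "Anorexia": [r"\boff feed\b", r"\banorex\w*"],
    "Respiratory": [r"\bcough\w*", r"\bbreath\w*", r"\bpneumon\w*"],
    "GI": [r"\bhardware\b", r"\breticul\w*"],
    "Cardiovascular": [r"\bpericard\w*", r"\bheart\b"],
    "Trauma": [r"\btrauma\w*", r"\bhardware\b"],
}


def classify(text: str) -> Set[str]:
    """Multi-label: every category with at least one matching pattern."""
    lowered = text.lower()
    return {
        category
        for category, patterns in CATEGORY_PATTERNS.items()
        if any(re.search(p, lowered) for p in patterns)
    }


signs = "Cow was in dry lot. Went off feed, coughing and labored breathing"
diagnosis = ("PM findings- traumatic pericarditis and abscess from hardware "
             "between reticulum and diaphragm")
print(sorted(classify(signs)))      # ['Anorexia', 'Respiratory']
print(sorted(classify(diagnosis)))  # ['Cardiovascular', 'GI', 'Trauma']
```

Multi-label output matters here: a single submission legitimately belongs to several syndromes at once.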
28
Classified Submissions
[Chart: classified submissions, N = 35,721]
29
Clinical Suspects
30
Clinical Suspect Examples
31
Veterinary Practice Surveillance
  • Veterinary Practice Surveillance (VPS)
  • Cattle practitioners submit data about
    cattle to AAFRD daily via a restricted-access
    website
  • Practitioners classify sick cattle by commodity
    (cow-calf, dairy, etc.), age and syndrome (12 categories)
  • Large sample
  • 26,016 submissions (Aug 2005 to Dec 2006)
  • 5,081 farms
  • 31 veterinary clinics

32
Submissions per day
Sept 2005 to July 2006
33
Respiratory Syndrome
VPS Cattle greater than 30 months of age
34
Clostridium hemolyticum
VPS: 75 cases, BSE: 157 cases
35
Utility?
  • Classifying/identifying high-risk cattle
  • Generalize with caution (prevalence cannot be estimated)
  • Sampling bias
  • Misclassification
  • For each classification, estimate:
  • Sensitivity (Se) and specificity (Sp) of veterinarians
  • Se and Sp of the text classifier
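Estimating Se and Sp for a classification requires a reference standard to compare against. This sketch assumes one exists (for example, a manually reviewed sample of submissions) and uses made-up yes/no labels; `confusion_counts` and `se_sp` are hypothetical helper names. It applies the standard definitions Se = TP / (TP + FN) and Sp = TN / (TN + FP).

```python
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[bool], reference: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) comparing classifier output to a reference standard."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    return tp, fn, tn, fp


def se_sp(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Se = TP / (TP + FN), Sp = TN / (TN + FP)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one reference negative")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels for five submissions, e.g. "Respiratory: yes/no".
classifier = [True, True, False, False, True]
reviewer = [True, False, False, True, True]
print(se_sp(*confusion_counts(classifier, reviewer)))  # (0.6666666666666666, 0.5)
```

The same calculation applies to the veterinarians' classifications, provided a suitable reference standard is available for them.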

36
Utility?
  • But
  • Large sample
  • Disease importance or trends over time and space
  • Clostridium hemolyticum
  • Events: syndromic, unknown, emerging
  • Establish normal patterns to identify unusual
    events
  • Respond/investigate
  • Access for targeted surveillance
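"Establish normal patterns to identify unusual events" can be illustrated with a simple moving-baseline threshold: flag a day when its count exceeds the mean plus k standard deviations of the preceding days. This is a generic sketch, not the authors' method; the `flag_unusual` helper, the 7-day baseline, k = 3 and the daily counts are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_unusual(counts: Sequence[float], baseline: int = 7, k: float = 3.0) -> List[int]:
    """Indices of days whose count exceeds mean + k * SD of the preceding `baseline` days."""
    if baseline < 2:
        raise ValueError("baseline must be at least 2 days to estimate a standard deviation")
    flagged = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        if counts[i] > mean(window) + k * stdev(window):
            flagged.append(i)
    return flagged


daily = [3, 4, 2, 3, 5, 4, 3, 4, 12, 3]  # made-up daily syndrome counts
print(flag_unusual(daily))  # [8]
```

A flagged day stays in later baselines here and inflates their SD; a practical system would also account for day-of-week effects and reporting delays before triggering a response or investigation.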

37
Questions?
  • Our Team
  • Clarissa Snyder
  • Lindsay McLarty
  • John Berezowski
  • Contact us
  • john.berezowski_at_gov.ab.ca