Title: WOW
1. WOW
"My God, it's full of cows!" (David Bowman, 2001)
2. Can walkover weight (WOW) suggest a cow needs attention?
3. Join with breeding information
4. Position at the outset
- Obstacle: no health information!
- Suggested: milking order (i.e. where a cow is in the herd/line-up) is hierarchical and affected by health issues
- Proposed goal: predict a drop in milking order using WOW and other facts
5. Assumptions: deck of cards
- Same cows come in for milking each time
- Cows are well-behaved (e.g. arrive in a nice queue)
- Data is in good shape (e.g. one reading per cow per milking)
6. Data problems
- Multiple entries for cows (e.g. four entries for 22719193 in QBH2005)
- Delete duplicate weights (SQL problem?)
- Cow skipped and recycled back into order
- Use average if more than one value
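The "use average if more than one value" rule can be sketched as below. This is a minimal illustration, not the project's actual clean-up code: the (cow_id, milking_id, weight) record layout and the weights are invented for the example; only the cow number comes from the slide.

```python
from collections import defaultdict
from statistics import mean


def average_duplicates(readings):
    """Collapse repeated (cow_id, milking_id) readings into one mean weight.

    `readings` is an iterable of (cow_id, milking_id, weight_kg) tuples.
    This layout is an assumption, not the project's real schema.
    """
    grouped = defaultdict(list)
    for cow_id, milking_id, weight in readings:
        grouped[(cow_id, milking_id)].append(weight)
    return {key: mean(weights) for key, weights in grouped.items()}


# Illustrative weights only; the cow number is the one quoted on the slide.
readings = [
    ("22719193", 1, 510.0),
    ("22719193", 1, 514.0),  # second entry for the same milking
    ("22719193", 2, 512.0),
]
print(average_duplicates(readings))
```

Averaging hides the duplicates rather than explaining them, so it may still be worth counting how many keys had more than one reading.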
7. About a quarter of the data are zeroes
8. Zero problems
- Differentiate between a missing cow, a missing weight and a zero weight
- Ignore missing cows
- Cow skipped and recycled back into order
- Time-based interpolation
- Can be problematic if cow has been missing for a while
- Add flag to indicate weight was guessed
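One way the time-based interpolation and the "guessed" flag could look is sketched here. It assumes linear interpolation between the nearest valid readings, treats zero and missing (None) weights alike, and adds a hypothetical `max_gap` parameter to cover the case where a cow has been missing for a while; none of these choices are confirmed by the slides.

```python
def interpolate_zeros(times, weights, max_gap=None):
    """Linearly interpolate zero or None weights between valid neighbours.

    `times` must be ascending (e.g. hours since the first milking).
    Returns (filled, guessed); guessed[i] is True where a value was
    interpolated. Readings with no valid neighbour on one side, or
    inside a gap wider than `max_gap`, are left as None.
    """
    known = [(t, w) for t, w in zip(times, weights) if w]  # drops 0 and None
    filled, guessed = [], []
    for t, w in zip(times, weights):
        if w:
            filled.append(w)
            guessed.append(False)
            continue
        before = [(kt, kw) for kt, kw in known if kt < t]
        after = [(kt, kw) for kt, kw in known if kt > t]
        value = None
        if before and after:
            (t0, w0), (t1, w1) = before[-1], after[0]
            if max_gap is None or t1 - t0 <= max_gap:
                value = w0 + (w1 - w0) * (t - t0) / (t1 - t0)
        filled.append(value)
        guessed.append(value is not None)
    return filled, guessed


filled, guessed = interpolate_zeros([0, 12, 24, 36], [500.0, 0, None, 530.0])
print(filled, guessed)
```

Keeping the flag as its own attribute lets a later step down-weight or exclude guessed readings.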
9. Other issues in data preparation
- Change milking date to milk index
- Change birthdate to age in months
- Change parturition date to days since last calved
- Additional derivatives
- Milking index: cow's position in milk order
- Δ-index: change in index for a cow over various time periods (1, 3 and 7 days)
- mu-weight: average weight over varying-length periods (3, 7, 14, 21 and 28 milkings)
- Δ-mu-weight: change in mu-weight for a cow (1, 3 and 7 days)
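The derived attributes above could be computed roughly as follows. This is a sketch under assumptions: the function names are invented, mu-weight is taken to be a trailing mean over the most recent milkings, and the Δ attributes are expressed as a lag in records, so a 1-day change needs a lag equal to the number of milkings per day.

```python
from datetime import date
from statistics import mean


def age_in_months(birthdate, milking_date):
    """Whole months elapsed between birth and the milking date."""
    months = (milking_date.year - birthdate.year) * 12
    months += milking_date.month - birthdate.month
    if milking_date.day < birthdate.day:
        months -= 1  # the current month is not yet complete
    return months


def days_since_calving(calving_date, milking_date):
    """Days between the last parturition and the milking date."""
    return (milking_date - calving_date).days


def mu_weight(weights, window):
    """Trailing mean over the last `window` milkings (shorter at the start)."""
    return [mean(weights[max(0, i - window + 1): i + 1])
            for i in range(len(weights))]


def delta(values, lag):
    """Change relative to `lag` records earlier; None where none exists."""
    return [None if i < lag else values[i] - values[i - lag]
            for i in range(len(values))]


print(age_in_months(date(2003, 8, 15), date(2006, 3, 10)))
print(mu_weight([500, 510, 520, 530], 3))
print(delta([3, 5, 4, 8], 1))
```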
10. Does change in milk order correlate to WOW?
11. Correlation coefficients, QBH2006 (dense)
- WOW to index: 0.12
- WOW to 14-day mu-weight: 0.93
- Index to 10-day mu-weight: 0.14
- 3-day Δ-order to Δ-weight: 0.045
12. 3-day Δ-order and 3-day Δ-weight
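For reference, coefficients like these can be computed with a plain Pearson correlation. The slides do not say which coefficient or tool was used, so the sketch below is an assumption about the method, and the input numbers are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up values: weight against position in the milking order.
print(pearson_r([500, 520, 510, 540], [12, 30, 25, 18]))
```

A high coefficient between WOW and its own mu-weight is unsurprising, since the average is built from the same readings.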
13. Predict change in milking order
- Use M5P to predict how the milking order will change for a cow at the next milking
- Approx. 205,000 QBH2006 samples (with fewer than 5/25 missing attributes)
- 2/3 training, 1/3 testing
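The sample selection and split can be sketched as follows. It assumes missing attributes are stored as None and that the 2/3 to 1/3 split was random; the slides do not say whether it was random or chronological, and the M5P step itself (run in Weka) is not reproduced here.

```python
import random


def filter_and_split(samples, max_missing=5, train_fraction=2 / 3, seed=0):
    """Keep rows with fewer than `max_missing` None values, then split.

    Returns (train, test) after a seeded shuffle. A random split is an
    assumption; a chronological split would skip the shuffle.
    """
    usable = [row for row in samples
              if sum(value is None for value in row) < max_missing]
    rng = random.Random(seed)
    rng.shuffle(usable)
    cut = round(len(usable) * train_fraction)
    return usable[:cut], usable[cut:]


# Nine complete 25-attribute rows plus one row with five missing values.
rows = [[1.0] * 25 for _ in range(9)] + [[None] * 5 + [1.0] * 20]
train, test = filter_and_split(rows)
print(len(train), len(test))
```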
14. Re-running took too long, but you've all seen it before: accuracy was 51.89% (discrimination 0.527) and the model tree was hugely ugly (65 nodes, 33 leaves). Also tried predicting the cow's index as a decile and as a ratio to herd size.
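The two alternative targets, index as a decile and index as a ratio to herd size, might be derived as below. Positions are assumed to be 1-based, and the function names are invented for this sketch.

```python
def index_ratio(position, herd_size):
    """1-based position in the milking order scaled into (0, 1]."""
    return position / herd_size


def index_decile(position, herd_size):
    """Decile (1 to 10) of a 1-based position within the herd."""
    return (position - 1) * 10 // herd_size + 1


print(index_ratio(50, 200), index_decile(50, 200))
```

Both forms remove the dependence on herd size, which changes from milking to milking.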
15. Cow's position (index) as ratio to herd size
16. Cow index vs. herd size
17. Where to?
- Data must still be scrubbed so that milking order makes sense (if milking order is going to be relevant)
- Perhaps cow order needs to be described in completely different terms (e.g. cow buddies)
- Easy visualization of herds/cows/breeds/dates/trends is needed
- This segued into another area of the project ...
18. Visualization tools (alpha and beta)
20. In the meantime, health data is obtained
21. Can WOW predict onset of illness?
- Combine original attributes and derivatives with health judgments
- Cows with unknown health are considered healthy
- Need equal number of positive and negative instances
22. Health data becomes available
23. Not so much health data
- 1613 recorded instances of health
- 913 different cows with health info
- 2540 cows with milking info
- 788 milked cows with health data
- 7 broad categories of illness
- Calving disorder
- Metabolic disorder
- Udder disorder (only one with 50 in herd)
- Reproductive disorder
- Lameness
- Infectious diseases
- Other ailments
24. Data sparseness
- QBH2006
- 75 instances out of 324,291 have health information
- 63 udder disorder
- 10 metabolic disorder
- 2 lameness
- Only about 0.02% positives (75 of 324,291) → will never be isolated → must subsample negatives
- Random selection of 75 negatives → data sparseness → over-fitting likely
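Subsampling the negatives down to the number of positives can be sketched like this. The record layout and the `is_positive` callback are hypothetical, and a fixed seed is used only so the draw is repeatable.

```python
import random


def balance_by_undersampling(records, is_positive, seed=0):
    """Keep every positive and an equal-sized random sample of negatives.

    Raises ValueError if there are fewer negatives than positives.
    """
    positives = [r for r in records if is_positive(r)]
    negatives = [r for r in records if not is_positive(r)]
    rng = random.Random(seed)
    return positives + rng.sample(negatives, len(positives))


# Toy data: 5 records with a health label among 1,000 milkings.
records = [{"id": i, "ill": i < 5} for i in range(1000)]
balanced = balance_by_undersampling(records, lambda r: r["ill"])
print(len(balanced))
```

With only 75 positives, each random draw of negatives yields a different small training set, so repeating the draw with several seeds gives a rough sense of how unstable the resulting model is.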
25. Data sparseness
- QBH2006
- 36 cows have illness at some time, so just learn those?
- 11,966 records for those cows, 76 of which have illness (still
- Random selection of 1% as negatives (about 120)
27. Refinements to approach
- QBH2006
- Restrict target objective to UDDER DISORDER
- Randomly select equal number of negatives from cows who have health problem at some point
- Goal: differentiate between healthy and unhealthy state
31. Detecting mastitis amidst random normal cows
- QBH2006
- Restrict learning objective to UDDER DISORDER
- Randomly select equal number of negatives from all cows that have been milked (63+, 63-)
33. When is a cow sick?
- So far, attempted to predict health label at point of milking, but ...
- When was the health label attached? Before, during or after the current milking?
- Goal: predict whether cow needs attention at the next milking (i.e. time series)
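Recasting the task as "needs attention at the next milking" amounts to pairing each milking's attributes with the label of the following milking. A minimal sketch for one cow's records in milking order (the function name and data layout are invented for the example):

```python
def shift_labels(feature_rows, labels):
    """Pair each milking's features with the label at the next milking.

    Both sequences are one cow's records in milking order. The final
    milking has no successor, so it produces no training pair.
    """
    if len(feature_rows) != len(labels):
        raise ValueError("feature_rows and labels must be the same length")
    return [(feature_rows[i], labels[i + 1]) for i in range(len(labels) - 1)]


pairs = shift_labels(["milking 1", "milking 2", "milking 3"],
                     [False, False, True])
print(pairs)
```

The shift has to be done per cow, otherwise the last milking of one cow would be paired with the first label of the next.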
36. Summary
Correctly Classified Instances          90        70.3125 %
Incorrectly Classified Instances        38        29.6875 %
Kappa statistic                          0.4026
Mean absolute error                      0.3446
Root mean squared error                  0.4532
Relative absolute error                 68.8933 %
Root relative squared error             90.5974 %
Total Number of Instances              128

Detailed Accuracy By Class
TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.508    0.108    0.821      0.508   0.627      0.707     UDDER DISORDER
0.892    0.492    0.652      0.892   0.753      0.707     NONE

Confusion Matrix
  a   b   <-- classified as
 32  31 |  a = UDDER DISORDER
  7  58 |  b = NONE
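As a consistency check, the accuracy and kappa figures can be recomputed from the confusion matrix. The first row, 32 and 31, follows from the reported totals: 90 correct minus the 58 correct NONE instances leaves 32, and 128 instances minus the 65 in the NONE row minus 32 leaves 31. Treat that row as inferred; the helper name is invented for this sketch.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of class-i instances classified as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    observed = correct / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


matrix = [
    [32, 31],  # a = UDDER DISORDER (inferred from the reported totals)
    [7, 58],   # b = NONE
]
accuracy, kappa = accuracy_and_kappa(matrix)
print(f"accuracy={accuracy:.6f} kappa={kappa:.4f}")
```

With these counts the sketch gives an accuracy of 0.703125 and a kappa of 0.4026, matching the summary figures.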
38. Agenda
- Replace quantified attributes with simpler (e.g. boolean, nominal) ones
- Characterise exceptions
- Below average weight for cow/herd/breed/age
- Dropped decile/50 in order
- Broad statistical measures
- How many std. devs. from mean
- z-score (probability of variation)
- Choose negative instances more carefully (select fewer interpolates)
- Spend more time with people who know cows
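Two of the simpler attributes on the agenda, a below-average flag and a z-score, could be sketched as follows. The reference group (`history`) might be the cow's own past weights or the herd, breed or age group; that choice, the function names and the normal-distribution assumption behind the tail probability are all assumptions of this sketch.

```python
from statistics import NormalDist, mean, stdev


def below_average(value, history):
    """Boolean attribute: is this reading below the reference mean?"""
    return value < mean(history)


def z_score(value, history):
    """Number of sample standard deviations `value` lies from the mean.

    Needs at least two readings with some spread, otherwise stdev
    raises StatisticsError or the division fails.
    """
    return (value - mean(history)) / stdev(history)


def lower_tail_probability(z):
    """Chance of a reading this low or lower, assuming normal weights."""
    return NormalDist().cdf(z)


history = [500, 510, 520, 530, 540]  # made-up weights in kg
z = z_score(490, history)
print(below_average(490, history), round(z, 2), lower_tail_probability(z))
```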