1
Empirical Methods for AI (and CS)
  • CS1573 AI Application Development
  • (after a tutorial by Paul Cohen, Ian P. Gent, and
    Toby Walsh)

2
Overview
  • Introduction
  • What are empirical methods?
  • Why use them?
  • Case Study
  • Experiment design
  • Data analysis

3
Empirical Methods for CS
  • Part I: Introduction

4
What does empirical mean?
  • Relying on observations, data, experiments
  • Empirical work should complement theoretical work
  • Theories often have holes
  • Theories are suggested by observations
  • Theories are tested by observations
  • Conversely, theories direct our empirical
    attention
  • In addition (in this course at least), empirical
    means "wanting to understand the behavior of
    complex systems"

5
Why We Need Empirical Methods: Cohen's 1990 Survey
of 150 AAAI Papers
  • Roughly 60% of the papers gave no evidence that
    the work they described had been tried on more
    than a single example problem.
  • Roughly 80% of the papers made no attempt to
    explain performance, to tell us why it was good
    or bad and under which conditions it might be
    better or worse.
  • Only 16% of the papers offered anything that
    might be interpreted as a question or a
    hypothesis.
  • Theory papers generally had no applications or
    empirical work to support them; empirical papers
    were demonstrations, not experiments, and had no
    underlying theoretical support.
  • The essential synergy between theory and
    empirical work was missing.

6
Empirical CS/AI (Theory, not Theorems)
  • Computer programs are formal objects
  • so let's reason about them entirely formally?
  • Two reasons why we can't or won't
  • theorems are hard
  • some questions are empirical in nature
  • e.g. are finite state automata adequate to
    represent the sort of knowledge met in practice?
  • e.g. even though our problem is intractable in
    general, are the instances met in practice easy
    to solve?

7
Empirical CS/AI
  • Treat computer programs as natural objects
  • like fundamental particles, chemicals, living
    organisms
  • Build (approximate) theories about them
  • construct hypotheses
  • e.g. greedy search is important to chess
  • test with empirical experiments
  • e.g. compare search with planning
  • refine hypotheses and modelling assumptions
  • e.g. greediness not important, but search is!

8
Empirical CS/AI
  • Many advantages over other sciences
  • Cost
  • no need for expensive super-colliders
  • Control
  • unlike the real world, we often have complete
    command of the experiment
  • Reproducibility
  • in theory, computers are entirely deterministic
  • Ethics
  • no ethics panels needed before you run experiments

9
Types of hypotheses
  • My search program is better than yours
  • not very helpful: a beauty competition?
  • Search cost grows exponentially with number of
    variables for this kind of problem
  • better as we can extrapolate to data not yet
    seen?
  • Constraint systems are better at handling
    over-constrained systems, but OR systems are
    better at handling under-constrained systems
  • even better as we can extrapolate to new
    situations?

10
A typical conference conversation
  • What are you up to these days?
  • I'm running an experiment to compare the
    Davis-Putnam algorithm with GSAT.
  • Why?
  • I want to know which is faster
  • Why?
  • Lots of people use each of these algorithms
  • How will these people use your result?...

11
Keep in mind the BIG picture
  • What are you up to these days?
  • I'm running an experiment to compare the
    Davis-Putnam algorithm with GSAT.
  • Why?
  • I have this hypothesis that neither will dominate
  • What use is this?
  • A portfolio containing both algorithms will be
    more robust than either algorithm on its own

12
Keep in mind the BIG picture
  • ...
  • Why are you doing this?
  • Because many real problems are intractable in
    theory but need to be solved in practice.
  • How does your experiment help?
  • It helps us understand the difference between
    average and worst case results
  • So why is this interesting?
  • Intractability is one of the BIG open questions
    in CS!

13
Why is empirical CS/AI in vogue?
  • Inadequacies of theoretical analysis
  • problems often aren't as hard in practice as
    theory predicts in the worst case
  • average-case analysis is very hard (and often
    based on questionable assumptions)
  • Some spectacular successes
  • phase transition behaviour
  • local search methods
  • theory lagging behind algorithm design

14
Why is empirical CS/AI in vogue?
  • Compute power ever increasing
  • even intractable problems coming into range
  • easy to perform large (and sometimes meaningful)
    experiments
  • Empirical CS/AI perceived to be easier than
    theoretical CS/AI
  • often a false perception, as experiments are
    easier to mess up than proofs

15
Empirical Methods for CS
  • Part II: A Case Study

16
Case Study
  • Scheduling processors on ring network
  • jobs spawned as binary trees
  • KOSO
  • keep one, send one to my left or right
    arbitrarily
  • KOSO*
  • keep one, send one to my least heavily loaded
    neighbour (see the simulation sketch below)
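A minimal Python sketch of the setup just described; it is not the
original experiment code, and the parameter values, ring size, and
function names are illustrative assumptions. It only shows the two
forwarding strategies on a ring of processors with jobs spawning as
binary trees.

```python
# Sketch of ring scheduling with KOSO and KOSO* (illustrative only).
import random

def simulate(num_procs=10, spawn_prob=0.9, max_depth=10, strategy="koso", seed=0):
    """Return the number of time steps until all work is finished.

    Each queued job does one unit of work per step; with probability
    spawn_prob it spawns two children (a binary tree of jobs).  The
    processor keeps one child and ships the other to a ring neighbour:
    an arbitrary neighbour for KOSO, the least-loaded one for KOSO*.
    """
    random.seed(seed)
    queues = [[] for _ in range(num_procs)]
    queues[0].append(0)                      # one root job at depth 0
    steps = 0
    while any(queues):
        sends = []                           # (destination, job) pairs
        for p, q in enumerate(queues):
            if not q:
                continue
            depth = q.pop(0)                 # run one job for one step
            if depth < max_depth and random.random() < spawn_prob:
                q.append(depth + 1)          # keep one child ...
                left, right = (p - 1) % num_procs, (p + 1) % num_procs
                if strategy == "koso":
                    dest = random.choice([left, right])
                else:                        # KOSO*: least-loaded neighbour
                    dest = min(left, right, key=lambda n: len(queues[n]))
                sends.append((dest, depth + 1))
        for dest, job in sends:              # deliveries arrive after the step
            queues[dest].append(job)
        steps += 1
    return steps

# Tiny comparison run (illustrative only, not the experiment in the slides):
for s in ("koso", "koso*"):
    print(s, simulate(strategy=s))
```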

17
Theory
  • On complete binary trees, KOSO is asymptotically
    optimal
  • So KOSO* can't be any better?
  • But assumptions unrealistic
  • tree not complete
  • asymptotically not necessarily the same as in
    practice!

18
Lesson: Evaluation begins with claims. Lesson:
Demonstration is good, understanding is better
  • Hypothesis (or claim): KOSO takes longer than
    KOSO* because KOSO* balances loads better
  • The "because" phrase indicates a hypothesis about
    why it works. This is a better hypothesis than
    the beauty contest demonstration that KOSO*
    beats KOSO
  • Experiment design
  • Independent variables: KOSO v KOSO*, no. of
    processors, no. of jobs, probability(job will
    spawn), ...
  • Dependent variable: time to complete jobs

19
Criticism: This experiment design includes no
direct measure of the hypothesized effect
  • Hypothesis: KOSO takes longer than KOSO* because
    KOSO* balances loads better
  • But the experiment design includes no direct
    measure of load balancing
  • Independent variables: KOSO v KOSO*, no. of
    processors, no. of jobs, probability(job will
    spawn), ...
  • Dependent variable: time to complete jobs

20
Lesson: The task of empirical work is to explain
variability
Empirical work assumes the variability in a
dependent variable (e.g., run time) is the sum of
causal factors and random noise. Statistical
methods assign parts of this variability to the
factors and the noise.
Number of processors and number of jobs explain
74% of the variance in run time. Algorithm
explains almost none. (A small sketch of this kind
of variance partition follows below.)
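A minimal sketch, using made-up data rather than the KOSO
measurements, of how a least-squares fit assigns variance in run time
to factors versus noise; the factor names, coefficients, and sample
size are illustrative assumptions.

```python
# Partitioning variance between experimental factors (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
n = 200
procs = rng.integers(2, 16, n)          # number of processors
jobs = rng.integers(100, 1000, n)       # number of jobs
algo = rng.integers(0, 2, n)            # 0 = KOSO, 1 = KOSO*
runtime = 5.0 * jobs / procs + 2.0 * algo + rng.normal(0, 30, n)

def r_squared(X, y):
    """Fraction of variance in y explained by a least-squares fit on X."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("procs + jobs explain:", r_squared(np.column_stack([procs, jobs]), runtime))
print("algorithm explains:  ", r_squared(algo.reshape(-1, 1), runtime))
```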
21
Lesson: Keep the big picture in mind
  • Why are you studying this?
  • Load balancing is important to get good
    performance out of parallel computers
  • Why is this important?
  • Parallel computing promises to tackle many of our
    computational bottlenecks
  • How do we know this? It's in the first paragraph
    of the paper!

22
Empirical Methods for CS
  • Part III: Experiment design

23
Experimental Life Cycle
  • Exploration
  • Hypothesis construction
  • Experiment
  • Data analysis
  • Drawing of conclusions

24
Types of experiment designs
  • Manipulation experiment
  • Observation experiment
  • Factorial experiment

25
Manipulation experiment
  • Independent variable, x
  • x = identity of parser, size of dictionary, ...
  • Dependent variable, y
  • y = accuracy, speed, ...
  • Hypothesis
  • x influences y
  • Manipulation experiment
  • change x, record y

26
Observation experiment
  • Predictor, x
  • x = volatility of stock prices, ...
  • Response variable, y
  • y = fund performance, ...
  • Hypothesis
  • x influences y
  • Observation experiment
  • classify according to x, compute y

27
Factorial experiment
  • Several independent variables, xi
  • there may be no simple causal links
  • data may come that way
  • e.g. programs have different languages,
    algorithms, ...
  • Factorial experiment
  • every possible combination of the xi is
    considered (see the sketch below)
  • expensive, as its name suggests!
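A minimal sketch of enumerating a factorial design: every combination
of the independent variables is run. The factor names and the
run_experiment stub are illustrative assumptions, not from the slides.

```python
# Enumerating a full factorial design (illustrative factors).
from itertools import product

factors = {
    "algorithm": ["KOSO", "KOSO*"],
    "processors": [4, 8, 16],
    "jobs": [100, 1000],
}

def run_experiment(algorithm, processors, jobs):
    # Placeholder for the real measurement (e.g. run time of one trial).
    return 0.0

results = []
for combo in product(*factors.values()):
    setting = dict(zip(factors.keys(), combo))
    results.append((setting, run_experiment(**setting)))

print(len(results), "runs")   # 2 * 3 * 2 = 12 combinations
```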

28
Some problem issues
  • Control
  • Ceiling and Floor effects
  • Sampling Biases

29
Control
  • A control is an experiment in which the
    hypothesised variation does not occur
  • so the hypothesized effect should not occur
    either
  • BUT remember
  • placebos cure a large percentage of patients!

30
Control: MYCIN case study
  • MYCIN was a medical expert system
  • recommended therapy for blood/meningitis
    infections
  • How to evaluate its recommendations?
  • Shortliffe used
  • 10 sample problems, 8 therapy recommenders
  • 5 faculty, 1 resident, 1 postdoc, 1 student
  • 8 impartial judges gave 1 point per problem
  • max score was 80
  • MYCIN 65, faculty 40-60, postdoc 60, resident 45,
    student 30

31
Control: MYCIN case study
  • What were the controls?
  • Control for judges' bias for/against computers
  • judges did not know who recommended each therapy
  • Control for easy problems
  • medical student did badly, so problems not easy
  • Control for our standard being low
  • e.g. random choice should do worse
  • Control for factor of interest
  • e.g. the hypothesis in MYCIN that "knowledge is
    power"
  • have groups with different levels of knowledge

32
Ceiling and Floor Effects
  • Well designed experiments (with good controls)
    can still go wrong
  • What if all our algorithms do particularly well?
  • Or they all do badly?
  • We've got little evidence to choose between them

33
Ceiling and Floor Effects
  • Ceiling effects arise when test problems are
    insufficiently challenging
  • floor effects are the opposite, when problems are
    too challenging
  • A problem in AI because we often repeatedly use
    the same benchmark sets
  • most benchmarks will lose their challenge
    eventually?
  • but how do we detect this effect?

34
Ceiling Effects: machine learning
  • 14 datasets from UCI corpus of benchmarks
  • used as mainstay of ML community
  • Problem is learning classification rules
  • each item is vector of features and a
    classification
  • measure classification accuracy of method (max
    100%)
  • Compare C4 with 1R, two competing algorithms

35
Ceiling Effects: machine learning
  • DataSet  BC    CH    GL    G2    HD    HE    ...  Mean
  • C4       72    99.2  63.2  74.3  73.6  81.2  ...  85.9
  • 1R       72.5  69.2  56.4  77    78    85.1  ...  83.8
  • Max      72.5  99.2  63.2  77    78    85.1  ...  87.4
  • C4 achieves only about 2% better accuracy than 1R
  • The best of C4/1R achieves 87.4% accuracy
  • We have only weak evidence that C4 is better
  • Both methods are performing near the ceiling of
    what is possible, so comparison is hard!

36
Ceiling Effects: machine learning
  • In fact 1R only uses one feature (the best one)
  • C4 uses on average 6.6 features
  • The extra 5.6 features buy only about a 2%
    improvement
  • Conclusion?
  • Either real world learning problems are easy (use
    1R)
  • Or we need more challenging datasets
  • We need to be aware of ceiling effects in results

37
Sampling bias
  • Data collection is biased against certain data
  • e.g. a teacher who says "Girls don't answer maths
    questions"
  • observation might suggest
  • girls don't answer many questions
  • but it may be that the teacher doesn't ask them
    many questions
  • Experienced AI researchers don't do that, right?

38
Sampling bias: Phoenix case study
  • AI system to fight (simulated) forest fires
  • Experiments suggest that wind speed is
    uncorrelated with time to put out the fire
  • obviously incorrect as high winds spread forest
    fires

39
Sampling bias: Phoenix case study
  • Wind speed vs containment time (max 150 hours)
  • wind 3: 120, 55, 79, 10, 140, 26, 15, 110, 12, 54, 10, 103
  • wind 6: 78, 61, 58, 81, 71, 57, 21, 32, 70
  • wind 9: 62, 48, 21, 55, 101
  • What's the problem?

40
Sampling bias: Phoenix case study
  • The cut-off of 150 hours introduces sampling bias
  • many high-wind fires get cut off, but few
    low-wind ones do
  • On the remaining data, there is no correlation
    between wind speed and time (r = -0.53)
  • In fact, the data shows that
  • a lot of high-wind fires take > 150 hours to
    contain
  • those that don't are similar to low-wind fires
  • You wouldn't do this, right?
  • you might if you had automated data analysis
    (a small demonstration of this censoring effect
    follows below)
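A minimal sketch, with simulated fires rather than the Phoenix data,
showing how a cut-off on the dependent variable can weaken or hide a
real correlation; all numbers here are illustrative assumptions.

```python
# Censoring the dependent variable attenuates a real correlation (illustrative).
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(1)
wind = [random.choice([3, 6, 9]) for _ in range(500)]
time = [20 * w + random.gauss(0, 60) for w in wind]   # time really grows with wind

kept = [(w, t) for w, t in zip(wind, time) if 0 < t <= 150]  # drop fires over 150 h
print("all fires:     r =", round(correlation(wind, time), 2))
print("after cut-off: r =", round(correlation(*zip(*kept)), 2))
```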

41
Empirical Methods for CS
  • Part IV: Data analysis

42
Kinds of data analysis
  • Exploratory data analysis (EDA): looking for
    patterns in data
  • Statistical inferences from sample data
  • Testing hypotheses
  • Estimating parameters
  • Building mathematical models of datasets
  • Machine learning, data mining
  • We will introduce hypothesis testing and
    computer-intensive methods

43
The logic of hypothesis testing
  • Example: toss a coin ten times and observe eight
    heads. Is the coin fair (i.e., what is its
    long-run behavior?) and what is your residual
    uncertainty?
  • You say, "If the coin were fair, then eight or
    more heads is pretty unlikely, so I think the
    coin isn't fair."
  • Like proof by contradiction: assert the opposite
    (the coin is fair), show that the sample result
    (≥ 8 heads) has low probability p, then reject
    the assertion, with residual uncertainty related
    to p.
  • Estimate p with a sampling distribution.

44
The logic of hypothesis testing
  • Establish a null hypothesis: H0: p = .5, the
    coin is fair
  • Establish a statistic: r, the number of heads in
    N tosses
  • Figure out the sampling distribution of r given
    H0
  • The sampling distribution will tell you the
    probability p of a result at least as extreme as
    your sample result, r = 8
  • If this probability is very low, reject H0, the
    null hypothesis
  • Residual uncertainty is p (see the computation
    below)
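A minimal sketch of the coin computation: the probability, under H0 (a
fair coin), of a result at least as extreme as r = 8 heads in N = 10
tosses, using the exact binomial distribution.

```python
# Exact one-tailed binomial p-value for the coin example.
from math import comb

def binomial_tail(n, r, p=0.5):
    """P(at least r successes in n trials), success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

print(binomial_tail(10, 8))   # about 0.055: unlikely under H0, but not negligible
```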

45
A common statistical test: the Z test for
different means
  • A sample of N = 25 computer science students has
    mean IQ m = 135. Are they smarter than average?
  • The population mean is 100 with standard
    deviation 15
  • The null hypothesis, H0, is that the CS students
    are average, i.e., the mean IQ of the
    population of CS students is 100.
  • What is the probability p of drawing the sample
    if H0 were true? If p is small, then H0 is
    probably false.
  • Find the sampling distribution of the mean of a
    sample of size 25, from a population with mean
    100

46
Reject the null hypothesis?
  • Commonly we reject H0 when the probability of
    obtaining the sample statistic (e.g., mean = 135)
    given the null hypothesis is low, say < .05.
  • A test statistic value, e.g. Z = 11.67, recodes
    the sample statistic (mean = 135) to make it easy
    to find the probability of the sample statistic
    given H0.
  • We find the probabilities by looking them up in
    tables, or statistics packages provide them.
  • For example, Pr(Z ≥ 1.65) ≈ .05 and Pr(Z ≥ 2.33)
    ≈ .01.
  • Pr(Z ≥ 11) is approximately zero, so reject H0
    (see the computation below).
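A minimal sketch of the Z computation above: sample mean 135, n = 25,
population mean 100 and standard deviation 15 under H0.

```python
# Z statistic and one-tailed p-value for the IQ example.
from math import erfc, sqrt

def z_test(sample_mean, n, pop_mean, pop_sd):
    """Return the Z statistic and the one-tailed p-value P(Z >= z)."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p_one_tailed = 0.5 * erfc(z / sqrt(2))   # upper tail of the standard normal
    return z, p_one_tailed

z, p = z_test(135, 25, 100, 15)
print(z, p)   # z is about 11.67; p is effectively zero, so reject H0
```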

47
Summary of hypothesis testing
  • H0 negates what you want to demonstrate; find the
    probability p of the sample statistic under H0 by
    comparing the test statistic to the sampling
    distribution; if the probability is low, reject
    H0 with residual uncertainty proportional to p.
  • Example: we want to demonstrate that CS graduate
    students are smarter than average. H0 is that
    they are average. t = 2.89, p = .022
  • Have we proved CS students are smarter? NO!
  • We have only shown that a mean of 135 is unlikely
    if they aren't. We never prove what we want to
    demonstrate; we only reject H0, with residual
    uncertainty.
  • And failing to reject H0 does not prove H0,
    either!

48
Computer-intensive Methods
  • Basic idea: construct sampling distributions by
    simulating on a computer the process of drawing
    samples.
  • Three main methods
  • Monte Carlo simulation: when one knows the
    population parameters
  • Bootstrap: when one doesn't
  • Randomization: also assumes nothing about the
    population
  • Enormous advantage: works for any statistic and
    makes no strong assumptions (e.g., normality).
    A bootstrap sketch follows below.
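A minimal sketch of the bootstrap: build a sampling distribution for a
statistic by resampling the observed data with replacement. The data
values and the resample count here are illustrative assumptions.

```python
# Bootstrap sampling distribution of the mean (illustrative data).
import random

def bootstrap(data, statistic, n_resamples=1000):
    """Return the statistic computed on n_resamples resamples of the data."""
    out = []
    for _ in range(n_resamples):
        resample = [random.choice(data) for _ in data]
        out.append(statistic(resample))
    return out

random.seed(0)
sample = [12, 15, 9, 22, 17, 30, 11, 25, 14, 19]       # some observed values
means = bootstrap(sample, lambda xs: sum(xs) / len(xs))
means.sort()
# A rough 95% confidence interval for the mean, from the percentiles:
print(means[25], means[975])
```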

49
Summary
  • Empirical CS and AI are exacting sciences
  • There are many ways to do experiments wrong
  • We are experts in doing experiments badly
  • As you perform experiments, you'll make many
    mistakes
  • Learn from those mistakes, and ours!