CS521 Software Engineering Hypothesis Testing

Slides: 55

Provided by: sky89

1
CS521 Software Engineering Hypothesis Testing
2
Random Variable

The outcome of an experiment need not be a
number, for example, the outcome when a coin is
tossed can be 'heads' or 'tails'. However, we
often want to represent outcomes as numbers.
A random variable is a function that associates a
unique numerical value with every outcome of an
experiment. The value of the random variable will
vary from trial to trial as the experiment is
repeated.
There are two types of random variable - discrete
and continuous.

3
Discrete Random Variables

A discrete random variable is one which may take
on only a countable number of distinct values
such as 0, 1, 2, 3, 4, ... Discrete random
variables are usually (but not necessarily)
counts.
If a random variable can take only a finite
number of distinct values, then it must be
discrete.
Example: the number of defective light bulbs in a
box of ten.
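
The two ideas above can be sketched in Python: a random variable as a function from outcomes to numbers, and a discrete count. This is only an illustration; the function names and the 5% defect rate are assumptions, not values from the slides.

```python
import random

def coin_to_number(outcome: str) -> int:
    """A random variable: map the outcome 'heads' to 1 and 'tails' to 0."""
    return {"heads": 1, "tails": 0}[outcome]

def defective_bulbs(rng: random.Random, box_size: int = 10,
                    defect_rate: float = 0.05) -> int:
    """Count defective bulbs in one box (defect_rate is an assumed value)."""
    return sum(rng.random() < defect_rate for _ in range(box_size))

rng = random.Random(0)  # fixed seed so the run is repeatable
tosses = [rng.choice(["heads", "tails"]) for _ in range(5)]
print([coin_to_number(t) for t in tosses])       # values vary from trial to trial
print([defective_bulbs(rng) for _ in range(5)])  # each count lies in 0..10
```

The count can only be one of the eleven values 0 through 10, which is what makes it discrete.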

4
Continuous Random Variable

A continuous random variable is one which can take
any of an uncountably infinite number of possible
values, such as any value in an interval. Continuous
random variables are usually measurements.
Examples include height, weight, the amount of
sugar in an orange, the time required to run a
mile.

5
Expected Value

The expected value (or population mean) of a
random variable indicates its average or central
value. It is a useful summary value (a number) of
the variable's distribution.
Stating the expected value gives a general
impression of the behavior of some random
variable without giving full details of its
probability distribution (if it is discrete) or
its probability density function (if it is
continuous).
Two random variables with the same expected value
can have very different distributions. There are
other useful descriptive measures which describe
the spread and shape of the distribution, for
example the variance.

6
Expected Value

The expected value of a random variable X is
symbolised by E(X) or µ.
If X is a discrete random variable with possible
values x1, x2, x3, ..., xn, and p(xi) denotes P(X
= xi), then the expected value of X is defined
by
$E(X) = \mu = \sum_{i=1}^{n} x_i \, p(x_i)$
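
As a small worked case, assuming the standard definition E(X) = sum of xi * p(xi), the expected value of a fair six-sided die can be computed directly. The die is an illustrative assumption, not an example from the slides.

```python
from fractions import Fraction

def expected_value(pmf: dict) -> Fraction:
    """Return E(X), the sum of x * p(x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())

# Assumed example: a fair die, p(x) = 1/6 for x = 1..6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))  # 7/2
```

Note that 3.5 is not a value the die can show; the expected value is a summary of the distribution, not a possible outcome.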

7
Expected Value

If X is a continuous random variable with
probability density function f(x), then the
expected value of X is defined by
$E(X) = \mu = \int_{-\infty}^{\infty} x \, f(x) \, dx$
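
For a continuous variable the sum becomes an integral of x * f(x). A minimal numerical sketch, assuming a uniform density on the interval [0, 2] (an illustrative choice, not from the slides) and a simple midpoint rule:

```python
def expected_value_continuous(f, a, b, steps=10_000):
    """Approximate the integral of x * f(x) over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width  # midpoint of the i-th slice
        total += x * f(x) * width
    return total

def uniform_density(x):
    """Assumed density: uniform on [0, 2], so f(x) = 1/2 there."""
    return 0.5

mean = expected_value_continuous(uniform_density, 0.0, 2.0)
print(round(mean, 6))  # close to 1.0, the midpoint of the interval
```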

8
Variance

The (population) variance of a random variable is
a non-negative number which gives an idea of how
widely spread the values of the random variable
are likely to be; the larger the variance, the
more scattered the observations on average.
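
To illustrate, here is a sketch using the standard definition Var(X) = E[(X - µ)^2], which the slides do not spell out. The two distributions are made-up examples that share the same expected value but differ in spread.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2], always non-negative."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Two assumed distributions with the same expected value of 5.
narrow = {4: Fraction(1, 2), 6: Fraction(1, 2)}
wide = {0: Fraction(1, 2), 10: Fraction(1, 2)}

print(mean(narrow), variance(narrow))  # 5 1
print(mean(wide), variance(wide))      # 5 25
```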

9
Probability Distribution

The probability distribution of a discrete random
variable is a list of probabilities associated
with each of its possible values. It is also
sometimes called the probability function or the
probability mass function.
More formally, the probability distribution of a
discrete random variable X is a function which
gives the probability p(xi) that the random
variable equals xi, for each value xi:
p(xi) = P(X = xi)
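
A probability distribution of this kind can be built by enumerating outcomes. The sketch below assumes an illustrative experiment (two tosses of a fair coin, with X the number of heads) that is not taken from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = P(X = x) for each possible value x.
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
for x, p in pmf.items():
    print(x, p)
```

The probabilities in the list sum to 1, as they must for any probability distribution.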

10
An example

Suppose that we want to compare the crime rate in
Portland with the crime rate in the rest of the
country.
Is there more or less crime in Portland than the
national average?

11
An example

First, we start with the hypothesis that the
crime rate on average in Portland is the same as
the national average.
To test our hypothesis, we ask what sample means
would occur if many samples of the same size were
drawn at random from our population if our
hypothesis is true.

12
An example

We can now refer to the sampling distribution of
the mean, drawn from a population whose mean is
the same as the national average, and we compare
our sample mean with those in this sampling
distribution.
If our hypothesis is true, then the distribution
of sample means will be centered about the
national average.
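
The sampling distribution of the mean can be approximated by simulation. This is a sketch under stated assumptions: a normally distributed population with mean 85 and standard deviation 20 (the figures used in the crime-rate example later in the deck), and an assumed sample size of 100.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repeats, seed=0):
    """Draw `repeats` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repeats)
    ]

means = simulate_sample_means(mu=85, sigma=20, n=100, repeats=2000)
center = statistics.fmean(means)
spread = statistics.stdev(means)
print(f"center of sample means: {center:.2f}")  # close to 85
print(f"spread of sample means: {spread:.2f}")  # close to 20 / sqrt(100) = 2
```

The sample means cluster around the hypothesized population mean, and individual sample means rarely equal it exactly.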

13
An example

Suppose that the relationship between our sample
mean and those of the sampling distribution of
the mean looks like this:

[Figure: sampling distribution of the mean, with our hypothesized value and our obtained value marked.]
14
An example

If so, our sample mean is one that could
reasonably occur if the hypothesis is true, and
we will retain our hypothesis as one that could
be true. (The crime rate of Portland is the same
as the national average.)

15
An example

On the other hand, if the relationship between
our sample mean and those of the sampling
distribution of the mean looks like this

16
An example

Our sample mean is so deviant that it would be
quite unusual to obtain such a value when our
hypothesis is true. In this case, we would
reject our hypothesis and conclude that it is
more likely that the crime rate of Portland is
not the same as the national average.
The population represented by the sample differs
significantly from the comparison population.

17
Null Hypothesis

The hypothesis that we put to the test is called
the null hypothesis, symbolized H0.
The null hypothesis usually states the situation
in which there is no difference (the difference
is null) between populations.

18
Alternative Hypothesis

The alternative hypothesis, symbolized HA, is the
opposite of the null hypothesis.
The alternative hypothesis is also identified as
the research hypothesis, or the hunch that the
investigator wants to test.

19
Null and Alternative Hypotheses

Both H0 and HA are statements about population
parameters, not sample statistics.
A decision to retain the null hypothesis implies
a lack of support for the alternative hypothesis.
A decision to reject the null hypothesis implies
support for the alternative hypothesis.

20
When do we retain and when do we reject the null
hypothesis?

When we draw a random sample from a population,
our obtained value of the sample mean will almost
never exactly equal the mean of our population.
The decision to reject or retain the null
hypothesis depends on the selected criterion for
distinguishing between those sample means that
would be common and those that would be rare if
H0 were true.

21
When do we retain and when do we reject the null
hypothesis?

If the sample mean is so different from what is
expected when H0 is true that its appearance
would be unlikely, H0 should be rejected.
But what degree of rarity of occurrence is so
great that it seems better to reject the null
hypothesis than to retain it?

22
When do we retain and when do we reject the null
hypothesis?

This decision is somewhat arbitrary, but common
research practice is to reject H0 if the sample
mean is so deviant that its probability of
occurrence in random sampling is .05 or less.
Such a criterion is called the level of
significance, symbolized α.

23
Rejection Regions

For our purposes, we will adopt the .05 level of
significance.
Therefore, we will reject H0 only if our obtained
sample mean is so deviant that it falls in the
upper 2.5% or lower 2.5% of all the possible
sample means that would occur when H0 is true.
The portions of the sampling distribution that
include the values of the mean that lead to
rejection of the null hypothesis are called
rejection regions.
If our sample mean falls in the middle 95% of the
distribution of all possible values of the mean
that could occur when H0 is true, we will retain
the null hypothesis.

24
What sample means would occur if H0 is true?

If it is true, the sampling distribution of the
mean would center on the hypothesized population
mean.
If we assume that the sampling distribution of
the mean approximates a normal curve (and we can,
if our sample is large enough for the central
limit theorem to apply), we can locate any sample
mean on that curve.

25
Critical Values

We can use the normal curve table to find
the Z values, called critical values, that
separate the upper 2.5% and lower 2.5% of sample
means from the remainder. For the .05 level of
significance, these are Z = ±1.96.
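
Instead of a printed normal curve table, the critical values can be computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names are illustrative.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Z value that cuts off alpha / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z: float, alpha: float) -> bool:
    """True if z falls in either tail beyond the critical values."""
    return abs(z) >= two_tailed_critical_z(alpha)

print(round(two_tailed_critical_z(0.05), 2))  # 1.96
print(in_rejection_region(2.5, 0.05))         # True
print(in_rejection_region(0.5, 0.05))         # False
```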

26
An example

Suppose our obtained sample mean of the crime
rate in Portland is a score of 90.
Suppose that the national average is known to be
85, with a standard deviation of 20.
Even if the population mean really is a score of
85, because of random sampling variation we do
not expect the mean of a sample randomly drawn
from a population to be exactly 85 (although it
could be).

27
Using the Sampling Distribution of the Mean to
Determine Probability

The important question is what is the relative
position of the obtained sample mean among all
those that could have been obtained if the
hypothesis is true?
To determine the position of the obtained sample
mean, it must be expressed as a Z score.

28
Z score

In hypothesis testing, you are finding the Z score
of your sample's mean on a distribution of means.

29
Z Score Formulas

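The method of changing the sample's mean to a Z
score:
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$, where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$ is the standard error of the mean.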
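
The conversion can be sketched as a short function, assuming the usual Z formula: the sample mean minus the hypothesized mean, divided by the standard error sigma / sqrt(n). The sample size is not stated in the transcript; n = 100 is an assumption chosen because it reproduces the Z value of 2.5 reported for the crime-rate example.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def z_score(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Position of the sample mean on the sampling distribution under H0."""
    return (sample_mean - mu0) / standard_error(sigma, n)

# Crime-rate example: sample mean 90, national average 85, sigma 20.
# n = 100 is assumed, not given in the slides.
print(z_score(90, 85, 20, 100))  # 2.5
```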

30
An example

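In our study, $Z = \frac{90 - 85}{\sigma_{\bar{X}}} = 2.5$, where $\sigma_{\bar{X}}$ is the standard error of the mean. This implies $\sigma_{\bar{X}} = 2$ (with $\sigma = 20$, a sample of $n = 100$).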

31
An example

Our sample mean is 2.5 standard errors of the
mean greater than expected if the null hypothesis
were true.
The value of 2.5 exceeds the critical value of
1.96 and so falls in the rejection region; we
reject H0 in favor of HA.
We can conclude that the mean of the population
from which the sample came is not 85.

32
An example

The crime rate of Portland is, on average,
different from (greater than) the national
average.
Notice that the conclusion is about the
population represented by the sample under study
and not simply the particular sample itself.

33
What if we had used α = .01?

Our sample mean and our Z value would still be
the same, but the critical values of Z that
separate the regions of rejection would be
different: ±2.58.
This is a more conservative value (it is harder
to reject the null hypothesis).
Your decision depends on your criterion.

Using an alpha level of .01, you would fail to
reject the null hypothesis.
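
The dependence of the decision on the criterion can be checked numerically. This sketch reuses the Z value of 2.5 from the crime-rate example; the two-tailed p-value is an addition not discussed on the slides, shown only to make the comparison concrete.

```python
from statistics import NormalDist

def decide(z: float, alpha: float) -> str:
    """Two-tailed decision for a given Z value and level of significance."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(z) >= critical else "retain H0"

z = 2.5
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed probability

print(decide(z, 0.05))    # reject H0
print(decide(z, 0.01))    # retain H0
print(round(p_value, 4))  # lies between .01 and .05
```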
34
If we retain H0, what can we conclude?

The decision to retain H0 does not mean that it
is likely that H0 is true.
Rather, this decision reflects the fact that we
do not have sufficient evidence to reject the
null hypothesis.
Certain other hypotheses would also have been
retained if tested in the same way.

35
If we retain H0, what can we conclude?

Consider our example where the hypothesized
population mean is 85.
If we had obtained a sample mean of 86, the null
hypothesis would have been retained.
But suppose the hypothesized population mean was
87.
If we had obtained a sample mean of 86, the null
hypothesis would also have been retained.
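
A quick check of this point, assuming the standard error of 2 implied by the crime-rate example and the .05 two-tailed criterion:

```python
from statistics import NormalDist

CRITICAL_Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-tailed .05 test
STANDARD_ERROR = 2.0                      # assumed, implied by the earlier example

def retained(sample_mean: float, hypothesized_mean: float) -> bool:
    """True if H0 (population mean = hypothesized_mean) would be retained."""
    z = (sample_mean - hypothesized_mean) / STANDARD_ERROR
    return abs(z) < CRITICAL_Z

for mu0 in (85, 87):
    print(mu0, retained(86, mu0))  # True for both hypothesized means
```

The same sample mean is compatible with both hypotheses, which is why retaining H0 does not show that H0 is true.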

36
Strength of Decision

Rejecting the null hypothesis means that H0 is
probably false, a strong decision.
Retaining the null hypothesis is a weak decision.

37
Two-tailed Test

The alternative hypothesis states that the
population parameter may be either less than or
greater than the value stated in H0.
The critical region is divided between both tails
of the sampling distribution.

38
Two-tailed Test

This type of test is desirable in most research
situations.
For example, in most cases in which the
performance of a group is compared to a known
standard, it would be of interest to discover
that the group is superior or inferior.

39
One-tailed Test

The alternative hypothesis states that the
population parameter differs from the value
stated in H0 in one particular direction.
The critical region is located only in one tail
of the sampling distribution.

40
One-tailed Test

[Figure: two sampling distributions, one showing the upper-tail critical region and one showing the lower-tail critical region.]

41
One-tailed Test

The advantage of a one-tailed test is that it is
more sensitive than a two-tailed test to detecting
a false null hypothesis in the direction of
concern.
The major disadvantage of a one-tailed test is
that it precludes any chance of discovering that
reality is just the opposite of what the
alternative hypothesis says.
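
Both the advantage and the disadvantage can be seen by comparing critical values at the .05 level. The Z values 1.8 and -3.0 below are hypothetical, picked only to show the two cases.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()
upper_tail_critical = std_normal.inv_cdf(1 - ALPHA)    # about 1.645 (tables often give 1.65)
two_tail_critical = std_normal.inv_cdf(1 - ALPHA / 2)  # about 1.96

def reject_upper_tailed(z: float) -> bool:
    """One-tailed test: the whole rejection region is in the upper tail."""
    return z >= upper_tail_critical

def reject_two_tailed(z: float) -> bool:
    """Two-tailed test: the rejection region is split between both tails."""
    return abs(z) >= two_tail_critical

# Advantage: a moderate effect in the expected direction is detected.
print(reject_upper_tailed(1.8), reject_two_tailed(1.8))    # True False
# Disadvantage: a large effect in the opposite direction is missed.
print(reject_upper_tailed(-3.0), reject_two_tailed(-3.0))  # False True
```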

42
Steps of the Hypothesis Test

State the research question.
State the statistical hypothesis.
Set decision rule.
Calculate the test statistic.
Decide if result is significant.
Interpret result as it relates to your research
question.

43
An example

Robins and John (1997) carried out a study on
narcissism (self-love), comparing people who had
scored high versus low on a narcissism
questionnaire. (An example item was "If I ruled
the world it would be a better place.") They
also had other questionnaires, including one that
had an item about how many times the participant
looked in the mirror on a typical day. They
hypothesized that people who scored high on the
narcissism scale look in the mirror significantly
more often than people who did not score high on
the scale. Based on previous research, it is
known that, on average, a person looks in the
mirror 4.8 times per day, with a standard
deviation of 2.6. Taking a sample of 25
narcissistic individuals, they find a mean of 6.3
visits to the mirror per day. Using the .05
level of significance, and assuming the
distribution approximates a normal curve, what
should the researchers conclude?

44
An example

State the research question
Do individuals, who score high on a narcissistic
scale, look at themselves in the mirror
significantly more often than individuals who are
not narcissistic?
State the statistical hypothesis

45
Statistical Hypotheses

Two-tailed Test: H0: µ = µ0; HA: µ ≠ µ0
One-Tailed Test
Lower-tailed: H0: µ ≥ µ0; HA: µ < µ0
Upper-tailed: H0: µ ≤ µ0; HA: µ > µ0

46
An example

Set decision rule
α = .05, one-tailed (upper-tailed) test: reject H0 if Z > 1.65.

47
An example

Calculate the test statistic
$Z = \frac{6.3 - 4.8}{2.6 / \sqrt{25}} = \frac{1.5}{0.52} = 2.88$

48
An example

Decide if results are significant
Reject H0, since 2.88 > 1.65.
Interpret the results as they relate to the
research question
Narcissistic individuals look in the mirror
significantly more often than individuals who are
not narcissistic.
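
The arithmetic of this example can be reproduced step by step. The numbers come from the problem statement; the variable names are illustrative, and the critical value is computed rather than read from a table, so it comes out as about 1.645 instead of the rounded 1.65.

```python
import math
from statistics import NormalDist

# Values from the problem statement.
mu0, sigma, n, sample_mean, alpha = 4.8, 2.6, 25, 6.3, 0.05

# Decision rule: upper-tailed test, reject H0 if Z exceeds the critical value.
critical_z = NormalDist().inv_cdf(1 - alpha)

# Test statistic: Z score of the sample mean under H0.
standard_error = sigma / math.sqrt(n)      # 2.6 / 5 = 0.52
z = (sample_mean - mu0) / standard_error   # about 2.88

# Decision.
reject_h0 = z > critical_z
print(round(z, 2), round(critical_z, 3), reject_h0)
```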

49
Another example

A psychologist is working with people who have
had a particular type of major surgery. The
psychologist proposes that people will recover
from the operation more quickly if friends and
family are in the room with them for the first 48
hours after the operation (based on several other
studies on social support), but acknowledges that
the presence of friends and family may also slow
recovery time, due to the added activity and
possible stress associated with visitors. It is
known that time to recover from this kind of
surgery is normally distributed with a mean of 12
days and a standard deviation of 5 days. The
procedure of having friends and family in the
room for the period after the surgery is done
with 9 randomly selected patients. The patients
recover in an average of 8 days. Using the .01
level of significance, what should the researcher
conclude?

50
Another example

State the research question
Do patients who have friends and family with them
following surgery recover more or less quickly
than people who do not?
State the statistical hypothesis
H0: µ = 12; HA: µ ≠ 12
Set decision rule
α = .01, two-tailed test: reject H0 if Z < -2.58 or Z > 2.58.
Calculate the test statistic
$Z = \frac{8 - 12}{5 / \sqrt{9}} = -2.40$
Decide if results are significant
Retain H0, since -2.40 > -2.58.
Interpret the results as they relate to the
research question
Patients who have friends and family with them
following surgery do not recover significantly
faster, or slower, than patients who do not have
social support.
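
The same check can be done for this example using a two-tailed p-value rather than a critical value; the two approaches give the same decision. The numbers come from the problem statement; the p-value route is an alternative the slides do not use.

```python
import math
from statistics import NormalDist

# Values from the problem statement.
mu0, sigma, n, sample_mean, alpha = 12, 5, 9, 8, 0.01

# Test statistic: Z score of the sample mean under H0.
z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # about -2.40

# Two-tailed p-value: probability of a Z at least this extreme in either tail.
p_value = 2 * NormalDist().cdf(-abs(z))

retain_h0 = p_value > alpha
print(round(z, 2), round(p_value, 4), retain_h0)
```

Because the p-value is above .01, H0 is retained here, even though the same result would be significant at the .05 level.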

51
ASSIGNMENT

Due in 2 weeks
Select a Software Engineering Research Paper
(Journal)
The paper must include an experiment
Create a write-up and presentation that outline
and analyze the experiment and results.
Cover the following topics (Next Slide)

52
ASSIGNMENT

Definition: This is where the study is defined
in terms of problem, objectives, and goals. The
following questions need to be asked:
What is the object of the study?
What is the purpose of the study?
Which effect is studied (quality focus)?
From whose perspective are you viewing the study?
What is the context of the study (e.g. where is
it conducted)? The context defines which
personnel are involved in the experiment and
which objects will be studied.

53
ASSIGNMENT

Planning: This is where the details of the
experiment are defined.
Is the experiment off-line or on-line, student or
industry, toy or real, specific or general?
What are the null and alternative hypotheses?
What are your variables, both independent and
dependent?
How did you select your subjects: simple random
sampling, systematic sampling, stratified random
sampling, convenience sampling, or quota sampling?
Does your design use randomization for subjects
and objects, or did you employ blocking or
balancing?
How many factors and how many treatments did you
have? For example if you were investigating new
design methods for producing quality software,
the factor would be the design method and the
treatments are the new and old designs. How you
choose these factors and treatments will define
what statistical analysis can be applied.
What instrumentation was used?
What is the validity evaluation? There are
typically four types of validity threat: conclusion,
internal, construct, and external [Cook79].
Conclusion Validity: Is there a significant
statistical relationship between treatment and
outcome?
Internal Validity: Does the treatment actually
cause the outcome?
Construct Validity: Relationship between theory
and observation. Does the treatment reflect the
construct of the cause and does the outcome
reflect the construct of the effect?
External Validity: Here we are concerned with
generalization of the study. How does it apply
outside the scope of our study?

54
ASSIGNMENT

Operation: How was the experiment carried out?
How did you obtain participants?
Did you obtain consent, did you protect sensitive
results, did you offer inducements, and was there
any deception?
Did you need to prepare the instrumentation in
any way?
Explain the execution of the experiment. How did
you collect data, and what was the environment like?
What scale did you use?
Did you validate the data and did it seem
reasonable?
Analysis and Interpretation: In order to draw
valid conclusions you must analyze and interpret
the data.
How did you numerically process and present the
data after the experiment? Did you measure
central tendency, dispersion, and dependency and
how did you display your results?
Did you apply data set reduction and why?
What type of hypothesis testing did you use?
What were the results?