Title: Flight Test and Statistics
- PRESENTED BY
- Richard Duprey
- Director, FAA Certification Programs
- National Test Pilot School
- Mojave, California
Flight Test and Statistics
"If you want to be absolutely certain you are right, you can't say you know anything."
Flight Test and Statistics Overview
- Background on National Test Pilot School
- Coverage of Statistics
- Scope - six hours of academics
- Detail
- Use of statistics in flight test
- Types of questions we try to answer
NTPS Background
- Private non-profit
- Grants Master of Science degree
- Only civilian school of its kind
- SETP equivalent to USAF and Navy Test Pilot Schools
- Offers a variety of courses (Fixed Wing and Helicopters)
- Professional - 1 year
- Introductory
- Performance and Flying Qualities Testing
- Systems Testing
- Operational Test and Evaluation
- NVG
- FAA Test Pilot / FTE initial and recurrent training
Data Analysis
Tunnel in the Sky
Data Analysis - Hour 1
- Types of Errors
- Types of Data
- Elementary Probability
- Classical Probability
- Experimental Probability
- Axioms
- Examples
Introduction
- Flight testing involves data collection
- time to climb
- fuel flow for range estimates
- qualitative flying qualities ratings
- INS drift rate
- Landing and Take-off data
- Weapon effectiveness
- All of these experimental observations have inaccuracies
- Understanding these errors and their sources, and developing methods to minimize their effect, is crucial to good flight testing
Types of Errors
- There are two very different types of errors
- systematic errors and random errors
- Systematic errors
- repeatable errors
- caused by a flawed measuring process
- e.g., measuring with an 11-inch ruler, or airspeed indicator errors that require instrument corrections
- Random errors
- not repeatable and usually small
- caused by unobserved changes in the experimental situation
- errors by the observer, e.g., reading the airspeed indicator
- unpredictable variations, e.g., small voltage fluctuations causing fuel counter errors
- can't be eliminated, but are typically distributed according to a well-defined distribution
Types of Data
- There are four types of numerical data
- NOMINAL DATA
- numerical in name only, e.g., an aircraft configuration: 1 = gear down, 2 = gear up, 3 = slats extended
- normal arithmetic processes not applicable
- 3 > 1 or 3 - 1 = 2 are not valid relationships
- ORDINAL DATA
- contains information about rank order only
- 1 = C-150, 2 = B-1, 3 = F-15
- in terms of max speed, 3 > 1 is valid, but not 3 - 1 = 2
Types of Data
- There are four types of numerical data (continued)
- INTERVAL DATA
- contains rank and difference information, e.g., temperature in degrees Fahrenheit: readings of 30, 45, 60 at different times differ by 15 deg.
- zero point arbitrary, so 60 °F is not twice 30 °F
- RATIO DATA
- all arithmetic processes apply
- most flight test data falls into this category
- Can say that a 1000 pound-per-hour fuel flow is 4 times a 250 PPH fuel flow
Probability and Flight Test
- Quantitative analysis of random errors of measurement in flight testing must rely on probability theory
- Goal
- For the student to understand which technique is appropriate and the limitations on the results
Elementary Probability
- The probability of event A occurring is the fraction of the total times that we expect A to occur
- $P(A) = n_A / N$
- where P(A) is the probability of A occurring
- $n_A$ is the number of times we expect A to occur
- N is the total number of attempts or trials
Elementary Probability
- From this definition, P(A) must always be between 0 and 1
- if A always happens, $n_A = N$ and $P(A) = 1$
- if A never happens, $n_A = 0$ and $P(A) = 0$
- In order to determine P(A) we can take two different approaches
- make predictions based on foreknowledge (a priori)
- conduct experiments (a posteriori)
Classical (a priori) Probability
- If it is true that
- every single trial leads to one of a finite number of outcomes
- and every possible outcome is equally likely
- Then
- $n_A$ is the number of ways that A can happen
- N is the total number of possible outcomes
- For example
- a six-sided die implies six possible outcomes: N = 6
- if A is getting a 6 on one roll, $n_A = 1$
- $P(A) = 1/6 = 0.1667$
Second Example
- What is the probability of getting two heads when we toss two fair coins?
- There are four possible outcomes (N = 4)
- (H,H) (H,T) (T,H) (T,T)
- $n_A = 1$, since only one of the possible outcomes results in two heads (H,H)
- Thus $P(A) = 1/4 = 0.25$
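The classical definition $P(A) = n_A / N$ can be checked by brute-force enumeration. The sketch below is illustrative only (the helper name `classical_probability` is hypothetical) and, like the classical approach itself, assumes every enumerated outcome is equally likely.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, event):
    """Return n_A / N, assuming every outcome in `outcomes` is equally likely."""
    outcomes = list(outcomes)
    n_a = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(n_a, len(outcomes))


# One roll of a fair six-sided die, A = "roll a 6"
p_six = classical_probability(range(1, 7), lambda face: face == 6)

# Two fair coins, A = "two heads"; product() yields (H,H) (H,T) (T,H) (T,T)
p_two_heads = classical_probability(product("HT", repeat=2),
                                    lambda toss: toss == ("H", "H"))

print(p_six, p_two_heads)  # 1/6 1/4
```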
Classical (a priori) Probability
- The approach is instructive
- Generally not applicable to flight test, where
- Possible outcomes infinite
- Each possible outcome not equally likely
- Leads us to second approach
Experimental (a posteriori) Probability
- Experimental probability is defined as
- $P(A) = \lim_{N_{obs} \to \infty} \frac{n_{A,obs}}{N_{obs}}$
- where
- $n_{A,obs}$ is the number of times we observe A (versus $n_A$, the number of times we expect A to occur)
- $N_{obs}$ is the number of trials
Experimental Example
- If the probability of getting heads on a single toss of a coin is determined experimentally, we might get
[Figure: experimental P_obs(heads), on a 0 to 1.0 scale with 0.5 marked, plotted against number of tosses n_obs from 1 to 1000 (log scale)]
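A quick simulation shows the a posteriori idea: the observed fraction of heads wanders for small $N_{obs}$ and settles near the true value as $N_{obs}$ grows. This is a sketch under stated assumptions: the coin is simulated with Python's `random` module, the true probability (0.5) and the seed are chosen for illustration, and the printed values are not flight test data.

```python
import random


def experimental_probability(n_trials, p_true=0.5, seed=1):
    """Estimate P(heads) as n_A,obs / N_obs from n_trials simulated tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return heads / n_trials


for n_obs in (1, 10, 100, 1000, 100_000):
    print(f"N_obs = {n_obs:>7}: P_obs(heads) = {experimental_probability(n_obs):.3f}")
```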
Probability Axioms
- Probability Theory can be used to describe
relationships between events
Probability Axioms
- Three probability axioms are easily justified (as opposed to proven)
- $P(\text{not } A) = 1 - P(A)$
- the probability of something happening has to be one
- $P(A \text{ or } B) = P(A) + P(B)$
- $P(H \text{ or } T) = 0.5 + 0.5 = 1$ for a single coin
- $P(A \text{ and } B) = P(A) \times P(B)$
- $P(T \text{ and } T) = 0.5 \times 0.5 = 0.25$ for two coins
- same answer we got when examining all possible outcomes
- The last two axioms carry conditions
- the "or" rule requires the outcomes to be mutually exclusive: only one can occur in a single trial
- the "and" rule requires the outcomes to be independent: A occurring doesn't affect the probability of B occurring
Example
- Problem
- Based on test data, 95% of the time an F-4 will successfully make an approach-end barrier engagement on an icy runway
- What is the probability that at least one of a flight of four F-4s will miss?
- Solution
- P(1 or more miss) = 1 - P(all engage)
- The probability that at least one will miss is the complement of the probability that all will engage
- P(all engage) = P(1st success) × P(2nd) × P(3rd) × P(4th), treating each engagement as independent
- = 0.95 × 0.95 × 0.95 × 0.95 = $0.95^4$ = 0.81
- Thus
- P(1 or more miss) = 1 - 0.81 = 0.19
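The same arithmetic as a few lines of Python, assuming (as the solution does) that each aircraft's engagement is independent of the others:

```python
p_engage = 0.95   # single-aircraft success probability from test data
flight_size = 4

# Independent engagements: multiply the individual probabilities
p_all_engage = p_engage ** flight_size
# Complement rule: P(at least one miss) = 1 - P(all engage)
p_at_least_one_miss = 1 - p_all_engage

print(f"P(all engage)     = {p_all_engage:.4f}")
print(f"P(1 or more miss) = {p_at_least_one_miss:.4f}")
```

Without intermediate rounding the result is 0.1855, which rounds to the 0.19 quoted above.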
Example
- Problem
- What is the probability of getting 7 or 11 on a single roll of a pair of dice?
- Solution
- Since getting 7 and getting 11 on the same roll are mutually exclusive events, we can say
- P(7 or 11) = P(7) + P(11)
- $N = 6^2 = 36$
- $n_7 = 6$
- (6, 1) (1, 6) (5, 2) (2, 5) (4, 3) (3, 4)
- $n_{11} = 2$
- (6, 5) (5, 6)
- Thus
- P(7) = 6/36, P(11) = 2/36
- P(7 or 11) = 6/36 + 2/36 = 8/36 = 0.222
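Enumerating all 36 equally likely rolls confirms the counts $n_7 = 6$ and $n_{11} = 2$. A minimal sketch using only the standard library:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # N = 6**2 = 36 ordered outcomes
n7 = sum(1 for a, b in rolls if a + b == 7)
n11 = sum(1 for a, b in rolls if a + b == 11)

# A roll cannot total both 7 and 11, so the two probabilities simply add
p_7_or_11 = Fraction(n7, len(rolls)) + Fraction(n11, len(rolls))
print(n7, n11, p_7_or_11, round(float(p_7_or_11), 3))  # 6 2 2/9 0.222
```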
Data Analysis - Hour 2
- Populations and Samples
- Measures of Central Tendency
- Dispersion
- Probability Distributions
- Discrete
- Continuous
- Cumulative
Populations and Samples
- A population is all possible observations
- Many populations are infinite
- A pair of dice can be rolled indefinitely
- The population of F-117 weapons deliveries is all the possible drops it could make in its lifetime
- Some populations are limited
- Votes by registered Republicans
- A sample is any subset of a population
- For example
- 100 rolls of a pair of dice
- Bomb scores for 100 weapon delivery sorties
Population Constructs
- Constructing a population
- Must impose assumptions
- Homogeneous
- Independent
- Random
Sample Requirements
- Homogeneous
- the data must come from one population only
- DC-10 take-off data shouldn't be used with MD-11 data
- Independent
- selecting one data point must not affect subsequent probabilities
- selecting and removing a heart from a deck of cards changes the probability of drawing another heart
- a DC-10 landing 75 feet past the touchdown aim point on one landing doesn't change the probability that the next landing will miss by the same distance (or any distance)
- Random
- equal probability of selecting any member of the population
- using members of a population with a bias would be non-random
- an F-16 with boresight error would cause a bias in downrange miss distance
Measures of Central Tendency
- Given a homogeneous, independent, random sample, we need to describe the contents of that sample
- Measuring a steel rod diameter with a micrometer would give several different answers due to
- tightening of the micrometer
- dust particles on the rod
- reading the scale on the micrometer
- What to do with answers that are different?
Measures of Central Tendency
- There are three common measures of central tendency
- Mean (arithmetic average) - most commonly used
- Mode
- most common value in the sample
- there may be more than one mode
- Median
- middle value
- for an even-numbered sample, average the two middle values
- Dangers ...
Dispersion
- Just reporting the mean as the answer can be very misleading
- Consider the following two samples, both with a mean of 100 (and the same median as well)
- Sample 1: 99.9, 100, 100.1
- Sample 2: 0.1, 100, 199.9
- We also need to report how much the data generally differs from the mean value
Deviation
- We define deviation as the difference between the ith data point and the mean: $d_i = x_i - \bar{x}$
- Averaging the deviations does not help: $\frac{1}{N}\sum_{i=1}^{N} d_i = 0$ for any sample
Mean Deviation
- Since positive and negative deviations cancel, we could instead average the absolute values of the deviations: $\bar{d} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$
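Standard Deviation
- While the mean deviation can be used, the standard deviation σ is a more common measure of dispersion
- $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$, versus the mean deviation, which averages $|x_i - \mu|$
- The square of the standard deviation, σ², is called the variance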
Notation
- Normally, we use Greek letters to denote statistics for populations
- μ for population mean
- σ² for population variance
- And we use Roman letters for sample statistics
- $\bar{x}$ for sample mean
- s² for sample variance
Sample Standard Deviation
- One other difference exists between σ and s
- The sample standard deviation has the sum of the squares divided by N - 1 instead of N: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$
- Mathematically, this is due to the loss of one degree of freedom (the deviations are measured from $\bar{x}$, which was computed from the same data)
- The effect is to increase the standard deviation slightly
- The difference decreases as the sample gets larger
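The two samples from the Dispersion slide make the point numerically. This sketch uses Python's `statistics` module: `pstdev` divides by N (the population form, σ) and `stdev` divides by N - 1 (the sample form, s). The helper name `dispersion_summary` is illustrative.

```python
import statistics


def dispersion_summary(sample):
    """Central tendency and dispersion measures discussed above."""
    mean = statistics.mean(sample)
    deviations = [x - mean for x in sample]
    n = len(sample)
    return {
        "mean": mean,
        "median": statistics.median(sample),
        "avg_deviation": sum(deviations) / n,                  # ~0 for any sample
        "mean_deviation": sum(abs(d) for d in deviations) / n,
        "sigma_N": statistics.pstdev(sample),                  # divide by N
        "s_N_minus_1": statistics.stdev(sample),               # divide by N - 1
    }


sample_1 = [99.9, 100.0, 100.1]
sample_2 = [0.1, 100.0, 199.9]
for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    summary = dispersion_summary(sample)
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

Both samples report the same mean and median, but s is 0.1 for Sample 1 and 99.9 for Sample 2.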
Flight Test Example - PA28 Takeoff Distance
- Two data points eliminated: wrong configuration, improper technique
- Data adjusted for standard weight (2150 lb), runway slope (GPS), temperature, pressure, and airspeed/altimeter corrections
- Technique: rotate at 65, liftoff at 70, maintain 75 until 50 feet AGL
Probability Distributions
- Statistical applications require understanding of the characteristics of the data obtained
- Probability distributions give us such understanding
Probability Distributions
- To understand probability distributions, consider the problem of tossing 2 coins
- Let n represent the number of heads for a single toss of both coins
- Then the probabilities of getting n = 0, 1, or 2 can be calculated
- for n = 0, P(0) = 0.25
- for n = 1, P(1) = 0.5
- for n = 2, P(2) = 0.25
Discrete Distributions
- We can present the data as a bar graph
Empirical Distributions
- In flight test, we are concerned with empirical distributions rather than theoretical ones like the coin example
- For example, we might collect data on landing errors
Continuous Distributions
- If we get more and more data, and make the intervals smaller, our histogram approaches a continuous curve
[Figure: Continuous Probability Distribution of Touchdown Miss Distance]
- This can't be interpreted the same way as the previous discrete distribution
Continuous Distributions
- The height of the curve above a point is not the probability of x having that point value
- Any one point on the x-axis corresponds to a nonzero height on the curve
- But the probability associated with that single point must be zero, since there are an infinite number of points on the x-axis
- We can meaningfully talk only about the probability of x being between two points a and b on the x-axis
Probability as Area Under Curve
- The probability of getting a result between a and b is represented by the area under the probability distribution curve f(x) between a and b
- $P(a \le x \le b) = \int_a^b f(x)\,dx$
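Cumulative Probability Distribution
- A cumulative probability distribution gives the probability that x is less than or equal to some value a: $F(a) = P(x \le a) = \int_{-\infty}^{a} f(x)\,dx$
- The relative probability of aircraft landing miss distances could be displayed in the following cumulative distribution
[Figure: cumulative distribution of landing miss distance x, rising to 1.0, with the 0.5 and 0.95 levels and a value x_T marked]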
Data Analysis - Hour 3
- Special Probability Distributions
- Binomial
- Normal
- Student's t
- Chi-squared
Binomial Distribution
- The binomial is a discrete distribution
- It tells us the probability of getting n successes in N trials given the probability (p) of a single success
- Limiting cases
- if n = N, then obviously $P(N) = p^N$
- if n = 0, then $P(0) = (1 - p)^N$
- or, letting q = 1 - p, $P(0) = q^N$
- For 0 < n < N, the possible number of combinations of success and failure gives
- $P(n) = \frac{N!}{n!\,(N-n)!}\,p^n q^{N-n}$
Binomial Distribution - Flight Test Example
- Two flight control systems are equally desirable
- What is the probability that 6 out of 8 pilots would prefer system A over B?
- If A and B are truly equally good, the probability of a pilot picking A over B is 0.5 (p = q = 0.5)
- The probability of 6 pilots picking A over B is
- $P(6) = \frac{8!}{6!\,2!}(0.5)^6(0.5)^2 = 28/256 = 0.109$
- There is only an 11% probability that this would happen. If it did, it would suggest that your initial assumptions about the two flight control systems were in error
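The binomial formula is easy to evaluate directly. A minimal sketch (the function name `binomial_pmf` is illustrative) that reproduces $P(6) = 28/256 \approx 0.109$ and tabulates the whole N = 8 distribution:

```python
from math import comb


def binomial_pmf(n, N, p):
    """P(n successes in N trials) = N!/(n!(N-n)!) * p**n * q**(N-n), q = 1 - p."""
    q = 1 - p
    return comb(N, n) * p**n * q**(N - n)


p6 = binomial_pmf(6, 8, 0.5)
print(f"P(6 of 8 pilots prefer A) = {p6:.3f}")  # 0.109

for n in range(9):
    print(n, f"{binomial_pmf(n, 8, 0.5):.3f}")
```

Because p = q = 0.5 the table is symmetric, so P(2) equals P(6).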
Binomial Flight Test Example
- If p = q = 0.5, then for N = 8, the binomial distribution would be as shown in the figure, and from the figure, P(2) (equal to P(6) by symmetry) is about 11%
Normal Distribution
- The normal distribution is a continuous probability distribution based on the binomial
- SINGLE MOST IMPORTANT DISTRIBUTION IN FLIGHT TEST ANALYSIS
- Any deviation from a mean value is assumed to be composed of many small elemental errors, evenly distributed (equally likely to be positive or negative)
- The mathematical derivation is left as an exercise
Normal Distribution
- Graphically, it can be seen that x = μ gives the maximum value and x = μ ± σ are the two points of inflection on the curve
[Figure: normal curve f(x) versus x, peaking at μ, with inflection points at μ - σ and μ + σ]
Normal Distribution
- Thus the probability that x lies between some values a and b is given by
- $P(a \le x \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$
- Major problem: this integral cannot be evaluated in closed form
- numerical techniques are required
- tables could be used, but a different table would be required for each μ and σ
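Standard Normal Distribution
- By using a substitution of variables, $z = (x - \mu)/\sigma$
- We can use tables for a normal distribution where the mean is zero and the standard deviation is one
- Thus
- $P(a \le x \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$
- Becomes
- $P(z_a \le z \le z_b) = \int_{z_a}^{z_b} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz$, where $z_a = (a - \mu)/\sigma$ and $z_b = (b - \mu)/\sigma$
- Mean of zero and a standard deviation of one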
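Standardized Normal Distribution
[Figure: standardized normal curve f(z) versus z; about 68% of the area lies within ±1 (34% on each side), 95% within ±2 (a further 13.5% on each side), and 99.7% within ±3, with about 2.5% in each tail beyond ±2]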
Examples - Cruise Performance
- Cruise performance test flown 40 times
- Mean fuel used was 8,000 pounds
- Standard deviation was found to be 500 pounds
- Find the probability that on the next sortie, we will use between 7000 and 8200 pounds
- Given μ = 8000, σ = 500 (with n = 40, the sample values are treated as population values)
- find the probability that 7000 < x < 8200
- $z_a = (7000 - 8000)/500 = -2.0$, $z_b = (8200 - 8000)/500 = 0.4$
- From table: 0.6554 - 0.0228 = 0.6326
- 63% probability that fuel used would be within the specified range
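Python's `statistics.NormalDist` evaluates the standard normal table directly. This sketch treats the 40-sortie mean and standard deviation as population values, as the example does:

```python
from statistics import NormalDist

fuel_used = NormalDist(mu=8000, sigma=500)  # pounds

z_low = (7000 - fuel_used.mean) / fuel_used.stdev   # -2.0
z_high = (8200 - fuel_used.mean) / fuel_used.stdev  # 0.4
probability = fuel_used.cdf(8200) - fuel_used.cdf(7000)
print(z_low, z_high, round(probability, 4))
```

The unrounded result is about 0.6327; the table answer, 0.6326, differs only because the table entries are rounded to four places.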
Student's t Distribution
- Problem: to use the normal distribution we had to know the population mean and standard deviation
- Flight test: we don't normally know the population
- we just have a sample
- The difference between sample and population mean is described by the statistic
- $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with ν = n - 1 degrees of freedom
Student's t vs n
- A different t distribution must be tabulated for each value of n
- For large n, the t distribution approaches the standard normal distribution: use the normal distribution when n > 30
[Figure: t distributions plotted against t for n = 2 and n = 10]
t - Flight Test Examples
- B-33 landing distance example
Chi-Squared (χ²) Distribution
- Just as the sample mean may differ from the population mean, we should expect a difference in the variances
- The difference is distributed according to
- $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, with ν = n - 1 degrees of freedom
χ² vs Sample Size
[Figure: χ² probability density f(χ²) versus χ² for n = 1, 4, and 10]
χ² Examples
- Find χ² for the 95th percentile (11.1)
- one-tailed
- 5 degrees of freedom
- Find the χ² values bounding the central 95% (0.831, 12.83)
- two-tailed (2.5th and 97.5th percentiles)
- 5 degrees of freedom
- Find the median value of χ² (27.3)
- 28 degrees of freedom
Data Analysis - Hour 4
- Confidence Limits
- Intervals for mean and variance
- Hypothesis Testing
- Null and alternate hypotheses
- Tests on mean and variance
Confidence Limits
- In practice, we take a sample from a population, such as takeoff distance
- Report it as if it were the true answer
- Subsequent tests will differ: the sample mean/variance will differ from the true population values
- Can be considered sufficiently accurate if we
- Standardize test method and conditions
- Take sufficient samples
- Quantitative methods (confidence intervals) exist to determine how certain we are that we have the correct answer
Central Limit Theorem
- Given a population with mean μ and variance σ², the distribution of successive sample means, from samples of n observations, approaches a normal distribution with mean μ and variance σ²/n
Central Limit Theorem
[Figure: distribution of a population A and of its sample means $\bar{x}$ as the sample size n increases]
- Regardless of the original distribution of A, the distribution of the means will be approximately normal
- the approximation gets better as n increases
- The mean of the means will be the same as the mean of A
- The variance of the means is the variance of A divided by n
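A small simulation illustrates the theorem. As an assumption chosen for illustration, population A is exponential with mean 1 and variance 1 (strongly skewed, nothing like normal); the means of samples of n = 25 should still cluster around 1 with variance close to 1/25 = 0.04.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1, variance 1."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


n = 25
means = [sample_mean(n) for _ in range(5000)]

print(f"mean of the means     = {statistics.fmean(means):.3f}  (population mean 1)")
print(f"variance of the means = {statistics.variance(means):.4f}  (sigma^2/n = {1 / n:.4f})")
```

A histogram of `means` would look roughly bell-shaped even though the individual draws are skewed.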
Confidence Interval for Mean
- If we take samples of size n, the means of multiple tests (that is, samples) will be approximately normally distributed
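Confidence Interval - Means
- If z comes from one of our samples, $z = \frac{x - \mu}{\sigma}$
- or, using the central limit theorem, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
- Thus $P\left(-z_{1-\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{1-\alpha/2}\right) = 1 - \alpha$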
Confidence Interval - Means
- Thus 100(1 - α) percent of the time, the true population mean μ will be within a certain range about the sample mean
- The range of values is the interval $\bar{x} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$
- And (1 - α) is the confidence level
Example - Flight Test
- Find the 95% confidence interval for F-100 engine thrust given
- n = 50 engines tested
- mean thrust = 22,700 lb
- s = 500 lb
- At 95%, α = 0.05, $z_{1-\alpha/2} = 1.96$
- $\mu = 22{,}700 \pm 1.96\,(500/\sqrt{50})$
- 22,561 < μ < 22,839
- At 99%, α = 0.01, $z_{1-\alpha/2} = 2.58$
- $\mu = 22{,}700 \pm 2.58\,(500/\sqrt{50})$
- 22,518 < μ < 22,882
- Observations
- The interval widens for increased certainty
- Had to use s as an estimate for σ, legitimate for n > 30
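The large-sample interval can be sketched as below. `NormalDist().inv_cdf` supplies $z_{1-\alpha/2}$ instead of a printed table; the function name `z_interval` is illustrative, and s stands in for σ because n > 30.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, confidence):
    """Large-sample interval: xbar +/- z_(1-alpha/2) * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


for confidence in (0.95, 0.99):
    low, high = z_interval(22_700, 500, 50, confidence)
    print(f"{confidence:.0%}: {low:,.0f} lb < mu < {high:,.0f} lb")
```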
Small Sample Confidence Intervals
- Some flight tests involve numerous repeated test points; most do not
- But when n < 30, we must substitute t for z
- For example, if our earlier problem were based on a sample of only 5, what would the 95% confidence interval be?
Example - Flight Test
- Find the 95% confidence interval for F-100 engine thrust given
- n = 5 engines tested
- mean thrust = 22,700 lb
- s = 500 lb
- At 95%, α/2 = 0.025, ν = 4, $t_{4,\,0.975} = 2.78$
- $\mu = 22{,}700 \pm 2.78\,(500/\sqrt{5})$
- 22,078 < μ < 23,321
- vs. 22,561 < μ < 22,839 for 95% with n = 50
- vs. 22,518 < μ < 22,882 for 99% with n = 50
- The t statistic accounts for using s as an estimate for σ, which the z interval allows only for n > 30
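The small-sample version only swaps the multiplier. Python's standard library has no Student's t quantile function, so this sketch hard-codes the table value $t_{4,\,0.975} = 2.776$; for other sample sizes you would add entries from a t table (or use a statistics package). The dictionary and function names are illustrative.

```python
from math import sqrt

# Two-sided 95% multipliers t_(0.975, nu) from a Student's t table
T_975 = {4: 2.776}


def t_interval_95(xbar, s, n):
    """Small-sample 95% interval: xbar +/- t_(0.975, n-1) * s / sqrt(n)."""
    half_width = T_975[n - 1] * s / sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = t_interval_95(22_700, 500, 5)
print(f"{low:,.0f} lb < mu < {high:,.0f} lb")
```

With the three-decimal table value the bounds land within a pound of those computed with the rounded multiplier 2.78.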
Confidence Interval for Variance
- Similar to intervals for means, the confidence interval for variance is based on the χ² statistic
- For example, find the 95% confidence interval where n = 6, s = 2
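Confidence Interval for Variance
- At 95%, α/2 = 0.025, 1 - α/2 = 0.975, ν = 5, s = 2
- $\frac{(n-1)s^2}{\chi^2_{0.975,\,\nu}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{0.025,\,\nu}}$
- $\frac{5(4)}{12.83} \le \sigma^2 \le \frac{5(4)}{0.831}$, so $1.56 \le \sigma^2 \le 24.1$ and $1.25 \le \sigma \le 4.91$
- Large band due to small sample size; if n = 18, the interval would be smaller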
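The variance interval as a sketch, again with table values hard-coded because the standard library has no χ² quantile function: $\chi^2_{0.025,\,5} = 0.831$ and $\chi^2_{0.975,\,5} = 12.833$.

```python
from math import sqrt

# chi-squared table values for nu = 5 degrees of freedom
CHI2_LOWER, CHI2_UPPER = 0.831, 12.833   # 2.5th and 97.5th percentiles

n, s = 6, 2.0
nu = n - 1
var_low = nu * s**2 / CHI2_UPPER    # dividing by the larger value gives the lower bound
var_high = nu * s**2 / CHI2_LOWER
print(f"{var_low:.2f} < sigma^2 < {var_high:.1f}")
print(f"{sqrt(var_low):.2f} < sigma < {sqrt(var_high):.2f}")
```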
Hypothesis Testing
- Instead of just using data to estimate some parameter, we hypothesize an answer and then use data to judge its reasonableness
- Truth can be known with certainty only if we examine the entire population
- Example
- assume a coin is fair (hypothesis)
- toss the coin 100 times
- if results are
- 48 heads, conclude coin is fair
- 35 heads, conclude coin is not fair
Null Hypothesis
- Acceptance of a statistical hypothesis
- is a result of insufficient evidence to reject it
- doesn't necessarily mean that it is true
- Thus, it is important to carefully select the initial hypothesis
- the hypothesis selected for the purpose of rejecting it is called the null hypothesis (H0)
- if we don't gather enough data, we must accept the null hypothesis
- Formulated so that in case of insufficient data, we return to the status quo or safe conclusion
- Examples of null hypotheses
- the defendant is innocent
- the new RADAR is no better than the old
- the MTBF of a new part is no better than the old
Alternate Hypothesis
- Since we are trying to negate the null hypothesis (H0) with data, the alternate hypothesis (H1) must be defined
- H0 must be the opposite of H1
- Examples
- 1. H0: μ = 15, H1: μ ≠ 15
- 2. H0: p ≥ 0.9, H1: p < 0.9
- 3. Lock-on range of new radar is better than old
Types of Errors
- A Type I error
- rejecting the null hypothesis when it is true
- e.g., chance variation of a fair coin gives 35/100 heads
- probability is denoted α (the level of significance)
- A Type II error
- accepting the null hypothesis when it is false
- e.g., 43/100 heads concluded as fair when P(heads) = 0.4
- probability is denoted β; the power of the test is 1 - β
- We want small α
- as α decreases, β increases (fixed sample size)
- Large β implies we stay with the status quo, H0, more frequently than we should - a more acceptable error
- to decrease both, increase the sample size
Hypothesis Testing
- Step One
- Form null and alternate hypotheses
- Step Two
- Choose level of significance (α)
- Define areas of acceptance and rejection (one- or two-tailed)
- Step Three
- Collect data and compare to expectations
- Step Four
- Accept or reject the null hypothesis
Hypothesis Testing: Two Tailed
- Some tests are interested in extremes in either direction
- Two tailed
- Example: burn times on an ejection seat rocket motor
- Too short - don't clear aircraft
- Too long - impose too many g's on pilot
- Form hypotheses of the form
- H0: μ = μ0, H1: μ ≠ μ0
- Reject H0 whenever the sample produces results too low or too high
- Not the usual case for flight test, which usually deals with one-tailed tests
Hypothesis Flight Test Examples: Two Tailed
- Early testing of the F-19 bombing system for 30° dive angles gave
- Cross-range errors that were normally distributed
- Mean error of 20 ft and a standard deviation of 3 ft
- After a flight control modification to solve a high-AOA flying qualities problem, it was found that
- Sample mean cross-range error for nine bombs was 22 ft
- Has the mean changed at the 0.05 level of significance?
Hypothesis Testing: Two Tailed
- Step One
- Form null and alternate hypotheses
- H0: μ = 20 (status quo), H1: μ ≠ 20
- Step Two
- Choose level of significance: α = 0.05 (given)
- Define areas of acceptance and rejection (one- or two-tailed)
- α = 0.05 would be divided between the two tails (high/low), 0.025 in each
- extreme values in either direction would indicate a change in μ
- otherwise, μ has not changed significantly from the unmodified system
Hypothesis Testing: Two Tailed
- Step Three
- Collect data and compare to expectations
- Step Four
- Accept or reject the null hypothesis
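Step 4 - Accept or Reject
[Figure: standard normal curve in z with the central acceptance region and a rejection region of α/2 = 0.025 in each tail]
- $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{22 - 20}{3/\sqrt{9}} = 2$
- Since z = 2, which is > 1.96
- Reject the null hypothesis with 95% confidence
- Mean cross-range bombing error has changed due to the flight control modification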
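The four steps for the F-19 case reduce to a few lines. This sketch (variable names illustrative) also prints a two-sided p-value, which the slides do not use but which says the same thing: it falls below α = 0.05.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 20.0, 3.0      # ft, historical cross-range mean and standard deviation
n, xbar = 9, 22.0           # bombs dropped after the modification, sample mean (ft)
alpha = 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)         # about 1.96, two-tailed
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, z_crit = {z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```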
Hypothesis Testing: One Tailed
- Most flight tests are interested in extremes in only one direction
- One tailed - small sample, σ unknown
- Example: does the aircraft satisfy contractual range requirements?
- Only care if distance is shorter than specified
- Form hypotheses of the form
- H0: μ ≥ μ0, H1: μ < μ0
- Or
- H0: μ ≤ μ0, H1: μ > μ0
- Reject H0 whenever the sample produces results extreme in one direction
Hypothesis Flight Test Examples: One Tail
- Contract fuel-to-climb requirements
- Use less than 1500 pounds in climb from sea level to 20,000 feet
- Test results
- Nine climbs averaged 1600 lb
- Sample standard deviation of 200 lb
- Do we penalize the contractor?
Hypothesis Testing: One Tailed
- Step One
- Form null and alternate hypotheses
- H0: μ ≤ 1500 (innocent until proven guilty), H1: μ > 1500
- Step Two
- Choose α = 0.05 for level of significance
- α = 0.01 is reserved for safety-of-flight questions
- Define areas of acceptance and rejection (one- or two-tailed)
- one tailed: contract not met only if fuel used was on the high side
Hypothesis Testing: One Tailed
- Step Three
- Collect data and compare to expectations
- Step Four
- Accept or reject the null hypothesis
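Step 4 - Accept or Reject
[Figure: one-tailed test with the acceptance region and a rejection region of α = 0.05 in the upper tail]
- $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1600 - 1500}{200/\sqrt{9}} = 1.5$
- Since t = 1.5, which is < $t_{0.95,\,8} = 1.860$
- Accept (fail to reject) the null hypothesis at the 95% confidence level
- Contractor has met climb fuel requirements
- Put another way
- We don't have data at the 95% confidence level to show the contractor failed to meet specs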
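A sketch of the one-tailed t test for the climb-fuel case. The critical value is hard-coded from a t table ($t_{0.95,\,8} = 1.860$) because Python's standard library has no t quantile function.

```python
from math import sqrt

T_95_ONE_TAILED = {8: 1.860}   # t_(0.95, nu) from a Student's t table

mu0 = 1500.0                   # lb, contract limit
n, xbar, s = 9, 1600.0, 200.0  # climbs, sample mean (lb), sample std dev (lb)

t = (xbar - mu0) / (s / sqrt(n))
t_crit = T_95_ONE_TAILED[n - 1]
reject_h0 = t > t_crit         # only high fuel use counts against the contractor
print(f"t = {t:.2f}, t_crit = {t_crit:.3f}, reject H0: {reject_h0}")
```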
Hypothesis Test Examples: Variance
- Four steps still valid here
- Substitute chi-squared for z or t
- Example on variance
- The contract states the standard deviation of miss distances for a particular weapon system delivery mode must not exceed 10 meters at 90% confidence
- In ten test runs we get s = 12 meters
- Is the contractor in compliance?
Hypothesis Testing: One Tailed Variance
- Step One
- Form null and alternate hypotheses
- H0: σ ≤ 10, H1: σ > 10
- Step Two
- α = 0.10 was specified
- smaller σ is good, which implies a one-sided test
- only an extremely large s will nullify H0
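Hypothesis Testing: One Tailed Variance
- Step Three
- Collect data and compare to expectations
- $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{9(12)^2}{10^2} = 12.96 \approx 13$, compared with $\chi^2_{0.90,\,9} = 14.7$
- Step Four
- Accept or reject the null hypothesis
- Since 13 < 14.7, accept H0 that σ ≤ 10 meters
- Can't conclude the contractor has failed to meet spec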
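The variance test in the same style, with the upper-tail table value $\chi^2_{0.90,\,9} = 14.684$ hard-coded (taken from a standard χ² table, since the standard library has no χ² quantile function):

```python
CHI2_90 = {9: 14.684}   # chi2_(0.90, nu) from a chi-squared table

sigma0 = 10.0           # m, contract limit on standard deviation
n, s = 10, 12.0         # test runs, sample standard deviation (m)

chi2 = (n - 1) * s**2 / sigma0**2
reject_h0 = chi2 > CHI2_90[n - 1]
print(f"chi^2 = {chi2:.2f}, critical = {CHI2_90[n - 1]}, reject H0: {reject_h0}")
```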
Data Analysis - Hour 5
- Tests for non-normal distributions
- Sample size
- Error Analysis
Parametric vs. Nonparametric
- Nonparametric tests make no assumption about the population distribution
- Everything so far has assumed a normal distribution
- These tests are less useful when used on normal distributions: they require a larger sample size to give us the same information from the test
- Use goodness-of-fit tests to determine the distribution type
- Normal: use the methods already described
- Otherwise, use nonparametric methods
- Three nonparametric tests are useful in flight test
Nonparametric Tests
- Three nonparametric tests we'll use are
- Rank Sum Test
- also called the U test, Wilcoxon rank-sum test, and Mann-Whitney test
- Sign Test
- can be applied to ordinal data
- Signed Rank Test
- combination of sign and rank sum tests
- All test the null hypothesis that two different samples come from the same population (i.e., assume both are equivalent)
- Calculate statistics from the two samples
- Determine the probability to decide if the original assumption is correct
Rank Sum Test: U Test or Mann-Whitney
- The method consists of
- Rank order all data from both samples combined
- Assign rank values to each data point
- use the average rank for repeated data values
- Compute the sum of the ranks for each sample (R1, R2)
- Calculate the U statistic for each sample (n1, n2 = sample sizes)
- $U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$, $U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$
- Compare the smaller U to the critical value in a reference table
- If U < critical value, reject H0 (i.e., reject μ1 = μ2)
Rank Sum Example: Radar Flight Test
- The target detection ranges (nm) of two radars were
- System 1: 9, 10, 11, 14, 15, 16, 20
- System 2: 4, 5, 5, 6, 7, 8, 12, 13, 17
- Is there a difference between the two systems at 90% confidence?
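Rank Sum Example
- Rank order all scores and assign rank values
- R1 = 7 + 8 + 9 + 12 + 13 + 14 + 16 = 79
- R2 = 1 + 2.5 + 2.5 + 4 + 5 + 6 + 10 + 11 + 15 = 57
- Calculate U1, U2
- $U_1 = (7)(9) + \frac{7(8)}{2} - 79 = 12$, $U_2 = (7)(9) + \frac{9(10)}{2} - 57 = 51$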
Rank Sum Flight Test Example
- Compare the smaller U (12 in this case) with the critical value for α = 0.10, n1 = 7, n2 = 9: Ucr = 15
- Since U < Ucr (12 < 15)
- Reject the null hypothesis that the two radars have the same performance, with 90% confidence
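The ranking and U arithmetic can be sketched as follows. The helper names are illustrative; tied values share the average of the ranks they occupy, and the critical value 15 is the table value used in the example, not something the code derives.

```python
def average_ranks(values):
    """Map each value to its 1-based rank in the pooled data; ties share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1                               # extend over a run of tied values
        ranks[ordered[i]] = (i + 1 + j + 1) / 2  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_u(sample_1, sample_2):
    """Return (R1, R2, U1, U2) for the rank sum (Mann-Whitney) test."""
    ranks = average_ranks(sample_1 + sample_2)
    n1, n2 = len(sample_1), len(sample_2)
    r1 = sum(ranks[x] for x in sample_1)
    r2 = sum(ranks[x] for x in sample_2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


system_1 = [9, 10, 11, 14, 15, 16, 20]       # detection range, nm
system_2 = [4, 5, 5, 6, 7, 8, 12, 13, 17]
r1, r2, u1, u2 = rank_sum_u(system_1, system_2)
U_CRIT = 15                                  # table value: alpha = 0.10, n1 = 7, n2 = 9
print(r1, r2, u1, u2, "reject H0" if min(u1, u2) < U_CRIT else "accept H0")
```

A useful self-check is that U1 + U2 always equals n1 × n2 (here 12 + 51 = 63).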
Sign Test
- Requires paired observations of two samples with a "better than" evaluation
- Can be used on ordinal data, such as pilots preferring system A or B
- Under H0, a pilot preferring system A over B is as likely as preferring B over A
- The probability of system A being preferred over system B x times in N tests is just
- $f(x) = \frac{N!}{x!\,(N-x)!}\,p^x q^{N-x}$
- But if H0 is A = B, then p = q = 0.5, and $f(x) = \frac{N!}{x!\,(N-x)!}\,(0.5)^N$
Sign Test
- But f(x) is just the probability for one discrete point, such as 3 of 8 pilots preferring A over B, and we need the whole tail
- Thus (i.e., sum) $P(X \le x) = \sum_{k=0}^{x} \frac{N!}{k!\,(N-k)!}\,(0.5)^N$
Sign Test Example: Modified Flight Control System
- Suppose 10 pilots evaluate the handling qualities of two different sets of control laws during powered-lift approaches
- The results are
- 7 prefer system B
- 2 prefer system A
- 1 had no preference
- Should we switch to the new control laws?
Sign Test Example: Evaluation of New Flight Control System Laws
- Null hypothesis is that both systems (old and new) are equally desirable
- Choose 0.05 level of significance since SOF is not an issue
- Calculate the probability of 0, 1, or 2 pilots choosing system A if there were really no difference (the pilot with no preference is dropped, leaving N = 9)
- If the probability is less than the level of significance, reject H0
- Conclude B is better than A
Sign Test Example: Evaluation of New Flight Control System Laws
- P(0, 1, or 2 of 9 prefer A) = (1 + 9 + 36)/512 = 0.09, so we can only be 91% sure that B is really better than A
- Not enough: we need 95% to justify the added expense of System B
- Thus, accept H0: no significant difference between A and B
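The tail probability behind the 91% figure can be sketched as below. As in the usual sign test, the pilot with no preference is dropped, leaving N = 9 comparisons; the function name is illustrative.

```python
from math import comb


def sign_test_lower_tail(x, n):
    """P(X <= x) for X ~ Binomial(n, 0.5): the one-sided sign-test probability."""
    return sum(comb(n, k) for k in range(x + 1)) / 2**n


prefer_a, prefer_b = 2, 7          # the pilot with no preference is dropped
n = prefer_a + prefer_b            # 9 usable comparisons
p_value = sign_test_lower_tail(prefer_a, n)
alpha = 0.05

print(f"P(0, 1 or 2 of {n} prefer A) = {p_value:.3f}")   # 0.090
print(f"confidence B is better = {1 - p_value:.0%}")      # 91%
print("reject H0" if p_value < alpha else "accept H0")
```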
Signed Rank Test
- Combines elements of both the Sign Test and the Rank Sum Test
- That is, the Sign Test can be made more powerful if there is some indication of how much one system was preferred over another
- Method
- Rank differences by absolute magnitude
- Sum the positive and negative ranks (W+, W-)
- Compare the smaller W with critical values in a reference table
- Reject H0 if W < Wcr
Signed Rank Example
- Ten pilots evaluated two competing systems and gave each a Cooper-Harper rating on a scale of 1 to 10

  Pilot   System A   System B   Difference (A - B)
  1       3          1           2
  2       5          2           3
  3       3          4          -1
  4       4          3           1
  5       3          3           0
  6       4          2           2
  7       4          1           3
  8       2          1           1
  9       3          1           2
  10      1          2          -1
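Signed Rank Example
- Ranking differences by absolute magnitude, ignoring the zero difference (pilot 5)
- |d| = 1 (pilots 3, 4, 8, 10): ranks 1-4, average rank 2.5
- |d| = 2 (pilots 1, 6, 9): ranks 5-7, average rank 6
- |d| = 3 (pilots 2, 7): ranks 8-9, average rank 8.5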
Signed Rank Example
- Summing positive and negative ranks
- W+ = 2.5 + 2.5 + 6 + 6 + 6 + 8.5 + 8.5 = 40.0
- W- = 2.5 + 2.5 = 5.0
- Using α = 0.05, WCR = 8 (one-tailed criterion)
- Since 5 < 8 (WCR), we can reject H0
- There is a difference between A and B with 95% confidence
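The ranking step is the error-prone part, so here is a sketch that reproduces $W_+ = 40$ and $W_- = 5$ from the ratings table. Zero differences are dropped and tied magnitudes share the average rank, as described above; the critical value 8 comes from the reference table, not from the code.

```python
def signed_rank_sums(differences):
    """Return (W+, W-) for the Wilcoxon signed rank test."""
    nonzero = [d for d in differences if d != 0]       # ignore zero differences
    magnitudes = sorted(abs(d) for d in nonzero)
    average_rank = {}
    for value in set(magnitudes):
        positions = [i + 1 for i, m in enumerate(magnitudes) if m == value]
        average_rank[value] = sum(positions) / len(positions)  # ties share mean rank
    w_plus = sum(average_rank[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(average_rank[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus


system_a = [3, 5, 3, 4, 3, 4, 4, 2, 3, 1]   # Cooper-Harper ratings, pilots 1-10
system_b = [1, 2, 4, 3, 3, 2, 1, 1, 1, 2]
differences = [a - b for a, b in zip(system_a, system_b)]

w_plus, w_minus = signed_rank_sums(differences)
W_CRIT = 8                                   # table value, alpha = 0.05 one-tailed, 9 pairs
print(differences, w_plus, w_minus, min(w_plus, w_minus) < W_CRIT)
```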
Sample Size
- One of the most significant aspects of statistics for flight testing is determining how much you need to test
- Too few data points will result in poor conclusions or recommendations
- Too many data points will waste limited resources
- Two approaches for determining sample size
- Sample size when accuracy is the driving factor
- An approach for determining significant differences between means
- Tradeoffs
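Accuracy Driven
- Required to determine a population statistic, such as takeoff distance, within some accuracy (e.g., 10%)
- The concept of the confidence interval can be used to determine the required number of sample points
- Remember the confidence interval of the mean: $\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
- But $E = z_{1-\alpha/2}\,\sigma/\sqrt{n}$ is the error, thus $n = \left(\frac{z_{1-\alpha/2}\,\sigma}{E}\right)^2$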
Accuracy Example: How Many Sorties Required to Determine T/O Distance?
- The System Program Office wants us to determine takeoff distance within 10% during the test program
- Historically we find the standard deviation for similar aircraft to be about 20% of the mean
- We need to be 95% confident of our answer
- How many data points should we plan?
How Many Sorties Required to Determine T/O Distance?
- $z_{0.975} = 1.96$ for 95% confidence
- σ = 0.2μ (historical σ is 20% of the mean)
- Error E = ±0.1μ (10% error)
- Tests required: $n = \left(\frac{1.96 \times 0.2\mu}{0.1\mu}\right)^2 = 15.4$, rounded up to 16
- 16 takeoffs would be required
- Check that the assumption about the standard deviation remains reasonable (test hypothesis on variance) during testing
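Solving the error expression for n gives a one-line planning tool. A sketch, with an illustrative function name, that rounds up because a fractional takeoff cannot be flown:

```python
from math import ceil
from statistics import NormalDist


def n_for_accuracy(sigma, error, confidence=0.95):
    """Smallest n with z_(1-alpha/2) * sigma / sqrt(n) <= error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)


# sigma and error both expressed as fractions of the mean takeoff distance
print(n_for_accuracy(sigma=0.20, error=0.10))   # 16
```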
General Approach for Determining Significant Differences Between Means
- For the general problem of whether or not a system meets a specification, or whether there is a significant difference between two systems, the approach is more complex
- The difference between paired samples (d) from two populations will have some distribution
- If the two populations are the same, the mean of the d's will be zero
- If they are not the same, the mean will be nonzero
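Determining Significant Differences Between Means
- If the difference between the population means is δ1, then test results above and below a mean difference of x_c will give
- Test result giving a mean difference above x_c
- conclude the populations differ in their means, with level of significance α
- Test result below x_c
- conclude there is no difference when in fact there was one, with probability β
[Figure: distributions f(d) of the mean difference d centered at 0 and at δ1 (the minimum significant difference), separated by x_c; α is the area of the first above x_c, β the area of the second below x_c]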
Determining Significant Differences Between Means
- Moving x_c to the right reduces α but increases β, and vice versa
- The only way to reduce both is to increase the sample size
- The sample size needed to determine the difference between two populations is a function of α, β, δ1, σ1, and σ2
General Approach: Weapons System Delivery Accuracy - Example
- How many data points are required to determine if a system meets the specification for a weapon delivery accuracy of 5 mils?
- We need
- α: normally set at 0.10, 0.05, or 0.01 (0.01 is usually reserved for critical safety-of-flight issues); use 0.05 here
- β: set this larger than α, typically 0.1 or 0.2; use 0.1 here
- δ1: the least difference considered significant; use 1 mil here
- σ1 and σ2: these come from testing (initially from historical data)
- note that σ for a specification is zero
- assume 3 mils for σ1 here (i.e., results from a previous test)
General Approach: Weapons System Delivery Accuracy - Example
- How many data points are required to determine if a system meets the specification for a weapon delivery accuracy of 5 mils?
- 77 test points required: probably not feasible, so we must look at trade-offs
- How significant is it if we change β from 0.10 to 0.20, or change δ1 from 1 to 1.5?
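The slides do not show the formula behind the 77-point answer. One standard normal-approximation formula that reproduces it, assumed here rather than taken from the source, is $n = (z_{1-\alpha} + z_{1-\beta})^2(\sigma_1^2 + \sigma_2^2)/\delta_1^2$ with a one-tailed α. The sketch below uses it to answer the trade-off question; treat the outputs as planning estimates.

```python
from statistics import NormalDist


def n_for_difference(alpha, beta, delta1, sigma1, sigma2=0.0):
    """Assumed one-tailed sample-size formula:
    n = (z_(1-alpha) + z_(1-beta))**2 * (sigma1**2 + sigma2**2) / delta1**2
    """
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(1 - beta)) ** 2 * (sigma1**2 + sigma2**2) / delta1**2


base = n_for_difference(alpha=0.05, beta=0.10, delta1=1.0, sigma1=3.0)
looser_beta = n_for_difference(alpha=0.05, beta=0.20, delta1=1.0, sigma1=3.0)
wider_delta = n_for_difference(alpha=0.05, beta=0.10, delta1=1.5, sigma1=3.0)
print(f"baseline n = {base:.1f}")
print(f"beta = 0.20: n = {looser_beta:.1f}")
print(f"delta1 = 1.5 mil: n = {wider_delta:.1f}")
```

The baseline evaluates to about 77.1, matching the 77 test points above; relaxing β to 0.20 gives about 55.6, and accepting δ1 = 1.5 mils gives about 34.3.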
Tradeoffs
- The general approach
- can lead to unacceptable answers
- has several choices
- Analyzing these options can lead to logical choices
[Figure: required sample size n versus δ1 for α = 0.1, with curves for β = 0.1 and β = 0.2]
Sample Size: Nonparametric Tests
- Sample size cannot be determined with accuracy
- The signed rank test is about 90% as efficient as a test on means using the z statistic
- Calculate n as just described and divide by 0.90
- How many pilots do we need to evaluate new flight control system laws and be 90% certain that there is a significant improvement (defined by the Cooper-Harper scale)?
- α = 0.10, β = 0.20 (arbitrary), δ1 = 1
- σ1, σ2: review of similar tests shows σ ≈ 1
Sample Size: Nonparametric Tests
- Yields
- Thus, 10 evaluation pilots would be needed
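Under the same assumed one-tailed normal-approximation formula used for the weapons example (not shown in the source), with σ1 = σ2 ≈ 1 from similar tests, the pilot count works out as below; dividing by 0.90 is the efficiency correction for the signed rank test.

```python
from statistics import NormalDist


def n_for_difference(alpha, beta, delta1, sigma1, sigma2=0.0):
    """Assumed one-tailed sample-size formula for a test on means."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(1 - beta)) ** 2 * (sigma1**2 + sigma2**2) / delta1**2


n_means = n_for_difference(alpha=0.10, beta=0.20, delta1=1.0, sigma1=1.0, sigma2=1.0)
n_signed_rank = n_means / 0.90    # signed rank test is about 90% as efficient
print(f"test on means: {n_means:.2f}, signed rank test: {n_signed_rank:.2f}")
```

This gives about 10.0 pilots, consistent with the answer of 10 (a strict round-up of 10.02 would give 11).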
Error Analysis
- Thus far we have discussed errors of directly measured parameters
- In flight test we normally combine observations into calculated values
- fuel used = fuel flow × time
- specific range = velocity / fuel flow
- The propagation or combination of errors can thus be significantly larger than any one individual piece would imply
Significant Figures
- The number of significant figures in a result implies a level of precision
- Definition
- the leftmost nonzero digit is the most significant figure
- the least significant figure is the
- rightmost nonzero digit (no decimal point)
- rightmost digit (with a decimal point)
- all digits between the least and most significant are significant digits
- Rules
- addition/subtraction: keep one more decimal digit than in the least accurate number
- other operations: use one more digit than in the least accurate number, then round the result to the least accurate
- Ex.: timing an event with a watch with tenth-of-a-second divisions - shouldn't record more than two decimal places
- 10.24 seconds
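Error Propagation
- The precision of a computed value is dependent on the precision of each directly measured value
- For a computed value Q = f(a, b, c, ...), it can be shown that the error in Q, ΔQ, is given by the partial derivative form
- $\Delta Q = \left|\frac{\partial Q}{\partial a}\right|\Delta a + \left|\frac{\partial Q}{\partial b}\right|\Delta b + \left|\frac{\partial Q}{\partial c}\right|\Delta c + \cdots$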
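Error Propagation
- But in this course, we have seen that individual errors are stochastic (randomly variable), so
- $\sigma_Q = \sqrt{\left(\frac{\partial Q}{\partial a}\sigma_a\right)^2 + \left(\frac{\partial Q}{\partial b}\sigma_b\right)^2 + \left(\frac{\partial Q}{\partial c}\sigma_c\right)^2 + \cdots}$
- Example
- Find the standard deviation of C_L (lift coefficient) given a 1% standard deviation each for n, W and Ve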
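Error Propagation
- Where $C_L = \frac{2nW}{\rho_0 V_e^2 S}$, so $\frac{\sigma_{C_L}}{C_L} = \sqrt{\left(\frac{\sigma_n}{n}\right)^2 + \left(\frac{\sigma_W}{W}\right)^2 + \left(\frac{2\sigma_{V_e}}{V_e}\right)^2} = \sqrt{1 + 1 + 4}\,\% \approx 2.4\%$
- A 1% error in each term gives a 2.4% error in the final result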
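For a product of powers such as $C_L = 2nW/(\rho_0 V_e^2 S)$, the root-sum-square rule reduces to adding squared relative errors weighted by each exponent. A sketch, assuming ρ0 and S carry no error and the three 1% errors are independent (the function name is illustrative):

```python
from math import sqrt


def relative_sd_power_law(terms):
    """sigma_Q / Q for Q = k * a**p * b**q * ..., given (exponent, relative sd) pairs."""
    return sqrt(sum((exponent * rel_sd) ** 2 for exponent, rel_sd in terms))


# C_L = 2 n W / (rho0 * Ve**2 * S): exponents +1 (n), +1 (W), -2 (Ve)
cl_relative_sd = relative_sd_power_law([(1, 0.01), (1, 0.01), (-2, 0.01)])
print(f"sigma_CL / CL = {cl_relative_sd:.2%}")   # 2.45%
```

The exact value is √6 % ≈ 2.45%; the squared exponent on Ve is why the airspeed error dominates.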
Questions?