Title: PROBABILISTIC AND STATISTICAL FOUNDATIONS
1. PROBABILISTIC AND STATISTICAL FOUNDATIONS
- Engineering systems are dynamic systems, i.e. their states change with time, stochastically (randomly).
- The system inputs are modelled as random variables.
- Random variables are the basis of engineering systems.
2.
- Random changes are often not truly random; there is often some hint of a pattern or trend.
- The random nature of engineering systems, as defined by a set of random variables, leads to random changes in the state(s) of the system.
- The random nature of engineering systems is modelled using probability and statistics.
- Interpreting these probabilities and statistics allows the system to be optimized for current and future use.
3. STATISTICS (descriptive)
- Used to describe system characteristics based on data collected on it.
- Used in the analysis of system data to provide an intuitive understanding of that data.
- Used to display system information to external viewers.
- Used to develop system performance measures.
4. STATISTICS (inferential) and PROBABILITY
- Used to derive statements about what a system might do if certain scenarios were input (more later).
- Statistics: for current and previous system behaviour.
- Probability: for predictive (future) system behaviour.
5.
- What is Applied Statistics?
- Applications from various fields.
- What is statistics?
- What is probability?
- Relationship between probability and statistics.
6. What is Applied Statistics?
- A collection of (statistical) techniques used in practice.
- These range from very simple ones, such as graphical display, summary statistics, and time-series plots, to sophisticated ones, such as design of experiments, regression analysis, principal component analysis, and statistical process control.
- Successful application of statistical methods depends on the close interplay between theory and practice.
- There should be interplay (communication and understanding) between engineers and statisticians.
7.
- Engineers should have an adequate statistics background to (a) know what questions to ask, (b) mix engineering concepts with statistics to optimize productivity, and (c) get help and understand the implementation.
- The object of statistical methods is to make the scientific process as efficient as possible. Thus, the process will involve several iterations, each of which will consist of a hypothesis, data collection, and inference. The iterations stop when satisfactory results are obtained.
8. WHY DO WE NEED STATISTICS?
- Quality is something we all look for in any product or service we get.
- What is quality?
- It is not static and changes with time.
- A continuous quality improvement program is a MUST to stay competitive these days.
- The final quality and cost of a product depend largely on the (engineering) design and the manufacture of the product.
- Variability is present in machines, materials, methods, people, environment, and measurements.
- Manufacturing a product or providing a service involves at least one of the above six items (and possibly other items in addition to these).
9.
- We need to understand the variability.
- Statistically designed experiments are used to find the optimum settings that improve quality.
- In every activity, we see people use (or abuse?) statistics to express satisfaction (or dissatisfaction) with a product.
- There is no such thing as good statistics or bad statistics.
- It is the people reporting the statistics who manipulate the numbers to their advantage.
- Statistics, properly used, will be more productive.
10. EXPLORE, ESTIMATE and CONFIRM
- Statistical experiments are carried out to:
- EXPLORE: gather data to study more about the process or the product.
- ESTIMATE: use the data to estimate various effects.
- CONFIRM: gather additional data to verify the hypotheses.
11. BASIC DESCRIPTIVE STATISTICS
- Consider a random variable measured over a time interval.
- The data set will have measures of central tendency and measures of deviation around them, i.e. target values and scatter (variation), i.e. signal and noise.
- Signal/noise ratios are very important in optimization.
12. CASE STUDY 2
The data below represent 30 observations, taken over a 24-hour period, of the air pressure (in atmospheres) provided by an industrial compressor that is used to power a pneumatic stamping facility.
x = each value, n = 30
Armed with this data, provide a complete basic statistical analysis and comment on the results.
Measures of central tendency
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, or population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Mean = average, expected value, etc.
Are you familiar with the difference between sample mean and population mean?
13.
- The Mean: the expected value of a random variable, which is also called the population mean.
- The Median: a number dividing the higher half of a sample, a population, or a probability distribution from the lower half. The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, the median is not unique, so one often takes the mean of the two middle values.
- The Mode: in statistics, the mode is the most frequent value assumed by a random variable, or occurring in a sampling of a random variable, e.g. the highest peak on a data histogram. Like the statistical mean and the median, the mode is a way of capturing important information about a random variable or a population in a single quantity.
- Mode of a sample: the mode of a data sample is the element that occurs most often in the collection. For example, the mode of the sample 1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17 is 6. Given the list of data 1, 1, 2, 4, 4, the mode is not unique (both 1 and 4 occur twice), unlike the arithmetic mean.
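The measures of central tendency above can be checked with a short sketch. It uses Python's standard `statistics` module on the two lists quoted in the slide; the variable names are illustrative only.

```python
import statistics

sample = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]   # sample quoted in the slide
tied = [1, 1, 2, 4, 4]                           # list whose mode is not unique

sample_mean = statistics.mean(sample)        # arithmetic mean
sample_median = statistics.median(sample)    # middle value of the sorted data
sample_mode = statistics.mode(sample)        # most frequent value
tied_modes = statistics.multimode(tied)      # every value tied for most frequent
tied_mean = statistics.mean(tied)            # the mean is still a single number

print("mean:", sample_mean, "median:", sample_median, "mode:", sample_mode)
print("modes of tied list:", tied_modes, "mean of tied list:", tied_mean)
```

`multimode` returns all tied values, which makes the "mode is not unique" case explicit instead of silently picking one.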
14. Measures of dispersion
- Give an indication of how much scatter there is in the data set.
- Scatter = variability = bad!
- The more variable the system, the harder it is to model and optimize.
- Any dynamic system variable measured over time will have a central tendency and a variability to analyze.
15.
- The Range: the difference between the highest (Xl) and lowest (Xs) data values, i.e. R = Xl - Xs.
- The Variance: a measure of the fluctuation of the observations around the mean.
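16.
- For a population: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, or for a sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Why N versus n-1? The population variance is a parameter, whereas the sample variance is an estimator: it will change from sample to sample but should average out to the parameter. This is the property of unbiasedness, and dividing by n-1 rather than n is what provides it.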
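The "N versus n-1" point can be illustrated numerically. The sketch below assumes a small hypothetical population of four values and enumerates every ordered sample of size 2 drawn with replacement; neither the population nor the sample size comes from the slides.

```python
import itertools
import math
import statistics

population = [1, 2, 3, 4]                     # hypothetical population, not slide data
sigma2 = statistics.pvariance(population)     # population variance (divides by N)

# Every ordered sample of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average of the n-1 estimator over all possible samples.
avg_unbiased = sum(statistics.variance(s) for s in samples) / len(samples)

# Average of the divide-by-n version over the same samples.
avg_biased = sum(statistics.pvariance(s) for s in samples) / len(samples)

print("population variance:", sigma2)
print("average of n-1 estimator:", avg_unbiased)
print("average of divide-by-n estimator:", avg_biased)
```

Under these assumptions the n-1 estimator averages out to the population variance, while the divide-by-n version comes out too small by the factor (n-1)/n.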
17. POPULATION
- A population is the collection of all units defined by some characteristic that is the subject under study.
- In the study of the MPG (miles per gallon) of a new model car, the population consists of the MPGs of all cars of that model.
- To study the income level of a particular city, the population consists of the incomes of all working people in that city.
18. The sample
- In statistics, a sample is a subset of a population. Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. The sample represents a subset of manageable size.
- Much care should be devoted to the sampling.
- There is always going to be some error involved in making inferences about populations based on samples.
- The goal is to minimize this error as much as possible.
- There are many ways of bringing in systematic bias (consistently misrepresenting the population).
19.
- This can be avoided by taking random samples.
- Simple random sample: all units are equally likely to be selected.
- Multi-stage sample: units are selected in several stages.
- Cluster sample: used when there is no list of all the elements in the population and the elements are clustered in larger units.
- Stratified sample: in cases where the population under study may be viewed as comprising different groups (strata), and where the elements in each group are more or less homogeneous, we randomly select elements from every one of the strata.
- Convenience sample: samples are taken based on the convenience of the experimenter.
- Systematic sample: units are taken in a systematic way, such as selecting every 10th item after selecting the first item at random.
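Three of the sampling schemes listed above can be sketched with Python's standard `random` module. The population of 100 unit IDs, the two strata, and the sample sizes are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(42)              # fixed seed so the run is repeatable
units = list(range(1, 101))          # hypothetical population of 100 unit IDs

# Simple random sample: every unit equally likely, no repeats.
simple = rng.sample(units, 10)

# Systematic sample: random start among the first 10 units, then every 10th unit.
start = rng.randrange(10)
systematic = units[start::10]

# Stratified sample: draw at random from every stratum (hypothetical strata).
strata = {"low": units[:50], "high": units[50:]}
stratified = {name: rng.sample(group, 5) for name, group in strata.items()}

print("simple:", sorted(simple))
print("systematic:", systematic)
print("stratified:", stratified)
```

A convenience sample has no counterpart here because it involves no random mechanism, which is exactly why it risks systematic bias.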
20. DESCRIPTIVE STATISTICS
- Deals with the characterization and summary of key observations from the data.
- Quantitative measures: mean, median, mode, standard deviation, percentiles, etc.
- Graphs: histogram, box plot, scatter plot, Pareto diagram, stem-and-leaf plot, etc.
- Here one has to be careful in interpreting the numbers. Usually more than one descriptive measure will be used to assess the problem at hand.
21.
- Standard deviation
- Square root of the variance: $\sigma = \sqrt{\sigma^2}$ or $s = \sqrt{s^2}$
- Measures scatter in the same units as the observations
- Very popular in process control and tolerancing
- e.g. diameter = 63.4 ± 0.1 mm
22. Other useful measures: skewness and kurtosis
- Skewness: a measure of the asymmetry of a data set about its mean.
- Consider the distribution in the figure. The bars on the right side of the distribution taper differently than the bars on the left side. These tapering sides are called tails, and they provide a visual means for determining which of the two kinds of skewness a distribution has.
- Positive skew: the right tail is the longer one; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed.
- Negative skew: the left tail is the longer one; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed.
23. Kurtosis: a measure of the peakedness of a data set
[Figure: pdf for the Pearson type VII distribution with kurtosis of infinity (red), 2 (blue), and 0 (black)]
[Figure: log-pdf for the Pearson type VII distribution with kurtosis of infinity (red), 2 (blue), 1, 1/2, 1/4, 1/8, and 1/16 (gray), and 0 (black)]
24.
[Figure panels: left skew, right skew, no skew; mesokurtic, leptokurtic, platykurtic]
- A perfect mesokurtic curve is also called a normal curve, which by definition is not skewed in either direction.
- A leptokurtic distribution is symmetrical in shape, similar to a normal distribution, but the centre peak is much higher; that is, there is a higher frequency of values near the mean. If you move scores from the shoulders of a distribution into the centre and tails, the result is a peaked distribution with thick tails.
- A platykurtic distribution is one in which most of the values share about the same frequency of occurrence. As a result, the curve is very flat, or plateau-like. Uniform distributions are platykurtic.
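Skewness and kurtosis can be computed from central moments. The sketch below assumes the common moment-based definitions, with kurtosis reported as excess kurtosis so that a normal curve scores 0. The slides do not state which definition they use, and the helper names and data lists are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)**k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment scaled by the standard deviation cubed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth central moment over variance squared, minus 3 (normal curve = 0)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]          # hypothetical flat, symmetric data
right_tailed = [1, 1, 1, 2, 10]      # hypothetical data with a long right tail
left_tailed = [1, 9, 10, 10, 10]     # hypothetical data with a long left tail

print("symmetric skew:", skewness(symmetric))
print("right-tailed skew:", skewness(right_tailed))
print("left-tailed skew:", skewness(left_tailed))
print("flat data excess kurtosis:", excess_kurtosis(symmetric))
```

With these definitions a positive skew value corresponds to a right-skewed data set, a negative value to a left-skewed one, and a negative excess kurtosis to a flat, platykurtic shape.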
25. Statistical Association
- Relationship between variables: does variable x influence variable y?
- Measured in terms of the correlation coefficient r.
- For 2 data sets, X and Y, r is given as $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$
- Presented as bivariate x-y plots.
- Indicates the strength and direction of a linear relationship between two random variables.
26. CASE STUDY 3
The data below is taken from 40 observations on
the depth of cut and tool wear in a milling
operation. Is it true to say that the amount of
tool wear is generally independent of the cutting
depth?
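27.
- For the data presented, r is calculated as 0.9397.
- r ranges from -1 to 1, with zero showing no linear correlation.
- The answer indicates a strong positive correlation, i.e. increasing the cut depth tends to increase tool wear, so tool wear is not independent of cutting depth.
- Note that r is not a probability, so r = 0.94 does not mean "94% certain". Rather, r^2 is about 0.88, i.e. roughly 88% of the variation in tool wear is accounted for by a linear relationship with cut depth.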
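The correlation coefficient can be computed directly from its definition. The case-study data are not reproduced in this transcript, so the depth and wear values below are hypothetical stand-ins, and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical depth-of-cut and tool-wear readings (NOT the case-study data).
depth = [1.0, 2.0, 3.0, 4.0, 5.0]
wear = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(depth, wear)
print("r =", round(r, 4), " r^2 =", round(r * r, 4))
```

Printing r^2 alongside r gives the share of variation explained by the linear fit, which is the safer quantity to quote as a percentage.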
28. INTRODUCTORY PROBABILITY
Illustrative Example
- The following data correspond to an experiment in which the effect of engine RPM (revolutions per minute) on horsepower is under study.
- TABLE 1: Data for HP Example
29. INTRODUCTORY PROBABILITY
- Looking at the data in Table 1, why is it that the hp values, say at 4500 RPM, are not exactly the same if the experiment is repeated under the same conditions?
- The fluctuation that occurs from one repetition to another is called experimental variation, which is usually referred to as noise, statistical error, or simply error. Recall this term from the earlier discussion on data collection.
- This represents the variation that is inherently present in any (practical) system.
- The noise is a random variable and is studied through probability.
30. What is Probability?
- A manufacturer of blender motors wants to determine the warranty period for this product.
- If motor life were constant (say 8 years), the manufacturer would have no problem. The motor could be warranted for 8 years.
- But, in reality, motor life is not constant.
- Some motors will fail quickly and others will last for several years.
- There is an element of randomness in the life of the motors.
- The manufacturer cannot precisely predict how long any motor will last.
- Probability theory gives the manufacturer the means to quantify what is known about motor lifetimes and helps to quantify the risks involved in setting a warranty period.
31.
- Similar problems arise in the context of other products.
- Flexible manufacturing systems (FMS) play an important role in modern manufacturing. Improved quality, lower inventory, shorter lead times, higher productivity and greater safety are some of the benefits derived from FMS.
- All of these have random elements.
- Probability theory deals with randomness, allowing the study of quantities whose behavior cannot be predicted completely in advance.
- The above examples deal with manufacturing.
32.
- We could just as easily find examples in business, electrical and computer engineering, biomedical science and engineering, sociology, economics, marketing, civil engineering, the behavioral sciences and so on. The underlying problem, randomness, is the same.
- One should understand the ideas of probability and statistics from both theoretical and practical points of view.
- To properly apply probability and statistics in the real world, we must appreciate both sides of the picture.
- We cannot properly apply a procedure if we don't, at least in general terms, understand the reasoning (theory) behind it.
33.
- On the other hand, trying to apply theory without knowledge of the area of application is foolish. We have to have a proper perspective on both before meaningful progress can be made.
- Probability theory develops mathematical models for random experiments.
- A random experiment is a sequence of actions whose outcome cannot be predicted with certainty.
- Examples of outcomes of random experiments: the length of a phone call, the gender mix of three people chosen from a group of 25 people, the phenotype of the offspring of a cross-breeding experiment, and the number of defects on a painted panel.
34.
- EXPERIMENT
- Calculation of the MPG of a new model car.
- Measurements of current in a thin copper wire.
- Measurements of film build thickness in a painting process.
- Duration of phone calls.
- Time to assemble a job.
- Tossing a coin.
- Any data set subject to variability
- Nature's way
- Creates a degree of randomness that generates uncertainty
- We need probability theory to handle this
35.
- EVENTS
- n samples are collected for a quantity (variable) X
- The n values will differ from each other for numerous reasons
- All possible values of X = Sample Space S
- A subset of S = Event A, i.e. $A \subset S$
36. Venn Diagrams
- Illustrations used in the branch of mathematics known as set theory. They show all of the possible mathematical or logical relationships between sets.
- A good graphical way of representing probability.
[Figure: Venn diagram]
37. Events are often combined
- e.g. for 2 events A and B: $C = A \cup B$, or for n events: $C = A_1 \cup A_2 \cup \dots \cup A_n$
- e.g. A = {x1, x3} and B = {x4, x8}. Then C = {x1, x3, x4, x8}
39. Intersection of Events
- Here $D = A \cap B$, or for n events $D = A_1 \cap A_2 \cap \dots \cap A_n$
- where D contains all sample points common to the events in question
- e.g. if A = {x1, x2, x3} and B = {x3, x4, x6}, then D = {x3}
40. Mutually exclusive events
- 2 events, A and B, are mutually exclusive if $A \cap B = \emptyset$ (an impossible event), i.e. A and B cannot both occur
41. CASE STUDY 4
A construction company is currently bidding for 2 jobs. Consider a sample space (S) containing the outcomes of winning (W) or losing (L) each of these jobs.
42.
- S = {WW, WL, LW, LL}
- A = {WL, LW}, B = {LL}, C = {WW, WL, LW}
- $A \cap B = \emptyset$, i.e. mutually exclusive
- $A \cup C$ = {WL, LW, WW}
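The event operations for this case study map directly onto Python's built-in `set` type. The sets below are the ones listed in the slide; the reading of A, B and C in the comments is an interpretation, not something the slides state.

```python
# Sample space and events from Case Study 4 (W = win, L = lose, one letter per job).
S = {"WW", "WL", "LW", "LL"}
A = {"WL", "LW"}          # read here as: exactly one job won (interpretation)
B = {"LL"}                # read here as: no job won (interpretation)
C = {"WW", "WL", "LW"}    # read here as: at least one job won (interpretation)

a_and_b = A & B                       # intersection: outcomes common to A and B
a_or_c = A | C                        # union: outcomes in A or C (or both)
not_b = S - B                         # complement of B within the sample space
mutually_exclusive = A.isdisjoint(B)  # True when the intersection is empty

print("A and B:", a_and_b)
print("A or C:", sorted(a_or_c))
print("not B:", sorted(not_b))
print("A, B mutually exclusive:", mutually_exclusive)
```

An empty intersection is what "mutually exclusive" means in set terms, and the complement of B reproduces C, the event that at least one job is won.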
43. What is the likelihood of an event occurring?
- i.e. the probability of an event
- Probability of event A = P(A)
- Note: $P(\emptyset) = 0$ and $P(S) = 1$
- P(A) is a numerical value between 0 and 1, i.e. $0 \le P(A) \le 1$
44. CASE STUDY 5
45.
- P(V1), buses = 5/192 = 0.026
- P(V2), 2-axle = 15/192 = 0.078
- P(V3), 3-axle = 25/192 = 0.130
- P(V4), 4-axle = 30/192 = 0.156
- P(V5), 5-axle = 105/192 = 0.547
- P(V6), 6-axle = 6/192 = 0.031
- P(V7), 7-axle = 6/192 = 0.031
- What is the probability of seeing a truck? 1 - 0.026 = 0.974, i.e. a 97.4% chance
- What is the probability of seeing a truck with 5 or more axles? 0.547 + 0.031 + 0.031 = 0.609, i.e. 60.9%
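The arithmetic in this case study can be reproduced from the counts. The sketch below uses the vehicle counts shown in the slide; the dictionary keys and variable names are illustrative.

```python
# Vehicle counts as listed in the slide (192 vehicles in total).
counts = {
    "bus": 5, "2-axle": 15, "3-axle": 25, "4-axle": 30,
    "5-axle": 105, "6-axle": 6, "7-axle": 6,
}
total = sum(counts.values())

# Relative frequency of each vehicle class, used as its probability.
prob = {kind: n / total for kind, n in counts.items()}

# Every non-bus vehicle is a truck, so use the complement of "bus".
p_truck = 1 - prob["bus"]

# The classes are mutually exclusive, so their probabilities simply add.
p_five_plus = prob["5-axle"] + prob["6-axle"] + prob["7-axle"]

print("total vehicles:", total)
print("P(truck) =", round(p_truck, 3))
print("P(5 or more axles) =", round(p_five_plus, 3))
```

Both answers rely on the classes being mutually exclusive and exhaustive, which is why the probabilities sum to 1.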
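46. Basic laws of probability
- $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
- For three events: $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$, etc.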
47. Conditional probability
- If event A depends on event B, i.e. A|B, it is described as conditional.
- The conditional probability P(A|B) assumes event B occurs.
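48.
- Often described as $P(A|B) = \frac{P(A \cap B)}{P(B)}$, and so $P(A \cap B) = P(A|B)\,P(B) = P(B|A)\,P(A)$
- Bayes' Theorem: $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$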
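The conditional-probability relations can be sketched as three small helper functions. The function names and the input probabilities are hypothetical, chosen only so that the numbers are mutually consistent; they are not taken from any case study.

```python
import math

def joint(p_b_given_a, p_a):
    """P(A and B) from the multiplication rule P(B|A) * P(A)."""
    return p_b_given_a * p_a

def union(p_a, p_b, p_a_and_b):
    """P(A or B) from the addition rule."""
    return p_a + p_b - p_a_and_b

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) from Bayes' theorem."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs for illustration only.
p_a, p_b, p_b_given_a = 0.3, 0.4, 0.8

p_a_and_b = joint(p_b_given_a, p_a)
p_a_or_b = union(p_a, p_b, p_a_and_b)
p_a_given_b = bayes(p_b_given_a, p_a, p_b)

print("P(A and B) =", round(p_a_and_b, 4))
print("P(A or B)  =", round(p_a_or_b, 4))
print("P(A|B)     =", round(p_a_given_b, 4))
```

A quick sanity check on any set of inputs is that P(A and B) cannot exceed either P(A) or P(B); if it does, the inputs are inconsistent.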
49. CASE STUDY 6
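50.
- P(A) = 0.05, P(B) = 0.005
- and P(B|A) = 0.17
- Both A and B need to occur, i.e. $P(A \cap B)$ is required
- so $P(A \cap B) = P(B|A)\,P(A)$
- b) Here A or B is relevant, i.e. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$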
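51.
- Here P(A|B) is required: Bayes' theorem
- De Morgan's Rule: deals with complementary probabilities of large systems, i.e. $\overline{A \cup B} = \bar{A} \cap \bar{B}$ and $\overline{A \cap B} = \bar{A} \cup \bar{B}$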
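52. CASE STUDY 7
- Use conditional probability! (De Morgan)
- P(s) = probability of no malfunction, so P(malfunction) = 1 - P(s)
- Because the events are independent, P(s) is the product of the individual probabilities of no malfunction
- Answer: 1 - 0.9631 = 0.0369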
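The complement approach used in this case study can be sketched for a system of independent components. The case-study inputs are not reproduced in this transcript, so the failure probabilities below are hypothetical and `prob_any_malfunction` is an illustrative name.

```python
import math

def prob_any_malfunction(p_fail):
    """P(at least one malfunction) for independent components.

    By De Morgan's rule, 'at least one fails' is the complement of
    'none fails', and independence lets the 'none fails' probabilities multiply.
    """
    p_none = math.prod(1.0 - p for p in p_fail)
    return 1.0 - p_none

# Hypothetical per-component malfunction probabilities (NOT the case-study data).
p_fail = [0.01, 0.02, 0.03]

p_any = prob_any_malfunction(p_fail)
print("P(no malfunction) =", round(1.0 - p_any, 6))
print("P(at least one malfunction) =", round(p_any, 6))
```

Working through the complement avoids summing over every combination of failures, which grows quickly as the number of components increases.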