Title: Data Analysis, Presentation, and Statistics
1Data Analysis, Presentation, and Statistics
2Overview
- Tables and Graphs
- Populations and Samples
- Mean, Median, and Standard Deviation
- Standard Error and 95% Confidence Interval (CI)
- Error Bars
- Comparing Means of Two Data Sets
- Linear Regression (LR)
3Warning
- Statistics is a huge field; I've simplified considerably here. For example:
- Mean, Median, and Standard Deviation
- There are alternative formulas
- Standard Error and the 95% Confidence Interval
- There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean)
- Error Bars
- Don't go beyond the interpretations I give here!
- Linear Regression
- We only look at simple LR and only calculate the intercept, slope and R². There is much more to LR!
4Should I Use a Table or Graph?
- Tables
- Presenting a large amount of different data
- Comparing multiple characteristics
- Graphs
- Visual presentation quickly gives information
- Compare one or two characteristics
- Showing trends
5Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters
- Consistent Format, Title, Units, Big Fonts
- Differentiate Headings, Number Columns
6Figures
Figure 1: Turbidity of Pond Water, Treated and Untreated
- Consistent Format, Title, Units
- Good Axis Titles, Big Fonts
7Graphing Suggestions
- 1, 2, 5 rule
- Set gradations so smallest division of the axis is a positive integer power of 10 times 1, 2, or 5.
- Huh?
- Set your scale up so that the smallest division is an integer increment.
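One way to apply the 1, 2, 5 rule in code is sketched below. The function name `nice_step` and the `max_divisions` parameter are hypothetical, and the sketch assumes that steps smaller than 1 (negative powers of 10) are also acceptable, which goes slightly beyond the rule as stated on the slide.

```python
import math

def nice_step(data_range, max_divisions=10):
    """Smallest step of the form 1, 2, or 5 times a power of 10 that
    splits data_range into at most max_divisions divisions."""
    if data_range <= 0 or max_divisions <= 0:
        raise ValueError("data_range and max_divisions must be positive")
    raw = data_range / max_divisions          # ideal (but ugly) step size
    exponent = math.floor(math.log10(raw))    # power of 10 just below raw
    # Try 1, 2, 5 at this power of 10, then at the next one up.
    for e in (exponent, exponent + 1):
        for multiplier in (1, 2, 5):
            step = multiplier * 10.0 ** e
            if step >= raw:
                return step
    return 10.0 ** (exponent + 2)             # safety net, not normally reached

# An axis spanning 37 units with at most 10 divisions gets a step of 5.
print(nice_step(37))
```

A range of 37 would ideally be split into steps of 3.7, which is awkward to read; the rule rounds that up to 5.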
8Graphing Suggestions
- Labels
- All axes should be labeled
- Include units on the label
- Points, lines, curves
- Play around with options
- Color can be your friend
- Color can be your enemy
14Populations and Samples
- Population
- All of the possible outcomes of experiment or observation
- US population
- Particular type of steel beam
- Sample
- A finite number of outcomes measured or observations made
- 1000 US citizens
- 5 beams
- We use samples to estimate population properties
- Mean, Variability (e.g. standard deviation), Distribution
- Height of 1000 US citizens used to estimate mean height of US population
15Mean and Median
- Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
- Mean = Sum of values divided by number of samples = (1 + 3 + 3 + 6 + 8 + 10)/6 = 5.2 NTU
- Median = The middle number
Rank:   1  2  3  4  5  6
Number: 1  3  3  6  8  10
- For even number of sample points, average middle two: (3 + 6)/2 = 4.5
- Excel: Mean = AVERAGE, Median = MEDIAN
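For readers working outside Excel, the same two calculations can be sketched with Python's standard `statistics` module, using the turbidity values from this slide. The variable names are illustrative only.

```python
from statistics import mean, median

# Turbidity of treated water (NTU), from the slide
turbidity_ntu = [1, 3, 3, 6, 8, 10]

sample_mean = mean(turbidity_ntu)      # sum of values / number of samples
sample_median = median(turbidity_ntu)  # averages the middle two for even n

print(f"Mean   = {sample_mean:.1f} NTU")    # Mean   = 5.2 NTU
print(f"Median = {sample_median:.1f} NTU")  # Median = 4.5 NTU
```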
16Variance
- Measure of variability
- Sum of the squares of the deviations about the mean, divided by the degrees of freedom (n - 1)
- $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
- n = number of data points
- Excel: variance = VAR
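As a rough illustration of the definition, the sketch below computes the sample variance step by step for the turbidity data from the Mean and Median slide, and compares it with `statistics.variance`, which also divides by n - 1. The variable names are illustrative.

```python
import math
from statistics import variance

data = [1, 3, 3, 6, 8, 10]   # turbidity values (NTU) from the earlier slide

n = len(data)
x_bar = sum(data) / n
# Sum of squared deviations about the mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)
# Divide by degrees of freedom (n - 1), not n
s_squared = sum_sq_dev / (n - 1)

print(f"Variance = {s_squared:.2f} NTU^2")   # Variance = 11.77 NTU^2
# The standard library gives the same sample variance
print(math.isclose(s_squared, float(variance(data))))
```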
17Standard Deviation, s
- Square root of the variance
- For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
- Area under curve is probability of getting value within specified range
- Excel: standard deviation = STDEV
[Figure: bell curve; x-axis: Standard Deviations from Mean]
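The 1.96 figure can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to measure the area under the standard normal curve between -1.96 and +1.96, and also computes the sample standard deviation of the turbidity data from the Mean and Median slide. Variable names are illustrative.

```python
from statistics import NormalDist, stdev

data = [1, 3, 3, 6, 8, 10]   # turbidity values (NTU) from the earlier slide

# Sample standard deviation: square root of the (n - 1) variance
s = stdev(data)
print(f"Standard deviation = {s:.2f} NTU")   # Standard deviation = 3.43 NTU

# Area under the standard normal curve within 1.96 standard deviations
standard_normal = NormalDist(mu=0.0, sigma=1.0)
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(f"Fraction within 1.96 sd = {fraction_within:.3f}")   # 0.950
```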
18Standard Error of Mean
- Standard deviation of mean
- Of sample of size n
- taken from population with standard deviation s
- Standard error of the mean = $s / \sqrt{n}$
- Estimate of mean depends on sample selected
- As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
- As n increases, mean estimate distribution approaches normal, regardless of population distribution
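A minimal sketch of the standard error calculation, assuming the usual formula of sample standard deviation divided by the square root of n. The helper names `standard_error` and `standard_error_from_s` are hypothetical.

```python
import math
from statistics import stdev

def standard_error_from_s(s, n):
    """Standard error of the mean from a standard deviation s and sample size n."""
    return s / math.sqrt(n)

def standard_error(sample):
    """Standard error of the mean, estimated from the sample itself."""
    return standard_error_from_s(stdev(sample), len(sample))

data = [1, 3, 3, 6, 8, 10]   # turbidity values (NTU) from the earlier slide
se = standard_error(data)
print(f"Standard error = {se:.2f} NTU")   # Standard error = 1.40 NTU

# For a fixed s, quadrupling the sample size halves the standard error
print(standard_error_from_s(2.0, 4), standard_error_from_s(2.0, 16))   # 1.0 0.5
```

The last line shows why larger samples give a better estimate of the population mean: the standard error shrinks with the square root of n.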
19 95% Confidence Interval (CI) for Mean
- Interval within which we are 95% confident the true mean lies
- 95% CI = $\bar{x} \pm t_{95,n-1} \, s / \sqrt{n}$
- t95,n-1 is the t-statistic for the 95% CI if the sample size is n
- If n ≥ 30, let t95,n-1 = 1.96 (Normal Distribution)
- Otherwise, use Excel formula TINV(0.05,n-1)
- n = number of data points
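The calculation can be sketched in Python as below. The standard library has no equivalent of Excel's TINV, so this sketch assumes a small lookup table of standard two-tailed 95% t values for 1 to 10 degrees of freedom; the names `T95_TABLE`, `t95`, and `ci95` are hypothetical. For n of 30 or more it uses 1.96, as on the slide.

```python
import math
from statistics import mean, stdev

# Two-tailed 95% critical t values, keyed by degrees of freedom (n - 1).
# Only a short excerpt of the standard t table is included here.
T95_TABLE = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t95(n):
    """t statistic for a 95% CI with sample size n."""
    if n >= 30:
        return 1.96                      # normal approximation from the slide
    if (n - 1) in T95_TABLE:
        return T95_TABLE[n - 1]
    raise ValueError("No table entry for this n; look up TINV(0.05, n-1)")

def ci95(sample):
    """Return (low, high) bounds of the 95% CI for the mean."""
    n = len(sample)
    half_width = t95(n) * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

data = [1, 3, 3, 6, 8, 10]   # turbidity values (NTU) from the earlier slide
low, high = ci95(data)
print(f"95% CI: {low:.2f} to {high:.2f} NTU")   # 95% CI: 1.57 to 8.77 NTU
```

With only six data points the interval is wide, which reflects how uncertain the estimate of the mean is for such a small sample.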
20Error Bars
- Show data variability on plot of mean values
- Types of error bars include
- Standard Deviation, Standard Error, 95% CI
- Maximum and minimum value
21Using Error Bars to compare data
- Standard Deviation
- Demonstrates data variability, but no comparison possible
- Standard Error
- If bars overlap, any difference in means is not statistically significant
- If bars do not overlap, indicates nothing!
- 95% Confidence Interval
- If bars overlap, indicates nothing!
- If bars do not overlap, difference is statistically significant
- We'll use 95% CI
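The 95% CI rule above can be written as a small check. This is only a sketch of the interpretation given on this slide; the function names and the example intervals are hypothetical, and each interval is assumed to be a (low, high) pair.

```python
def intervals_overlap(a, b):
    """True if intervals a = (low, high) and b = (low, high) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

def interpret_95ci_bars(ci_a, ci_b):
    """Apply the slide's rule for comparing two means using 95% CI error bars."""
    if intervals_overlap(ci_a, ci_b):
        return "bars overlap: indicates nothing"
    return "bars do not overlap: difference is statistically significant"

# Hypothetical 95% CIs for two data sets
print(interpret_95ci_bars((1.6, 8.8), (7.0, 12.0)))   # overlap
print(interpret_95ci_bars((1.6, 4.0), (7.0, 12.0)))   # no overlap
```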
22Example 1
Create Bar Chart of Name vs Mean. Right click on
data. Select Format Data Series.
23Example 2
24Linear Regression
- Fit the best straight line to a data set
Right-click on data point and use trendline
option. Use options tab to get equation and R².
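For reference, a trendline of this kind is conventionally an ordinary least-squares fit. The sketch below assumes that method and computes the slope and intercept by hand; the function name `fit_line` and the example data are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The fitted line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data set
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} x")   # y = 2.2 + 0.6 x
```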
25 R² - Coefficient of Multiple Determination
- ŷi = Predicted y values, from regression equation
- yi = Observed y values
- R² = fraction of variance explained by regression (variance = standard deviation squared)
- R² = 1 if data lies along a straight line
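A minimal sketch of the R² calculation, assuming the common definition R² = 1 - (sum of squared residuals) / (sum of squared deviations of the observed values about their mean). The function name `r_squared`, the example data, and the fitted line y = 2.2 + 0.6x are hypothetical.

```python
import math

def r_squared(observed, predicted):
    """Fraction of the variance in observed values explained by the predictions."""
    y_bar = sum(observed) / len(observed)
    # Variation left unexplained by the regression line
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total variation of the observed values about their mean
    ss_total = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_residual / ss_total

# Hypothetical data with an assumed fitted line y = 2.2 + 0.6 x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]
print(f"R^2 = {r_squared(ys, predicted):.2f}")   # R^2 = 0.60

# Data lying exactly on a straight line gives R^2 = 1
line_ys = [3, 5, 7, 9]
print(r_squared(line_ys, line_ys))   # 1.0
```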