Title: Stor 155, Section 2, Last Time
1Stor 155, Section 2, Last Time
- Course Organization Website
- http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155-07Home.html
- What is Statistics?
- Data types and structure
- Get going in EXCEL
- Exploratory Data Analysis
- Bar Graphs
2Reading In Textbook
- Approximate Reading for Today's Material
- Pages 14-23
- Approximate Reading for Next Class
- Pages 40-55
3Stat 31, Student Poll Results
As indicated on Student Info form
Big changes from the past: More Public, More diversity
4Stat 31, Student Poll Results
- Have you taken an AP Exam?
- Only 10 had; grades generally low
- So don't worry if you haven't
5Stat 31, Student Poll Results
- Female: 48
- Male: 53
- Interesting Point:
- Different from all of UNC: 60 - 40
- Lesson about which courses to take???
6Major Concept: Distributions
- Distribution: Patterns of data
- Way data is spread out
- e.g. Bar Graph is visual display of categorical distribution
7Exploratory Data Analysis 2
- Visual Display of Quantitative Distributions
- Stem and Leaf Plots
- (From last time) Not Recommended
- (Main motivation was pencil and paper statistical analysis, but now have better graphical methods readily accessible)
- A limited special case of:
8Visual Disp Quantitative Distns
- 2. Histograms
- Idea: Apply bar graph idea,
- By creating categories,
- Called class intervals or classes or bins
9Histograms
- Idea: put numbers into bins,
- bar heights are counts, or frequencies
- 1.3
- 3.6
- 1.9
- 3.1
- 1.5
10Histograms
- Idea: put numbers into bins,
- bar heights are counts, or frequencies
- 1.3
- 3.6
- 1.9
- 3.1
- 1.5
- Class Intervals: (0,1], (1,2], (2,3], (3,4]
- [Figure: axis from 0 to 4 marking the class intervals]
11Histograms
- Idea: put numbers into bins,
- bar heights are counts, or frequencies
- 1.3
- 3.6
- 1.9
- 3.1
- 1.5
- [Figure: histogram on an axis from 0 to 4]
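A minimal sketch of the binning idea, using the five numbers from the slide. The class intervals are assumed to be closed on the right, i.e. (0,1], (1,2], (2,3], (3,4]; the function name `bin_counts` is made up for illustration.

```python
def bin_counts(values, edges):
    """Count how many values fall in each interval (edges[i], edges[i+1]]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            # Assumption: intervals are open on the left, closed on the right
            if edges[i] < v <= edges[i + 1]:
                counts[i] += 1
                break
    return counts

data = [1.3, 3.6, 1.9, 3.1, 1.5]   # the numbers on the slide
edges = [0, 1, 2, 3, 4]            # class interval boundaries
counts = bin_counts(data, edges)

# Text version of the histogram: one row per bin, bar length = count
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"({lo},{hi}]: {'#' * c} {c}")
```

Three of the numbers land in (1,2] and two in (3,4], so those are the only bars with nonzero height.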
13Buffalo Snowfall Data
- Buffalo, N. Y. (Annual) Snowfall Data
- Raw Data
- http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg2Raw.xls
- 63 years, ranging from 30 - 120 (inches)
- Histogram Analysis (pre-done)
- http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg2Done.xls
14Buffalo Snowfall Data, I
- A. EXCEL default (of bin edges)
- Unround numbers for bin edges
- Harder to interpret
- Data centered around 90
- Most data between 50 and 130
- Asymmetric Distribution
15Buffalo Snowfall Data, II
- B. Smaller bins
- Chosen by me
- Binwidth 5, << 13 from EXCEL default
- Nicer edge numbers
- Data centered around 84 (now more precise)
- Bar graph rougher (fewer points in each bin)
- Suggests 3 main groups
- (called modes or clusters)
- (can't see this above: bin width is important)
16Buffalo Snowfall Data, III
- C. Larger bins
- Chosen by me
- Binwidth 30, >> 13 from EXCEL default
- Bar graph is smooth
- (since many points in each bin)
- Only one mode (cluster)???
- Quite symmetric?
- (different from above: bin width is important)
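A rough sketch of why bin width matters, outside EXCEL. The sample below is made up for illustration (it is NOT the Buffalo data), and the helper `histogram` is a hypothetical name. Bins here are assumed to be of the form [left, left + width).

```python
def histogram(values, width, start=0.0):
    """Return {left_edge: count} for bins [left, left + width).

    Empty bins are simply absent from the result.
    """
    counts = {}
    for v in values:
        left = start + width * int((v - start) // width)
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

# Made-up sample with three loose clusters (NOT the Buffalo snowfall data)
sample = [48, 52, 55, 57, 78, 80, 81, 83, 84, 86, 88, 108, 110, 113, 115]

for width in (5, 30):
    print(f"binwidth {width}")
    for left, count in histogram(sample, width).items():
        print(f"  [{left:g}, {left + width:g}): {'#' * count}")
```

With binwidth 5 the bars are rough and the gaps between clusters are visible; with binwidth 30 the same numbers collapse into three smooth bars with a single peak.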
17Buffalo Snowfall Data, IV
- What's under the hood (how to do this)
- Tools → Data Analysis → Histogram (Chart Output)
- (may need Data Analysis Add-in)
- Massage pic (especially bar width)
- Sigma → min, max
- Bin range: create first two, then drag
- Histogram, using input bin edges
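The "min, max, then fill in the bin range" step can be sketched in code as well. This is only an analogue of the spreadsheet steps, not what EXCEL does internally; `bin_edges` and the sample values are hypothetical.

```python
import math

def bin_edges(values, width):
    """Round bin edges covering the data.

    Starts at the multiple of `width` at or below min(values) and ends at
    the first multiple at or above max(values).
    """
    first = width * math.floor(min(values) / width)
    last = width * math.ceil(max(values) / width)
    n_bins = round((last - first) / width)
    return [first + i * width for i in range(n_bins + 1)]

values = [31.5, 47.2, 88.0, 119.4]   # hypothetical data, in inches
print(bin_edges(values, 10))
```

Choosing the width yourself and rounding the first edge gives the "nicer edge numbers" that are easier to read off a chart.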
18Histogram HW
- HW 1.33
- Use Excel and histograms
- Get data from CDrom
- Do both
- Excel Default bins
- Bins set to 0,10,20,...,240
- Which gives answers closer to answers in back of book?
- Turn in only one page
19And now for something completely different
- Is this class too monotone?
- Easier to understand?
- Calm environment enhances learning?
- Or does it induce somnolence?
- What is somnolence?
- Google definition
- Sleepiness, a condition of semiconsciousness approaching coma.
20And now for something completely different
- Recall last class's Student Questionnaire
- I asked you for
- Name
- Major
- Contact Info
- Background
21And now for something completely different
22And now for something completely different
- OK, will try to send your mind in a different direction
- Hopefully, a mental break
- (not on the Homework Assignment!)
23And now for something completely different
- An experiment
- Pull out any coins you have with you
- How many of you have
- > 1 penny?
- > 1 nickel?
- > 1 dime?
- > 1 quarter?
- Choose most frequent denomination
24And now for something completely different
- Collect data (into Spreadsheet)
- Years stamped on coins
- (chosen denomination)
- As many as each person has
- Enter into spreadsheet
- Look at distribution using histogram
25And now for something completely different
- Predicted Answer
- From Text Book, Problem 1.32
- Distribution is Left Skewed
- Works out as predicted?
- Why?
- Note: most skewed distns seem to be Right Skewed
26Histogram Binwidths
- Nice Example from Webster West, U.S.C.
- http://www.stat.sc.edu/west/applets/histogram.html
- Control Binwidth with slider
- Undersmoothing?
- About right?
- Oversmoothing?
- (critical to visual impression)
27Histogram Binwidth Example
- Hidalgo Stamp Data
- From Mexico in 1800s
- How many sources of paper?
- How many modes: 1, 2, 5, 7, 10?
28Histogram Binwidth Example
- How many modes (i.e. clusters)?
- Caution: Answer depends on binwidth
- (a serious and current statistical research problem)
- Have seen all of 2,3,5,7,10 in the literature!
29Stamps Data Histogram
- How many modes?
- 2nd Caution: Answer also depends on bin location
- (i.e. shift of bins)
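A small sketch of the bin location effect: same numbers, same binwidth, different shift, different number of modes. The data are made up (NOT the stamp data), and counting local peaks in the bar heights is a simplistic stand-in for "number of modes". Both function names are hypothetical.

```python
def histogram_counts(values, width, shift=0):
    """Bar heights for bins [shift + k*width, shift + (k+1)*width), empty bins included."""
    idx = [int((v - shift) // width) for v in values]
    lo, hi = min(idx), max(idx)
    counts = [0] * (hi - lo + 1)
    for k in idx:
        counts[k - lo] += 1
    return counts

def count_modes(counts):
    """Count peaks: runs of equal-height bars that are higher than both neighbors."""
    padded = [0] + list(counts) + [0]
    modes = 0
    i = 1
    while i < len(padded) - 1:
        j = i
        # Extend over a plateau of equal heights
        while j + 1 < len(padded) - 1 and padded[j + 1] == padded[i]:
            j += 1
        if padded[i] > padded[i - 1] and padded[i] > padded[j + 1]:
            modes += 1
        i = j + 1
    return modes

# Made-up sample (NOT the Hidalgo stamp data)
sample = [6, 8, 9, 11, 12, 14, 19, 21, 26, 28, 29, 31, 32, 34]

for shift in (0, 5):
    counts = histogram_counts(sample, 10, shift)
    print(f"shift {shift}: heights {counts}, modes {count_modes(counts)}")
```

With shift 0 the heights form one flat-topped hump; with shift 5 the same data split into two clear peaks.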
30Histogram Bins
- For this course:
- Try several binwidths, to get the idea
- Weakness of EXCEL (we will see several)
- This process is inconvenient
31Comparison of Histograms
- Class Example: Study Habits Data
- Idea: Compare Study Habits of Males vs. Females (measured by some survey score, perhaps of questionable value?)
- http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg4Done.xls
32Study Habits Data
- EXCEL default histograms
- Populations look similar???
- Careful: Binwidth very big
- Careful: Different bin ranges
- Need smaller binwidths, and common scales
33Study Habits Data
- Better Choice: Binwidths 10, same bins for both
- Clear difference, easy to see
- Females higher on average
- Males are more spread
- 1 exceptional value, really true???
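A sketch of the "same bins for both" idea: compute one set of bin edges from the pooled data, then count each group on those edges. The scores below are invented for illustration (NOT the Study Habits data), and `counts_on_common_bins` is a hypothetical helper.

```python
def counts_on_common_bins(groups, width):
    """Bin every group on the same edges so the histograms are directly comparable.

    Returns (first_left_edge, {group_name: list_of_counts}).
    Bins are [first + k*width, first + (k+1)*width).
    """
    pooled = [v for vals in groups.values() for v in vals]
    first = width * int(min(pooled) // width)
    n_bins = int((max(pooled) - first) // width) + 1
    result = {}
    for name, vals in groups.items():
        counts = [0] * n_bins
        for v in vals:
            counts[int((v - first) // width)] += 1
        result[name] = counts
    return first, result

# Invented scores, for illustration only
scores = {
    "Females": [42, 48, 51, 53, 55, 57, 62],
    "Males": [28, 35, 41, 44, 49, 56, 63, 71],
}

first, binned = counts_on_common_bins(scores, 10)
for name, counts in binned.items():
    print(name)
    for k, c in enumerate(counts):
        left = first + 10 * k
        print(f"  [{left}, {left + 10}): {'#' * c}")
```

Because both groups share the same bins, differences in center and spread can be read by comparing bars row by row.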
34Things to look for (in histos)
- Population Center Point (Study Habits Data)
- Population Spread (Study Habits Data)
- Shape - Symmetric vs. Skewed
- Right Skewed
- Left Skewed
- Modes - Unexpected clusters
- Outliers - unusual data points
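The features above can also be checked numerically, as a complement to looking at the histogram. This is a sketch under assumptions: the data are made up, `summarize` is a hypothetical helper, and the moment-based skewness used here is one common definition among several.

```python
import statistics

def summarize(values):
    """Center, spread, and a skewness measure for a list of numbers."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    n = len(values)
    # Moment-based skewness: positive suggests right skew, negative left skew
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = m3 / m2 ** 1.5
    return {"mean": mean, "median": median, "sd": sd, "skew": skew}

# Made-up data with a long right tail and one unusually large value
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]
summary = summarize(right_skewed)
print(summary)

# Mirror image of the same data: long left tail
left_summary = summarize([-v for v in right_skewed])
print(left_summary)
```

For right-skewed data the mean is pulled above the median by the long tail; for left-skewed data it is pulled below.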
35Histogram Data Examples
- Textbook Applets from Publisher's Website
- One Variable Statistical Calculator
- Data Set: Service Times at a Call Center
- Histogram
- (hold mouse button, and slide left-right)
- Results
- Broad range of binwidths (12 - 25 is best?)
- Single bin is useless
- Distribution is Right Skewed
- Clear Outlier
36Comparison of Histograms HW
- HW 1.35b, 1.34, 1.17
- Work in this order
- Get data from CDrom
- Use EXCEL and histograms
- Odd answers in back
- You choose the bins
- (if you miss something in answers, change this)
- Turn in at most one page for each
- 1.31, 1.32
37Exploratory Data Analysis 3
- Time Plots, i.e. Time Series
- Idea: when time structure is important,
- plot variable as a function of time
- [Figure: plot with axes labeled "variable" and "time"]
- Often useful to connect the dots
38Class Time Series Example
- Monthly Airline Passenger Numbers
- http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg5Done.xls
- Increasing Trend
- (long term growth, over years)
- Increasing Variation
- (appears proportional to trend)
- Seasonal Effect - 12 Month Cycle
- (Peak in summer, less in winter)
39Airline Passengers Example
- Interesting variation: log transformation
- Stabilizes variation
- Since log of product is sum
- Shows changing variation is proportional to trend
- Log10 is most interpretable
- (log10(1000) = 3, ...)
- Generally useful trick (there are others)
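A sketch of why the log stabilizes variation. It assumes a multiplicative model, value = trend × seasonal factor, with made-up numbers (NOT the airline data). Since log10(trend × seasonal) = log10(trend) + log10(seasonal), the seasonal swing becomes the same size every year on the log scale.

```python
import math

# Hypothetical multiplicative series: trend doubles each year,
# the same seasonal factors repeat within each year
trend = [100, 200, 400]
seasonal = [0.8, 1.0, 1.25]   # low, middle, and peak season

series = [[t * s for s in seasonal] for t in trend]

raw_ranges = []   # within-year swing on the original scale
log_ranges = []   # within-year swing on the log10 scale
for year in series:
    raw_ranges.append(max(year) - min(year))
    log_ranges.append(math.log10(max(year)) - math.log10(min(year)))

print("raw swings:", raw_ranges)
print("log10 swings:", [round(r, 4) for r in log_ranges])
```

The raw swings grow in proportion to the trend, while the log10 swings are all equal to log10(1.25 / 0.8), regardless of the trend level.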
40Airline Passengers Example
- A look under the hood
- http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg5Raw.xls
- Use Chart Wizard
- Chart Type: Line (or could do XY)
- Use subtype for points and lines
- Use menu for first log10
- Although could just type it in
- Drag down to repeat for whole column
41Time Series HW