Descriptive statistics - PowerPoint PPT Presentation

Provided by: uio

1
Descriptive statistics
  • Petter Mostad
  • 2005.09.08

2
Goal: Reduce data amount, keep information
  • Two uses
  • Data exploration: What you do for yourself when
    you first get the data.
  • Data presentation: Illustrating for others some
    conclusion with numbers or graphs based on the
    data.

3
Data exploration
  • Understand description of variables
  • Find ranges, typical values, distributions of
    variables
  • Is the data OK? Meaningful? Outliers? Errors?
  • How do variables relate to each other?
  • Is it meaningful? As expected?
  • Can you form new hypotheses?

4
Data presentation
  • Remove superfluous information
  • Present essential information fairly
  • Present information efficiently
  • Make it possible to understand information
    quickly and simply

5
Types of variables
  • Numerical variables
      • Discrete
      • Continuous
  • Categorical variables
      • Nominal values
      • Ordinal values

6
Histograms
  • Subdivide continuous data into intervals, and
    display counts in intervals
  • Decision about width of intervals can influence
    result a lot
  • Ogives
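
The effect of the interval width can be tried out on a small example. The sketch below is only an illustration: the ages are made-up values, and `histogram_counts` is a hypothetical helper, not something from the slides.

```python
from collections import Counter

def histogram_counts(values, width, start=0):
    """Count the values falling in each interval [left, left + width)."""
    bins = Counter((v - start) // width for v in values)
    # Map each interval index back to the left edge of its interval.
    return {start + k * width: bins[k] for k in sorted(bins)}

ages = [21, 22, 24, 25, 29, 30, 31, 34, 35, 39, 41, 58]  # made-up data

print(histogram_counts(ages, width=10))  # four intervals
print(histogram_counts(ages, width=20))  # two intervals, detail is lost
```

With width 10 the two large groups in the twenties and thirties are visible; with width 20 they merge into a single interval.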

7
Bar charts
  • Can show variation between categories
  • Grouped bars can compare variations in different
    groups
  • Stacked bars can show proportions, or cumulative
    effects

8
Example
  • Shows changing proportions of 8 types across 24
    groups
  • Groups: coexpressed genes
  • Types: types of organisms

9
Cumulative distributions
  • Cumulates the proportions up to each level
  • Can never decrease; goes from 0 to 1 (or 100%)
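
A minimal sketch of how the cumulative proportions could be computed, assuming made-up grade data; the helper name `cumulative_proportions` is hypothetical.

```python
from collections import Counter
from itertools import accumulate

def cumulative_proportions(values):
    """Return (level, proportion of values <= level) for each distinct level."""
    counts = Counter(values)
    levels = sorted(counts)
    running = accumulate(counts[level] for level in levels)
    return [(level, total / len(values)) for level, total in zip(levels, running)]

grades = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # made-up data

for level, proportion in cumulative_proportions(grades):
    print(level, proportion)
```

The proportions never decrease from one level to the next, and the last one is always 1.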

10
Stem-and-leaf diagrams
  • A way to show both the distribution of numbers
    graphically, and the digits involved
  • Age in years: Stem-and-Leaf Plot

      Frequency   Stem .  Leaf
           2,00      1 .
          18,00      2 .  01223444
          28,00      2 .  5667888889999
          39,00      3 .  0000111222233344444
          48,00      3 .  55555666777778888899999
          38,00      4 .  00001111223334444
          39,00      4 .  555677777888889999
          37,00      5 .  0000011223333444
          22,00      5 .  55667789999
          13,00      6 .  011133
           5,00      6 .  6
           7,00      7 .  03
           1,00      7 .

      Stem width: 10
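
A simple version of such a diagram can be produced in a few lines. This is a rough sketch assuming non-negative integer data and a stem width of 10; unlike the plot above it uses one row per stem rather than two, and the ages are made up.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=10):
    """Return one text row per stem, with the leaves in sorted order."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, stem_width)
        rows[stem].append(str(leaf))
    return [f"{stem} . {''.join(rows[stem])}" for stem in sorted(rows)]

ages = [21, 23, 25, 30, 30, 34, 41, 47, 52]  # made-up data

for row in stem_and_leaf(ages):
    print(row)
```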

11
Pie charts
  • Illustrate percentages or parts well, for
    comparison between the parts.
  • 3D pies, or exploded pies, distort more than
    they clarify the information

12
Pareto diagrams
  • Focus on the most important (frequent)
    categories.
  • Show cumulative frequencies when including each
    category
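
The numbers behind a Pareto diagram are the categories sorted by frequency, together with the running cumulative share. A small sketch, assuming made-up defect categories and a hypothetical helper `pareto_table`:

```python
from collections import Counter
from itertools import accumulate

def pareto_table(items):
    """Return (category, count, cumulative share), most frequent first."""
    ranked = Counter(items).most_common()
    running = accumulate(count for _, count in ranked)
    return [(cat, count, cum / len(items))
            for (cat, count), cum in zip(ranked, running)]

defects = ["scratch"] * 5 + ["dent"] * 3 + ["stain"] * 2  # made-up data

for category, count, share in pareto_table(defects):
    print(category, count, share)
```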

13
Numerical summary statistics
  • (Arithmetic) mean
  • Median
  • Mode
  • Skewness
  • Outliers
  • Max, min, range

14
Arithmetic versus geometric mean
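  • Given observations $x_1, x_2, \dots, x_n$
  • Arithmetic mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  • Geometric mean: $\left( \prod_{i=1}^{n} x_i \right)^{1/n}$
  • They correspond to each other when the scale is
    changed by taking logarithms: the logarithm of
    the geometric mean is the arithmetic mean of the
    $\log x_i$.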
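
The link between the two means can be checked numerically. The sketch below assumes Python 3.8 or later (for `statistics.geometric_mean`) and uses made-up positive values.

```python
import math
from statistics import geometric_mean, mean

x = [1.0, 10.0, 100.0]  # made-up positive observations

arithmetic = mean(x)
geometric = geometric_mean(x)
# Arithmetic mean on the log scale, transformed back to the original scale.
back_transformed = math.exp(mean(math.log(v) for v in x))

print(arithmetic)        # 37.0
print(geometric)         # close to 10
print(back_transformed)  # agrees with the geometric mean up to rounding
```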

15
Measures of variability
  • (Sample) variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  • (Sample) standard deviation: $s = \sqrt{s^2}$
  • Coefficient of variation: $s / \bar{x}$
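
As a worked illustration, the three measures can be computed directly from their usual definitions. The sketch assumes the sample versions (division by n - 1) and made-up data; the function name is hypothetical.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, mean 5

variance = sample_variance(data)
std_dev = math.sqrt(variance)
coef_var = std_dev / (sum(data) / len(data))  # only meaningful for positive data

print(variance, std_dev, coef_var)
```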

16
Percentiles and quartiles
  • The x-th percentile is the number p such that x
    percent of the data is smaller than p.
  • The first and third quartiles are the 25th and
    75th percentiles, respectively
  • The inter-quartile range is the difference
    between the third and first quartiles.
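
Quartiles and the inter-quartile range can be sketched with the standard library. Note the assumption: several interpolation conventions for percentiles are in use, and `statistics.quantiles` with its default settings is only one of them, so other software may give slightly different numbers. The data are made up.

```python
from statistics import quantiles

data = list(range(1, 12))  # made-up data: 1, 2, ..., 11

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, median, q3)  # 3.0 6.0 9.0
print(iqr)             # 6.0
```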

17
Boxplots
  • Box and whisker plots
  • Sometimes show min, 1st quartile, median, 3rd
    quartile, max
  • May instead show some outliers separately
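
The numbers a boxplot is drawn from can be computed as below. The rule for which points count as outliers is an assumption here: the sketch uses the common convention of flagging points more than 1.5 inter-quartile ranges outside the quartiles, which the slides do not specify. Data and function names are made up.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, 1st quartile, median, 3rd quartile, max)."""
    q1, median, q3 = quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def flag_outliers(values, k=1.5):
    """Values more than k inter-quartile ranges outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]

data = list(range(1, 12)) + [40]  # made-up data with one extreme value

print(five_number_summary(data))
print(flag_outliers(data))
```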

18
Scatterplots
  • Probably the most useful graphical plot
  • Can show any kind of connection between
    variables, not only linear
  • Can be done for many pairs at a time (matrix
    plot), or for triplets (3D plot)

19
Covariance
  • Given paired observations $(x_1, y_1), (x_2, y_2), \dots,
    (x_n, y_n)$
  • (Sample) covariance: $s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
  • Positive when variables tend to change in the
    same direction, negative if opposite direction
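
The sign behaviour can be seen on a small example. A minimal sketch, assuming the sample version (division by n - 1) and made-up paired data:

```python
def sample_covariance(xs, ys):
    """Average product of deviations from the means, using n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
y_same = [2, 4, 5, 4, 5]      # tends to increase with x
y_opposite = [5, 4, 3, 2, 1]  # decreases as x increases

print(sample_covariance(x, y_same))      # 1.5
print(sample_covariance(x, y_opposite))  # -2.5
```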

20
Correlation coefficient
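  • Correlation coefficient: $r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the
    sample covariance and $s_x$, $s_y$ are the sample
    standard deviations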
  • Always between -1 and 1
  • If exactly equal to 1, then points are on an
    increasing line
  • Can be a more illustrative measure than covariance
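
One reason the correlation coefficient can be more illustrative is that it does not depend on the units of measurement. The sketch below, on made-up data, rescales one variable by a factor of 100 and gets the same coefficient. The factors of n - 1 cancel, so the function works with plain sums.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # made-up data

print(correlation(x, y))
print(correlation(x, [100 * v for v in y]))   # unchanged by rescaling y
print(correlation(x, [2 * v + 1 for v in x])) # points on an increasing line
```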

21
Least squares line fitting
  • We can illustrate a trend in the data by fitting
    a line

22
Fitting the line
  • The line is often fitted by minimizing the sum of
    the squares of the errors (the vertical
    distances to the line)
  • We will hear much about regression methods later
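
For a single explanatory variable, the least squares line has a closed-form solution. A minimal sketch on made-up data; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # made-up data

intercept, slope = fit_line(x, y)
print(intercept, slope)  # roughly 2.2 and 0.6
```

The fitted line always passes through the point of means, here (3, 4).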

23
Cross tables
  • When items can be classified using two different
    categorical variables, we can illustrate counts
    in a cross table.
  • If percentages are computed, they must be either
    relative to the columns or the rows.
  • In multiway tables, more than two classifying
    variables are used.
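
The difference between row and column percentages is easy to see on a small table. The sketch below uses made-up records with two categorical variables (a group and an outcome); all names are hypothetical.

```python
from collections import Counter

# Made-up records: (group, outcome)
records = ([("A", "yes")] * 3 + [("A", "no")] * 1 +
           [("B", "yes")] * 3 + [("B", "no")] * 3)

cells = Counter(records)                          # counts per table cell
row_totals = Counter(row for row, _ in records)   # totals per group
col_totals = Counter(col for _, col in records)   # totals per outcome

row_pct = {cell: 100 * n / row_totals[cell[0]] for cell, n in cells.items()}
col_pct = {cell: 100 * n / col_totals[cell[1]] for cell, n in cells.items()}

print(row_pct[("A", "yes")])  # 75.0: share of group A with outcome "yes"
print(col_pct[("A", "yes")])  # 50.0: share of "yes" outcomes from group A
```

The same cell gives two different percentages, which is why a table must state which total the percentages are relative to.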

24
Early example: Napoleon's Russian campaign 1812-1813

25
DNA sequence logos
  • Used to show what is conserved, and what varies,
    at DNA binding sites for some protein
  • Relative heights of letters show which bases are
    conserved
  • Total height shows degree of conservation

26
Chernoff faces
  • A way to visualize about 20 parameters in one
    figure
  • Background: We are good at remembering and
    comparing faces
  • Features in the face correspond to parameters you
    want to visualize

27
Chernoff faces
28
Use your own creativity!
  • When exploring data, try to make the kinds of
    plots that will answer your questions!
  • When presenting data, think about
      • simplicity
      • fairness
      • efficiency
      • inventiveness