EIN 4905/ESI 6912 Decision Support Systems Excel - PowerPoint PPT Presentation

About This Presentation
Title:

EIN 4905/ESI 6912 Decision Support Systems Excel

Description:

Annual drug sales at a local drugstore are distributed Normally with a mean of ... previously to find the probability that a drugstore's annual sales are 42,000. ... – PowerPoint PPT presentation

Number of Views:44
Avg rating:3.0/5.0
Slides: 62
Provided by: michellem65
Category:

less

Transcript and Presenter's Notes

Title: EIN 4905/ESI 6912 Decision Support Systems Excel


1
Spreadsheet-Based Decision Support Systems
Chapter 7 Statistical Analysis
Prof. Name
name_at_email.com Position
(123) 456-7890 University Name
2
Overview
  • 7.1 Introduction
  • 7.2 Understanding Data
  • 7.3 Relationships in Data
  • 7.4 Distributions
  • 7.5 Summary

3
Introduction
  • Performing basic statistical analysis of data
    using Excel functions
  • Statistical features of the Data Analysis
    Toolpack
  • Trend curves for analyzing data patterns
  • Basic linear regression techniques in Excel
  • Several different distribution functions in Excel

4
Understanding Data
  • Statistical Functions
  • Descriptive Statistics
  • Histograms

5
Statistical Functions
  • AVERAGE
  • Finds the mean of a set of data.
  • AVERAGE(range or range_name)
  • MEDIAN
  • Finds the middle number in a list of sorted data.
  • MEDIAN(range or range_name)
  • STDEV
  • Finds the standard deviation of a set of data.
  • This is equal to the square root of the variance,
    which measures the difference between the mean of
    the data set and the individual values.
  • STDEV(range or range_name)

6
Figures 7.1 and 7.2
7
Figures 7.3 and 7.4
8
Analysis Toolpack
  • An Excel Add-In which includes several
    statistical analysis techniques
  • To ensure that it is an active Add-in, choose
    Tools gt Add-ins from the menu. Select Analysis
    Toolpack from the list.

9
Descriptive Statistics
  • Provides a list of statistical information about
    your data set including
  • Mean
  • Median
  • Standard deviation
  • Variance
  • Go to Tools gt Data Analysis gt Descriptive
    Statistics

10
Descriptive Statistics (cont)
  • The Input Range refers to the location of the
    data set.
  • You can check whether your data is Grouped By
    Columns or Rows.
  • If there are labels in the first row of each
    column of data, then check the Labels in First
    Row box.
  • The Output Range refers to where you want the
    results of the analysis to be displayed in the
    current worksheet.
  • The Summary Statistics box will calculate the
    most commonly used statistics from our data.

11
Figure 7.7
  • Quarterly stock returns for three different
    companies are recorded. We want to know
  • Average stock return
  • Variability of stock returns
  • Which quarters had the highest and lowest stock
    returns

12
Figures 7.8 and 7.9
13
Figure 7.11
  • The standard deviation can be used to understand
    how common outliers are in the data.

14
More Descriptive Statistics
  • Confidence Level for Mean
  • The mean is calculated using the specified
    confidence level (for example, 95 or 99), the
    standard deviation, and the size of the sample
    data.
  • The confidence level and calculated mean are then
    added to the analysis report.
  • You can compare the actual mean to this
    calculated mean based on the specified confidence
    level.
  • Kth Largest
  • Gives the largest ranked data value for a
    specified value of k.
  • For k 1, the maximum data value would be
    returned.
  • Kth Smallest
  • Gives the smallest ranked data value for a
    specified value of k.
  • For k 1, the minimum data value would be
    returned.

15
Descriptive Statistics Functions
  • PERCENTILE
  • Returns a value for which a desired percentile k
    of the specified data_set falls below.
  • PERCENTILE(data_set, k)
  • For example, for the MSFT data, the value for
    which 95 of the data falls below is
  • PERCENTILE(B4B27,0.95) 0.108
  • PERCENTRANK
  • Returns the percentile of the data_set which
    falls below a given value.
  • PERCENTRANK(data_set, value)
  • For example, the percent of the MSFT data which
    falls below the value 0.108 is
  • PERCENTRANK(B4B27, 0.108) 0.95, or 95

16
Histograms
  • Histograms calculate the number of occurrences,
    or frequency, which values in a data set fall
    into various intervals.
  • Choose the Histogram option from the Analysis
    Toolpack list.

17
Histograms (contd)
  • The Input Range is the range of the data set.
  • The Bin Range is used to specify the location of
    the bin values.
  • Bins are the intervals into which values can
    fall they can be defined by the user or can be
    evenly distributed among the data by Excel.
  • The Output Range is the location of the output,
    or the frequency calculations for each bin.
  • The chart options include a simple Chart Output
    (the actual histogram), Cumulative Percentage for
    each bin value, and a Pareto organization of the
    chart.

18
Figures 7.15 and 7.16
19
Figures 7.17 and 7.18
  • To create your own bin values, make a list of
    upper bounds for each interval.

20
Figure 7.19
21
Histograms (contd)
  • Histograms can also be formatted.
  • Right-click on the histogram and change the Chart
    Options or other parameters.

22
Histograms (cont)
  • There are four basic shapes to a histogram
  • Symmetric has only one peak that is, there is a
    central high part and almost equal lower parts to
    the left and right of this peak.
  • Positively skewed has a peak on the left and
    many lower points (stretching) to the right.
  • Negatively skewed has a peak on the right and
    many lower points (stretching) to the left.
  • Multiple peaks imply that more than one source,
    or population, of data is being evaluated.

23
Relationships in Data
  • Trend Curves
  • Regression

24
Data Relationships
  • Relationships in data are usually identified by
    comparing two variables the dependent variable
    and the independent variable.
  • The dependent variable is the variable we are
    most interested in. By understanding its current
    behavior we can better predict its future
    behavior.
  • The independent variable is the variable we use
    as the comparison in order to make this
    prediction.

25
Trend Curves
  • Trend curves are used to graph and analyze these
    relationships between data.
  • Trend curves graph the data with
  • the independent variable on the x-axis
  • the dependent variable on the y-axis
  • To add a trend curve to your chart, right-click
    on the data points in an XY Scatter chart and
    choose Add Trendline from the drop-down list of
    options.

26
Trend Curves (contd)
  • There are five basic trend curves which Excel can
    model
  • Linear
  • Exponential
  • Power
  • Moving Average
  • Logarithmic

27
Trend Curves (contd)
  • Click on the Options tab to set options for the
    trend curve.
  • Set the name of the trendline.
  • Specify a period forward or backwards for which
    you want to predict the behavior of your
    dependent variable.
  • Check to Display Equation and Display R-Squared
    Value.

28
Linear Trend Curves
  • Number of Units Produced each month and the
    corresponding Monthly Plant Cost are recorded.
  • The company wants to be able to estimate their
    plant costs based on the planned production
    amounts.
  • The dependent variable is therefore the Monthly
    Plant Cost and the independent variable is the
    Units Produced.

29
Figures 7.26 and 7.29
  • Graph the data and then add a Linear trendline.

30
Figure 7.30
  • Use the displayed equation to predict future
    values.
  • First check the accuracy of the equation by
    calculating the error from the known data.
  • Linear trends have the relationship y ax - b

31
Exponential Trend Curves
  • Sales data for ten years is recorded.
  • We want to be able to predict sales for the next
    few years.
  • The independent variable is Years and our
    dependent variable is Sales.

32
Figures 7.34 and 7.35
  • Exponential trends have the relationship y
    ae(bx) or
  • y aEXP(bx)

33
Power Trend Curves
  • We are given yearly Production values and yearly
    Unit Cost for production.
  • We want to determine the relationship between
    Unit Cost and Production in order to be able to
    predict future Unit Costs.

34
Figures 7.39 and 7.40
  • Power trends have the relationship y axb

35
Regression Analysis
  • We can use some regression analysis parameters to
    ensure that the relationships we have chosen for
    our data are good fits.
  • These parameters include
  • R-Squared value
  • Standard error
  • Slope
  • Intercept

36
R-Squared Value
  • The R-Squared value measures the amount of
    influence the independent variable has on the
    dependent variable.
  • The closer the R-Squared value is to 1, the
    stronger the relationship is between the
    independent and dependent variables.
  • If the R-Squared value is closer to 0, then there
    may not be a relationship between these two
    variables.

37
Figure 7.42
  • We fit a Linear trendline to the Monthly Plant
    Cost per Units Produced chart (see Figure 7.44).
  • The R-Squared value is 0.8137, which is fairly
    close to 1, implying a good fit.

38
Figure 7.45
  • The RSQ Excel function can calculate the
    R-squared value from a set of data.
  • RSQ(y_range, x_range)
  • Note that this function only works with Linear
    trend curves.

39
Standard Error
  • The standard error measures the accuracy of any
    predictions made.
  • It can be calculated in Excel using the STEYX
    function
  • STEYX(y_range, x_range)
  • This function can also only be used for Linear
    trend curves.

40
Slope and Intercept
  • Two Excel functions can be used with a linear
    regression line of a collection of data.
  • SLOPE function
  • SLOPE(y_range, x_range)
  • INTERCEPT function
  • INTERCEPT(y_range, x_range)

41
Distributions
  • Many distributions have Excel functions
    associated with them.
  • These functions are basically equivalent to using
    distribution tables.
  • That is, given certain parameters of a set of
    data for a particular distribution, you would
    look at a distribution table to find the
    corresponding area from the distribution curve.
  • Some common distributions are
  • Normal
  • Exponential
  • Uniform
  • Binomial
  • Poisson
  • Beta
  • Weibull

42
Normal Distribution
  • The parameters for this distribution are simply
    the value we are interested in finding the
    probability for, and the mean and standard
    deviation of the set of data.
  • The function we use with the Normal distribution
    is NORMDIST
  • NORMDIST(x, mean, std_dev, cumulative)

43
Normal Distribution (cont)
  • The cumulative parameter will be seen in many
    Excel distribution functions.
  • This parameter can take the values True or False
    to determine if you want the value returned from
    the cumulative distribution function or the
    probability density function, respectively.
  • The cumulative distribution function (cdf) will
    find the probability that a value in the data set
    is less than or equal to x.
  • The probability density function (pdf) will find
    the probability that a value is exactly equal to
    x.

44
Figure 7.48
  • Annual drug sales at a local drugstore are
    distributed Normally with a mean of 40,000 and
    standard deviation of 10,000.
  • The probability that the actual sales for the
    year are 42,000 is 0.58, or 58.

45
Figure 7.49
  • What is the probability that annual sales will be
    between 35,000 and 49,000?
  • To find this value, we will subtract the cdf
    values for these two bounds.
  • NORMDIST(49000, 40000, 10000, True)
  • NORMDIST(35000, 40000, 10000, True)
  • This will return a 0.51 probability, or 51
    chance.

46
Standard Normal Distribution
  • If the mean of your data is 0 and the standard
    deviation is 1, then placing these values in the
    NORMDIST function with the cumulative parameter
    as True will find the resulting value from the
    Standard Normal distribution.
  • The STANDARDIZE function will convert the x value
    from a data set of a mean not equal to 0 and a
    standard deviation not equal to 1 into a value
    which does assume a mean of 0 and a standard
    deviation of 1.
  • STANDARDIZE(x, mean, std_dev)
  • The resulting standardized value is then used as
    the main parameter in the NORMSDIST function
  • NORMSDIST(standardized_x)

47
Figure 7.50
  • Consider the same example used previously to find
    the probability that a drugstores annual sales
    are 42,000.

48
Uniform Distribution
  • The Uniform distribution does not actually have a
    corresponding Excel function however, a simple
    formula can be used to model the Uniform
    distribution.
  • 1 / (b a)
  • Given that a value x is Uniformly distributed
    between a and b, we can use this formula to
    determine the probability that x will have an
    integer value in this interval.

49
Figure 7.51
  • Consider any values for a and b, then use the
    formula to calculate the Uniform value.

50
Poisson Distribution
  • The Poisson distribution has only the mean as its
    parameter.
  • The function we use for this distribution is
    POISSON
  • POISSON(x, mean, cumulative)
  • The Poisson distribution value is the probability
    that the number events which occur is either
    between 0 and x (cdf) or equal to x (pdf).

51
Figure 7.52
  • For example, consider a bakery which serves an
    average of 20 customers per hour.
  • Find the probability that at most 35 customers
    will be served in the next two hours.

52
Exponential Distribution
  • The Exponential distribution has only one
    parameter lambda 1 / mean of the data set.
  • The function we use for this distribution is
    EXPONDIST
  • EXPONDIST(x, lambda, cumulative)
  • The Exponential distribution is commonly used for
    modeling interarrival times.

53
Figure 7.53
  • Let us use the same example with the bakery data.
  • Arrival rate is said to be 20 customers per hour.
  • Interarrival mean, or the Exponential mean, is 1
    / arrival rate. Therefore, for this example, the
    interarrival mean is 1/20 hours per customer
    arrival.
  • To find the probability that a customer arrives
    in 10 minutes, we would set
  • x 10/60 0.17 hours
  • lambda 1/(1/20) 20 hours
  • EXPONDIST(0.17, 20, True)

54
Binomial Distribution
  • The Binomial distribution has the following
    parameters the number of trials and the
    probability of a success.
  • We are trying to determine the probability that
    the number of successes is less than or equal to
    (using cdf) or equal to (pdf) some x value.
  • The function we use for this distribution is
    BINOMDIST
  • BINOMDIST(x, trials, prob_success, cumulative)

55
Figure 7.54
  • Suppose a survey shows that 40 percent of people
    pay more attention to ads in the newspaper, and
    60 percent pays more attention to ads on
    television.
  • What is the probability that out of 100 people
    surveyed, 50 of them respond more to ads on
    television?

56
Beta Distribution
  • The Beta distribution has the following
    parameters alpha, beta, A, and B.
  • Alpha and beta are determined from the data set
  • A and B are optional bounds on the x value for
    which you want the Beta distribution value
  • The function we use for this distribution is
    BETADIST
  • BETADIST(x, alpha, beta, A, B)
  • If A and B are omitted, then a standard
    cumulative distribution is assumed and they are
    given the values 0 and 1, respectively.

57
Figure 7.55
  • Determine the probability that a team can
    complete a project in 10 days.
  • Estimate the total time needed to be 1 to 2
    weeks these estimates will be the bound values,
    or the A and B parameters.
  • Use a mean and standard deviation of 12 and 3
    days to compute the alpha and beta parameters.

58
Weibull Distribution
  • The Weibull distribution has the parameters alpha
    and beta.
  • The function we use for this distribution is
    WEIBULL
  • WEIBULL(x, alpha, beta, cumulative)
  • The Weibull distribution is most commonly used to
    determine reliability functions.

59
Figure 7.56
  • On average, a lightbulb will last 1200 hours,
    with a standard deviation of 100 hours. We can
    use these values to calculate alpha and beta.
  • We can now use the WEIBULL distribution to
    determine the probability that a lightbulb will
    be reliable for 55 days 1320 hours.

60
Summary
  • The Analysis Toolpack is an Excel Add-In that
    includes statistical analysis techniques such as
    Descriptive Statistics, Histograms, Exponential
    Smoothing, Correlation, Covariance, Moving
    Average, and others.
  • The Descriptive Statistics option provides a list
    of statistical information about a data set,
    including the mean, median, standard deviation,
    and variance.
  • Histograms calculate the number of occurrences,
    or frequency, which values in a data set fall
    into various intervals.
  • Relationships in data are usually identified by
    comparing the dependent variable and the
    independent variable.
  • There are five basic trend curves that Excel can
    model Linear, Exponential, Power, Moving
    Average, and Logarithmic.
  • Some of the more common distributions that can be
    recognized when performing a statistical analysis
    of data are the Normal, Exponential, Uniform,
    Binomial, Poisson, Beta, and Weibull
    distributions.

61
Additional Links
  • (place links here)
Write a Comment
User Comments (0)
About PowerShow.com