Analytics on Microsoft Excel - PowerPoint PPT Presentation

About This Presentation
Title:

Analytics on Microsoft Excel

Description:

We are one of the biggest actors in Business Intelligence in Norway ... Donald Farmer - Principal Program Manager for Microsoft's Data Mining ' ... – PowerPoint PPT presentation

Number of Views:323
Avg rating:3.0/5.0
Slides: 64
Provided by: Capg158
Category:

less

Transcript and Presenter's Notes

Title: Analytics on Microsoft Excel


1
Analytics on Microsoft Excel
Overview of solutions around the platform from a
real world perspective
  • Alberto Guillén
  • 03. June 2008

2
  • Capgemini is a leading company with long
    experience in technology services
  • We are one of the biggest actors in Business
    Intelligence in Norway
  • A major demand from our clients is delivering
    solutions in Microsoft Excel, we have
    continuously updated our efforts to adapt
    clients needs

Alberto Guillén Consultant Risk Management
Compliance MSc. Mathematics MSc.
Statistics Phone 47 46444721 E-Mail alberto.gu
illen_at_capgemini.com
3
Excel is more than worksheet functions and tables
Basic statistical environment
Data Visualization and Data Mining
Desktop for analytical solutions
EXCEL
Front for Data Warehouse Collection (SQL Server)
GUI for in-house coded programs (VBA)
Reporting and Monitoring Tool (Performance Point)
Excel is the Industry standard for end-user
calculations, and also as front interface
4
Analytical solutions can be created on different
complexity layers beyond basic Excel

Statistical Programming Languages
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
5
Analytical solutions can be created on different
complexity layers beyond basic Excel

Statistical Programming Languages
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
6
Analytical solutions can be created on different
complexity layers beyond basic Excel

Statistical Programming Languages
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
7
Analytical solutions can be created on different
complexity layers beyond basic Excel

Statistical Programming Languages
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
8
Analytical solutions can be created on different
complexity layers beyond basic Excel

Statistical Programming Languages
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
9
Analytical solutions can be created on different
complexity layers beyond basic Excel

Statistical Programming Languages
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
10
The average user masters the standard Excel tools
BASIC EXCEL
Tables and Filters
Worksheet functions
Charts
What If Analysis
Pivot Tables
Standard Excel allows direct interaction with the
raw data
11
The Analysis Toolpack makes rigorous analysis
possible
ANALYSIS TOOLPACK
  • Free Microsoft download
  • Created for statisticians
  • ANOVA
  • Correlation
  • Covariance
  • Descriptive Statistics
  • Exponential Smoothing
  • F-Test Two-Sample for Variances
  • Fourier Analysis
  • Histogram
  • Moving Average
  • Random Number Generation
  • Rank and Percentile
  • Regression
  • Sampling
  • t-Test

Basic statistical analyses are available through
Analysis Toolpack
12
Solver leverages computational abilities
SOLVER
  • Free Microsoft download
  • Optimizing and root finding set of algorithms
  • Can be called on the background from VBA
  • Practical but slow in heavy calculations
  • Not exact convergence sometimes!
  • Can be tuned
  • Needs good seeds

Solver implements standard algorithms for
mathematical optimization problems
13
Solver leverages computational abilities
SOLVER
  • Free Microsoft download
  • Optimizing and root finding set of algorithms
  • Can be called on the background from VBA
  • Practical but slow in heavy calculations
  • Not exact convergence sometimes!
  • Can be tuned
  • Needs good seeds

Solver implements standard algorithms for
mathematical optimization problems
14
Third party add-ins provide easily new
functionalities
ADD-INS
  • Cheap
  • Simple
  • Easy to use
  • No development efforts

Many small software developers use Excel as their
GUI
15
There are several third party add-ins offering
solutions on quantitative analysis and Monte
Carlo simulation
ADD-INS
  • Monte Carlo add-ins
  • Crystal Ball
  • Quantechs _at_Risk 5.0
  • RiskAMP
  • Lumenaut
  • Simtools
  • Also free .xla on the web

Hundreds of free or cheap add-ins offer various
solutions on fields like Risk Management
16
The Table Analysis Tools add-in brings data
mining capabilities
DATA MINING
  • Included in the data mining add-in
  • Scenario analysis differs from the standard tool
  • Brings the power of data mining to end-users

Data Mining is embedded into table functionality
17
The Data Mining add-in easies data mining to
business analysts
DATA MINING
  • What is Data Mining?
  • Data mining is frequently described as "the
    process of extracting valid, authentic, and
    actionable information from large databases.
  • Microsofts approach to Data Mining
  • Business Intelligence with a user-friendly
  • interface, accessible to end-users and
  • developers
  • Software
  • SQL Server 2005/2008 (Visual Studio BI)
  • Excel/Visio add-ins
  • DMX
  • ADOMD.Net / AMO

Microsoft brings Data Mining to business users
for the first time
18
Microsoft takes a different approach to Data
Mining
DATA MINING
  • Donald Farmer - Principal Program Manager for
    Microsofts Data Mining
  • "We don't have all the functionality of something
    like a SAS or an SPSS, because that's just not
    our market. Our market just has to be a much
    larger market
  • We have a huge database marketing team who do
    classic customer analysis. These guys were all
    SAS users, but when they joined Microsoft, they
    started using our tools. , they actually use
    the Excel data mining add-ins to do it. It's not
    that there's nothing they don't miss, it's that
    they are able to achieve the same business
    results using our tools.
  • "For a function such as 'Detect Categories,' what
    the add-in is doing is building a clustering
    model in the background , but we don't expect
    the Excel user to understand that. We just call
    it 'Build Categories,'
  • "We're seeing a lot of interest in the Excel-side
    data mining,for one thing, but we're also seeing
    interest in the embed-ability, too. The people
    who are actually pushing this are from the
    developer side.

Microsoft will not compete with traditional DM
vendors, Microsoft targets other users
19
Data Mining assists in various business processes
DATA MINING
  • Top Business Scenarios for DM
  • Cross-sell and up-sell
  • Campaign management
  • Customer acquisition
  • Budget and forecasting
  • Customer retention
  • New fields manufacturing, retail and entertaiment
  • Main DM tasks
  • Classification
  • Estimation
  • Prediction
  • Association
  • Clustering

Data Mining is being used in several business
areas
20
Data Mining is performed in SQL Server 2005 / 2008
DATA MINING
SQL Server Business Development Studio and DMX
code is the natural environment
21
Data Mining is also accessible through Excel 2007
DATA MINING
  • The Excel add-in acts as a client to an instance
    of Analysis Services
  • Both Excel and SQL Server Analysis Services
    support the full DM Cycle

Data understanding
Data preparation
Modeling
Validation
Deployment
Excel sends DM queries and data directly to SQL
Server Analysis Services
22
Data Mining is an iterative process
DATA MINING
  • A mining model is part of a larger process that
    includes everything
  • This process can be defined by using the
    following six basic steps
  • Defining the problem
  • Preparing Data
  • Exploring Data
  • Building Models
  • Exploring and Validating Models
  • Deploying and Updating

Problem
Working environment
?
Deployment
Although the process is illustrated as circular,
creating a data mining model is a dynamic and
iterative process
23
There are 9 available Data Mining algorithms on
Excel
DATA MINING
  • Decision/Regression Trees
  • Clustering
  • Naïve Bayes
  • Association rules
  • Sequence clustering
  • Time series
  • Neural Networks
  • Logistic regression
  • Linear regression
  • Plug-in algorithms
  • Third-party or self programmed implementing a set
    of COM interfaces

9 built-in algorithms can be tuned to obtain new
ones
24
Decision and Regression trees find natural splits
DATA MINING
  • Decision trees classify and find associations
  • Regression trees build segmented regressions
  • Example
  • Identify potential buyers

Decision trees give decision rules that are
suitable to business understanding
25
Clustering finds homogeneous groups
DATA MINING
  • Example Find segments of similar clients

age
income
Clustering can find hidden classes and identify
outliers
26
Clustering finds homogeneous groups
DATA MINING
  • Example Find segments of similar clients

Middle age Many cars and children
older age 2 cars no children
age
Young people No children
income
Clustering can find hidden classes and identify
outliers
27
Naïve Bayes provides probabilities of group
membership
DATA MINING
  • Example marketing campaign

?
?
Naîve Bayes is an efficient method to asses
probability of classification
28
Association rules unveils hidden logic
DATA MINING
  • Example Shopping Basket

Association rules visualizes logical rules that
underly your business
29
Sequence clustering finds event patterns in time
DATA MINING
  • Example Web navigation

Sequence clustering identifies clusters of
similarly ordered events in a sequence
30
Time series forecasts processes in time
DATA MINING
  • ARTx Microsoft proprietary algorithm
  • ARIMA available in SQL Server 2008
  • Example forecast seasonal sales to keep suitable
    stock

historical
predicted
The past patterns that it discovers can be used
to predict values for future time steps.
31
Time series forecasts processes in time
DATA MINING
  • ARTx Microsoft proprietary algorithm
  • ARIMA available in SQL Server 2008
  • Example forecast seasonal sales to keep suitable
    stock

historical
predicted
The past patterns that it discovers can be used
to predict values for future time steps.
32
Neural networks discovers predictive patterns by
learning
DATA MINING
  • Example fraud detection

Neural networks learns in an uncontrolled manner
33
Logistic regression predicts binary responses
DATA MINING
  • Microsoft Logistic Regression is implemented as a
    trivial neural network
  • Example Probability of credit default based on
    personal information

Logistic regression gives probabilities of
YES/NO given some attributes
34
Linear regression is of course also available
DATA MINING
  • It is however extended by Regression Trees
  • (Linear Regression is implemented as a
    particular case)
  • Example extrapolate the influence of oil price
    on house prices

The classical linear regression is also
integrated in the add-in
35
Chosen examples vs. real life problems
DATA MINING
  • So far, we have seen chosen examples
  • Shopping basket
  • Web navigation
  • Market segmentation
  • Unfortunately, it is not that easy data Mining
    is a creative and unclear process. Sometimes
    there is no answer with data mining.
  • Books dont show examples on when not to use the
    algorithms
  • Time series long forecasts
  • Classification trees credit scoring
  • Sequence clustering non-markovian processes

Bottom line understand the statistical models
behind the icons
36
Data Mining using the Data Mining add-in to
forecast Credit Default
DATA MINING
  • Logistic regression
  • The algorithm uses Z-scores transformations
  • Cut-offs should be calculated when deploying

Ranked classes
Number of payments
A
score
Income
B
Probability of default
C
Age
D
Civil status
After training the algorithm, probabilities of
default can be predicted for new applicants
37
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
t
Some problems require creative approaches
38
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
Training
period
t
Some problems require creative approaches
39
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
Training
period
t
Some problems require creative approaches
40
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
Training
period
t
Some problems require creative approaches
41
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
Training
period
t
Some problems require creative approaches
42
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
Training
period
t
Some problems require creative approaches
43
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
Predicting
One period older Period Period 1 Age Age

period
t
Some problems require creative approaches
44
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
x
Training
period
t
Some problems require creative approaches
45
Data Mining using the Data Mining add-in to
forecast Debt Recovery
DATA MINING
  • Hybrid between time series and regression trees
  • Problem need the algorithm k-nearest neighbours
  • Can be implemented as a plug-in algorithm

recovered
x
Predicting
period
t
Some problems require creative approaches
46
VBA for Excel is the main tool for automated
solutions
VBA
  • Communication with other software (COM Server)
  • Build algorithms not available in Excel
  • Automation of processes (Macro programming)
  • Easy and quick interaction with the solution
    through ActiveX Buttons and Userforms
  • Possibility to embed analytical solutions in a
    simple user-front to end users without the right
    competence

VBA allows in-house built solutions
47
VBA Building a statistical tool for analyzing
and forecasting Debt Collection
VBA
  • With VBA it is possible to deliver customized
    solutions to end users
  • Problems a lot of work to implement statistical
    algorithms, Solver can get slow

VBA is the tool to use to provide end-users with
an interactive work station
48
There are no limits with statistical programming
languages
R

Statistical Programming Languages (COM Server)
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
49
There are no limits with statistical programming
languages
R

Statistical Programming Languages (COM Server)
Analysis Tool pack
DDE
Third- party Add-ins
Data Mining Add-in
OLE
1991
Excel
ActiveX
Solver
1996
DCOM
COM
1999
.Net
VBA
WCF
50
R is becoming the standard in the scientific
community
R
  • R is a statistical programming language with
    syntax similar to S-plus
  • R is free (under GNU license)
  • R uses statistical libraries created by
    statisticians all over the world
  • R communicates with Excel through a COM server
  • COM is a set of interfaces that covers OLE,
    ActiveX, DCOM, ...
  • R Excel add-in
  • Background mode
  • Small Ribbon toolbar
  • Fast
  • Code embedded in
  • Worksheet functions
  • VBA
  • Cells

R allows analysts to implement the most advanced
mathematical models
51
Histograms are a dangerous tool to approximate
empirical pdfs
R
With standard Excel
52
Histograms are a dangerous tool to approximate
empirical pdfs
R
With standard Excel
53
Histograms are a dangerous tool to approximate
empirical pdfs
R
With standard Excel
54
With the R add-in, advanced semiparametric
methods are available
R
With the R add-in for Excel
Empirical probability distribution functions are
easily approximated with the R add-in
55
With the R add-in, advanced semiparametric
methods are available
R
With the R add-in for Excel
Empirical probability distribution functions are
easily approximated with the R add-in
56
With the R add-in, advanced semiparametric
methods are available
R
With the R add-in for Excel
Empirical probability distribution functions are
easily approximated with the R add-in
57
With the R add-in, advanced semiparametric
methods are available
R
With the R add-in for Excel
Empirical probability distribution functions are
easily approximated with the R add-in
58
With the R add-in, advanced semiparametric
methods are available
R
With the R add-in for Excel
Empirical probability distribution functions are
easily approximated with the R add-in
59
Statistical programming languages problem case
R
  • Forecasting Oslo Børs Hovedindeks

?

Some problems demand advanced statistical
approaches
60
Statistical programming languages problem case
R
  • Using Monte Carlo simulation to predict default
    in Specialized Lending

NIBOR
Oil price
Complex multivariate Monte Carlo models are
developed fast in R
61
Industrial solutions are another alternative
Industry vendors

Industrialized Vendors
Statistical Programming Languages (COM Server)
Analysis Tool pack
Third- party Add-ins
Data Mining Add-in
Excel
Solver
VBA
62
Industrial solutions should be chosen only if the
area requires it
Industry vendors
  • When to consider Industrial Solutions
  • Big companies (important deployment)
  • Special industrial subject area
  • Data warehouse integration
  • Highly competent staff within Analytics
  • Expensive investment study worthiness
  • Whatever vendor, check Excel compatibilities
    (reporting, platform migrations, )

The previously presented alternatives for Excel
can do the their job at end-user level
63
Further references
  • Capgemini
  • www.no.capgemini.com (alberto.guillen_at_capgemini.co
    m)
  • Microsoft Data Mining
  • http//www.sqlserverdatamining.com
  • http//www.microsoft.com/sqlserver/2008/en/us/data
    -mining-addins.aspx
  • R
  • http//www.r-project.org
  • http//sunsite.univie.ac.at/rcom/
Write a Comment
User Comments (0)
About PowerShow.com