Outline

Slides: 16

Provided by: ANG109

Transcript and Presenter's Notes

1
Outline

The five pillars of data mining
Supervised and unsupervised learning

2
Data mining

The process of selecting, exploring, modifying, modeling, and assessing
large amounts of data to uncover previously unknown patterns

3
SEMMA
Sample the data by creating one or more data tables
Explore the data by searching for (i) anticipated relationships and trends, (ii) unanticipated relationships and trends, (iii) anomalies
Modify the data by transforming variables and combining existing variables into new variables
Model the data by searching for a combination of the data that reliably predicts a desired outcome
Assess the data by evaluating the usefulness and reliability of the findings from the data mining process
4
Sample the data and create data tables
Cases and variables
Objects and attributes
5
Examine anticipated relationships: electricity
consumption and temperature
6
Examine the presence of outliers: total nitrogen
concentrations in Swedish rivers, determined by
two different methods
7
Modifying inputs

Transforming inputs or outputs
Combining existing variables into new variables
Aggregating inputs
Reducing the dimension of the inputs
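
The first three kinds of modification listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the choice of a log transform and a ratio, and the toy numbers are all hypothetical and do not come from the presentation. Dimension reduction is left out because it needs more machinery than fits in a short sketch.

```python
import math
from collections import defaultdict

def log_transform(values):
    """Transform a skewed, strictly positive input with the natural log."""
    return [math.log(v) for v in values]

def combine_ratio(numerators, denominators):
    """Combine two existing variables into a new one (here a ratio)."""
    return [n / d for n, d in zip(numerators, denominators)]

def aggregate_mean(records):
    """Aggregate (key, value) records into one mean value per key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Hypothetical toy data, invented for illustration only.
income = [20000.0, 40000.0, 80000.0]
loans = [5000.0, 20000.0, 20000.0]
hourly_load = [("Mon", 10.0), ("Mon", 14.0), ("Tue", 9.0), ("Tue", 11.0)]

log_income = log_transform(income)         # transformed input
debt_ratio = combine_ratio(loans, income)  # new combined variable
daily_load = aggregate_mean(hourly_load)   # aggregated input
```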

8
Model selection: credit scoring

Candidate predictors
Age
Sex
Income
Marital status
Education
Savings
Loans
Payment records
Homeowner
...
Subset selection aims to produce a model that is
interpretable and has possibly lower prediction
error

9
Bias, Variance and Model Complexity

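[Figure: prediction error plotted against model complexity (low to high), with one curve for the training sample and one for the test sample. Low complexity is labelled "High Bias, Low Variance"; high complexity is labelled "Low Bias, High Variance".]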
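
The relationship between model complexity and prediction error on the training and test samples can be explored with a small experiment. The sketch below is an assumption-laden illustration, not the method used in the presentation: it uses k-nearest-neighbour averaging on simulated data, where a small k gives a flexible (complex) model and a large k gives a rigid one. The sine-shaped signal, the noise level, and all names are hypothetical.

```python
import math
import random

def knn_predict(train, x0, k):
    """Average the outcomes of the k training points closest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, sample, k):
    """Mean squared prediction error on `sample` for a model built on `train`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in sample) / len(sample)

def make_sample(rng, n):
    """Simulated data: a sine curve plus Gaussian noise (an assumption)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]

rng = random.Random(0)
train = make_sample(rng, 50)
test = make_sample(rng, 50)

# k = 1 is the most complex model, k = 50 (all points) the least complex.
errors = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 50)}
for k, (train_err, test_err) in errors.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k = 1 the training error is exactly zero, because every training point is its own nearest neighbour, while the test error is not. This is why the training sample alone cannot be used to choose the level of complexity.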
10
Statistical learning

Supervised learning (prediction, classification)
We have a training set of data, in which we
observe the outcome and feature measurements for
a set of objects
Using this data we build a prediction model, or
learner, which will enable us to predict the
outcome for new unseen objects
Unsupervised learning (association analysis,
clustering)
We observe only the features and have no
measurements of the outcome.
Our task is to describe how the data are
organized and clustered

Hastie, Tibshirani, and Friedman: The Elements of
Statistical Learning
11
Statistical learning problems: some examples

Supervised learning (prediction, classification)
Predict tomorrow's electricity consumption from
weather forecasts and calendar records (season,
weekday, holiday)
Identify the numbers in a handwritten ZIP code,
from a digitized image
Unsupervised learning (association analysis)
Identify buying patterns that can be used to
design sales promotions
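
As a rough illustration of the association-analysis example, buying patterns can be summarised by how often items occur together. The sketch below computes two standard measures, support and confidence, which the slides do not name. The basket contents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Fraction of baskets containing each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / len(baskets) for pair, c in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    with_a = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_a) / len(with_a)

# Hypothetical purchase records, invented for illustration only.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["coffee", "milk"],
]
support = pair_support(baskets)
print(support[("bread", "butter")])              # 0.5
print(confidence(baskets, "butter", "bread"))    # 1.0
```

No outcome variable is involved here: the pattern is read directly from the features, which is what makes the task unsupervised.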

12
Supervised learning: statistical terminology

Prediction of one or more outputs using
observations of one or more inputs
Statistical terminology
Inputs: predictors, independent variables, explanatory variables
Outputs: responses, dependent variables

13
Naming convention

Regression
Prediction of quantitative outputs using one or
more inputs
Classification
Prediction of qualitative outputs using
observations of one or more inputs
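
The distinction can be made concrete by applying the same idea to both kinds of output. The sketch below is a hypothetical illustration: it uses the k nearest neighbours of a new input and either averages their quantitative outputs (regression) or takes a majority vote over their qualitative outputs (classification). The temperature and consumption figures are invented.

```python
from collections import Counter

def k_nearest(train, x0, k):
    """Return the k training pairs whose input is closest to x0."""
    return sorted(train, key=lambda p: abs(p[0] - x0))[:k]

def knn_regress(train, x0, k=3):
    """Regression: average the quantitative outputs of the neighbours."""
    neighbours = k_nearest(train, x0, k)
    return sum(y for _, y in neighbours) / k

def knn_classify(train, x0, k=3):
    """Classification: majority vote over the qualitative outputs."""
    neighbours = k_nearest(train, x0, k)
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical data: temperature as input, consumption as output.
temps = [-5.0, 0.0, 5.0, 10.0, 15.0, 20.0]
consumption = [90.0, 80.0, 70.0, 60.0, 50.0, 40.0]          # quantitative
level = ["high", "high", "high", "low", "low", "low"]        # qualitative

print(knn_regress(list(zip(temps, consumption)), 12.0))   # 60.0
print(knn_classify(list(zip(temps, level)), 12.0))        # low
```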

14
Prediction by learning from data
Assume that we have a data set which shows the
outcome (response) y for a set of
investigated objects with features x1, ..., xp
Prediction by learning from data implies that
we derive a function that can be used to
foresee the outcome for new objects (with known
or observed features)
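
A minimal sketch of this idea, assuming a single feature (p = 1) and a straight-line model fitted by least squares. The slides do not prescribe this model, and the data and names below are hypothetical.

```python
def fit_line(xs, ys):
    """Derive a prediction function f(x) = a + b*x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

# Hypothetical training data: observed feature x and outcome y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

f = fit_line(xs, ys)   # the function learned from the data
print(f(5.0))          # prediction for a new object with x = 5
```

The returned function plays the role of the learner: it is built once from objects with known outcomes and then applied to new objects for which only the feature is observed.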
15
Some major types of quantitative prediction models