Outline - PowerPoint PPT Presentation

Slides: 16
Provided by: ANG109

Transcript and Presenter's Notes

Title: Outline


1
Outline
  • The five pillars of data mining
  • Supervised and unsupervised learning

2
Data mining
  • The process of selecting, exploring, modifying,
    modeling, and assessing large amounts of data to
    uncover previously unknown patterns

3
SEMMA
  • Sample the data by creating one or more data
    tables
  • Explore the data by searching for (i)
    anticipated relationships and trends, (ii)
    unanticipated relationships and trends, and (iii)
    anomalies
  • Modify the data by transforming variables and
    combining existing variables into new variables
  • Model the data by searching for a combination of
    the data that reliably predicts a desired outcome
  • Assess the data by evaluating the usefulness and
    reliability of the findings from the data mining
    process
4
Sample the data and create data tables
Cases and variables
Objects and attributes
5
Examine anticipated relationships: electricity
consumption and temperature
6
Examine the presence of outliers
Total nitrogen concentrations in Swedish rivers,
determined by two different methods
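One way to screen paired measurements for outliers is to look at the difference between the two methods and flag pairs that lie far from the typical difference. The sketch below is only an illustration: the numbers are hypothetical (not the river data), and the median absolute deviation rule with a cutoff of 3 is an assumed choice, not something stated on the slide.

```python
from statistics import median

# Hypothetical paired measurements of the same samples by two methods
# (illustrative numbers, not the Swedish river data).
method_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
method_b = [1.1, 1.9, 3.2, 3.8, 9.0, 6.1]

def flag_outliers(a, b, cutoff=3.0):
    """Return indices of pairs whose difference is far from the typical difference."""
    diffs = [x - y for x, y in zip(a, b)]
    centre = median(diffs)
    # Median absolute deviation, scaled to be comparable to a standard
    # deviation when the differences are roughly normal.
    mad = 1.4826 * median([abs(d - centre) for d in diffs])
    return [i for i, d in enumerate(diffs) if abs(d - centre) > cutoff * mad]

outliers = flag_outliers(method_a, method_b)
print(outliers)
```

Flagged pairs are candidates for inspection, not automatic deletions; a scatter plot of one method against the other shows the same points visually.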
7
Modifying inputs
  • Transforming inputs or outputs
  • Combining existing variables into new variables
  • Aggregating inputs
  • Reducing the dimension of the inputs
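The first three kinds of modification can be sketched on a small table. Everything here is assumed for illustration: the hourly records are made up, and the derived names (`log_load`, `heating_need`, `chill`) and the 17 degree threshold are hypothetical. Dimension reduction is left out of the sketch.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical hourly records:
# (day, hour, temperature in deg C, wind speed in m/s, load in MW)
rows = [
    ("Mon", 8, -2.0, 4.0, 120.0),
    ("Mon", 18, 1.0, 2.0, 150.0),
    ("Tue", 8, 5.0, 6.0, 100.0),
    ("Tue", 18, 7.0, 1.0, 110.0),
]

def modify(row):
    day, hour, temp, wind, load = row
    heating_need = max(0.0, 17.0 - temp)   # transform an input
    return {
        "day": day,
        "log_load": math.log(load),        # transform an output
        "heating_need": heating_need,
        "chill": heating_need * wind,      # combine two inputs into a new one
    }

modified = [modify(r) for r in rows]

# Aggregate: average the hourly values to one value per day.
by_day = defaultdict(list)
for m in modified:
    by_day[m["day"]].append(m["heating_need"])
daily_heating_need = {day: mean(values) for day, values in by_day.items()}
print(daily_heating_need)
```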

8
Model selection: credit scoring
  • Candidate predictors
      • Age
      • Sex
      • Income
      • Marital status
      • Education
      • Savings
      • Loans
      • Payment records
      • Home ownership
      • ...
  • Subset selection aims to produce a model that is
    interpretable and possibly has lower prediction
    error
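A minimal sketch of subset selection, assuming an exhaustive search over all subsets scored by squared error on a held-out validation set. The data are hypothetical, and the outcome is deliberately constructed from income alone so that the expected answer is known; the slide does not say which search strategy or error measure was used.

```python
from itertools import combinations

# Hypothetical applicants. The outcome is constructed as 3 + 2 * income,
# so only income carries information (an assumption for illustration).
train = {"age": [25, 40, 33, 58, 47, 29],
         "income": [20, 35, 50, 42, 61, 28],
         "savings": [1, 12, 5, 30, 8, 3]}
valid = {"age": [36, 52, 44], "income": [31, 55, 39], "savings": [4, 20, 9]}
y_train = [3 + 2 * v for v in train["income"]]
y_valid = [3 + 2 * v for v in valid["income"]]

def design(data, names):
    """Design matrix rows: an intercept followed by the chosen predictors."""
    n = len(data["income"])
    return [[1.0] + [float(data[name][i]) for name in names] for i in range(n)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(rows, y):
    """Ordinary least squares via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def mse(rows, y, beta):
    """Mean squared prediction error of the fitted coefficients."""
    preds = [sum(v * c for v, c in zip(row, beta)) for row in rows]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

best_subset, best_error = None, float("inf")
for size in range(len(train) + 1):
    for names in combinations(train, size):
        beta = fit(design(train, names), y_train)
        error = mse(design(valid, names), y_valid, beta)
        # Keep the smaller model unless a larger one is clearly better.
        if error < best_error - 1e-9:
            best_subset, best_error = names, error
print(best_subset, best_error)
```

Because subsets are visited from smallest to largest and a larger model must beat the current best by a margin, the search favours the more interpretable model when errors are essentially tied. Exhaustive search is only feasible for a handful of candidate predictors.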

9
Bias, Variance and Model Complexity

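[Figure: prediction error plotted against model
complexity (low to high) for the training sample
and the test sample. Low complexity corresponds to
high bias and low variance; high complexity to low
bias and high variance.]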
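The pattern in this figure can be explored with a small simulation. The sketch below assumes a k-nearest-neighbour regression on simulated data, where a small k means a flexible (complex) model and a large k a rigid one; the learner, the sine-shaped signal, the noise level, and the sample sizes are all assumptions chosen for illustration, not taken from the slides.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def make_sample(n):
    """Simulated data: y = sin(2*pi*x) + Gaussian noise (an assumed toy model)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys

def knn_predict(x0, xs, ys, k):
    """Average the outcomes of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs_eval, ys_eval, xs_train, ys_train, k):
    """Mean squared prediction error on the evaluation points."""
    errors = [(y - knn_predict(x, xs_train, ys_train, k)) ** 2
              for x, y in zip(xs_eval, ys_eval)]
    return sum(errors) / len(errors)

x_train, y_train = make_sample(40)
x_test, y_test = make_sample(200)

# Small k = flexible model (high complexity); large k = rigid model.
results = {}
for k in (40, 20, 10, 5, 2, 1):
    train_error = mse(x_train, y_train, x_train, y_train, k)
    test_error = mse(x_test, y_test, x_train, y_train, k)
    results[k] = (train_error, test_error)
    print(f"k={k:2d}  train={train_error:.3f}  test={test_error:.3f}")
```

With k = 1 every training point is its own nearest neighbour, so the training error is exactly zero, while the test error stays positive. That gap is why the training error is a poor guide for choosing model complexity.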
10
Statistical learning
  • Supervised learning (prediction, classification)
      • We have a training set of data, in which we
        observe the outcome and feature measurements
        for a set of objects
      • Using this data we build a prediction model,
        or learner, which will enable us to predict
        the outcome for new unseen objects
  • Unsupervised learning (association analysis,
    clustering)
      • We observe only the features and have no
        measurements of the outcome
      • Our task is to describe how the data are
        organized and clustered

Hastie, Tibshirani, and Friedman: The Elements of
Statistical Learning
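The difference between the two settings can be shown on one small data set. In this sketch the supervised learner is a nearest-centroid classifier and the unsupervised method is a two-cluster k-means; both are assumed stand-ins chosen for brevity, and the feature values and labels are hypothetical.

```python
features = [1.0, 1.5, 0.5, 5.0, 5.5, 4.5]
# Outcomes are observed only in the supervised setting.
outcomes = ["low", "low", "low", "high", "high", "high"]

def mean(values):
    return sum(values) / len(values)

# Supervised: learn from (feature, outcome) pairs, then predict new objects.
def train_classifier(xs, ys):
    centroids = {label: mean([x for x, y in zip(xs, ys) if y == label])
                 for label in set(ys)}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))

predict = train_classifier(features, outcomes)
print(predict(4.0))

# Unsupervised: only the features are available; describe their grouping.
def two_means(xs, iterations=10):
    centres = [min(xs), max(xs)]  # simple deterministic start
    for _ in range(iterations):
        groups = [min((0, 1), key=lambda c: abs(x - centres[c])) for x in xs]
        centres = [mean([x for x, g in zip(xs, groups) if g == c]) for c in (0, 1)]
    return groups

clusters = two_means(features)
print(clusters)
```

The clustering recovers the same two groups here, but it only returns group numbers; attaching meaning such as "low" or "high" requires outcome information that the unsupervised setting does not have.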
11
Statistical learning problems: some examples
  • Supervised learning (prediction, classification)
      • Predict tomorrow's electricity consumption
        from weather forecasts and calendar records
        (season, weekday, holiday)
      • Identify the numbers in a handwritten ZIP
        code from a digitized image
  • Unsupervised learning (association analysis)
      • Identify buying patterns that can be used to
        design sales promotions

12
Supervised learning: statistical terminology
  • Prediction of one or more outputs using
    observations of one or more inputs
  • Statistical terminology
      • Inputs: predictors, independent variables,
        explanatory variables
      • Outputs: responses, dependent variables

13
Naming convention
  • Regression
      • Prediction of quantitative outputs using
        observations of one or more inputs
  • Classification
      • Prediction of qualitative outputs using
        observations of one or more inputs
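The naming depends only on the type of output, not on the learner. As a rough illustration, the sketch below uses the same nearest-neighbour idea for both tasks: averaging numeric outputs gives regression, and a majority vote over labels gives classification. The data, the choice of three neighbours, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical observations: one input and two kinds of output.
temperature = [0.0, 2.0, 4.0, 10.0, 12.0, 14.0]           # input
consumption = [90.0, 86.0, 82.0, 70.0, 66.0, 62.0]        # quantitative output
demand_level = ["high", "high", "high", "low", "low", "low"]  # qualitative output

def nearest(x0, k=3):
    """Indices of the k observations whose input is closest to x0."""
    order = sorted(range(len(temperature)), key=lambda i: abs(temperature[i] - x0))
    return order[:k]

def predict_consumption(x0):
    """Regression: average the numeric outputs of the neighbours."""
    idx = nearest(x0)
    return sum(consumption[i] for i in idx) / len(idx)

def predict_level(x0):
    """Classification: majority vote among the neighbours' labels."""
    votes = Counter(demand_level[i] for i in nearest(x0))
    return votes.most_common(1)[0][0]

print(predict_consumption(3.0), predict_level(3.0))
```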

14
Prediction by learning from data
Assume that we have a data set which shows the
outcome (response) y for a set of investigated
objects with features x1, ..., xp.
Prediction by learning from data implies that
we derive a function that can be used to
foresee the outcome for new objects (with known
or observed features).
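In code, "deriving a function from data" amounts to a learner that takes observed features and outcomes and returns a prediction function. The sketch below assumes the simplest case, a single feature (p = 1) and a straight line fitted by least squares; the data are hypothetical and noise-free so the result is easy to check by hand.

```python
def learn(xs, ys):
    """Fit y = intercept + slope * x by least squares and return the fitted function."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    def f(x_new):
        # The derived function: foresees the outcome for a new object.
        return intercept + slope * x_new

    return f

# Hypothetical training data generated from y = 1 + 2x.
x_observed = [1.0, 2.0, 3.0, 4.0]
y_observed = [3.0, 5.0, 7.0, 9.0]

f = learn(x_observed, y_observed)
print(f(5.0))
```

The training data are used only inside `learn`; once `f` exists, predicting for a new object needs nothing but that object's features.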
15
Some major types of quantitative prediction models
  • Linear or nonlinear regression models with i.i.d.
    error terms
  • Time series regression models with stochastic
    noise
  • Transfer function models