Handling nonnumerical variables (2)

1
Handling nonnumerical variables (2)
  • Sections 6.3-6.6

Kenrick Bingham http://iki.fi/kenny/
2
Handling nonnumerical variables (2)
  • Ordering alpha labels
    • joint distribution tables
    • when no numerical variables are available
  • Dimension reduction
    • multidimensional scaling (MDS)

3
Ordering alpha labels
  • Trivial ordering always possible (e.g.,
    alphabetical)
  • Better idea: order labels of different variables
    so that correlations show
  • How?

4
Ordering alpha labels
  • Two-way tables
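
A two-way (joint distribution) table counts how often each pair of labels occurs together. A minimal sketch, assuming records with two alpha variables; the function name `two_way_table` and the sample data are made up for illustration and do not come from the slides:

```python
from collections import Counter

def two_way_table(pairs):
    """Count co-occurrences of (row_label, column_label) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # table[r][c] = number of records with row label r and column label c
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical records: (car size, fuel type)
records = [
    ("small", "petrol"), ("small", "petrol"), ("small", "electric"),
    ("large", "diesel"), ("large", "diesel"),
    ("medium", "petrol"), ("medium", "diesel"),
]
table = two_way_table(records)

# Print the table with labels in their trivial (alphabetical) order
for row_label, row in table.items():
    print(row_label, row)
```

The labels come out in alphabetical order here, which is the "trivial ordering" mentioned earlier; nothing in this table yet says which order would make the relationship visible.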

5
Ordering alpha labels
  • Two-way tables
  • Jigsaw puzzle
  • Problems

6
Ordering alpha labels
  • Two-way tables
  • Jigsaw puzzle
  • Problems
  • Implementation
    • straightforward (?)
    • always possible to numerate some of the variables
      (?)
    • continue with mixed alpha-numeric methods
  • More variables (dimensions): same procedure
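
The slides do not spell out how the "jigsaw puzzle" is solved. One possible heuristic, shown here only as an illustration and not as the procedure from the slides, is a barycenter-style reordering: repeatedly sort row labels by the average position of their counts across the columns, then do the same for the columns, so that large counts drift toward the diagonal. The function name `barycenter_order` and the sample table are assumptions. The heuristic can stop at a local optimum, so the resulting order is not guaranteed to be the intuitive one.

```python
def barycenter_order(table, sweeps=10):
    """Reorder row and column labels so that large counts gather near the diagonal."""
    rows = list(table)
    cols = list(next(iter(table.values())))

    def mean_position(weights):
        # Weighted average index of the counts; empty rows/columns sort last
        total = sum(weights)
        if total == 0:
            return float("inf")
        return sum(i * w for i, w in enumerate(weights)) / total

    for _ in range(sweeps):
        new_rows = sorted(rows, key=lambda r: mean_position([table[r][c] for c in cols]))
        new_cols = sorted(cols, key=lambda c: mean_position([table[r][c] for r in new_rows]))
        if new_rows == rows and new_cols == cols:
            break  # nothing moved: a fixed point of the heuristic
        rows, cols = new_rows, new_cols
    return rows, cols

# Hypothetical two-way table (size vs. price band), labels in alphabetical order
table = {
    "large":  {"cheap": 0, "expensive": 5, "mid": 1},
    "medium": {"cheap": 1, "expensive": 1, "mid": 4},
    "small":  {"cheap": 5, "expensive": 0, "mid": 1},
}
rows, cols = barycenter_order(table)

# Compare how much of the mass sits on the diagonal before and after
diag_before = sum(table[r][c] for r, c in zip(table, table["large"]))
diag_after = sum(table[r][c] for r, c in zip(rows, cols))
```

With more than two variables the same sweep could be repeated over each pair of variables, in the spirit of "same procedure" above.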

7
Dimension reduction
  • Some variables may be known to be correlated and
    do not need to be modeled
  • Project data into fewer dimensions, avoiding
    significant loss of information
  • Example: triangle

8
Dimension reduction
  • Error measured by "stress": change in perimeter
    of triangle
    • if no distortion occurs, stress = 0
  • Generalization: change in the sum of distances
    between all pairs of points

Figure labels: stress = 0; stress > 0
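
The stress measure can be tried out on the triangle example. A minimal sketch, assuming "change" means the absolute difference between the two sums of pairwise distances (the slides do not say whether the change is absolute or relative); the function names and coordinates are illustrative:

```python
import math
from itertools import combinations

def sum_of_distances(points):
    """Sum of Euclidean distances between all pairs of points."""
    return sum(math.dist(p, q) for p, q in combinations(points, 2))

def stress(original, projected):
    """Assumed definition: absolute change in the sum of pairwise distances."""
    return abs(sum_of_distances(original) - sum_of_distances(projected))

# A 3-4-5 right triangle lying flat in three dimensions (z = 0 everywhere)
flat = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

# Dropping the unused z axis does not distort the triangle
in_plane = [p[:2] for p in flat]
stress_none = stress(flat, in_plane)

# Dropping y as well squashes the triangle onto the x axis
on_x_axis = [p[:1] for p in flat]
stress_some = stress(flat, on_x_axis)
```

For three points the sum of pairwise distances is exactly the perimeter, so the generalized measure agrees with the triangle example.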
9
Method suggested
  • D_n = original data set, dim D_n = n; k = n
  • Project D_k into D_{k-1} as follows:
    • rotate D_k randomly
    • project D_k into k - 1 dimensions and measure
      stress compared to D_n
    • repeat until required "degree of confidence"
      achieved
    • set D_{k-1} to be the projection causing the
      least stress
  • Set k = k - 1 and repeat until stress gets too
    big
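
The loop above can be sketched in plain Python. Several details are assumptions made for this sketch, not statements from the slides: a fixed number of `trials` stands in for the "degree of confidence", `max_stress` is the threshold for "too big", stress is the absolute change in summed pairwise distances, and instead of building a full random rotation matrix the code picks a random direction and uses a Householder reflection (an orthogonal, distance-preserving map) to move it onto the last axis before dropping that axis.

```python
import math
import random
from itertools import combinations

def sum_of_distances(points):
    return sum(math.dist(p, q) for p, q in combinations(points, 2))

def random_unit_vector(k, rng):
    """Uniformly random direction in k dimensions (normalized Gaussian sample)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 1e-12:
            return [x / norm for x in g]

def drop_direction(points, u):
    """Map unit vector u onto the last axis, then drop that axis.

    Uses the Householder reflection H = I - 2 v v^T / (v.v) with v = u - e_last.
    """
    v = list(u)
    v[-1] -= 1.0
    vv = sum(x * x for x in v)
    projected = []
    for p in points:
        if vv > 1e-12:  # otherwise u already is the last axis
            scale = 2.0 * sum(a * b for a, b in zip(v, p)) / vv
            p = [a - scale * b for a, b in zip(p, v)]
        projected.append(tuple(p[:-1]))
    return projected

def reduce_once(points, target_sum, trials, rng):
    """Try `trials` random orientations; return (stress, projection) with least stress."""
    k = len(points[0])
    best = None
    for _ in range(trials):
        candidate = drop_direction(points, random_unit_vector(k, rng))
        s = abs(target_sum - sum_of_distances(candidate))
        if best is None or s < best[0]:
            best = (s, candidate)
    return best

def reduce_dimensions(points, max_stress, trials=200, seed=0):
    rng = random.Random(seed)
    target_sum = sum_of_distances(points)  # stress is always measured against D_n
    current = [tuple(p) for p in points]
    while len(current[0]) > 1:
        s, candidate = reduce_once(current, target_sum, trials, rng)
        if s > max_stress:
            break  # stress got too big: keep the current number of dimensions
        current = candidate
    return current

# Hypothetical data: a 3-by-4 rectangle lying in a plane of 3-D space
flat = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (3.0, 4.0, 0.0)]
reduced = reduce_dimensions(flat, max_stress=1.0, trials=500)
```

For this data the search finds a nearly distortion-free two-dimensional projection, while any projection onto a line changes the summed distances by far more than the threshold, so the loop stops at two dimensions.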

10
Dimension reduction Questions
  • Why not use principal component analysis (PCA)?
  • well understood, firm theoretical basis
  • minimises the MSE
  • MSE more widely used than "stress"
  • no need for random iterations
  • Why in the chapter "Handling nonnumerical
    variables"?
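
For comparison, the two error measures can be written side by side. This is a sketch with notation introduced here, not taken from the slides: $x_1, \dots, x_N$ are the original points, $\hat{x}_i$ their projections, $\bar{x}$ the mean of the data, and $W$ an $n \times k$ matrix with orthonormal columns. The stress is assumed to be the absolute change in summed pairwise distances.

```latex
\text{stress} = \Bigl|\, \sum_{i<j} \lVert x_i - x_j \rVert
              - \sum_{i<j} \lVert \hat{x}_i - \hat{x}_j \rVert \,\Bigr|,
\qquad
\text{MSE}(W) = \frac{1}{N} \sum_{i=1}^{N}
  \bigl\lVert (x_i - \bar{x}) - W W^{\top} (x_i - \bar{x}) \bigr\rVert^{2}
```

PCA picks the $W$ that minimises $\text{MSE}(W)$ directly, namely the $k$ leading eigenvectors of the covariance matrix, which is why it needs no random iterations.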