Exploring similarities and dissimilarities of objects - PowerPoint PPT Presentation

Description:

Distance-preserving 2-dimensional scatter charts. MDS scatter charts ... For a given set of N items, find a representation in few dimensions such that ...

Slides: 32
Provided by: ANG109

Transcript and Presenter's Notes

1
Exploring similarities and dissimilarities of
objects
  • Distance-preserving 2-dimensional scatter charts
  • MDS scatter charts
  • Variation-preserving 2-dimensional score charts
  • PCA score charts
  • Cluster analysis
  • Unconstrained or constrained K-means clustering

2
Multidimensional scaling
  • Primary objective
  • Fit the original data into a low-dimensional
    space so that the distortion is minimized
  • Problem formulation
  • For a given set of N items, find a representation
    in few dimensions such that the N(N-1)/2
    interitem similarities (distances) nearly match
    the original similarities (distances).
  • Terminology
  • The resulting low-dimensional plot is called an
    ordination of the data
  • The numerical measure of closeness of the
    low-dimensional representation is called stress

3
Metric and nonmetric multidimensional scaling
  • Metric multidimensional scaling (principal
    coordinate analysis)
  • Multidimensional scaling based on the actual
    magnitudes of the original similarities
  • Nonmetric multidimensional scaling
  • Multidimensional scaling based on the rank
    orders of the N(N-1)/2 original similarities
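
As a rough illustration of the metric case, the following minimal sketch implements classical scaling (principal coordinate analysis) with NumPy. It assumes the input is a symmetric matrix of Euclidean distances. The function names `classical_mds` and `pairwise_distances` and the rectangle example are hypothetical and not taken from the presentation.

```python
import numpy as np

def classical_mds(D, q=2):
    """Classical (metric) MDS: embed N items in q dimensions from an (N, N) distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:q]       # indices of the q largest eigenvalues
    lam = np.clip(eigvals[idx], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, idx] * np.sqrt(lam)     # (N, q) coordinates

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical example: the four corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = pairwise_distances(points)
config = classical_mds(D, q=2)
```

The recovered configuration may be rotated or reflected relative to the original points, but its interitem distances agree with `D`.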

4
Metric multidimensional scaling: a simple example
  • Geometrical representation of cities produced by
    MDS and an airline distance table

5
Metric multidimensional scaling: a simple
example of SAS code
  • Geometrical representation of cities produced by
    MDS and an airline distance table

proc mds data=mds.uscities dimension=2
         out=mds.uzcitiesout oconfig;
run;
6
Metric multidimensional scaling: a simple example
  • Geometrical representation of cities produced by
    MDS and an airline distance table

[Figure: MDS configuration of the cities; labelled points include Spokane, Boston and Los Angeles]
7
Metric and nonmetric multidimensional scaling:
general problem formulation
  • Consider a set of N items, and an ordering of
    the M = N(N-1)/2 interitem similarities:
    $s_{i_1 k_1} < s_{i_2 k_2} < \dots < s_{i_M k_M}$
  • We want to find a q-dimensional configuration
    such that the interitem distances $d^{(q)}_{ik}$ match this
    ordering. A perfect match occurs when
    $d^{(q)}_{i_1 k_1} > d^{(q)}_{i_2 k_2} > \dots > d^{(q)}_{i_M k_M}$
  • As long as the order is preserved, the
    magnitudes of the distances are considered to be
    irrelevant.
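
The order-matching idea can be checked mechanically. The sketch below assumes that a perfect match means the most similar pair has the smallest distance, the next most similar pair the next smallest, and so on, with no ties. The helper name `order_is_preserved` and the numbers are hypothetical.

```python
import numpy as np

def order_is_preserved(similarities, distances):
    """True if ascending similarity corresponds to strictly descending distance."""
    s = np.asarray(similarities, dtype=float)
    d = np.asarray(distances, dtype=float)
    order = np.argsort(s)                      # pairs from least to most similar
    return bool(np.all(np.diff(d[order]) < 0))

# Hypothetical similarities for M = 3 item pairs and two candidate sets of distances.
s = np.array([0.9, 0.2, 0.5])
d_good = np.array([1.0, 7.0, 3.0])   # most similar pair is closest
d_bad = np.array([1.0, 2.0, 3.0])    # least similar pair is not farthest
```

Rescaling `d_good` by any positive factor leaves the result unchanged, which is the sense in which only the order matters.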

8
Minimization of stress
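  • We would like to find a q-dimensional
    representation such that
    $\text{Stress}(q) = \left\{ \frac{\sum_{i<k} \left( d^{(q)}_{ik} - \hat{d}^{(q)}_{ik} \right)^2}{\sum_{i<k} \left[ d^{(q)}_{ik} \right]^2} \right\}^{1/2}$
  • is minimized, where the $\hat{d}^{(q)}_{ik}$ are reference
    numbers that are monotonically related to the
    observed similarities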
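
A minimal sketch of how a stress value might be computed for a given configuration. It assumes Kruskal's stress formula (the square root of the sum of squared differences between the distances and the reference numbers, divided by the sum of squared distances) and obtains the reference numbers by a least-squares monotone fit. It works with dissimilarities rather than similarities, so larger input values should correspond to larger distances. The names `monotone_fit` and `kruskal_stress` are hypothetical.

```python
import numpy as np

def monotone_fit(y):
    """Least-squares nondecreasing fit to y (pool-adjacent-violators)."""
    blocks = []                                   # each block is [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the nondecreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, m) for m, n in blocks])

def kruskal_stress(dissimilarities, distances):
    """Stress of configuration distances against the order of the dissimilarities."""
    delta = np.asarray(dissimilarities, dtype=float)
    d = np.asarray(distances, dtype=float)
    order = np.argsort(delta, kind="stable")      # pairs from closest to farthest
    d_hat = np.empty_like(d)
    d_hat[order] = monotone_fit(d[order])         # reference numbers
    return float(np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2)))

# Hypothetical data: four item pairs, with one pair out of order in the configuration.
delta = np.array([1.0, 2.0, 3.0, 4.0])
d_config = np.array([1.0, 3.0, 2.0, 4.0])
stress = kruskal_stress(delta, d_config)
```

A configuration whose distances follow the order of the dissimilarities exactly gives a stress of zero, whatever their magnitudes.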

9
Interpretation of stress levels
  • The stress is always between 0 and 1. As q
    increases, the stress will decrease, and it will
    be zero for q = N-1
  • Any stress value less than 0.1 is typically
    taken to mean that the representation is good.

10
Multidimensional scaling: software options
  • SAS: proc distance followed by proc mds
  • GGobi
11
Identifying outliers and anomalies
  • Simple filtering of raw data
  • Analysis of residuals derived from prediction
    models
  • Principal components score charts
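
As an illustration of the last approach, the following sketch computes principal component scores with NumPy and flags the observation that lies farthest from the origin of the score chart. The simulated data, the planted anomaly in row 7 and the name `pca_scores` are assumptions made for the example, not part of the presentation.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Scores of the rows of X on the first principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                             # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt are loadings
    return Xc @ Vt[:n_components].T

# Hypothetical data: 50 observations of 4 variables with one planted anomaly.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[7] += 15.0

scores = pca_scores(X)                                  # coordinates for a 2-D score chart
distance_from_origin = np.sqrt((scores ** 2).sum(axis=1))
suspect = int(np.argmax(distance_from_origin))          # index of the most extreme point
```

Plotting the two columns of `scores` against each other gives the score chart. In practice the flagged points would be inspected rather than removed automatically.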

12
Detection of anomalies in surface ozone
concentrations recorded at Ähtäri, Finland, at
1200
The ozone concentrations are correlated to
several meteorological variables
13
PLS-normalised and seasonally adjusted
concentrations of surface ozone at Ähtäri,
Finland (PLS = Partial Least Squares regression)
  • PLS-normalisation with respect to contemporaneous
    data regarding
  • temperature
  • humidity
  • wind direction
  • wind speed
  • measured at a network of stations
  • Separate PLS-normalisations for each month

14
Modelling ln(daily electricity consumption) as a
spline function of the population-weighted mean
temperature in Sweden: residual analysis
15
From multiple time series of data to a smooth
trend surface
16
Smoothing of the trend function in models of time
series data representing several sites along a
gradient
Spatial smoothing along a gradient
Temporal smoothing across years
17
Smoothing of the trend function in models of time
series data representing several seasons
Sequential smoothing across seasons
Temporal smoothing across years
18
Smoothing of the trend function in models of time
series data representing several sectors
Circular smoothing across sectors
Temporal smoothing across years
19
Remark
  • Different types of time series data may require
    different types of smoothing

20
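A simple model for simultaneous smoothing and
adjustment for a single covariate
  • Let $y_{ij}$ be the observed response for the jth
    coordinate in the ith year,
  • and let $x_{ij}$ denote a contemporaneous value of
    a covariate. Assume that
  • $y_{ij} = \alpha_{ij} + \beta_j x_{ij} + \varepsilon_{ij}$
  • where $y_{ij}$ is the response, $\alpha_{ij}$ the
    deterministic trend, $\beta_j x_{ij}$ the impact of the
    covariate, and $\varepsilon_{ij}$ the random error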
21
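A semiparametric model for simultaneous
smoothing and adjustment for several covariates
  • Let $y_{ij}$ be the observed response for the jth
    class in the ith year,
  • and let $x^{(1)}_{ij}, \dots, x^{(p)}_{ij}$ denote
    contemporaneous values of covariates.
  • Assume that
  • $y_{ij} = \alpha_{ij} + \sum_{k=1}^{p} \beta^{(k)}_j x^{(k)}_{ij} + \varepsilon_{ij}$
  • where $y_{ij}$ is the response, $\alpha_{ij}$ the
    deterministic trend, each $\beta^{(k)}_j x^{(k)}_{ij}$ the impact
    of one covariate, and $\varepsilon_{ij}$ the random error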
22
Gradient smoothing
  • .

Penalty of irregular interannual variation
Penalty of irregular variation along the gradient
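
One way to picture the two penalties is as a penalized least-squares fit of a trend surface over a years-by-sites grid. The sketch below is an assumption about the general form, not the criterion used in the presentation: it penalizes squared second differences across years and along the gradient, and it omits the covariate terms. The names `second_diff`, `smooth_surface` and `roughness`, the penalty weights and the simulated data are hypothetical.

```python
import numpy as np

def second_diff(n):
    """(n-2, n) matrix that takes second differences of a length-n vector."""
    D = np.zeros((n - 2, n))
    for r in range(n - 2):
        D[r, r:r + 3] = [1.0, -2.0, 1.0]
    return D

def smooth_surface(Y, lam_time, lam_space):
    """Minimize ||Y - A||^2 + lam_time * (roughness across years)
    + lam_space * (roughness along the gradient) over surfaces A.

    Rows of Y are years, columns are sites ordered along the gradient.
    """
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    Dt = np.kron(second_diff(n), np.eye(m))   # second differences across years
    Ds = np.kron(np.eye(n), second_diff(m))   # second differences along the gradient
    lhs = np.eye(n * m) + lam_time * Dt.T @ Dt + lam_space * Ds.T @ Ds
    return np.linalg.solve(lhs, Y.ravel()).reshape(n, m)

def roughness(A):
    """Sum of squared second differences in both directions."""
    return float((np.diff(A, n=2, axis=0) ** 2).sum()
                 + (np.diff(A, n=2, axis=1) ** 2).sum())

# Hypothetical data: a planar trend over 6 years and 5 sites, plus noise.
rng = np.random.default_rng(1)
plane = 2.0 * np.arange(6)[:, None] + 3.0 * np.arange(5)[None, :]
noisy = plane + rng.normal(scale=1.0, size=plane.shape)
smoothed = smooth_surface(noisy, lam_time=5.0, lam_space=5.0)
```

Larger penalty weights pull the fitted surface toward a plane, while weights of zero reproduce the data. For seasonal or circular data the difference matrix along the second axis would need a different neighbourhood structure.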
23
Smoothing of the trend function in models of time
series data representing several sites along a
gradient
Spatial smoothing along a gradient
Temporal smoothing across years
24
Smoothing of the trend function in models of time
series data representing several seasons
Sequential smoothing across seasons
Temporal smoothing across years
25
Smoothing of the trend function in models of time
series data representing several sectors
Circular smoothing across sectors
Temporal smoothing across years
26
Satellite image of blue-green algae
(cyanobacteria) in the Baltic Sea, summer 2005
[Image labels: Finland, Sweden, Estonia, Latvia, Baltic Sea, algae bloom]
27
Sampling sites for water quality in the Stockholm
archipelago
[Map labels: Baltic Sea, Stockholm]
28
Secchi depth (water clarity) in the inner
Stockholm archipelago
Investments in improved nitrogen removal
29
Secchi depth (water clarity) at three stations in
the inner Stockholm archipelago
Water clarity varies strongly within sites
30
Secchi depth (water clarity) at three stations in
the inner Stockholm archipelago
Water clarity varies with water temperature
31
Trend surface for salinity- and temperature-
normalized Secchi depth data along a salinity
gradient in the inner Stockholm archipelago