1
Examining Activity Patterns Using Fuzzy Clustering

by
D De Silva, University of Calgary
JD Hunt, University of Calgary
PROCESSUS Second International Colloquium
Toronto ON, Canada
June 2005

2
Overview

Introduction
Data
Method
Preliminary Results
Conclusions

3
Introduction

Context
Activity-based transport models increasing
Need for grouping into segments
At present seems largely based on received wisdom
Motivations
Opportunity in Calgary
Large Household Activity Diary Survey
Interest in Activity-based model development
Willingness to explore issue of grouping
Increase understanding of activity patterns
resulting from behavioral processes

4
Introduction

Previous work
Fair amount of work drawing in essence on three
basic elements
Data interpretation
Similarity or Dissimilarity Measures
Pattern Recognition Algorithms

5
Introduction

Previous work (Contd.)
Data Interpretation
Some used Time Slices in 5 to 15 minute intervals
(Recker et al.; Wilson)
Others disagreed and used the number of stops
made (Pas)
Similarity or Dissimilarity Measures
Similarity Matrix (Pas; Wilson; Ma)
Sequential Alignment Method (Wilson; Jun Ma)
Walsh-Hadamard transformation, a Fourier-type
analysis (Recker et al.)
Pattern Recognition Algorithms
All have used Crisp Clustering Methods

6
Introduction

Previous work (Contd.)
Groups with similar activities
Pas: 12 groups based on the number of non-home
stops
Recker: 7 groups based on socio-economic data
Wilson: 8 groups, similar to Recker
Applications
To model inter-shopping duration (Bhat)
Micro-simulation of activity patterns
(Kitamura et al.; Kulkarni et al.)
Extension: the work described here
Time Slices
Sequential Alignment Method
Fuzzy Clustering

7
Data: Household Activity Survey (HAS)

24-hour diary
Fall of 2001
Sample size
8,400 households overall
5,900 on weekdays
15-minute intervals
activity
location
Activities in 19 categories
Locations
X,Y
Home, Work, Travel, Other
All household members

8
Activities Covered in HAS

Travel (A)
Pick Up Someone (B)
Drop Off Someone (C)
Work (D)
School / Homework (E)
Shopping (F)
Daycare (G)
Social (H)
Eating (J)

Entertainment / Leisure (K)
Medical / Financial (L)
Exercise (M)
Religious / Civic (N)
Sleeping (O)
Household Chores (P)
Park / Un-park Vehicle (X)
Work-Travel (e.g. Taxi Driver) (Y)
Out-of-Town (Z)

9
Example Sequence

Activity Sequence of
30 min Sleep
15 min Eat
30 min Travel
1 hr Work
O O J A A D D D D
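
The encoding of a diary into one letter per 15-minute slot can be sketched in Python. This is only an illustration: the function name `encode_diary` and the (activity, minutes) input layout are assumptions, not the survey's actual file format, and only four of the activity codes are included.

```python
SLOT_MINUTES = 15  # diary resolution used in the survey

# Subset of the HAS activity codes (letters as listed on the activities slide)
ACTIVITY_CODES = {"Travel": "A", "Work": "D", "Eating": "J", "Sleeping": "O"}


def encode_diary(episodes):
    """Turn (activity, duration in minutes) episodes into one letter per slot."""
    letters = []
    for activity, minutes in episodes:
        if minutes % SLOT_MINUTES != 0:
            raise ValueError(
                f"{activity}: {minutes} min is not a multiple of {SLOT_MINUTES}"
            )
        # Repeat the activity letter once for every 15-minute slot it covers
        letters.extend(ACTIVITY_CODES[activity] * (minutes // SLOT_MINUTES))
    return " ".join(letters)


example = [("Sleeping", 30), ("Eating", 15), ("Travel", 30), ("Work", 60)]
print(encode_diary(example))  # O O J A A D D D D
```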

10
Initial Sample for Testing

Covered in this presentation
75 persons
50 households
Just activity type and weekdays (not location or
weekends)
Later consider
Full sample
Weekends and weekdays
Location types as a further dimension

11
Method

12
Sequential Alignment Method (SAM)

Alignment Methods first used in field of
Molecular Biology for DNA matching
Activity Travel Patterns Intrinsically Sequential
SAM Evaluation of Sequence of Characters
Global Alignment (Whole Sequence)
Local Alignment (Short sequence within entire
sequence)
Simplest case is Pairwise alignment

13
Sequential Alignment Method

Pairwise Alignment
Two Character Sequences
ID 1 O O J A A D D D D
ID 2 O O O J A D D D O
Elementary Operations until equal
Insertions and Deletions (Indel)
Gaps
Gap insertion and extension Penalties
Global Alignment: Needleman-Wunsch algorithm,
minimizing the distance or maximizing the
similarity
ID 1 - O O J A A D D D D
ID 2 O O O J A - D D D O
Similarity Score 70
Fewer operations → Similar Pair
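
A minimal sketch of Needleman-Wunsch global alignment in Python follows. It assumes a single linear gap penalty and illustrative scores (match 1, mismatch -1, gap -1); these are not the gap opening and extension values discussed in this presentation, and the scoring scheme of the software actually used may differ.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, row1, row2 = needleman_wunsch("OOJAADDDD", "OOOJADDDO")
print(score)
print(row1)
print(row2)
```

With a linear gap penalty every gap position costs the same; separate opening and extension penalties would need the affine-gap variant of the recurrence.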

14
Sequential Alignment Method

Gap Opening and Extension Penalties
Role of gap penalty
High Value
Alignment compressed
Limited to literal matches, avoiding gapping
Resemble main activities at their relative times
Recommended values 8 and 3 (Wilson)
Low Value
Identification of similar activities displaced
during the day
Better pairwise comparison
Little similarity to the actual activity Pattern
Recommended values 1 and 0.1 (Wilson)
Tested and accepted recommendation of Low Value
for Transportation Research (Wilson)

15
Sequential Alignment Method

Multiple Alignment
Extension of pairwise alignment to N dimensions
Computational requirements become enormous beyond
10 sequences of reasonable length
Approximation method based on data of pairwise
alignment
Use of ClustalG software by Wilson

16
Sequential Alignment Method

Output is a Dissimilarity Matrix
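
How pairwise comparisons fill a symmetric dissimilarity matrix can be sketched as follows. The distance used here (share of positions that differ between two equal-length sequences) is only a placeholder standing in for the alignment-based measure, and both function names are hypothetical.

```python
from itertools import combinations


def mismatch_share(a, b):
    """Placeholder distance: share of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(sequences, distance):
    """Symmetric n-by-n matrix of pairwise distances with a zero diagonal."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = distance(sequences[i], sequences[j])
    return matrix


patterns = ["OOJAADDDD", "OOOJADDDO", "OOJAADDDD"]
for row in dissimilarity_matrix(patterns, mismatch_share):
    print(row)
```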

17
Fuzzy Clustering

Partition Clustering Method
Number of clusters k specified up front
The Objects (Activity Patterns) are not assigned
to a particular cluster but assigned a membership
ranging between 0 and 1 for all clusters
Uses S-plus Software (Kaufman Procedure)
Dissimilarity matrix is input

18
Fuzzy Clustering

Minimize Objective Function (Kaufman)
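
For reference, the objective minimized by the fuzzy analysis procedure of Kaufman and Rousseeuw can be written as below. This is the standard textbook form, given on the assumption that it matches the S-plus procedure used here; u_{iv} is the membership of object i in cluster v, d(i,j) is the dissimilarity between objects i and j, n is the number of objects and k the number of clusters.

```latex
\min_{u}\; C \;=\; \sum_{v=1}^{k}
  \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}
       {2\sum_{j=1}^{n} u_{jv}^{2}}
\qquad \text{subject to} \quad
u_{iv} \ge 0, \quad \sum_{v=1}^{k} u_{iv} = 1 \ \text{for every object } i .
```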

19
Fuzzy Clustering

Number of clusters?
An open question, to be determined as part of
the research
Two quality indices from S-Plus
Dunn's Coefficient
Average Silhouette Value with Shadow plot

20
Fuzzy Clustering

Dunn's Coefficient
Where Fk always lies in the range [1/k, 1]
Fk = 1/k: entirely fuzzy clustering
Fk = 1: crisp clustering
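
Dunn's partition coefficient is the average, over all objects, of the sum of squared memberships. A short Python sketch with two made-up membership matrices shows the two ends of its range; the function name is an assumption.

```python
def dunn_partition_coefficient(memberships):
    """F_k = (1/n) * sum over objects i and clusters v of u_iv squared."""
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


# Made-up examples with k = 3 clusters
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # each object in one cluster
fuzzy = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]  # equal memberships

print(dunn_partition_coefficient(crisp))  # upper bound, 1
print(dunn_partition_coefficient(fuzzy))  # lower bound, 1/k
```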

21
Fuzzy Clustering

Average Silhouette Value (ASV) with Shadow plot
Strength of Classification to the nearest crisp
cluster compared to the next best cluster

Width of Bar
1 Well Classified
0 Between two clusters
< 0 Badly classified
(lies near the next best cluster)
Average Value gives an approximation to the best
number of clusters
ASV must be higher than 0.25
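
The silhouette width of an object compares its average dissimilarity to its own (nearest crisp) cluster, a, with its average dissimilarity to the next best cluster, b, as s = (b - a) / max(a, b). The sketch below computes it from a dissimilarity matrix and crisp labels; the function names and the small matrix are illustrative assumptions, and the nearest crisp cluster is taken as the one with the highest membership.

```python
def nearest_crisp(memberships):
    """Assign each object to the cluster in which its membership is highest."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def silhouette_widths(dissim, labels):
    """s(i) = (b - a) / max(a, b) for every object i."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    n = len(labels)
    widths = []
    for i in range(n):
        # Mean dissimilarity from i to each cluster, leaving out i itself
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dissim[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:  # i is alone in its cluster: width set to 0
            widths.append(0.0)
            continue
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Made-up dissimilarities for four objects forming two tight pairs
D = [[0, 1, 4, 5], [1, 0, 4, 5], [4, 4, 0, 1], [5, 5, 1, 0]]
labels = nearest_crisp([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
widths = silhouette_widths(D, labels)
print(sum(widths) / len(widths))  # average silhouette value
```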

22
Cluster Center Interpretation

Distributions of socio-economic variables
Basis for grouping in subsequent modeling
Person characteristics
Age
Gender
Person type category from survey
Employment Status
Household characteristics attributed to persons
Only income so far
Household structure later
Fuzzy weighted frequency distributions
Need for eventual crisp assignment
Potentially use logit to assign cluster
membership values
Calibrate utility functions for clusters with
person characteristics
Use Monte Carlo to select specific cluster in
each case
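
The proposed route to a crisp assignment (logit probabilities followed by a Monte Carlo draw) might look like the sketch below. The utility values are hypothetical placeholders; in practice they would come from utility functions calibrated on person characteristics, which are not specified here.

```python
import math
import random


def logit_probabilities(utilities):
    """Multinomial logit: P(c) = exp(V_c) / sum of exp(V) over all clusters."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def draw_cluster(probabilities, rng):
    """Monte Carlo draw of one cluster index according to the probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against rounding in the running sum


# Hypothetical utilities of three clusters for one person
probs = logit_probabilities([0.2, 1.1, -0.4])
rng = random.Random(2005)  # fixed seed so the draw is repeatable
print(probs)
print(draw_cluster(probs, rng))
```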

23
Cluster Center Interpretation

Fuzzy Weighted Frequency Distributions
Each bar in a cluster's histogram is the
percentage for that category, summing the people
in that category across the entire sample, each
weighted by their membership in that cluster
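
A fuzzy weighted frequency distribution can be sketched as follows: every person contributes to the bar of their category with a weight equal to their membership in the cluster. Normalizing by the total membership in the cluster, so that the bars sum to 100, is an assumption about how the percentages were formed; the data are made up.

```python
from collections import defaultdict


def fuzzy_weighted_distribution(categories, memberships, cluster):
    """Percentage per category for one cluster, weighting people by membership."""
    weights = defaultdict(float)
    for category, row in zip(categories, memberships):
        weights[category] += row[cluster]
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no membership in this cluster")
    return {category: 100.0 * w / total for category, w in weights.items()}


# Made-up example: gender of three persons and their memberships in two clusters
gender = ["M", "F", "F"]
u = [[0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]
print(fuzzy_weighted_distribution(gender, u, cluster=0))
```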

24
Results

Sequential Alignment
Low Vs High Gap Penalty Results
Cluster plot for 3 clusters
Low Gap High Gap

25
Results

Shadow Plot
Low Gap High Gap

Coefficient                Low Gap   High Gap
Dunn's Coefficient         0.4       0.33
Average Silhouette Value   0.4       0.3

Use low gap penalty, consistent with
recommendation (1 and 0.1)

26
Results

Number of Clusters
Clustal plot helps to see the potential range for
the number of clusters

27
Results

Number of Clusters
Potential range 2 to 5

28
Results

Number of Clusters (k)
k = 2
Fk = 0.60, ASV = 0.42

29
Results

Number of Clusters (k)
k = 3
Fk = 0.43, ASV = 0.40

30
Results

Number of Clusters (k)
k = 4
Fk = 0.34, ASV = 0.32

31
Results

Number of Clusters (k)
k = 5
Fk = 0.28, ASV = 0.20

32
Results

Number of Clusters (k) ?
Use 3 clusters for testing
Expect different for total sample

       2 Clusters   3 Clusters   4 Clusters   5 Clusters
Fk     0.60         0.43         0.34         0.28
ASV    0.42         0.40         0.32         0.20

33
Fuzzy Cluster Memberships

Output of S-plus software
HH2701 has almost equal memberships in all three
clusters

34
Results

Fuzzy weighted frequency Distribution

35
Results

Cluster Interpretation

Crisp presentation

36
Results

Cluster Interpretation - tends to be more
Cluster 1
Students aged 5 to 15
Mainly KEJS and youths
Cluster 2
Females
Seniors and other adults in age range 66-70
Retired, homemakers and volunteers
Cluster 3
Males
100% adult workers
Age 40s
Majority adult workers not needing a car for work
Expect different for total sample

37
Conclusions

Methods seem to work well in identifying the
clusters as intended, with no hurdles
Fuzzy clustering better indicates strength of
membership
Best to have multiple measures of clustering
quality when choosing the number of clusters
Still work in progress
Results not complete, shown just as an example
But essential elements of analysis process set

38
Conclusions