Title: Examining Activity Patterns Using Fuzzy Clustering
1Examining Activity Patterns Using Fuzzy Clustering
- by
- D De Silva, University of Calgary
- JD Hunt, University of Calgary
- PROCESSUS Second International Colloquium
- Toronto ON, Canada
- June 2005
2Overview
- Introduction
- Data
- Method
- Preliminary Results
- Conclusions
3Introduction
- Context
- Use of activity-based transport models is increasing
- Need for grouping people into segments
- At present, grouping seems largely based on received wisdom
- Motivations
- Opportunity in Calgary
- Large Household Activity Diary Survey
- Interest in Activity-based model development
- Willingness to explore issue of grouping
- Increase understanding of activity patterns resulting from behavioral processes
4Introduction
- Previous work
- Fair amount of work, drawing in essence on three basic elements
- Data interpretation
- Similarity or dissimilarity measures
- Pattern recognition algorithms
5Introduction
- Previous work (Contd.)
- Data Interpretation
- Some used time slices of 5 to 15 minutes (Recker et al.; Wilson)
- Others disagreed and used the number of stops made (Pas)
- Similarity or Dissimilarity Measures
- Similarity matrix (Pas; Wilson; Ma)
- Sequential Alignment Method (Wilson; Jun Ma)
- Walsh-Hadamard transformation, a Fourier-type analysis (Recker et al.)
- Pattern Recognition Algorithms
- All have used Crisp Clustering Methods
6Introduction
- Previous work (Contd.)
- Groups with similar activities
- Pas: 12 groups based on the number of non-home stops
- Recker: 7 groups based on socio-economic data
- Wilson: 8 groups, similar to Recker
- Applications
- To model inter-shopping duration (Bhat)
- Microsimulation of activity patterns (Kitamura et al.; Kulkarni et al.)
- Extension in the work described here
- Time Slices
- Sequential Alignment Method
- Fuzzy Clustering
7Data: Household Activity Survey (HAS)
- 24-hour diary
- Fall of 2001
- Sample size
- 8,400 households overall
- 5,900 on weekdays
- 15-minute intervals
- activity
- location
- Activities in 19 categories
- Locations
- X,Y
- Home, Work, Travel, Other
- All household members
8Activities Covered in HAS
- Travel (A)
- Pick Up Someone (B)
- Drop Off Someone (C)
- Work (D)
- School / Homework (E)
- Shopping (F)
- Daycare (G)
- Social (H)
- Eating (J)
- Entertainment / Leisure (K)
- Medical / Financial (L)
- Exercise (M)
- Religious / Civic (N)
- Sleeping (O)
- Household Chores (P)
- Park / Un-park Vehicle (X)
- Work-Travel (e.g. Taxi Driver) (Y)
- Out-of-Town (Z)
9Example Sequence
- Activity Sequence of
- 30 min Sleep
- 15 min Eat
- 30 min Travel
- 1 hr Work
- O O J A A D D D D
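The coding on this slide can be sketched in a few lines of Python. This is only an illustration: the helper name `encode_sequence` and the episode list are hypothetical, and the 15-minute slice length and activity letters are taken from the survey description above.

```python
# Minimal sketch (hypothetical helper, not the authors' code): expand
# (activity letter, duration in minutes) episodes into one letter per
# 15-minute interval, as on the Example Sequence slide.
SLICE_MINUTES = 15


def encode_sequence(episodes, slice_minutes=SLICE_MINUTES):
    """Return a space-separated string with one activity letter per slice."""
    letters = []
    for code, minutes in episodes:
        if minutes % slice_minutes != 0:
            raise ValueError("duration must be a multiple of the slice length")
        # code * n repeats the letter; extend() adds each letter separately
        letters.extend(code * (minutes // slice_minutes))
    return " ".join(letters)


# 30 min Sleep (O), 15 min Eat (J), 30 min Travel (A), 1 hr Work (D)
example = [("O", 30), ("J", 15), ("A", 30), ("D", 60)]
encoded = encode_sequence(example)
print(encoded)  # O O J A A D D D D
```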
10Initial Sample for Testing
- Covered in this presentation
- 75 persons
- 50 households
- Just activity type and weekdays (not location or weekends)
- Later consider
- Full sample
- Weekends and weekdays
- Location types as a further dimension
11Method
12Sequential Alignment Method (SAM)
- Alignment methods first used in molecular biology for DNA matching
- Activity-travel patterns are intrinsically sequential
- SAM: evaluation of sequences of characters
- Global alignment (whole sequence)
- Local alignment (short sequence within the entire sequence)
- Simplest case is pairwise alignment
13Sequential Alignment Method
- Pairwise Alignment
- Two Character Sequences
- ID 1 O O J A A D D D D
- ID 2 O O O J A D D D O
- Elementary Operations until equal
- Insertions and Deletions (Indel)
- Gaps
- Gap insertion and extension Penalties
- Global alignment: Needleman-Wunsch algorithm, minimizing the distance or maximizing the similarity
- ID 1 - O O J A A D D D D -
- ID 2 O O O J A - D D D O
- Similarity score 70
- Fewer operations imply a more similar pair
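A compact sketch of global pairwise alignment may help make the procedure concrete. It implements the textbook Needleman-Wunsch recursion with a single linear gap penalty, which is a simplification: the slides use separate gap opening and extension penalties. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are illustrative assumptions and will not reproduce the similarity score quoted on the slide.

```python
# Minimal sketch of Needleman-Wunsch global alignment with a linear gap
# penalty. Scores are illustrative assumptions, not the values used in
# the study or in ClustalG.
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or substitution
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# The two example patterns from the slide, one letter per 15-minute slice
total, aligned_1, aligned_2 = needleman_wunsch("OOJAADDDD", "OOOJADDDO")
print(total)
print(aligned_1)
print(aligned_2)
```

Running the function over every pair of persons and converting each score to a distance would give a dissimilarity matrix of the kind described later in the deck.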
14Sequential Alignment Method
- Gap Opening and Extension Penalties
- Role of gap penalty
- High Value
- Alignment compressed
- Alignment limited to literal matches, avoiding gaps
- Resembles main activities at their relative times
- Recommended values 8 and 3 (Wilson)
- Low Value
- Identification of similar activities displaced during the day
- Better pairwise comparison
- Little similarity to the actual activity pattern
- Recommended values 1 and 0.1 (Wilson)
- Tested and accepted recommendation of low value for transportation research (Wilson)
15Sequential Alignment Method
- Multiple Alignment
- Extension of pairwise alignment to N dimensions
- Computational cost becomes enormous beyond 10 sequences of reasonable length
- Approximation method based on pairwise alignment data
- Use of ClustalG software by Wilson
16Sequential Alignment Method
- Output is a Dissimilarity Matrix
17Fuzzy Clustering
- Partition Clustering Method
- Number of clusters k specified up front
- Objects (activity patterns) are not assigned to one particular cluster; each receives a membership between 0 and 1 in every cluster
- Uses S-Plus software (Kaufman procedure)
- Dissimilarity matrix is input
18Fuzzy Clustering
- Minimize Objective Function (Kaufman)
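The objective function itself did not survive on this slide. Assuming the "Kaufman procedure" refers to the FANNY algorithm of Kaufman and Rousseeuw, which is what the S-Plus `fanny` function implements, the quantity minimized is usually written as below. The symbols are notation assumptions: u_{iv} is the membership of object i in cluster v, d(i,j) is the dissimilarity between objects i and j, n is the number of objects and k the number of clusters.

```latex
\min_{u} \; C = \sum_{v=1}^{k}
  \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}
       {2 \sum_{j=1}^{n} u_{jv}^{2}}
\qquad \text{subject to} \quad
u_{iv} \ge 0, \quad \sum_{v=1}^{k} u_{iv} = 1 \;\; \text{for every } i .
```

Only the dissimilarities d(i,j) enter the objective, which is why the dissimilarity matrix from the alignment step can be used directly as input.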
19Fuzzy Clustering
- Number of clusters?
- An open question, to be determined as part of the research
- Two quality indices from S-Plus
- Dunn's Coefficient
- Average Silhouette Value with Shadow plot
20Fuzzy Clustering
- Dunn's Coefficient
- Where Fk always lies in the range [1/k, 1]
- Fk = 1/k: entirely fuzzy clustering
- Fk = 1: crisp clustering
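As a hedged illustration, Dunn's partition coefficient is commonly defined as the average over objects of the sum of squared memberships, Fk = (1/n) * sum over i and v of u_iv^2. The sketch below assumes that definition and a membership matrix with one row per object and one column per cluster; the function name and the toy matrices are hypothetical.

```python
# Minimal sketch, assuming the usual definition of Dunn's partition
# coefficient: Fk = (1/n) * sum_i sum_v u_iv ** 2.
def dunn_partition_coefficient(memberships):
    """memberships: list of rows, each row sums to 1 across the k clusters."""
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


# Crisp case: every object belongs fully to one cluster, so Fk = 1
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
# Entirely fuzzy case: equal membership in all k = 3 clusters, so Fk = 1/k
fuzzy = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]

fk_crisp = dunn_partition_coefficient(crisp)
fk_fuzzy = dunn_partition_coefficient(fuzzy)
print(fk_crisp)            # 1.0
print(round(fk_fuzzy, 4))  # 0.3333
```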
21Fuzzy Clustering
- Average Silhouette Value (ASV) with Shadow plot
- Strength of classification to the nearest crisp cluster compared to the next-best cluster
- Width of bar
- Near 1: well classified
- Near 0: between two clusters
- Below 0: badly classified (lies nearer the next-best cluster)
- Average value gives an approximation to the best number of clusters
- ASV must be higher than 0.25
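The silhouette width can be computed directly from a dissimilarity matrix once each object is assigned to its nearest crisp cluster. The sketch below uses the standard definition s(i) = (b - a) / max(a, b), where a is the mean dissimilarity to the object's own cluster and b the mean dissimilarity to the next-best cluster. The function name, the toy matrix and the singleton convention are assumptions for illustration, and at least two clusters are required.

```python
# Minimal sketch of silhouette widths from a dissimilarity matrix and
# crisp labels (e.g. the cluster with the highest fuzzy membership).
def silhouette_widths(dissim, labels):
    n = len(labels)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the own cluster
        a = sum(dissim[i][j] for j in own) / len(own)
        # b: smallest mean dissimilarity to any other cluster
        b = min(
            sum(dissim[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy data: two tight pairs that are far from each other
toy = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
good = silhouette_widths(toy, [0, 0, 1, 1])
bad = silhouette_widths(toy, [0, 1, 0, 1])
asv_good = sum(good) / len(good)
asv_bad = sum(bad) / len(bad)
print(round(asv_good, 3), asv_good > 0.25)
print(round(asv_bad, 3), asv_bad > 0.25)
```

With the sensible labelling every width is close to 1, while the deliberately wrong labelling gives negative widths, matching the "badly classified" case on the slide.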
22Cluster Center Interpretation
- Distributions of socio-economic variables
- Basis for grouping in subsequent modeling
- Person characteristics
- Age
- Gender
- Person type category from survey
- Employment Status
- Household characteristics attributed to persons
- Only income so far
- Household structure later
- Fuzzy weighted frequency distributions
- Need for eventual crisp assignment
- Potentially use logit to assign cluster membership values
- Calibrate utility functions for clusters with person characteristics
- Use Monte Carlo to select specific cluster in each case
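The proposed logit and Monte Carlo step could look roughly like the sketch below. It is purely illustrative: the utility values are made-up placeholders, not calibrated results, and the function names are hypothetical. A multinomial logit converts one utility per cluster into probabilities, and a random draw then picks a single crisp cluster.

```python
import math
import random


# Minimal sketch: multinomial logit probabilities from cluster utilities.
def logit_probabilities(utilities):
    largest = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Monte Carlo selection: pick one cluster index according to the probabilities.
def draw_cluster(probabilities, rng):
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against rounding at the upper end


# Hypothetical utilities of three clusters for one person (placeholder values)
utilities = [0.2, -0.5, 1.0]
probs = logit_probabilities(utilities)
rng = random.Random(2005)  # fixed seed so the sketch is repeatable
chosen = draw_cluster(probs, rng)
print([round(p, 3) for p in probs], chosen)
```

In a calibrated model each utility would be a function of person characteristics such as age, gender and employment status.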
23Cluster Center Interpretation
- Fuzzy Weighted Frequency Distributions
- Each bar (category) in a cluster's histogram is the percentage of people in that category across the entire sample, weighted by their membership in that cluster
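A small sketch of this calculation follows. It assumes that each person contributes their membership value as a weight and that the bars of one cluster are normalized to sum to 100 percent; the slide does not state the normalization, so that part is an assumption. The function name and the toy data are hypothetical.

```python
# Minimal sketch of a fuzzy weighted frequency distribution for one cluster.
# Assumption: bars are normalized by the total membership weight in the
# cluster, so they sum to 100 percent.
def fuzzy_weighted_distribution(categories, memberships, cluster):
    totals = {}
    for category, row in zip(categories, memberships):
        # each person adds their membership in this cluster to their category
        totals[category] = totals.get(category, 0.0) + row[cluster]
    weight_sum = sum(totals.values())
    if weight_sum == 0:
        return {category: 0.0 for category in totals}
    return {category: 100.0 * w / weight_sum for category, w in totals.items()}


# Toy data: gender of three persons and their memberships in two clusters
gender = ["F", "M", "F"]
u = [
    [0.8, 0.2],
    [0.1, 0.9],
    [0.6, 0.4],
]
dist_0 = fuzzy_weighted_distribution(gender, u, cluster=0)
print({k: round(v, 1) for k, v in dist_0.items()})  # {'F': 93.3, 'M': 6.7}
```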
24Results
- Sequential Alignment
- Low Vs High Gap Penalty Results
- Cluster plot for 3 clusters
- Low Gap High Gap
25Results
- Shadow Plot
- Low Gap High Gap
Coefficient                Low Gap   High Gap
Dunn's Coefficient         0.4       0.33
Average Silhouette Value   0.4       0.3
- Use low gap penalty, consistent with recommendation (1 and 0.1)
26Results
- Number of Clusters
- Clustal plot helps to see the potential range for the number of clusters
27Results
- Number of Clusters
- Potential range 2 to 5
28Results
- Number of Clusters (k)
- k = 2
- Fk = 0.60, ASV = 0.42
29Results
- Number of Clusters (k)
- k = 3
- Fk = 0.43, ASV = 0.40
30Results
- Number of Clusters (k)
- k = 4
- Fk = 0.34, ASV = 0.32
31Results
- Number of Clusters (k)
- k = 5
- Fk = 0.28, ASV = 0.20
32Results
- Number of Clusters (k) ?
- Use 3 clusters for testing
- Expect a different result for the total sample
Index   2 Clusters   3 Clusters   4 Clusters   5 Clusters
Fk      0.60         0.43         0.34         0.28
ASV     0.42         0.40         0.32         0.20
33Fuzzy Cluster Memberships
- Output of S-Plus software
- HH2701 has almost equal memberships in all three clusters
34Results
- Fuzzy weighted frequency Distribution
35Results
- Crisp presentation
36Results
- Cluster interpretation: each cluster tends to contain more
- Cluster 1
- Students aged 5 to 15
- Mainly KEJS and youths
- Cluster 2
- Females
- Seniors and other adults in age range 66-70
- Retired home makers and volunteers
- Cluster 3
- Males
- 100% adult workers
- Aged in their 40s
- Majority are adult workers not needing a car to work
- Expect a different result for the total sample
37Conclusions
- Method seems to work well in identifying clusters as intended, with no hurdles
- Fuzzy clustering better indicates strength of membership
- Best to have multiple quality measures when choosing the number of clusters
- Still a work in progress
- Results not complete, shown just as an example
- But essential elements of the analysis process are set
38Conclusions
- Future Work
- Proceeding to full sample of 8,400 households, including weekends
- Expanding to location dimension
- Calibrate Logit model for allocation of clusters
- Consider Household Structure
39Thank You