1
Examining Activity Patterns Using Fuzzy Clustering
  • by
  • D De Silva, University of Calgary
  • JD Hunt, University of Calgary
  • PROCESSUS Second International Colloquium
  • Toronto ON, Canada
  • June 2005

2
Overview
  • Introduction
  • Data
  • Method
  • Preliminary Results
  • Conclusions

3
Introduction
  • Context
  • Use of activity-based transport models is increasing
  • Need for grouping into segments
  • At present, grouping seems largely based on received wisdom
  • Motivations
  • Opportunity in Calgary
  • Large Household Activity Diary Survey
  • Interest in Activity-based model development
  • Willingness to explore issue of grouping
  • Increase understanding of activity patterns
    resulting from behavioral processes

4
Introduction
  • Previous work
  • Fair amount of work drawing in essence on three
    basic elements
  • Data interpretation
  • Similarity or Dissimilarity Measures
  • Pattern Recognition Algorithms

5
Introduction
  • Previous work (Contd.)
  • Data Interpretation
  • Some used time slices of 5 to 15 minutes
    (Recker et al.; Wilson)
  • Others disagreed and used the number of stops
    made (Pas)
  • Similarity or Dissimilarity Measures
  • Similarity Matrix (Pas; Wilson; Ma)
  • Sequential Alignment Method (Wilson; Jun Ma)
  • Walsh-Hadamard transformation, a Fourier-type
    analysis (Recker et al.)
  • Pattern Recognition Algorithms
  • All have used Crisp Clustering Methods

6
Introduction
  • Previous work (Contd.)
  • Groups with similar activities
  • Pas: 12 groups based on the number of non-home
    stops
  • Recker: 7 groups based on socio-economic data
  • Wilson: 8 groups, similar to Recker
  • Applications
  • Modeling inter-shopping duration (Bhat)
  • Microsimulation of activity patterns
    (Kitamura et al.; Kulkarni et al.)
  • Extension in the work described here
  • Time Slices
  • Sequential Alignment Method
  • Fuzzy Clustering

7
Data: Household Activity Survey (HAS)
  • 24-hour diary
  • Fall of 2001
  • Sample size
  • 8,400 households overall
  • 5,900 on weekdays
  • 15-minute intervals
  • activity
  • location
  • Activities in 19 categories
  • Locations
  • X,Y
  • Home, Work, Travel, Other
  • All household members

8
Activities Covered in HAS
  • Travel (A)
  • Pick Up Someone (B)
  • Drop Off Someone (C)
  • Work (D)
  • School / Homework (E)
  • Shopping (F)
  • Daycare (G)
  • Social (H)
  • Eating (J)
  • Entertainment / Leisure (K)
  • Medical / Financial (L)
  • Exercise (M)
  • Religious / Civic (N)
  • Sleeping (O)
  • Household Chores (P)
  • Park / Un-park Vehicle (X)
  • Work-Travel (e.g. Taxi Driver) (Y)
  • Out-of-Town (Z)

9
Example Sequence
  • Activity Sequence of
  • 30 min Sleep
  • 15 min Eat
  • 30 min Travel
  • 1 hr Work
  • O O J A A D D D D
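
The encoding on this slide can be sketched in a few lines. This is a minimal illustration, assuming one character per 15-minute slice and using only the four activity codes that appear in the example; the function and variable names are hypothetical, not taken from the survey processing itself.

```python
# Hypothetical encoder: one character per 15-minute diary slice.
SLICE_MINUTES = 15

# Subset of the HAS activity codes listed in the presentation.
ACTIVITY_CODES = {"Sleep": "O", "Eat": "J", "Travel": "A", "Work": "D"}


def encode_diary(episodes, slice_minutes=SLICE_MINUTES):
    """Turn (activity, duration_in_minutes) episodes into a code string."""
    symbols = []
    for activity, minutes in episodes:
        if minutes % slice_minutes != 0:
            raise ValueError(f"{minutes} min is not a whole number of slices")
        # Repeat the activity code once for every slice the episode covers.
        symbols.extend(ACTIVITY_CODES[activity] * (minutes // slice_minutes))
    return "".join(symbols)


diary = [("Sleep", 30), ("Eat", 15), ("Travel", 30), ("Work", 60)]
print(encode_diary(diary))  # OOJAADDDD
```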

10
Initial Sample for Testing
  • Covered in this presentation
  • 75 persons
  • 50 households
  • Activity type and weekdays only (not location or
    weekends)
  • Later consider
  • Full sample
  • Weekends and weekdays
  • Location types as a further dimension

11
Method

12
Sequential Alignment Method (SAM)
  • Alignment Methods first used in field of
    Molecular Biology for DNA matching
  • Activity Travel Patterns Intrinsically Sequential
  • SAM: evaluation of sequences of characters
  • Global Alignment (Whole Sequence)
  • Local Alignment (Short sequence within entire
    sequence)
  • Simplest case is Pairwise alignment

13
Sequential Alignment Method
  • Pairwise Alignment
  • Two Character Sequences
  • ID 1 O O J A A D D D D
  • ID 2 O O O J A D D D O
  • Elementary Operations until equal
  • Insertions and Deletions (Indel)
  • Gaps
  • Gap insertion and extension Penalties
  • Global Alignment: Needleman-Wunsch algorithm,
    minimizing the distance or maximizing the
    similarity
  • ID 1 - O O J A A D D D D -
  • ID 2 O O O J A - D D D O
  • Similarity Score 70
  • Fewer operations means a more similar pair
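
Global pairwise alignment of the two example sequences might be sketched as follows. This is a textbook Needleman-Wunsch with a single linear gap penalty, not the ClustalG procedure used in the study; the match, mismatch and gap scores are illustrative assumptions, so the score it returns is not comparable to the similarity score of 70 quoted on the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two strings with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    The default scores are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # move[i][j] records how that best score was reached:
    # "D" diagonal (pair two characters), "U" gap in b, "L" gap in a.
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "U"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + sub, "D"),
                (score[i - 1][j] + gap, "U"),
                (score[i][j - 1] + gap, "L"),
            ]
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif step == "U":
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, row1, row2 = needleman_wunsch("OOJAADDDD", "OOOJADDDO")
print(total)
print(row1)
print(row2)
```

Storing the chosen move for each cell, rather than re-deriving it from score comparisons, keeps the traceback reliable when fractional penalties such as 0.1 are passed in.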

14
Sequential Alignment Method
  • Gap Opening and Extension Penalties
  • Role of gap penalty
  • High Value
  • Alignment compressed
  • Restricted essentially to matches, avoiding gaps
  • Resemble main activities at their relative times
  • Recommended values 8 and 3 (Wilson)
  • Low Value
  • Identification of similar activities displaced
    during the day
  • Better pairwise comparison
  • Little similarity to the actual activity Pattern
  • Recommended values 1 and 0.1 (Wilson)
  • Tested and accepted recommendation of Low Value
    for Transportation Research (Wilson)
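
To see why the two recommended settings behave so differently, it can help to compare what one gap costs under each. The sketch below assumes a common affine convention (the opening penalty is charged once, and the extension penalty for each further position in the same gap); the exact formula used by the alignment software may differ, and the function name is hypothetical.

```python
def affine_gap_cost(length, opening, extension):
    """Penalty for one run of `length` consecutive gap positions.

    Assumed convention: opening charged once, extension charged for each
    additional position after the first.
    """
    if length <= 0:
        return 0.0
    return opening + extension * (length - 1)


HIGH = (8, 3)    # high-value setting quoted on the slide
LOW = (1, 0.1)   # low-value setting quoted on the slide

for name, (opening, extension) in (("high", HIGH), ("low", LOW)):
    costs = [affine_gap_cost(n, opening, extension) for n in (1, 2, 4)]
    print(name, costs)
```

Under the low setting a four-slice (one hour) shift costs little more than a single gap, which is what lets the alignment match similar activities displaced during the day.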

15
Sequential Alignment Method
  • Multiple Alignment
  • Extension of pairwise alignment to N dimensions
  • Computational requirements become enormous beyond
    about 10 sequences of reasonable length
  • Approximation method based on data of pairwise
    alignment
  • Use of ClustalG software by Wilson

16
Sequential Alignment Method
  • Output is a Dissimilarity Matrix
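
The shape of this output can be illustrated with a small stand-in. The sketch below fills a symmetric matrix with plain edit distances (unit cost for each insertion, deletion or substitution); this is a simplification chosen for brevity, not the gap-penalised alignment scores the study obtained from ClustalG, and the third sequence is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = curr
    return prev[-1]


def dissimilarity_matrix(sequences):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(sequences)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(sequences[i], sequences[j])
    return d


patterns = ["OOJAADDDD", "OOOJADDDO", "OOOOOJJPP"]  # third one is invented
for row in dissimilarity_matrix(patterns):
    print(row)
```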

17
Fuzzy Clustering
  • Partition Clustering Method
  • Number of clusters k is specified up front
  • The Objects (Activity Patterns) are not assigned
    to a particular cluster but assigned a membership
    ranging between 0 and 1 for all clusters
  • Uses S-plus Software (Kaufman Procedure)
  • Dissimilarity matrix is input

18
Fuzzy Clustering
  • Minimize Objective Function (Kaufman)
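
The formula shown on this slide did not survive in the transcript. For reference, the objective usually given for Kaufman and Rousseeuw's fuzzy analysis procedure is reproduced below; it is assumed, not confirmed by the transcript, that this is the form the slide displayed. Here u_{iv} is the membership of object i in cluster v, d(i, j) is the dissimilarity between objects i and j, n is the number of objects and k the number of clusters.

```latex
\min_{u}\; C \;=\; \sum_{v=1}^{k}
  \frac{\displaystyle\sum_{i=1}^{n}\sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}
       {\displaystyle 2 \sum_{j=1}^{n} u_{jv}^{2}}
\qquad \text{subject to} \qquad
u_{iv} \ge 0, \quad \sum_{v=1}^{k} u_{iv} = 1 \ \text{ for every } i .
```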

19
Fuzzy Clustering
  • Number of clusters ?
  • An Open question To be determined as part of
    research
  • Two quality indices from S-Plus
  • Dunn's Coefficient
  • Average Silhouette Value with Shadow plot

20
Fuzzy Clustering
  • Dunn's Coefficient
  • Where Fk always lies in the range [1/k, 1]
  • Fk = 1/k: entirely fuzzy clustering
  • Fk = 1: crisp clustering
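
The two limiting cases can be checked numerically. The sketch below uses the standard definition of Dunn's partition coefficient, the mean over objects of the sum of squared memberships; the membership matrices are made-up examples, not survey output.

```python
def dunn_coefficient(memberships):
    """Dunn's partition coefficient F_k for an n-by-k membership matrix.

    F_k = (1/n) * sum over objects i and clusters v of u_iv squared.
    """
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


# Made-up memberships for two objects and k = 3 clusters.
crisp = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]            # each object in one cluster
fuzzy = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]  # spread evenly

print(dunn_coefficient(crisp))  # upper limit, 1
print(dunn_coefficient(fuzzy))  # lower limit, 1/k
```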

21
Fuzzy Clustering
  • Average Silhouette Value (ASV) with Shadow plot
  • Strength of Classification to the nearest crisp
    cluster compared to the next best cluster


  • Width of Bar
  • 1: well classified
  • 0: between two clusters
  • Below 0: badly classified
  • (lies nearer the next best cluster)
  • Average value gives an approximation to the best
    number of clusters
  • ASV must be higher than 0.25
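
A silhouette width can be computed directly from a dissimilarity matrix once each object is assigned to its nearest crisp cluster. The sketch below follows the usual definition, s(i) = (b - a) / max(a, b), where a is the mean dissimilarity to the object's own cluster and b the mean dissimilarity to the next best cluster; the helper names and the small data set are illustrative assumptions, and the S-Plus output may handle edge cases differently.

```python
def nearest_crisp(memberships):
    """Index of the highest-membership cluster for each object."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def silhouette_widths(d, labels):
    """Silhouette width of each object from dissimilarity matrix `d`."""
    clusters = sorted(set(labels))
    widths = []
    for i, own in enumerate(labels):
        # Mean dissimilarity from object i to every cluster (excluding itself).
        mean_to = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                mean_to[c] = sum(d[i][j] for j in members) / len(members)
        if own not in mean_to or len(mean_to) < 2:
            widths.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Made-up example: four objects forming two tight pairs.
d = [
    [0, 1, 4, 5],
    [1, 0, 4, 5],
    [4, 4, 0, 1],
    [5, 5, 1, 0],
]
u = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
widths = silhouette_widths(d, nearest_crisp(u))
print(widths)
print(sum(widths) / len(widths))  # average silhouette value
```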

22
Cluster Center Interpretation
  • Distributions of socio-economic variables
  • Basis for grouping in subsequent modeling
  • Person characteristics
  • Age
  • Gender
  • Person type category from survey
  • Employment Status
  • Household characteristics attributed to persons
  • Only income so far
  • Household structure later
  • Fuzzy weighted frequency distributions
  • Need for eventual crisp assignment
  • Potentially use logit to assign cluster
    membership values
  • Calibrate utility functions for clusters with
    person characteristics
  • Use Monte Carlo to select specific cluster in
    each case
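
The proposed crisp assignment could look roughly like this: a multinomial logit turns cluster utilities into probabilities, and a Monte Carlo draw picks one cluster per person. This is a sketch only; the utility values are invented placeholders, since no utility functions have been calibrated yet, and the function names are hypothetical.

```python
import math
import random


def logit_probabilities(utilities):
    """Multinomial logit: P(c) = exp(V_c) / sum over c' of exp(V_c')."""
    top = max(utilities.values())  # subtract the max for numerical stability
    expv = {c: math.exp(v - top) for c, v in utilities.items()}
    total = sum(expv.values())
    return {c: e / total for c, e in expv.items()}


def draw_cluster(probabilities, rng):
    """Monte Carlo selection of a single cluster from its probabilities."""
    clusters = list(probabilities)
    weights = [probabilities[c] for c in clusters]
    return rng.choices(clusters, weights=weights, k=1)[0]


# Invented utilities of clusters 1-3 for one person (not calibrated values).
utilities = {1: 0.2, 2: -0.5, 3: 1.1}
probabilities = logit_probabilities(utilities)
rng = random.Random(2005)  # fixed seed so the draw is repeatable
print(probabilities)
print(draw_cluster(probabilities, rng))
```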

23
Cluster Center Interpretation
  • Fuzzy Weighted Frequency Distributions
  • Each histogram bar for a category in a cluster is
    the sum of the cluster memberships of all people
    in that category across the entire sample,
    expressed as a percentage
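
One reading of this definition is sketched below: each person contributes their membership in the cluster to the bar for their category, and the bars are scaled to sum to 100 within the cluster. The choice of denominator (total membership in the cluster) is an assumption, as the slide does not state it, and the data are made up.

```python
from collections import defaultdict


def fuzzy_weighted_distribution(categories, memberships, cluster):
    """Percentage bar per category, weighting each person by membership.

    categories[i] is person i's category (for example gender);
    memberships[i][cluster] is person i's membership in the cluster.
    """
    weight = defaultdict(float)
    for category, row in zip(categories, memberships):
        weight[category] += row[cluster]
    total = sum(weight.values())  # assumed denominator: all membership in cluster
    return {c: 100.0 * w / total for c, w in weight.items()}


# Made-up sample: three people, two clusters.
gender = ["Male", "Female", "Female"]
u = [[0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
print(fuzzy_weighted_distribution(gender, u, cluster=0))
print(fuzzy_weighted_distribution(gender, u, cluster=1))
```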

24
Results
  • Sequential Alignment
  • Low Vs High Gap Penalty Results
  • Cluster plot for 3 clusters
  • Panels: Low Gap, High Gap

25
Results
  • Shadow Plot
  • Panels: Low Gap, High Gap

Coefficient                 Low Gap   High Gap
Dunn's coefficient          0.4       0.33
Average Silhouette Value    0.4       0.3
  • Use low gap penalty, consistent with the
    recommendation (1 and 0.1)

26
Results
  • Number of Clusters
  • Clustal plot helps to see the potential range
    for the number of clusters

27
Results
  • Number of Clusters
  • Potential range 2 to 5

28
Results
  • Number of Clusters (k)
  • k = 2
  • Fk = 0.60, ASV = 0.42

29
Results
  • Number of Clusters (k)
  • k = 3
  • Fk = 0.43, ASV = 0.40

30
Results
  • Number of Clusters (k)
  • k = 4
  • Fk = 0.34, ASV = 0.32

31
Results
  • Number of Clusters (k)
  • k = 5
  • Fk = 0.28, ASV = 0.20

32
Results
  • Number of Clusters (k) ?
  • Use 3 clusters for testing
  • Expect different for total sample

       2 Clusters   3 Clusters   4 Clusters   5 Clusters
Fk     0.60         0.43         0.34         0.28
ASV    0.42         0.40         0.32         0.20
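
The two indices in this table can be screened together. The sketch below applies the ASV threshold of 0.25 stated earlier and also rescales Fk to a 0-to-1 range, since its lower limit 1/k changes with k; that normalisation is an addition for comparison and is not part of the presentation. The screen only narrows the candidates; the choice of 3 clusters for testing remains the authors' judgment.

```python
# (Fk, ASV) for each number of clusters, copied from the table above.
RESULTS = {2: (0.60, 0.42), 3: (0.43, 0.40), 4: (0.34, 0.32), 5: (0.28, 0.20)}
ASV_THRESHOLD = 0.25  # "ASV must be higher than 0.25"


def normalized_dunn(fk, k):
    """Rescale Fk from [1/k, 1] to [0, 1] (added here for comparison only)."""
    return (fk - 1 / k) / (1 - 1 / k)


def screen(results, threshold=ASV_THRESHOLD):
    """Rows of (k, normalised Fk, ASV, passes ASV threshold)."""
    rows = []
    for k, (fk, asv) in sorted(results.items()):
        rows.append((k, round(normalized_dunn(fk, k), 3), asv, asv > threshold))
    return rows


for row in screen(RESULTS):
    print(row)
```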

33
  • Fuzzy Cluster Memberships
  • Output of S-plus software
  • HH2701 has almost equal memberships to all three
    clusters

34
Results
  • Fuzzy weighted frequency Distribution

35
Results
  • Cluster Interpretation
  • Crisp presentation

36
Results
  • Cluster Interpretation: each cluster tends to
    contain more of the following
  • Cluster 1
  • Students aged 5 to 15
  • Mainly KEJS and youths
  • Cluster 2
  • Females
  • Seniors and other adults in Age range 66-70
  • Retired home makers and volunteers
  • Cluster 3
  • Males
  • 100% adult workers
  • Age 40s
  • Majority adult workers not needing a car for work
  • Expect different for total sample

37
Conclusions
  • Methods seem to work well in identifying the
    clusters as intended, with no hurdles
  • Fuzzy clustering better indicates strength of
    membership
  • Best to have multiple measures of clustering
    quality when choosing the number of clusters
  • Still a work in progress
  • Results not complete; shown as an example only
  • But essential elements of the analysis process are set

38
Conclusions
  • Future Work
  • Proceeding to full sample of 8,400 households
    including Weekends
  • Expanding to location dimension
  • Calibrate Logit model for allocation of clusters
  • Consider Household Structure

39
Thank You
  • ?