Title: Indexing Fingerprint Data with Clustering Approach
1. Indexing Fingerprint Data with Clustering Approach
- Wooseok Ryu
- Database Lab.
- September 22, 2007
2. Summary of Previous Seminar
- Two approaches for indexing fingerprint data when there are many access points
- Multi-Dimensional Approach
- Description
- Assign one dimension to each access point
- The number of dimensions equals the number of access points
- Problem
- Curse of dimensionality
- Set of 1-D Indexes Approach
- Description
- Maintain one 1-D index per access point
- Merge the results of each index
- Problem
- Not faster than the multi-dimensional index approach
3. Today's Topic
- Comments from the Previous Seminar
- How can the number of dimensions be reduced in the Multi-Dimensional Approach?
- Index Pruning Approach
- Clustering Approach
- Today's Topic
- Review of the Clustering Approach
- Problems of the Previous Approach
- Basic Idea for Clustering
4. Clustering Approach
- Clustering
- Goal: reduce the computational cost of searching the radio map
- A cluster is a set of locations sharing a common set of access points
5. Related Work: Clustering
- Clustering in the offline phase
- Group locations into clusters according to the access points that cover them
- For a given location l, use the set of the q strongest access points covering that location as the cluster key
- Use the MaxSignal value of each access point
- Do not use weak access points
Source: "WLAN Location Determination via Clustering and Probability Distributions", Youssef, Agrawala, Shankar (PerCom 2003)
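The offline grouping step can be sketched in Python as follows. This is only a minimal illustration, not the implementation from the cited paper: the radio map layout (a dictionary from location to per-AP MaxSignal values), the function name build_clusters, and all signal values are assumptions made up for the example.

```python
from collections import defaultdict


def build_clusters(radio_map, q):
    """Group locations by the set of their q strongest access points.

    radio_map maps a location name to {ap_name: max_signal}. Both the
    layout and the values used below are hypothetical.
    """
    clusters = defaultdict(list)
    for location, signals in radio_map.items():
        # Rank APs from strongest to weakest and keep only the top q,
        # so weak access points never contribute to the cluster key.
        strongest = sorted(signals, key=signals.get, reverse=True)[:q]
        clusters[frozenset(strongest)].append(location)
    return dict(clusters)


# Made-up MaxSignal values for three locations and four access points.
radio_map = {
    "L1": {"AP1": -45, "AP2": -56, "AP3": -65, "AP4": -80},
    "L2": {"AP1": -50, "AP2": -60, "AP3": -62, "AP4": -85},
    "L3": {"AP1": -48, "AP2": -55, "AP3": -82, "AP4": -66},
}
clusters = build_clusters(radio_map, q=3)
for key, locations in clusters.items():
    print(sorted(key), locations)
```

Using a frozenset as the key makes the cluster independent of the order in which the q strongest access points are ranked.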
6. Related Work: Clustering
- Online location determination phase
- Determine the cluster to search
- Use the q strongest access points to select one cluster
- Find the most probable location within the cluster
- Use Bayes' theorem to estimate the probability of each location within the cluster
- Evaluate the locations one by one and choose the one with the highest joint probability
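The search within one cluster can be sketched as below. Several things here are assumptions for illustration and are not stated on the slide: each access point's signal at a location is modeled as an independent Gaussian (mean, standard deviation), all locations have the same prior, and the names log_gaussian and most_probable_location are hypothetical. Under a uniform prior, the location with the highest joint likelihood is also the most probable one by Bayes' theorem.

```python
import math


def log_gaussian(x, mean, std):
    """Log of the Gaussian density, computed directly to avoid underflow."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))


def most_probable_location(cluster, observed):
    """Return the location with the highest joint probability.

    cluster:  {location: {ap_name: (mean, std)}}  (hypothetical model)
    observed: {ap_name: signal_strength}
    """
    best_location, best_score = None, -math.inf
    for location, model in cluster.items():
        # Sum of per-AP log-likelihoods = log of the joint probability,
        # assuming the access points are independent.
        score = sum(
            log_gaussian(observed[ap], mean, std)
            for ap, (mean, std) in model.items()
        )
        if score > best_score:
            best_location, best_score = location, score
    return best_location


# Made-up per-location signal models for a cluster keyed by AP1, AP2, AP3.
cluster = {
    "L1": {"AP1": (-45, 4), "AP2": (-56, 4), "AP3": (-65, 4)},
    "L2": {"AP1": (-52, 4), "AP2": (-60, 4), "AP3": (-61, 4)},
}
observed = {"AP1": -46, "AP2": -57, "AP3": -64, "AP4": -80}
print(most_probable_location(cluster, observed))
```

The loop visits every location in the cluster one by one, which is why a small cluster size matters for the cost of this step.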
7. Related Work: Clustering
- Experimental testbed
- 110 locations in total along the corridors (5 feet apart)
- Each location is covered by 4 access points on average
8. Related Work: Clustering
- Experimental results
- In the offline phase, the number of clusters is 15 when q is 3
- The average cluster size is about 6 to 7
- Clustering reduces the average number of operations per location estimate by more than an order of magnitude
9. Problem of Related Work
- Problem of using only MaxSignal
- The observed vector can fall into the wrong cluster
- This is caused by the characteristics of the probability distribution
- Simply using the q strongest access points to select one cluster cannot guarantee accuracy
[Figure: example of offline clustering and of online location determination. 24 locations (L1 to L24) in a 3 x 8 grid, covered by AP1 to AP4.]
Cluster keys in the figure:
- C1: AP1, AP2, AP3
- C2: AP1, AP2, AP4
- C3: AP1, AP3, AP4
- C4: AP2, AP3, AP4
10. Problem of Related Work
- Summary
- When choosing the q strongest access points at each location
- Using only the MaxSignal value degrades accuracy
- How to solve the problem?
- We must consider the probability distribution
- Not only the MaxSignal values
11. Basic Approach
- Assign one location to multiple clusters
- Compute the possible combinations of the q strongest access points at one location
3 strongest APs based on MaxSignal: AP1, AP2, AP3
[Figure: probability distributions of the signal strength (SS) of AP1 to AP4 at Location 9, illustrating the possible cases when q is 3. X-axis: SS from -100 to -30; y-axis: probability P from 0.1 to 0.5.]
12. Basic Approach
- Multiple clustering example
- Assign each location to multiple clusters
- Consider all possible combinations
[Figure: 24 locations (L1 to L24) in a 3 x 8 grid, covered by AP1 to AP4. Location 9 is assigned to clusters C1, C2, and C3.]
Cluster keys in the figure:
- C1: AP1, AP2, AP3
- C2: AP1, AP2, AP4
- C3: AP1, AP3, AP4
- C4: AP2, AP3, AP4
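One simple way to approximate "all possible combinations" is to look at every training reading collected at a location and record which set of q access points was strongest in each. The sketch below does that. It is an assumption about how the combinations could be computed, not the method stated on the slides, and the readings for Location 9 are invented so that the result matches the three clusters named in the example.

```python
def possible_cluster_keys(samples, q):
    """Return every distinct set of q strongest APs seen in the samples.

    samples is a list of {ap_name: signal_strength} readings taken at
    one location (hypothetical training data).
    """
    keys = set()
    for reading in samples:
        strongest = sorted(reading, key=reading.get, reverse=True)[:q]
        keys.add(frozenset(strongest))
    return keys


# Made-up readings at Location 9; the ranking of AP2, AP3 and AP4
# changes between readings because their distributions overlap.
samples_l9 = [
    {"AP1": -45, "AP2": -56, "AP3": -65, "AP4": -80},  # AP1, AP2, AP3
    {"AP1": -47, "AP2": -58, "AP3": -75, "AP4": -70},  # AP1, AP2, AP4
    {"AP1": -46, "AP2": -78, "AP3": -66, "AP4": -69},  # AP1, AP3, AP4
]
keys_l9 = possible_cluster_keys(samples_l9, q=3)
for key in sorted(sorted(k) for k in keys_l9):
    print(key)
```

The location is then stored in every cluster whose key appears in the returned set, so an observed vector that ranks the access points differently from MaxSignal still finds it.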
13. Problem Domain
- Processing in the real-time phase
- 1. Find the cluster that contains the q strongest access points
- 2. Compute the probability distribution of each location in the cluster
Example:
1. Find the matching cluster for the received signal (-45, -56, -65, -80)
2. In the selected cluster, find the best matching location for the signal (-45, -56, -65, -80)
14. First Step: Finding the Matching Cluster
- Naïve approach
- Sequential scan
- Performance degrades when there are many clusters
Example: for the received signal AP1 = -45, AP2 = -56, AP3 = -65, AP4 = -80, the 3 strongest access points are AP1, AP2, AP3
15. First Step: Finding the Matching Cluster
- Indexing the cluster key
- Assign a unique number to each access point as follows
- 2^0, 2^1, 2^2, ..., 2^(k-1) (k is the total number of access points)
- Example
- AP1: 2^0
- AP2: 2^1
- AP3: 2^2
- AP4: 2^3
- Each cluster then gets a unique number: the sum of the numbers of its access points
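The numbering can be sketched in a few lines of Python. The function names are hypothetical. Because each access point owns a distinct bit, the sum of the per-AP numbers is unique for every set of access points.

```python
def ap_number(ap_index):
    """AP1 -> 2^0, AP2 -> 2^1, and so on (ap_index is 1-based)."""
    return 1 << (ap_index - 1)


def cluster_number(ap_indices):
    """Sum of the per-AP numbers for the access points in a cluster key."""
    return sum(ap_number(i) for i in set(ap_indices))


def key_for_signal(signal, q):
    """Cluster number for a received vector; signal[i] belongs to AP(i+1)."""
    ranked = sorted(range(len(signal)), key=lambda i: signal[i], reverse=True)
    return cluster_number(i + 1 for i in ranked[:q])


print(cluster_number([1, 2, 3]))                  # key AP1, AP2, AP3
print(cluster_number([1, 2, 4]))                  # key AP1, AP2, AP4
print(key_for_signal([-45, -56, -65, -80], q=3))  # 3 strongest: AP1, AP2, AP3
```

With this numbering the four example cluster keys (AP1, AP2, AP3), (AP1, AP2, AP4), (AP1, AP3, AP4), and (AP2, AP3, AP4) map to 7, 11, 13, and 14.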
16. First Step: Finding the Matching Cluster
- Indexing the cluster key is the same as indexing the cluster number
- We can use a well-known 1-D index structure
- B-Tree for a disk index
- T-Tree for a main-memory index
[Figure: index representation using a binary tree, with entries (7, C1), (11, C2), (13, C3), (14, C4).]
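A lookup on the cluster number can be sketched with a sorted array and binary search. This is only a stand-in for the B-Tree or T-Tree named above, not an implementation of either, and the class name ClusterIndex is hypothetical. The entries are the cluster numbers obtained by summing 2^0 to 2^3 over each cluster's access points.

```python
import bisect


class ClusterIndex:
    """Sorted-array index from cluster number to cluster name."""

    def __init__(self, entries):
        # entries: iterable of (cluster_number, cluster_name) pairs.
        pairs = sorted(entries)
        self._keys = [number for number, _ in pairs]
        self._names = [name for _, name in pairs]

    def lookup(self, cluster_number):
        """Return the cluster name, or None if no cluster has this number."""
        pos = bisect.bisect_left(self._keys, cluster_number)
        if pos < len(self._keys) and self._keys[pos] == cluster_number:
            return self._names[pos]
        return None


index = ClusterIndex([(11, "C2"), (7, "C1"), (13, "C3"), (14, "C4")])
print(index.lookup(7))   # cluster for key AP1, AP2, AP3
print(index.lookup(12))  # no cluster has this number
```

The lookup costs a logarithmic number of comparisons instead of the sequential scan of the naïve approach.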
17. Second Step: Searching for the Best Matching Location
- Search for the best matching location in a cluster
- Using the q strongest access points, model a (q+1)-dimensional space
[Figure: example with q = 2 and cluster key (AP1, AP2). Axes: signal strength of AP1 (-100 to 0), signal strength of AP2 (-100 to 0), and joint probability (0 to 1).]
18. Second Step: Searching for the Best Matching Location
- Example of a joint probability distribution over 2 access points
19. Second Step: Searching for the Best Matching Location
- Indexing the multi-dimensional space
- Use a grid approach for fast stabbing queries
[Figure: grid over the signal strength (SS) of AP1 (x-axis, -100 to 0) and the SS of AP2 (y-axis, -90 to -20), containing the rectangles MBR1 to MBR7.]
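A grid index for stabbing queries over two access points could look like the sketch below. The cell size of 10, the class name GridIndex, and the MBR coordinates are all assumptions chosen for illustration; the figure does not give the actual rectangle boundaries. Each rectangle is registered in every grid cell it overlaps, so a query only inspects the rectangles stored in the one cell that contains the query point.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over a 2-D signal-strength space."""

    def __init__(self, low=-100.0, high=0.0, cell=10.0):
        self.low, self.high, self.cell = low, high, cell
        self.cells = defaultdict(list)

    def _cell_of(self, value):
        # Clamp so that out-of-range values and value == high map to
        # the first or last cell.
        count = int((self.high - self.low) / self.cell)
        return min(max(int((value - self.low) // self.cell), 0), count - 1)

    def insert(self, name, x_min, y_min, x_max, y_max):
        """Register the rectangle in every cell it overlaps."""
        for cx in range(self._cell_of(x_min), self._cell_of(x_max) + 1):
            for cy in range(self._cell_of(y_min), self._cell_of(y_max) + 1):
                self.cells[(cx, cy)].append((name, x_min, y_min, x_max, y_max))

    def stab(self, x, y):
        """Return the names of all rectangles that contain the point (x, y)."""
        candidates = self.cells.get((self._cell_of(x), self._cell_of(y)), [])
        return [
            name
            for name, x_min, y_min, x_max, y_max in candidates
            if x_min <= x <= x_max and y_min <= y <= y_max
        ]


grid = GridIndex()
# Hypothetical rectangles: (SS of AP1 min, SS of AP2 min, AP1 max, AP2 max).
grid.insert("MBR1", -60, -70, -40, -50)
grid.insert("MBR2", -50, -60, -20, -30)
grid.insert("MBR3", -90, -90, -70, -80)
print(grid.stab(-45, -56))  # point covered by two overlapping rectangles
```

A smaller cell size means fewer candidates per query but more cells per rectangle, so the choice trades memory against query time.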
20. Discussion
- Using this approach
- The multi-dimensional indexing problem can be reduced by adopting the clustering approach
- Two-step approach
- Find the appropriate cluster
- Search for the location within that cluster
- Further steps
- A more efficient access method for a cluster
- How can the optimal value of q be obtained?
- The previous approach only showed experimental results
- The optimal value of q depends heavily on the real environment
- This is critical for performance as well as accuracy
21. What Is the Optimal q Value?
- Real example showing 24 locations in the 6th Bldg.
- Too many clusters when q is 3