Title: GEOINFO 2006
1GEOINFO 2006
- Use of the TerraLib library for clustering algorithms in Geographic Information Systems
Mauricio P. Guidini
Carlos H. C. Ribeiro (Supervisor)
Nov 2006
2
... 3000 unregistered flights, with origin and destination unknown to the authorities, invaded Brazilian airspace in the first ten months of this year. The Air Force estimates that about 30 of these flights were related to drug trafficking ...
Translated from a note dated 25/10/2004
3Data Mining in GIS
- Objective
- To present the integration of a Data Mining algorithm (k-means) into TerraLib/TerraView, forming a Geographic Information System for Unknown Air Traffic analysis (GisTAD).
4Data Mining in GIS
- Summary
- Data Mining
- Clustering Algorithms
- Air Traffic
- K-means Implementation
- Results
- Application
5Data Mining in GIS
Data Mining
Definition: a non-trivial process of identifying valid, novel and useful patterns implicitly present in large volumes of data.
Knowledge Discovery in Databases (KDD), Fayyad et al. (1996)
6Data Mining in GIS
- How is DM carried out?
- KDD process
7Data Mining in GIS
Clustering Algorithms
The clustering process tries to partition the data into groups whose members have highly similar features, helping the user understand the information the data hold. A good clustering algorithm is characterized by producing high-quality clusters, in which intraclass similarity is high and interclass similarity is low. Han & Kamber (2001)
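The quality criterion above can be made concrete with a small numeric check. The sketch below is an illustration, not part of the original work: it uses plain 2-D Euclidean points and measures intraclass spread as the mean distance from each point to its cluster centroid, and interclass separation as the mean distance between centroids. The function names are hypothetical.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def intra_cluster_spread(clusters):
    """Mean distance from each point to its own cluster centroid.
    Lower values mean higher intraclass similarity."""
    total, count = 0.0, 0
    for pts in clusters:
        c = centroid(pts)
        for p in pts:
            total += math.dist(p, c)
            count += 1
    return total / count

def inter_cluster_separation(clusters):
    """Mean distance between pairs of centroids (needs at least 2 clusters).
    Higher values mean lower interclass similarity."""
    cents = [centroid(pts) for pts in clusters]
    pairs = [(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# The same six points, grouped in two different ways.
good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
bad = [[(0, 0), (10, 10), (0, 1)], [(1, 0), (10, 11), (11, 10)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, round(intra_cluster_spread(clusters), 2),
          round(inter_cluster_separation(clusters), 2))
```

The "good" grouping scores lower on spread and higher on separation, which is exactly the high-intraclass, low-interclass similarity that Han & Kamber describe.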
8Data Mining in GIS
- Major Categories
- Partitioning: k-means, k-medoids
- Hierarchical: CURE, BIRCH
- Density-based: DBSCAN, OPTICS
- Grid-based: STING
- Model-based
- Others
- ANN: Kohonen network
- Incremental: Leader
9Data Mining in GIS
- Air Traffic
- Movement of aircraft, national or foreign, over national territory.
- Unknown Air Traffic
- For unidentified aircraft (no flight plan), two lines of action can be taken (Bernabeu 2004):
- Intercept, or
- Generate an Unknown Air Traffic Report
10Data Mining in GIS
- Traffic Representation
- Line segments
- Latitude (decimal degrees)
- Longitude (decimal degrees)
- Distance (miles)
- Heading
- Restrictions
- Acceptable deviations
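A minimal sketch of this representation, assuming headings are in degrees (0 to 360) and that the acceptable deviations are absolute per-attribute thresholds. The class and function names are hypothetical and do not come from GisTAD or TerraLib. Heading is compared on the circle, so 350° and 10° differ by 20°, not 340°.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RouteSegment:
    lat: float       # decimal degrees
    lon: float       # decimal degrees
    distance: float  # miles
    heading: float   # assumption: degrees, 0-360

@dataclass(frozen=True)
class MaxDeviation:
    coord: float     # decimal degrees, applied to both lat and lon
    distance: float  # miles
    heading: float   # degrees

def heading_diff(h1: float, h2: float) -> float:
    """Smallest angle between two headings, in degrees (0..180)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def within_deviation(a: RouteSegment, b: RouteSegment, dev: MaxDeviation) -> bool:
    """True if every attribute of a and b differs by no more than its allowed deviation."""
    return (abs(a.lat - b.lat) <= dev.coord
            and abs(a.lon - b.lon) <= dev.coord
            and abs(a.distance - b.distance) <= dev.distance
            and heading_diff(a.heading, b.heading) <= dev.heading)
```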
11Data Mining in GIS
K-means algorithm
Precondition: set maximum deviation values for coordinates, distance and route
Begin
  K ← 0
  While the criterion is not satisfied (deviation within clusters)
    Increase K
    Arbitrarily choose K centers (among the data objects)
    While the centers change (k-means)
      (Re)assign routes to clusters based on weights
      Update the center values
    End while (no intergroup movement)
  End while (deviation within groups OK)
  Save results
End
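The pseudocode can be sketched in Python as follows. This is an illustration under stated assumptions, not the GisTAD implementation: each route is a hypothetical (lat, lon, distance, heading) tuple, the "weights" multiply attribute differences normalized by the maximum deviations, the heading of a center is a circular mean, and the outer criterion is taken to be "every route lies within the maximum deviations of its cluster center". K starts at 1 and grows until the criterion holds.

```python
import math
import random

# Hypothetical route layout: (lat, lon, distance, heading), heading in degrees.

def heading_diff(h1, h2):
    """Smallest angle between two headings, in degrees (0..180)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def diffs(a, b):
    """Absolute per-attribute differences between two routes."""
    return (abs(a[0] - b[0]), abs(a[1] - b[1]),
            abs(a[2] - b[2]), heading_diff(a[3], b[3]))

def weighted_dist(a, b, max_dev, weights):
    """Assumed measure: weighted sum of squared, deviation-normalized differences."""
    return sum(w * (d / m) ** 2 for d, m, w in zip(diffs(a, b), max_dev, weights))

def mean_center(members):
    """Arithmetic mean for lat/lon/distance; circular mean for heading."""
    n = len(members)
    lat = sum(r[0] for r in members) / n
    lon = sum(r[1] for r in members) / n
    dist = sum(r[2] for r in members) / n
    s = sum(math.sin(math.radians(r[3])) for r in members)
    c = sum(math.cos(math.radians(r[3])) for r in members)
    head = math.degrees(math.atan2(s, c)) % 360.0
    return (lat, lon, dist, head)

def kmeans(routes, k, max_dev, weights, rng, max_iter=100):
    centers = rng.sample(routes, k)  # arbitrarily choose K centers among the data
    labels = None
    for _ in range(max_iter):
        # (Re)assign each route to its nearest center under the weighted distance.
        new_labels = [min(range(k), key=lambda j: weighted_dist(r, centers[j], max_dev, weights))
                      for r in routes]
        if new_labels == labels:  # no intergroup movement: centers are stable
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(routes, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_center(members)
    return centers, labels

def deviation_ok(routes, centers, labels, max_dev):
    """Outer criterion: every route is within the maximum deviations of its center."""
    return all(d <= m
               for r, lab in zip(routes, labels)
               for d, m in zip(diffs(r, centers[lab]), max_dev))

def cluster_routes(routes, max_dev, weights=(1.0, 1.0, 1.0, 1.0), seed=0):
    """Increase K from 1 until the deviation criterion holds.
    With positive max_dev and weights, K = len(routes) always satisfies it."""
    rng = random.Random(seed)
    for k in range(1, len(routes) + 1):
        centers, labels = kmeans(routes, k, max_dev, weights, rng)
        if deviation_ok(routes, centers, labels, max_dev):
            return k, centers, labels
    return 0, [], []  # only reached for an empty input
```

Because k-means is rerun from scratch for every K, the cost grows quickly with the number of records. This sketch makes that cost easy to see, and it fits the time-consumption concern raised in the conclusion.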
12Data Mining in GIS
Distance Measure
- Minimize deviations
- Improve cluster quality
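The distance formulas on this slide did not survive extraction. As an assumption only, and not the authors' formula, one weighted measure that fits the deviation thresholds of the algorithm works as follows. It normalizes each attribute difference $\Delta_i$ (latitude, longitude, distance, heading) by its maximum acceptable deviation $\delta_i$ and weights it by $w_i$. Heading is compared on the circle. k-means then minimizes the total distance $J$ of the routes to their cluster centers $c_j$:

```latex
d(r, c) = \sum_{i=1}^{4} w_i \left( \frac{\Delta_i(r, c)}{\delta_i} \right)^2,
\qquad
\Delta_4(r, c) = \min\bigl( |h_r - h_c| \bmod 360,\; 360 - |h_r - h_c| \bmod 360 \bigr)

J = \sum_{j=1}^{K} \sum_{r \in C_j} d(r, c_j)
```

Here $\Delta_1$, $\Delta_2$, $\Delta_3$ are absolute differences in latitude, longitude and distance, and $h$ is the heading in degrees. Normalizing by $\delta_i$ puts attributes measured in degrees and miles on a common scale, so minimizing $J$ also tends to minimize deviations relative to the allowed thresholds.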
13Data Mining in GIS
- GIS Integration
- TerraLib
- TerraView
- k-means
14Data Mining in GIS
- Data preparation
- 8000 records
- Defining what information to look for
- Search space restrictions
15Data Mining in GIS
- Numeric Tests
- Up to 500 records
- GisTAD Tests
- 319 records
- 73 groups
- Approx. time: 40 s
16TerraView
17TerraView
19Data Mining in GIS
- Applications
- Air Operations
- Improper use of airspace
21Data Mining in GIS
Conclusion
For the proposed problem, the k-means algorithm is applicable and returned a good set of clusters. However, the number of records to be clustered can make the algorithm very time-consuming.
22Future Work
- Other partitioning algorithms should be implemented to determine which one is the most efficient for the problem under analysis, for any number of records to be clustered.
- The algorithms to be tested are:
- Kohonen neural network
- Leader algorithm
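For comparison with k-means, the Leader algorithm makes a single pass over the records. Each record joins the first existing leader within a distance threshold, and any record with no such leader becomes a new leader itself. The minimal sketch below uses 2-D points with Euclidean distance. The threshold value and names are illustrative assumptions. It needs no K and no repeated passes, which is why it is attractive for large record counts. The catch is that its result depends on the order in which records arrive.

```python
import math

def leader_clustering(points, threshold):
    """Single-pass Leader clustering.
    Returns the list of leaders and, for each point, the index of its leader."""
    leaders = []
    labels = []
    for p in points:
        for idx, lead in enumerate(leaders):
            if math.dist(p, lead) <= threshold:
                labels.append(idx)  # join the first leader close enough
                break
        else:
            leaders.append(p)  # no leader within the threshold: start a new cluster
            labels.append(len(leaders) - 1)
    return leaders, labels

points = [(0, 0), (0.5, 0), (10, 10), (0.2, 0.1), (10.4, 10)]
leaders, labels = leader_clustering(points, threshold=1.0)
print(leaders, labels)
```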