Title: Data Mining in Satellite Imagery:
1- Data Mining in Satellite Imagery
- Similarity Measure between Regions vs.
Segmentation, Absorption Technique vs.
Segmentation Scale -
- Catalin Cucu-Dumitrescu,
- Florin Serban, Manuel Buican, Mihai Datcu
Work carried out within the ESA PECS project RoKEO
2- SEGMENTATION
Specialists all over the world are trying to
develop automatic tools for specific data mining.
The main interest is the identification of
ground structures, both natural and artificial:
agricultural fields, lakes, forests, urban areas,
roads, etc. This process is called segmentation.
The accuracy with which the borders between
segments are drawn is the main quality
requirement for segmentation. The resolution
scale of the segmentation is another important
matter. It relates to the size of the objects we
want to extract from an image, i.e., to the level
of detail in the final puzzle. The scale of the
segmentation must be adapted to the purpose of
the data mining (the different approaches and
needs of specialists in different fields who are
interested in satellite images).
3- TECHNIQUE OF TRANSFORMING IMAGES INTO LINEAR
TEXT-TYPE SEQUENCES BASED ON MST
- Every pixel in the satellite image has
neighbors. We may consider that these direct
relations result in connections, and every such
connection has a certain intensity, determined by
the difference in color.
- This network of connections can be reduced
to a tree. By eliminating some of the connections
that allow the existence of inner cycles, several
trees can be associated with any image. Our
criterion is to maintain the connections between
pixels that contain similar information; as a
result, we want to keep the connections with a
low intensity (a small color difference). This
means obtaining a minimum spanning tree (MST).
- There are several techniques for
determining an MST (the algorithms of Prim,
Kruskal, Boruvka, Dijkstra, or the Reverse-Delete
algorithm). We have adapted the Kruskal
algorithm for our application.
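The MST step can be sketched in Python as shown below. This is a minimal sketch under several assumptions. The image is a grey-level raster stored as a list of rows. Pixels are 4-connected. Each edge weight is the absolute grey-level difference, because the slides say only "difference of color". The code is the textbook Kruskal algorithm with a union-find structure, not the authors' adapted version. The `segment` helper is a hypothetical illustration of a segmentation threshold: it cuts MST edges heavier than the threshold and labels the trees that remain.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Edge = Tuple[int, int, int]  # (weight, pixel_index_a, pixel_index_b)


def grid_edges(img: Image) -> List[Edge]:
    """4-connected pixel graph.
    Assumption: weight = absolute grey-level difference."""
    h, w = len(img), len(img[0])
    edges: List[Edge] = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w:  # right neighbour
                edges.append((abs(img[y][x] - img[y][x + 1]), p, p + 1))
            if y + 1 < h:  # lower neighbour
                edges.append((abs(img[y][x] - img[y + 1][x]), p, p + w))
    return edges


class DisjointSet:
    """Union-find used by Kruskal to detect cycles."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, a: int) -> int:
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already connected: the edge would close a cycle
        self.parent[rb] = ra
        return True


def kruskal_mst(img: Image) -> List[Edge]:
    """Keep the cheapest connections that do not create a cycle."""
    ds = DisjointSet(len(img) * len(img[0]))
    mst: List[Edge] = []
    for edge in sorted(grid_edges(img)):  # lowest intensity first
        if ds.union(edge[1], edge[2]):
            mst.append(edge)
    return mst


def segment(img: Image, mst: List[Edge], threshold: int) -> List[List[int]]:
    """Hypothetical scale control: drop MST edges heavier than threshold
    and give each remaining tree (segment) a compact integer label."""
    h, w = len(img), len(img[0])
    ds = DisjointSet(h * w)
    for weight, a, b in mst:
        if weight <= threshold:
            ds.union(a, b)
    ids: Dict[int, int] = {}
    return [[ids.setdefault(ds.find(y * w + x), len(ids)) for x in range(w)]
            for y in range(h)]


img = [[10, 10, 200],
       [10, 12, 200],
       [90, 90, 205]]
mst = kruskal_mst(img)
print(len(mst), sum(weight for weight, _, _ in mst))  # 8 200
print(segment(img, mst, threshold=5))  # [[0, 0, 1], [0, 0, 1], [2, 2, 1]]
```

Sorting the edges dominates the cost at O(E log E). In a 4-connected raster E is about 2N, so this stays practical for large images. A larger `threshold` merges more pixels and gives a coarser segmentation.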
4- ABSORPTION
- All algorithms for automatic segmentation
produce a large number of small objects when
applied to a high-definition image. In fact, the
large majority of the segments (often over 75%)
are small in these cases. One could simply
eliminate or ignore them, at the cost of
introducing gaps inside or between the larger
remaining objects. To avoid this, we developed
an algorithm that absorbs these small, unwanted
details into a larger neighboring object.
- The principle of the absorption process is
the following:
- Set a threshold on the pixel mass that
defines a small object.
- Start the absorption with the smallest objects
and proceed up to the objects of critical mass.
- Allow absorption only into a large enough
object.
- Allow absorption only into a neighbor.
- If there are several candidates, choose the
large neighbor that shares the longest border
with the unwanted small object.
- With a well-chosen threshold, the puzzle of
the segmented image is simplified. The benefits
are especially visible along the mutual borders
and inside the large objects.
- It is a cleaning effect.
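The absorption rules above can be sketched in Python as follows. This sketch makes several assumptions. The segmentation is a grid of integer labels. An object's mass is its pixel count. "Large enough" means at least `min_mass` pixels. Border length is the number of 4-adjacent pixel pairs that two objects share. The function name `absorb_small_objects` and the tie-break by target size are hypothetical choices.

```python
from collections import Counter
from typing import List, Tuple

Labels = List[List[int]]
NEIGHBOURS = ((0, 1), (1, 0), (0, -1), (-1, 0))  # 4-connectivity


def absorb_small_objects(labels: Labels, min_mass: int) -> Labels:
    """Hypothetical helper: merge each object smaller than min_mass pixels
    into the large neighbour that shares the longest border with it."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    sizes = Counter(v for row in out for v in row)
    # Smallest objects first, up to (but excluding) the critical mass.
    small = sorted((lab for lab in sizes if sizes[lab] < min_mass),
                   key=lambda lab: (sizes[lab], lab))
    for lab in small:
        pixels: List[Tuple[int, int]] = []
        border = Counter()
        for y in range(h):
            for x in range(w):
                if out[y][x] != lab:
                    continue
                pixels.append((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and out[ny][nx] != lab:
                        border[out[ny][nx]] += 1  # one unit of shared border
        # Only neighbours that are large enough may absorb the object.
        candidates = {n: b for n, b in border.items() if sizes[n] >= min_mass}
        if not candidates:
            continue  # no large neighbour: keep the object as it is
        # Longest shared border wins; ties go to the larger neighbour.
        target = max(candidates, key=lambda n: (candidates[n], sizes[n]))
        for y, x in pixels:
            out[y][x] = target
        sizes[target] += sizes.pop(lab)
    return out


grid = [[1, 1, 1, 2, 3],
        [1, 1, 4, 2, 2],
        [1, 1, 4, 4, 2],
        [1, 1, 1, 1, 2]]
for row in absorb_small_objects(grid, min_mass=4):
    print(row)
# [1, 1, 1, 2, 2]
# [1, 1, 1, 2, 2]
# [1, 1, 1, 1, 2]
# [1, 1, 1, 1, 2]
```

In the example, object 4 touches object 1 along 5 pixel edges and object 2 along only 3, so object 1 absorbs it. A small object with no large neighbor is left unchanged. For brevity the sketch rescans the whole grid for each small object. A production version would index pixels by label instead.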
5- ABSORPTION
Left: Original Image; Smoothed Vector
Segmentation. Right: Raster Segmentation; Rough
Vector Segmentation (with a medium segmentation
threshold: over 5,000 objects of different
sizes). No absorption has been performed.
6- ABSORPTION
Left: Original Image; Smoothed Vector
Segmentation. Right: Raster Segmentation; Rough
Vector Segmentation, with absorption of the
small objects (medium segmentation threshold
and a mild triggering size for small-object
absorption: just 400 objects remain, all of them
larger than the equivalent of 10 pixels).
7- ABSORPTION
Same as the previous slide, but with a larger
threshold for the absorption process. Now only
91 objects are left. The road network now comes
in one piece, the forest area is well extracted,
and the roofs of the houses are revealed quite
correctly.
8- ZONE CLASSIFICATION/RECOGNITION
It is difficult to use and analyze the results
of segmentation for high-resolution images,
as all the details tend to qualify as segments
(objects). In this case, the minimal entity to
consider is the region rather than the object.
A region of the image contains a set of small
objects in a particular geometric layout.
9- Determining the correlation degree between two
codes based on the compression factor
We consider two linear sequences of symbols,
A and B, over the same alphabet. A is the
reference sequence, and a dictionary is
determined for it. More precisely, the dictionary
is the set of all distinct sub-expressions, of at
most m characters, that occur in A. A dictionary
for B can be built in the same way. Let us assume
that we want to express the degree of resemblance
between A and B by means of the two associated
dictionaries. We can do this in the following way:
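The dictionaries can be built in Python as shown below. This is a minimal sketch, and the measure in it is an assumption rather than the authors' formula. It takes the resemblance as the fraction of B's dictionary words that do not occur in A's dictionary. This form was chosen because it uses both dictionaries, lies between 0 and 1, is not symmetric, and decreases as B becomes easier to express with A's words. The function name `dictionary_measure` is hypothetical.

```python
from typing import Set


def dictionary(seq: str, m: int) -> Set[str]:
    """All distinct substrings (sub-expressions) of seq with length 1..m."""
    return {seq[i:i + k] for k in range(1, m + 1) for i in range(len(seq) - k + 1)}


def dictionary_measure(a: str, b: str, m: int) -> float:
    """Hypothetical measure: share of B's words that are NOT words of A.
    0 -> every word of B is a word of A; 1 -> no word is shared.
    Not symmetric: swapping a and b generally changes the value."""
    d_a, d_b = dictionary(a, m), dictionary(b, m)
    if not d_b:
        raise ValueError("B must be non-empty")
    return len(d_b - d_a) / len(d_b)


a, b = "abababab", "abcabc"
print(dictionary_measure(a, b, 2))  # 0.5  ("c", "bc", "ca" are not words of A)
print(dictionary_measure(b, a, 2))  # 0.25 (only "ba" is not a word of B)
```

The two calls return different values, so the measure is not symmetric.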
10- Determining the correlation degree between two
codes based on the compression factor
This measure takes values from a minimum of 0
to a maximum of 1. It reflects how well the text
B can be expressed through the words of A.
It is not a distance, not even a symmetric
relation, but it is very intuitive and can be
used for building a compression rate or a
pseudo-distance.
The lower its value, the more B resembles A; the
higher its value, the less B resembles A.
Good resemblance. Area A has a better
compression with respect to area B than vice
versa, because A is simpler than B and a greater
number of its details resemble some details
of B.
Low resemblance
11- Zone classification with respect to a sample zone
The scenario behind the process is the
following:
1. Divide the image into a grid of square areas
(80x80 pixels in all our examples).
2. Select a specific area as the reference (this
initial area can be recognized as the square most
saturated with red).
3. Use PRDC at the zone level, not to obtain the
objects in it, but to compare every region
directly with the reference.
4. Reconstruct the initial image, showing the
degree of resemblance of each zone to the
selected one: high resemblance is shown as a
shift toward red, low resemblance as a shift
toward white.
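The four steps above can be sketched in Python as follows. This sketch assumes a grey-level image stored as a list of rows, zones aligned to the 80x80 grid with incomplete edge tiles skipped, and a dissimilarity score in [0, 1] where 0 means identical. The PRDC computation is not detailed here, so a normalized histogram distance stands in for it. That function, `histogram_distance`, is hypothetical and is not PRDC. Any function with the same signature can be passed to `classify_zones` instead, including a dictionary-based measure.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Key = Tuple[int, int]  # (grid row, grid column)


def split_into_zones(img: Image, size: int = 80) -> Dict[Key, Image]:
    """Step 1: cut the image into size x size squares (edge remainders skipped)."""
    h, w = len(img), len(img[0])
    return {
        (top // size, left // size): [row[left:left + size] for row in img[top:top + size]]
        for top in range(0, h - size + 1, size)
        for left in range(0, w - size + 1, size)
    }


def histogram_distance(a: Image, b: Image, bins: int = 8) -> float:
    """Hypothetical stand-in for PRDC: half the L1 distance between the
    normalized grey-level histograms (values 0..255), so it lies in [0, 1]."""
    def hist(zone: Image) -> List[float]:
        counts = [0] * bins
        for row in zone:
            for v in row:
                counts[v * bins // 256] += 1
        total = sum(counts)
        return [c / total for c in counts]
    return 0.5 * sum(abs(p - q) for p, q in zip(hist(a), hist(b)))


def classify_zones(img: Image, ref_key: Key,
                   dissimilarity: Callable[[Image, Image], float],
                   size: int = 80) -> Dict[Key, float]:
    """Steps 2-3: compare every zone directly with the reference zone."""
    zones = split_into_zones(img, size)
    ref = zones[ref_key]
    return {key: dissimilarity(ref, zone) for key, zone in zones.items()}


def to_red_shade(score: float) -> Tuple[int, int, int]:
    """Step 4: score 0 (most similar) -> pure red, 1 (least similar) -> white."""
    s = min(max(score, 0.0), 1.0)
    level = round(255 * s)
    return (255, level, level)


# 160x160 test image: dark top half, bright bottom-left, mixed bottom-right.
img = [[0] * 160 for _ in range(80)] + [[255] * 80 + [0] * 40 + [255] * 40 for _ in range(80)]
scores = classify_zones(img, (0, 0), histogram_distance)
print(scores)  # {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.5}
colors = {key: to_red_shade(s) for key, s in scores.items()}
```

The reference zone always scores 0, so it is drawn in the most saturated red. This matches how the reference square is recognized in the figures.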
12- Zone classification with respect to a sample zone
Left: Bucharest region in Romania. Right: the
region classified with respect to the selected
zone (indicated).