Title: Spatial Data and GIS and Spatial Data Analysis
1. Spatial Data and GIS and Spatial Data Analysis
2. In this lecture you learn
- What are spatial data and their special characteristics?
- GIS
- Spatial data analysis tasks and techniques
- Applying region growing approaches to segmentation of area data
3. Introduction
- In many domains we process information in relation to its spatial location
  - E.g., epidemiological studies are dominated by the geographical distribution of infected cases
    - Dr Snow's study of the London cholera epidemic
  - Engineering designs have a strong spatial basis
    - CAD/CAM systems deal with locations of components in a design
  - Image processing involves segmenting pixel data in relation to their location to identify objects of interest
  - Position-aware devices such as mobile phones allow us to track individual movement
4. Geo-referenced Data
- Data that are related to geographic locations are said to be geo-referenced
  - Dr Snow's data is geo-referenced
  - Census data is geo-referenced
- Most of our decisions are based on geo-referenced data
  - Weather at a location drives our decision to plan a picnic at that location
  - Supermarkets decide the size and type of a new store after thoroughly analysing the characteristics of the neighbourhood
- Building the informational and computational infrastructure to support storing, retrieving, analysing and visualising geo-referenced data is the job of computer scientists
  - Support for geo-referenced data in MySQL (version 4.1 onwards)
5. GIS
- GIS refers to
  - Geographic Information System
  - Or Geospatial Information System
- GIS offers
  - Generic (application-independent) functionality required for supporting decision making with geo-referenced data
    - Data storage and retrieval
    - Data analysis
    - Visualization
- GIS combines
  - Data analysis and visualization to help users understand geo-referenced data
  - It is therefore an ideal example for our course
- The focus is on offering generic functionality to help users understand data, rather than making decisions for them as expert systems do
6. GIS (2)
- Advances in Geographic Information Systems (GIS) and the Global Positioning System (GPS) have allowed us to study most data in relation to its spatial location
  - We are now in a position to formulate well-formed spatial queries or hypotheses
  - Technology is available to answer such queries or test those hypotheses
- All of us will use more and more geo-referenced data in the future
7. GIS Modules
- Main modules of a GIS
  - Spatial Visualization (Maps)
  - Spatial Data Analysis
  - Spatial Database
8. Characteristics of Spatial Data
- We use the term spatial data in this course in its restrictive sense of geo-referenced data
- Spatial data has two kinds of attributes
  - Spatial attributes: location information
    - E.g. longitude and latitude for points and boundary information for areas
  - Non-spatial attributes
    - E.g. rainfall or house prices
- We are mainly interested in the non-spatial attributes
  - But we want to study them taking their location (spatial attributes) into consideration
- Relationships among non-spatial attributes are explicit
  - But relationships among spatial attributes are implicit
9. Characteristics of Spatial Data (2)
- Objects with similar attributes are usually located near each other
  - Everything is related to everything else, but nearby things are more related than distant things (Tobler's first law of geography)
  - In spatial statistics this property is called spatial auto-correlation
    - Recall auto-correlation from time series data
    - Data values are not independent
- Most geographic locations are unique (spatial heterogeneity)
  - Therefore global parameters do not always accurately describe local values
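
Spatial auto-correlation can be made concrete with a small calculation. The sketch below uses Moran's I, a standard statistic that the lecture does not name, so treat it as one possible illustration rather than the course's method. It assumes binary neighbour weights (1 for adjacent areas, 0 otherwise); the six areas in a row and their values are made up.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    values[i] is the attribute value of area i;
    neighbours[i] lists the indices of the areas adjacent to area i.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Cross-products of deviations over every ordered neighbour pair (i, j)
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    total_weight = sum(len(neighbours[i]) for i in range(n))
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)

# Six hypothetical areas arranged in a row: each area touches the next one
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [1, 1, 1, 9, 9, 9]    # similar values sit next to each other
alternating = [1, 9, 1, 9, 1, 9]  # neighbours always differ

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
print(i_clustered, i_alternating)
```

A positive value indicates that nearby areas tend to have similar values; a negative value indicates that neighbours tend to differ.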
10. Characteristics of Spatial Data (3)
- Special properties of spatial data
  - Auto-correlation
  - Spatial heterogeneity
  - Implicit spatial relationships
- Modelling spatial data needs to be different from modelling ordinary data
- Data modelling influences data manipulation
  - Querying
  - Analysis
  - Visualization
11. Concept of Modelling
- Common-sense view
  - Representation of something at a level of detail suitable for its purpose
  - For example, an architect's model of a bridge
    - The architect's model brings the bridge to life even before its construction
- Formal view
  - A modelling function translates some source domain into its corresponding target domain
  - The target domain is used for analysis because it is in some sense simpler than the source domain
  - An inverse modelling function should be available for translating results of analysis from the target domain back to the source domain
12. Modelling Spatial (geographic) Data
- Two fundamentally distinct views
  - Absolute space
    - Space exists in itself and objects are located in this absolute space
    - You first create space and put objects in that space
  - Relative space
    - Space is one of the attributes of objects related to other objects
    - You first define objects and they create space as a result of their relative locations and interactions
- Both these views are used in GIS for modelling spatial data
13. Relational Data Model
- Relational databases model data as a connected set of relations
  - Each relation is a collection of tuples
    - Tuple1 -> (location1, temperature1, rainfall1)
    - Tuple2 -> (location2, temperature2, rainfall2)
- For certain applications, relational models are often criticised for the impedance mismatch between
  - the relational database storing the data
  - the object-oriented code manipulating that data
- For spatial data this mismatch is a problem
  - The inherent structure of spatial data is not captured by the relational model
14. Field-Based Models
- Information space is viewed as a collection of fields
  - Temperature field, rainfall field and wind speed field form a weather information space
- Data attribute values are computed by functions of locations
  - Temperature1 = TemperatureField(location1)
  - Temperature2 = TemperatureField(location2)
  - RainFall1 = RainFallField(location1)
- The field is the function, not the set of values
- The field is the first-class entity in this kind of modelling
15. Field-Based Models (2)
- A field-based model is a function on location
  - So we need location data as the independent variable
- Given a region of space (geography) we need a framework to partition that space into locations
  - Tessellation of space
  - For example, using grids
- A field-based model is then a function that maps each location to its attribute value
- Useful for modelling data from continuous spatial processes
  - Temperature fields, elevation data
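
The idea of a field as a function over a grid tessellation can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the cell size, the grid layout and the temperature values are all hypothetical.

```python
import math

CELL_SIZE = 10.0  # hypothetical width of a square grid cell, in map units

def to_cell(x, y, cell_size=CELL_SIZE):
    """Tessellation step: map a continuous position to its (row, col) cell."""
    return (math.floor(y / cell_size), math.floor(x / cell_size))

def make_field(grid):
    """Build a field: a function from a position to the value of its cell."""
    def field(x, y):
        row, col = to_cell(x, y)
        return grid[row][col]
    return field

# Made-up temperatures for a grid of 2 rows by 3 columns
temperature_grid = [
    [4.0, 5.5, 6.0],
    [3.0, 4.5, 5.0],
]
temperature_field = make_field(temperature_grid)

t1 = temperature_field(12.0, 3.0)   # falls in row 0, column 1
t2 = temperature_field(25.0, 17.5)  # falls in row 1, column 2
print(t1, t2)
```

Note that `temperature_field` itself, not the stored grid, is what the rest of a program would pass around, which mirrors the point that the field is the function rather than the set of values.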
16. Object-based Models
- One or more tuples from the relational model can be lumped together as data values corresponding to an object
  - All the tuples that have temperatures below zero and rainfall above 10 mm describe an object
- The object then has a spatial reference
  - The above weather conditions could be true for a region of geography
- The object is the first-class entity in this kind of modelling
- Useful for modelling data from discrete spatial processes
  - Administrative units, rivers
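
As a rough illustration of lumping tuples into an object, the sketch below groups the weather tuples that satisfy the condition from the slide (temperature below zero, rainfall above 10 mm). The helper `build_object`, the location labels and the readings are hypothetical.

```python
# Made-up weather tuples: (location, temperature in deg C, rainfall in mm)
weather = [
    ("A", -2.0, 12.0),
    ("B", -1.0, 15.0),
    ("C", 3.0, 20.0),
    ("D", -4.0, 2.0),
]

def build_object(rows, name, predicate):
    """Lump together the tuples that satisfy the predicate into one object."""
    members = [row for row in rows if predicate(row)]
    return {
        "name": name,
        # The object's spatial reference: the locations of its member tuples
        "locations": [location for location, _, _ in members],
        "tuples": members,
    }

cold_and_wet = build_object(
    weather,
    "cold and wet",
    lambda row: row[1] < 0 and row[2] > 10,
)
print(cold_and_wet["locations"])
```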
17. Object-based Models (2)
- The object-based model maps directly to the object-oriented model we are familiar with in computing science
  - Objects have attributes, some of which happen to be spatial and therefore have values related to space (or geography)
- Field-based models can also be mapped to object-oriented models, but not directly
- Field-based and object-based models are complementary, not competing
  - Both are useful in different contexts
18. Spatial Databases
- Connected set of themes (corresponding to relations/tables in the relational model)
  - Each of these is a collection of geographic objects
- Geographic objects have two components
  - Description: non-spatial attributes
  - Spatial component: spatial attributes
    - Geometric attributes such as location and shape
    - Topological attributes such as adjacency
- Two example themes
  - Countries (name, population, georegion)
  - Languages (language, georegion)
19. Countries
20. Languages
21. Queries on Spatial databases
- Familiar operations from relational algebra can be defined on themes
- Theme projection
  - $\pi_{\text{population},\,\text{geo}}(\text{Countries})$
- Theme selection, similar to relational selection
  - $\sigma_{\text{population} > 50}(\text{Countries})$
- Theme union, similar to relational union
- You can work these out yourself
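
Theme projection and selection can be sketched as follows. This assumes a theme is held as a list of dictionaries and that each geometry is simplified to a bounding rectangle `(min_x, min_y, max_x, max_y)`; the country names, populations and geometries are placeholders, not real data.

```python
# Hypothetical Countries theme; "geo" is a bounding rectangle
countries = [
    {"name": "Country A", "population": 80, "geo": (0, 0, 4, 4)},
    {"name": "Country B", "population": 30, "geo": (4, 0, 8, 4)},
    {"name": "Country C", "population": 60, "geo": (0, 4, 8, 8)},
]

def project(theme, attributes):
    """Theme projection: keep only the named attributes of every object."""
    return [{a: obj[a] for a in attributes} for obj in theme]

def select(theme, predicate):
    """Theme selection: keep only the objects that satisfy the predicate."""
    return [obj for obj in theme if predicate(obj)]

projected = project(countries, ["population", "geo"])
populous = select(countries, lambda obj: obj["population"] > 50)

print(projected[0])
print([obj["name"] for obj in populous])
```

Both operations return a new theme, so they can be chained in the same way as relational operators.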
22. Spatial Join
- In a relational database, join queries help users to connect (link, or join) tables
- Spatial databases allow users to join themes
  - These are called theme overlays
  - An object of one theme is joined with an object of the other theme if their geometries intersect
- In our example, the resulting theme will show all the rows and columns of both the tables
- You can work it out yourself
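
A theme overlay can be sketched as a nested loop that joins two objects whenever their geometries intersect. The sketch assumes geometries are bounding rectangles `(min_x, min_y, max_x, max_y)` and stores the intersection as the geometry of the joined object, which is one plausible choice rather than something the slides specify. All names and coordinates are placeholders.

```python
def intersection(a, b):
    """Intersection of two rectangles, or None if they share no area."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x < max_x and min_y < max_y:
        return (min_x, min_y, max_x, max_y)
    return None

def overlay(theme_a, theme_b):
    """Join every pair of objects whose geometries intersect."""
    joined = []
    for a in theme_a:
        for b in theme_b:
            shared = intersection(a["geo"], b["geo"])
            if shared is not None:
                # Columns of both themes, with the shared geometry as "geo"
                joined.append({**a, **b, "geo": shared})
    return joined

countries = [
    {"name": "Country A", "population": 80, "geo": (0, 0, 4, 4)},
    {"name": "Country B", "population": 30, "geo": (4, 0, 8, 4)},
]
languages = [
    {"language": "Language X", "geo": (0, 0, 6, 4)},
    {"language": "Language Y", "geo": (6, 0, 8, 4)},
]

result = overlay(countries, languages)
for row in result:
    print(row["name"], row["language"], row["geo"])
```

A real GIS would test polygon intersection and use a spatial index instead of comparing every pair.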
23. Special Queries
- Some queries on spatial databases are more complicated than relational queries
  - Window query: select the objects that overlap a given window or area
  - Point query: select the objects that contain the given point
  - Clipping: select the objects that overlap the given window, keeping only the exact intersection of each object's geometry with the window
- To process such queries, a GIS possesses geometric and topological sense
- We will not go into the details here
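
The three queries can be illustrated with rectangles standing in for real geometries. This is a simplified sketch under that assumption: objects and windows are `(min_x, min_y, max_x, max_y)` rectangles, and the theme contents are made up.

```python
def intersection(a, b):
    """Intersection of two rectangles, or None if they share no area."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x < max_x and min_y < max_y:
        return (min_x, min_y, max_x, max_y)
    return None

def window_query(theme, window):
    """Objects whose geometry overlaps the window (geometry unchanged)."""
    return [o for o in theme if intersection(o["geo"], window) is not None]

def point_query(theme, x, y):
    """Objects whose geometry contains the point (x, y)."""
    return [
        o for o in theme
        if o["geo"][0] <= x <= o["geo"][2] and o["geo"][1] <= y <= o["geo"][3]
    ]

def clip(theme, window):
    """Objects overlapping the window, each cut down to its part inside it."""
    clipped = []
    for o in theme:
        part = intersection(o["geo"], window)
        if part is not None:
            clipped.append({**o, "geo": part})
    return clipped

theme = [
    {"name": "A", "geo": (0, 0, 4, 4)},
    {"name": "B", "geo": (4, 0, 8, 4)},
    {"name": "C", "geo": (0, 4, 8, 8)},
]
window = (3, 1, 5, 3)

in_window = window_query(theme, window)
at_point = point_query(theme, 1, 1)
clipped = clip(theme, window)
print([o["name"] for o in in_window], [o["name"] for o in at_point], clipped)
```

The difference between the window query and clipping is visible in the results: both return objects A and B, but only clipping changes their geometry.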
24. Visualization of Spatial Data
- Results of theme operations are not very useful if shown as tables
  - They are normally shown as maps in a GIS
- Theme overlay is the main operation for creating maps in a GIS
  - Data belonging to the required themes is retrieved from the database and plotted as overlays in a GIS (you will learn to use overlays in the practical)
- As discussed with other visualizations, geo-visualization (or map drawing) also has two aspects
  - Designing the map
  - Rendering the map
25. Visualization of Spatial Data (2)
- Maps can be rendered using
  - Vector graphics
  - Raster graphics
- This distinction can be traced back to the distinction between
  - Object-based data models (vector models)
  - Field-based data models (raster models)
- Many modern GIS allow mixing and matching these two modes to render maps
  - Google Maps overlays vector-based spatial information on top of raster satellite images
  - This is the approach we use in our practicals, where we write Java code for visualizing spatial data
26. Spatial Data Analysis
- Techniques to analyse data taking into consideration their location information
  - Results of spatial data analysis change if the spatial distribution of data changes
  - How does data vary in space?
- There are many stages of spatial data analysis
  - Pre-processing or smoothing
  - Exploratory Spatial Data Analysis
  - Model building
    - For event prediction and hypothesis testing
    - For communication
- Very similar to the stages involved in processing time series
27. Data quality - Smoothing
- Data quality is a serious issue in spatial databases
  - Inaccuracies in the measurement of location information
    - E.g. inaccuracies due to approximations in GPS
  - Inaccuracies due to integrating data (particularly in a GIS) from different sources, each of which uses a different approximation of location information
- Simple smoothing techniques such as mean and median filters (refer to lecture 4) are still useful
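
For gridded data, mean and median filters can be applied over a small spatial neighbourhood. The sketch below assumes a 3 x 3 window that is clipped at the grid edges; the window size, the function name `smooth` and the sample grid are assumptions, and lecture 4 may define the filters differently.

```python
from statistics import mean, median

def smooth(grid, reducer=median):
    """Replace each cell by the reducer of its 3 x 3 neighbourhood.

    Cells on the edge use whatever part of the window lies inside the grid.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            new_row.append(reducer(window))
        smoothed.append(new_row)
    return smoothed

# Made-up grid with a single outlier in the centre
noisy = [
    [10, 10, 10],
    [10, 99, 10],
    [10, 10, 10],
]

by_median = smooth(noisy, median)
by_mean = smooth(noisy, mean)
print(by_median[1][1], by_mean[1][1])
```

The median filter removes the isolated outlier entirely, while the mean filter only spreads it over the neighbouring cells.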
28. Exploratory Spatial Data Analysis (ESDA)
- ESDA involves identifying data properties and formulating hypotheses from data
  - Visualization of data using a GIS is particularly suited for ESDA
  - Results from ESDA often form input to subsequent stages of analysis
- ESDA is an important step in the development life cycle
  - Developers gain a lot of understanding of the underlying phenomena by performing ESDA
  - As a result, developers have a better understanding of user requirements
  - This helps them produce a better system design to fulfil user requirements
29. Spatial Data Types
- Three types
  - Data referenced to a point
    - E.g. location information of a restaurant
  - Data referenced to a path
    - E.g. path information from my home to the University
  - Data referenced to an area
    - E.g. information about a region bounded by a polygon
- We can transform point data into area data by aggregating values over all the points in an area
- Different data analysis tasks and techniques are employed for each of these data types
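
Turning point data into area data by aggregation might look like the following. The sketch assumes rectangular areas `(min_x, min_y, max_x, max_y)` and uses the mean as the aggregate; the area names, coordinates and values are made up.

```python
def aggregate_points(points, areas):
    """Average the values of the points that fall inside each area.

    points: list of (x, y, value)
    areas:  dict of area name -> (min_x, min_y, max_x, max_y)
    Areas that contain no points get None.
    """
    result = {}
    for name, (min_x, min_y, max_x, max_y) in areas.items():
        inside = [
            value for x, y, value in points
            if min_x <= x < max_x and min_y <= y < max_y
        ]
        result[name] = sum(inside) / len(inside) if inside else None
    return result

# Hypothetical point observations, e.g. house prices in thousands
points = [(1, 1, 200.0), (2, 3, 220.0), (6, 1, 150.0)]
areas = {
    "West": (0, 0, 4, 4),
    "East": (4, 0, 8, 4),
    "North": (0, 4, 8, 8),
}

area_values = aggregate_points(points, areas)
print(area_values)
```

Other aggregates such as counts or sums fit the same pattern; which one is appropriate depends on the attribute.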
30. Points Data
- Event prediction
  - E.g. given the spatial distribution of crimes in an area, predict the likely location of a future crime
- Given some actual observations, predict unknown values at intermediate locations by interpolation
- Spatial regression
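
Interpolation at an intermediate location can be sketched with inverse distance weighting (IDW). The lecture does not say which interpolation method it has in mind, so IDW is used here only as a simple, well-known example; the `power` parameter and the observations are assumptions.

```python
import math

def idw(observations, x, y, power=2):
    """Inverse distance weighted estimate at (x, y).

    observations: list of (x, y, value); nearer observations weigh more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for ox, oy, value in observations:
        distance = math.hypot(x - ox, y - oy)
        if distance == 0:
            return value  # exactly on an observation: use it directly
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Two made-up observations 10 units apart
observations = [(0, 0, 10.0), (10, 0, 20.0)]

midway = idw(observations, 5, 0)   # equally far from both observations
near_first = idw(observations, 2, 0)
print(midway, near_first)
```

The estimate near the first observation stays close to its value, which reflects the auto-correlation assumption that nearby things are more related than distant things.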
31. Paths Data
- Finding the least-cost path over a route map
  - Navigation systems in modern cars find paths and communicate the path information graphically and by speech
- A navigation system is a good example of the kind of system we are interested in in this course
  - They analyse spatial data to extract important information
  - They also communicate the extracted information in different forms to suit the user
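
Finding a least-cost path over a route map is commonly done with Dijkstra's algorithm. The slides do not name an algorithm, so the sketch below is one standard option; the road network and its costs are made up.

```python
import heapq

def least_cost_path(graph, start, goal):
    """Dijkstra's algorithm; returns (cost, path) or None if unreachable.

    graph: dict of node -> dict of neighbour -> non-negative edge cost
    """
    queue = [(0, start, [start])]
    settled = {}  # cheapest known cost of each node already expanded
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already expanded via a path at least as cheap
        settled[node] = cost
        for neighbour, edge_cost in graph.get(node, {}).items():
            heapq.heappush(
                queue, (cost + edge_cost, neighbour, path + [neighbour])
            )
    return None

# Hypothetical one-way road network with travel costs
roads = {
    "Home": {"A": 4, "B": 1},
    "A": {"University": 1},
    "B": {"A": 2, "University": 6},
    "University": {},
}

route = least_cost_path(roads, "Home", "University")
print(route)
```

A navigation system would then turn the returned path into a map display or spoken directions, which is the communication step mentioned above.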
32. Area/Lattice data
- The public domain is flooded with this type of data
  - E.g. census data is available to the public as aggregated values over a census tract
    - SCROL: Scotland's Census Results Online
  - Weather parameters such as temperature and rainfall are reported as aggregated values over a region such as Grampian or Lothian
  - Disease count data, where counts of a disease are recorded for regions or counties
- Technology to analyse and communicate this type of data has a large impact on public life
33. Segmentation
- Analysis of area data to find regions that have similar values of one or more non-spatial attributes
  - E.g. segmentation finds areas in a country with high family income
- Visualization of segments is done using maps, with different segments shown in different colours
- Many computational approaches to segmenting area data
  - Partitioning
  - Hierarchical
  - Density-based
  - Grid-based
  - Model-based
34. Typical area analysis problem
- Input
  - A table of area names and their corresponding attributes such as population density, number of adult illiterates, etc.
  - Information about the neighbourhood relationships among the areas
  - A list of categories/classes of the attributes
- Output
  - Grouped (segmented) areas where each group has areas with similar attribute values
- Visualizations using maps do not need a segmentation process
  - The Census website has plenty of examples
    - http://www.statistics.gov.uk/census2001/censusmaps/index.html
- Textual presentation of segmented data requires segmentation
  - Textual presentations are useful for visually impaired users
35. Similarity with image segmentation
- Spatial segmentation is performed in image processing as well
  - Identify regions (areas) of an image that have similar colour (or other image attributes)
- Many image segmentation techniques are available
  - E.g. the region-growing technique
36. Region Growing Technique
- There are many flavours of this technique
- One of them is described below
  - Assign seed areas to each of the segments (classes of the attribute)
  - Add neighbouring areas to these segments if the incoming areas have similar values of attributes
  - Repeat the above step until all the areas are allocated to one of the segments
- You will work with a version of this technique in practical 6
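
The steps above can be sketched as follows. This is one possible flavour and not necessarily the version used in the practical: it assumes a single numeric attribute, treats an area as similar when its value is within a `tolerance` of the segment's seed value, and stops when no more areas can be added, so areas that match no segment stay unassigned. The areas, values and neighbourhood relationships are made up.

```python
from collections import deque

def grow_regions(values, neighbours, seeds, tolerance):
    """Region growing over an area adjacency graph.

    values:     area -> attribute value
    neighbours: area -> list of adjacent areas
    seeds:      segment name -> seed area
    Returns a dict of area -> segment name for every allocated area.
    """
    segment_of = {area: segment for segment, area in seeds.items()}
    frontier = deque(segment_of)
    while frontier:
        area = frontier.popleft()
        segment = segment_of[area]
        seed_value = values[seeds[segment]]
        for candidate in neighbours[area]:
            if candidate in segment_of:
                continue  # already allocated to a segment
            if abs(values[candidate] - seed_value) <= tolerance:
                segment_of[candidate] = segment
                frontier.append(candidate)
    return segment_of

# Six hypothetical areas laid out as a grid:  A B C
#                                             D E F
income = {"A": 50, "B": 48, "C": 45, "D": 20, "E": 22, "F": 18}
adjacent = {
    "A": ["B", "D"], "B": ["A", "C", "E"], "C": ["B", "F"],
    "D": ["A", "E"], "E": ["B", "D", "F"], "F": ["C", "E"],
}
seeds = {"high": "A", "low": "F"}

segments = grow_regions(income, adjacent, seeds, tolerance=6)
print(segments)
```

Comparing against the seed value keeps the sketch short; comparing against the running mean of the segment is a common alternative.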
37. Spatio-temporal data analysis
- Many spatial data sets have a temporal dimension as well
  - Census data from several censuses (the UK holds a census every 10 years) is spatio-temporal
  - Weather data for a region collected over a period of time is spatio-temporal
- Spatio-temporal data analysis is concerned with data variation in space and time
- Graphical animations of spatial displays can help visualize spatio-temporal data
38. Summary
- GIS combines data analysis and visualization seamlessly
- Spatial data analysis is concerned with data variation in space
  - How data changes with location
- Spatial data analysis is different because of auto-correlation and heterogeneity in spatial data
- Area data is ubiquitous, and segmentation of area data can be achieved by region growing approaches