Title: GIS Fundamentals / Geographic Database Design
1 GIS Fundamentals / Geographic Database Design
2 GIS Concepts
- Information cycle
- Data/Information/System/Information System
- Geographic Information System
- Main Components/Characteristics
- Geographic Database
- Data Modeling
- Data Representation
- Spatial Analysis
- Implementing a GIS
3 Information Cycle
[Diagram: the information cycle, with the labels Territory, Data, GIS, DSS, Information, and Decision]
4 Data / Information
- Information is the result of interpreting the
relations that exist among a number of
single elements (called data).
- Example: The museum located at 5th Avenue, NY, was built
in 1898.
- Data: museum, address, year of construction.
5 System
- A system is an organized set of elements that
work together to produce a result.
- Example: water supply system
- Elements: pipes, valves, hydrants, water meters,
pumps, reservoirs, etc.
6 Information System (IS)
- An information system is an organized set of
elements (data, equipment, procedures, users)
that work together to produce a result
(information).
7 GIS = G + IS
- Definition
- A GIS is a collection of computer hardware and
software, geographic data, methods, and personnel
assembled to capture, store, analyze, and display
geographically referenced information in order to
solve complex management and planning
problems.
8 Components of a GIS
9 GIS Components
[Diagram: geographic data enter the GIS as input; geographic information leaves it as output]
- Input: maps, census, field data, RS data, others
- GIS: data capture, storage, manipulation and analysis, display, user interface
- Connected to: models, other GIS
- Output: reports, maps, photographic products, statistics, input data for models
10 GIS Main Characteristics
- Integration of Multiple data
- - Sources
- - Scales
- - Formats
- Geographic Database
- Spatial Analysis
11 Data from multiple sources, at multiple scales, in
multiple formats
Census / tabular data
Maps
Pictures / multimedia
GPS / air photos / satellite images
12 Referencing map features: coordinate systems and
map projections
- To integrate geographic data from many
different sources, we need to use a consistent
spatial referencing system for all data sets.
13 The Latitude/Longitude reference system
- latitude φ: angle from the equator to the
parallel
- longitude λ: angle from the Greenwich meridian
14 Map Projections
- The curved surface of the Earth needs to be
flattened to be presented on a map.
- Projection is the method by which the curved
surface is converted into a flat representation.
15 Map Projections (Cont.)
- We can think of a projection as a light source
located inside the globe which projects the
features on the Earth's surface onto a flat map.
- Point p on the globe becomes point p' on the map.
16 Distortion in Map Projections
- Some distortion is inevitable.
- Distortion is small if maps show only small areas,
but large if the entire Earth is shown.
- Projections are classified according to which
properties they preserve: area, shape, angles,
distance.
17 Compromise projections
- Do not preserve any property, but represent a
good compromise between the different objectives.
- e.g., the Robinson projection for the world
18 Compromise projections
19 UTM: Universal Transverse Mercator
- Minimal distortions of area, angles, distance, and
shape at large and medium scales
- Very popular for large- and medium-scale mapping
(e.g., topographic maps)
20 UTM
- Cylindrical projection with a central meridian
that is specific to a standard UTM zone
- 60 zones around the world
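A minimal sketch of how a longitude maps to one of the 60 UTM zones and to that zone's central meridian. It assumes the standard 6-degree zone width with zone 1 starting at 180° W, and it ignores the special zones around Norway and Svalbard. The function names are illustrative, not part of any library.

```python
import math


def utm_zone(longitude_deg: float) -> int:
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    # Zones are 6 degrees wide and numbered eastward starting at 180 W.
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    # Longitude +180 would give 61; treat it as part of zone 60.
    return min(zone, 60)


def central_meridian(zone: int) -> float:
    """Return the central meridian (degrees east) of a UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError("zone must be between 1 and 60")
    return zone * 6.0 - 183.0


if __name__ == "__main__":
    lon = -74.0  # roughly the longitude of New York
    zone = utm_zone(lon)
    print(zone, central_meridian(zone))
```

Data sets that fall in different zones use different central meridians, so they need to be brought into one common referencing system before being combined.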
21 Space as an indexing system
22 The concept of scale
- Scale is the ratio between distances on a map and
the corresponding distances on the Earth's
surface.
- e.g., a scale of 1:100,000 means that 1 cm on the
map corresponds to 100,000 cm or 1 km in the real
world.
23 The concept of scale
- Scale is essentially a ratio or representative
fraction.
- Small scale: a small fraction such as 1:10,000,000
shows only large features.
- Large scale: a large fraction such as 1:25,000
shows great detail for a small area.
- Small scale and large scale are often confused.
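The scale arithmetic can be sketched as follows. The helper names are illustrative, and the scale is passed as its denominator (100000 for 1:100,000).

```python
def to_ground_m(map_cm: float, scale_denominator: int) -> float:
    """Convert a distance measured on the map (cm) to ground distance (m)."""
    return map_cm * scale_denominator / 100.0


def to_map_cm(ground_m: float, scale_denominator: int) -> float:
    """Convert a ground distance (m) to the distance drawn on the map (cm)."""
    return ground_m * 100.0 / scale_denominator


if __name__ == "__main__":
    # 1 cm on a 1:100,000 map is 1 km on the ground.
    print(to_ground_m(1.0, 100_000))
    # The same 1 km feature drawn at a large and at a small scale.
    print(to_map_cm(1000.0, 25_000), to_map_cm(1000.0, 10_000_000))
```

A 1 km feature is 4 cm long at 1:25,000 but only 0.01 cm at 1:10,000,000, which is why the larger fraction is the "large" scale.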
24 Multi-scales
- The same feature represented at different scales.
- Example: lake
Large scale (1:25,000)
Small scale (1:500,000)
25 Multi-formats
- Raster
- Vector
- Raster-Vector-Raster
- DXF-DGN-etc.
- Shapefile
- KML
- Etc.
26 Geographic Database
- Geographic Data
- Characteristics
- Examples
- Geographic Dataset
- Geographic Database Concepts
- Spatial entity
- Data Modeling
27 Descriptive Data vs Geographic Data
- General Data
- Descriptive attributes
- Geographic Data
- Descriptive attributes
- Spatial attributes
- Location
- Form
28 Geographic Data Characteristics
- Position
- explicit geographic reference
- Cartesian coordinates X, Y, Z
- Geographic coordinates (lat, long)
- implicit geographic reference
- Address
- Place-name
- Etc.
- Geometric Form
- e.g., a polygon representing a parcel of land
29 Example 1: Parcel of land
- Attribute (descriptive) data
- Landowner
- Area
- Etc.
- Spatial data
- Position
- Located at 100 Nelson Mandela Ave
- X = a, Y = b within the coordinate system (X, Y)
- Form
- dimensions (sides and arcs, constituting a
polygon)
30 Example 2: District
- Attribute (Descriptive) data
- District-Code
- District-Name
- Population 1990
- Population 2000
- Population 2010
- Spatial data
- Geographical Position
- Polygon
31 Geographic Database
- Definition
- Components
- Spatial Entity/Attribute/Dataset
- Data Modeling/Data Dictionary
- Spatial Representation
- Vector/Raster
- Topology
- Standard Spatial Operations
32 Spatial entity
- We use the term entity to refer to a phenomenon
that cannot be subdivided into like units.
- Example: a house is not divisible into houses,
but can be split into rooms.
- Others: a lake, a statistical unit, a school,
etc.
- In database management systems, an entity is the
collection of objects that share the same attributes.
- An entity is referenced by a single identifier,
perhaps a place-name, or just a code number.
33 Attribute
- Each spatial entity has one or more attributes
that identify what the entity is and describe
it.
- Example: you can categorize roads by whether
they are local roads, highways, etc.; by their
length; their width; their pavement; etc.
- The type of analysis you plan to do depends on
the type of attributes you are working with.
34 Dataset
- A dataset is a single collection of values or
objects without any particular requirement as to
form of organization.
35 Geographic Database
- A geographic database is a collection of spatial
data and related descriptive data organized for
efficient storage, manipulation, and analysis by
many users.
- It supports all the different types of data that
can be used by a GIS, such as:
- Attribute tables
- Geographic features
- Satellite and aerial imagery
- Surface modeling data
- Survey measurements
36 Data Modeling
- Data Approach
- Modeling Process
- Entity/Relationship Approach
- Example
37 Modeling Process
Abstracting the Real World
[Diagram: Reality -> Modeling (data treatment) -> Geographic Database]
38 ANSI/SPARC Study Group on Data Base Management
Systems (1975)
Different users have different views of the world.
Real World
External Model 1
External Model 2
External Model 3
Conceptual Model
Logical Model
Physical Model
39 Conceptual Model
- A synthesis of all external models (users'
views).
- Schematic representations of phenomena and how
they are related.
- Describes the information content of the database
(not the physical storage), so that the same conceptual
model may be appropriate for diverse physical
implementations.
- Therefore, the conceptual model is independent
of technology.
40 Conceptual Model (cont.)
- Easy to read
- Conceived for the analyst or designer
- Objective representation of reality,
therefore independent of the selected GDB
system
- One conceptual model for the database
41 Data Logical Model / Physical Model
- We transform the conceptual model into a new
modeling level which is more computing oriented:
the logical model (example: the relational
database approach).
- We transform the logical model into an internal
model (physical model), which is concerned with
the byte-level data structure of the database.
- Whereas the logical model is concerned with
tables and data records, the physical model deals
with storage devices, file structure, access
methods, and locations of data.
42 Several types of data organization
- Hierarchical model
- - Hierarchical relationships between data
(parent-child)
- Network model
- - Focus on connections
- Relational model
- - Based on relations (tables)
- Object-oriented model
- - Focus on objects
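43 Entity-relationship Formalism
[Diagram: two entity boxes, each showing an ENTITY_NAME and its attributes (attribute 1, attribute 2), joined by an association (relationship). Labels point out the entity name, the attributes, the identifier (key attribute), and the minimum and maximum cardinality at each end of the association (0-N and 0-1 in the example).]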
44 An example of land parcels
45 The E/R diagram for land parcels
Entities
- STREET (name)
- PARCEL (number)
- SEGMENT (number)
- POINT (number; x,y)
- LANDOWNER (name; date-of-birth)
Associations
- A: streets have edges (segments)
- B: parcels have boundaries (segments)
- C: lines have two endpoints
- D: parcels have owners, and people own land
Cardinalities shown in the diagram: 2-N, 0-1, 3-N, 1-2, 1-N, 2-2, 2-N, 1-N
46 Data Tables
47 Data Dictionary
- Definition
- A data catalog that describes the contents of a
database. Information is listed about each field
in the attribute table and about the format,
definitions and structures of the attribute
tables. A data dictionary is an essential
component of metadata information.
48 Example: Census GIS database
- Basic elements
- Entity: administrative or census units
- enumeration areas
- Entity type / Relations
- Components of a digital spatial census database
- Boundary database
- Geographic attribute tables
- Census data tables
49 Relations
The EA entity can be linked to the entity "crew
leader area". The table for this entity could have
attributes such as the name of the crew leader,
the regional office responsible, contact
information, and the crew leader code (CL-code)
as primary key, which is also present in the EA
entity.
[Diagram: Crew leader area (CL-code, Name, RO responsible) linked through relationship R to EA (EA-code, Area, Pop.); cardinalities 1-N and 1-1]
50 Entity: Enumeration areas
Type (attributes): EA-code (identifier), Area, Pop., CL-code
EA-code  Area  Pop.  CL-code
50101    28.5  988   78
50102    20.2  708   78
50103    18.1  590   78
50104    22.4  812   78
50201    19.3  677   79
50202    17.6  907   79
50203    25.7  879   79
50204    26.8  591   79
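Because the CL-code appears in every EA record, the EA table can be summarized per crew leader area. The sketch below copies the eight rows from the table on this slide; the function name and the dictionary layout are illustrative assumptions.

```python
from collections import defaultdict

# Rows from the enumeration-area table: (EA-code, Area, Pop., CL-code).
EA_ROWS = [
    (50101, 28.5, 988, 78),
    (50102, 20.2, 708, 78),
    (50103, 18.1, 590, 78),
    (50104, 22.4, 812, 78),
    (50201, 19.3, 677, 79),
    (50202, 17.6, 907, 79),
    (50203, 25.7, 879, 79),
    (50204, 26.8, 591, 79),
]


def totals_by_crew_leader(rows):
    """Sum EA count, area, and population for each CL-code."""
    totals = defaultdict(lambda: {"eas": 0, "area": 0.0, "pop": 0})
    for _ea_code, area, pop, cl_code in rows:
        entry = totals[cl_code]
        entry["eas"] += 1
        entry["area"] += area
        entry["pop"] += pop
    return dict(totals)


if __name__ == "__main__":
    # The EA-code is the identifier, so it must be unique.
    codes = [row[0] for row in EA_ROWS]
    assert len(codes) == len(set(codes))
    for cl_code, entry in sorted(totals_by_crew_leader(EA_ROWS).items()):
        print(cl_code, entry["eas"], round(entry["area"], 1), entry["pop"])
```

In a relational database the same result would come from grouping the EA table on its CL-code column.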
51 Components of a digital spatial census database
52 Data Representation
[Diagram: the real world represented as raster and as vector]
53 Two Fundamental Types of Data
- GIS work with two fundamentally different types
of geographic information:
- Vector
- Raster (or Grid)
- Both types have unique advantages and
disadvantages.
- A GIS should be able to handle both types.
54 Vector vs Raster, or Discrete vs Continuous
[Diagram: a river shown as raster cells and as a vector line with coordinates x1,y1 ... xn,yn]
55 Raster Data
- A raster image is a collection of grid cells,
like a scanned map or picture.
- Raster data are extremely useful for continuous
data representation:
- elevation
- slope
- modeling surfaces
- Satellite imagery and aerial photos are commonly
used raster data sets.
56 Vector Data
- Vector data are stored as a series of x,y
coordinates.
- Good for discrete data representation:
- points: wells, town centroids
- lines: roads, rivers, contours
- polygons: enumeration areas,
districts, town boundaries, building footprints
57 Raster-Vector conversion (vectorization)
58 Vector to Raster Conversion: Polygons
59 Vector to Raster Conversion: Lines
60 Raster to Vector Conversion: Polygons
61 Raster to Vector Conversion: Polygons
62 Vector data and image (raster)
63 Vector: Points, lines, polygons
- Set of geometric primitives: points, lines, polygons
[Figure: the primitives drawn on x and y axes, with a node and a vertex labeled]
64 Vector Structure
- Spaghetti
- Topology
- Network (graph)
65 Spaghetti File
- No topology: raw file or "spaghetti" file
- Lines are not connected and have no intelligence
66 Example of Spaghetti data structure
Poly  Coordinates
A     (1,4), (1,6), (6,6), (6,4), (4,4), (1,4)
B     (1,4), (4,4), (4,1), (1,1), (1,4)
C     (4,4), (6,4), (6,1), (4,1), (4,4)
[Figure: polygons A, B, and C drawn on a grid with both axes running from 1 to 6]
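A short sketch of what "no intelligence" means in practice, using the three coordinate lists from this slide: the boundaries that polygons share are stored once per polygon, and the only way to discover that two polygons touch is to compare coordinates. The function names are illustrative.

```python
from itertools import combinations

# Spaghetti structure: each polygon is an independent closed ring of points.
SPAGHETTI = {
    "A": [(1, 4), (1, 6), (6, 6), (6, 4), (4, 4), (1, 4)],
    "B": [(1, 4), (4, 4), (4, 1), (1, 1), (1, 4)],
    "C": [(4, 4), (6, 4), (6, 1), (4, 1), (4, 4)],
}


def edges(ring):
    """Return the ring's segments as unordered pairs of points."""
    return {frozenset(pair) for pair in zip(ring, ring[1:])}


def shared_edges(polygons):
    """Find segments that are stored in more than one polygon."""
    result = {}
    for a, b in combinations(sorted(polygons), 2):
        common = edges(polygons[a]) & edges(polygons[b])
        if common:
            result[(a, b)] = sorted(tuple(sorted(edge)) for edge in common)
    return result


if __name__ == "__main__":
    for pair, segments in shared_edges(SPAGHETTI).items():
        print(pair, segments)
```

Each of the three shared segments is held twice, so an edit to one copy but not the other leaves the data inconsistent.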
67 Topology
- Data structure in which each point, line, and
piece or whole of a polygon:
- knows where it is
- knows what is around it
- understands its environment
- knows how to get around
- Helps answer the question "what is where?"
68 Topology: Spatial Relationships
- Adjacency: left polygon A, right polygon B
- Connectivity: at node 1, chain A is connected to chains B and C
- Containment: polygon B is contained within polygon A
69 Example of Topological data structure
Node  X  Y  Lines
I     1  4  1,2,4
II    4  4  4,5,6
III   6  4  1,3,5
IV    4  1  2,3,6

Poly  Lines
A     1,4,5
B     2,4,6
C     3,5,6

Line  From Node  To Node  Left Poly  Right Poly
1     I          III      O          A
2     I          IV       B          O
3     III        IV       O          C
4     I          II       A          B
5     II         III      A          C
6     II         IV       C          B

O = outside polygon
[Figure: polygons A, B, and C with nodes I to IV and lines 1 to 6, drawn on a grid with both axes running from 1 to 6]
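With the line table of this slide, adjacency is read directly from the left and right polygon columns instead of being computed from coordinates. The sketch below uses only that table, shows the lookup, and rebuilds the node and polygon tables as a consistency check. The function names are illustrative.

```python
# Line table: line id -> (from node, to node, left polygon, right polygon).
LINES = {
    1: ("I", "III", "O", "A"),
    2: ("I", "IV", "B", "O"),
    3: ("III", "IV", "O", "C"),
    4: ("I", "II", "A", "B"),
    5: ("II", "III", "A", "C"),
    6: ("II", "IV", "C", "B"),
}
OUTSIDE = "O"


def adjacency(lines):
    """Polygons are adjacent when they lie on either side of the same line."""
    neighbours = {}
    for _from, _to, left, right in lines.values():
        if OUTSIDE in (left, right):
            continue
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    return neighbours


def lines_of_polygon(lines):
    """Rebuild the polygon table: which lines bound each polygon."""
    result = {}
    for line_id, (_from, _to, left, right) in sorted(lines.items()):
        for poly in (left, right):
            if poly != OUTSIDE:
                result.setdefault(poly, []).append(line_id)
    return result


def lines_at_node(lines):
    """Rebuild the node table: which lines meet at each node."""
    result = {}
    for line_id, (start, end, _left, _right) in sorted(lines.items()):
        for node in (start, end):
            result.setdefault(node, []).append(line_id)
    return result


if __name__ == "__main__":
    print(adjacency(LINES))
    print(lines_of_polygon(LINES))
    print(lines_at_node(LINES))
```

The rebuilt tables match the polygon and node tables on the slide, which shows that those two tables are derived from the line table.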
70 Encoding Topology (not): CAD
71 Encoding Topology: GIS
72 Comparison
Advantages
- Spaghetti: set of independent objects; representation of heterogeneous objects within the same model; appropriate to CAD
- Topology: pre-calculation of topological relations; maintenance of topological constraints; correspondence with exchange formats
73 Comparison (cont.)
Disadvantages
- Spaghetti: spatial relationships must be calculated; risk of incoherence (duplication of common boundaries)
- Topology: high cost of updates; many levels of indirection for complex objects; maintenance
74 Some well-known Topological models
- TIGER: Topologically Integrated Geographic
Encoding and Referencing (Census Bureau of the
USA)
- Line is the principal element, to which
point and area features are related
- ARC/INFO model: ESRI
- Point, Line, Polygon
75 TIGER Data: Polygon
Counties
MCDs
Census Tracts
Block Groups
Zip Codes
Cities
Voting Districts
76 TIGER Data: Line
Streams
Streets
Railroads
77 TIGER Data: Point
Key Locations
Landmarks
Place Names
ZIP+4 Centroids
78 Recapitulation on spatial models
- Transformations between models
- vectorization of raster images (costly)
- topology toward spaghetti (easy)
- spaghetti toward topology (possible but costly)
- The vector model is the most used, essentially with topology
- It is useful to integrate raster and vector
79 Spatial Analysis: Query
- select features by their attributes
- find all districts with literacy rates < 60%
- select features by geographic relationships
- find all family planning clinics within this
district
- combined attribute/geographic queries
- find all villages within 10 km of a health
facility that have high child mortality
- Query operations are based on the SQL
(Structured Query Language) concept
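A minimal sketch of an attribute query expressed in SQL, run here with Python's built-in sqlite3 module. The table name, column names, and district values are hypothetical and serve only to illustrate the "literacy rate below 60" selection.

```python
import sqlite3

# In-memory database holding a hypothetical district attribute table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE district (code TEXT PRIMARY KEY, name TEXT, literacy_rate REAL)"
)
conn.executemany(
    "INSERT INTO district VALUES (?, ?, ?)",
    [
        ("D1", "North", 72.5),
        ("D2", "South", 54.0),
        ("D3", "East", 59.9),
        ("D4", "West", 60.0),
    ],
)

# Select features by their attributes: districts with literacy below 60.
low_literacy = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM district WHERE literacy_rate < 60 ORDER BY name"
    )
]
print(low_literacy)
```

In a GIS, the selected rows stay linked to their geometry, so the result of the query can be shown on the map straight away.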
80 Examples
What is at?
Features that meet a set of criteria
81 Spatial Analysis (cont.)
- Buffer: find all settlements that are more than
10 km from a health clinic
- Point-in-polygon operations: identify for all
villages into which vegetation zone they fall
- Polygon overlay: combine administrative records
with health district data
- Network operations: find the shortest route from
village to hospital
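As an illustration of a network operation, the sketch below finds a shortest route with Dijkstra's algorithm. The slides do not name an algorithm; this is one common choice. The road network, node names, and lengths (in km) are hypothetical.

```python
import heapq


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm. Returns (total length, list of nodes on the route)."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical road network: node -> {neighbouring node: road length in km}.
ROADS = {
    "village": {"junction": 4.0, "bridge": 7.0},
    "junction": {"village": 4.0, "bridge": 2.0, "hospital": 9.0},
    "bridge": {"village": 7.0, "junction": 2.0, "hospital": 3.0},
    "hospital": {"junction": 9.0, "bridge": 3.0},
}

if __name__ == "__main__":
    print(shortest_route(ROADS, "village", "hospital"))
```

The nodes and links of the graph correspond to the nodes and lines of a topological vector structure.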
82 Modeling/Geoprocessing
- modeling: identify or predict a process that has
created or will create a certain spatial pattern
- diffusion: how is the epidemic spreading in the
province?
- interaction: where do people migrate to?
- what-if scenarios: if the dam is built, how many
people will be displaced?
83 Spatial relationships
- Logical connections between spatial objects
represented by points, lines, and polygons
- e.g.,
- - point-in-polygon
- - line-line
- - polygon-polygon
84 Spatial Operations
- adjacent to
- connected to
- near to
- intersects with
- within
- overlaps
- etc.
85 "is nearest to"
- Point/point
- Which family planning clinic is closest to the
village?
- Point/line
- Which road is nearest to the village?
- Same with other combinations of spatial features
86 "is nearest to": Thiessen Polygons
87 "is near to": Buffer Operations
- Point buffer
- Affected area around a polluting facility
- Catchment area of a water source
88 Buffer Operations
- Line buffer
- How many people live near the polluted river?
- What is the area impacted by highway noise?
89 Buffer Operations
- Polygon buffer
- Area around a reservoir where development should
not be permitted
90 "is within": point in polygon
- Which of the cholera cases are within the
containment area?
91 Problem
- We may have a set of point coordinates
representing clusters from a demographic survey,
and we would like to combine the survey
information with data from the census that is
available by enumeration area.
Solution: a point-in-polygon operation will
identify for each point the EA into
which it falls and will attach the census data to
the attribute record of that survey point.
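The solution can be sketched with a ray-casting point-in-polygon test, which is one common way to implement the operation. The EA codes and populations below are taken from the enumeration-area table earlier in the deck; the EA geometries and the survey cluster coordinates are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices, closed or not."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical EA polygons with census population from the EA table.
EAS = {
    50101: {"ring": [(0, 0), (4, 0), (4, 4), (0, 4)], "pop": 988},
    50102: {"ring": [(4, 0), (8, 0), (8, 4), (4, 4)], "pop": 708},
}

# Hypothetical survey clusters: id -> (x, y).
CLUSTERS = {"c1": (1, 1), "c2": (6, 2), "c3": (9, 9)}


def attach_census(clusters, eas):
    """For each survey point, find its EA and attach that EA's census data."""
    result = {}
    for cluster_id, (x, y) in clusters.items():
        result[cluster_id] = None  # stays None when the point is in no EA
        for ea_code, ea in eas.items():
            if point_in_polygon(x, y, ea["ring"]):
                result[cluster_id] = {"ea_code": ea_code, "pop": ea["pop"]}
                break
    return result


if __name__ == "__main__":
    print(attach_census(CLUSTERS, EAS))
```

Points that lie exactly on a boundary are not handled specially in this sketch; a production GIS applies explicit rules for that case.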
92 "overlaps": Polygon overlay
93 Polygon Overlay
94 Data Layers
95 Spatial aggregation
- Example of spatial aggregation
- merging several provinces that together constitute
an economic region
96 Spatial data transformation: interpolation
Example 1: Based on a set of station
precipitation estimates, we can create a
raster surface that shows rainfall in the entire
region.
[Figure: station values 13.5, 20.1, 26.0, 27.2, 12.7, 15.9, 24.5, 26.1]
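The slide does not say which interpolation method produced the surface. As one possible illustration, the sketch below uses inverse distance weighting (IDW). Four of the station values come from the figure; the station coordinates, the grid, and the power parameter are assumptions.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) stations."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a station: use its measured value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def idw_grid(stations, xs, ys, power=2.0):
    """Build a raster (list of rows) of estimates for every cell centre."""
    return [[idw(x, y, stations, power) for x in xs] for y in ys]


# Values from the figure; the coordinates are hypothetical.
STATIONS = [
    (0.0, 0.0, 13.5),
    (10.0, 0.0, 20.1),
    (0.0, 10.0, 26.0),
    (10.0, 10.0, 27.2),
]

if __name__ == "__main__":
    for row in idw_grid(STATIONS, xs=[0.0, 5.0, 10.0], ys=[0.0, 5.0, 10.0]):
        print([round(value, 1) for value in row])
```

Nearby stations dominate each cell's estimate, and a cell equally far from all stations receives their plain average.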
97 GIS capabilities: Visualization
98 Implementing a GIS
- Consider the strategic purpose
- Plan for the planning
- Determine technology requirements
- Determine the end products
- Define the system scope
- Create a data design
- Choose a data model
- Determine system requirements
- Analyze benefits and costs
- Make an implementation plan
Source: Thinking About GIS, Third Edition:
Geographic Information System Planning for
Managers
99 GIS: Enables us to handle very large amounts of
data
- Example: census data
- thousands of EAs
- hundreds of variables
- many complementary data layers
(roads, rivers, public facilities)
- Example: remote sensing
- satellites send huge amounts of data
that need to be processed, interpreted,
and stored
100 GIS: Helps to make data re-usable and useful to
many more users
- Census geography
- EA maps do not have to be redrawn
every time, only updated
- census information can be used for
many more applications
- data sharing among agencies
101 In Conclusion
- GIS for inventory/visualization
- GIS creates maps from data pulled from databases
anytime, at any scale, for anyone
- GIS for database management
- GIS for spatial analysis/modeling
- GIS: a tool to query, analyze, and map data in
support of the decision-making process.
102 What is Not GIS
- GPS (Global Positioning System)
- not just software!
- not just for making maps!
- Maps are both an input to and a product of a
GIS
- A way to visualize the analysis
104 Contact Information
Demographic Statistics Section, UN Statistics Division, New York
globalcensus2010_at_un.org