Title: CP6514 Lecture 4 Databases and GIS
1UST/ENV/PAD/PDD 642/742 Introduction to
Geographic Information Systems
Lecture 4-1 GIS Database Management and
Thematic Mapping I
2Attribute Database and GIS
The power of GIS lies not only in the pretty maps
it can make, but in the quality and amount of
attribute data behind the map
The referenced attribute data can be used to
drive thematic displays, perform spatial
analyses, and derive new data
3Definition of a Database
- A Database is a logically coherent collection
of data about some aspect(s) of the real world,
with some inherent meaning - The Database is designed, implemented and
populated with data for a specific purpose - More informally, a database is an organized
collection of records (rows) representing
distinct persons, places or things. - Each record is comprised of a consistent set
of fields (columns) which contains information
describing that distinct person, place or thing
4A Simple Database
5Field Types Text
- The Text Field Type holds characters
- You can have a single character in a text
string or a set of characters - Depending on the type of information and the
guiding business rules, values that consist
entirely as digits should be considered strings - In queries, text values should be enclosed in
single or double quotes - e.g. LU_CODE A
- Name Abigail Kristoffersson
- Zip_Code 30332
6Field Types Integers
- Integer Field Types hold integer values
restricted to particular integer ranges - The Long Integer holds signed integers
between 2,147,483,648 and 2,147,483,647 - The Short Integer holds signed integers
between 32,768 and 32,767 - Integer arithmetic is very fast, but long
integers have twice the memory requirements as
short integers - e.g. Age 32 short Int
- Yearned_Income 10000000 long Int
7Field Types Floats and Doubles
- Floats and Doubles hold decimal numbers,
depending on size and accuracy - The Float Field Type holds decimal numbers
with up to 7 significant digits - The Double Field Type holds decimal numbers
with up to 17 significant digits with accuracy - Calculations are slower and approximate
- e.g. Height 1.720000 float
- X-Coord 2,134,587.916673144 double
8Field Types Date
- The Date Field Type allows you to store both
date and time information for a range between
Jan. 1, 0100 and Dec. 1, 9999 - e.g. End_of_World Dec. 31, 1999 1159 PM
- End_of_World January 1, 2000
- End_of_World 1/1/2000 0001
- If no time is specified in the date field
value, the system will assume 1200 midnight
9Field Types BLOB
- BLOB stands for Binary Large Object and is the
binary data type of a column in an RDBMS table
that stores large image, text, or geometry data - The BLOB field type essentially can store
complex spatial information - The BLOB field is often the RDBMS equivalent
of the Shape field in the dBase component of a
shapefile
10Common Database Models
- Flat File Data Management Systems
- Hierarchical Data Management Systems
- Network Data Management Systems
- Relational Data Management Systems
11Flat File Database and GIS
Single table of rows and columns Unique
identifier for each record High degree of
redundancy Several null values Not very
flexible
12Relational Database Model
- Consists of several flat file tables or
Entities - Database defined both in terms of entities and
their relationships - Multiple tables can be joined to form a single
virtual table - Both entities and relationships are defined by
unique identifiers or primary keys
13JOINS
- In GISs, new virtual tables are usually
formed when two or more tables are JOINED - In JOINING operations, the source table fields
are appended to the destination table, if the
values in the specified join field coincide --
you therefore have to specify a from source
table, a source field item, a to destination
table, and a destination field item before
implementing a join - JOINs are used when the relationship between a
source and destination has a cardinality ratio
of 1 1 or 1 N - You can then make thematic displays or perform
analyses using the fields of the source table - You have the option of keeping all the source
records, or keeping only those source records
that had valid joins
14One-to-One Relationship (JOIN)
TABLE 2 DESTINATION
TABLE 1 SOURCE
JOIN Relationship TABLE_1 . TR_NAME TABLE_2
. TRACT_NO
New Virtual Table
15Many-to-One Relationship (JOIN)
TABLE 1 SOURCE
TABLE 2 DESTINATION
New Virtual Table
JOIN Relationship TABLE_1. Soil_Code
TABLE_2. Code
16Advantages of Relational Model
- Easier Data Sharing
- Multi-user Transaction Processing
- Multiple Views of Database
- Flexibility, Scale Economies
- Access Control
- Redundancy Control
- Fewer Data Entry Errors
- Greater Data Consistency
- Ease in Backup and Recovery
17Graphic and Attribute Data Linkages (JOIN)
Table 1
Table 2
(SOURCE)
(DESTINATION)
103
I-1
102
104
R-1
C-1
105
101
Interim Output from Table 1 and Table 2
Zoning Layer
Parcel Layer
Table 3.
(Source)
Final Output from Interim Output and Table 3
18RELATE
- In RELATING operations, the procedure involves
declaring the two tables that are going to be
related, and the two fields that contain the
relate values -- the relate operation then
forms a relationship between the two tables,
while visually keeping them separate - RELATES can be used when the relationship
between a source and destination has a
cardinality ratio of N 1 - RELATES are primarily used to select related
records in one table by selecting records in the
other table - Selections may be done either by location
spatial query or by attribute query - Each RELATE is given a user-specified name
19RELATE (source destinationN1)
TABLE 1 SOURCE
TABLE 2 DESTINATION
20Thematic Mapping Numeric Data Classification
- Used for Exploration purposes
- Used to make maps more readable, easier to
understand - Used to depict something about the data or area
that is not self-evident
Numeric Data Classification Methods Natural
Breaks Standard Deviation Quantiles Equal
Interval User Defined or Manual
21Natural Break Classification
- Data broken up into interval ranges such that
- Variance within each range is minimized
- Used for patterns/groupings inherent in data
- Uses Jenks optimization method
22Quantile Classification
- Data broken up into interval ranges such that
- each range has an equal number of observations
- useful for linear patterns in data
- not very useful if disproportionate numbers of
observations with very similar values occur
23Equal Interval Classification
- Data broken up into intervals such that
- The range of each interval is equal
Manual or User-specified Classification
- Continuous - user sets UPPER LIMIT for each range
- Discontinuous - user sets both UPPER and LOWER
LIMITS for each range similar to continuous,
but with some ranges removed - Discontinuous useful if no observations occur for
some intermediate range -- these ranges may be
skipped!
24Standard Deviation Classification
- First, the mean is found for the entire data set
- Interval breaks found above and below the mean
such that - Breaks occur at 1/4 or 1/3 or 1/2 or 1 standard
deviation(s) above and below the mean, till all
the data is classified - Useful in cases where the underlying population
distribution is known - Typically, a two-color ramp helps to emphasize
values above in blue and below in red the
mean
25Classification by Normalization
- Instead of using the direct data value, a
computation is made based on percent of total - Alternately, one can compute a ratio of two data
values for each record - If the variation in the data values is extremely
large, instead of using the direct data value, a
computation is made based on the log value - The Normalized Value then drives the data
display
or
or