Title: GIS Data Models
1. GIS Data Models
2. Objective
- To understand basic GIS data types
- To understand how vector data models work in today's GIS
3. Topics
- Brief history of GIS evolution
- Overview of GIS data models
- Explanation of databases
4. RASTER AND VECTOR FORMATS
- RASTER: grid-based, simplifies reality
- VECTOR: analog map, cartography
5. DATA MODEL OF RASTER AND VECTOR
[Figure: the REAL WORLD represented as a numbered GRID (RASTER) and as VECTOR features]
6. RASTER DATA MODEL
- Derived from a view of the real world as a set of spatial elements that are filled by objects
- The real world is represented with uniform cells
- The array of cells forms a rectangle
- Cells may also be triangles, hexagons, or more complex shapes
- Each cell reports its own characteristics
- A single cell does not represent an object
- An object is represented by a group of cells
7. Creating a Raster
[Figure: three panels. "Reality - Hydrography" shows a lake, a river, and a pond; "Reality overlaid with a grid" shows the same features under a grid; "Resulting raster" shows each grid cell coded with a single value.]
Cell codes: 0 = No Water Feature, 1 = Water Body, 2 = River
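The idea on this slide (features overlaid with a grid, each cell given one code) can be sketched in a few lines of Python. This is only an illustration: the cell codes come from the slide legend, but the 6 x 8 layout and the helper names `cells_with_code` and `count_by_code` are invented here and do not reproduce the grid in the figure.

```python
# Cell codes taken from the slide legend.
NO_WATER, WATER_BODY, RIVER = 0, 1, 2

# Hypothetical 6 x 8 raster: a lake in the top-left corner, a river leaving it,
# and a one-cell pond. The layout is an assumption made for illustration.
raster = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 2, 0, 0, 0, 0],
    [0, 1, 2, 2, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 1, 0],
    [0, 2, 2, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0, 0, 0, 0],
]


def cells_with_code(grid, code):
    """Return the (row, column) position of every cell holding `code`."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == code
    ]


def count_by_code(grid):
    """Count how many cells carry each code."""
    counts = {}
    for row in grid:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts


# An object (the river) is not a single cell but the group of cells coded 2.
river_cells = cells_with_code(raster, RIVER)
water_cells = cells_with_code(raster, WATER_BODY)
```

Note that the raster itself has no notion of "the lake" or "the pond": both are simply cells coded 1, which is why a single cell does not represent an object.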
8. Resolution
- Resolution is the size of the cell used in relation to scale
- Most people are already familiar with this concept through digital photography (megapixels)
- In GIS, resolution is typically described as a cell size
- In 1 m resolution aerial photography, each pixel in the image represents 1 square meter of ground area
- The smaller the cell size, the finer the detail of the image
- A fire hydrant cannot be seen in a 1 m resolution image but will likely be discernible in a 0.1 m image
- Resolution and scale are linked concepts
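The arithmetic behind these statements can be sketched as follows. The function names are hypothetical, the 0.3 m hydrant width is an assumed figure, and the "object must span at least one cell" test is a rough rule of thumb adopted for illustration rather than a formal detection criterion.

```python
import math


def cell_area_m2(resolution_m):
    """Ground area covered by one square cell, in square meters."""
    return resolution_m ** 2


def grid_shape(width_m, height_m, resolution_m):
    """Rows and columns needed to cover an extent at a given cell size."""
    rows = math.ceil(height_m / resolution_m)
    cols = math.ceil(width_m / resolution_m)
    return rows, cols


def spans_a_cell(object_size_m, resolution_m):
    """Rule of thumb (assumption): an object is discernible only if it is
    at least as large as one cell."""
    return object_size_m >= resolution_m


HYDRANT_WIDTH_M = 0.3  # assumed size, for illustration only

coarse = grid_shape(1000, 500, 1.0)   # 1 m cells over a 1000 m x 500 m extent
fine = grid_shape(1000, 500, 0.5)     # 0.5 m cells over the same extent
```

Halving the cell size quadruples the number of cells for the same extent, which is the storage cost of finer detail.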
9. VECTOR DATA MODEL
- Derived from a formulation of spatial concepts that emphasizes real-world objects
- The geometry primitives of the vector data model are point, line and polygon
- Objects can be built from these primitives
- An object's location is determined by the location of the points that represent it
- The vector data model is distinguished by how it manages and stores its geometry primitives:
  - spaghetti model
  - topology model
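A minimal Python sketch of the three primitives, and of the difference between the spaghetti and topology models, might look like this. All class and field names (`Point`, `Line`, `Polygon`, `Arc`, `left_polygon`, `right_polygon`) are illustrative assumptions, not the storage format of any particular GIS.

```python
from dataclasses import dataclass
from typing import Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class Point:
    xy: Coord


@dataclass(frozen=True)
class Line:
    vertices: Tuple[Coord, ...]


@dataclass(frozen=True)
class Polygon:
    ring: Tuple[Coord, ...]  # closed ring: first vertex repeated at the end


# Spaghetti model: every feature stores its own coordinates, so the boundary
# shared by two adjacent polygons is stored twice and adjacency is implicit.
poly_a = Polygon(((0, 0), (1, 0), (1, 1), (0, 1), (0, 0)))
poly_b = Polygon(((1, 0), (2, 0), (2, 1), (1, 1), (1, 0)))


def shared_vertices(p, q):
    """Adjacency must be discovered by comparing coordinates."""
    return set(p.ring) & set(q.ring)


# Topology model: the shared boundary is stored once, as an arc that records
# which polygon lies on each side of it.
@dataclass(frozen=True)
class Arc:
    vertices: Tuple[Coord, ...]
    left_polygon: str
    right_polygon: str


# Travelling from (1, 0) up to (1, 1), polygon A is on the left, B on the right.
boundary = Arc(((1, 0), (1, 1)), left_polygon="A", right_polygon="B")
```

In the topology sketch the question "which polygons are neighbours?" is answered by reading the arc, with no coordinate comparison needed.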
10. VECTOR CHARACTERISTICS
[Figure: a POINT (marked X), a LINE, and a POLYGON]
11. RASTER TO VECTOR
RIVER CHANGED FROM RASTER TO VECTOR FORMAT
[Figure: the ORIGINAL RIVER beside the RIVER THAT HAS BEEN VECTORISED]
12. PROS AND CONS OF RASTER MODEL
- Pros
  - Raster data is more affordable
  - Simple data structure
  - Very efficient overlay operation
- Cons
  - Topological relationships are difficult to implement
  - Raster data requires large storage
  - Not all real-world phenomena map directly onto a raster representation
  - Raster data is obtained mainly from satellite images and scanning
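The efficiency of raster overlay comes from the fact that two grids with the same shape can be combined cell by cell, with no geometric intersection to compute. The sketch below assumes two small binary layers invented for illustration; `overlay` is a hypothetical helper, not a function from any GIS package.

```python
def overlay(grid_a, grid_b, combine):
    """Combine two rasters of identical shape, one cell at a time."""
    same_shape = len(grid_a) == len(grid_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


# Hypothetical layers: 1 = present, 0 = absent.
water = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
forest = [
    [0, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
]

both = overlay(water, forest, lambda a, b: a & b)     # cells in both layers
either = overlay(water, forest, lambda a, b: a | b)   # cells in at least one
```

The cost is one simple operation per cell, which is also why the same model needs large storage: every cell is stored whether or not anything is there.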
13. PROS AND CONS OF VECTOR MODEL
- Pros
  - More efficient data storage
  - More efficient topological encoding
  - Suitable for most uses and compatible with data
  - Good graphic presentation
- Cons
  - Overlay operation is not efficient
  - Complex data structure
14. Explanation of database types
- A database is a collection of non-redundant data which can be shared by different application systems
- This implies separation of physical storage from use of the data by an application program, i.e. program/data independence
- Changes can be made to data without affecting other components of the system
15. Database types
- Tabular ("flat file") - data in a single table
- Hierarchical
- Network
- Relational
16. The ideal GIS database is one that maximizes the uniqueness of every feature while minimizing total data quantity
17. Hierarchical databases
- Developed in the 1960s by International Business Machines (IBM)
- Somewhat resemble real-world filing systems
- Tree-structured, similar to folder arrangements in a computer directory
- The database keeps track of the different record types, their attributes, and the hierarchical relationships between them
- The attribute which assigns records to levels in the database structure is called the key (e.g. is the record a department, part or supplier?)
18. Features of a hierarchical model
- A set of record "types"
- E.g. supplier record type, department record type, part record type
- A set of links connecting all record types in one data structure diagram (tree)
- At most one link between two record types, hence links need not be named
- For every record, there is only one parent record at the next level up in the tree
19. Features (contd)
- E.g. every county has exactly one state, every part has exactly one department
- No connections between occurrences of the same record type
- Cannot go between records at the same level unless they share the same parent
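The single-parent rule can be sketched with a small tree class. The `Record` class and its methods are hypothetical, and the place names are sample data chosen only for illustration.

```python
class Record:
    """A record in a hierarchical (tree) model: exactly one parent, many children."""

    def __init__(self, record_type, name, parent=None):
        self.record_type = record_type
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            parent.add_child(self)

    def add_child(self, child):
        # Enforce the rule that every record has only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)

    def path(self):
        """Names from the root down to this record; only vertical moves exist."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


state = Record("state", "California")
county = Record("county", "Santa Barbara", parent=state)
city = Record("city", "Goleta", parent=county)
```

There is no way to move from one county directly to a neighbouring county in this structure; the only route is up to the shared parent and back down.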
20. http://dev.mysql.com/tech-resources/articles/hierarchical-data-1.png
21. Pros and cons
- Data must possess a tree structure
- Tree structure is natural for geographical data
- Data access is easy via the key attribute, but difficult for other attributes
- In the business case, it is easy to find a record given its type (department, part or supplier)
- In the geographical case, it is easy to find a record given its geographical level (state, county, city, census tract), but difficult to find it given any other attribute
22. Pros and cons (contd)
- E.g. find the records with population 5,000 or less
- Tree structure is inflexible
- Cannot define new linkages between records once the tree is established
- E.g. in the geographical case, new relationships between objects
- Cannot define linkages laterally or diagonally in the tree, only vertically
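The contrast between access by key and access by any other attribute can be sketched as below. The tree, the place names, and the population figures are all invented sample data, and both helper functions are hypothetical.

```python
# Hypothetical geographical hierarchy stored as nested dictionaries.
tree = {
    "name": "State A", "level": "state", "population": 60000,
    "children": [
        {"name": "County 1", "level": "county", "population": 52000, "children": [
            {"name": "City X", "level": "city", "population": 48000, "children": []},
            {"name": "City Y", "level": "city", "population": 4000, "children": []},
        ]},
        {"name": "County 2", "level": "county", "population": 8000, "children": [
            {"name": "City Z", "level": "city", "population": 4500, "children": []},
        ]},
    ],
}


def find_by_path(root, names):
    """Access via the hierarchy: one step down per level."""
    node = root
    for name in names:
        node = next(child for child in node["children"] if child["name"] == name)
    return node


def find_by_attribute(root, predicate):
    """Access via a non-key attribute: every record has to be visited."""
    matches, visited, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        if predicate(node):
            matches.append(node["name"])
        stack.extend(node["children"])
    return sorted(matches), visited


small_places, records_visited = find_by_attribute(
    tree, lambda record: record["population"] <= 5000
)
```

Finding "City Y" by its position in the tree touches two levels, whereas the population query has to walk all six records.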
23. Pros and cons (contd)
- The only geographical relationships which can be coded easily are "is contained in" or "belongs to"
- DBMSs based on the hierarchical model (e.g. System 2000) have often been used to store spatial data, but have not been very successful as bases for GIS
24. Network data model
- Developed in the mid 1960s by Charles Bachman as part of the work of CODASYL (Conference on Data Systems Languages), which proposed the programming language COBOL (first specified in 1960) and then the network model (1971)
- Other aspects of database systems also proposed at this time include the database administrator, data security, and the audit trail
- The objective of the network model is to separate data structure from physical storage and to eliminate unnecessary duplication of data, with its associated errors and costs
25. Networked model (contd)
- Uses the concepts of a data definition language and a data manipulation language
- Uses the concept of m:n linkages or relationships
- An owner record can have many member records
- A member record can have several owners
- The hierarchical model allows only 1:n
26. Networked model (contd)
- Example of a network database
- A hospital database has three record types:
  - Patient: name, date of admission, etc.
  - Doctor: name, etc.
  - Ward: number of beds, name of staff nurse, etc.
- Need to link patients to doctor, also to ward
- A doctor record can own many patient records
- A patient record can be owned by both doctor and ward records
- Network DBMSs include methods for building and redefining linkages, e.g. when a patient is assigned to a ward
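The hospital example can be sketched with explicit owner and member links. `NetworkRecord` and `link` are hypothetical names, the field values are sample data, and this is an illustration of the owner/member idea rather than the CODASYL data manipulation language.

```python
class NetworkRecord:
    """A record that can own members and be owned by records of other types."""

    def __init__(self, record_type, **fields):
        self.record_type = record_type
        self.fields = fields
        self.owners = {}    # owner record type -> owning record
        self.members = []   # records owned by this record


def link(owner, member):
    """Build a linkage, or redefine it if the member already has an owner
    of the same type (e.g. a patient moved to a different ward)."""
    previous = member.owners.get(owner.record_type)
    if previous is not None:
        previous.members.remove(member)
    member.owners[owner.record_type] = owner
    owner.members.append(member)


doctor = NetworkRecord("doctor", name="Dr. Lee")
ward_a = NetworkRecord("ward", beds=20, staff_nurse="Nurse Kim")
ward_b = NetworkRecord("ward", beds=12, staff_nurse="Nurse Diaz")
patient_1 = NetworkRecord("patient", name="P. One", admitted="2001-03-04")
patient_2 = NetworkRecord("patient", name="P. Two", admitted="2001-03-05")

link(doctor, patient_1)   # a doctor record can own many patient records
link(doctor, patient_2)
link(ward_a, patient_1)   # the same patient is also owned by a ward
link(ward_b, patient_1)   # reassignment replaces the previous ward link
```

Keying `owners` by record type means a patient can hold one doctor link and one ward link at the same time, but never two links to owners of the same type.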
27. Problems with the networked model
- Links between records of the same type are not allowed
- While a record can be owned by several records of different types, it cannot be owned by more than one record of the same type (a patient can have only one doctor, only one ward)
28. http://en.wikipedia.org/wiki/File:Network_Model.jpg
29. Relational database model
- The most popular DBMS model for GIS
- Its flexible approach to linkages between records comes closest to modeling the complexity of spatial relationships between objects
- Proposed by IBM researcher E.F. Codd in 1970
- More of a concept than a data structure
- Internal architecture varies substantially from one RDBMS to another
30. Relational databases (contd)
- Each record has a set of attributes
- The range of possible values (domain) is defined for each attribute
- Records of each type form a table or relation
- Each row is a record or tuple
- Each column is an attribute
- Note the potential confusion: a "relation" is a table of records, not a linkage between records
- The degree of a relation is the number of attributes in the table
31. Relational databases (contd)
- A relation with 1 attribute is a unary relation
- A relation with 2 attributes is a binary relation
- A relation with n attributes is an n-ary relation
- Examples
  - unary: COURSES(SUBJECT)
  - binary: PERSONS(NAME,ADDRESS), OWNER(PERSON NAME,HOUSE ADDRESS)
  - ternary: HOUSES(ADDRESS,PRICE,SIZE)
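The terms relation, tuple, attribute, and degree map naturally onto Python named tuples. The relation names follow the examples above; the rows and the `degree` helper are illustrative assumptions.

```python
from collections import namedtuple

# Each named tuple type plays the role of a relation schema.
Course = namedtuple("Course", ["subject"])
Person = namedtuple("Person", ["name", "address"])
House = namedtuple("House", ["address", "price", "size"])

# A relation is a table of tuples (rows), not a linkage between records.
COURSES = [Course("Geography"), Course("Cartography")]
PERSONS = [Person("Mary Smith", "12 Oak St")]
HOUSES = [House("12 Oak St", 250000, 140)]


def degree(schema):
    """The degree of a relation is its number of attributes (columns)."""
    return len(schema._fields)


DEGREE_NAMES = {1: "unary", 2: "binary", 3: "ternary"}
```

Adding more rows to `HOUSES` changes the size of the table but not its degree, which depends only on the columns.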
32. How a relational database works
- A key of a relation is a subset of attributes with the following properties:
  - Unique identification: the value of the key is unique for each tuple
  - Non-redundancy: no attribute in the key can be discarded without destroying the key's uniqueness
- A prime attribute of a relation is an attribute which participates in at least one key
- All other attributes are non-prime
33. Relational database key example
- For example, a phone number is a unique key in a phone directory
- In the normal phone directory the key attributes are last name, first name, street address
- If street address is dropped from this key, the key is no longer unique (many Smith, Mary's)
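The two key properties (unique identification and non-redundancy) can be checked mechanically against a table. The sketch below tests them on a small invented phone directory; it only examines the rows it is given, whereas a real key is a property of the schema that must hold for all possible data. The function names are hypothetical.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """Unique identification: no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """Unique, and non-redundant: dropping any one attribute breaks uniqueness."""
    if not is_unique(rows, attrs):
        return False
    smaller = combinations(attrs, len(attrs) - 1)
    return all(not is_unique(rows, subset) for subset in smaller)


# Invented sample directory.
directory = [
    {"last": "Smith", "first": "Mary", "street": "12 Oak St", "phone": "555-0101"},
    {"last": "Smith", "first": "Mary", "street": "48 Elm Ave", "phone": "555-0102"},
    {"last": "Smith", "first": "John", "street": "12 Oak St", "phone": "555-0103"},
    {"last": "Jones", "first": "Mary", "street": "48 Elm Ave", "phone": "555-0104"},
]
```

With this data, `("last", "first", "street")` is a key, `("last", "first")` is not unique because of the two Mary Smiths, and `("phone", "last")` is unique but redundant, since the phone number alone already identifies each row.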
34. Pros and cons
- The most flexible of the database models
- No obvious match of implementation to model: the model is the user's view, not the way the data is organized internally
- Is the basis of an area of formal mathematical theory
35. Pros and cons (contd)
- Most RDBMS data manipulation languages require the user to know the contents of relations, but allow access from one relation to another through common attributes
- Example: given two relations
  - PROPERTY(ADDRESS,VALUE,COUNTY_ID)
  - COUNTY(COUNTY_ID,NAME,TAX_RATE)
- To answer the query "What are the taxes on property X?" the user would:
36. Pros and cons (contd)
- PROPERTY(ADDRESS,VALUE,COUNTY_ID)
- COUNTY(COUNTY_ID,NAME,TAX_RATE)
- Retrieve the property record
- Link the property and county records through the common attribute COUNTY_ID
- Compute the taxes by multiplying VALUE from the property tuple with TAX_RATE from the linked county tuple
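The three steps (retrieve, link through the common attribute, compute) can be tried with Python's built-in `sqlite3` module. The table and column names follow the two relations above; the rows, tax rates, and the `taxes_on` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE county (
        county_id INTEGER PRIMARY KEY,
        name      TEXT,
        tax_rate  REAL
    );
    CREATE TABLE property (
        address   TEXT PRIMARY KEY,
        value     REAL,
        county_id INTEGER REFERENCES county(county_id)
    );
    -- Sample data (hypothetical).
    INSERT INTO county VALUES (1, 'County A', 0.0125), (2, 'County B', 0.01);
    INSERT INTO property VALUES ('10 Main St', 200000, 1), ('22 Hill Rd', 350000, 2);
    """
)


def taxes_on(address):
    """Retrieve the property, link it to its county, multiply VALUE by TAX_RATE."""
    row = conn.execute(
        """
        SELECT p.value * c.tax_rate
        FROM property AS p
        JOIN county AS c ON p.county_id = c.county_id
        WHERE p.address = ?
        """,
        (address,),
    ).fetchone()
    return None if row is None else row[0]
```

The user still has to know that both relations carry `county_id`; the database does not store the link itself, only the common attribute that makes the join possible.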
37. http://www.mbari.org/ssds/ReferenceDocuments/RDB_ER.gif
38. Evolution of GIS data models
39. CAD model
- Vector based mapping
- Maps created with computer aided design (CAD) programs
- Little or no attribute data
40. Coverage data model
- Created in 1981 by ESRI as part of ArcInfo, the first commercially available GIS package
- Spatial data stored with attribute data using indexed binary files
- Allowed for storage of topological relationships
41. Limitations of coverage model
- All features have a generic behavior
- For example, a highway running across a polygon split that polygon, which made defining behaviors extremely difficult!
- Topology (spatial relationship among objects) was explicitly defined
- Required use of macro code (ArcAML) to resolve complex features
42. http://webhelp.esri.com/arcgisdesktop/9.2/published_images/Coverage%20data%20elements.GIF
43. The geodatabase
- Created in 2000 by ESRI
- Allows for specific behaviors to be assigned to specific features without writing code
- Based upon a relational database
- Said to be an object-oriented data model
- Topology built more easily
44. The ArcGIS Environment
- ArcGIS is packaged similarly to Microsoft Office. Whereas Office encompasses Excel, Word, and PowerPoint, ArcGIS comes with:
  - ArcMap
  - ArcCatalog
  - ArcToolbox
45. Component Overview: ArcCatalog
- ArcCatalog acts as the operating system for GIS; its look and feel is similar to Windows Explorer
- ArcCatalog allows users to preview data in both a geographic (map) format and a table format (attributes)
- ArcCatalog is the principal management tool for reading and writing metadata
46. Windows Explorer vs. ArcCatalog
[Figure: Windows Explorer and ArcCatalog windows side by side, each labeled with a Table of Contents and a Preview Pane]
47. A Quick Tour of ArcCatalog
[Figure: ArcCatalog window with callouts for Quick Launch Buttons, Navigation Buttons, and Preview Selection Tabs]
48. ArcCatalog uses a unique symbol set to indicate data formats
[Figure: ArcCatalog symbols for Raster, Geodatabase, Feature Dataset, and Feature Classes]
49. Symbology is maintained among different data formats
- Shapefile formats appear green
50. Preview tab allows user to view either geography or table
[Figure: ArcCatalog Preview tab with the callout "Toggle views here"]
51. Metadata: Data about the data
- Metadata toolbar accessed from VIEW toolbars
- Allows users to edit all metadata and select the metadata format convention
52. References
- National Center for Geographic Information and Analysis (NCGIA), Core Curriculum, at http://www.geog.ubc.ca/courses/klink/gis.notes/ncgia/toc.html
- Goodchild, M.F., and K.K. Kemp, eds. 1990. NCGIA Core Curriculum in GIS. National Center for Geographic Information and Analysis, University of California, Santa Barbara CA