Title: Data Bases: Population and Maintenance
1Data Bases Population and Maintenance
2Data Collection
- One of the most expensive GIS activities
- Many diverse sources (source integration, data fusion, interoperability)
- Two broad types of collection
- Data capture (direct collection)
- Data transfer
- Two broad capture methods
- Primary (direct measurement)
- Secondary (indirect derivation)
3Stages in Data Collection Projects
4Data Collection Techniques
           Raster                          Vector
Primary    Digital remote sensing images   GPS measurements
Primary    Digital aerial photographs      Survey measurements
Secondary  Scanned maps                    Topographic surveys
Secondary  DEMs from maps                  Toponymy data sets from atlases
5Primary Data Capture
- Capture specifically for GIS use
- Raster remote sensing
- e.g. SPOT and IKONOS satellites and aerial photography
- Passive and active sensors
- Resolution is key consideration
- Spatial
- Spectral
- Temporal
7Imagery for GIS
8Vector Primary Data Capture
- Surveying
- Locations of objects determined by angle and distance measurements from known locations
- Uses expensive field equipment and crews
- Most accurate method for large scale, small areas
- GPS
- Collection of satellites used to fix locations on Earth's surface
- Differential GPS used to improve accuracy
9Total Station
10Pen/Portable PC and GPS
11Secondary Geographic Data Capture
- Data collected for other purposes can be converted for use in GIS
- Raster conversion
- Scanning of maps, aerial photographs, documents, etc.
- Important scanning parameters are spatial and spectral (bit depth) resolution
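As a rough illustration of why these two scanning parameters matter, the uncompressed size of a scan grows with the square of the spatial resolution and linearly with the bit depth. The sketch below assumes a hypothetical 24 x 36 inch map sheet; the sheet size and the function name are illustrative and do not come from the slides.

```python
def scan_size_bytes(width_in, height_in, dpi, bit_depth):
    """Uncompressed size in bytes of a scan at the given resolution and bit depth."""
    cols = round(width_in * dpi)    # pixels across the sheet
    rows = round(height_in * dpi)   # pixels down the sheet
    return cols * rows * bit_depth // 8


# Hypothetical 24 x 36 inch sheet scanned three different ways.
size_300_8bit = scan_size_bytes(24, 36, dpi=300, bit_depth=8)
size_600_8bit = scan_size_bytes(24, 36, dpi=600, bit_depth=8)
size_300_24bit = scan_size_bytes(24, 36, dpi=300, bit_depth=24)

for label, size in [("300 dpi, 8 bit", size_300_8bit),
                    ("600 dpi, 8 bit", size_600_8bit),
                    ("300 dpi, 24 bit", size_300_24bit)]:
    print(f"{label}: {size / 1e6:.1f} MB")
```

Doubling the resolution quadruples the file size, while moving from 8-bit to 24-bit triples it.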
13Raster to vector conversion
14Vector Secondary Data Capture
- Collection of vector objects from maps, photographs, plans, etc.
- Digitizing
- Manual (table)
- Heads-up and vectorization
- Photogrammetry: the science and technology of making measurements from photographs, etc.
16Data Transfer
- Buy vs. build is an important question
- Many widely distributed sources of GI
- Includes geocoding
- Key catalogs include
- Geodata.gov
- Geography Network
- Access technologies
- Translation
- Direct read
17Managing Data Capture Projects
- Key principles
- Clear plan, adequate resources, appropriate funding, and sufficient time
- Fundamental tradeoff among
- Quality, accuracy, speed and price
- Two strategies
- Incremental
- Blitzkrieg
- Alternative resource options
- In house
- Specialist external agency
18A useful rule of thumb is that positions measured
from maps are accurate to about 0.5 mm on the
map. Multiplying this by the scale of the map
gives the corresponding distance on the ground.
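The rule of thumb above is a single multiplication, sketched below. The function name and the example scales are illustrative choices; only the 0.5 mm figure comes from the slide.

```python
def ground_accuracy_m(scale_denominator, map_accuracy_mm=0.5):
    """Ground distance in metres that corresponds to a positional error on the map."""
    # Multiply first, then convert mm to m, to keep the arithmetic exact for these inputs.
    return map_accuracy_mm * scale_denominator / 1000.0


for denom in (1_250, 24_000, 250_000):
    print(f"1:{denom:,} -> about {ground_accuracy_m(denom):g} m")
# 1:1,250 -> about 0.625 m
# 1:24,000 -> about 12 m
# 1:250,000 -> about 125 m
```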
19Positional Accuracy (cont.)
- Within a database, a typical UTM coordinate pair might be
- Easting 579124.349 m
- Northing 5194732.247 m
- If the database was digitized from a 1:24,000 map sheet, the last four digits in each coordinate (units, tenths, hundredths, thousandths) would be spurious
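One way to keep stored coordinates honest is to round them to a step comparable to the accuracy of the source. The sketch below assumes the 0.5 mm rule of thumb (about 12 m on the ground at 1:24,000) and rounds to the nearest 10 m. The 10 m step is a judgment call for illustration, not something the slides prescribe.

```python
def round_to_step(value_m, step_m):
    """Round a coordinate to the nearest multiple of step_m."""
    return round(value_m / step_m) * step_m


easting, northing = 579124.349, 5194732.247   # coordinate pair from the slide
accuracy_m = 0.5 * 24_000 / 1000.0            # 0.5 mm at 1:24,000 is 12 m on the ground
step_m = 10                                   # assumed rounding step, close to the accuracy

rounded_easting = round_to_step(easting, step_m)
rounded_northing = round_to_step(northing, step_m)
print(rounded_easting, rounded_northing)
# 579120 5194730
```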
20Testing Positional Accuracy
- Use an independent source of higher accuracy
- find a larger scale map
- use precision GPS
- Use internal evidence
- digitized polygons that are unclosed, lines that overshoot or undershoot nodes, etc. are indications of error
- sizes of gaps, overshoots, etc. may be a measure of positional accuracy
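As a small example of using internal evidence, the gap left in an unclosed digitized polygon can be measured directly. The sketch below uses made-up coordinates and a hypothetical helper name; it only measures the closure gap and is not a full topology check.

```python
import math


def closure_gap(ring):
    """Distance between the first and last vertex of a digitized ring."""
    (x0, y0), (x1, y1) = ring[0], ring[-1]
    return math.hypot(x1 - x0, y1 - y0)


# Hypothetical square whose last vertex misses the starting node.
ring = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0), (3.0, 4.0)]
gap = closure_gap(ring)
print(f"closure gap: {gap} map units")
```

Collecting such gaps over many polygons gives a rough, internally derived indication of digitizing error.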
21Testing Accuracy (cont.)
- Compute accuracy from knowledge of the errors introduced by different sources
- e.g., 1 mm in source document
- 0.5 mm in map registration for digitizing
- 0.2 mm in digitizing
- if sources combine independently, we can get an estimate of overall accuracy...
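The slide does not spell out how the estimate is formed. A common assumption for independent error sources is that they combine as the square root of the sum of squares, which the sketch below applies to the three figures listed above. The conversion to ground distance assumes a 1:24,000 source map purely for illustration.

```python
import math


def combined_error(*errors_mm):
    """Root-sum-of-squares combination of independent error components."""
    return math.sqrt(sum(e * e for e in errors_mm))


# Source document, map registration, digitizing (all in mm on the map).
total_mm = combined_error(1.0, 0.5, 0.2)
ground_m = total_mm * 24_000 / 1000.0   # assumed 1:24,000 source map

print(f"combined map error: {total_mm:.2f} mm")
print(f"on the ground at 1:24,000: {ground_m:.1f} m")
```

Note that the combined figure (about 1.14 mm) is dominated by the largest component, so improving digitizing precision alone would change little.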
- Database: an integrated set of data (attributes) on a particular subject
- Geographic (spatial) database: database containing geographic data of a particular subject for a particular area
- Database Management System (DBMS): software to create, maintain and access databases
23A GIS links attribute and spatial data
- Attribute Data
- Flat File
- Relations
- Map Data
- Point File
- Line File
- Area File
- Topology
- Theme
24Advantages of Databases over Files
- Avoids redundancy and duplication
- Reduces data maintenance costs
- Faster for large datasets
- Applications are separated from the data
- Applications persist over time
- Support multiple concurrent applications
- Better data sharing
- Security and standards can be defined and enforced
25Disadvantages of Databases over Files
- Expense
- Complexity
- Performance, especially with complex data types
- Integration with other systems can be difficult
26Types of DBMS Model
- Hierarchical
- Network
- Relational - RDBMS
- Object-oriented - OODBMS
- Object-relational - ORDBMS
27Relational Databases rule now
28Characteristics of DBMS (1)
- Data model: support for multiple data types
- e.g. MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object (MS Object Linking and Embedding), Hyperlink, Lookup Wizard
- Load data from files, databases and other applications
- Index for rapid retrieval
29Characteristics of DBMS (2)
- Query language: SQL
- Security: controlled access to data
- Multi-level groups (e.g. census, NGA)
- Controlled update using a transaction manager
- Versioning
- Backup and recovery
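Controlled update through a transaction manager means that a group of changes either all succeed or are all undone. The sketch below shows this with Python's built-in `sqlite3` module. The `parcels` table and its contents are hypothetical, and SQLite stands in for whichever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute("INSERT INTO parcels VALUES (1, 'Smith')")
conn.commit()

try:
    # Used as a context manager, the connection commits if the block succeeds
    # and rolls back if it raises.
    with conn:
        conn.execute("UPDATE parcels SET owner = 'Jones' WHERE id = 1")
        # This insert reuses id 1, so it violates the primary key and fails.
        conn.execute("INSERT INTO parcels VALUES (1, 'Duplicate')")
except sqlite3.IntegrityError:
    pass  # the whole transaction, including the UPDATE, has been rolled back

owner = conn.execute("SELECT owner FROM parcels WHERE id = 1").fetchone()[0]
print(owner)
```

Because the second statement failed, the first one is undone as well and the owner is still the original value.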
30Characteristics of DBMS (3)
- Applications
- Forms builder
- Report writer
- Internet Application Server
- CASE tools
- Programmable API (application programming interface)
31Role of DBMS
Geographic Information System
- Data load
- Editing
- Visualization
- Mapping
- Analysis
Database Management System
- Storage
- Indexing
- Security
- Query
32Relational DBMS (1)
- Data stored as tuples (tup-el), conceptualized as tables
- Table: data about a class of objects
- Two-dimensional list (array)
- Rows: objects
- Columns: object states (properties, attributes)
[Figure labels: column = attribute; row = object; vector feature]
34Relational DBMS (2)
- Most popular type of DBMS
- Over 95% of data in DBMS is in RDBMS
- Commercial systems
- Informix
- Microsoft Access
- Microsoft SQL Server
- Oracle
- Sybase
- Structured (Standard) Query Language (pronounced SEQUEL)
- Developed by IBM in the 1970s
- Now de facto and de jure standard for accessing relational databases
- Three types of usage
- Stand alone queries
- High level programming
- Embedded in other applications
36Types of SQL Statements
- Data Definition Language (DDL)
- Create, alter and delete data structures (e.g. tables)
- Data Manipulation Language (DML)
- Retrieve and manipulate data
- Data Control Language (DCL)
- Control security of data
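The three statement types can be told apart in a short session. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `wells` table. SQLite has no user accounts, so the DCL statement appears only as a comment showing what it would look like in a server DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure that will hold the data.
conn.execute("CREATE TABLE wells (id INTEGER PRIMARY KEY, name TEXT, depth_m REAL)")

# DML: insert, update and retrieve rows.
conn.executemany(
    "INSERT INTO wells (id, name, depth_m) VALUES (?, ?, ?)",
    [(1, "A", 30.5), (2, "B", 120.0), (3, "C", 75.0)],
)
conn.execute("UPDATE wells SET depth_m = 80.0 WHERE name = 'C'")
deep = conn.execute(
    "SELECT name FROM wells WHERE depth_m > 50 ORDER BY name"
).fetchall()
print(deep)

# DCL: not supported by SQLite. In a multi-user DBMS it would resemble:
#   GRANT SELECT ON wells TO analyst;
```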
37Relational Join
- Fundamental query operation
- Occurs because
- Data created/maintained by different users, but integration needed for queries
- Table joins use common keys (column values)
- Table (attribute) join concept has been extended
to geographic case
Cars table:
1241       Ford     2003
1241       Subaru   2000
1241       Honda    1999

Address table:
Record ID  Address          cars
1241       123 State St.    3
1242       1801 Main St.    1
1243       2106 Elm St.     2
1244       7262 Pine Drive  1

Joined result:
1241       123 State St.    Ford
1241       123 State St.    Subaru
1241       123 State St.    Honda
1242       1801 Main St.    Kia
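The join illustrated by these tables can be reproduced with Python's built-in `sqlite3` module. The column names `make` and `year` are assumptions, since the slide gives no header for the cars table, and only the car rows that appear in the slide are loaded, so no row is returned for record 1242.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (record_id INTEGER PRIMARY KEY, address TEXT, cars INTEGER)")
conn.execute("CREATE TABLE cars (record_id INTEGER, make TEXT, year INTEGER)")

conn.executemany("INSERT INTO addresses VALUES (?, ?, ?)", [
    (1241, "123 State St.", 3),
    (1242, "1801 Main St.", 1),
    (1243, "2106 Elm St.", 2),
    (1244, "7262 Pine Drive", 1),
])
conn.executemany("INSERT INTO cars VALUES (?, ?, ?)", [
    (1241, "Ford", 2003),
    (1241, "Subaru", 2000),
    (1241, "Honda", 1999),
])

# The common key (record_id) links each car to its address.
rows = conn.execute("""
    SELECT a.record_id, a.address, c.make
    FROM addresses AS a
    JOIN cars AS c ON c.record_id = a.record_id
    ORDER BY c.year DESC
""").fetchall()

for row in rows:
    print(row)
```

The address for record 1241 is stored once but appears three times in the result, once per matching car.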
39Spatial indexing
- Many maps tiled
- B-tree (Balanced)
- Grid indexing
- Quadtree: points/regions
- R-tree (based on minimum bounding rectangles, MBR)
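Of the methods listed, grid indexing is the simplest to sketch: space is divided into fixed-size cells, each feature is filed under its cell, and a query inspects only the cells that overlap the search window. The class below is a minimal illustration for point data with made-up coordinates. The class name, the cell size and the use of a dictionary are choices made here, not details from the slides.

```python
from collections import defaultdict


class GridIndex:
    """Fixed-cell grid index for points."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (col, row) -> list of (id, x, y)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, item_id, x, y):
        self.cells[self._cell(x, y)].append((item_id, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return ids of points inside the window, visiting only overlapping cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for item_id, x, y in self.cells.get((cx, cy), []):
                    # Cells only narrow the search; each candidate is still tested exactly.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(item_id)
        return sorted(hits)


index = GridIndex(cell_size=100.0)
for item_id, x, y in [("a", 10, 10), ("b", 150, 40), ("c", 420, 380), ("d", 95, 105)]:
    index.insert(item_id, x, y)

found = index.query(0, 0, 160, 110)
print(found)
```

Point "c" lies in a cell that the window never touches, so it is never examined; that saving is the purpose of the index.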
40New global/spatial grids QTM
41Go2 Grids
385322.08N 0770206.86W US.DC.WAS.
42Spatial Search: Gateway to Spatial Analysis
- Overlay is a spatial retrieval operation that is equivalent to an attribute join.
- Buffering is a spatial retrieval around points, lines, or areas based on distance.
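For the simplest case, a buffer around a single point, the retrieval reduces to a distance test. The sketch below uses made-up planar coordinates in metres and a hypothetical helper name. Buffers around lines or areas need proper geometry operations and are not covered here.

```python
import math


def within_buffer(center, radius, points):
    """Names of the points that fall inside a circular buffer around center."""
    cx, cy = center
    return [name for name, (x, y) in points.items()
            if math.hypot(x - cx, y - cy) <= radius]


# Hypothetical school locations, in metres on a planar grid.
schools = {
    "North": (0.0, 900.0),
    "East": (1200.0, 0.0),
    "Central": (300.0, 400.0),
}

near = within_buffer(center=(0.0, 0.0), radius=1000.0, points=schools)
print(near)
```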