Title: Fun with Spatial Data
1Fun with Spatial Data
- Building and Managing the Database to Analysis
2GIS The Need for Information
Remember this is done in the context of problem
solving, decision support, and research through
objectives and purpose. GIS provides the
information from data. YOU provide the meaning!!!
Measure spatial variables
3Data Acquisition
- Fundamental tasks
- Data capture: you do it from scratch, including from paper sources
- Data transfer: conversion, importing data from other varying sources and formats
- Resource cost
- Up to 85% of project cost (from book)
- But I can vouch for that too
- Time
- Energy
- Personnel
- Storage
- Money
- Costs balanced with quality based on purpose
4Data Acquisition
ITERATIVE!!! Use feedback to refine and adjust to meet objectives.
Good idea to do a pilot study.
Planning
5Data Acquisition
- ALL STARTS FROM YOUR PURPOSE!!!!!!!
- Sometimes I like to think about the output requested to meet the purpose and then work backwards
- SPATIAL MODEL!!!!!!!! GAME PLAN, FLOW CHART!!!
- Establish user requirements
- Establish data and resource needs
- Have to define your variables!!!
- This is why it is important to understand variables and data types
- Acceptable limitations on quality
- Acceptable style of data, e.g. classification
- If required data exists,
- what's the purpose,
- what's the format,
- variable type,
- time and space scale/resolution,
- what's the quality?
- You may also have to clean it up...
- Where do I look for this?
- Sometimes more economical to acquire from scratch
6Data Acquisition
- Establish analysis requirements
- Impact on data input
- Required model(s): again, think purpose
- Primarily between raster and vector first (more prevalent and common)
- Especially vector analyses within object-oriented model
- Hardware/software
- Laying out the analysis plan
- Again... ITERATIVE!!!!
- You may change based on intermediate evaluations
- Document along the way!!! (NOTEBOOK!)
7Data Acquisition
- Required experience and familiarity
- Disciplinary: how are variables defined?
- Site specific: knowledge of location?
- Common bottleneck in GIS process
- Expensive
- Error implications increase with each step of the process!!!
- Again, document along the way
8Data Acquisition
- Other things to think about?
- Project management
- Time/personnel commitment
- Organizational issues
- In-house vs. outsourcing
- Cost
- Primary vs. Secondary data sources
- Legal implications
- If you document, you'll have the testimony
9Data Acquisition
- Data components
- Spatial: features, grids
- Attribute: data entry, databases, tables
- Approaches to data collection
- Primary data collection
- Secondary data collection
10Primary Data Collection
- Definition
- Derive data directly from source
- Usage
- Desired data is unavailable or inaccessible
- Desired data is outdated or unreadable
- Issues
- Target needs of the specific project
- Again quality? Purpose?
- Model? Variable? Sampling?
- Tends to be very resource intensive
11Primary Data Collection
- Methods: Direct, Direct, Direct!
- Raster
- Remote sensing: images or readings from remote sensors
- Other field surveys: direct readings from measurements
- Self evident: you figure out how you're going to generalize and create a data set from sampling
- Vector
- Surveying, Global Positioning System, Telemetry
- Other field surveys
12Secondary Data Collection
- Definition
- Indirect derivation of data from existing sources
- Must watch out for purpose of existing sources
- Usage
- Desired data already exists
- Issues
- May have to compromise on data accuracy or content
- Variables may not be defined the way you want them
- May not know lineage of data. Yikes!
- But may be close enough to meet your purpose
- Tends to be less resource intensive
13Secondary Data Collection
- General Data Transfer
- Using existing GIS data sets
- Raster Methods
- Scanning: records reflectance
- Convert hard copy to digital
- Varying pixel size/file size: dots per inch (dpi)?
- Varying applications
- Data and supporting documentation
- Varying quality
- Specialized scanning for GIS available
- Rasterization
- Vector to raster conversion
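The link between dots per inch, pixel size, and file size can be sketched with a little arithmetic. This is a minimal sketch only: the 1:24,000 map scale, the 22 x 27 inch sheet, the 8-bit grayscale depth, and the function names are assumptions for illustration, not values from the slides.

```python
INCH_IN_METERS = 0.0254  # exact definition of the inch


def ground_pixel_size_m(scale_denominator, dpi):
    """Ground distance covered by one scanned pixel, in meters."""
    return scale_denominator * INCH_IN_METERS / dpi


def uncompressed_size_bytes(width_in, height_in, dpi, bytes_per_pixel=1):
    """Uncompressed raster size for a scanned sheet of the given size in inches."""
    cols = round(width_in * dpi)
    rows = round(height_in * dpi)
    return cols * rows * bytes_per_pixel


# Hypothetical 1:24,000 sheet, 22 x 27 inches, scanned as 8-bit grayscale.
for dpi in (150, 300, 600):
    size = ground_pixel_size_m(24000, dpi)
    mb = uncompressed_size_bytes(22, 27, dpi) / 1e6
    print(f"{dpi} dpi: {size:.3f} m per pixel, about {mb:.1f} MB")
```

Doubling the dpi halves the ground size of a pixel but quadruples the uncompressed file size, which is one reason scan resolution should follow from purpose.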
14Secondary Data Collection
- Vector Methods
- Digitizing
- Heads-down and heads-up manual methods
- Heads-down: on digitizing tablet
- Heads-up: on computer screen
- Vectorization
- Raster to vector conversion
- Coordinate geometry (COGO)
- Use survey descriptions, e.g. from a property deed
- Very specific x,y and bearing data
- But watch out: legal issues
15Secondary Data Collection
- Vector Methods
- Photogrammetric approaches
- Measurements from photos
- May involve stereo pairs for 3-D
- May be analog, analytical or digital
- Text entry
- Often used for attribute entry in combination with above methods
- Character recognition software can increase speed
16Remote Sensing
- See Dr. Quackenbush's PPT in the folder.
- Primary Data!
- Primarily raster, but...
- Definition
- Obtaining information about an object without touching it
- Practice
- Can measure physical, chemical, biological without direct contact
- Raster
- Classical remote sensing involves measurement of reflected or emitted electromagnetic energy across the EM spectrum
- Mostly visual representation for analysis: RASTER
17Remote Sensing
- Theory in brief
- Wave theory
- velocity of light ($3 \times 10^{8}$ m/sec) = wave frequency × wavelength, or $c = \nu\lambda$
- Particle theory or quantum theory
- Energy of quantum = Planck's constant ($6.626 \times 10^{-34}$ joule-seconds) × frequency, or $Q = h\nu$
- Together
- So energy = (Planck's constant × velocity of light)/wavelength, or $Q = hc/\lambda$
- Aha! So the longer the wavelength the lower the energy
- There are more equations, but I won't get into them here
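The relationship energy = (Planck's constant × velocity of light)/wavelength is easy to check numerically. The sketch below uses the rounded constants quoted on the slide; the sample wavelengths and function names are illustrative assumptions.

```python
PLANCK_J_S = 6.626e-34     # Planck's constant, joule-seconds (rounded as on the slide)
LIGHT_SPEED_M_S = 3.0e8    # velocity of light, meters per second (rounded)


def frequency_hz(wavelength_m):
    """Wave theory: velocity of light = frequency x wavelength."""
    return LIGHT_SPEED_M_S / wavelength_m


def photon_energy_j(wavelength_m):
    """Quantum theory combined with wave theory: Q = h * c / wavelength."""
    return PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m


# Illustrative wavelengths in micrometers (assumed values, not from the slides).
for name, wavelength_um in [("blue", 0.45), ("red", 0.65), ("thermal IR", 10.0)]:
    energy = photon_energy_j(wavelength_um * 1e-6)
    print(f"{name}: {wavelength_um} um -> {energy:.3e} J")
```

The printed energies drop as the wavelength grows, which is the "Aha!" on the slide: longer wavelengths carry less energy per quantum, so they are harder to detect.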
18Remote Sensing
- The point is WAVELENGTH! Reflected or emitted
- Reflected: sun present, photographic
- Emitted: non-photographic, need other sensors
- Dividing line? 3 micrometers
- Above, emitted prevails; below, reflected prevails
- Also issue for type of sensor, active or passive
- Active sensors use energy to enhance/illuminate readings
- Passive sensors only detect naturally occurring energy
- Ultimately, we categorize by wavelength location along EM spectrum
19Remote Sensing
µm is micrometer ($1 \times 10^{-6}$ m). Visible range is 0.4 to
0.7 µm (blue, green, red). Ultraviolet is below blue,
infrared is above red (near, mid, thermal), and microwave is at
1 mm to 1 m.
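Categorizing by wavelength can be sketched as a simple lookup. This sketch uses only the boundaries quoted in these slides (0.4 and 0.7 µm for visible, 1 mm to 1 m for microwave, 3 µm as the reflected/emitted dividing line); treating infrared as everything between red and microwave is an assumption, and the function names are hypothetical.

```python
def em_region(wavelength_um):
    """Broad EM region for a wavelength given in micrometers."""
    if wavelength_um < 0.4:
        return "ultraviolet or shorter"
    if wavelength_um <= 0.7:
        return "visible"
    if wavelength_um < 1_000:          # 1 mm = 1,000 um
        return "infrared"
    if wavelength_um <= 1_000_000:     # 1 m = 1,000,000 um
        return "microwave"
    return "longer than microwave"


def dominant_energy(wavelength_um):
    """Reflected energy prevails below about 3 um, emitted energy above."""
    return "reflected" if wavelength_um < 3.0 else "emitted"


for wavelength_um in (0.3, 0.55, 1.6, 10.0, 50_000.0):
    print(wavelength_um, em_region(wavelength_um), dominant_energy(wavelength_um))
```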
20Remote Sensing
- We have to worry about 4 forms of resolution from remotely sensed imagery
- Spectral: number and width of wavelength intervals in spectrum
- Single band: only one section of the EM spectrum
- Multi-band or multi-spectral: several bands simultaneously
- Radiometric: sensitivity of sensor to differences in signal strength in recording radiant flux
- Spatial: smallest linear separation between two objects that can be resolved by the sensor
- This is the one we are used to, pixel size
- Temporal: how often the sensor records imagery over an area
- AKA repeat cycle
- Geostationary or Earth orbiting
- E.g. SPOT 26 day cycle
21Spectral Reflectance Patterns
[Figure: spectral reflectance (low to high) plotted against spectral region (blue, green, red, near-IR, mid-IR)]
22Remote Sensing
- Practical Issues
- Computers restricted to 256 values of color
- So values sometimes have to be normalized to fit. Aha! Interval data, because it's coded along a linear function
- Typically, you get to pick the band to work with for your analysis
- Resource cost!!!
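Normalizing sensor readings to fit 256 display values can be sketched as a linear stretch. This is one common approach assumed here for illustration (a min-max stretch); the slides do not say which normalization is used, and the sample readings are made up.

```python
def rescale_to_8bit(values):
    """Linearly map raw sensor values onto the integers 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]   # flat band: nothing to stretch
    scale = 255 / (hi - lo)
    return [round((v - lo) * scale) for v in values]


raw = [112, 480, 1023, 760, 112]     # made-up 10-bit readings
print(rescale_to_8bit(raw))
```

Because the mapping is a straight line, differences between coded values stay proportional to differences between the raw readings, which is why the result can be treated as interval data.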
23Surveying
- Primary data
- Usually for vector data model, but...
- Principle
- Measure angular and linear distances
- Locate 3-D position of points using known control
points
24Surveying
- Usage
- Provide point locations
- Particularly useful for small area data collection
- Good for some validation
- Issues
- Limitations in study area
- Need intervisibility between ground points
- Accuracy limited by equipment and techniques
25Global Positioning System
- Primary data
- Primarily for vector, but...
- Purpose
- Used to derive global coordinate locations
- Need to separately collect attributes for GIS applications
- Space based component of GPS
- Series of satellites in orbit around earth
- Satellite positions are very accurately known
- Satellites are transmitting signals
- Ground component of GPS
- Monitoring stations
- Ground receivers acquiring satellite signals
26Now up to 27 satellites
Peter H. Dana, The Geographer's Craft Project,
Department of Geography, The University of
Colorado at Boulder
27Global Positioning System
Peter H. Dana, The Geographer's Craft Project,
Department of Geography, The University of
Colorado at Boulder
28Global Positioning System
- General procedure
- Measure time of signal travel from 4 satellites to ground receiver
- Why 4? 4 unknowns: X, Y, Z and time
- Calculate distances from each satellite to ground receiver
- Calculate ground position with respect to a geocentric coordinate system
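Why four satellites are enough for four unknowns can be sketched with a small solver. This is a simplified illustration, not how any particular receiver works: it assumes error-free measured distances (pseudoranges), treats the receiver clock error as one extra distance term, and refines a guess by repeated linearization (Gauss-Newton). The satellite coordinates, receiver position, and function names are all made up.

```python
import math


def solve_linear(a, b):
    """Solve a square system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def pseudorange(sat, receiver, clock_bias_m):
    """Geometric distance plus the receiver clock error expressed in meters."""
    return math.dist(sat, receiver) + clock_bias_m


def solve_position(satellites, pseudoranges, iterations=10):
    """Estimate receiver X, Y, Z and clock bias (meters) from 4 satellites."""
    est = [0.0, 0.0, 0.0, 0.0]   # start at Earth's center with zero bias
    for _ in range(iterations):
        jacobian, residuals = [], []
        for sat, measured in zip(satellites, pseudoranges):
            dist = math.dist(est[:3], sat)
            # Partial derivatives of the predicted range w.r.t. X, Y, Z, bias.
            jacobian.append([(est[i] - sat[i]) / dist for i in range(3)] + [1.0])
            residuals.append(measured - (dist + est[3]))
        step = solve_linear(jacobian, residuals)
        est = [e + s for e, s in zip(est, step)]
    return est


# Made-up satellite positions (meters, geocentric) and a made-up true receiver.
satellites = [
    (26_600_000.0, 0.0, 0.0),
    (18_800_000.0, 18_800_000.0, 0.0),
    (18_800_000.0, 0.0, 18_800_000.0),
    (18_800_000.0, -13_000_000.0, -13_000_000.0),
]
true_receiver = (6_371_000.0, 0.0, 0.0)
true_bias_m = 300.0   # roughly one microsecond of clock error times light speed
measured = [pseudorange(s, true_receiver, true_bias_m) for s in satellites]

x, y, z, bias = solve_position(satellites, measured)
print(f"X={x:.1f} Y={y:.1f} Z={z:.1f} clock bias={bias:.1f} m")
```

With only three satellites the 4-by-4 system could not be formed, so the clock error would leak straight into the position.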
29Global Positioning System
- Issues
- GPS has highest accuracy for relative positioning
- Usually need post-processing for precise positioning
- Limiting factors
- GPS requires view of 4 satellites and time to lock in
- Geometric distribution of satellites impacts accuracy of position; too much atmosphere at horizon
- Signal may bounce off features before reaching receiver (multi-path)
30Rasterization and Vectorization
- Secondary data acquisition!
- Conversion from one to the other
- Rasterization: convert from vector to raster
- Vectorization: convert from raster to vector
- Be careful!! Has to be justified by your purpose
- Software helps!
- Can take a long time
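A vector-to-raster conversion can be sketched by testing each cell center against a polygon. This is a minimal sketch of one simple rule (a cell is "in" when its center falls inside); real GIS software offers other rules, and the triangle, grid origin, and function names are assumptions for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, x_min, y_max, cell_size, n_rows, n_cols):
    """Return a grid (row 0 at the top) with 1 where the cell center is inside."""
    grid = []
    for row in range(n_rows):
        y = y_max - (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]       # made-up vector polygon
for row in rasterize(triangle, 0.0, 4.0, 1.0, 4, 4):
    print("".join("#" if cell else "." for cell in row))
```

The smooth sloping edge of the triangle turns into a staircase of cells, which is the kind of generalization that has to be justified by your purpose.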
33Digitizing
- Secondary Data
- For vector data model, but...
- Definition
- Use digitizing table or screen to collect points and/or follow lines
- Heads-down: on digitizing tablet
- Heads-up: on computer screen
- Issues
- Tics!!!! Georeferences!!!!
- USGS maps easy, other data sets not so easy
- Need at least 4; 6 or 8 is better, spread out
- Condition of source document
- Media, stability, wrinkles, folds, generalization/abstraction
- Systematic approach
- Must keep yourself on track
- Need to verify accuracy and completeness
- Check your work after you're done against source and other validation
34Data Transfer
- Secondary data acquisition
- For all models
- Basically you're using an existing GIS data set
- Application
- Alternative to data capture
- Following data capture
- Sources of digital data
- Government agencies
- Commercial data providers
- Private entities
- Special interest groups
35Data Transfer
- Issues
- Data quality
- Reliance on metadata
- Data standards
- Graphics
- Data transfer
- Metadata standards
- Lots of standards e.g. for paper map accuracy
- No quality standards for digital spatial data
- Data format
- Wide variety of formats available
- Suitable format will depend on application
- Often need to use intermediate file type to move
between data formats
36Sources of Digital Data
- Federal Geospatial Data Clearinghouse
- http://www.fgdc.gov/clearinghouse
- New York State Data Clearinghouse
- http://www.nysgis.state.ny.us
- Cornell University Geospatial Information Repository
- http://cugir.mannlib.cornell.edu/
- Many more... sometimes easy to find with an internet search
37Preprocessing
- Okay, so you started your project and defined your objectives and outlined your analysis
- You have some data sets either captured or transferred based on your objectives
- You should have documented what you have prior to this point from the metadata of existing data and your own work
- What else do we have to do?
- Preprocessing!
- I will try to get everything in the same place in space, with similar resolution and scale, data format, and known error
- Again, make sure you document as you go along
- This is where things can get really troublesome
38Preprocessing
- Purpose
- Prepare data for spatial analysis
- Provide consistent spatial referencing
- Enhance spatial analysis
- Reduce noise
- Issues
- Data characteristics
- Hardware/software tools
- Format conversion
39Preprocessing
- Variation in data types and variables
- Real vs. integer vs. binary numbers
- Variation in file formats
- Raster: standard and proprietary formats
- JPEG, TIF, GRID, IMG, etc.
- Vector: standard and proprietary formats
- TIGER, DLG, shapefiles, coverages, etc.
40Preprocessing
- Variation in Data Compression
- Variation in Generalization/Abstraction
- Variation in tile size, characteristics
- Problem for both raster and vector
- Have to put two or more layers side by side
- Doesn't just simply line up at the edges
- Remember? Projections
- Also overlaps
42Preprocessing
- What can we do?
- Edge Matching and Merging
- Creates a seamless data set
- But while somewhat automated, you may have to do a lot of editing at the edges
- You may also go the other way and clip out an area of interest from a much larger tile
43Preprocessing
- Variation in Georeferences
- Often have datasets with unknown georeferencing
- Applies to scanned data or data from a poorly documented digital source
- Or going from one set of georeferences to another
- Rectification and Registration
44Rectification and Registration
- Rectification
- Relate a data set to an established reference
- Goal
- Produce correct geometric relationships between objects in spatial layers
- Remove distortions and improve geometric consistency
- Application
- Usually occurs by relating to an established projected coordinate system with datum
- In ArcINFO we use TRANSFORM
- Registration
- Relate one data set to another
- Goal
- Produce a match between objects in spatial layers
- But by going from one to the other you inherit the original's problems
- Application
- Matching one data set with known projected coordinate system into another data set with different projected coordinate system
- In ArcINFO we use PROJECT
45Rectification and Registration
- Need for registration and rectification
- GIS layers often from disparate sources
- No georeferencing
- Different coordinate systems
- Different datums
- Often performed as part of preprocessing
- TICS are IMPORTANT
- Known points with coordinates
- Used for registration and rectification
- Create new data
- Line-up existing data
- Adjusting existing data
46Rectification and Registration
- Components of rectification and registration
- Example applications
- Translation: origin shift (from false easting)
- Scaling: unit conversion (feet to meters)
- Rotation: changing from magnetic to true north
- As you can imagine, we have to crunch a lot of
numbers to get from one set of coordinates to
another, especially with projections and datums
47Translation
[Figure: point (x, y) plotted on X, Y axes]
48Translation
$x' = x + X_o$, $y' = y + Y_o$
[Figure: point (x', y') shown against axes X, Y and shifted axes X', Y', with origin offsets Xo and Yo]
49Scaling
$x' = x \cdot S_x$, $y' = y \cdot S_y$
[Figure: point (x, y) on axes X, Y and scaled point (x', y') on axes X', Y']
50Rotation
$x' = x\cos\theta - y\sin\theta$, $y' = x\sin\theta + y\cos\theta$
[Figure: point (x, y) on axes X, Y, rotated by angle θ]
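The three slides above (translation, scaling, rotation) each describe one small formula, sketched here as functions. The translation is taken as x' = x + Xo, y' = y + Yo, which is an assumption about the sign convention; the rotation is counterclockwise for a positive angle. The sample coordinates and the 500,000 m false easting are made-up values.

```python
import math


def translate(x, y, x0, y0):
    """Origin shift: x' = x + Xo, y' = y + Yo."""
    return x + x0, y + y0


def scale(x, y, sx, sy):
    """Unit change: x' = x * Sx, y' = y * Sy."""
    return x * sx, y * sy


def rotate(x, y, theta_rad):
    """x' = x cos(theta) - y sin(theta), y' = x sin(theta) + y cos(theta)."""
    cos_t, sin_t = math.cos(theta_rad), math.sin(theta_rad)
    return x * cos_t - y * sin_t, x * sin_t + y * cos_t


FEET_TO_METERS = 0.3048                                         # international foot

point_ft = (1000.0, 2000.0)                                     # made-up coordinate in feet
point_m = scale(*point_ft, FEET_TO_METERS, FEET_TO_METERS)      # feet to meters
shifted = translate(*point_m, 500_000.0, 0.0)                   # add a false easting
turned = rotate(1.0, 0.0, math.radians(90))                     # quarter turn
print(point_m, shifted, turned)
```

Order matters when the steps are chained: scaling before translating gives a different result from translating before scaling, so the sequence needs to be documented along with the parameters.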
51Rectification and Registration
- Process
- Select model
- Describe relationship between input and desired output
- Xout = f(Xin, Yin), Yout = g(Xin, Yin)
- Calibrate model
- Use control points (homologous points) known in both
- That's why we use TICS!!!!
- Produce specific model
- Verify model
- Using independent check points
- Apply model
- Apply to entire dataset
- Software does this for you, but you have to know
what coordinates, units, projection, and datum
you are coming from and what you want to go to
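The select, calibrate, verify, apply sequence can be sketched with one concrete choice of model. The slides leave f and g open; this sketch assumes an affine model (Xout = a0 + a1·Xin + a2·Yin, and likewise for Yout) fitted to tics by least squares. The tic coordinates, the check point, and the function names are hypothetical.

```python
def solve3(a, b):
    """Solve a 3x3 linear system with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    result = []
    for col in range(3):
        replaced = [row[:] for row in a]
        for r in range(3):
            replaced[r][col] = b[r]
        result.append(det(replaced) / d)
    return result


def fit_affine(src, dst):
    """Calibrate: least-squares fit of the six affine coefficients from tics."""
    rows = [[1.0, x, y] for x, y in src]
    # Normal equations (A^T A) p = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def apply_affine(coeffs, x, y):
    """Apply: transform one input coordinate with the calibrated model."""
    (a0, a1, a2), (b0, b1, b2) = coeffs
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y


# Hypothetical tics: digitizer coordinates and their known map coordinates.
tics_src = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
tics_dst = [(1000.0, 2000.0), (1030.0, 2040.0), (998.0, 2064.0), (968.0, 2024.0)]
coeffs = fit_affine(tics_src, tics_dst)

# Verify with an independent check point that was not used for calibration.
check_src, check_known = (5.0, 4.0), (999.0, 2032.0)
check_out = apply_affine(coeffs, *check_src)
print("check point error:", check_out[0] - check_known[0], check_out[1] - check_known[1])
```

Three tics are the minimum for six affine coefficients; using four or more, spread out, leaves redundancy so that a bad tic shows up as a residual instead of silently distorting the fit.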
52Error
- Want to capture errors and fix them
- Spatial errors
- Location errors, consistency of data
- Note on RMSE
- Root Mean Square Error: basically the average error of the x,y distances between locations from calibration
- Used in digitizing
- Attribute errors
- Consistency? Labeling
- Reliance on
- Topology: connecting my arcs and polygons?
- Metadata
- Experimentation
- Comparing data layers
- Sampling and validation techniques
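The RMSE mentioned above can be sketched in a few lines. This assumes the usual definition for point positions (square root of the mean squared x,y distance between where each tic landed and where it should be); the sample coordinates are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean square of the x,y distances between paired points."""
    squared = [(px - ax) ** 2 + (py - ay) ** 2
               for (px, py), (ax, ay) in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up tic positions after calibration versus their known coordinates.
digitized = [(100.2, 200.1), (299.7, 200.0), (300.1, 399.6), (99.9, 400.3)]
known = [(100.0, 200.0), (300.0, 200.0), (300.0, 400.0), (100.0, 400.0)]
print(f"RMSE = {rmse(digitized, known):.3f} map units")
```

Squaring before averaging means one badly placed tic raises the RMSE more than several slightly misplaced ones, so a high value is a cue to check individual tics.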
53Documentation
- Data Documentation
- Issue
- Preprocessing inherently changes data
- Need to provide documentation
- Understand potential error sources
- Understand impact to location and attribute components
- Repeat processing
54Next?
- Database Management
- Analysis
- Output