Title: GIS Architectures, lecture 7
1GIS Architectures, lecture 7
- Digitization
- digitization is necessary if data is not yet in digital format
- nowadays data is often already in digital format
2Digitization
- Scanning
- Raster to Vector conversion
- Optical Character Recognition (OCR)
- On-screen digitization
- Digitization with a tablet or table
3(Image) scanners
- A device which analyzes a physical image or an object and converts it into a digital image
- CCD sensor (photoreactive surface and capacitors)
- hand-held scanners (not used anymore)
- flatbed/desktop scanners
- PMT sensor (photomultiplier tube)
- drum scanner
- CMOS sensor (photoreactive surface and semiconductors)
- not used in scanners because of low image quality
4CCD, Charge-coupled Device
(Figure labels: photon, photoreactive surface, pixel, capacitor)
Light (photons) releases electrons from the photoreactive surface. The electrons accumulate in the capacitor, creating a charge which is proportional to the amount of light that has fallen on the surface.
5Color sensing with CCDs
Bayer mask: the respective CCD pixels measure the amount of red, green or blue light falling on their surfaces.
There are two green pixels in each 2x2 cell, since the human eye (cone cells) is more sensitive to green.
Another (more expensive) possibility is to use three CCDs and a prism.
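The Bayer arrangement can be illustrated with a short sketch. The RGGB ordering of the 2x2 cell and all function names below are assumptions made for illustration (the slide does not say which ordering is used). Each sensor pixel keeps only the one colour sample that its filter lets through.

```python
# Illustrative sketch: sample a full RGB image through an assumed RGGB Bayer mask.
BAYER_RGGB = [["R", "G"],
              ["G", "B"]]  # one 2x2 cell: one red, two green, one blue

CHANNEL_INDEX = {"R": 0, "G": 1, "B": 2}


def bayer_channel(row, col):
    """Return which colour the filter over pixel (row, col) passes."""
    return BAYER_RGGB[row % 2][col % 2]


def mosaic(rgb_image):
    """Keep one colour sample per pixel, as a single-CCD colour sensor would."""
    return [[pixel[CHANNEL_INDEX[bayer_channel(r, c)]]
             for c, pixel in enumerate(row)]
            for r, row in enumerate(rgb_image)]


def count_channels(height, width):
    """Count how many pixels of a height x width sensor measure each colour."""
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(height):
        for c in range(width):
            counts[bayer_channel(r, c)] += 1
    return counts
```

On a 4x4 sensor, count_channels(4, 4) gives 4 red, 8 green and 4 blue pixels. The two missing colours of each pixel have to be interpolated from neighbouring pixels afterwards, which this sketch leaves out.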
6CCD array grid
(Figure labels: array of pixels, grid of pixels, capacitors, electronics)
The electronics measure the charge, empty the capacitors, digitise the information and store it.
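The exposure and readout cycle can be sketched as follows. This is a toy model, not a description of any real sensor: the quantum efficiency, the full-well capacity and the 8-bit converter are assumed values chosen only to make the example concrete.

```python
# Toy model of CCD exposure and readout; all parameter values are assumptions.
QUANTUM_EFFICIENCY = 0.5   # assumed fraction of photons that release an electron
FULL_WELL = 1000           # assumed maximum number of electrons per capacitor
BITS = 8                   # assumed resolution of the analog-to-digital converter


def expose(photon_counts):
    """Accumulate charge: electrons proportional to light, capped at full well."""
    return [[min(int(p * QUANTUM_EFFICIENCY), FULL_WELL) for p in row]
            for row in photon_counts]


def read_out(charges):
    """Measure each charge, digitise it, empty the capacitor, return the image."""
    max_code = 2 ** BITS - 1
    image = []
    for row in charges:
        image_row = []
        for i, electrons in enumerate(row):
            image_row.append(round(electrons / FULL_WELL * max_code))
            row[i] = 0  # empty the capacitor for the next exposure
        image.append(image_row)
    return image
```

A pixel that receives more light than the capacitor can hold saturates at the maximum code, which is why the cap in expose matters.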
7PMT
- Extremely sensitive detectors of light
8Specifications for scanners 1.
- Size of original (width, A4, ..), type of original
- B/W, grayscale or color
- color depth (bits)
- Speed (cm/s)
- Resolution (SPI, samples per inch)
- look at optical resolution since software-interpolated values are meaningless
9Specifications for scanners 2.
- Connectivity: parallel port (slow), SCSI (faster), USB 1 (relatively slow), USB 2 (fast), ...
- Driver software
- Included software
- Output formats
- Price
10Drum scanners
- The image is wet-mounted (in oil) on a cylinder
- Full-spectrum light is beamed through the image or reflected from the image
- Cylinder is rotated and PMT records one pixel at a time
- High-quality instruments
- expensive 5000 - 40 000
- require monitoring and calibration
- optical resolution is 8000-14000 spi
11CCD scanners
- Mass-production scanners
- Sensor is typically a line (or three lines for color scanners) of several thousand CCDs
- Light source is a cold cathode lamp (a neon light variant) or LED (light emitting diode)
- Optical resolution of a CCD scanner is 1200..1600-3200..5400 spi
- Typical cost 50 - 2000
- High-end CCD scanners may be as expensive as drum
scanners (and of comparable quality)
12Comparison
(Figure: comparison of a drum scan and a CCD scan. Source: http://www.marginalsoftware.com/Scanner/downtown.htm)
13Scanner output
- typically uncompressed RGB image transferred to host computer
- can be very large (thus requires a fast connection)
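A rough calculation shows why the connection matters. The function names are hypothetical. The link speeds used in the usage note are the nominal signalling rates of USB 1.1 full speed (12 Mbit/s) and USB 2.0 high speed (480 Mbit/s), and real throughput is lower.

```python
# Back-of-the-envelope size and transfer time of an uncompressed scan.
def scan_size_bytes(width_in, height_in, spi, bits_per_sample=8, channels=3):
    """Size of an uncompressed image: samples x channels x bits, in bytes."""
    samples = round(width_in * spi) * round(height_in * spi)
    return samples * channels * bits_per_sample // 8


def transfer_seconds(size_bytes, megabits_per_second):
    """Ideal transfer time over a link with the given nominal bit rate."""
    return size_bytes * 8 / (megabits_per_second * 1_000_000)
```

For example, an 8 x 10 inch original scanned at 300 spi in 24-bit RGB is 21 600 000 bytes. At the nominal rates this takes about 14.4 s over USB 1.1 and about 0.36 s over USB 2.0. Doubling the resolution quadruples the size.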
14TWAIN (Technology Without An Interesting Name)
- An API for Windows and Mac
- A program running in the host computer makes requests (get image, delete image, ...) to the peripheral
- Contains an architectural blunder: does not separate the user interface from the device driver
- makes it difficult to provide network access
- an application using a TWAIN driver also has to use the manufacturer's GUI
- especially with digital cameras, the images can't be retrieved as files from the source
15SANE (Scanner Access Now Easy)
- An API that provides standardized access to any raster image scanner hardware
- SANE is open and free technology
- saned allows network access to the scanner which is connected to a computer
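The architectural point about separating the user interface from the device driver can be made concrete with a small sketch. All class and method names here are hypothetical and are not the TWAIN or SANE API. The sketch only shows that when the driver exposes a plain programmatic interface with no GUI of its own, any frontend can sit on top of it, including a proxy that forwards calls over a network in the spirit of saned.

```python
# Hypothetical sketch of driver/frontend separation; not a real scanner API.
from abc import ABC, abstractmethod


class ScannerBackend(ABC):
    """Device-driver interface: no user interface code lives here."""

    @abstractmethod
    def get_options(self) -> dict: ...

    @abstractmethod
    def scan(self, options: dict) -> bytes: ...


class FakeFlatbed(ScannerBackend):
    """Stand-in for a real driver; returns dummy image bytes."""

    def get_options(self):
        return {"resolution_spi": [150, 300, 600]}

    def scan(self, options):
        spi = options["resolution_spi"]
        if spi not in self.get_options()["resolution_spi"]:
            raise ValueError("unsupported resolution")
        return bytes([spi % 256]) * 4


class NetworkProxy(ScannerBackend):
    """Forwards calls to a backend on another machine (transport omitted)."""

    def __init__(self, remote: ScannerBackend):
        self.remote = remote

    def get_options(self):
        return self.remote.get_options()

    def scan(self, options):
        return self.remote.scan(options)


def command_line_frontend(backend: ScannerBackend, spi: int) -> bytes:
    """One possible frontend; a GUI would call the same two methods."""
    return backend.scan({"resolution_spi": spi})
```

Because the frontend only sees ScannerBackend, it cannot tell a local device from a proxied one. A driver that brings its own GUI cannot be wrapped this way.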
16Raster to Vector conversion
- Difficult (to program) and error-prone process
Convert to internal raster representation
Rectification
Remove errors (trash) from the image
Smooth the lines in the image
Make continuous lines continuous
Thin the lines
Find the lines (straight or curves?)
Manual editing
17Rectification
- On-screen digitization of known points
- fit a coordinate transformation to the known points:
- x' = f(x, y)
- y' = g(x, y)
- decide on the target raster
- go through the target raster pixel by pixel and look up from the original image what to put in each pixel
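The steps above can be sketched as follows. The slide does not say what form the transformation takes. This sketch assumes an affine transformation fitted exactly to three known points, and nearest-neighbour lookup. The function names are hypothetical. The transformation is fitted from target coordinates to source coordinates, so that every target pixel can look up its value in the original image.

```python
# Sketch of rectification with an assumed affine transformation.
def solve3(a, b):
    """Solve the 3x3 linear system a * x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if abs(d) < 1e-12:
        raise ValueError("known points are collinear")
    solution = []
    for k in range(3):
        m = [row[:] for row in a]
        for i in range(3):
            m[i][k] = b[i]  # replace column k with the right-hand side
        solution.append(det(m) / d)
    return solution


def fit_affine(target_pts, source_pts):
    """Fit sx = a*tx + b*ty + c and sy = d*tx + e*ty + f from three point pairs."""
    a = [[tx, ty, 1.0] for tx, ty in target_pts]
    cx = solve3(a, [sx for sx, _ in source_pts])
    cy = solve3(a, [sy for _, sy in source_pts])
    return cx, cy


def rectify(source, cx, cy, width, height, nodata=0):
    """Fill the target raster pixel by pixel from the nearest source pixel."""
    target = []
    for ty in range(height):
        row = []
        for tx in range(width):
            sx = round(cx[0] * tx + cx[1] * ty + cx[2])
            sy = round(cy[0] * tx + cy[1] * ty + cy[2])
            inside = 0 <= sy < len(source) and 0 <= sx < len(source[0])
            row.append(source[sy][sx] if inside else nodata)
        target.append(row)
    return target
```

Going through the target raster rather than the source guarantees that every target pixel gets exactly one value, with no holes or overlaps. With more than three known points the coefficients would be fitted by least squares instead.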
18Error removal
- Remove areas which are smaller than some threshold
- but what about dashed lines? (their short dashes may also fall under the threshold)
- Image manipulation methods
- grayscale thresholding,
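Both ideas can be sketched in a few lines. The function names, the convention that dark pixels are ink (1), and the use of 8-connectivity are assumptions made for this example.

```python
# Sketch: grayscale thresholding followed by removal of small connected areas.
from collections import deque


def threshold(gray, limit):
    """Values darker than limit become 1 (ink), all others 0 (background)."""
    return [[1 if value < limit else 0 for value in row] for row in gray]


def remove_small_areas(binary, min_size):
    """Clear every 8-connected area of 1-pixels smaller than min_size."""
    rows, cols = len(binary), len(binary[0])
    result = [row[:] for row in binary]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if binary[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first search collects one connected area.
            area, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                area.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(area) < min_size:
                for r, c in area:
                    result[r][c] = 0
    return result
```

The dashed-line problem shows up directly: if min_size is larger than the length of a dash, the whole dashed line disappears together with the noise.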
19Thinning
- Jang, B-K., Chin, R.T. 1990. Analysis of Thinning Algorithms Using Mathematical Morphology. IEEE Trans. Pattern Analysis and Machine Intelligence. 12(6). 541-551.
- used by, e.g., GRASS
- The thinning algorithm defines a set of structuring templates and applies them in several passes until there are no matches or until the maximum number of iterations is reached. Trimming means that certain structuring templates are applied to remove emerging short limbs which appear because of noise in the raster.
20Templates (in the thinning algorithm)
- A 3 by 3 matrix (pixel neighborhood) which is applied on the source raster.
- Template has values
- 0 and 1: these must match the pixels in the source raster
- -1: don't care about this pixel
- if the template matches, the target raster will have a specified new value, otherwise the value from the source raster is used
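Template matching as described above can be sketched like this. The single template in the usage note is an illustrative one made up for this example, not one taken from the Jang and Chin paper or from GRASS, and the function names are hypothetical. Border pixels are skipped for simplicity.

```python
# Sketch of applying 3x3 templates to a binary raster in repeated passes.
DONT_CARE = -1


def matches(source, r, c, template):
    """True if the 3x3 neighborhood around (r, c) agrees with the template."""
    for dr in range(3):
        for dc in range(3):
            t = template[dr][dc]
            if t != DONT_CARE and source[r + dr - 1][c + dc - 1] != t:
                return False
    return True


def apply_template(source, template, new_value):
    """Return (target raster, number of pixels changed); source is not modified."""
    target = [row[:] for row in source]
    changed = 0
    for r in range(1, len(source) - 1):
        for c in range(1, len(source[0]) - 1):
            if matches(source, r, c, template) and source[r][c] != new_value:
                target[r][c] = new_value
                changed += 1
    return target, changed


def run_passes(raster, templates, max_iterations):
    """Apply all templates repeatedly until nothing changes or the limit is hit."""
    for _ in range(max_iterations):
        total = 0
        for template, new_value in templates:
            raster, changed = apply_template(raster, template, new_value)
            total += changed
        if total == 0:
            break
    return raster
```

As a usage example, the template [[0, 0, 0], [-1, 1, -1], [1, 1, 1]] with new value 0 removes a pixel that has background above it and a full row of line pixels below it. A real thinning algorithm uses a whole set of such templates, typically rotated to cover all directions.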
21OCR
- Optical character recognition
- Scanned text -> text
- Complications
- different fonts
- alignment
- Often based on trained neural networks
22Digitization tablet or table
- Specifications
- Size of the work area (44 in x 60 in = 112 cm x 152 cm -> smaller)
- Accuracy (0.005 in = 0.13 mm -> lower)
- Connectivity (physical, logical)
- Compatible software
- Price (..1500-2500.. for large digitizers)
23Digitization
- A thriving industry in 3rd world countries (cheap
semi-skilled labor)