Chap8: Trends in DBMS - PowerPoint PPT Presentation

About This Presentation
Title:

Chap8: Trends in DBMS

Description:

Video, i.e. time series of images. Audio data. Focus: Primarily images, ... 2. Do-it-yourself. Divide a raster data-item into smaller slices. Q? ... – PowerPoint PPT presentation

Number of Views:56
Avg rating:3.0/5.0
Slides: 43
Provided by: spatia
Category:
Tags: dbms | chap8 | trends

less

Transcript and Presenter's Notes

Title: Chap8: Trends in DBMS


1
Chap8 Trends in DBMS
8.1 Database support for Field Entities 8.2
Content-based retrieval 8.3 Introduction to
spatial data warehouses 8.4 Summary
2
Learning Objectives
  • Learning Objectives (LO)
  • LO1 Learn about field data
  • Why learn about field data type?
  • What is field data type? How is represented in
    SDBMS?
  • What are common operations on fit?
  • LO2 Learn about storage and retrieval of field
    data
  • LO3 Learn about spatial data warehouses
  • Mapping Sections to learning objectives
  • LO1 - 8.1.1
  • LO2 - 8.1.2, 8.2
  • LO3 - 8.3

3
Why learn about Field data-sets?
  • Field data is timely and abundant
  • Sensors (e.g. satellite based ones) provide
    periodic snapshot of Earth
  • Most up-to-date data about current events (e.g.
    fires, flood)
  • Field data are useful
  • in creating, revising and evaluating vector data
    sets
  • digital archival of fragile historical paper maps
  • to manually get details not captured in vector
    interpretations
  • Example Location selection for a facility (e.g.
    a grocery store)
  • Consider a set of Aerial photographs of different
    locations
  • Vector interpretation includes roads, water
    bodies, elevation
  • What other information can aerial imagery reveal
    for construction planning?
  • Trees (types and location), buildings,

4
What are Field data-sets?
  • Field data set examples
  • Satellite images, aerial photographs
  • Digitized paper maps
  • Earth Science data-sets, e.g. rainfall,
    temperature maps
  • Data types of Spatial field data sets
  • Images
  • Satellite based, e.g. www.terradata.com
  • Aerial photographs
  • Measurements from a Geo-registered sensor
    networks, e.g. weather
  • Video, i.e. time series of images
  • Audio data
  • Focus Primarily images,
  • though some discussion will apply to other data
    types

5
Fields and Rasters An Sampling of Field values
  • Definitions
  • Field a mapping from a spatial domain to a
    value domain
  • Image a mapping from a rectangular grid to a
    value domain
  • A rectangular grid is a collection of cells
    called pixels
  • Raster is geo-registered image, i.e. grid axis
    have absolute spatial locations
  • Fields are often approximated as rasters
  • Example Figure 8.1
  • Identify spatial domain, field, rectangular
    grid, raster approximation
  • Fields can be approximated as images if relative
    spatial locations are adequate

Fig 8.1
6
Computing with field data
  • Field data manipulated using operations of
  • map algebra
  • image algebra
  • An Algebra is a mathematical structure consisting
    of
  • Operands and Operations.
  • Map Algebra
  • Operand rasters
  • Operations Can be classified into four groups
  • Local, Focal, Zonal and Global
  • Image Algebra
  • Operand images
  • Operations crop, zoom, rotate

7
Local Operation
A local operation maps a raster into another
raster such that the value of a cell in the new
raster depends only on the value of that cell in
the original raster. Examples unary operation
thresholding binary operation point wise
addition
Fig 8.2
8
Focal Operation
In a focal operation, the value of a cell in the
new raster is dependent on the values of the cell
and its neighboring cells in the original
raster. Examples unary operations focal sum,
gradient,
Neighborhoods Rook, Bishop and Queen
Fig 8.3
9
Zonal Operation
In a global operation, the value of a cell in the
new raster is a function of the location or
values of all cells in the original or another
raster. Examples zonal sum, zonal average, ...
Fig 8.4
10
Global Operation
In a zonal operation, the value of a cell in the
new raster is a function of the value of that
cell in the original layer and the values of
other cells which appear in the same zone
specified in another raster. Example distance
from nearest facility
Fig 8.5
11
Image OperationsTrim
  • Image Operations
  • ignore the absolute locations of pixels.
  • come from image processing literature
  • Ex. smoothing, low pass filter, high pass filter,
  • Example A trim operation extracts an
    axis-aligned subset of the original raster.

Fig 8.6
12
Learning Objectives
  • Learning Objectives (LO)
  • LO1 Learn about field data
  • LO2 Learn about storage and retrieval of raster
    data
  • How is raster data stored on secondary storage?
  • What query families are used for retrieval?
  • What is content based retrieval (CBR)? Why is it
    interesting?
  • How is CBR computationally approached?
  • LO3 Learn about spatial data warehouses
  • Mapping Sections to learning objectives
  • LO1 - 8.1
  • LO2 - 8.1.2, 8.2
  • LO3 - 8.3

13
Storage and Retrieval of Raster Data - 1
  • Traditional Approach
  • store raster data in a file system
  • use custom software to retrieve data-items of
    interest
  • Example personal photographs stored on MS
    Windows
  • Q? What attributes can one attach to digital
    photographs ?
  • Q? Is there an easy way to retrieve all pictures
    taken in San Francisco?
  • Limitations
  • Rigid schema
  • Limited ability to add and manage additional
    attributes
  • Canned Queries only
  • Limited ability to support ad-hoc queries
  • Data quality
  • Limited ability to identify duplicates or similar
    data-items

14
Storage and Retrieval of Raster Data in a SDBMS
  • A database approach
  • Database tables store
  • raster data items
  • attributes (i.e. meta-data), e.g. creation date,
    geo-location, subject, ...
  • use SQL like query language to retrieve desired
    data-items
  • retrieve all raster data-items overlapping with
    city of San Francisco (Q1)
  • retrieve latest raster data-item within city of
    Paris (Q2)
  • retrieve raster data-items similar to a given
    image (Q3)
  • Pros
  • table schema definition allows user defined
    attributes
  • improve ability to pose ad-hoc queries (Ex. Q1,
    Q2)
  • improve data reliability and quality
  • Example Query Q3 may be used for duplicate
    reduction

15
Storage and Retrieval of Raster Data - Challenges
  • Challenges in database based approach
  • storage size( raster data item) gt size (disk
    blocks)
  • retrieval raster has rich content
  • A picture is worth a thousand word!
  • Approaches to storage challenge
  • 1. Delegate storage to DBMS
  • Use Binary Large Object (BLOB) data-type
  • create table my_picture(
  • image BLOB
  • creation_date date
  • place point
  • )
  • 2. Do-it-yourself
  • Divide a raster data-item into smaller slices
  • Q? Which way of slicing reduce disk I/Os for
    common queries?

16
8.1.2 How is raster data stored on secondary
storage?
  • Slicing approaches
  • Linear, e.g. one row per disk block (see Fig.
    8.8(b))
  • Tiling - see Fig. 8.8(c )
  • Tiling is preferred
  • for queries extracting rectangular sub-images
  • Example - terraserver.com

Fig 8.8
17
8.2 How is raster data queried?
  • Retrieval challenge of rich content
  • A. Meta-data approach
  • B. Content based retrieval
  • Meta-data approach
  • select a set of descriptive attributes
  • simpler SQL data types, e.g. numeric, string,
    date, ...
  • Example source, location, time stamp, subject,
    resolution, ...
  • Store values of descriptive attributes for each
    raster data-item
  • Allow SQL queries on the descriptive attributes
  • Limitation of meta-data approach
  • Restricts queries to content captured by
    descriptive attributes
  • Does not support Similarity based queries
  • Ex. Find all raster data-items similar to a given
    raster data item.

18
8.2 Content Based Retrieval (CBR)
  • Examples
  • Q1. Find all raster data-items similar to a given
    raster data item
  • Q2. Locate a photograph of a river in Minnesota
    with trees nearby.
  • Q3. Find all images of state parks which have a
    lake within them, are within a radius of one
    hundred miles from Chicago, and are southwest of
    Chicago.
  • State of the Art
  • However, few robust implementations of CBR are
    available as of 2002
  • Several research prototypes address similarity
    query Q1
  • Result quality is similar to those of web
    searches (e.g. www.google.com)
  • Some of the retrieved raster data-item are
    useful.
  • Many similar data item are not retrieved in the
    result
  • Usable in application domains such as publishing
  • Our goal is to understand a current approach to
    similarity queries
  • involving spatial similarities

19
8.2 Content Based Retrieval (CBR)
  • Spatial Similarity
  • Consider a pair of raster images with common
    objects (e.g. parks, lakes)
  • Spatial similarity between raster images can be
    defined based on
  • similarity of spatial relationships (e.g.
    topological, directional)
  • Q? Which pairs exhibit higher similarity?
  • P1 (inside, disjoint) or P2 (inside, covered
    by)
  • P3 (disjoint, touch) or P4 (disjoint, inside)
  • P5 (north west, north) or P6 (west, east)
  • A graph framework for comparing spatial
    relationships
  • Nodes spatial relationships Edges connect
    most similar nodes
  • Similarity metric number of edge on shortest
    path between 2 nodes
  • See Figures 8.9 and 8.10

20
8.2.1 Topological Relationship Similarity
  • Study Fig. 8.9, pp. 234
  • Nodes topological relationships
  • Edges most similar
  • Similarity measure path length
  • Inference from Model
  • P2 (inside, covered by) more similar than P1
    (inside, disjoint)
  • Do you agree?
  • Review Figure 2.3 (pp. 30)

Fig 8.9
21
8.2.2 Direction Relationship Similarity
  • Study Fig. 8.10, pp. 235
  • Nodes topological relationships Edges most
    similar
  • Similarity measure path length
  • Inference P5 (north-west, north) more similar
    than P6 (west, east)

Fig 8.10
22
8.2.3 Distance Similarity
  • Distance similarity is based on
  • Euclidean distance between the centroids of the
    objects.
  • Example Image R is more similar to P than Q in
    Fig. 8.11 (pp. 235)

Fig 8.11
23
8.2.4 A Computational Approach to CBR
  • Attribute Relation Graph (ARG)
  • Node objects in a raster
  • Edges relationships
  • Ex. Raster of Fig. 8.12(a)
  • ARG in Fig. Fig. 8.12(b)
  • Point object O3
  • Rectangles O1, O2
  • Edge (O1, O2) shows that they are disjoint, at
    61 degree direction and 5.2 units distant.
  • Vector representation of ARG
  • Lists objects and edge properties
  • Ex. In Fig. 8.12

Fig 8.12
24
A Computational Approach to CBR
  • Steps
  • 1. Represent each raster data item by its ARG
    vector
  • 2. Map query raster data item by its ARG vector
  • 3. Find most similar raster data-items in the
    database by comparing ARG vector representations.
  • Use a distance metric
  • Use a multi-dim. Index
  • Comment Result quality is similar to those of
    web searches. Some of the retrieved raster
    data-item are useful.

Fig 8.13
25
Learning Objectives
  • Learning Objectives (LO)
  • LO1 Learn about field data
  • LO2 Learn about storage and retrieval of field
    data
  • LO3 Learn about spatial data warehouses
  • What are data warehouses? Why are they
    interesting?
  • What are aggregate functions? Which ones are easy
    to compute?
  • Mapping Sections to learning objectives
  • LO1 - 8.1
  • LO2 - 8.1.2, 8.2
  • LO3 - 8.3

26
8.3 Why are Data Warehouses Interesting?
  • Data Warehouse facilitate group decision making
  • Consider a dataset
  • 1 measure (i.e. Sales)
  • 3 dimensions (e.g. Company, Year, Region)
  • Analysis questions
  • Q1. Rank Regions by total sales.
  • Q2. Rank years by total sales.
  • Q3. Where are sales consistently growing?
  • Cross tabulates summaries reports used to
    analyze the trends
  • Example

27
8.3 Generating cross-tabulation summaries
  • Traditional Approach
  • Use custom software pulling data out of a DBMS
  • Limitations redundant of work, inefficient use
    of resources
  • Data Warehouse approach
  • Cross-tab. Can be generated using a set of
    simple report
  • Each report is generated from a SQL Select ...
    group by statement
  • Example Fig. 8.19 (pp. 244) and Table 8.3 (pp.
    245)
  • Cross-tab example in last slide is a union of
  • SALES-L0-A, SALES-L1-A, SALES-L1-B and SALES-L2
  • Table 8.3 shows SQL queries to compute each part
  • Advantage
  • Rest of SQL is available for pre/post processing
    of data
  • Performance gains by eliminating unnecessary
    copying of data

28
Example Data Warehouse (Fig. 8.19)
Fig 8.19
29
8.3.4 Cross-tabulation vs.report hierarchy
  • Spreadsheet view of a report
  • Views a report a N-dim. Spreadsheet
  • N number of dimension attributes
  • Each cell contains value of measure
  • Cross-tabulation view of a Report hierarchy
  • Example report hierarchy for
  • SALES-L0-A, SALES-L2-A, SALES-L1-B, SALES-L2,
    Fig. 8.19 (pp. 244)

30
8.3 What is a Data Warehouse?
  • Data Warehouse is a special purpose database
  • Primarily used for specialized data analysis
    purposes
  • Facilitates generation and navigation of a
    hierarchy of reports
  • Special purpose data-sets and queries
  • Data consists of
  • a few measure attributes
  • a set of dimension attributes
  • The measure attribute depends on dimension
    attributes
  • Queries generate reports
  • Report measure for selected values of dimensions
  • Aggregate measure for given subset of dimensions
  • What is a spatial data warehouse?
  • Data warehouses with spatial measures or
    dimensions
  • Example census data - census tract is a spatial
    dimension
  • Example logistics data - route is a spatial
    dimension

31
8.3.4 Data Warehouse Operations
  • Operations on a data warehouse
  • Roll-up, Drill-down
  • Slice, Dice
  • Pivot
  • Roll-up
  • Inputs A report R, A subset S of dimensions in
    R
  • Output A sequence of reports summarizing R
  • Example 1 R SALES-Base, S (Year, Region) in
    Fig. 8.19 (pp. 244)
  • Output consists of reports SALES-L0-A,
    SALES-L1-B, SALES-L2
  • Example 2 R SALES-Base, S (Region, Year)
  • Output consists of reports SALES-L0-A,
    SALES-L1-A, SALES-L2
  • Drill-down
  • Inputs A report R, A dimension D not in R
  • Output A reports detailing R on D
  • Example R SALES-L1-B, D Region in Fig. 8.19
    (pp. 244)
  • Output report SALES-L0-A

32
8.3.4 Data Warehouse Operations
  • Slice, Dice
  • Reduce dimensions in a table- (Fig. 8.7, pp
    232).
  • Inputs A report R, A value V for a dimension D
    in R
  • Output A subset of R where D V
  • Example R SALES-L0-A, D Year, V 1994 in
    Fig. 8.19 (pp. 244)
  • Output Table 8.5 (pp. 246)
  • includes tuple (ALL, 1994, America, 35)

Fig 8.7
33
8.3.4 Data Warehouse Operations
  • Pivot
  • For a spreadsheet view of reports
  • Transposes a spreadsheet
  • Example
  • Inputs A spreadsheet view of a report R
  • Output A transposed spreadsheet
  • Ex. R SALES-L0-A, Fig. 8.19 (pp. 244)

34
Logical Data Model of a DWH
  • Purpose of a logical data model
  • Specify a framework to specify computational
    structure
  • Allow extension of SQL to model new needs
  • Cube operation
  • Input A fact table
  • Output A set of summary reports covering all
    subsets of dimension columns
  • Equivalent to union of all tables and reports in
    Fig. 8.19 (pp. 244)
  • Ex. Fig. 8.18, pp. 243
  • SELECT Company, Year, Region, Sum(Sales) AS Sales
  • FROM SALES
  • GROUP BY CUBE Company, Year, Region

35
Fig 8.18
36
Physical Data Model of a DWH
  • Purpose Computationally efficient
    implementation
  • Ideas
  • Pre-computation -
  • pre-compute some of reports and use those to
    compute other reports
  • New indexing methods, e.g. bit-map index
  • Query Processing Strategies
  • Strategies for aggregate functions
  • New strategies for multi-table joins
  • Let us look at strategies for aggregate functions

37
DWH Physical Model Aggregate function strategies
  • Aggregate Functions
  • Compute summary statistics for a given set of
    values
  • Examples sum, average, centroid (Table 8.1, pp.
    238)
  • Strategies for efficient computation
  • Characterize easy to compute aggregate functions
  • 3 categories
  • Distributive
  • Algebraic
  • Holistic
  • First 2 categories can be computed easily in one
    scan of the dataset

38
Definitions of Aggregate Function Categories
  • Notation
  • F, G, G1, G2, Gn are aggregate functions where
    n is small
  • S is a set of values, e.g. S (1, 2, 3, 4)
  • P (S1, S2, , Sp) is a partition of S, e.g. P
    (S1, S2), S1 (1, 2), S2 (3, 4)
  • Distributive( F ) if there exists a G such that
  • F( S ) G ( F(S1), F(S2), , F(Sn) )
  • Example sum is distributive
  • Illustration sum(1, 2, 3, 4) sum ( sum(1, 2),
    sum(3, 4)
  • Algebraic( F ) if there exists G1, , Gn, (where
    n is small) and
  • F( S ) G ( G1(S1), , Gn(S1), G2(S1), ,
    Gn(S2), , G1(Sp), , Gn(Sp) )
  • Example average is distributive
  • Illustration average(1, 2, 3, 4)
  • count(1, 2) average(1, 2) count(3, 4)
    average / count(1,2) count(3,4)

39
Example Distributive Aggregate Function
  • Examples in cross-tabulation scenario (Fig.
    8.14, pp.238)
  • Example 1. Min is distributive
  • Example 2. Count is distributive

Fig 8.14
40
Examples Algebraic Aggregate Functions
  • Examples in cross-tabulation scenario (Fig.
    8.15, pp.239)
  • Average and Variance are algebraic

Fig 8.15
41
Discussion - Spatial Data Warehouse
  • Example
  • Consider the example in Fig. 8.16, pp. 241
  • A map interpretation may be attached to each
    report
  • Each row has a spatial footprint, which can be
    aggregated by geometric-union
  • The collection of maps may be called a mapcube
  • Issues
  • What is needed in OGIS standard to support
    map-cube operation?
  • Hierarchical collection of maps in mapcube
  • What is an appropriate cartography to convey the
    relationship among maps?

42
Spatial Data Warehouses and Mapcube
Fig 8.16
43
Fig 8.17
44
Summary
  • Field data
  • useful in many applications due to rich content
  • Represented as raster or image
  • Operations can be categorized into local, focal,
    zonal, and global
  • Field data storage and retrieval
  • Tiling is a preferred way to divide raster data
    into disk blocks
  • Meta-data based query is often used for retrieval
  • Content based retrieval may be used for
    similarity searches
  • Data warehouses support analysis e.g.
    cross-tabulation reports
  • SQL CUBE operator support generation of DWH
    reports
  • Distributive and Algebraic aggregate functions
    can be computed easily
Write a Comment
User Comments (0)
About PowerShow.com