SQL:1999 Advanced Querying - PowerPoint PPT Presentation

About This Presentation
Title:

SQL:1999 Advanced Querying

Description:

A data warehouse archives information gathered from multiple sources, and stores ... 13 Silberschatz, Korth and Sudarshan. 22.13. Database System Concepts 4th Edition ... – PowerPoint PPT presentation

Number of Views:39
Avg rating:3.0/5.0
Slides: 26
Provided by: profd89
Learn more at: http://web.cs.ucla.edu
Category:

less

Transcript and Presenter's Notes

Title: SQL:1999 Advanced Querying


1
SQL1999 Advanced Querying
  • Decision-Support Systems
  • Data Warehousing
  • Data Analysis and OLAP
  • This also apply to SQL 2003 that is just a
    minor revision of SQL1999

2
Decision Support Systems
  • Decision-Support systems are used to make
    business decisions often based on data collected
    by on-line transaction-processing systems.
  • Examples of business decisions
  • what items to stock?
  • What insurance premium to change?
  • Who to send advertisements to?
  • Examples of data used for making decisions
  • Retail sales transaction details
  • Customer profiles (income, age, sex, etc.)

3
Decision-Support Systems Overview
  • A data warehouse archives information gathered
    from multiple sources, and stores it under a
    unified schema, at a single site.
  • Important for large businesses which generate
    data from multiple divisions, possibly at
    multiple sites
  • Data may also be purchased externally
  • Data analysis tasks are simplified by specialized
    tools and SQL extensions
  • Example tasks
  • For each product category and each region, what
    were the total sales in the last quarter and how
    do they compare with the same quarter last year
  • As above, for each product category and each
    customer category
  • Statistical analysis packages (e.g., S) can
    be interfaced with databases
  • Partial support through OLAP Functions
  • Data mining seeks to discover knowledge
    automatically in the form of statistical rules
    and patterns from Large databases.

4
Data Warehousing
  • A data warehouse is a repository of information
    gathered from multiple sources.

5
Data Warehousing (Cont.)
  • Provides a single consolidated interface to data
  • Data stored for an extended period, providing
    access to historical data
  • Data/updates are periodically downloaded form
    online transaction processing (OLTP) systems.
  • Typically, download happens each night.
  • Data may not be completely up-to-date, but is
    recent enough for analysis.
  • Running large queries at the warehouse ensures
    that OLTP systems are not affected by the
    decision-support workload.

6
Issues in Building a Warehouse
  • When and how to gather data.
  • Source driven data source initiates data
    transfer
  • Destination driven warehouse initiates data
    transfer
  • What schema to use.
  • Schema integration
  • Cleaning and conversion of incoming data
  • What data to summarize.
  • Raw data may be too large to store on-line
  • Aggregate values (totals/subtotals) often suffice
  • Queries on raw data can often be transformed by
    query optimizer to use aggregate values
  • How to propagate updates.
  • Date at warehouse is a view on source data
  • Efficient view maintenance techniques required

7
Data-Warehouse Architecture
8
Star Schema For A Data Warehouse
9
Online Analytical Processing
  • The operation of changing the dimensions used in
    a cross-tab is called pivoting.
  • An OLAP system provides other functionality as
    well. For instance, the analyst may wish to see a
    cross-tab on item-name and color for a fixed
    value of size, for example, large, instead of the
    sum across all sizes. Such an operation is
    referred to as slicing. The operation is
    sometimes called dicing, particularly when values
    for multiple dimensions are fixed.
  • The operation of moving from finer-granularity
    data to a coarser granularity is called a rollup.
  • The opposite operation - that of moving from
    coarser-granularity data to finer-granularity
    data is called a drill down.

10
Relational Representation of the Data in Figure
22.1
11
Cross Tabulation of sales by item-name and color
  • The table above is an example of a
    cross-tabulation(or cross-tab) also referred to
    as a pivot-table. In general, a cross-table is a
    table where values for one attribute form the row
    headers, values for another attribute form the
    column headers, and the values in an individual
    cell are derived as follows.
  • A cross tab with summary rows/columns can be
    represented by introducing a special value all to
    represent subtotals.

12
Hierarchies on Dimensions
13
Cross Tabulation of sales With Hierarchy on
item-name
14
Three-Dimensional Data Cube
15
OLAP Implementation
  • The earliest OLAP systems used multidimensional
    arrays in memory to store data cubes, and are
    referred to as mutidimensional OLAP (MOLAP)
    systems.
  • Hybrid systems, which store some summaries in
    memory and store the base data and other
    summaries in a relational database, are called
    hybrid OLAP (HOLAP) systems.

16
Data Analysis (Cont.)
  • Rollup Moving from finer-granularity data to a
    coarser granularity by means of aggregation.
  • Drill down Moving from coarser-granularity data
    finer-granularity data.
  • Proposed extensions to SQL, such as the cube
    operation help to support generation of summary
    data
  • The following query generates the previous table.
  • select color, size, sum (number)
  • from sales
  • groupby color, size with cube

17
Data Analysis (Cont.)
  • Figure shows the combinations of dimensions size,
    color, price
  • In general computing cube operation with n
    groupby columns gives 2nd different groupby
    combinations.

18
SQL1999 Extended GroupBy
  • also supports generalizations of the group by
    constructs, using the cube and rollup constructs.
    A representative use of the cube construct is
  • select item-name, color, size,
    sum(number) from sales group by cube(item-name,
    color, size)
  • This query computes the union of eight different
    groupings of the
  • sales relation
  • (item-name, color, size), (item-name, color),
    (item-name, size), (color, size), (item-name),
    (color), (size), ( )
  • Where ( ) denotes an empty group by list.
  • For each grouping, the result contains the null
    value for attributes not present in the grouping.
    For instance, with occurrences of all replaced by
    null, can be computed by the query
  • select item-name, color, sum(number) from
    sales group by cube(item-name, color)

19
Extended Group By (Cont.)
  • A representative rollup construct is
  • select item-name, color, size,
    sum(number) from sales group by
    rollup(item-name, color, size)
  • Here only four grouping are generated
  • (item-name, color, size), (item-name, color),
    (item-name), ( )
  • Rollup can be used to generate aggregates at
    multiple levels of ahierarchy on a column. For
    instance, we have a table itemcategory(item-name,
    category) giving the category of each item. Then
    the query
  • select category, item-name, sum(number)from
    sales, categorywhere sales.item-name
    itemcategory.item-namegroup by rollup(category,
    item-name)
  • would give a hierarchical summary by item-name
    and by category.

20
Extended Group By (Cont.)
  • Multiple rollups and cubes can be used in a
    single group by clause.For instances, the
    following query
  • select item-name, color, size, sum(number)from
    salesgroup by rollup(item-name), rollup(color,
    size)
  • generates the groupings
  • (item-name, color, size), (item-name, color),
    (item-name), (color, size), (color), ( )
  • The function grouping can be applied on an
    attribute it returns 1 if the value is a null
    value representing all, and returns 0 in all
    other cases. Consider the following query
  • select item-name, color, size,
    sum(number), grouping(item-name) as
    item-name-flag, grouping(color) as
    color-flag, grouping(size) as size-flag,from
    salesgroup by cube(item-name, color, size)

21
Extended Aggregation OLAP Functions
  • Rank, row_number and other functions based on
    explicit order
  • Many new statistical functions see, e.g., DB2
    UDB's High-Function Business Intelligence in
    e-business---Red Book http//www.redbooks.ibm.com/
    redbooks/SG246546.html
  • STDDEV, etc.
  • CORRELATION,
  • COVARIANCE.
  • Regression functions Y a X b
  • REGR_SLOPE -gt a
  • REGR_INTERCEPT -gt b
  • Windows on aggregates
  • Physical windows based on the number of rows in
    the window
  • Logical windows based on the value span of the
    window.

22
Ranking
  • Ranking is done in conjunction with an order by
    specification. Suppose we are given a relation
    student-marks(student-id, marks) which stores the
    marks obtained by each student. The following
    query gives the rank of each student.
  • select student-id, rank( ) over (order by
    (marks) desc) as s-rankfrom student-marks
  • An extra order by clause is needed to get them in
    sorted order, as shown below.
  • select student-id, rank ( ) (order by (marks)
    desc) as s-rankfrom student-marks order by
    s-rank

23
Partition By
  • Ranking can be done within partition of the data.
    The following query then gives the rank of
    students within each section
  • student-marks(student-id, marks)
    student- section(student-id, section)
  • select student-id, section, rank( ) over
    (partition by section order by marks desc)
    as sec-rankfrom student-marks,
    student-sectionwhere student-marks.student-id
    student-section.student-idorder by section,
    sec-rank
  • dense_rank() no holes after ties
  • row_number() provide row numbering given a
    specific partitioning and ordering of rows.

24
Ranking (Cont.)
  • For a given constant n, the ranking the function
    ntile(n) takes the tuples in each partition in
    the specified order, and divides them into n
    buckets with qual numbers of tuples. For
    instance, we an sort employees by salary, and use
    ntile(3) to find which range (bottom third,
    middle third, or top third) each employee is in,
    and compute the total salary earned by employees
    in each range
  • select threetile, sum(salary)from ( select
    salary, ntile(3) over (order by
    (salary) as threetile from employee) as
    sgroup by threetile
  • SQL1999 permits the user to specify where they
    should occur by using nulls first or nulls last,
    for instance
  • select student-id, rank ( ) over (order by
    marks desc nulls last) as s-rankfrom
    student-marks

25
Aggregates on Windows
  • An example of window query is that, given sales
    values for each date, calculates for each date
    the average of the sales on that day, the
    previous day, and the next day such moving
    average queries are used to smooth out random
    variations.
  • In contrast to group by, the same tuple can exist
    in multiple windows. Suppose we are given a
    relation transaction(account-number, date-time,
    value),
  • select account-number, date-time, sum(value)
    over (partition by account-number order by
    date-time range unbounded preceding)
  • This query returns a new sum for each new
    tuplecumulative sum!
  • An actual window can also be specified e.g.
  • range 10 rows preceding
  • range 30 minutes preceding
Write a Comment
User Comments (0)
About PowerShow.com