Title: Formulating Accurate Queries
1Formulating Accurate Queries
- Theo Shlien, Systems Integrator
2Overview
- Know your business question
- Get details!
- Know your data
- What do you expect back?
- Table joins
- Common pitfalls
3Know Your Business Question
- Get a clear definition of the question
-
- Examples
- How many CDs does each person in our database
have? - What are the names/IDs of the people that have
CDs in our database?
4Know Your Business Question
- What kind of information you need depends on the
type of query - Two types of queries
- Detail listing
- Summary
5Types of Queries
- Detail Query
- Contains detailed information about records that
you are pulling back - Contains NO aggregation (or summaries)
- Summary Query
- Contains aggregation of some type (example
maximum, minimum, sum, average, count, etc.) - Easier to make mistakes on summaries
6General Business Question Limiting Results
(detail and summary)
- Limiting factors depend on the data/business
- Entire population or subset - Do we want
everyone? - - Just Teenagers?
- - Just People who are still alive?
- Time periods - Do we want all purchases
within the last year? - - Total purchase over their entire
life? - - How many they own?
- - Were some given as gifts?
7General Business Question (cont)
- Historical - Do we want people in the
program now? - - Or have ever been?
- Geography - Certain regions?
- - Purchased from specific stores?
- - Certain states?
8Limiting Results Define terms
- Defining terms of the business question is very
important - Consider terms that should be obvious
- Data storage for these obvious terms could be
different than expected - In our example, what constitutes a
person?Businesses could be included in person
data - What constitutes a CD?Are computer CDs
included? Music CDs? DVDs? Xbox games? Blank CDs?
9Detailed Business Question
- What information do you want displayed?
- List the tables and fields where the data is
present - Check to make sure the fields in question are not
codes, if descriptions are desired - If this will be used by someone else, consider
the formatting and order of the data - LIMIT YOUR DATA!
10Summary Business Question
- What information do you want displayed? VERY
IMPORTANT! - Which elements will be calculated?
- How will the elements be calculated? (Formula
used? Distinct used?) - Limiting is less important when aggregating
- Know your data
11Know Your Data
- Important for ALL queries
- Where is the information stored?
- What ELSE is stored in the same table?
- How is the data stored?
- How many records will come back?
- How are different tables related?
- Table Relationships
- Join types
- Join fields
12Know Your Data Table Relationships
- One-to-oneExample Each person (OK, there may be
exceptions to this) has one and only one
biological motherNote One-to-one relationships
will not result in duplication or missing data - One-to-manyExample Each person (in this room)
has more than one co-workerNote A one-to-many
relationship could cause a record to be listed
multiple times. - One-to-zeroExample Each person (in this room)
has no wings.Note A one-to-zero relationship
could result in the appearance of missing data
13Table Relationships Example
- How many CDs does each person have?
- Person Lists person information
- Name Lists names of people in Person table
- CDs Lists information about CDs
- Purchases Lists people and what CDs they
purchased
Person
Name
CD
Purchases
14Table Relationships (cont)
- Written query counts the number of purchases for
each person
Purchases
15Table Relationships (cont)
- When sorted by count, we notice no zero counts
show up, and some really HIGH counts DO - What did we do wrong?
Purchases
16Table Relationships (cont)
- Purchases table does not contain people who have
not purchased CDs - Must use Person table
- When we use the Person table, we LOST a record
- What did we do wrong?
Purchases
Person
17Table Relationships (cont)
- Person ID 138 is not in the Person tableNote
This is an example of what happens with a
one-to-zero relationship - Ignoring this for now, what if we want person
NAMES, instead of IDs?
Purchases
Person
18Table Relationships (cont)
- Why are two 20 counts showing up?
- Re-add Person ID as a displayed field
Purchases
Person
Name
19Table Relationships (cont)
- Wilma has changed her name at some pointNOTE
This is an example of a one-to-many relationship - How do we fix the problems weve encountered?
Purchases
Person
Name
20Table Relationships What We Learned
- When you need to use more than one table, start
with one - Get an initial count from your primary table to
get an expected estimate (In the example, we
probably should have gotten a count of how many
people we had, first) - Add one table at a time
- Look at each step. Does the result make sense?
- This works for detail AND summary queries
21Table Relationships Example Problems
- Do we want Blockbuster listed? How do we remove
it?KNOW YOUR DATA - How do we get rid of the extra person
listing?KNOW YOUR DATA - How could we get Person 138 back?KNOW YOUR DATA
- How do we find out who does not own any CDs?KNOW
YOUR DATA - Note If we wanted to answer questions about what
TYPE of CD we wanted included, wed need to use
the CDs table.
22Know Your Data Join types
- Inner Join Produces results from two tables only
if there is an exact match in BOTH tablesOur
example used Inner Joins exclusively - Left Outer Join Produces data for each record in
left table, even if there is no match in the
right table - Right Outer Join Produces data for each record
in right table, even if there is no match in the
left table
23Join Types Correcting Relationship Problems
- Use an outer join when you have one table
missing data, but you still want information
displayed - In our example, if using a left outer join,
which table will be on the left? (The tables
were working with are People and
Purchases)Business Question How many CDs does
each person in our database have?
Purchases
Person
24Join Types Correcting Relationship Problems
- Now we can see people with count zero
- How do we get rid of businesses?Answer This is
dependant on how the information is stored. Let
us assume, however, that there is a business
indicator flag available.
Purchases
Person
25Join Types Correcting Relationship Problems
- How do we get names?(Keep in mind the multiple
names per person) - Answer Again, data dependent. Let us assume that
there is a Name type flag available on the Name
table
Person
Purchases
26Join Types Correcting Relationship Problems
- Use the Name Type to get only the most valid name
- Note For validation purposes, we kept the Person
ID field in - What about the missing person 138?
Name
Person
Purchases
27Know Your Data Join Fields
- Join fields are as important as join type
- Example How many CDs has each person in our
database purchased in the last year?
Purchases
Person
28Know Your Data Join Fields
- Join fields are as important as join type
- Example How many CDs has each person in our
database purchased in the last year?Note We
still want to see zero counts. - Temptation is to keep the left outer join weve
done before, and add a where clause (or
qualification) for the purchase date.
Person
Purchases
29Join Fields (cont)
- No zeros purchases again
- Join happens FIRST
- Where clause eliminates records
- THEN aggregation occurs
- How do we fix this?
Purchases
Person
30Join Fields (cont)
- Put the date check INTO the left outer join (In
BI-Query, this would need a dynamic relationship) - Join happens first Since its a left outer join,
all people are kept, even if theres no purchase - Then the count is done on purchases
Person
Purchases
31Join Fields Part 2
- COMPLETE joins are important
- Example (incomplete join) No name type meant we
got too many names per person - Example (too specific) How many names does each
person have?Note Some people may not have any
names entered - If name type is added to THIS query, you well get
how many names of each type the person has - For a detailed query, extra/too specific a join
could eliminate rows that you want
32Common Pitfalls Summary
- Forgetting table/data relationshipsResult
Missing/extra information or wrong counts - Using the wrong join typeResult Missing/extra
information or wrong counts - Using the wrong join fieldsResult Missing/extra
information or wrong counts
33Common Pitfalls Additional
- Using extra fields in summary / grouped
queriesTemptation in getting extra information
for the summary - Unclear qualifiers / where clauseMultiple
conditions (due to using and in combination
with or) cause misinterpretation
34Common Pitfalls Extra Fields
- Not a problem in detail queries
- Example How many CDs were sold from each Label
(aka Record company)? - Expect to see Label/company and a countHowever,
an additional field (genre) is added for extra
information.
CD
Purchases
35Common Pitfalls Extra Fields
- While information is correct, it is not what we
wanted / were asked for - The fix is very simple, just remove the extra
column. To avoid this problem, have an idea of
what you want the results to look like BEFORE the
query is written
36Common Pitfalls Unclear Qualifiers
- Example I want to see a movie and go to dinner
or go shopping - What does this example mean?
- I want to see a movie and go to dinner, or I want
to go shoppingOR - I want to see a movie and either go to dinner or
go shopping - Same problem occurs with queries
37Common Pitfalls Unclear Qualifiers Solution
- Be specific. Group qualifiers explicitly
- I want to (see a movie and go to dinner) or I
want to go shoppingOR - I want to see a movie and either (go to dinner or
go shopping) - In SQL, just put parenthesis in to enforce order
of evaluation. In reporting tools, group
qualifiers together - Teradata makes assumptions about the order you
want to have them evaluated
38Writing Accurate Queries
- Know your business question
- Get details!
- Know your data
- What do you expect back?
- Table joins
- Common pitfalls
- Only have the fields you want / need
- Group qualifiers explicitly