Title: Taming the Wild Query
1Taming the Wild Query
- Workshop Session 439
- Sunday, March 16, High Noon
- HEUG 2003 Conference, Dallas
2 Presented By Steve Grantham Manager, Enrollment
Information Systems Terri Coolidge Functional
Lead, Student Records
3Goals of the Workshop
- To investigate a number of the commonly used
features of the Query tool in some detail,
highlighting some of the subtle points that you
need to understand to get the results you want,
and that can lead to confusion and/or erroneous
results if youre not careful. - To point out some of the differences between the
2-tier and Web versions of the Query tool in
version 8. - To provide you with some tips and techniques that
we hope will help you write more effective and
more accurate queries.
4Context and Caveats
- Most of our examples come from version 8, but the
(2-tier) tool in version 8 is almost identical to
the tool in 7.6 (we will highlight one major
difference in Chapter 4). - Emphasis will be more on the 2-tier version of
the tool, but contrasts with the Web version will
be discussed when appropriate. - The examples given here are from Student Records,
though all of the principles illustrated here
apply across all areas of PeopleSoft. Most of the
issues and examples discussed here arose in our
real world work at Boise State. - We assume you are familiar with the fundamentals
of how the 2-tier tool works and with basic SQL
syntax. We assume less familiarity with Web
tool, so in some cases more detail is given there.
52-Tier versus Web
- The 2-tier (Windows client) version has been
around a long time the Web version is, of
course, new in version 8. - Overall structure of the two versions is similar,
but a number of details differ. - The Web tool is inherently much slower to use to
build Queries because of no code on the
clientalmost every mouse click entails network
traffic to refresh the page. One reasonable
approach is to build Queries in 2-tier but run
them on the Web.
6Chapter 1 Auto-Join
- There was no auto-join feature back in version
7.0 hence the danger of Cartesian explosions
was considerably greater. - Auto-join feature was introduced in 7.6, and
remains almost identical in 2-tier tool in 8 the
Auto-join feature in the Web tool in 8 has
significant differences from the 2-tier version. - On the whole, the auto-join feature is useful,
but you still have to think! Examples given here
will illustrate some of the possible pitfalls.
7A Simple Example Joining STDNT_CAR_TERM with
PERSONAL_DATA
- Goal List the names of all students who are Term
Activated in the UGRD Career at BSU for Spring
2003 (and have not requested privacy under
FERPA). - We will need to use STDNT_CAR_TERM to find the
Term Activated students, together with
PERSONAL_DATA to pull in Name and to check FERPA
flag. - This first example is quite simple we will use
it to illustrate look-and-feel differences
between 2-tier and Web versions as well as to
introduce Auto-Join.
82-tier Version
Start by pulling in STDNT_CAR_TERM as table A
and adding criteria on Career, Institution, Term
9Go back to Database tab, pull in PERSONAL_DATA as
Table B. Auto-join detects and displays a
potential join condition on EMPLID. Note that in
this case, EMPLID happens to be a key field in
both tables.
Clicking OK adds that join condition to the
Criteria tab
10- Next
- Add the additional criterion B.FERPA N
manually, - Double-click NAME to add it to Fields tab,
- Click order-by column in NAME row of Fields tab
to order the results alphabetically by Name.
11View the SQL
12Run the Query and view the Results
13Web Version
Start by Searching for STDNT_CAR_TERM table
14Clicking Add Record takes us to
Note the Funnel icon, which can be used to add a
criterion that will have this field as its left
hand side.
15Clicking on the funnel icon next to ACAD_CAREER
takes us to this page, where we can type UGRD
into Constant field in Expression 2.
Note that we could use magnifying glass to look
up valid values, but that would slow things due
to extra network traffic required to display
lookup page.
16Viewing the criterion just added on the Criteria
tab
Another way to add a criterion is to use Add
Criteria button.
17This alternative requires extra steps to specify
the left hand side of the Criterion we must use
the magnifying glass
18Clicking on the magnifying glass in the
Expression 1 box brings up a list of all
available fields
19Clicking on the third field, A.INSTITUTION,
brings us back to the preceding page, where we
can now type in the constant value IDBSU in the
Expression 2 box
20After adding these two criteria plus the one on
STRM, Criteria tab looks like
21Now return to the Records tab to search for the
PERSONAL_DATA record and then click Join Record
to add it to the Query
22When we click the Join Record link, we are taken
to a page that looks like
23Clicking the Add Criteria button takes us again
to the Query tab, where we can check the box next
to NAME in order to select that field.
24Join on EMPLID was added automatically now add
FERPA criterion.
25View the SQL
26If you try to run the Query without saving
This restriction is not present in 2-tier
version, and may be disappearing in future Web
versions???
27so save it!
28To add an Order-By clause, must go to Fields tab
and click Edit button for NAME field.
29Enter 1 into the Order By Number field
30Resave Query and click Run results pop up in
new browser window
31A More Interesting Example Joining CLASS_TBL
with STDNT_ENRL
- In our first example, the field EMPLID is a key
field on both tables and is the only field that
they share. - We now want to look at how auto-join fields are
selected in more complex situations. - CLASS_TBL stores data about Classes and
STDNT_ENRL stores data about enrollment in those
classes, so joining those tables is very natural
and important in the SR world. - The issues discussed in this example can arise in
many other situations as well.
32Joining CLASS_TBL with STDNT_ENRL Using the
2-Tier Tool
Begin by pulling in CLASS_TBL as table A, then go
back to the Database tab and pull in STDNT_ENRL
as table B. The following Auto-join window pops
up
33How are the Auto-Join fields chosenand is the
choice exactly what we want?
- The join on STRM certainly makes sense.
- The join on SESSION_CODE is OK though actually
unnecessary. - The critical piece that is missing is a join on
CLASS_NBR the combination of STRM and CLASS_NBR
uniquely identifies a Class. - To understand how the choice was made, we need to
look at the structure of each table, and
especially at the key fields.
34Structures of the two tables
35Observations on these structures
- The only field that is a key field on both tables
in this case is STRM. - SESSION_CODE is a key field on CLASS_TBL, a
non-key field on STDNT_ENRL. - The remaining three key fields on CLASS_TBL
(CRSE_ID, CRSE_OFFER_NBR, and CLASS_SECTION) do
not appear at all on STDNT_ENRL.
36A Deduction (or an Induction?)
- We conclude (from this and other examples) that
Auto-Join adds a join condition for each key
field on table A that also appears on table B
(regardless of whether it is a key field on B). - In this case, Auto-Join generates too few join
conditions, which will result in too many rows
being selected. That is, if you do not add a
join criterion on CLASS_NBR, then you will get
one row for each combination of a class scheduled
in a given term and a student enrolled in any
class in that same term!
37What if we start with STDNT_ENRL as table A (so
CLASS_TBL becomes B)?
38What about these criteria?
- All five key fields of STDNT_ENRL appear on
CLASS_TBL, so all Auto-Join adds criteria for all
five of them. - As noted, the two joins we really want are those
on STRM and CLASS_NBR. - The other three join conditions are not just
unnecessarythey are harmful! If we accept them,
then our query will omit some rows that we really
want to include. Lets see why
39The join on EMPLID
- Dont we always want to join on EMPLID if we can?
Well - EMPLID on STDNT_ENRL is the students EMPLID.
- EMPLID on CLASS_TBL cant possibly mean the same!
- Use Application Designer to find that EMPLID on
CLASS_TBL is labeled Course Administrator on
the Basic Data Panel/Page under Schedule of
Classes. - At many institutions, Course Administrator is not
populated at all when it is populated it denotes
a faculty member, not a student. - If the join on EMPLID is left in the query, there
will be no rows selectedunless a student is
enrolled in a class for which he/she is also the
Course Administrator!
40The joins on ACAD_CAREER and INSTITUTION
- This join on ACAD_CAREER says that the Career in
which the student is enrolled must match the
Career in which the Class is offered. - This will often be true, but at many institutions
cross-Career enrollment is quite possible,
perhaps even common - If this criterion is left in, then those
cross-Career enrollments will not be included. - Whether this is a good or a bad thing depends on
your purposes, but the important thing is to
realize that this is what the criterion does, and
make an informed decision as to whether you want
it! - The join on INSTITUTION has similar implications,
for multi-institutional environments.
41The Moral of the Story
- Auto-Join is useful it can save you some work
and can help prevent the creation of Cartesian
products, but - You should never blindly accept the join
conditions generated by Auto-Join they may be
too lax or too restrictive. - To make an intelligent choice, you need to
understand the structure of the tables you are
using, and the MEANING of the data!
42Joining CLASS_TBL with STDNT_ENRL Using the Web
Tool
Lets again start with CLASS_TBL as table A and
STDNT_ENRL as table B. When we pull in table B,
the page shown here pops up.
43How does this behavior compare with that of the
2-Tier version?
- In the Web version, Auto-Join picks up all cases
in which a key field on Table A matches any field
on Table B (the first two criteria), and then
conversely, all cases in which a key field on
Table B matches any field on Table A. - Recall that STRM is a Key in both tables, but the
fact that it is a key in Table A takes precedence
in determining how the criterion is displayed. - In other words, the Auto-Join feature in the Web
version takes a symmetric approach, whereas that
in the 2-tier version does not. This symmetric
approach reduces the danger of Cartesian products
even more. - We still need to adjust the criteria to what we
really want!
44Auto-Join With More Than Two Tables
- In the 2-tier version, if we already have two or
more tables in a query and want to pull in
another one, then Auto-Join must decide which of
the tables already present to compare to the new
table. We need to understand how it decides
this. - To illustrate this situation, lets suppose that
in addition to joining STDNT_ENRL to CLASS_TBL,
we want to pull in students names from
PERSONAL_DATA. - Suppose that we decide to build this Query by
pulling in first STDNT_ENRL, then PERSONAL_DATA,
and finally CLASS_TBL.
45When we pull in PERSONAL_DATA as the table B,
with STDNT_ENRL as table A, the Auto-Join feature
gives
Theres nothing problematic about this we want
the emplid from PERSONAL_DATA to be that of the
student. We click OK and get
46Note that the focus is on the newly added table
B. Now we go back to the Database tab and
double-click on CLASS_TBL to pull it in as table
C. The following Auto-Join panel pops up
47Is this the join condition we want?
48What does B.EMPLID C.EMPLID mean?
- It says that the EMPLID on PERSONAL_DATA (B)
should match that on CLASS_TBL (C) but as noted
earlier, thats the Course Administrator for the
Class! - Our previous Auto-Join condition said that EMPLID
on PERSONAL_DATA must match that on STDNT_ENRL. - By transitivity, this again gives us the unwanted
criterion A.EMPLID C.EMPLID, which says the
student must also be the Course Administrator of
the class. - We are also missing the important criteria that
we want the ones joining STDNT_ENRL to CLASS_TBL
on STRM and CLASS_NBR. - What happened?
49A Problem of Focus!
- When we went back to the Database tab to pull in
CLASS_TBL, the focus in the Query tab was on
Table B, PERSONAL_DATA, rather than on Table A,
STDNT_ENRLand hence the Auto-Join feature looked
at Tables B and C rather than A and C. - If we had noticed this and taken care first to
change the focus to Table A (by clicking on
either the name of the table or on any field
within it in the Query tab) then we would have
gotten the following instead
50These criteria still need modification, as
discussed in our previous example, but they are
closer to what we want.
51The Moral of The Story
- In the 2-tier version, you need to be aware not
only of the order in which you pull in tables,
but of which existing table the Query tool will
try to join to the new table, based on the
current focus. - The Web-based version is better about alerting
you to these issues, though of course you still
need to make the right choice. If we run through
the same steps in that version, pulling in first
STDNT_ENRL, then PERSONAL_DATA and finally
CLASS_TBL, then when we pull in CLASS_TBL and hit
the Join Record link, we get
52so we are required to make an explicit choice
about which existing table we want to join with
the one we are adding. We still have to make the
right choice, but at least we are alerted to the
need for one! So clicking the ASTDNT_ENRL link
gives
53And we uncheck the unwanted join conditions in
the page above to get
54Chapter 2 Record Hierarchy Joins
- Record Hierarchy provides an alternative to the
standard Auto-Join mechanism. - Comes in two flavors Basic Record Hierarchy
and Related Record Hierarchy. - Applies only to cases where the relevant Record
Hierarchies (and Prompt Tables, for Related
Record Hierarchies) have been set up by the
developers in PeopleTools.
55How are Record Hierarchies Defined?
In the 2-tier version a Record Hierarchy icon
appears right after the table name and before the
list of fields. For example, looking at the
STDNT_CAR_TERM table
56Clicking on the sign expands the hierarchy,
revealing that the record PERSONAL_DATA lies at
the top of this particular hierarchy. Continuing
to click on signs until everything is expanded
completely, we eventually see the entire
hierarchy. Note that STDNT_CAR_TERM itself lies
somewhere in the middle of this hierarchy.
57Parent/Child Relationships
- Record A is a parent of a record B if the key
fields of record B include all the key fields of
record Ain the same orderplus at least one more
field. - The key fields of STDNT_CAR_TERM are EMPLID,
ACAD_CAREER, INSTITUTION and STRM, in that order.
The key fields of its child record STDNT_SESSION
are all of those fields, plus the field
SESSION_CODE. - Likewise, the keys of its parent record
STDNT_CAREER are just EMPLID and ACAD_CAREER the
first two keys of STDNT_CAR_TERM.
58Formal Recognition of Parents
For a parent/child relationship to be reflected
in the Query tool, the Parent Record field must
be explicitly populated when the child Record is
defined in PeopleTools. For example, if you look
in Application Designer at the Properties of the
Record STDNT_SESSION, you will see
59Gaps in the Genealogy
- In some cases PeopleSoft developers seem to have
skipped generations when assigning Parent
Records. - For example, the record STDNT_ENRL is also a
child of STDNT_CAR_TERM, since its keys are
EMPLID, ACAD_CAREER, INSTITUTION, STRM, and
CLASS_NBR and this parent-child relationship is
functionally important. - However, STDNT_ENRL has been assigned
STDNT_CAREER as its Parent Record, rather than
STDNT_CAR_TERM, in Application Designer. - Hence the Record Hierarchy for STDNT_ENRL looks
as follows
60Why this choice was made is a mystery. But you
need to be aware that the Record Hierarchy is
often incomplete there may be other records that
really belong but have been omitted because of
setup decisions like this one.
61Creating a Record Hierarchy Join
Suppose we start with STDNT_CAR_TERM, expand the
Record Hierarchy as shown above, highlight the
record STDNT_SESSION in the Hierarchy and
double-click on it. It will appear as table B in
the Query
Joined with A indicates that table was added
via Record Hierarchy join.
62Criteria Tab is still blank
but SQL tab reflects the join criteria due to
Record Hierarchy join.
63Now select fields and add further criteria
- Lets find all cases in which a student who is
enrolled in greater than zero Progress Units for
Term 1029 has a Session row for that Term for
which the Progress Units are zero. This means
that that Session row is essentially
superfluous. - Select the fields EMPLID, ACAD_CAREER, and
UNT_TAKEN_PRGRSS from table A (STDNT_CAR_TERM)
and the field SESSION_CODE from table B
(STDNT_SESSION), - Add the following Criteria in the Criteria tab
64(No Transcript)
65Results of the Query
66Be Careful With the Grandchildren!
- When dealing with levels that are not immediately
adjacent in a multi-level Record Hierarchy, there
is a subtle point that can trip you up if youre
not careful. - Each table in a Query has its own associated
Record Hierarchy the Record Hierarchy of a child
table is a subtree of that of the parent (hence
usually smaller) but is often very similar. - In such a situation, it sometimes makes a
difference which version of the Record Hierarchy
you use to create the joins.
67Record Hierarchy for STDNT_SESSION is almost
identical to that for STDNT_CAR_TERM only
difference is the omission of ACAD_STDNG_ACTN
68Adding STDNT_ENRL_APPT to the Query
Does it matter which copy of the Record Hierarchy
we use to pull in STDNT_ENRL_APPT? First use the
Record Hierarchy under table B (STDNT_SESSION)
69Now pull in the fields A.EMPLID, A.ACAD_CAREER,
A.STRM, B.SESSION_CODE, and C.APPOINTMENT_NBR,
and add the Criterion A.EMPLID 111111037.
70Now run the Query
71Query results using Hierarchy Join with B
Student has one Session with an Appointment for
each Term. This reflects Boise State business
practice of making one Session each Term the
Appointment Control Session for all other
Sessions.
72What if we use the Hierarchy under A?
This time, again starting with STDNT_CAR_TERM as
A and STDNT_SESSION as B, pull in STDNT_ENRL_APPT
as C using the Record Hierarchy under A instead
of B.
73Pull in same fields and add same additional
criteria as before, then rerun the query. Note
multiple rows per term this time.
74So What?
- If you did not understand exactly what was
happening, it would be easy to assume that this
means that she has multiple Appointments per
Term. - In this particular example, this potential
confusion may not be a matter of immense concern.
- In other, similar cases the consequences of this
subtle error may be more dramatic. This could be
particularly true when aggregate functions are
involved. - When using Record Hierarchy Joins, you should
always use the version of the Hierarchy that
appears below the immediate parent of the Record
you are about to pull innot the one below the
grandparent or even more senior ancestor.
75Joining Upwards in the Record Hierarchy
The Query tool is inconsistent about whether it
will let you use the Record Hierarchy Join
feature to create a join to a record that lives
higher in the hierarchy rather than lower. For
example, if you start with the record
STDNT_CAR_TERM and try to join to its parent
STDNT_CAREER, you will get the error message
Note You can get around this limitation by doing
a regular join.
76However, if you try to join to its grandparent,
PERSONAL_DATA, the join works fine
77Related Record Hierarchy Joins
Notice that, in addition to the Record Hierarchy
itself, a number of the fields on most records
also have signs next to them when you open the
record in the Query tool. Suppose we completely
expand the Related Record Hierarchy for the field
STRM.
78Related Record Hierarchy for field STRM
79How is a Related Record Hierarchy defined?
Look at definition of STDNT_CAR_TERM in App
Designer
A Prompt Table defines the set of allowable
values for the corresponding field, and the
Related Record Hierarchy is just the Record
Hierarchy for that Prompt Table.
80Now double-click on the TERM_TBL icon in the
Related Record Hierarchy for the STRM field to
bring TERM_TBL into the Query, joined on STRM.
This will allow us, for example, to select the
Term Begin and End Dates for the Terms from which
we are selecting STDNT_CAR_TERM rows
81Sometimes the Records dont Relate well
- The example just given worked just fine. But
sometimes the Prompt tables and the Record
Hierarchies dont cooperate as well as you would
like. - For example, consider the Related Record
Hierarchy for the ACAD_CAREER field. The Prompt
Table for this field is STDNT_CAREER, which has
PERSONAL_DATA as its parent record (see next
slide). - Note the contrast with the case of STRM in that
case the Prompt Table TERM_TBL lives at the top
of its hierarchy (i.e., it has no parent
record). - This difference causes the following problems.
82If we double-click on the record at the top of
the Related Record Hierarchy, PERSONAL_DATA, we
get
83It works, but the join condition generated is
totally bizarre, and no rows will be selected
unless you have some strange EMPLIDs and/or
strange Careers in your system.
84So perhaps we should have double-clicked on
STDNT_CAREER rather than PERSONAL_DATA, since
STDNT_CAREER is the Prompt Table for the
ACAD_CAREER field. But when we try that, we get
85But despite that error message, we find that the
PERSONAL_DATA record (not ACAD_CAREER) has been
pulled in anyway, giving us exactly the same
nonsensical join as before
86Record Hierarchy Joins in the Web Version
To create a Record Hierarchy join in the Web tool
you click the Hierarchy Join link to bring up a
new page (as opposed to expanding the Hierarchy
on the Query tab itself).
87Clicking one of the record names creates the join
as before, returning you to the main Query page.
88In contrast with the 2-tier version, it appears
that it is always possible to join upwards as
well as downwards for example, we can click on
the parent table STDNT_CAREER and we now get
89View SQL tab verifies that this upwards Record
Hierarchy join did indeed work as expected.
(As in 2-tier, these criteria do not appear on
Criteria tab.)
90Grandchildren on the Web
- The same caveats about multi-level Record
Hierarchies discussed for 2-tier apply to the Web
as well you should always use the version of the
Hierarchy associated with the immediate parent of
the Record you are about to pull in. - The manual contains details of an example that
is slightly different from the Term/Session/Appoin
tment example given earlier, using the join
upwards to STDNT_CAREER just shown. We omit the
details here.
91Related Record Hierarchies on the Web
- The Web version works considerably better than
2-tier version in this area. - The Web version does not actually give you the
option of displaying the entire Related Record
Hierarchy for a given field. - Instead it just contains links that allow you to
join a given field only with its Prompt Tablenot
with a table that lives above or below that
Prompt Table in the Prompt Tables Record
Hierarchy.
92Revisiting the Related Record Hierarchy for the
ACAD_CAREER field
93Clicking the indicated link successfully joins
in STDNT_CAREER
94and checking the SQL shows that the correct join
conditions were used this time!
95Chapter 3 Subqueries, (and Views, and DISTINCT)
- Subqueries always appear within Criteria, and
they can be used in three main ways. - To select a single value that will be compared to
a field value from the parent query using , not
equals, lt, lt, gt, or gt - To select a list of values that will be compared
to a field value from the parent query using in
list or not in list. - In conjunction with the exists or not exists
operators to determine whether or not a row
exists meeting certain criteria. - We will also discuss Views and the use of the
DISTINCT keyword in this chapter.
96CLASS_TBL_SE_VW An Example of a View
- A view is a virtual table whose data is not
actually stored separately in the database
rather, it is defined by a SQL statement
involving one or more real tables. - When the view involves only a single real
table, its purpose is usually to select only
certain rows and/or columns that are needed from
that table for a particular purpose. - When the view involves more than one real
table, its purpose is usually to make things
convenient by packaging a commonly used join of
several tables so it can be used as a single
entity.
97The Definition of CLASS_TBL_SE_VW in Application
Designer
98Note the Click to open SQL Editor button,which
is present only for Views. Click on it to see
the SQL that defines the view.
99(No Transcript)
100Subquery Example 1 Term GPA During First Term
of MATH Enrollment.
- Want to look at the Term GPA (CUR_GPA) that
students earn during the first term in which they
enroll in a Mathematics (MATH) class. - The CUR_GPA comes from STDNT_CAR_TERM.
- Will use CLASS_TBL_SE_VW to get Class and
Enrollment data. - The critical question is how to express the
criterion that we only want to look at the first
term in which a given student enrolls in a
Mathematics class.
101A Natural but incorrect approach
Try joining STDNT_CAR_TERM with CLASS_TBL_SE_VW
and adding a criterion saying that the SUBJECT
field from CLASS_TBL_SE_VW MATH, and ordering
by EMPLID.
102In its current form, this Query will give us the
CUR_GPA for every Term in which a student has
taken a MATH course, not just the first such
Term. If the student has more than one MATH class
for the Term, then we will get multiple rows for
the Term. So lets add STRM to the order-by
clause and run the query.
103We can see that Term 0310 is the first one in
which AV0002 took any MATH classesand he took
two of them in that Term, then two more in 0330,
two in 0350, one in 0370, two in 0390, and one
each in 0410 and 0430.
104To select only the first Term with a MATH course,
its tempting simply to add the Aggregate
function MIN to the A.STRM Field (by
double-clicking the Aggregate column in the
Fields tab).
105This gives the following SQL note that the Query
tool now seems to think the name of the field is
MIN(A.STRM) instead of just plain A.STRM!
And when we try to run it, we get this error,
because MIN(A.STRM) B.STRM makes no sense.
106Suppose we try pulling in STRM from table B and
putting the MIN aggregate function on that field
instead of A.STRM.
107This version will at least run, but what does the
data mean?
108What happened in this version?
- The results look the same as the original query,
except we have two Term columns, with different
labels but the same values. - In contrast to the previous attempt the MIN
function did not get propagated down to the
Criterion A.STRM B.STRM. This is simply a
quirk of the way the Query tool works it is due
to the fact that B.STRM was on the right-hand
side of the criterion. - Need to analyze the SQL to understand what
happened.
109 SELECT A.EMPLID, A.ACAD_CAREER, A.STRM,
A.CUR_GPA, MIN(B.STRM) FROM PS_STDNT_CAR_TERM
A, PS_CLASS_TBL_SE_VW B WHERE A.EMPLID
B.EMPLID AND A.ACAD_CAREER B.ACAD_CAREER
AND A.INSTITUTION B.INSTITUTION AND
A.STRM B.STRM AND B.SUBJECT 'MATH'
GROUP BY A.EMPLID, A.ACAD_CAREER, A.STRM,
A.CUR_GPA ORDER BY 1, 3
For any given combination of Emplid, Career, Term
and Current GPA from the STDNT_CAR_TERM table, we
choose the minimum term from CLASS_TBL_SE_VW for
which the Subject field is MATH and the Term on
CLASS_TBL_SE_VW is the same as that on
STDNT_CAR_TERM. In other words, we still get
all Terms in which the student took a MATH class.
110So what if we simply remove the criterion A.STRM
B.STRM from the Query? Doing that and
rerunning, we get
Now we actually have succeeded in selecting the
Minimum Term in which the student has MATH
enrollment when we select MIN(B.STRM), it is no
longer the minimum among those Terms that agree
with A.STRM, but
rather the minimum Term value over all Terms for
the given Emplid, Career and Institution for
which the Subject is MATH. However, we are
still seeing the GPA values for all Terms for
that Emplid, Career and Institutionincluding
Terms in which no MATH class was taken, such as
Term 0290 for student AV0002.
111The correct approach use a subquery!
- We dont really want to join STDNT_ENRL to
CLASS_TBL_SE_VW within the main Query. Instead,
we want to use a Subquery involving
CLASS_TBL_SE_VW to find the correct Term to look
at, and then just pull the GPA value from the
STDNT_CAR_TERM row for that Term. - Begin as before by pulling in EMPLID,
ACAD_CAREER, STRM, and CUR_GPA from
STDNT_CAR_TERM, but now add a new Criterion of
the form A.STRM ltltSubquerygtgt by dragging STRM
over into the Criterion tab, right-clicking in
the Expression 2 field, and choosing Subquery
from the pop-up menu, and then choosing
CLASS_TBL_SE_VW from the Database tab.
112(No Transcript)
113Pull in STRM field from table B and choose MIN
using Aggregate pop-up menu.
114Add criteria joining B to A on EMPLID,
ACAD_CAREER, and INSTITUTION, plus B.SUBJECT
MATH
115Can run the query without returning to the parent
(or we could return if we wanted). The results
now finally look like what we want!
116Subquery Example 2 Term GPA During All Terms of
MATH Enrollment
- This example will actually be simpler than the
first one, but it will illustrate the use of the
EXISTS operator, which is used only with
Subqueries. - The first Query we built earlier almost solves
the problem the only glitch was returning
multiple rows when the student has multiple MATH
classes in a given Term. - Can eliminate this duplication of rows by using
the DISTINCT keyword.
117Double-click Mr.Hand to bring up Properties
pop-up, and check the DISTINCT checkbox.
118Now rerun to get
119Note that we now get just one row per term, as
desired.
120So whats wrong with using DISTINCT?
- It works OK, but in some cases it can be rather
inefficient from a performance standpoint. In
order to process the DISTINCT keyword, must first
sort all of the data using all fields that are
selected, then compare adjacent rows to eliminate
any duplicates. If the data set is large, this
sorting and culling can take an appreciable
amount of time. - Since the comparison to determine duplicates is
made on all fields selected, you must be careful
not to include any extraneous fields that you do
not want to be used to cause otherwise identical
rows to be considered distinct. - Using a subquery often provides a good
alternative to using a join together with
DISTINCT.
121If we modify our previous example by simply
removing the MIN from the subquery
we get
122The problem is that without the MIN function, the
Subquery selects multiple rows (multiple STRMs)
for some students, and you cannot have a single
value of A.STRM equal to a list of such values.
We can fix this by going back to the main Query
and changing the operator from Equals to In
List
123Running this version gives the same (correct!)
results as our previous version using a join plus
DISTINCT
124An alternative approach using EXISTS
Delete the Criterion A.STRM ltltSubquerygtgt, then
click the funnel icon with the sign to add a new
Criterion, and choose exists from the drop-down
125Only choice for Expression 2 is Subqery, so
double-click that choice and pull in
CLASS_TBL_SE_VW again.
126Leave Fields tab blank, but add the usual
criteria, including a match on STRM with the
parent table A.
127Notice that the Query tool has placed the
constant string value X right after the SELECT
keyword, because we left the Fields tab blank.
Now run the Query to get
128These are exactly the same results as we obtained
in our previous versions.
129How does EXISTS work?
- Given values for Emplid, Career, Institution and
Term that are passed down from STDNT_CAR_TERM
in the main Query, the subquery looks for a row
of CLASS_TBL_SE_VW that matches on all these
fields and has a Subject of MATH. - If we find such a row, we dont care what field
we select, only that such a row exists, so the
simplest thing is to select the placeholder
value X (but we can select an actual field if
we wish). - If the subquery returns a row, then the row
passed down from the main query is selected
otherwise it is not.
130Subquery Example 3 Term GPA During Terms
Without MATH Enrollment
Just change Exists to Does Not Exist
131Note that there are many more rows now 6180
rather than 1723, reflecting the fact that it is
more common not to take a MATH class than it is
to take one! Note also that only one row for
AV0002 appears in this Query, for Term 0290,
since this is the only Term in which he did not
take a MATH class.
132Alternatives to this NOT EXISTS Subquery?
- We could also obtain the same result by going
back and changing the query that used A.STRM IN
ltltSubquerygtgt to A.STRM NOT IN ltltSubquerygtgt ,
though this would be less efficient. See manual
for more details. - By contrast, there is no way of modifying the
first of the three approaches used in Example
2the one that used a join in the main Query
together with the DISTINCT keywordto solve
Example 3.
133Subqueries and DISTINCT on the Web
Differences in this area are largely cosmetic.
To add an EXISTS subquery, click Add Criteria
button
change Condition Type to exists
and click the Define/Edit Subquery link
134Now choose CLASS_TBL_SE_VW from the Records tab,
which takes you to Query tab
Note that in a subquery we have Select links
instead of checkboxes next to the fields,
reflecting the fact that we can select at most
one field in a subquery.
135Click on the Criteria tab as usual
Note that we can also click on the funnel icon
with the sign next to any field on the Query
tab to create a Criterion that has that field as
the left hand side.
136After adding all our criteria
137we can view the SQL
Note the Subquery/Union Navigation link (visible
on each page), which takes us to
138(No Transcript)
139Saving and running the query yields the same
results as in the 2-tier version
140DISTINCT on the Web
Properties link appears on each page of the
Query tool (no more Mr. Hand) and takes you to
this page, where you can check Distinct as
before.
141Chapter 4 Effective Dates in Queries
- Henry Stewart notwithstanding, Effective Dating
in PeopleSoft is not just dinner, a movie and a
bottle of wine. - Effective Dating allows a history of changes to
be kept. - Effective Date (EFFDT) is always a key field.
- Often we are interested in the most current
Effective Dated row of a table. - In some situations, multiple rows with the same
effective date are allowed, in which case the
additional key field of Effective Sequence is
also included. - PeopleSoft includes built-in Effective Date
operations that are really composite operators
that involve subqueries.
142A Basic Example ACAD_PROG
STDNT_CAR_NBR is a misleading name. The
combination of EMPLID, ACAD_CAREER and
STDNT_CAR_NBR defines a stack each stack can
have multiple EFFDT/EFFSEQ combinations.
143The Effective Date Options panel
In the 2-tier tool this panel pops up when you
pull in the first Effective Dated record into a
Query. The options shown are the defaults, but
they can be changed. For our first example we
will choose No Effective Date Option.
144Lets select all five key fields plus some
others, ordering by ACAD_CAREER, STDNT_CAR_NBR,
EFFDT, EFFSEQ.
145Add a criterion to restrict to a single student
146Running the query gives the complete history of
this students Program changes. Note the
multiple Careers, multiple stacks within UGRD
Career, and multiple EFFSEQ for some EFFDTs
147What if we had accepted the default Effective
Date option?
Now we get only one row for each stack.
148The SQL behind the EFFDT operator
149Analyzing the EFFDT Subquery
AND A.EFFDT (SELECT MAX(A_ED.EFFDT) FROM
PS_ACAD_PROG A_ED WHERE A.EMPLID
A_ED.EMPLID AND A.ACAD_CAREER
A_ED.ACAD_CAREER AND A.STDNT_CAR_NBR
A_ED.STDNT_CAR_NBR AND A_ED.EFFDT lt
SYSDATE)
A row from copy A of ACAD_PROG is passed down
to the subquery. Its values for A.EMPLID,
A.ACAD_CAREER, A.STDNT_CAR_NBR uniquely
determine a stack. The subquery then looks at
all rows of copy A_ED of ACAD_PROG for that stack
for which the effective date is less than or
equal to the System Date. Among those rows, it
chooses the one with the Maximum Effective Date.
If the row passed down from copy A happens to
have that effective date, it is selected if not,
it is skipped over.
150Analyzing the EFFSEQ Subquery
A.EFFSEQ (SELECT MAX(A_ES.EFFSEQ) FROM
PS_ACAD_PROG A_ES WHERE A.EMPLID
A_ES.EMPLID AND A.ACAD_CAREER
A_ES.ACAD_CAREER AND A.STDNT_CAR_NBR
A_ES.STDNT_CAR_NBR AND A.EFFDT
A_ES.EFFDT)
We will get a match in this subquery only for
those rows from A where the EFFDT has already met
the criteria of the first subqueryand if there
happens to be more than one row with that
effective date, then we will choose from among
those rows the one with the greatest effective
sequence.
151Notes on the other EFFDT operators
- The other built in Effective Date operators work
in essentially the same way. - For the operator Eff Date lt, we simply replace lt
by lt in the first subquery. - For Eff Date gt and Eff Date gt, we replace lt by
gt or gt, and MAX by MIN - For Last Eff Date, we keep the MAX but remove
A_ED.EFFDT lt SYSDATE for First Eff Date, we
keep the MIN but remove the criterion A_ED.EFFDT
gt SYSDATE - EffSeq Last, tends to make the most sense when
using Eff Date lt , Eff Date lt, or Last Eff Date,
whileEffSeq First tends to make the most sense
when using Eff Date gt , Eff Date gt, or First Eff
Date.
152Effective Dated tables on the Web
When we click Add Record for an effective dated
table in the Web Query tool, we get
153Even though you are not given an initial choice,
you can go to the Criteria tab and either delete
or edit the Effective Date criterion, if you so
choose. Editing it gives you all the usual
options
154The SQL generated by the Web version is the same,
of course
155A Subtler Example Finding the Most Current
Active Row in a Stack
- Suppose we want to find the most recent row in
each stack that has a Program status of Active
(AC). In our example, for the GRAD stack we would
want the row with effective date 05/21/2001 and
effective sequence 1 (instead of 2) for the UGRD
stack with STDNT_CAR_NBR 0 we would want the row
with effective date 01/01/1932 (instead of the
one with date 05/23/1999), and so on. - The most obvious thing to try would be to use the
same built-in Eff Date lt SYSDATE operator as
before and add another criterion A.PROG_STATUS
AC.
156But this query yields only one row total, rather
than one row for each stack!
157The problem is that the criterion A.PROG_STATUS
AC lies outside of the two subqueries that make
up the Eff Date lt SYSDATE and EffSeq Last
criterion, so it is imposed in addition to those
criteria. Of the six stacks, only the UGRD stack
with STDNT_CAR_NBR 4 has its most current row in
Active status, so that is the only row that is
returned.
158Conceptually what we need to do is simple we
just want to move the PROG_STATUS AC criteria
inside the subqueries so that they look like AND
A.EFFDT (SELECT MAX(A_ED.EFFDT) FROM
PS_ACAD_PROG A_ED WHERE A.EMPLID
A_ED.EMPLID AND A.ACAD_CAREER
A_ED.ACAD_CAREER AND A.STDNT_CAR_NBR
A_ED.STDNT_CAR_NBR AND A_ED.PROG_STATUS
AC AND A_ED.EFFDT lt SYSDATE) AND
A.EFFSEQ (SELECT MAX(A_ES.EFFSEQ) FROM
PS_ACAD_PROG A_ES WHERE A.EMPLID
A_ES.EMPLID AND A.ACAD_CAREER
A_ES.ACAD_CAREER AND A.STDNT_CAR_NBR
A_ES.STDNT_CAR_NBR AND A.EFFDT
A_ES.EFFDT AND A_ES.PROG_STATUS AC)
159Instead of using the built-in effective date
operations, we must build our own subqueries from
scratch. Start by deleting A.PROG_STATUS AC
criterion and the effective date criterion. Then
add new criterion of the form EFFDT Subquery,
using another copy of the table ACAD_PROG and
choosing No Effective Date Option.
160Now select B.EFFDT in Fields tab, and apply the
aggregate function MAX to it
Now go to the Criteria tab and start adding
criteria
161No problem so far
162But when we try to add the criterion B.EFFDT lt
SYSDATE, we get
The problem is that the Query tool apparently now
thinks that the field is MAX(B.EFFDT) rather
than just plain B.EFFDT so it interprets the
criterion we are trying to add as if it were
MAX(B.EFFDT) lt SYSDATE (which makes no sense),
rather than simply B.EFFDT lt SYSDATE, which does
make sense
163Expressions to the Rescue!
As a workaround, turn the aggregate function MAX
off for EFFDT, restoring Aggregate setting to
None, then add the criterion B.EFFDT lt SYSDATE
(choosing Expression from the drop-down under
Expression 2 and just typing in SYSDATE).
164Now the slightly tricky part delete B.EFFDT from
the Fields tab, and scroll down to the
Expressions icon at the bottom of the Query tab
Right-click on Expressions and choose New
Expression from the pop-up menu that appears.
165Choose Date from the Expression Type drop-down
and type in the expression MAX(B.EFFDT) in the
text area
Now click OK, and
166Your newly created expression appears at the
bottom of the Query tab, where it can now be
selected just as if it were a regular field, by
double-clicking or dragging and dropping.
167The SQL now looks as we want it to
Were halfway there we now need to go through a
similar process to construct the EFFSEQ subquery.
But first notice that if we run the query in its
current form
168we get almost what we want. The only glitch is
that for the GRAD Career, we picked up both rows
dated 05/21/2001 the EffSeq 1 row, which is
Active, and the EffSeq 2 row, which has status DC
(Discontinued).
169Building the EFFSEQ Subquery
- Even though we have already narrowed down the
Effective Date to one that has an Active status,
it is still important to include that Active
status criterion in the second subquery. - To illustrate why, well first build the EFFSEQ
subquery without that criterion and see what
happens. - Start building the EFFSEQ subquery by returning
to the Parent query, adding a new criterion of
the form A.EFFSEQ Subquery, and pulling in
another copy of ACAD_PROG (which will be assigned
alias C). - Since we are not using C.EFFSEQ in any Criteria
of this subquery, we dont have to do the trick
with Expressions here we can simply use the
standard MAX function from the Aggregate
properties.
170Note that we have no criterion on C.PROG_STATUS
yet.
171Now run the Query in its current form
172Now this time we have selected only one of the
two 05/21/2001 rows in the GRAD Careerbut
unfortunately, it is the wrong one, because we
omitted the criterion C.PROG_STATUS AC. We
can fix that easily by going back into the
subquery and adding it
173(No Transcript)
174and now, finally, we get exactly the results we
should!
175Effective Dates on ACAD_PLAN
- ACAD_PLAN is a child of the table ACAD_PROG its
first five key fields are the same as those of
ACAD_PROG and its last key field is ACAD_PLAN. - Every row of ACAD_PROG must have at least one
child row in ACAD_PLAN. - The fact that the key field ACAD_PLAN occurs
after EFFDT (and EFFSEQ) turns out to be quite
important in this context. The behavior of the
Query tool in this situation has changed from 7.6
to 8and not necessarily for the better.
176Looking at all ACAD_PLAN rows
Well look now at a student who has only one
stack,, but a number of different Plans over
the course of time. Choose no Effective Date
Option and select all the key fields plus Plan
Sequence (which is different from Effective
Sequence).
177Note that 2001-09-07 and 2002-07-31 have multiple
Plans (and Plan Sequences) for the given
Date/EffSeq, while for 2002-10-22 there are two
Effective Sequences with different Plans. Note
also that this is the 7.6 version of the Query.
Lets verify that the data is the same in version
8
178Date format is different in version 8, but data
is the same.
179Now add default Effective Date Option
Recall that the default option is EFFDT Eff Date
lt SYSDATE and EffSeq Last. But if you drag
and drop EFFDT from the Query tab into the
Critieria tab, note the Effective Sequence option
that defaults. This is not only inconsistent but
a poor choice the condition EffSeq Last is
more natural when using the Eff Date lt
operation.
180To change EFFSEQ option, double click the
Expression 2 field of the EFFDT criterion this
brings up this pop-up panel. Click Last instead
of First, hit OK, and the Criterion now reads as
we want it to
181Running this query gives us just a single
rownamely the latest row of the stack that is on
or before the System date (which was 03/05/2003).
That is the row with Effective Date 10/22/2002
and EFFSEQ 2 (recall that there was also an
EFFSEQ 1 row for that same date, but we have
specified EFFSEQ Last).
Now this was in version 7.6 lets carry out the
same process in Version 8.
182Note that weve made the change to EFFSEQ Last.
We now get 9 rows instead of just one!
183Whats going on here???
The answer must lie in the SQL, of course. In
version 7.6 we have
184In Version 8, the EFFDT/EFFSEQ operators add
joins on ACAD_PLAN to the subqueries!
185Analyzing the Subqueries
- The subqueries in 7.6 find the latest EFFDT and
EFFSEQ for the given stack, irrespective of
changing ACAD_PLAN values. - The subqueries in 7.6 find the latest EFFDT and
EFFSEQ for each distinct Plan in a given stack.
So if there have been many Plan changes, there
will be many different EFFDT/EFFSEQ combinations
selected.
186To illustrate a bit further, consider the 7.6
version with the EFFDT criterion changed to EFFDT
Eff Date lt 2002-01-01 and EffSeq Last
We still get only one EFFDT and EFFSEQ, though
this time there are two Plans.
187Likewise, consider the MUSICED Plan in version 8.
The student had that Plan on both the 10/14/1999
and 10/19/1999 rows, but only the 10/19/1999 row
is picked up when we add the EFFDT EffDate lt
SYSDATE and EffSeq Last criterion in version 8.
With no Effective Date option.
With default Effective Date option
188Whose behavior is better7.6 or 8?
- Most users would prefer the 7.6 approach, from a
functional viewpoint. - Additional key fields will appear after
EFFDT/EFFSEQ only when the table is a child of an
Effective Dated table with a conventional
design. In that situation, we usually want to
find the row of the child table corresponding to
the most current row of the parent. - The best alternative is usually is to incorporate
the parent table into the query, use the built-in
Effective Date criteria on the parent table, and
join in the child table. Well illustrate this
in our final example.
189Joining Effective Dated TablesACAD_PROG and
ACAD_PLAN
- ACAD_PLAN does not include a status field that
information is on the parent table ACAD_PROG. - For this reason (among others), it is often very
natural to write queries that involve joining
these two tables. - In this situation the Record Hierarchy join has
definite advantages over the standard Auto-Join
mechanism.
190Continue using same student (111112432) but now
begin with basic query that simply selects all
ACAD_PROG rows
19110 Rows of ACAD_PROG correspond to 12 rows of
ACAD_PLAN since 2 PROG rows have 2 Plans each.
192To add ACAD_PLAN to the Query, Expand the Record
Hierarchy under ACAD_PROG and double-click on
ACAD_PLAN to bring it into the query with joins
on all common key fields, including EFFDT and
EFFSEQ.
193Now add ACAD_PLAN and PLAN_SEQUENCE to the Fields
tab in the usual way, add PLAN_SEQUENCE to the
ordering clause, and rerun the query. Note
that EMPLID, ACAD_CAREER, STDNT_CAR_NBR are not
shown here due to lack of room.
Having Program and Plan data displayed together
makes it easier to understand.
194Joining ACAD_PLAN to ACAD_PROG via Auto-Join
instead of Record Hierarchy
Well look at version 8 first. Start with
ACAD_PROG as table A again, but now pull in
ACAD_PLAN as table B from the Database tab.
Effective Date Options panel appears
Note that there is a new option and it is now the
default in this situation.
195Accepting the default and clicking Next takes
us to
Note that the join criteria displayed are only on
the fields that precede EFFDT, not on EFFDT and
EFFSEQ themselves. This is the generic behavior
for Auto-Joins involving Effective Dated tables
no parent-child relationships are detected.
Click Finish and then go look at the Criteria
tab.
196Add the fields ACAD_PLAN and PLAN_ SEQUENCE as
before, and examine the SQL. Recall that this is
version 8.
197Query returns 50 rows! Only first 24 are shown
here. Note the accumulation that appears to
occur over time.
198Add B.EFFDT to the fields selected and rerun. We
are getting, for each row of ACAD_PROG, one row
of ACAD_PLAN each for each Plan that the student
has had on or prior to the date of the ACAD_PROG
row.
199Its safe to say few people would be interested
in the logic of the query just shown. What if we
try the same Auto-Join procedure in version 7.6?
Weve added B.EFFSEQ this time.
Since we are in 7.6, the joins on ACAD_PLAN dont
appear in the subqueries.
200These results look much more reasonable and in
fact are almost identical to those of the
correct approach using Record Hierarchy join.
Only difference is in indicated row, where we
have EFFSEQ 1 in ACAD_PROG but EFFSEQ 2 for
ACAD_PLAN.
201 The EFFSEQ Subtlety
- Because ACAD_PLAN is a child of ACAD_PROG, the
EFFDT Subquery will always select the Effective
Date of the row of the stack that is passed down
from ACAD_PROG. - However, if there are two rows of ACAD_PROG with
the same EFFDT (hence different EFFSEQs), then
the EFFSEQ subquery will choose the ACAD_PLAN
row(s) with the maximum EFFSEQ value for that
EFFDT, regardless of whether that EFFSEQ value
matches the value on the ACAD_PROG row.
202The moral of the story
- Using the default Effective Date options under
Auto-Join (instead of a Record Hierarchy join) to
join in a child table is rarely advisable. - In version 8, you will often get results that are
confusing enough to alert you that something is
wrong. - In version 7.6, you may get results that look
right but arent quiteand subtle errors are much
more difficult to detect, troubleshoot and
diagnose than are egregious ones!
203Final Example! Using Built-in Effective Date
Operators with PROG/PLAN Joins
- Recall that in version 8, built-in Effective Date
options on ACAD_PLAN give us somewhat strange
results. - One way around this is to build your own subquery
to emulate the logic in 7.6 - An easier alternative that usually works well is
to use the built-in Effective Date options for
the parent table ACAD_PROG and use Record
Hierarchy join to pull in ACAD_PLAN.
204Recall the results of the query that used the
Record Hierarchy join of ACAD_PLAN to ACAD_PROG
with No Effective Date options
205Now add a criterion of the form A.EFFDT Effective
Date lt SYSDATE, EffSeq Last in the usual way,
and then rerun the query. We get exactly what we
would expect (and want).
206Now, what if we get rid of the criterion on
A.EFFDT and instead use an Effective Date
operator on ACAD_PLAN rather than ACAD_PROGthat
is, the criterion B.EFFDT Effective Date lt
SYSDATE, EffSeq Last?
Were back to 9 rows instead of 1, because of the
additional join on ACAD_PLAN used in the version
8 subqueries.
207Final Remarks
- There are some situations where you are pulling
in a new Effective Dated table B to a query that
already contains an Effective Dated table A in
which it does make sense to accept the default
operation B.EFFDT Effective Date lt A.EFFDT,
EffSeq Last. - However, you should never assume that this
default choice is the correct one for your
particular needs effective dated logic is
complicated enough that you always need to think
carefully about what criteria you really want.
- Euclid told Ptolemy that, There is no royal road
to geometry, and unfortunately the same is true
about Effective DatingHenry Stewarts claims
notwithstanding! In fact, the same is true
about Queries in general. The tool is only a
tool you need to understand the underlying
logic.
208Contact Information
- Steve Grantham sgrantha_at_boisestate.edu
- (208) 426-2099
- Terri Coolidge tcoolidg_at_boisestate.edu
(208) 426-4468