Title: Haas MFE SAS Workshop Lecture 3
1Haas MFE SAS WorkshopLecture 3
- Peng Liu http//faculty.haas.berkeley.edu/peliu/c
omputing -
Haas School of Business, Berkeley, MFE 2006
2SAS SQL
- Peng Liu http//faculty.haas.berkeley.edu/peliu/c
omputing -
Haas School of Business, Berkeley, MFE 2006
3PROC SQL - What
- What does SQL can do?
- Selecting
- Ordering/sorting
- Subsetting
- Restructuring
- Creating table/view
- Joining/Merging
- Transforming variables
- Editing
4PROC SQL - Why
- The Advantage of using SQL
- Combined functionality
- Faster for smaller tables
- SQL code is more portable for non-SAS
applications - Not require presorting
- Not require common variable names to join on.
(need same type , length)
5Selecting Data
- PROC SQL
- SELECT DISTINCT rating FROM MFE.MOVIES
- QUIT
- The simplest SQL code, need 3 statements
- By default, it will print the resultant query,
use NOPRINT option to suppress this feature. - Begin with PROC SQL, end with QUIT not RUN
- Need at least one SELECT FROM statement
- DISTINCT is an option that removes duplicate rows
6Ordering/Sorting Data
- PROC SQL
- SELECT
- FROM MFE.MOVIES
- ORDER BY category
- QUIT
- Remember the placement of the SAS statements has
no effect so we can put the middle statement
into 3 lines - SELECT means we select all variables from
dataset MFE.MOVIES - Put ORDER BY after FROM.
- We sort the data by variable category
7Subsetting Data- Character searching in WHERE
- PROC SQL
- SELECT title, category
- FROM MFE.MOVIES
- WHERE category CONTAINS 'Action'
- QUIT
- Use comma (,) to separate selected variables
- CONTAINS in WHERE statement only for character
variables - Also try WHERE UPCASE(category) LIKE 'ACTION'
- Use wildcard char. Percent sign () with LIKE
operator.
8Subsetting Data- Phonetic Matching in WHERE
- PROC SQL
- SELECT title, category, rating
- FROM MFE.MOVIES
- WHERE category 'Drana'
- QUIT
- Always Put WHERE after FROM
- Sounds like operator
- Search movie title for the phonetic variation of
drama, also help possible spelling variations
9Case Logic - reassigning/recategorize
- PROC SQL
- SELECT title, rating,
- CASE rating
- WHEN 'G' THEN General'
ELSE 'Other' - END AS level
- FROM MFE.MOVIES
- QUIT
- The order of each statement is important
- CASE END AS should in between SELECT and FROM
- Note there is , after the variables you want to
select - Use WHEN THEN ELSE to redefine variables
- Rename variable from rating to level
10Creating New Data - Create Table
- PROC SQL
- CREATE TABLE ACTION AS
- SELECT title, category
- FROM MFE.MOVIES
- WHERE category CONTAINS 'Action'
- QUIT
- CREATE TABLE AS can always be in front of
SELECT FROM statement to build a sas file. - In SELECT, the results of a query are converted
to an output object (printing). Query results can
also be stored as data. The CREATE TABLE
statement creates a table with the results of a
query. The CREATE VIEW statement stores the query
itself as a view. Either way, the data identified
in the query can beused in later SQL statements
or in other SAS steps. - Produce a new dataset (table) ACTION in work
directory, no printing
11Creating New Data - Create View
- PROC SQL
- CREATE VIEW G_MOVIES AS
- SELECT title, category, rating
- FROM MFE.MOVIES
- WHERE rating 'G'
- ORDER BY title
- SELECT FROM G_MOVIES
- QUIT
- First step-creating a view,no output is produced
then display the desired output results - Use to separate two block of code inside of
proc sql - When a table is created, the query is executed
and the resulting data is stored in a file. When
a view is created, the query itself is stored in
the file. The data is not accessed at all in the
process of creating a view.
12Join Tables (Merge datasets) - Cartesian Join
- PROC SQL
- SELECT
- FROM MFE.CUSTOMERS, MFE.MOVIES
- QUIT
- Terminology Join (Merge) datasets (tables)
- No prior sorting required one advantage over
DATA MERGE - Use comma (,) to separate two datasets in FROM
- Without WHERE, all possible combinations of rows
from each tables is produced, all columns are
included - Turn on the HTML result option for better
display - Tool/Options/Preferences/Results/ check Create
HTML/OK
13Join Tables (Merge datasets) - Inner Join using
WHERE
- PROC SQL
- SELECT
- FROM MFE.MOVIES, MFE.ACTORS
- WHERE MOVIES.title ACTORS.title
- QUIT
- Use WHERE to specify connecting columns (title)
- table1.matchvar table2.matchvar
- Produce rows that have same movie title
- The matching variable can be of different name
different datasets
14Join Tables (Merge datasets) - Inner Join using
WHERE (Cont.)
- PROC SQL
- SELECT M.title, M.rating, A.actor_leading
- FROM MFE.MOVIES M, MFE.ACTORS A
- WHERE MOVIES.title ACTORS.title
- QUIT
- Short-cut for table names
- Can be used in SELECT and WHERE statements
- Need to be declared in FROM statement
15Join Tables (Merge datasets) - Join three tables
- PROC SQL
- SELECT C.cust_no,
- M.title,M.rating, M.category,
- A.actor_leading
- FROM MFE.CUSTOMERS C,
- MFE.MOVIES2 M,
- MFE.ACTORS A
- WHERE C.cust_no M.cust_no AND M.title
A.title - QUIT
- Use AND in WHERE statement to specify two
matching conditions - Produce rows that satisfies all the conditions
- Note We use MOVIES2 in this example
- Can join up to 32 tables in one SQL code
16Join Tables (Merge datasets) - Inner Joins
using ON
- PROC SQL
- SELECT M.title, rating,actor_leading
- FROM MFE.MOVIES M
- INNER JOIN MFE.ACTORS A
- ON M.TITLE A.TITLE
- QUIT
- Same result as using where
- WHERE is used to select rows from inner joins
- ON is used to select rows from outer or inner
17Join Tables (Merge datasets) - Left Outer Joins
- PROC SQL
- SELECT MOVIES.title, actor_leading, rating
- FROM MFE.MOVIES
- LEFT JOIN
- MFE.ACTORS
- ON MOVIES.title ACTORS.title
- QUIT
- Resulting output contains all rows for which the
SQL expression, referenced in the ON clause,
matches both tables and all rows from LEFT table
(MOVIES) that did not match any row in the right
(ACTORS) table. - Essentially the rows from LEFT table are
preserved and captured exactly as they stored in
the table itself, regardless if a match exists. - Need to specify a table name for the matching
variable in SELECT
18Join Tables (Merge datasets) - Right Outer Joins
- PROC SQL
- SELECT ACTORS.title, actor_leading, rating
- FROM MFE.MOVIES
- RIGHT JOIN
- MFE.ACTORS
- ON MOVIES.title ACTORS.title
- QUIT
- Resulting output contains all rows for which the
SQL expression, referenced in the ON clause,
matches both tables and all rows from RIGHT table
(ACTORS) that did not match any row in the right
(MOVIES) table. - Essentially the rows from RIGHT table are
preserved and captured exactly as they stored in
the table itself, regardless if a match exists.
19Join Tables (Concatenating) - Outer Union
- PROC SQL
- SELECT FROM MFE.CUSTOMERS
- OUTER UNION
- SELECT FROM MFE.MOVIES
- QUIT
- SQL performs OUTER UNION, similar to DATA steps
with a SET statement to Concatenate datasets. - The result contains all the rows produced by the
first table-expression followed by all the row
produced by the second table-expression.
20Transforming Data- Creating new Variables
- /Creating new variables/
- PROC SQL
- SELECT title, length, category, year, rating,
- 2006-year AS age
- FROM MFE.MOVIES
- QUIT
- You can create new variables within SELECT
statement, the name of new variable follows after
AS. - Note the order of the express is reversed
21Transforming Data- Summarizing Data using SQL
functions
- PROC SQL
- SELECT ,
- COUNT(title) AS notitle,
- MAX(year) AS most_recent,
- MIN(year) AS earliest,
- SUM(length) AS total_length,
- NMISS(rating) AS nomissing
- FROM MFE.MOVIES
- GROUP BY rating
- QUIT
- Simple summarization functions available
- All function can be operated in GROUPs
22Editing Data Insert observations.
- PROC SQL NOPRINT
- INSERT INTO MFE.CUSTOMERS
- VALUES(1 'Peng')
- INSERT INTO MFE.CUSTOMERS
- SET Cust_no2,Name'Sasha'
- QUIT
- There are two ways of inserting observations into
a table. Data type should be the same. - VALUES( ) new values are separated by space.
- SET column name newly assigned values,
delimited by commas.
23Editing Data Deleting rows and Dropping
columns
- /Deleting rows/
- PROC SQL
- DELETE
- FROM MFE.MOVIES
- WHERE length LE 100
- QUIT
- /Droping variables/
- PROC SQL
- CREATE TABLE NEW (DROPrating) AS
- SELECT
- FROM MFE.MOVIES
- QUIT
- Deleting columns can be done in SELECT or in DROP
on created table
24Editing Data Update observations
- /Updating Observation/
- PROC SQL NOPRINT
- UPDATE MFE.CUSTOMERS
- SET Name'Liu'
- WHERE Cust_no1
- QUIT
- UPDATE SET WHERE
- Find the observation and set new value
- If more than one observations satisfies the
condition, all are updated with the new data in
SET statement