Haas MFE SAS Workshop Lecture 3 - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Haas MFE SAS Workshop Lecture 3

Description:

Haas MFE SAS Workshop Lecture 3: Peng Liu http://faculty.haas.berkeley.edu/peliu/computing Haas School of Business, Berkeley, MFE 2006 SAS SQL Peng Liu http://faculty ... – PowerPoint PPT presentation

Number of Views:31
Avg rating:3.0/5.0
Slides: 25
Provided by: mcsrOlemi
Category:
Tags: mfe | sas | haas | lecture | workshop

less

Transcript and Presenter's Notes

Title: Haas MFE SAS Workshop Lecture 3


1
Haas MFE SAS WorkshopLecture 3
  • Peng Liu http//faculty.haas.berkeley.edu/peliu/c
    omputing

Haas School of Business, Berkeley, MFE 2006
2
SAS SQL
  • Peng Liu http//faculty.haas.berkeley.edu/peliu/c
    omputing

Haas School of Business, Berkeley, MFE 2006
3
PROC SQL - What
  • What does SQL can do?
  • Selecting
  • Ordering/sorting
  • Subsetting
  • Restructuring
  • Creating table/view
  • Joining/Merging
  • Transforming variables
  • Editing

4
PROC SQL - Why
  • The Advantage of using SQL
  • Combined functionality
  • Faster for smaller tables
  • SQL code is more portable for non-SAS
    applications
  • Not require presorting
  • Not require common variable names to join on.
    (need same type , length)

5
Selecting Data
  • PROC SQL
  • SELECT DISTINCT rating FROM MFE.MOVIES
  • QUIT
  • The simplest SQL code, need 3 statements
  • By default, it will print the resultant query,
    use NOPRINT option to suppress this feature.
  • Begin with PROC SQL, end with QUIT not RUN
  • Need at least one SELECT FROM statement
  • DISTINCT is an option that removes duplicate rows

6
Ordering/Sorting Data
  • PROC SQL
  • SELECT
  • FROM MFE.MOVIES
  • ORDER BY category
  • QUIT
  • Remember the placement of the SAS statements has
    no effect so we can put the middle statement
    into 3 lines
  • SELECT means we select all variables from
    dataset MFE.MOVIES
  • Put ORDER BY after FROM.
  • We sort the data by variable category

7
Subsetting Data- Character searching in WHERE
  • PROC SQL
  • SELECT title, category
  • FROM MFE.MOVIES
  • WHERE category CONTAINS 'Action'
  • QUIT
  • Use comma (,) to separate selected variables
  • CONTAINS in WHERE statement only for character
    variables
  • Also try WHERE UPCASE(category) LIKE 'ACTION'
  • Use wildcard char. Percent sign () with LIKE
    operator.

8
Subsetting Data- Phonetic Matching in WHERE
  • PROC SQL
  • SELECT title, category, rating
  • FROM MFE.MOVIES
  • WHERE category 'Drana'
  • QUIT
  • Always Put WHERE after FROM
  • Sounds like operator
  • Search movie title for the phonetic variation of
    drama, also help possible spelling variations

9
Case Logic - reassigning/recategorize
  • PROC SQL
  • SELECT title, rating,
  • CASE rating
  • WHEN 'G' THEN General'
    ELSE 'Other'
  • END AS level
  • FROM MFE.MOVIES
  • QUIT
  • The order of each statement is important
  • CASE END AS should in between SELECT and FROM
  • Note there is , after the variables you want to
    select
  • Use WHEN THEN ELSE to redefine variables
  • Rename variable from rating to level

10
Creating New Data - Create Table
  • PROC SQL
  • CREATE TABLE ACTION AS
  • SELECT title, category
  • FROM MFE.MOVIES
  • WHERE category CONTAINS 'Action'
  • QUIT
  • CREATE TABLE AS can always be in front of
    SELECT FROM statement to build a sas file.
  • In SELECT, the results of a query are converted
    to an output object (printing). Query results can
    also be stored as data. The CREATE TABLE
    statement creates a table with the results of a
    query. The CREATE VIEW statement stores the query
    itself as a view. Either way, the data identified
    in the query can beused in later SQL statements
    or in other SAS steps.
  • Produce a new dataset (table) ACTION in work
    directory, no printing

11
Creating New Data - Create View
  • PROC SQL
  • CREATE VIEW G_MOVIES AS
  • SELECT title, category, rating
  • FROM MFE.MOVIES
  • WHERE rating 'G'
  • ORDER BY title
  • SELECT FROM G_MOVIES
  • QUIT
  • First step-creating a view,no output is produced
    then display the desired output results
  • Use to separate two block of code inside of
    proc sql
  • When a table is created, the query is executed
    and the resulting data is stored in a file. When
    a view is created, the query itself is stored in
    the file. The data is not accessed at all in the
    process of creating a view.

12
Join Tables (Merge datasets) - Cartesian Join
  • PROC SQL
  • SELECT
  • FROM MFE.CUSTOMERS, MFE.MOVIES
  • QUIT
  • Terminology Join (Merge) datasets (tables)
  • No prior sorting required one advantage over
    DATA MERGE
  • Use comma (,) to separate two datasets in FROM
  • Without WHERE, all possible combinations of rows
    from each tables is produced, all columns are
    included
  • Turn on the HTML result option for better
    display
  • Tool/Options/Preferences/Results/ check Create
    HTML/OK

13
Join Tables (Merge datasets) - Inner Join using
WHERE
  • PROC SQL
  • SELECT
  • FROM MFE.MOVIES, MFE.ACTORS
  • WHERE MOVIES.title ACTORS.title
  • QUIT
  • Use WHERE to specify connecting columns (title)
  • table1.matchvar table2.matchvar
  • Produce rows that have same movie title
  • The matching variable can be of different name
    different datasets

14
Join Tables (Merge datasets) - Inner Join using
WHERE (Cont.)
  • PROC SQL
  • SELECT M.title, M.rating, A.actor_leading
  • FROM MFE.MOVIES M, MFE.ACTORS A
  • WHERE MOVIES.title ACTORS.title
  • QUIT
  • Short-cut for table names
  • Can be used in SELECT and WHERE statements
  • Need to be declared in FROM statement

15
Join Tables (Merge datasets) - Join three tables
  • PROC SQL
  • SELECT C.cust_no,
  • M.title,M.rating, M.category,
  • A.actor_leading
  • FROM MFE.CUSTOMERS C,
  • MFE.MOVIES2 M,
  • MFE.ACTORS A
  • WHERE C.cust_no M.cust_no AND M.title
    A.title
  • QUIT
  • Use AND in WHERE statement to specify two
    matching conditions
  • Produce rows that satisfies all the conditions
  • Note We use MOVIES2 in this example
  • Can join up to 32 tables in one SQL code

16
Join Tables (Merge datasets) - Inner Joins
using ON
  • PROC SQL
  • SELECT M.title, rating,actor_leading
  • FROM MFE.MOVIES M
  • INNER JOIN MFE.ACTORS A
  • ON M.TITLE A.TITLE
  • QUIT
  • Same result as using where
  • WHERE is used to select rows from inner joins
  • ON is used to select rows from outer or inner

17
Join Tables (Merge datasets) - Left Outer Joins
  • PROC SQL
  • SELECT MOVIES.title, actor_leading, rating
  • FROM MFE.MOVIES
  • LEFT JOIN
  • MFE.ACTORS
  • ON MOVIES.title ACTORS.title
  • QUIT
  • Resulting output contains all rows for which the
    SQL expression, referenced in the ON clause,
    matches both tables and all rows from LEFT table
    (MOVIES) that did not match any row in the right
    (ACTORS) table.
  • Essentially the rows from LEFT table are
    preserved and captured exactly as they stored in
    the table itself, regardless if a match exists.
  • Need to specify a table name for the matching
    variable in SELECT

18
Join Tables (Merge datasets) - Right Outer Joins
  • PROC SQL
  • SELECT ACTORS.title, actor_leading, rating
  • FROM MFE.MOVIES
  • RIGHT JOIN
  • MFE.ACTORS
  • ON MOVIES.title ACTORS.title
  • QUIT
  • Resulting output contains all rows for which the
    SQL expression, referenced in the ON clause,
    matches both tables and all rows from RIGHT table
    (ACTORS) that did not match any row in the right
    (MOVIES) table.
  • Essentially the rows from RIGHT table are
    preserved and captured exactly as they stored in
    the table itself, regardless if a match exists.

19
Join Tables (Concatenating) - Outer Union
  • PROC SQL
  • SELECT FROM MFE.CUSTOMERS
  • OUTER UNION
  • SELECT FROM MFE.MOVIES
  • QUIT
  • SQL performs OUTER UNION, similar to DATA steps
    with a SET statement to Concatenate datasets.
  • The result contains all the rows produced by the
    first table-expression followed by all the row
    produced by the second table-expression.

20
Transforming Data- Creating new Variables
  • /Creating new variables/
  • PROC SQL
  • SELECT title, length, category, year, rating,
  • 2006-year AS age
  • FROM MFE.MOVIES
  • QUIT
  • You can create new variables within SELECT
    statement, the name of new variable follows after
    AS.
  • Note the order of the express is reversed

21
Transforming Data- Summarizing Data using SQL
functions
  • PROC SQL
  • SELECT ,
  • COUNT(title) AS notitle,
  • MAX(year) AS most_recent,
  • MIN(year) AS earliest,
  • SUM(length) AS total_length,
  • NMISS(rating) AS nomissing
  • FROM MFE.MOVIES
  • GROUP BY rating
  • QUIT
  • Simple summarization functions available
  • All function can be operated in GROUPs

22
Editing Data Insert observations.
  • PROC SQL NOPRINT
  • INSERT INTO MFE.CUSTOMERS
  • VALUES(1 'Peng')
  • INSERT INTO MFE.CUSTOMERS
  • SET Cust_no2,Name'Sasha'
  • QUIT
  • There are two ways of inserting observations into
    a table. Data type should be the same.
  • VALUES( ) new values are separated by space.
  • SET column name newly assigned values,
    delimited by commas.

23
Editing Data Deleting rows and Dropping
columns
  • /Deleting rows/
  • PROC SQL
  • DELETE
  • FROM MFE.MOVIES
  • WHERE length LE 100
  • QUIT
  • /Droping variables/
  • PROC SQL
  • CREATE TABLE NEW (DROPrating) AS
  • SELECT
  • FROM MFE.MOVIES
  • QUIT
  • Deleting columns can be done in SELECT or in DROP
    on created table

24
Editing Data Update observations
  • /Updating Observation/
  • PROC SQL NOPRINT
  • UPDATE MFE.CUSTOMERS
  • SET Name'Liu'
  • WHERE Cust_no1
  • QUIT
  • UPDATE SET WHERE
  • Find the observation and set new value
  • If more than one observations satisfies the
    condition, all are updated with the new data in
    SET statement
Write a Comment
User Comments (0)
About PowerShow.com