Haas MFE SAS Workshop Lecture 3: - PowerPoint PPT Presentation

1 / 55
About This Presentation
Title:

Haas MFE SAS Workshop Lecture 3:

Description:

Haas School of Business, Berkeley, MFE 2006. Commonly used PROCedures. in Financial Economics ... The CORR procedure computes Pearson correlation coefficients, three ... – PowerPoint PPT presentation

Number of Views:99
Avg rating:3.0/5.0
Slides: 56
Provided by: liup2
Category:
Tags: mfe | sas | computes | haas | lecture | workshop

less

Transcript and Presenter's Notes

Title: Haas MFE SAS Workshop Lecture 3:


1
Haas MFE SAS WorkshopLecture 3
  • Peng Liu http//faculty.haas.berkeley.edu/peliu/c
    omputing

Haas School of Business, Berkeley, MFE 2006
2
Commonly used PROCeduresin Financial Economics
  • Peng Liu http//faculty.haas.berkeley.edu/peliu/c
    omputing

Haas School of Business, Berkeley, MFE 2006
3
SAS PROCs
  • Basic Statistical Analysis
  • Univariate statistics
  • PROC MEANS
  • PROC UNIVARIATE
  • PROC FREQ
  • Bivariate and Multivariate Statistics
  • PROC CORR
  • PROC NPAR1WAY
  • PROC TTEST
  • Financial Econometrics using SAS
  • Linear Models (OLS, GLS and their variates)
  • PROC REG
  • PROC GLM (Skip)
  • Logistic Regression
  • PROC LOGISTIC
  • PROC GENMOD
  • Hazard Regression (Cox-P.H.)
  • PROC PHREG

4
Basic Statistical Analysis
  • Univariate statistics
  • PROC MEANS
  • PROC UNIVARIATE
  • PROC FREQ
  • Bivariate and Multivariate Statistics
  • PROC CORR
  • PROC NPAR1WAY
  • PROC TTEST

5
Comparison of PROC MEANSand PROC UNIVARIATE
  • PROC MEANS
  • DESCRIPTIVE STATISTICS
  • CLM CSS CV KURTOSIS LCLM MAX MEAN MIN N NMISS
    RANGE SKEWNESS STD STDERR SUM SUMWGT UCLM USS VAR
  • QUANTILE STATISTICS
  • MEDIANP50 Q1P25 Q3P75 P1 P5 P10 P90 P95 P99
    RANGE
  • HYPOTHESIS TESTING
  • PROBT T
  • PROC UNIVARIATE
  • DESCRIPTIVE STATISTICS
  • CSS CV KURTOSIS MAX MEAN MIN MODE N NMISS RANGE
    SKEWNESS STD STDMEAN SUM SUMWGT USS VAR
  • QUANTILE STATISTICS
  • MEDIAN P1 P5 P10 P90 P95 P99 Q1 Q3 RANGE
  • QUANTILE STATISTICS
  • NORMAL PROBN MSIGN PROBM SIGNRANK PROBS T PROBT
  • ROBUST STATISTICS
  • GINI MAD QN SN STD_SINI STD_MAD STD_QN
    STD_QRANGE STD_SN

6
PROC MEANS
  • PROC MEANS DATAmfe.loan
  • VAR appraisal ltv
  • CLASS state
  • RUN

PROC MEANS DATAmfe.loan max min VAR appraisal
ltv OUTPUT OUTm maxmaxvalue maxltv
minminvalue minltv RUN
  • The default output for PROC MEANS are variable
    label N Mean Std Dev Min max
  • median min max clm alpha0.05 are examples of
    options you can specify.
  • You can get summary statistics for many variables
  • CLASS statements will produce summary stat for
    each grouping class.
  • You can suppress print using NOPRINT option
  • You can save the result in a self-defined sas
    dataset.

7
PROC UNIVARIATE
  • PROC UNIVARIATE DATAmfe.loan
  • VAR ltv ID id
  • RUN

PROC UNIVARIATE DATAmfe.loan VAR ltv
HISTOGRAM QQPLOT /normal RUN
  • Use VAR to specify which variable you want to
    analyze, otherwise, this PROC will produce all
    variables
  • Use ID to identify Extreme Observations, without
    ID statement it will use observation number by
    default
  • Can plot histogram, quantile-quantile plots etc.
  • Can do twosided T test, etc.

8
PROC FREQ
  • PROC FREQ DATAmfe.loan
  • TABLE term
  • RUN

PROC FREQ DATAmfe.loan TABLE state
stateterm/nocol norow RUN
  • One-way v.s two-way frequency table
  • /CHISQ or /BINOMIAL option can be used to test
    equal proportion
  • In one TABLE statement, you can produce more than
    one frequency tables
  • You can suppress col percentage or/and row
    percentage by option /nocol norow

9
PROC CORR
  • PROC CORR DATAmfe.loan
  • VAR rate ltv fico_orig
  • RUN

PROC CORR DATAmfe.loan COV SPEARMAN VAR rate
ltv fico_orig RUN
  • The CORR procedure computes Pearson correlation
    coefficients, three nonparametric measures of
    association (Spearman rank-oder correlation,
    Kendalls taub and Hoeffdings measure of
    dependence D), and the probabilities associated
    with these statistics for numeric variables
  • The default is Pearson correlation.
  • COV option evolke the computation of covariance

10
PROC TTEST
  • DATA
  • INPUT a b _at__at_
  • DATALINES
  • 51 55 64 61 75 74 86 90
  • 95 93 68 71 73 72 90 95
  • RUN

PROC TTEST PAIRED ab RUN
  • DATA step will produce automatic dataset, if user
    did not specify one.
  • _at__at_ in INPUT lets SAS continuously read from
    datelines
  • DATALINES is a SAS statement followed by lines
    of raw data.
  • Data are typed continuously separated by blank,
    you can separated into a different line in the
    way you like.
  • should be stand by itself
  • PROC step will perform specified procedure on
    current dataset in working directory if user did
    not specify a particular dataset name
  • Paired T-Test

11
PROC NPAR1WAY
  • PROC NPAR1WAY DATAmfe.loan
  • CLASS state
  • VAR ltv
  • RUN
  • NONPARAMETRIC TEST FOR DIFFERENCE ACROSS ONE-WAY
    CLASSIFICATION.
  • IF the normality assumption does not hold, we may
    use some nonparametric tests.
  • PROC NPAR1WAY performs nonparametric tests for
    location and scale differences across a one-way
    classiication, based on the following scores
    Wilcoxin, Median, Van Der Waerden, Savage,
    Siegel-Tukey, Ansari-Bradley, Klotz, and Modd
    Scores.

12
Financial Econometrics using SAS
  • Linear Models (OLS, GLS and their variates)
  • PROC REG
  • PROC GLM (Skip)
  • Logistic Regression
  • PROC LOGISTIC
  • PROC GENMOD
  • Hazard Regression (Cox-P.H.)
  • PROC PHREG

13
Linear Model Theory
  • Data (yi, xi(xi1, xi2, xik)) for i1, , n
    and yi ? R
  • Model yi ?0?-1xi1 ?kxik ?i for i1,,n
  • For short yX??
  • where
  • Assumption ?i are i.i.d. normal N(0,?2)
  • Ordinary Least Square Estimation
  • ? (XTX)-1XTy

14
PROC REG
  • PROC REG is a SAS procedure for simple or
    multivariate linear regression models with
    continuous dependent variables.
  • Part of SAS/STAT
  • Model fitting (parameters, residuals, confidence
    limits, influential statistics, etc)
  • Model selection (forward, backward, stepwise,
    ,etc)
  • Hypothesis testing
  • Model diagnostics
  • Plotting
  • Outputting estimates and statistics

15
PROC REG Examples
  • PROC REG DATAmfe.loan
  • MODEL ltv rate
  • PLOT ltv rate
  • QUIT

MODEL ltv rate fico_orig OLSMODEL ltv term
rate fico_orig MODEL ltv rate fico_orig
term/SELECTIONF
  • Begin with PROC REG end with QUIT
  • Multiple independent , dependent variables are
    separated by space
  • Label OLS is optional, useful for multiple
    MODEL statement in one PROC REG
  • By default, a constant is included
  • Use /Options to request additional stat or
    specify model selection method
  • PLOT creates a scatter plot of your regression
    data and automatically adds the regression line.

16
Logistic Regression Theory
  • Data (yi, xi(xi1, xi2, xik)) for i1, , n
    and yi is a binary or ordinal response variable.
    e.g. yi ? 0,1
  • Model
  • Maximum Likelihood estimate of ?
  • Assumption binomial Variation

17
Logistic Regression SAS procedure
  • SAS has several procedures that performs logistic
    regression, e.g. GENMOD, CATMOD and LOGISTIC
  • PROC LOGISTIC
  • Works for binary or ordinal response variables
  • Performs MLE using different optimization
    algorithms
  • 4 model selection methods F, B, Stepwise, Score
  • Outputs statistics to dataset
  • Tests linear hypotheses of parameters

18
PROC LOGISTIC Examples
  • PROC LOGISTIC DATAmfe.loan
  • CLASS state edu
  • MODEL default ltv age edu term rate
    state/LINKLOGIT
  • RUN
  • Begin with PROC LOGISTIC end with QUIT
  • /LINKLOGIT option can be ignored, other options
    PROBIT, CLOGIT, CLOGLOG
  • Use CLASS statement to avoid creating dummy in
    DATA step
  • /option can be used to request additional stat,
    or specify selection method.
  • TEST statement

19
Survival Analysis Background 1
20
Survival Analysis Background 2
21
Cox Proportional Hazard Regression
22
PROC PHREG - Example
  • PROC PHREG DATAmfe.loan
  • MODEL loanageprepay(0) age edu race rate ltv
    fico_orig state
  • RUN
  • Use WHERE option to subset sample to want to
    regress
  • You can define, group variables inside PHREG
    after MODEL using IF THEN ELSE
  • Handling tied data /TIESEXACT, other option
    DISCRETE
  • Run PHREG for different group, use BY option,
    need to sort data.
  • Use CLASS statement to create dummy variables

23
SAS SQL
  • Peng Liu http//faculty.haas.berkeley.edu/peliu/c
    omputing

Haas School of Business, Berkeley, MFE 2006
24
PROC SQL - What
  • What does SQL can do?
  • Selecting
  • Ordering/sorting
  • Subsetting
  • Restructuring
  • Creating table/view
  • Joining/Merging
  • Transforming variables
  • Editing

25
PROC SQL - Why
  • The Advantage of using SQL
  • Combined functionality
  • Faster for smaller tables
  • SQL code is more portable for non-SAS
    applications
  • Not require presorting
  • Not require common variable names to join on.
    (need same type , length)

26
Selecting Data
  • PROC SQL
  • SELECT DISTINCT rating FROM MFE.MOVIES
  • QUIT
  • The simplest SQL code, need 3 statements
  • By default, it will print the resultant query,
    use NOPRINT option to suppress this feature.
  • Begin with PROC SQL, end with QUIT not RUN
  • Need at least one SELECT FROM statement
  • DISTINCT is an option that removes duplicate rows

27
Ordering/Sorting Data
  • PROC SQL
  • SELECT
  • FROM MFE.MOVIES
  • ORDER BY category
  • QUIT
  • Remember the placement of the SAS statements has
    no effect so we can put the middle statement
    into 3 lines
  • SELECT means we select all variables from
    dataset MFE.MOVIES
  • Put ORDER BY after FROM.
  • We sort the data by variable category

28
Subsetting Data- Character searching in WHERE
  • PROC SQL
  • SELECT title, category
  • FROM MFE.MOVIES
  • WHERE category CONTAINS 'Action'
  • QUIT
  • Use comma (,) to separate selected variables
  • CONTAINS in WHERE statement only for character
    variables
  • Also try WHERE UPCASE(category) LIKE 'ACTION'
  • Use wildcard char. Percent sign () with LIKE
    operator.

29
Subsetting Data- Phonetic Matching in WHERE
  • PROC SQL
  • SELECT title, category, rating
  • FROM MFE.MOVIES
  • WHERE category 'Drana'
  • QUIT
  • Always Put WHERE after FROM
  • Sounds like operator
  • Search movie title for the phonetic variation of
    drama, also help possible spelling variations

30
Case Logic - reassigning/recategorize
  • PROC SQL
  • SELECT title, rating,
  • CASE rating
  • WHEN 'G' THEN General'
    ELSE 'Other'
  • END AS level
  • FROM MFE.MOVIES
  • QUIT
  • The order of each statement is important
  • CASE END AS should in between SELECT and FROM
  • Note there is , after the variables you want to
    select
  • Use WHEN THEN ELSE to redefine variables
  • Rename variable from rating to level

31
Creating New Data - Create Table
  • PROC SQL
  • CREATE TABLE ACTION AS
  • SELECT title, category
  • FROM MFE.MOVIES
  • WHERE category CONTAINS 'Action'
  • QUIT
  • CREATE TABLE AS can always be in front of
    SELECT FROM statement to build a sas file.
  • In SELECT, the results of a query are converted
    to an output object (printing). Query results can
    also be stored as data. The CREATE TABLE
    statement creates a table with the results of a
    query. The CREATE VIEW statement stores the query
    itself as a view. Either way, the data identified
    in the query can beused in later SQL statements
    or in other SAS steps.
  • Produce a new dataset (table) ACTION in work
    directory, no printing

32
Creating New Data - Create View
  • PROC SQL
  • CREATE VIEW G_MOVIES AS
  • SELECT title, category, rating
  • FROM MFE.MOVIES
  • WHERE rating 'G'
  • ORDER BY title
  • SELECT FROM G_MOVIES
  • QUIT
  • First step-creating a view,no output is produced
    then display the desired output results
  • Use to separate two block of code inside of
    proc sql
  • When a table is created, the query is executed
    and the resulting data is stored in a file. When
    a view is created, the query itself is stored in
    the file. The data is not accessed at all in the
    process of creating a view.

33
Join Tables (Merge datasets) - Cartesian Join
  • PROC SQL
  • SELECT
  • FROM MFE.CUSTOMERS, MFE.MOVIES
  • QUIT
  • Terminology Join (Merge) datasets (tables)
  • No prior sorting required one advantage over
    DATA MERGE
  • Use comma (,) to separate two datasets in FROM
  • Without WHERE, all possible combinations of rows
    from each tables is produced, all columns are
    included
  • Turn on the HTML result option for better
    display
  • Tool/Options/Preferences/Results/ check Create
    HTML/OK

34
Join Tables (Merge datasets) - Inner Join using
WHERE
  • PROC SQL
  • SELECT
  • FROM MFE.MOVIES, MFE.ACTORS
  • WHERE MOVIES.title ACTORS.title
  • QUIT
  • Use WHERE to specify connecting columns (title)
  • table1.matchvar table2.matchvar
  • Produce rows that have same movie title
  • The matching variable can be of different name
    different datasets

35
Join Tables (Merge datasets) - Inner Join using
WHERE (Cont.)
  • PROC SQL
  • SELECT M.title, M.rating, A.actor_leading
  • FROM MFE.MOVIES M, MFE.ACTORS A
  • WHERE MOVIES.title ACTORS.title
  • QUIT
  • Short-cut for table names
  • Can be used in SELECT and WHERE statements
  • Need to be declared in FROM statement

36
Join Tables (Merge datasets) - Join three tables
  • PROC SQL
  • SELECT C.cust_no,
  • M.title,M.rating, M.category,
  • A.actor_leading
  • FROM MFE.CUSTOMERS C,
  • MFE.MOVIES2 M,
  • MFE.ACTORS A
  • WHERE C.cust_no M.cust_no AND M.title
    A.title
  • QUIT
  • Use AND in WHERE statement to specify two
    matching conditions
  • Produce rows that satisfies all the conditions
  • Note We use MOVIES2 in this example
  • Can join up to 32 tables in one SQL code

37
Join Tables (Merge datasets) - Inner Joins
using ON
  • PROC SQL
  • SELECT M.title, rating,actor_leading
  • FROM MFE.MOVIES M
  • INNER JOIN MFE.ACTORS A
  • ON M.TITLE A.TITLE
  • QUIT
  • Same result as using where
  • WHERE is used to select rows from inner joins
  • ON is used to select rows from outer or inner

38
Join Tables (Merge datasets) - Left Outer Joins
  • PROC SQL
  • SELECT MOVIES.title, actor_leading, rating
  • FROM MFE.MOVIES
  • LEFT JOIN
  • MFE.ACTORS
  • ON MOVIES.title ACTORS.title
  • QUIT
  • Resulting output contains all rows for which the
    SQL expression, referenced in the ON clause,
    matches both tables and all rows from LEFT table
    (MOVIES) that did not match any row in the right
    (ACTORS) table.
  • Essentially the rows from LEFT table are
    preserved and captured exactly as they stored in
    the table itself, regardless if a match exists.
  • Need to specify a table name for the matching
    variable in SELECT

39
Join Tables (Merge datasets) - Right Outer Joins
  • PROC SQL
  • SELECT ACTORS.title, actor_leading, rating
  • FROM MFE.MOVIES
  • RIGHT JOIN
  • MFE.ACTORS
  • ON MOVIES.title ACTORS.title
  • QUIT
  • Resulting output contains all rows for which the
    SQL expression, referenced in the ON clause,
    matches both tables and all rows from RIGHT table
    (ACTORS) that did not match any row in the right
    (MOVIES) table.
  • Essentially the rows from RIGHT table are
    preserved and captured exactly as they stored in
    the table itself, regardless if a match exists.

40
Join Tables (Concatenating) - Outer Union
  • PROC SQL
  • SELECT FROM MFE.CUSTOMERS
  • OUTER UNION
  • SELECT FROM MFE.MOVIES
  • QUIT
  • SQL performs OUTER UNION, similar to DATA steps
    with a SET statement to Concatenate datasets.
  • The result contains all the rows produced by the
    first table-expression followed by all the row
    produced by the second table-expression.

41
Transforming Data- Creating new Variables
  • /Creating new variables/
  • PROC SQL
  • SELECT title, length, category, year, rating,
  • 2006-year AS age
  • FROM MFE.MOVIES
  • QUIT
  • You can create new variables within SELECT
    statement, the name of new variable follows after
    AS.
  • Note the order of the express is reversed

42
Transforming Data- Summarizing Data using SQL
functions
  • PROC SQL
  • SELECT ,
  • COUNT(title) AS notitle,
  • MAX(year) AS most_recent,
  • MIN(year) AS earliest,
  • SUM(length) AS total_length,
  • NMISS(rating) AS nomissing
  • FROM MFE.MOVIES
  • GROUP BY rating
  • QUIT
  • Simple summarization functions available
  • All function can be operated in GROUPs

43
Editing Data Insert observations.
  • PROC SQL NOPRINT
  • INSERT INTO MFE.CUSTOMERS
  • VALUES(1 'Peng')
  • INSERT INTO MFE.CUSTOMERS
  • SET Cust_no2,Name'Sasha'
  • QUIT
  • There are two ways of inserting observations into
    a table. Data type should be the same.
  • VALUES( ) new values are separated by space.
  • SET column name newly assigned values,
    delimited by commas.

44
Editing Data Deleting rows and Dropping
columns
  • /Deleting rows/
  • PROC SQL
  • DELETE
  • FROM MFE.MOVIES
  • WHERE length LE 100
  • QUIT
  • /Droping variables/
  • PROC SQL
  • CREATE TABLE NEW (DROPrating) AS
  • SELECT
  • FROM MFE.MOVIES
  • QUIT
  • Deleting columns can be done in SELECT or in DROP
    on created table

45
Editing Data Update observations
  • /Updating Observation/
  • PROC SQL NOPRINT
  • UPDATE MFE.CUSTOMERS
  • SET Name'Liu'
  • WHERE Cust_no1
  • QUIT
  • UPDATE SET WHERE
  • Find the observation and set new value
  • If more than one observations satisfies the
    condition, all are updated with the new data in
    SET statement

46
SAS IML
  • Peng Liu http//faculty.haas.berkeley.edu/peliu/c
    omputing

Haas School of Business, Berkeley, MFE 2006
47
PROC IML - What
  • OUTLINES
  • Defining Matrix
  • Creating Matrix
  • Combining Matrix
  • Matrix Algebra (Function, Reduction Operators)
  • Read-In SAS data sets
  • Create SAS data sets

48
Defining and Referencing matrices
  • PROC IML
  • RESET PRINT
  • A3
  • B1 2 3 row vector
  • C1,2,3 column vector
  • D1 2 3, 4 5 62 X 3 matrix
  • QUIT

PROC IML D1 2 3, 4 5 6 E1D,2
E2D1, E3D2,3 E4D1 3,2 3 PRINT D
E1 E2 E3 E4 QUIT
  • Begin with PROC IML end with QUIT
  • RESET PRINT produce all output otherwise specify
    what you want to PRINT , but you have to assign
    first.
  • Use curly brackets to create vector or matrix
  • Use square bracket to refer to elements
  • E4D1 3,2 3 assigns submatrix to E4
    consisting 1,3 rows and 2,3 columns of matrix D

49
Creating matrices
  • PROC IML
  • RESET PRINT
  • index15 rindex51
  • vars'X1''X8'
  • series do(0,84,12)

aa I(6) bb j(5,5,0) cc j(6,1) dd
diag(1 2 4) ee diag(1 2,3 4) QUIT
  • Index operator creates a row vector of
    consecutive integers or character strings
  • Do function creates series with any increment
  • Identity matrix I(size)
  • Constant matrix J(nrow, ncol, value)
  • Diagonal matrices DIAG(VECTOR)

50
Combining matrices
  • PROC IML
  • RESET PRINT
  • b1 2 3 4
  • btt(b)
  • c1 2 3 4,2 4 6 8, 3 9 7 5 ,9 8 7 6
  • i1c//bi2cbt
  • QUIT
  • Transpose operation using t()
  • "//" and " concatenates matrices vertically and
    horizontally respectively

51
Matrix Algebra - Basics
52
Matrix Algebra Matrix functions
53
Matrix Algebra Matrix Reduction Operators
54
Read SAS data sets
  • PROC IML
  • RESET PRINT
  • USE MFE.LOAN
  • READ ALL INTO m
  • WHERE(state'CA')
  • QUIT
  • READ ALL VARid term rate INTO m
  • USE specify which SAS data set is to be used
  • READ ALL INTO converts SAS data set into a
    matrix.
  • READ ALL only read numerical variables
  • Use VARterm rate option to select variables
  • Use WHERE to subset data.

55
Create SAS data sets
  • PROC IML
  • RESET PRINT
  • seedJ(3,3,9)
  • r ranuni(seed)
  • CREATE data FROM r
  • APPEND FROM r
  • QUIT
  • J function build 3 by 3 matrix with seed9 for
    each element
  • CREATE FROM opens the new SAS data set
  • APPEND FROM writes to the SAS data set
  • Output dataset has default column names COL1
    COL2 You can rename by option COLNAME R
Write a Comment
User Comments (0)
About PowerShow.com