Introduction to SAS LISA Short Course Series - PowerPoint PPT Presentation


1
Introduction to SAS: LISA Short Course Series
  • Mark Seiss, Dept. of Statistics

2
Reference Material
  • The Little SAS Book, Delwiche and Slaughter
  • SAS Programming I: Essentials
  • SAS Programming II: Manipulating Data with the
    DATA Step
  • Presentation and data:
    http://www.lisa.stat.vt.edu/?q=node/167

3
Presentation Outline
1. Introduction to the SAS Environment
2. Working With SAS Data Sets
3. Summary Procedures
4. Basic Statistical Analysis Procedures

4
Presentation Outline
  • Questions/Comments

5
Introduction to the SAS Environment
  • 1. SAS Programs
  • 2. SAS Data Sets and Data Libraries
  • 3. Creating SAS Data Sets

6
SAS Programs
  • File extension: .sas
  • The Editor window has four uses:
  • Access and edit existing SAS programs
  • Write new SAS programs
  • Submit SAS programs for execution
  • Save SAS programs
  • SAS program: a sequence of steps that the user
    submits for execution
  • Submitting SAS programs: submit either the entire
    program or only a selected portion of it

7
SAS Programs
  • Syntax rules for SAS statements
  • Free-format: can use upper or lower case
  • Usually begin with an identifying keyword
  • Can span multiple lines
  • Always end with a semicolon
  • Multiple statements can be on the same line
  • Common errors
  • Misspelled keywords
  • Missing or invalid punctuation (a missing
    semicolon is common)
  • Invalid options
  • Errors are reported in the Log window

8
SAS Programs
  • Two basic steps in SAS programs
  • DATA steps
  • Typically used to create SAS data sets and
    manipulate data
  • Begin with a DATA statement
  • PROC steps
  • Typically used to process SAS data sets
  • Begin with a PROC statement
  • The end of a DATA or PROC step is indicated
    by
  • a RUN statement (most steps)
  • a QUIT statement (some steps)
  • the beginning of another step (a DATA or PROC
    statement)

9
SAS Programs
  • Output generated from a SAS program appears in
    two windows
  • SAS log
  • Information about the processing of the SAS
    program
  • Includes any warnings or error messages
  • Accumulated in the order the DATA and PROC
    steps are submitted
  • SAS output
  • Reports generated by the SAS procedures
  • Accumulated in the order it is generated

10
SAS Data Sets and Data Libraries
  • SAS Data Set
  • Specially structured file that contains data
    values
  • File extension: .sas7bdat
  • Rows and columns format, similar to Excel
  • Columns: variables in the table, corresponding to
    fields of data
  • Rows: a single record or observation
  • Two types of variables
  • Character: can contain any value (letters, numbers,
    symbols, etc.)
  • Numeric: floating-point numbers
  • Located in SAS Data Libraries

11
SAS Data Sets and Data Libraries
  • SAS Data Libraries
  • Contain SAS data sets
  • Identified by assigning a library reference name
    (libref)
  • Temporary
  • WORK library
  • SAS data files are deleted when the session ends
  • A libref is not needed: a one-level name such as
    mydata refers to WORK.mydata
  • Permanent
  • SAS data sets are saved after the session ends
  • SASUSER library
  • You can create and access your own libraries

12
SAS Data Sets and Data Libraries
  • SAS Data Libraries cont.
  • Assigning library references
  • Syntax
  • LIBNAME libref 'SAS-data-library';
  • Rules for library references
  • 8 characters or fewer
  • Must begin with a letter or underscore
  • Remaining characters are letters, numbers, or
    underscores
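The three libref rules above can be checked mechanically. This is a minimal Python sketch, assuming plain ASCII letters; `is_valid_libref` is a hypothetical helper for illustration, not a SAS function.

```python
import re

# One leading letter or underscore, then up to 7 letters, digits, or underscores
LIBREF_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_libref(name: str) -> bool:
    """True if name is 1-8 characters, starts with a letter or underscore,
    and uses only letters, digits, or underscores."""
    return LIBREF_PATTERN.fullmatch(name) is not None


print(is_valid_libref("mydata"))       # True
print(is_valid_libref("2020data"))     # False: starts with a digit
print(is_valid_libref("projectdata"))  # False: 11 characters
```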

13
SAS Data Sets and Data Libraries
  • SAS Data Libraries cont.
  • Identifying SAS data sets within SAS Data
    Libraries
  • libref.filename
  • Accessing SAS data sets within SAS data libraries
  • Example: DATA new_data_set;
  • SET libref.filename;
  • RUN;
  • Creating SAS data sets within SAS data libraries
  • Example: DATA libref.filename;
  • SET old_data_set;
  • RUN;

14
Creating SAS Data Sets
  • Creating SAS data sets from raw data
  • Four methods
  • 1. Importing existing raw data in a SAS program
  • 2. Manually entering raw data in a SAS program
  • 3. Importing existing data sets using the Import
    menu option
  • 4. Manually entering raw data using the Table Editor

15
Creating SAS Data Sets
  • Importing existing raw data in a SAS program
  • 1. Start a DATA step and name the SAS data set to
    be created (include the SAS data library it will
    be stored in)
  • DATA libref.SAS-data-set;
  • 2. Identify the file that contains the raw data
    (.dat file)
  • INFILE 'raw-data-filename';
  • 3. Provide instructions on how to read the data from
    the raw data file
  • INPUT input-specifications;

16
Creating SAS Data Sets
  • Input Specifications
  • Specifies the names of the SAS variables in the
    new data set
  • Specifies whether the SAS variables are character
    or numeric
  • Identifies the locations of the variables in the
    raw data file
  • List Input
  • Column Input
  • Formatted Input
  • Mixed Input

17
Creating SAS Data Sets
  • List Input
  • Used when raw data values are separated by spaces
  • All values in a row must be read, in order
  • Missing data must be indicated by a period
  • Character data must be simple: no embedded spaces
    and no values longer than 8 characters
  • INPUT statement
  • Simply list the variables after the INPUT keyword in
    the order they appear in the file
  • If a variable is character, place a $
    after the variable name
  • Example: INPUT Name $ City $ Age Height Weight
    Sex $;
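To make the list-input rules concrete, here is a Python sketch that reads one blank-separated record: values are taken in order, a lone period becomes a missing numeric value (None), and character values are cut to the default length of 8. The function name, the `(name, is_character)` spec format, and the sample record are hypothetical.

```python
def list_input(line: str, spec: list[tuple[str, bool]]) -> dict:
    """Read blank-separated values in order; spec is (name, is_character)."""
    record = {}
    for (name, is_char), token in zip(spec, line.split()):
        if is_char:
            record[name] = token[:8]  # default character length is 8
        else:
            record[name] = None if token == "." else float(token)
    return record


spec = [("Name", True), ("City", True), ("Age", False),
        ("Height", False), ("Weight", False), ("Sex", True)]
print(list_input("Alfred Blacksburg 14 69 . M", spec))
# {'Name': 'Alfred', 'City': 'Blacksbu', 'Age': 14.0, 'Height': 69.0, 'Weight': None, 'Sex': 'M'}
```

The truncated city name shows why list input is limited to simple character values of 8 characters or fewer.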

18
Creating SAS Data Sets
  • Column Input
  • Used when the raw data file does not have delimiters
    between values (common in large data sets)
  • Each variable's values are found in the same
    columns in each row
  • Numeric data must be standard numbers: digits,
    decimal points, signs, and scientific notation only
  • Advantages
  • No spaces required between values
  • Missing values can be left blank
  • Character data can have embedded spaces
  • Ability to skip unwanted variables

19
Creating SAS Data Sets
  • Column Input cont.
  • INPUT Statement
  • Numeric variables: list the variable name, then the
    column or range of columns where the variable is
    found in the raw data file
  • Character variables: list the variable name, a dollar
    sign, and then the column or range of columns
  • Example: INPUT Name $ 1-10 Age 26-28 Sex $ 35;
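Column input is essentially fixed-position slicing. In this Python sketch, SAS columns a-b (1-based, inclusive) become the slice `line[a-1:b]`; a blank numeric field becomes a missing value (None), and embedded blanks in character fields survive. The `column_input` helper and the sample record are hypothetical.

```python
def column_input(line: str, spec) -> dict:
    """spec: (name, first_column, last_column, is_character) tuples."""
    record = {}
    for name, first, last, is_char in spec:
        field = line[first - 1:last].strip()
        if is_char:
            record[name] = field  # embedded blanks such as "Mary Smith" are kept
        else:
            record[name] = float(field) if field else None  # blank -> missing
    return record


# Same layout as INPUT Name $ 1-10 Age 26-28 Sex $ 35;
spec = [("Name", 1, 10, True), ("Age", 26, 28, False), ("Sex", 35, 35, True)]
line = "Mary Smith".ljust(25) + " 34".ljust(9) + "F"  # hypothetical raw record
print(column_input(line, spec))  # {'Name': 'Mary Smith', 'Age': 34.0, 'Sex': 'F'}
```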

20
Creating SAS Data Sets
  • Formatted Input
  • Appropriate for reading
  • Data in fixed columns
  • Standard and nonstandard character and numeric
    data
  • Calendar values to be converted to SAS date values
  • Reads data using SAS informats
  • Informat: an instruction that SAS uses to read in data values
  • General forms
  • Character: $informatw.
  • Numeric: informatw.d
  • Date: informatw.

21
Creating SAS Data Sets
  • Formatted Input cont.
  • Character informats
  • $w.: character string with a width of w; trims
    leading blanks
  • $CHARw.: character string with a width of w;
    does not trim leading or trailing blanks
  • Numeric informats
  • w.d: standard numeric data with width w and d
    digits after the decimal point (applied when the
    raw value has no decimal point)
  • Raw data value 1234567 → informat 8.2 → SAS
    data value 12345.67
  • COMMAw.d: numeric data with embedded commas
  • Raw data value 1,000,001 → informat COMMA10.
    → SAS data value 1000001
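The two numeric informats can be mimicked in a few lines of Python. This sketch assumes the simplest cases: w.d inserts an implied decimal point only when the field has none, and COMMAw.d here removes only commas and dollar signs (the real informat also strips a few other characters). Both function names are hypothetical.

```python
def read_w_d(raw: str, w: int, d: int = 0):
    """Sketch of w.d: read w columns; with no decimal point, the last d digits are decimals."""
    field = raw[:w].strip()
    if not field:
        return None               # blank field -> missing value
    if "." in field:
        return float(field)       # an explicit decimal point overrides d
    return int(field) / 10 ** d


def read_comma_w_d(raw: str, w: int, d: int = 0):
    """Sketch of COMMAw.d: drop commas and dollar signs, then read like w.d."""
    field = raw[:w].replace(",", "").replace("$", "")
    return read_w_d(field, w, d)


print(read_w_d("1234567", 8, 2))        # 12345.67
print(read_comma_w_d("1,000,001", 10))  # 1000001.0
```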

22
Creating SAS Data Sets
  • Formatted Input cont.
  • SAS date values
  • Stored as numbers: the number of days between
    January 1, 1960 and the specified date
  • Informats are used to read and convert the dates

Raw Data Value   Informat
11/04/2009       MMDDYY10.
11/04/09         MMDDYY8.
04NOV2009        DATE9.
04/11/2009       DDMMYY10.

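Because a SAS date value is just a day count from January 1, 1960, it can be reproduced with Python's `datetime`. The mapping from each informat in the table to a `strptime` pattern is an assumption made for this sketch (Python's two-digit-year rule also differs from SAS's YEARCUTOFF= option, though both read 09 as 2009). All four raw values in the table describe the same day, so all four give the same SAS date value.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 on the SAS date scale

# Hypothetical mapping from the informats above to strptime patterns
INFORMAT_PATTERNS = {
    "MMDDYY10.": "%m/%d/%Y",
    "MMDDYY8.": "%m/%d/%y",
    "DATE9.": "%d%b%Y",
    "DDMMYY10.": "%d/%m/%Y",
}


def to_sas_date(raw: str, informat: str) -> int:
    """Parse a raw date string and return the number of days since 1960-01-01."""
    parsed = datetime.strptime(raw, INFORMAT_PATTERNS[informat]).date()
    return (parsed - SAS_EPOCH).days


def from_sas_date(value: int) -> date:
    """Convert a SAS date value back to a calendar date."""
    return SAS_EPOCH + timedelta(days=value)


for raw, informat in [("11/04/2009", "MMDDYY10."), ("11/04/09", "MMDDYY8."),
                      ("04NOV2009", "DATE9."), ("04/11/2009", "DDMMYY10.")]:
    print(raw, informat, to_sas_date(raw, informat))  # each prints 18205
print(from_sas_date(14955))  # 2000-12-11, a Date value used in the dfwlax example
```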
23
Creating SAS Data Sets
  • Formatted Input cont.
  • Columns read are determined by the starting point
    and width of each informat
  • Example
  • INPUT Name $10. Age 3. Height 5.1 BirthDate
    MMDDYY10.;
  • Name: character of width 10, columns 1-10
  • Age: numeric of width 3, columns 11-13
  • Height: numeric of width 5 (including the
    decimal point) with one decimal place (120.9, for
    instance), columns 14-18
  • BirthDate: date in MMDDYY form (11-04-2009,
    for instance), columns 19-28

24
Creating SAS Data Sets
  • Formatted Input cont.
  • Pointer controls
  • +n: moves the pointer n columns
  • @n: moves the pointer to column n
  • Example
  • INPUT Flight 3. +4 Date MMDDYY8. @20 Destination
    $3.;
  • Flight: numeric of width 3, columns 1 through 3
  • Date: date in MMDDYY form (11/04/09) of width 8,
    columns 8 through 15
  • Destination: character of width 3, columns 20
    through 22
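The column arithmetic in the last two examples can be checked with a small Python model of the input pointer. The tuple encoding is a hypothetical convention for this sketch: `("+", n)` for +n, `("@", n)` for @n, and `(name, w)` for a variable read with an informat of width w.

```python
def column_layout(items):
    """Return {variable: (first_column, last_column)} for a formatted INPUT.

    Columns are 1-based and inclusive, as in SAS.
    """
    pointer = 1
    layout = {}
    for token, value in items:
        if token == "+":
            pointer += value          # +n: move the pointer n columns
        elif token == "@":
            pointer = value           # @n: jump to column n
        else:
            layout[token] = (pointer, pointer + value - 1)
            pointer += value          # an informat of width w consumes w columns
    return layout


# INPUT Name $10. Age 3. Height 5.1 BirthDate MMDDYY10.;
print(column_layout([("Name", 10), ("Age", 3), ("Height", 5), ("BirthDate", 10)]))
# {'Name': (1, 10), 'Age': (11, 13), 'Height': (14, 18), 'BirthDate': (19, 28)}

# INPUT Flight 3. +4 Date MMDDYY8. @20 Destination $3.;
print(column_layout([("Flight", 3), ("+", 4), ("Date", 8), ("@", 20), ("Destination", 3)]))
# {'Flight': (1, 3), 'Date': (8, 15), 'Destination': (20, 22)}
```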

25
Creating SAS Data Sets
  • Mixed Input Styles
  • Mix and match the previous three input styles
  • Example
  • Raw data: Great Smoky Mountains NC/TN 1926
    520,269
  • INPUT ParkName $ 1-22 State $ Year @40 Acreage
    COMMA9.;
  • ParkName: character of width 22, columns 1
    through 22 (column input)
  • State: character, separated by spaces (list input)
  • Year: numeric, separated by spaces (list input)
  • Acreage: numeric with informat COMMA9., starting
    at column 40 (formatted input)

26
Creating SAS Data Sets
  • Manually entering raw data in a SAS program
  • 1. Start a DATA step and name the SAS data set to
    be created
  • DATA libref.SAS-data-set;
  • 2. Provide instructions on how to read the data
  • INPUT input-specifications;
  • 3. Manually enter the raw data after a DATALINES
    statement, ending with a line containing a semicolon
  • DATALINES;
  • <raw data>
  • ;
  • RUN;

27
Creating SAS Data Sets
  • Manually Entering Raw Data Files in SAS program
  • Example
  • DATA uspresidents;
  • INPUT President $ Party $ Number;
  • DATALINES;
  • Adams F 2
  • Lincoln R 16
  • Grant R 18
  • Kennedy D 35
  • ;
  • RUN;

28
Creating SAS Data Sets
  • Using the import data menu option
  • 1. File → Import Data
  • 2. Standard data source → select the file format
  • 3. Specify the file location or Browse to select the
    file
  • 4. Create a name for the new SAS data set and
    specify its location

29
Creating SAS Data Sets
  • Compatible file formats
  • Microsoft Excel Spreadsheets
  • Microsoft Access Databases
  • Comma-Separated Values Files (.csv)
  • Tab Delimited Files (.txt)
  • dBASE Files (.dbf)
  • JMP data sets
  • SPSS Files
  • Lotus Spreadsheets
  • Stata Files
  • Paradox Files

30
Creating SAS Data Sets
  • Enter raw data directly into a SAS data set
  • 1. Tools → Table Editor
  • 2. Enter data manually into the table
  • - Observations in each row
  • - Variables in each column
  • 3. Left-click a column → Column Attributes
  • - Variable name, variable label, type
    (character/numeric), format, informat
  • Note: Informats determine how raw data is
    read. Formats determine how a variable is
    displayed.
  • 4. Close the window → Save Changes: Yes
  • → Specify the file name and directory

31
Introduction to the SAS Environment
  • Questions/Comments

32
Working With SAS Data Sets
  • 1. Data Set Manipulation
  • 2. Data Set Processing
  • 3. Combining Data Sets
  • A. Concatenating/Appending
  • B. Merging

33
Data Set Manipulation
  • Create a new SAS data set using an existing SAS
    data set as input
  • Specify the name of the new SAS data set in the
    DATA statement
  • Use the SET statement to identify the SAS data set
    being read
  • Syntax
  • DATA output_data_set;
  • SET input_data_set;
  • <additional SAS statements>
  • RUN;
  • By default, the SET statement reads all
    observations and variables from the input data
    set into the output data set.

34
Data Set Manipulation
  • Assignment Statements
  • Evaluate an expression
  • Assign the resulting value to a variable
  • General form: variable = expression;
  • Example: miles_per_hour = distance/time;
  • SAS functions
  • Perform arithmetic functions, compute simple
    statistics, manipulate dates, etc.
  • General form: variable = function_name(argument1,
    argument2, ...);
  • Example: Time_worked = sum(Day1, Day2, Day3,
    Day4, Day5);
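One practical difference between the SUM function and chaining the + operator: in SAS, an arithmetic operator returns a missing value if any operand is missing, while SUM ignores missing arguments and returns missing only when all of them are missing. A Python sketch of that rule, using None for a SAS missing value (both helper names are hypothetical):

```python
def sas_plus(*values):
    """Like Day1 + Day2 + ...: any missing operand makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values)


def sas_sum(*values):
    """Like sum(Day1, Day2, ...): missing arguments are ignored."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


print(sas_plus(8, None, 7, 8, 6))  # None
print(sas_sum(8, None, 7, 8, 6))   # 29
```

So if an employee has no value recorded for one day, Time_worked computed with sum() still reflects the other days.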

35
Data Set Manipulation
  • Selecting Variables
  • Use DROP and KEEP to determine which variables
    are written to the new SAS data set
  • Two ways
  • DROP and KEEP statements
  • Form: DROP Variable1 Variable2;
  • KEEP Variable3 Variable4 Variable5;
  • DROP= and KEEP= data set options in the SET
    statement (these control which variables are read)
  • Form: SET input_data_set (KEEP=Var1);

36
Data Set Manipulation
  • Conditional Processing
  • Uses IF-THEN-ELSE logic
  • General form: IF <expression1> THEN <statement>;
  • ELSE IF <expression2> THEN <statement>;
  • ELSE <statement>;
  • <expression> is a true/false comparison, such as
  • Day1 = Day2, Day1 > Day2, Day1 < Day2
  • Day1 + Day2 = 10
  • sum(Day1, Day2) = 10
  • Day1 = 5 and Day2 = 5

37
Data Set Manipulation
  • Conditional Processing

Symbolic   Mnemonic         Example
=          EQ               IF region = 'Spain';
^= or ~=   NE               IF region ne 'Spain';
>          GT               IF rainfall > 20;
<          LT               IF rainfall < 20;
>=         GE               IF rainfall ge 20;
<=         LE               IF rainfall <= 20;
&          AND              IF rainfall ge 20 & temp < 90;
| or !     OR               IF rainfall ge 20 OR temp < 90;
           IS NOT MISSING   WHERE region IS NOT MISSING;
           BETWEEN AND      WHERE region BETWEEN 'Plain' AND 'Spain';
           CONTAINS         WHERE region CONTAINS 'ain';
           IN               IF region IN ('Rain', 'Spain', 'Plain');

Note: IS NOT MISSING, BETWEEN AND, and CONTAINS are valid only in WHERE
expressions (the WHERE statement or WHERE= option), not in IF statements.

38
Data Set Manipulation
  • Conditional Processing cont.
  • If <expression1> is true, <statement> is
    processed
  • ELSE IF and ELSE are processed only if
    <expression1> is false
  • Only one statement can be specified using this form
  • Use DO and END statements to execute a group of
    statements
  • General form: IF <expression> THEN DO;
  • <statements>
  • END;
  • ELSE DO;
  • <statements>
  • END;

39
Data Set Manipulation
  • Subsetting Rows (Observations)
  • We will look at two ways
  • Using a subsetting IF statement
  • Using the WHERE= option in the SET statement
  • IF statement
  • Writes to the new data set only the observations
    for which the expression is true
  • General form: IF <expression>;
  • Example: IF career = 'Teacher';
  • IF sex ne 'M';
  • In the second example, only observations where
    sex is not equal to M will be written to the
    output data set

40
Data Set Manipulation
  • Subsetting Rows (Observations) cont.
  • WHERE= option in the SET statement
  • Use this option to read only the rows of the input
    data set for which the expression is true
  • General form: SET input_data_set
    (WHERE=(<expression>));
  • Example: SET vacation (WHERE=(destination = 'Bermuda'));
  • Only observations where the destination equals
    Bermuda will be read from the input data set
  • Comparison
  • The resulting output data sets are equivalent
  • IF statement: all rows are read from the input data
    set
  • WHERE= option: only rows where the expression is true
    are read from the input data set
  • The difference in processing time matters when
    working with big data sets
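The comparison above can be modeled in Python by counting how many rows actually enter the DATA step. Both approaches return the same rows; only the number of rows read differs. The function names and the vacation rows are hypothetical.

```python
def subset_with_if(rows, condition):
    """Subsetting IF: every row is read into the DATA step, then filtered."""
    rows_read, output = 0, []
    for row in rows:
        rows_read += 1            # the row enters the PDV
        if condition(row):
            output.append(row)    # written only when the expression is true
    return output, rows_read


def subset_with_where(rows, condition):
    """WHERE= option: rows are filtered before they reach the DATA step."""
    rows_read, output = 0, []
    for row in (r for r in rows if condition(r)):
        rows_read += 1            # only matching rows enter the PDV
        output.append(row)
    return output, rows_read


def is_bermuda(row):
    return row["destination"] == "Bermuda"


vacation = [{"destination": "Bermuda"}, {"destination": "Aruba"},
            {"destination": "Bermuda"}, {"destination": "Cancun"}]
print(subset_with_if(vacation, is_bermuda)[1])     # 4 rows read
print(subset_with_where(vacation, is_bermuda)[1])  # 2 rows read
```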

41
Data Set Manipulation
  • PROC SORT sorts data according to specified
    variables
  • General form: PROC SORT DATA=input_data_set
    <options>;
  • BY Variable1 Variable2;
  • RUN;
  • Sorts data by Variable1 and then by
    Variable2
  • By default, SAS sorts data in ascending order
  • Numbers: low to high
  • Characters: A to Z
  • Use the DESCENDING option before a variable in the
    BY statement for numbers high to low and letters Z to A
  • BY City DESCENDING Population;
  • SAS sorts data first by City A to Z and then by
    Population high to low
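A mixed ascending/descending BY list can be reproduced in Python with stable sorts: sort on the last BY variable first and the first BY variable last. This is a sketch of the ordering only; `proc_sort`, the `(variable, descending)` pairs, and the city populations are hypothetical.

```python
def proc_sort(rows, by):
    """by: list of (variable, descending) pairs, first BY variable first."""
    result = list(rows)
    # Python's sort is stable (also with reverse=True), so sorting on the
    # last key first and the first key last gives a multi-key ordering.
    for variable, descending in reversed(by):
        result.sort(key=lambda row: row[variable], reverse=descending)
    return result


rows = [{"City": "Roanoke", "Population": 100},
        {"City": "Blacksburg", "Population": 40},
        {"City": "Roanoke", "Population": 300},
        {"City": "Blacksburg", "Population": 90}]

# BY City DESCENDING Population;
for row in proc_sort(rows, [("City", False), ("Population", True)]):
    print(row["City"], row["Population"])
# Blacksburg 90, Blacksburg 40, Roanoke 300, Roanoke 100
```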

42
Data Set Manipulation
  • Some Options
  • NODUPKEY
  • Eliminates observations that have the same values
    for the BY variables (the first observation in each
    BY group is kept)
  • OUT=output_data_set
  • By default, PROC SORT replaces the input data set
    with the sorted data set
  • With this option, PROC SORT creates a new
    sorted data set and the input data set remains
    unchanged
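NODUPKEY can be pictured as "sort, then keep the first row of each BY group". Because Python's `sorted` is stable (as PROC SORT is by default), rows with equal keys keep their original relative order, so "first" means first in the input. The helper name and rows are hypothetical.

```python
def sort_nodupkey(rows, by):
    """Sort by the BY variables and keep only the first row in each BY group."""
    def key(row):
        return tuple(row[v] for v in by)

    output, seen = [], set()
    for row in sorted(rows, key=key):  # stable sort keeps input order within groups
        if key(row) not in seen:
            seen.add(key(row))
            output.append(row)
    return output


visits = [{"ID": 2, "Visit": "a"}, {"ID": 1, "Visit": "b"},
          {"ID": 2, "Visit": "c"}, {"ID": 1, "Visit": "d"}]
print(sort_nodupkey(visits, ["ID"]))
# [{'ID': 1, 'Visit': 'b'}, {'ID': 2, 'Visit': 'a'}]
```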

43
Data Set Processing
  • Data Set Processing
  • DATA steps read data from existing data sets
    or raw data files one row at a time, like a loop
  • A DATA step reads data from the input data set in
    the following way
  • 1. Read the current row from the input data set into
    the Program Data Vector (PDV)
  • 2. Process the SAS statements
  • 3. Write the PDV to the output data set
  • 4. Set the current row to the next row in the input
    data set
  • 5. Return to step 1
  • Only one row at a time is processed
  • Thus we cannot simply add the value of a variable
    in one row to the value in another row

44
Data Set Processing
  • Data Set Processing Example
  • Let the following be the input data set dfwlax

Flight Date Dest FirstClass Economy
439 14955 LAX 20 137
921 14955 DFW 15 131
114 14956 LAX 15 85
982 14956 DFW 5 196
439 14957 LAX 14 116
982 14957 DFW 20 166
45
Data Set Processing
  • Data Set Processing Example
  • Consider the following submitted code
  • DATA onboard;
  • SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

46
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • Current → SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 . .

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
(no observations yet)

47
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • SET dfwlax;
  • Current → Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 .

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
(no observations yet)

48
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • SET dfwlax;
  • Total = FirstClass + Economy;
  • Current → IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
(no observations yet)

49
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • Current → RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

50
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • Current → DATA onboard;
  • SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

PDV (Total and FirstClassFull are reset to missing):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 . .

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

51
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • Current → SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 . .

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

52
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • SET dfwlax;
  • Current → Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 146 .

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

53
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • Current → ELSE FirstClassFull = 0;
  • RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 146 0

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1

54
Data Set Processing
  • Data Set Processing Example
  • Execution of the Data Step
  • DATA onboard;
  • SET dfwlax;
  • Total = FirstClass + Economy;
  • IF FirstClass = 20 THEN FirstClassFull = 1;
  • ELSE FirstClassFull = 0;
  • Current → RUN;

PDV:
Flight Date Dest FirstClass Economy Total FirstClassFull
921 14955 DFW 15 131 146 0

Onboard (output data set):
Flight Date Dest FirstClass Economy Total FirstClassFull
439 14955 LAX 20 137 157 1
921 14955 DFW 15 131 146 0

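The walkthrough above traces the first two iterations by hand. The same loop can be sketched in Python for all six dfwlax rows, assuming dicts stand in for observations and None for a missing value; `run_onboard_step` is a hypothetical name. Each iteration starts with the new variables (Total, FirstClassFull) missing, copies in the current dfwlax row, runs the statements, and writes the PDV out at RUN.

```python
dfwlax = [
    {"Flight": 439, "Date": 14955, "Dest": "LAX", "FirstClass": 20, "Economy": 137},
    {"Flight": 921, "Date": 14955, "Dest": "DFW", "FirstClass": 15, "Economy": 131},
    {"Flight": 114, "Date": 14956, "Dest": "LAX", "FirstClass": 15, "Economy": 85},
    {"Flight": 982, "Date": 14956, "Dest": "DFW", "FirstClass": 5, "Economy": 196},
    {"Flight": 439, "Date": 14957, "Dest": "LAX", "FirstClass": 14, "Economy": 116},
    {"Flight": 982, "Date": 14957, "Dest": "DFW", "FirstClass": 20, "Economy": 166},
]


def run_onboard_step(input_rows):
    onboard = []
    for row in input_rows:  # one DATA step iteration per input row
        # SET dfwlax: copy the row into the PDV; new variables start missing
        pdv = {**row, "Total": None, "FirstClassFull": None}
        pdv["Total"] = pdv["FirstClass"] + pdv["Economy"]
        if pdv["FirstClass"] == 20:
            pdv["FirstClassFull"] = 1
        else:
            pdv["FirstClassFull"] = 0
        onboard.append(dict(pdv))  # RUN: implicit OUTPUT of the PDV
    return onboard


for obs in run_onboard_step(dfwlax):
    print(*obs.values())
```

The first two printed lines, `439 14955 LAX 20 137 157 1` and `921 14955 DFW 15 131 146 0`, match the Onboard rows built in the walkthrough.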
55
Combining Data Sets
  • Concatenating (or Appending)
  • Stacks the data sets on top of one another
  • If one data set does not have a variable that the
    other data sets do, that variable is set to missing
    in the new data set for the observations from
    that data set
  • General form: DATA output_data_set;
  • SET data1 data2;
  • RUN;
  • PROC APPEND may also be used
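Concatenation with SET data1 data2 can be sketched in Python on lists of dicts, with None standing for a missing value. The result contains every variable that appears in any input, and a variable absent from one input is missing for that input's rows. The helper name and data are hypothetical.

```python
def concatenate(*data_sets):
    """Stack data sets (lists of dicts) in order, filling absent variables with None."""
    columns = []
    for data_set in data_sets:
        for row in data_set:
            for name in row:
                if name not in columns:
                    columns.append(name)  # variable order follows first appearance
    return [{name: row.get(name) for name in columns}
            for data_set in data_sets for row in data_set]


data1 = [{"ID": 1, "Score": 90}]
data2 = [{"ID": 2, "Grade": "A"}]
print(concatenate(data1, data2))
# [{'ID': 1, 'Score': 90, 'Grade': None}, {'ID': 2, 'Score': None, 'Grade': 'A'}]
```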

56
Combining Data Sets
  • Merging Data Sets
  • One-to-One Match Merge
  • A single record in a data set corresponds to a
    single record in all other data sets
  • Example: patient and billing information
  • One-to-Many Match Merge
  • Matching one observation from one data set to
    multiple observations in other data sets
  • Example: county and state information
  • Note: the data sets must be sorted by the BY
    variables before merging can be done
  • (PROC SORT)

57
Combining Data Sets
  • One-to-One Match Merge
  • Usually need at least one common variable between
    the data sets for matching purposes
  • For the example, a patient ID would be needed
  • A common variable is not needed if all data sets are
    in exactly the same order
  • General form: DATA output_data_set;
  • MERGE input_data_set1 input_data_set2;
  • BY variable1 variable2;
  • RUN;

58
Combining Data Sets
  • One-to-One Match Merge
  • Example
  • Data sets: performance and goals
  • Code
  • DATA compare;
  • MERGE performance goals;
  • BY month;
  • difference = sales - goal;
  • RUN;

Goals:
Month Goal
1 9000
2 6000
3 5000

Performance:
Month Sales
1 8223
2 6034
3 4220

59
Combining Data Sets
  • One-to-One Match Merge
  • Example cont.
  • Output data set: compare

Month Sales Goal Difference
1 8223 9000 -777
2 6034 6000 34
3 4220 5000 -780

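The compare example can be reproduced with a Python sketch of a one-to-one match merge. This uses dictionary lookups rather than SAS's sequential read of BY-sorted inputs, and it assumes each Month appears at most once in each data set; as in MERGE, a same-named variable from the second data set would overwrite the first. `match_merge` is a hypothetical name.

```python
def match_merge(first, second, by):
    """One-to-one match merge of two lists of dicts on the BY variable."""
    first_by_key = {row[by]: row for row in first}
    second_by_key = {row[by]: row for row in second}
    merged = []
    for key in sorted(first_by_key.keys() | second_by_key.keys()):
        row = {by: key}
        row.update(first_by_key.get(key, {}))
        row.update(second_by_key.get(key, {}))  # later data set wins on name clashes
        merged.append(row)
    return merged


performance = [{"Month": 1, "Sales": 8223}, {"Month": 2, "Sales": 6034},
               {"Month": 3, "Sales": 4220}]
goals = [{"Month": 1, "Goal": 9000}, {"Month": 2, "Goal": 6000},
         {"Month": 3, "Goal": 5000}]

compare = match_merge(performance, goals, "Month")
for row in compare:
    row["Difference"] = row["Sales"] - row["Goal"]  # difference = sales - goal;
print([row["Difference"] for row in compare])  # [-777, 34, -780]
```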
60
Combining Data Sets
  • One-to-Many Match Merge
  • Requires at least one common variable in the data
    sets for matching purposes
  • For the example, state information is in both the
    state and county files
  • If two data sets have variables with the same
    name (other than the BY variables), the values from
    the data set listed later in the MERGE statement
    overwrite those from the earlier one
  • General form: DATA output_data_set;
  • MERGE Data1 Data2 Data3;
  • BY Variable1 Variable2;
  • RUN;

61
Combining Data Sets
  • One-to-Many Match Merge
  • Example
  • Data sets: videos and adjustment
  • Code
  • DATA prices;
  • MERGE videos adjustment;
  • BY category;
  • NewPrice = (1 - adjustment) * sales;
  • RUN;

Videos:
Category Sales
Aerobics 12.99
Aerobics 13.99
Aerobics 13.99
Step 12.99
Step 12.99
Weights 15.99

Adjustment:
Category Adjustment
Aerobics .20
Step .30
Weights .25

62
Combining Data Sets
  • One-to-Many Match Merge
  • Example cont.
  • Output data set: prices

Category Sales Adjustment NewPrice
Aerobics 12.99 .20 10.39
Aerobics 13.99 .20 11.19
Aerobics 13.99 .20 11.19
Step 12.99 .30 9.09
Step 12.99 .30 9.09
Weights 15.99 .25 11.99

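The prices example can be checked with a Python sketch of a one-to-many match merge: each video row picks up the single adjustment row for its category. The dictionary lookup stands in for SAS's sequential BY-group processing, and `one_to_many_merge` is a hypothetical name. NewPrice values are rounded to cents only for display, as in the table.

```python
def one_to_many_merge(many, lookup, by):
    """Attach the matching lookup row (one per BY value) to every row of `many`."""
    lookup_by_key = {row[by]: row for row in lookup}
    merged = []
    for row in sorted(many, key=lambda r: r[by]):  # MERGE expects BY-sorted input
        combined = dict(row)
        combined.update(lookup_by_key.get(row[by], {}))
        merged.append(combined)
    return merged


video_rows = [("Aerobics", 12.99), ("Aerobics", 13.99), ("Aerobics", 13.99),
              ("Step", 12.99), ("Step", 12.99), ("Weights", 15.99)]
videos = [{"Category": c, "Sales": s} for c, s in video_rows]
adjustment = [{"Category": "Aerobics", "Adjustment": 0.20},
              {"Category": "Step", "Adjustment": 0.30},
              {"Category": "Weights", "Adjustment": 0.25}]

prices = one_to_many_merge(videos, adjustment, "Category")
for row in prices:
    row["NewPrice"] = (1 - row["Adjustment"]) * row["Sales"]
print([round(row["NewPrice"], 2) for row in prices])
# [10.39, 11.19, 11.19, 9.09, 9.09, 11.99]
```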
63
Working With SAS Data Sets
  • Questions/Comments

64
Summary Procedures
  • Print Procedure
  • Plot Procedure
  • Univariate Procedure
  • Means Procedure
  • Freq Procedure

65
Print Procedure
  • PROC PRINT is used to print data to the Output
    window
  • By default, prints all observations and variables
    in the SAS data set
  • General form: PROC PRINT DATA=input_data_set
    <options>;
  • <optional SAS statements>
  • RUN;
  • Some options
  • DATA=input_data_set (OBS=n): specifies the number
    of observations to be printed in the output
  • NOOBS: suppresses printing of the observation
    number
  • LABEL: prints the variable labels instead of the
    variable names

66
Print Procedure
  • Optional SAS statements
  • BY variable1 variable2 variable3;
  • Starts a new section of output for every new
    value of the BY variables (the data must be sorted
    by them)
  • ID variable1 variable2 variable3;
  • Prints the ID variables on the left-hand side of the
    page and suppresses the printing of the
    observation numbers
  • SUM variable1 variable2 variable3;
  • Prints the sum of the listed variables at the bottom
    of the output
  • VAR variable1 variable2 variable3;
  • Prints only the listed variables in the output

67
Plot Procedure
  • Used to create basic scatter plots of the data
  • Use PROC GPLOT or PROC SGPLOT for more
    sophisticated plots
  • General form: PROC PLOT DATA=input_data_set;
  • PLOT vertical_variable * horizontal_variable
    / <options>;
  • RUN;
  • By default, SAS uses letters to mark points on
    plots
  • A for a single observation, B for two
    observations at the same point, etc.
  • To specify a different character to represent a
    point
  • PLOT vertical_variable * horizontal_variable = 'character';

68
Plot Procedure
  • To use the values of a third variable to mark points
  • PLOT vertical_variable * horizontal_variable
    = third_variable;
  • To plot more than one variable on the vertical
    axis
  • PLOT vertical_variable1 * horizontal_variable
    vertical_variable2 * horizontal_variable
    / OVERLAY;

69
Univariate Procedure
  • PROC UNIVARIATE is used to examine the
    distribution of data
  • Produces summary statistics for each variable
    separately
  • Includes mean, median, mode, standard deviation,
    skewness, kurtosis, quantiles, etc.
  • General form: PROC UNIVARIATE DATA=input_data_set
    <options>;
  • VAR variable1 variable2 variable3;
  • RUN;
  • If the VAR statement is not used, summary
    statistics will be produced for all numeric
    variables in the input data set.

70
Univariate Procedure
  • Options include
  • PLOT: produces a stem-and-leaf plot, a box plot, and
    a normal probability plot
  • NORMAL: produces tests of normality

71
Means Procedure
  • Similar to the UNIVARIATE procedure
  • General form: PROC MEANS DATA=input_data_set
    <options>;
  • <optional SAS statements>
  • RUN;
  • With no options or optional SAS statements, the
    MEANS procedure prints the number of
    non-missing values, mean, standard deviation,
    minimum, and maximum for all numeric variables in
    the input data set
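As a cross-check of what the five default statistics mean, here is a Python sketch using the standard `statistics` module, with None standing for a missing value. The standard deviation uses the n - 1 divisor, which matches PROC MEANS's default (VARDEF=DF). The helper name and sample values are hypothetical.

```python
import statistics


def proc_means_defaults(values):
    """N, mean, standard deviation, minimum, maximum over non-missing values."""
    present = [v for v in values if v is not None]  # missing values are excluded
    return {
        "N": len(present),
        "Mean": statistics.mean(present),
        "Std Dev": statistics.stdev(present),  # sample standard deviation (n - 1)
        "Minimum": min(present),
        "Maximum": max(present),
    }


print(proc_means_defaults([2, 4, 4, 4, 5, 5, 7, 9, None]))
```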

72
Means Procedure
  • Options
  • Statistics available (keywords in the PROC MEANS
    statement)
  • Note: The default confidence level for confidence
    limits is 95% (ALPHA=0.05). Use the ALPHA= option to
    specify a different level.

Keyword          Statistic                      Keyword    Statistic
CLM              Two-sided confidence limits    RANGE      Range
CSS              Corrected sum of squares       SKEWNESS   Skewness
CV               Coefficient of variation       STDDEV     Standard deviation
KURTOSIS         Kurtosis                       STDERR     Standard error of the mean
LCLM             Lower confidence limit         SUM        Sum
MAX              Maximum value                  SUMWGT     Sum of weights
MEAN             Mean                           UCLM       Upper confidence limit
MIN              Minimum value                  USS        Uncorrected sum of squares
N                Number of non-missing values   VAR        Variance
NMISS            Number of missing values       PROBT      p-value for Student's t
MEDIAN (or P50)  Median                         T          Student's t statistic
Q1 (P25)         25th percentile                Q3 (P75)   75th percentile
P1               1st percentile                 P5         5th percentile
P10              10th percentile                P90        90th percentile
P95              95th percentile                P99        99th percentile

73
Means Procedure
  • Optional SAS Statements
  • VAR Variable1 Variable2;
  • Specifies the numeric variables for which statistics
    will be produced
  • BY Variable1 Variable2;
  • Calculates statistics for each combination of the
    BY variables (the data must be sorted by them)
  • OUTPUT OUT=output_data_set;
  • Creates a data set with the default statistics

74
FREQ Procedure
  • PROC FREQ is used to generate frequency tables
  • The most common usage is to create a table showing
    the distribution of categorical variables
  • General form: PROC FREQ DATA=input_data_set;
  • TABLE variable1 * variable2 * variable3 / <options>;
  • RUN;
  • Options
  • LIST: prints cross-tabulations in list format
    rather than as a grid
  • MISSING: specifies that missing values should be
    included in the tabulations
  • OUT=output_data_set: creates a data set
    containing the frequencies, in list format
  • NOPRINT: suppresses printing in the Output window
  • Use a BY statement to get percentages within each
    category of a variable
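A one-way PROC FREQ table reports, for each level, the frequency, percent, cumulative frequency, and cumulative percent. This Python sketch reproduces those four columns, excluding missing values (None) as PROC FREQ does without the MISSING option and ordering levels by their values. The function name and sample data are hypothetical.

```python
from collections import Counter


def one_way_freq(values):
    """Rows of (level, frequency, percent, cumulative frequency, cumulative percent)."""
    counts = Counter(v for v in values if v is not None)  # missing excluded
    total = sum(counts.values())
    rows, cumulative = [], 0
    for level in sorted(counts):
        cumulative += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / total,
                     cumulative, 100 * cumulative / total))
    return rows


for row in one_way_freq(["M", "F", "F", "M", "F", None]):
    print(*row)
# F 3 60.0 3 60.0
# M 2 40.0 5 100.0
```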

75
Summary Procedures
  • Questions/Comments

76
Statistical Analysis Procedures
  • Correlation: PROC CORR
  • Regression: PROC REG
  • Analysis of variance: PROC ANOVA
  • Chi-square test of association: PROC FREQ
  • Generalized linear models: PROC GENMOD

77
CORR Procedure
  • PROC CORR is used to calculate the correlations
    between variables
  • The correlation coefficient measures the linear
    relationship between two variables
  • Values range from -1 to 1
  • Negative correlation: as one variable increases,
    the other decreases
  • Positive correlation: as one variable increases,
    the other increases
  • 0: no linear relationship between the two
    variables
  • 1: perfect positive linear relationship
  • -1: perfect negative linear relationship
  • General form: PROC CORR DATA=input_data_set
    <options>;
  • VAR Variable1 Variable2;
  • WITH Variable3;
  • RUN;
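The Pearson correlation that PROC CORR reports by default is the covariance of the two variables scaled by both standard deviations. A short Python sketch of the formula (the function name and data are illustrative):

```python
import math


def pearson(x, y):
    """Pearson correlation: sum of cross-deviations / sqrt(product of sums of squares)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfect positive relationship: 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfect negative relationship: -1.0
print(pearson([1, 2, 3], [1, 3, 2]))        # partial relationship: 0.5
```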

78
CORR Procedure
  • If the VAR and WITH statements are not used,
    correlations are computed for all pairs of numeric
    variables
  • Options include
  • SPEARMAN: computes Spearman's rank correlations
  • KENDALL: computes Kendall's tau-b coefficients
  • HOEFFDING: computes Hoeffding's D statistic

79
REG Procedure
  • PROC REG is used to fit linear regression models
    by least squares estimation
  • One of many SAS procedures that can perform
    regression analysis
  • Has no CLASS statement, so independent variables
    must be numeric (use GENMOD for categorical variables)
  • General form
  • PROC REG DATA=input_data_set <options>;
  • MODEL dependent = independent1
    independent2 / <options>;
  • <optional statements>
  • RUN;
  • PROC REG statement options include
  • PCOMIT=m: performs incomplete principal component
    estimation, omitting the last m principal components
  • CORR: displays the correlation matrix for the
    variables in the model
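For a single independent variable, the least squares estimates that PROC REG reports have a closed form: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). A Python sketch of that one-predictor case (the function name and data are hypothetical; PROC REG handles several predictors with matrix algebra):

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


print(least_squares_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0): points lie on y = 1 + 2x
```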

80
REG Procedure
  • MODEL statement options include
  • SELECTION=
  • Specifies that a model selection procedure be
    conducted: FORWARD, BACKWARD, or STEPWISE
  • ADJRSQ: computes the adjusted R-square
  • MSE: computes the mean square error
  • COLLIN: performs collinearity analysis
  • CLB: computes confidence limits for the parameter
    estimates
  • ALPHA=
  • Sets the significance level for confidence and
    prediction intervals and tests

81
REG Procedure
  • Optional statements include
  • PLOT dependent * independent1; generates a plot of
    the data

82
ANOVA Procedure
  • PROC ANOVA performs analysis of variance
  • Designed for balanced data (PROC GLM is used for
    unbalanced data)
  • Can handle nested and crossed effects and
    repeated measures
  • General form: PROC ANOVA DATA=input_data_set
    <options>;
  • CLASS independent1 independent2;
  • MODEL dependent = independent1 independent2;
  • <optional statements>
  • RUN;
  • The CLASS statement must come before the MODEL
    statement; it defines the classification variables
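For a single classification variable, the F statistic in the ANOVA table compares the between-group mean square with the within-group (error) mean square. This Python sketch computes those pieces for a one-way layout; it illustrates the arithmetic only, and the function name and group values are hypothetical.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom, and F for a one-way layout."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_model = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return {"SS_model": ss_model, "SS_error": ss_error,
            "DF_model": df_model, "DF_error": df_error, "F": f_value}


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# SS_model 54.0, SS_error 6.0, DF 2 and 6, F 27.0
```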

83
ANOVA Procedure
  • Useful PROC ANOVA statement option:
    OUTSTAT=output_data_set
  • Generates an output data set that contains sums of
    squares, degrees of freedom, statistics, and
    p-values for each effect in the model
  • Useful optional statement: MEANS
    independent1 / <comparison type>;
  • Used to perform multiple comparisons analysis
  • Set <comparison type> to
  • TUKEY: Tukey's studentized range test
  • BON: Bonferroni t test
  • T: pairwise t tests
  • DUNCAN: Duncan's multiple-range test
  • SCHEFFE: Scheffé's multiple comparison procedure

84
FREQ Procedure
  • PROC FREQ can also be used to analyze
    categorical data
  • General form: PROC FREQ DATA=input_data_set;
  • TABLE variable1 * variable2 / <options>;
  • RUN;
  • TABLE statement options include
  • AGREE: tests and measures of classification
    agreement, including McNemar's test, Bowker's
    test, Cochran's Q test, and kappa statistics
  • CHISQ: chi-square tests of homogeneity or
    independence, and measures of association
  • MEASURES: measures of association, including
    Pearson and Spearman correlation, gamma,
    Kendall's tau-b, Stuart's tau-c, Somers' D, lambda,
    odds ratios, risk ratios, and confidence
    intervals
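The Pearson chi-square statistic behind the CHISQ option compares each observed cell count with the count expected under independence, (row total × column total) / n. A Python sketch of that computation for any two-way table (the function name and counts are hypothetical; PROC FREQ also reports the p-value and several other statistics):

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


print(pearson_chi_square([[10, 20], [30, 40]]))  # about (0.794, 1)
```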

85
GENMOD Procedure
  • PROC GENMOD is used to fit generalized linear models,
    in which the response is not necessarily normal
  • Logistic and Poisson regression are examples of
    generalized linear models
  • General form
  • PROC GENMOD DATA=input_data_set;
  • CLASS independent1;
  • MODEL dependent = independent1 independent2 /
  • DIST=<option>
  • LINK=<option>;
  • RUN;

86
GENMOD Procedure
  • DIST=: specifies the distribution of the
    response variable
  • LINK=: specifies the link function, which relates
    the mean of the response to the linear predictor
  • Example: logistic regression
  • DIST=binomial
  • LINK=logit
  • Example: Poisson regression
  • DIST=poisson
  • LINK=log
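The two link functions above, and their inverses, are short formulas. The link maps the mean of the response to the linear predictor; the inverse link turns a fitted linear predictor back into a predicted probability (logit) or a predicted mean count (log). In this Python sketch the coefficients b0 and b1 are hypothetical values chosen for illustration, not output from GENMOD.

```python
import math


def logit(p: float) -> float:
    """Logit link: probability in (0, 1) -> linear-predictor scale."""
    return math.log(p / (1 - p))


def inverse_logit(eta: float) -> float:
    """Inverse logit: linear predictor -> probability."""
    return 1 / (1 + math.exp(-eta))


def log_link(mu: float) -> float:
    """Log link: positive mean (e.g. a Poisson mean count) -> linear predictor."""
    return math.log(mu)


def inverse_log(eta: float) -> float:
    """Inverse log link: linear predictor -> mean."""
    return math.exp(eta)


b0, b1, x = -1.0, 0.5, 2.0  # hypothetical coefficients and predictor value
eta = b0 + b1 * x           # linear predictor: 0.0
print(inverse_logit(eta))   # 0.5: predicted probability under DIST=binomial LINK=logit
print(inverse_log(eta))     # 1.0: predicted mean count under DIST=poisson LINK=log
```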

87
Statistical Analysis Procedures
  • Questions/Comments