Getting Started Using SAS (Statistical Analysis System)

1
Getting Started Using SAS (Statistical Analysis System)
  • Anurag Prasad

2
Getting Started Using SAS: Basics
  • Session 1

3
SAS Family
  • Base SAS
  • SAS/STAT
  • SAS/GRAPH
  • SAS/ACCESS
  • SAS/OR
  • SAS/ETS

4
Connecting to SAS

5
Connecting to SAS (Windows)
  • Start Exceed
  • Start Menu > Run > telnet falaq (and log in)
  • Type at the prompt:
  • export DISPLAY=<IP address>:0.0
  • sas
  • (SAS windows open)
  • e.g. export DISPLAY=172.28.36.228:0.0
  • sas

6
User Interface

7
SAS Windows
  • Program Editor
  • Log
  • Output
  • Results
  • Explorer
  • Toolbox

8
Program Editor
  • Is like a text editor window
  • Edit text files, data files, output files
  • Edit and submit (run) SAS programs

9
Log
  • Contains
  • program statements we submitted
  • messages from SAS about the execution of the
    program

10
Output
  • Contains the results of SAS procedures
  • Each observation in a separate row

11
Results
  • Helps in managing output of the program

12
Explorer
  • Shows the directories and files managed by the
    SAS system
  • The library of temporary data sets

13
Toolbox
  • Can be used as a command prompt
  • to select SAS windows
  • to perform basic tasks e.g. copy, save, undo

14
Programming Basics

15
Layout of SAS programs
  • Capitalization
  • Indentation
  • Comments
  • Every SAS statement ends with a semicolon.

16
SAS Data Sets
  • Rows are Observations
  • Columns are Variables
  • Two data types: numeric and character

17
Two parts of a SAS program
  • DATA Step
  • Creates a data set
  • Reads and modifies a data set
  • PROC Step
  • Performs analysis and functions
  • Produces reports
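
The two parts described above can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the data set name heights and its variables are hypothetical, and the data are typed inline with DATALINES (an alias of CARDS).

```sas
/* DATA step: creates the data set and modifies it */
DATA heights;
  INPUT name $ inches;
  cm = inches * 2.54;   /* derived variable added to every observation */
  DATALINES;
Ann 64
Bob 70
;
RUN;

/* PROC step: produces a report from the data set created above */
PROC PRINT DATA = heights;
RUN;
```

Every statement ends with a semicolon; the capitalization and indentation are only for readability.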

18
The DATA Step's Built-in Loop
  • DATA steps execute line by line and observation
    by observation
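
One way to see the built-in loop is to record the automatic variable _N_, which counts how many times the DATA step has started over. The sketch below assumes a hypothetical data set named squares; each data line triggers one full pass through the statements.

```sas
DATA squares;
  INPUT x;
  pass = _N_;          /* 1 on the first pass, 2 on the second, ... */
  x_squared = x * x;   /* computed once per observation */
  DATALINES;
2
5
9
;
RUN;

PROC PRINT DATA = squares;
RUN;
```

The printed data set should contain three observations, with pass equal to 1, 2 and 3.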

19
A Sample Program
20
SAS Log
  • Contains the program statements plus messages
    from SAS about the program
  • After DATA Step, it tells how many variables and
    observations are there
  • After PROC Step, it tells which procedure took
    how much time

21
Reading SAS Log
22
Reading Output
23
Getting data into SAS

24
Reading Internal Data
  • CARDS statement
  • is used to indicate internal data
  • must be the last statement of the DATA step

25
Example of Internal Data Input
  • DATA uspres;
  • INPUT pres $ party $ number;
  • CARDS;
  • Adams F 2
  • Lincoln R 16
  • ;
  • PROC PRINT;
  • RUN;

26
Reading External Data
  • INFILE statement
  • is used to read an external data file
  • Syntax: INFILE '<file name>';

27
Example of External Data Input
  • DATA usprez;
  • INFILE '/users/math/phd/anuragpr/sasfiles/pres.txt';
  • INPUT pres $ party $ number;
  • PROC PRINT;
  • RUN;

28
Different Types of Input
  • List Input
  • Column Input
  • Formatted Input

29
List Input
  • Used when
  • The values in the data file are separated by at
    least one blank
  • Periods (.) indicate missing data
  • No embedded space in the character data
  • No character values are longer than 8 characters

30
Example data for List Input
  • Lucky 2.3 1.9 . 3.0
  • Spot 4.6 2.5 3.1 .5
  • Tubs 7.1 . . 3.8
  • Hop 4.5 3.2 1.9 1.8
  • Noisy 3.8 1.3 1.8
  • 1.5
  • Winner 5.7 . . .

31
Program for List Input
  • DATA toads;
  • INFILE 'toadjump.txt';
  • INPUT toadname $ weight jump1 jump2 jump3;
  • PROC PRINT;
  • RUN;

32
Column Input
  • Used when
  • The data file does not have spaces between all
    values
  • There are embedded spaces in the character values
  • Missing values are blank
  • Variables are always found at the same place in
    the data line

33
Example data for Column Input
  • Columbia Peaches      35  67  1 10
  • Plains Peanuts       210      2  5
  • Gilroy Garlics        151035 12 11
  • Sacramento Apples    124  85 15  4

34
Program for Column Input
  • DATA sales;
  • INFILE 'onions.txt';
  • INPUT vteam $ 1-20 csales 21-24 bsales 25-28
    ourhits 29-31 vhits 32-34;
  • PROC PRINT;
  • TITLE 'SAS Data Set Sales';
  • RUN;

35
End of Session 1

36
Getting Started Using SAS: Advanced Techniques
  • Session 2

37
Working in Program Editor
  • d / dd deletes one line or a block of lines
  • c / cc copies one line or a block of lines
  • m / mm moves one line or a block of lines
  • a places copied or moved lines after this line
  • b places copied or moved lines before this line
  • i / in inserts one or n new lines
  • r / rr repeats one line or a block of lines
  • rn / rrn repeats the line or block n times
  • tc connects two lines of text
  • ts splits the line at the cursor
  • cols displays a horizontal column ruler

38
Formatted Input

39
INFORMATS
  • Three types of informats
  • Character: $<informat>w.
  • Numeric: <informat>w.d
  • Date: <informat>w.
  • Note: w denotes the width and
  • d denotes the number of decimal places

40
Character Informats
  • $CHARw.
  • Reads character data (does not trim leading or
    trailing blanks)
  • INPUT animal $CHAR10.;
  • 'my cat' => 'my cat'
  • '   my cat' => '   my cat'

41
Character Informats
  • $w.
  • Reads character data (trims leading blanks)
  • INPUT animal $10.;
  • 'my cat' => 'my cat'
  • '   my cat' => 'my cat'

42
Numeric Informats
  • COMMAw.d
  • Removes embedded commas and $, converts a left
    parenthesis to a minus sign
  • INPUT income COMMA10.;
  • 1,000,001 => 1000001
  • (1234) => -1234

43
Numeric Informats
  • w.d
  • Reads standard numeric data
  • INPUT value 5.1;
  • 1234 => 123.4
  • -12.3 => -12.3
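
The two numeric informats above can be tried with inline data. This is a small sketch using the slides' own example values; the data set names amounts and readings are hypothetical.

```sas
DATA amounts;
  /* COMMA10. strips commas and reads (n) as a negative number */
  INPUT income COMMA10.;
  DATALINES;
1,000,001
(1234)
;
RUN;

DATA readings;
  /* 5.1 supplies one decimal place only when the field has no decimal point */
  INPUT value 5.1;
  DATALINES;
1234
-12.3
;
RUN;

PROC PRINT DATA = amounts;
RUN;

PROC PRINT DATA = readings;
RUN;
```

The printed values should match the slides: 1000001 and -1234 for income, 123.4 and -12.3 for value.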

44
Date Informats
  • DATEw.
  • Reads dates in the form ddmmmyy or ddmmmyyyy
  • INPUT day DATE10.;
  • 1jan1961 => 366
  • 1 jan 1961 => 366

45
Date Informats
  • MMDDYYw.
  • Reads dates in the form mmddyy or mmddyyyy
  • INPUT day MMDDYY8.;
  • 01-01-61 => 366
  • 01/01/61 => 366
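
The value 366 in these examples comes from how SAS stores dates: as the number of days since 1 January 1960, which is day 0. The sketch below (hypothetical data set name dates) reads two dates with DATE10. and prints both the raw number and a formatted copy.

```sas
DATA dates;
  INPUT day DATE10.;
  raw = day;            /* unformatted copy: days since 1 January 1960 */
  FORMAT day DATE9.;    /* display the original variable as a date */
  DATALINES;
1jan1960
1jan1961
;
RUN;

PROC PRINT DATA = dates;
RUN;
```

The raw column should show 0 and 366, because 1960 was a leap year.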

46
INPUT styles

47
Reading Multiple Lines per Observation
  • Miami FL
  • 90 75
  • 97 65
  • Nome AK
  • 55 44
  • 76 98
  • INPUT city $ state $
  • / var1 var2
  • #3 var3 var4;

48
Reading Multiple Observations per Line
  • Nome AK 34 23 Miami DC 34 65 Raleigh NC
    76 33
  • INPUT city $ state $ num1 num2 @@;
  • @@ at the end stops SAS from going to a new line
    for each observation

49
Reading Part of a Data File
  • a Abc 74
  • a Def 97
  • b Ghi 87
  • a Jkl 79
  • b Mno 78
  • INPUT type $ @;
  • IF type = 'b' THEN DELETE;
  • INPUT name $ 7-10 marks;
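
A runnable sketch of the same idea follows. It is an adaptation rather than the slide's exact program: the column alignment of the original file is not recoverable from the transcript, so the second INPUT uses list input instead of columns 7-10, and the data set name type_a is hypothetical.

```sas
DATA type_a;
  INPUT type $ @;               /* read type and hold the line */
  IF type = 'b' THEN DELETE;    /* drop the record before reading the rest */
  INPUT name $ marks;           /* continue reading from the held line */
  DATALINES;
a Abc 74
a Def 97
b Ghi 87
a Jkl 79
b Mno 78
;
RUN;

PROC PRINT DATA = type_a;
RUN;
```

Only the three type a records (Abc, Def, Jkl) should remain in the data set.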

50
More about Data Sets

51
Creating Permanent Data Sets
  • LIBNAME statement defines a libref which points
    to a directory
  • DATA <libref>.<member name> creates a SAS
    data set in that directory whose name is the member
    name
  • LIBNAME sasbook 'mysaslib';
  • DATA sasbook.distance;
  • miles = 23;
  • km = 1.61 * miles;
  • RUN;

52
Reading Permanent Data Sets
  • Include a LIBNAME statement in the program
  • Refer to the data set by its two-level name
  • LIBNAME example 'mysaslib';
  • PROC PRINT DATA = example.distance;
  • TITLE 'The data set distance';
  • RUN;

53
Writing Raw Data Files
  • LIBNAME survey 'mysaslib';
  • DATA _NULL_;
  • SET survey.soap; (read the data set soap)
  • FILE 'newfile.txt'; (create a new file)
  • PUT num1 num2 @21 chr1 chr2;
  • RUN;

54
Working with Data

55
SAS Functions - Numeric
  • INT(arg)
  • LOG(arg)
  • LOG10(arg)
  • MAX(arg,arg,..)
  • MIN(arg,arg,..)
  • MEAN(arg,arg,..)
  • ROUND(arg, round-off-unit)
  • SUM(arg,arg,..)
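
A quick way to try these functions is a DATA _NULL_ step that writes the results to the log. The argument values below are arbitrary illustrations, not from the slides.

```sas
DATA _NULL_;
  a = INT(7.9);             /* integer part: 7 */
  b = LOG(1);               /* natural logarithm: 0 */
  c = LOG10(1000);          /* base-10 logarithm: 3 */
  d = MAX(3, 8, 5);         /* largest argument: 8 */
  e = MIN(3, 8, 5);         /* smallest argument: 3 */
  f = MEAN(2, 4, 9);        /* arithmetic mean: 5 */
  g = ROUND(12.345, 0.1);   /* nearest multiple of 0.1: 12.3 */
  h = SUM(1, 2, 3);         /* total: 6 */
  PUT a= b= c= d= e= f= g= h=;
RUN;
```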

56
SAS Functions - Character
  • LEFT(arg)
  • LENGTH(arg)
  • SUBSTR(arg,position,n)
  • TRIM(arg)
  • arg || arg - Concatenates two character values
  • UPCASE(arg)
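
The character functions can be combined as in the sketch below. The variable names and values are hypothetical; the results are written to the log.

```sas
DATA _NULL_;
  LENGTH first last $ 10;
  first = 'Abe';
  last  = 'Lincoln';
  padded = '   cat';
  full  = TRIM(first) || ' ' || last;   /* TRIM drops trailing blanks before joining */
  upper = UPCASE(last);                 /* LINCOLN */
  part  = SUBSTR(last, 1, 4);           /* Linc */
  n     = LENGTH(last);                 /* 7: trailing blanks are not counted */
  flush = LEFT(padded);                 /* leading blanks moved to the end */
  PUT full= upper= part= n= flush=;
RUN;
```

Without TRIM, the blanks that pad first to its declared length of 10 would end up inside the concatenated value.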
57
SAS Functions - Date
  • DAY(date)
  • MDY(month,day,year)
  • MONTH(date)
  • QTR(date)
  • TODAY()
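
The date functions work on SAS date values (days since 1 January 1960). The following sketch builds dates with MDY and takes them apart again; the specific dates are illustrative.

```sas
DATA _NULL_;
  d = MDY(1, 1, 1961);        /* 366, the same value the DATE10. example reads */
  m = MONTH(d);               /* 1 */
  dd = DAY(d);                /* 1 */
  q = QTR(d);                 /* 1 */
  later = MDY(8, 15, 1961);
  m2 = MONTH(later);          /* 8 */
  q2 = QTR(later);            /* 3 */
  now = TODAY();              /* current date; changes from run to run */
  PUT d= m= dd= q= m2= q2=;
  PUT later= DATE9. now= DATE9.;
RUN;
```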

58
IF-THEN Statement
  • IF condition THEN action;
  • A single IF-THEN can have only one action
  • Add DO and END for multiple actions
  • IF condition THEN DO;
  • action 1;
  • action 2;
  • END;
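
As a concrete sketch of a DO group, the step below performs two actions when a condition holds. The pass mark of 50, the variable names, and the ELSE branch are assumptions added for illustration.

```sas
DATA graded;
  INPUT name $ marks;
  LENGTH result $ 4;
  IF marks >= 50 THEN DO;       /* two actions need DO ... END */
    result = 'pass';
    margin = marks - 50;
  END;
  ELSE result = 'fail';         /* a single action needs no DO group */
  DATALINES;
Abc 74
Ghi 41
;
RUN;

PROC PRINT DATA = graded;
RUN;
```

For the failing record, margin is never assigned and stays missing.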

59
Arrays
  • ARRAY name (n) variable-list;
  • e.g. ARRAY dir (4) north east south west;
  • Numbered range list
  • ARRAY class (100) group1 - group100;
  • INPUT cat4 - cat8;
  • avg = MEAN(OF num1 - num10);
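
Arrays are typically used with a DO loop to apply the same operation to several variables. The sketch below assumes, purely for illustration, that the value 9 is a code for "no answer" and recodes it to missing before averaging.

```sas
DATA survey;
  INPUT num1 - num5;                    /* numbered range list */
  ARRAY nums (5) num1 - num5;
  DO i = 1 TO 5;
    IF nums(i) = 9 THEN nums(i) = .;    /* hypothetical missing-value code */
  END;
  avg = MEAN(OF num1 - num5);           /* MEAN ignores missing values */
  DROP i;
  DATALINES;
3 9 4 5 9
1 2 3 4 5
;
RUN;

PROC PRINT DATA = survey;
RUN;
```

The first observation should average 4 (from 3, 4 and 5) and the second 3.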

60
Sorting, Printing, and Summarizing Data

61
Sorting our Data: PROC SORT
  • PROC SORT;
  • BY var1 var2;
  • PROC SORT DATA = old OUT = new;
  • BY DESCENDING var3;
  • Default sort order is ascending/increasing;
    DESCENDING goes before the variable it applies to
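
A self-contained sketch of the second form follows. The data set names old and new match the slide; the variables and values are hypothetical.

```sas
DATA old;
  INPUT name $ var3;
  DATALINES;
Spot 4.6
Tubs 7.1
Hop 4.5
;
RUN;

/* OUT= writes the sorted copy to new and leaves old unchanged */
PROC SORT DATA = old OUT = new;
  BY DESCENDING var3;
RUN;

PROC PRINT DATA = new;
RUN;
```

The printed order should be Tubs, Spot, Hop.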

62
Printing our Data: PROC PRINT
  • BY variable-list
  • Starts a new section in the output for each new
    value of the BY variables.
  • ID variable-list
  • SUM variable-list
  • VAR variable-list
  • Specifies which variables are to be printed and
    the order
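
The four statements can be combined as in this sketch. The data set and variable names are hypothetical. BY requires the data to be sorted by the BY variables first, so a PROC SORT precedes the PROC PRINT.

```sas
DATA sales;
  INPUT region $ rep $ amount;
  DATALINES;
East Ann 100
West Bob 250
East Cy 175
;
RUN;

PROC SORT DATA = sales;
  BY region;
RUN;

PROC PRINT DATA = sales;
  BY region;     /* one section per region */
  ID rep;        /* rep replaces the observation number on the left */
  VAR amount;    /* print only this variable */
  SUM amount;    /* add totals for amount */
RUN;
```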

63
Summarizing Data: PROC MEANS
  • PROC MEANS options;
  • N number of non-missing values
  • NMISS number of missing values
  • MEAN the mean
  • STD the standard deviation
  • MIN the minimum value
  • MAX the maximum value
  • RANGE the range
  • SUM the sum
  • VAR the variance
  • SKEWNESS skewness
  • KURTOSIS kurtosis
  • CV the coefficient of variation
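
The options are listed on the PROC MEANS statement itself. The sketch below uses a small hypothetical data set with one missing value so that N and NMISS differ.

```sas
DATA class;
  INPUT score @@;
  DATALINES;
56 78 . 90
;
RUN;

/* request only the statistics named here */
PROC MEANS DATA = class N NMISS MEAN MIN MAX;
  VAR score;
RUN;
```

With these values N should be 3 and NMISS 1; the mean is computed from the three non-missing scores.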

64
Examining Data: PROC FREQ
  • DATA ufos;
  • INFILE 'ufo.txt' PAD LRECL = 400;
  • INPUT num time $ zone $ descrip $ 12-46;
  • PROC FREQ;
  • TABLES zone time * zone;
  • TITLE 'UFO Reports';
  • RUN;

65
Data file ufo.txt
  • 85845 AM R white with long tail
  • 85776 PM C bright white light
  • 85873 PM R white and red flashing lights
  • 85879 AM R little green men boomboxes
  • 86790 PM C throbbing purple light
  • 86823 PM R giant toads

66
Visualizing Data: PROC PLOT
  • PROC PLOT;
  • PLOT vertical * horizontal; (default symbols A, B, C, ...
    show how many observations fall at a point)
  • PLOT height * weight;
  • PLOT height * weight = name; (first letter of name
    marks each point)
  • BY age_group; (separate plot for each level)
  • PLOT csales * action = 'C'
  • bsales * action = 'B' / OVERLAY;

67
Basic SAS Procedures

68
Examining the Distribution: PROC UNIVARIATE
  • N number of observations
  • Mean arithmetic mean
  • Std dev standard deviation
  • Skewness skewness
  • Kurtosis kurtosis
  • Max highest value in the data set
  • Min lowest value in the data set
  • Median median
  • Mode most frequently occurring value

69
PROC UNIVARIATE
  • Option PLOT produces stem-and-leaf, box and normal
    probability plots
  • Option NORMAL produces tests of normality
  • DATA class;
  • INFILE 'scores.txt';
  • INPUT score @@;
  • PROC UNIVARIATE PLOT NORMAL;
  • VAR score;
  • TITLE;
  • RUN;

70
Data file scores.txt
  • 56 78 84 73 90 44 76 87 92 75
  • 85 67 90 84 74 64 73 78 69 56
  • 87 73 100 54 81 78 69 64 73 65

75
Examining Correlation: PROC CORR
  • DATA class;
  • INFILE 'study.txt';
  • INPUT score study exercise @@;
  • PROC CORR; (option: SPEARMAN)
  • VAR study exercise; (appear across the top of the
    table)
  • WITH score; (appears down the side)
  • RUN;

76
Data file study.txt
  • 56 6 2 78 7 4 84 5 5 73 4 2 90 6 4
  • 44 2 0 76 5 1 87 6 3 92 6 7 75 8 3
  • 85 7 1 67 4 2 90 5 5 84 6 5 74 5 2
  • 64 4 1 73 8 5 78 5 2 69 6 1 56 4 1
  • 87 8 4 73 8 3 100 5 6 54 8 0 81 5 4
  • 78 5 2 69 4 1 64 7 1 73 7 3 65 4 4

78
Simple Regression Analysis
  • PROC REG;
  • MODEL dependent = independent;
  • PLOT vertical * horizontal = 'symbol';
  • Options with PLOT (use name followed by a period)
  • P predicted values
  • R residuals
  • STUDENT studentized residual
  • U95 upper bound of a 95% c.i.
    for individual prediction
  • L95 lower bound

79
Simple Regression Analysis
  • DATA hits;
  • INFILE 'baseball.txt';
  • INPUT ht dist @@;
  • PROC REG;
  • MODEL dist = ht;
  • PLOT dist * ht P. * ht = 'p' / OVERLAY;
  • RUN;

80
Data file baseball.txt
  • 50 110 49 135 48 129 53 150 48 124
  • 50 143 51 126 45 107 53 146 50 154
  • 47 136 52 144 47 124 50 133 50 128
  • 50 118 48 135 47 129 45 126 48 118
  • 45 121 53 142 46 122 47 119 51 134
  • 49 130 46 132 51 144 50 132 50 131

84
Analysis of Variance
  • PROC ANOVA;
  • CLASS variable-list;
  • MODEL dependent = effects;
  • MEANS effects / options;
  • For one-way ANOVA, the effect is the
    classification variable
  • Options with MEANS
  • BON (Bonferroni's t tests)
  • DUNCAN (multiple range)
  • T (pairwise t tests)
  • SCHEFFE (multiple comparison)
  • TUKEY (studentized range)

85
Analysis of Variance
  • DATA soft;
  • INFILE 'softball.txt';
  • INPUT team $ height @@;
  • PROC ANOVA;
  • CLASS team;
  • MODEL height = team;
  • MEANS team;
  • RUN;

86
Data file softball.txt
  • red 55 red 48 red 53 red 47 red 51 red 43
  • red 45 red 46 red 55 red 54 red 45 red 52
  • blue 46 blue 56 blue 48 blue 47 blue 54 blue 52
  • blue 49 blue 51 blue 45 blue 48 blue 55 blue 47
  • gray 55 gray 45 gray 47 gray 56 gray 49 gray 53
  • gray 48 gray 53 gray 51 gray 52 gray 48 gray 47
  • pink 53 pink 53 pink 58 pink 56 pink 50 pink 55
  • pink 59 pink 57 pink 49 pink 55 pink 56 pink 57
  • gold 53 gold 55 gold 48 gold 45 gold 47 gold 56
  • gold 55 gold 46 gold 47 gold 53 gold 51 gold 50

89
End of Session 2