Title: Getting Started Using SAS (Statistical Analysis System)
1Getting Started Using SAS (Statistical Analysis System)
2Getting Started Using SAS: Basics
3SAS Family
- Base SAS
- SAS/STAT
- SAS/GRAPH
- SAS/ACCESS
- SAS/OR
- SAS/ETS
4Connecting to SAS
5Connecting to SAS (Windows)
- Start Exceed
- Start Menu > Run > telnet falaq (and log in)
- Type at the prompt:
- export DISPLAY=<IP address>:0.0
- sas
- (SAS windows open)
- e.g. export DISPLAY=172.28.36.228:0.0
- sas
6User Interface
7SAS Windows
- Program Editor
- Log
- Output
- Results
- Explorer
- Toolbox
8Program Editor
- Is like a text editor window
- Edit text files, data files, output files
- Edit and submit (run) SAS programs
9Log
- Contains
- program statements we submitted
- messages from SAS about the execution of the
program
10Output
- Contains the results of SAS procedures
- Each observation in a separate row
11Results
- Helps in managing output of the program
12Explorer
- Shows the directories and files managed by the SAS system
- The library of temporary data sets
13Toolbox
- Can be used as a command prompt
- to select SAS windows
- to perform basic tasks e.g. copy, save, undo
14Programming Basics
15Layout of SAS programs
- Capitalization
- Indentation
- Comments
- Every SAS statement ends with a semicolon.
16SAS Data Sets
- Rows are Observations
- Columns are Variables
- Two data types: numeric and character
17Two parts of a SAS program
- DATA Step
- Creates a data set
- Reads and modifies a data set
- PROC Step
- Performs analysis and functions
- Produces reports
18DATA Step's Built-in Loop
- DATA steps execute line by line and observation by observation
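The built-in loop can be seen in a minimal sketch like the one below. The data set name `loopdemo` and the three values are made up for illustration. `_N_` is SAS's automatic counter of DATA step iterations, and `PUT` writes to the log.

```sas
DATA loopdemo;
   INPUT x;                 /* runs once for each data line */
   doubled = 2 * x;         /* computed for the current observation only */
   PUT _N_= x= doubled=;    /* writes one line to the log per iteration */
   DATALINES;
1
5
10
;
RUN;
```

The log should contain one PUT line per data line, which shows that every statement in the step is executed for the first observation before SAS moves on to the second.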
19A Sample Program
20SAS Log
- Contains the program statements plus messages from SAS about the program
- After a DATA step, it tells how many variables and observations there are
- After a PROC step, it tells how much time the procedure took
21Reading SAS Log
22Reading Output
23Getting data into SAS
24Reading Internal Data
- CARDS statement
- is used to indicate internal data
- must be the last statement of the DATA step
25Example of Internal Data Input
- DATA uspres;
- INPUT pres $ party $ number;
- CARDS;
- Adams F 2
- Lincoln R 16
- ;
- PROC PRINT;
- RUN;
26Reading External Data
- INFILE statement
- is used to read an external data file
- Syntax: INFILE '<file name>';
27Example of External Data Input
- DATA usprez;
- INFILE '/users/math/phd/anuragpr/sasfiles/pres.txt';
- INPUT pres $ party $ number;
- PROC PRINT;
- RUN;
28Different Types of Input
- List Input
- Column Input
- Formatted Input
29List Input
- Used when
- The values in the data file are separated by at least one blank
- Periods (.) indicate missing data
- There are no embedded spaces in the character data
- No character values are longer than 8 characters
30Example data for List Input
- Lucky 2.3 1.9 . 3.0
- Spot 4.6 2.5 3.1 .5
- Tubs 7.1 . . 3.8
- Hop 4.5 3.2 1.9 1.8
- Noisy 3.8 1.3 1.8
- 1.5
- Winner 5.7 . . .
31Program for List Input
- DATA toads;
- INFILE 'toadjump.txt';
- INPUT toadname $ weight jump1 jump2 jump3;
- PROC PRINT;
- RUN;
32Column Input
- Used when
- The data file does not have spaces between all values
- There are embedded spaces in the character values
- Missing values are blank
- Variables are always found at the same place in the data line
33Example data for Column Input
- Columbia Peaches 35 67 1 10
- Plains Peanuts 210 2 5
- Gilroy Garlics 151035 12 11
- Sacramento Apples 124 85 15 4
34Program for Column Input
- DATA sales;
- INFILE 'onions.txt';
- INPUT vteam $ 1-20 csales 21-24 bsales 25-28 ourhits 29-31 vhits 32-34;
- PROC PRINT;
- TITLE 'SAS Data Set Sales';
- RUN;
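Column input depends on exact character positions, which the example data above no longer shows. The sketch below is a self-contained version of the same program using in-stream data instead of `onions.txt`. The alignment of the data lines is an assumption, reconstructed so that each value falls inside the column ranges named in the INPUT statement.

```sas
DATA sales;
   /* column ranges as in the program above */
   INPUT vteam $ 1-20 csales 21-24 bsales 25-28 ourhits 29-31 vhits 32-34;
   /* data lines must start in column 1; spacing below is assumed */
   DATALINES;
Columbia Peaches      35  67  1 10
Plains Peanuts       210      2  5
Gilroy Garlics        151035 12 11
Sacramento Apples    124  85 15  4
;
PROC PRINT DATA = sales;
   TITLE 'SAS Data Set Sales';
RUN;
```

With this alignment, bsales is missing for Plains Peanuts because columns 25-28 are blank, and the Gilroy Garlics line is split into csales 15 and bsales 1035 even though no space separates them.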
35End of Session 1
36Getting Started Using SASAdvanced Techniques
37Working in Program Editor
- d / dd deletes one or more lines
- c / cc copies one or more lines
- m / mm moves one or more lines
- a after this line
- b before this line
- i / i# inserts one or # new lines
- r / rr repeats one or more lines
- r# / rr# repeats the line or block # times
- tc connects two lines of text
- ts splits line at cursor
- cols displays horizontal line ruler
38Formatted Input
39INFORMATS
- Three types of informats
- Character: $<informat>w.
- Numeric: <informat>w.d
- Date: <informat>w.
- Note: w denotes the width and d denotes the number of decimal places
40Character Informats
- $CHARw.
- Reads character data (does not trim leading or trailing blanks)
- INPUT animal $CHAR10.;
- 'my cat' > 'my cat'
- '   my cat' > '   my cat'
41Character Informats
- $w.
- Reads character data (trims leading blanks)
- INPUT animal $10.;
- 'my cat' > 'my cat'
- '   my cat' > 'my cat'
42Numeric Informats
- COMMAw.d
- Removes embedded commas and $, converts left parenthesis to minus sign
- INPUT income COMMA10.;
- 1,000,001 > 1000001
- (1234) > -1234
43Numeric Informats
- w.d
- Reads standard numeric data
- INPUT value 5.1;
- 1234 > 123.4
- -12.3 > -12.3
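The two numeric informats can be tried together in a short sketch. The data set name `nums` and the layout of the data lines are assumptions: income occupies columns 1-10 and value occupies columns 12-16.

```sas
DATA nums;
   /* COMMA10. reads columns 1-10, +1 skips column 11, 5.1 reads columns 12-16 */
   INPUT income COMMA10. +1 value 5.1;
   DATALINES;
1,000,001  1234
(1234)     -12.3
;
PROC PRINT DATA = nums;
RUN;
```

The first line should give income 1000001 and value 123.4, since 5.1 supplies one decimal place when the data has no decimal point. The second should give income -1234 and value -12.3, since a decimal point in the data overrides the d part of the informat.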
44Date Informats
- DATEw.
- Reads dates in form ddmmmyy or ddmmmyyyy
- INPUT day DATE10.;
- 1jan1961 > 366
- 1 jan 1961 > 366
45Date Informats
- MMDDYYw.
- Reads dates in form mmddyy or mmddyyyy
- INPUT day MMDDYY8.;
- 01-01-61 > 366
- 01/01/61 > 366
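SAS stores a date as the number of days since 1 January 1960, which is why 1 January 1961 becomes 366. The sketch below reads the same date with both informats. The data set and variable names are illustrative, and the two-digit year is assumed to resolve to 1961 under the YEARCUTOFF system option in effect.

```sas
DATA dates;
   /* DATE9. reads columns 1-9, +1 skips column 10, MMDDYY8. reads columns 11-18 */
   INPUT day1 DATE9. +1 day2 MMDDYY8.;
   raw1 = day1;               /* unformatted copy, to show the stored number */
   FORMAT day1 day2 DATE9.;   /* display the dates in readable form */
   DATALINES;
01jan1961 01/01/61
;
PROC PRINT DATA = dates;
RUN;
```

In the printed output, raw1 should show the stored value 366, while day1 and day2 should both display as 01JAN1961.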
46INPUT styles
47Reading Multiple Lines per Observation
- Miami FL
- 90 75
- 97 65
- Nome AK
- 55 44
- 76 98
- INPUT city $ state $
- / var1 var2
- #3 var3 var4;
48Reading Multiple Observations per Line
- Nome AK 34 23 Miami DC 34 65 Raleigh NC 76 33
- INPUT city $ state $ num1 num2 @@;
- @@ at the end stops SAS from going to a new line for each observation
49Reading Part of a Data File
- a Abc 74
- a Def 97
- b Ghi 87
- a Jkl 79
- b Mno 78
- INPUT type $ @;
- IF type = 'b' THEN DELETE;
- INPUT name $ 7-10 marks;
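A trailing @ holds the current data line so that a later INPUT statement can read more of it. The sketch below is a self-contained version with in-stream data. The data set name `typea` and the column positions are assumptions, based on a single blank between values, so they differ from the 7-10 range shown above.

```sas
DATA typea;
   INPUT type $ 1 @;              /* read column 1 and hold the line */
   IF type = 'b' THEN DELETE;     /* skip type b lines without reading the rest */
   INPUT name $ 3-5 marks 7-8;    /* read the rest of the held line */
   DATALINES;
a Abc 74
a Def 97
b Ghi 87
a Jkl 79
b Mno 78
;
PROC PRINT DATA = typea;
RUN;
```

Only the three type a observations (Abc, Def, Jkl) should reach the data set.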
50More about Data Sets
51Creating Permanent Data Sets
- The LIBNAME statement defines a libref which points to a directory
- DATA <libref>.<member name> creates a SAS data set named <member name> in that directory
- LIBNAME sasbook 'mysaslib';
- DATA sasbook.distance;
- miles = 23;
- km = 1.61 * miles;
- RUN;
52Reading Permanent Data Sets
- Include a LIBNAME statement in program
- Refer to the data set by its two-level name
- LIBNAME example 'mysaslib';
- PROC PRINT DATA = example.distance;
- TITLE 'The data set distance';
- RUN;
53Writing Raw Data Files
- LIBNAME survey 'mysaslib';
- DATA _NULL_;
- SET survey.soap; /* read the data set soap */
- FILE 'newfile.txt'; /* create a new file */
- PUT num1 num2 @21 chr1 chr2;
- RUN;
54Working with Data
55SAS Functions - Numeric
- INT(arg)
- LOG(arg)
- LOG10(arg)
- MAX(arg,arg,..)
- MIN(arg,arg,..)
- MEAN(arg,arg,..)
- ROUND(arg, round-off-unit)
- SUM(arg,arg,..)
56SAS Functions - Character
- Character
- LEFT(arg)
- LENGTH(arg)
- SUBSTR(arg,position,n)
- TRIM(arg)
- || concatenates two character values
- UPCASE(arg)
57SAS Functions - Date
- DAY(date)
- MDY(month,day,year)
- MONTH(date)
- QTR(date)
- TODAY()
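A few of the numeric, character, and date functions listed above can be combined in one DATA step. This is a sketch with made-up values; the data set and variable names are illustrative only.

```sas
DATA funcs;
   x = 7.86;
   whole   = INT(x);            /* integer part: 7 */
   rounded = ROUND(x, 0.1);     /* nearest tenth: 7.9 */
   biggest = MAX(3, x, 5);      /* largest argument: 7.86 */
   average = MEAN(2, 4, 9);     /* mean of the arguments: 5 */
   /* left-align, upper-case, drop trailing blanks, then append '!' */
   name    = TRIM(UPCASE(LEFT('  sas '))) || '!';
   d       = MDY(1, 1, 1961);   /* SAS date value for 1 January 1961: 366 */
   q       = QTR(d);            /* quarter of the year: 1 */
   PUT whole= rounded= biggest= average= name= d= q=;
RUN;
```

The PUT statement writes the values to the log, so the results can be checked without running a procedure.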
58IF-THEN Statement
- IF condition THEN action;
- A single IF-THEN can have only one action
- Add DO and END for multiple actions
- IF condition THEN DO;
- action 1;
- action 2;
- END;
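The DO and END form can be sketched as follows. The data set `grades`, the variable names, the pass mark of 60, and the two data lines are all hypothetical.

```sas
DATA grades;
   INPUT name $ score;
   LENGTH result $ 4;
   IF score < 60 THEN DO;    /* two actions under one condition */
      result = 'fail';
      retake = 1;
   END;
   DATALINES;
Ann 82
Bob 45
;
PROC PRINT DATA = grades;
RUN;
```

For Ann the condition is false, so result stays blank and retake stays missing. For Bob both assignments run.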
59Arrays
- ARRAY name (n) variable-list;
- e.g. ARRAY dir (4) north east south west;
- Numbered range list
- ARRAY class (100) group1 - group100;
- INPUT cat4 - cat8;
- avg = MEAN(OF num1 - num10);
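Arrays are most useful when the same action has to be applied to several variables. The sketch below assumes a hypothetical survey in which the value 9 stands for "no answer". It uses an iterative DO loop, which the slides do not cover, to step through the array.

```sas
DATA survey;
   INPUT q1 - q4;                      /* numbered range list */
   ARRAY q (4) q1 - q4;
   DO i = 1 TO 4;
      IF q(i) = 9 THEN q(i) = .;       /* recode the assumed "no answer" code to missing */
   END;
   avg = MEAN(OF q1 - q4);             /* MEAN ignores missing values */
   DROP i;
   DATALINES;
3 9 4 5
2 2 9 9
;
PROC PRINT DATA = survey;
RUN;
```

After recoding, avg should be 4 for the first line and 2 for the second, because only the non-missing answers are averaged.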
60Sorting, Printing, and Summarizing Data
61Sorting our Data: PROC SORT
- PROC SORT;
- BY var1 var2;
-
- PROC SORT DATA = old OUT = new;
- BY DESCENDING var3;
- Default sort order is ascending/increasing
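In the BY statement, the keyword DESCENDING goes before the variable it applies to. A minimal sketch with made-up data:

```sas
DATA old;
   INPUT name $ var3;
   DATALINES;
a 2
b 9
c 5
;
PROC SORT DATA = old OUT = new;   /* old is left unchanged; new holds the sorted copy */
   BY DESCENDING var3;
PROC PRINT DATA = new;
RUN;
```

The printed data set should list b, c, a, that is, var3 from largest to smallest.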
62Printing our Data: PROC PRINT
- BY variable-list;
- Starts a new section in the output for each new value of the BY variables.
- ID variable-list;
- SUM variable-list;
- VAR variable-list;
- Specifies which variables are to be printed and the order
63Summarizing Data: PROC MEANS
- PROC MEANS options;
- N number of non-missing values
- NMISS number of missing values
- MEAN the mean
- STD the standard deviation
- MIN the minimum value
- MAX the maximum value
- RANGE the range
- SUM the sum
- VAR the variance
- SKEWNESS skewness
- KURTOSIS kurtosis
- CV the coefficient of variation
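The statistics are requested by listing their keywords as options on the PROC MEANS statement. The sketch below uses a small data set named `jumps`, a hypothetical name, with a few values borrowed from the toad data shown earlier.

```sas
DATA jumps;
   INPUT toadname $ jump1 jump2;
   DATALINES;
Lucky 1.9 .
Spot 2.5 3.1
Hop 3.2 1.9
;
PROC MEANS DATA = jumps N NMISS MEAN MIN MAX;
   VAR jump1 jump2;     /* numeric variables to summarize */
RUN;
```

For jump2 the output should report N = 2 and NMISS = 1, since Lucky's second jump is missing.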
64Examining Data: PROC FREQ
- DATA ufos;
- INFILE 'ufo.txt' PAD LRECL = 400;
- INPUT num time $ zone $ descrip $ 12-46;
- PROC FREQ;
- TABLES zone time * zone;
- TITLE 'UFO Reports';
- RUN;
65Data file ufo.txt
- 85845 AM R white with long tail
- 85776 PM C bright white light
- 85873 PM R white and red flashing lights
- 85879 AM R little green men boomboxes
- 86790 PM C throbbing purple light
- 86823 PM R giant toads
66Visualizing Data: PROC PLOT
- PROC PLOT;
- PLOT vertical * horizontal; (plotting symbols A, B, C, ...)
- PLOT height * weight;
- PLOT height * weight = name; (first letter of name as the symbol)
- BY age_group; (separate plot for each level)
- PLOT csales * action = 'C'
- bsales * action = 'B' / OVERLAY;
67Basic SAS Procedures
68Examining the Distribution: PROC UNIVARIATE
- N number of observations
- Mean arithmetic mean
- Std dev standard deviation
- Skewness skewness
- Kurtosis kurtosis
- Max highest value in the data set
- Min lowest value in the data set
- Median median
- Mode most frequently occurring value
69PROC UNIVARIATE
- Option PLOT produces stem-and-leaf, box, and normal probability plots
- Option NORMAL produces tests of normality
- DATA class;
- INFILE 'scores.txt';
- INPUT score @@;
- PROC UNIVARIATE PLOT NORMAL;
- VAR score;
- TITLE;
- RUN;
70Data file scores.txt
- 56 78 84 73 90 44 76 87 92 75
- 85 67 90 84 74 64 73 78 69 56
- 87 73 100 54 81 78 69 64 73 65
75Examining Correlation: PROC CORR
- DATA class;
- INFILE 'study.txt';
- INPUT score study exercise @@;
- PROC CORR; (Option SPEARMAN)
- VAR study exercise; (appear across the top of the table)
- WITH score; (appears down the side)
- RUN;
76Data file study.txt
- 56 6 2 78 7 4 84 5 5 73 4 2 90 6 4
- 44 2 0 76 5 1 87 6 3 92 6 7 75 8 3
- 85 7 1 67 4 2 90 5 5 84 6 5 74 5 2
- 64 4 1 73 8 5 78 5 2 69 6 1 56 4 1
- 87 8 4 73 8 3 100 5 6 54 8 0 81 5 4
- 78 5 2 69 4 1 64 7 1 73 7 3 65 4 4
78Simple Regression Analysis
- PROC REG;
- MODEL dependent = independent;
- PLOT vertical * horizontal = 'symbol';
-
- Options with PLOT (use the name followed by a period)
- P predicted values
- R residuals
- STUDENT studentized residuals
- U95 upper bound of a 95% c.i. for individual prediction
- L95 lower bound of a 95% c.i. for individual prediction
79Simple Regression Analysis
- DATA hits;
- INFILE 'baseball.txt';
- INPUT ht dist @@;
- PROC REG;
- MODEL dist = ht;
- PLOT dist * ht P. * ht = 'p' / OVERLAY;
- RUN;
80Data file baseball.txt
- 50 110 49 135 48 129 53 150 48 124
- 50 143 51 126 45 107 53 146 50 154
- 47 136 52 144 47 124 50 133 50 128
- 50 118 48 135 47 129 45 126 48 118
- 45 121 53 142 46 122 47 119 51 134
- 49 130 46 132 51 144 50 132 50 131
84Analysis of Variance
- PROC ANOVA;
- CLASS variable-list;
- MODEL dependent = effects;
- MEANS effects / options;
- For one-way ANOVA, the effect is the classification variable
- Options with MEANS
- BON (Bonferroni's t tests)
- DUNCAN (multiple range)
- T (pairwise t tests)
- SCHEFFE (multiple comparison)
- TUKEY (studentized range)
85Analysis of Variance
- DATA soft;
- INFILE 'softball.txt';
- INPUT team $ height @@;
- PROC ANOVA;
- CLASS team;
- MODEL height = team;
- MEANS team;
- RUN;
86Data file softball.txt
- red 55 red 48 red 53 red 47 red 51 red 43
- red 45 red 46 red 55 red 54 red 45 red 52
- blue 46 blue 56 blue 48 blue 47 blue 54 blue 52
- blue 49 blue 51 blue 45 blue 48 blue 55 blue 47
- gray 55 gray 45 gray 47 gray 56 gray 49 gray 53
- gray 48 gray 53 gray 51 gray 52 gray 48 gray 47
- pink 53 pink 53 pink 58 pink 56 pink 50 pink 55
- pink 59 pink 57 pink 49 pink 55 pink 56 pink 57
- gold 53 gold 55 gold 48 gold 45 gold 47 gold 56
- gold 55 gold 46 gold 47 gold 53 gold 51 gold 50
89End of Session 2