Title: FINAL SAS PAPER
SAS Essentials
How SAS Thinks
Neil.Howard_at_amgen.com
The DATA step is your most powerful programming
tool. So understand and use it well.
Socrates
Objectives
- understand DATA step
  - processes
  - internals
  - defaults
- compilation of DATA step source code
- execution of resultant machine code
- compile and execute phases of
  - INPUT (non SAS data)
  - SET
Compile Time Activities
- syntax scan
- source code translation to machine language
- definition of input and output files
- creation of
  - input buffer
  - LPDV (logical program data vector)
  - data set descriptor information
Creation of LPDV
- Variables are added in the order seen by the compiler during parsing and interpretation of source statements
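The ordering rule can be checked with a small sketch. The data set name order_demo and its variables are hypothetical, and the VARNUM option assumes a SAS release whose PROC CONTENTS accepts it.

```sas
/* Sketch: variables enter the LPDV in the order the compiler first meets them. */
data order_demo ;               /* hypothetical data set name */
   length last_seen $ 3 ;       /* first reference: LPDV position 1 */
   x = 1 ;                      /* position 2 */
   y = 2 ;                      /* position 3 */
   last_seen = 'abc' ;          /* later use does not move the variable */
   put _all_ ;                  /* writes the LPDV to the LOG in creation order */
run ;

proc contents data=order_demo varnum ;   /* list variables by creation order */
run ;
```

Moving the LENGTH statement below the two assignments should move last_seen to the third position.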
Compile Time Statements
- location critical
  - BY
  - WHERE
  - ARRAY
  - ATTRIB
  - FORMAT
  - INFORMAT
  - LENGTH
- location irrelevant
  - DROP
  - KEEP
  - LABEL
  - RENAME
  - RETAIN
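A minimal sketch of why location matters for LENGTH but not for DROP. The data set length_demo and the variables a, b, and c are hypothetical names used only for illustration.

```sas
data length_demo ;
   length a $ 10 ;     /* before first use: a is character, length 10 */
   a = 'placebo' ;
   b = 'abc' ;         /* first reference fixes b as character, length 3 */
   length b $ 10 ;     /* too late: the LOG warns that the length is already set */
   drop c ;            /* location irrelevant: works even though c appears later */
   c = 1 ;
run ;

proc contents data=length_demo ;   /* a has length 10, b has length 3, c is absent */
run ;
```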
Retained Variables
- all SAS special variables
  - _N_
  - _ERROR_
- all vars in RETAIN statement
- all vars from SET, MERGE, or UPDATE
- accumulator vars in SUM statement(s)
Variables Not Retained
- variables from INPUT statement
- user defined variables (other than SUM statement accumulators)
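The difference between retained and non-retained variables shows up when the LPDV is written at the top of each iteration. This is a sketch only; the data set retain_demo and the variables kept, total, and plain are hypothetical.

```sas
data retain_demo ;
   retain kept 0 ;          /* RETAIN: value survives from one iteration to the next */
   put _all_ ;              /* top of loop: x and plain are missing, kept and total are not */
   input x ;
   kept  = kept + x ;       /* retained through the RETAIN statement */
   total + x ;              /* SUM statement: retained, starts at 0 */
   plain = sum(plain, x) ;  /* ordinary assignment: plain is reset to missing each time */
cards ;
1
2
3
run ;
```

In the resulting data set, kept and total accumulate across observations while plain simply equals x.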
Type and Length of Variables
- determined at compile time
- by first reference to the compiler (in the DATA step)
- Numerics
  - length is 8 during DATA step processing
  - length is an output property
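One way to see that numeric length is an output property is to compare a value inside the step with the same value after it has been stored. This sketch assumes a host where 3 is a valid numeric length; the data set num_len and the variable n are hypothetical.

```sas
data num_len ;
   length n 3 ;          /* applies only when the observation is written */
   n = 1/3 ;
   put n= best16. ;      /* inside the step the LPDV holds n in 8 bytes */
run ;

data _null_ ;
   set num_len ;
   put n= best16. ;      /* read back from 3 stored bytes: precision has been lost */
run ;
```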
INPUT statement
Compile Loop and LPDV
data a ;
   put _all_ ;                     /* write LPDV to LOG */
   input idnum
         diagdate mmddyy8.
         sex $
         rx_grp $ 10. ;
   time = intck ('year', diagdate, today() ) ;
   put _all_ ;                     /* write LPDV to LOG */
cards ;
1 09-09-52 F placebo
2 11-15-64 M 300 mg.
3 04-07-48 F 600 mg.
run ;
input buffer
logical program data vector
idnum     diagdate  sex   rx_grp  time
numeric   numeric   char  char    numeric
8         8         8     10      8
Building descriptor portion of SAS data set
logical program data vector
       idnum  diagdate  sex   rx_grp  time  _N_   _ERROR_
DKR    keep   keep      keep  keep    keep  drop  drop
DKR = Drop/keep/rename
Execution of a DATA Step
- initialization of LPDV (_N_ = 1 on the first iteration)
- read input file
- end of file?
  - Y: termination, then next step
  - N: process statements in step, implied output, return to initialization of LPDV
DATA Step Execution
- Implied read/write loop, stopped by
  - no more data to read
  - explicit STOP
  - no input data
  - some execution time errors
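Two of these stopping conditions can be sketched briefly. The data sets once, src, and first_two are hypothetical names; src stands in for any existing SAS data set.

```sas
data once ;                 /* no input data: the step executes exactly once */
   x = 1 ;
run ;

data src ;                  /* hypothetical 3-observation data set */
   do id = 1 to 3 ;
      output ;
   end ;
run ;

data first_two ;
   do p = 1 to 2 ;
      set src point=p ;     /* direct access never reaches end of file */
      output ;
   end ;
   stop ;                   /* explicit STOP: without it the step would loop forever */
run ;
```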
Execution Time Activities
- execute initialize-to-missing (ITM)
- read from input source
- modify data using user-controlled statements
- supply values of variables to LPDV
- output observation to SAS data set
Initialization
- _N_ set to loop count
- _ERROR_ set to 0
- user variables set to missing
Execution Loop - raw data
data a ;
   put _all_ ;                     /* write LPDV to LOG */
   input idnum
         diagdate mmddyy8.
         sex $
         rx_grp $ 10. ;
   time = intck ('year', diagdate, today() ) ;
   put _all_ ;                     /* write LPDV to LOG */
cards ;
1 09-09-52 F placebo
2 11-15-64 M 300 mg.
3 04-07-48 F 600 mg.
run ;
proc contents ; run ;
proc print ; run ;
LPDV
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  _N_
.      .                       .     1
1      -2670     F    placebo  48    1
.      .                       .     2
2      1780      M    300 mg.  36    2
.      .                       .     3
3      -4286     F    600 mg.  52    3
.      .                       .     4
(over all executions of DATA step)
2    data a ;
3       put _all_ ;                     /* write LPDV to LOG */
4       input idnum
5             diagdate mmddyy8.
6             sex $
7             rx_grp $ 10. ;
8       time = intck ('year', diagdate, today() ) ;
9       put _all_ ;                     /* write LPDV to LOG */
10   cards ;

IDNUM=. DIAGDATE=. SEX=  RX_GRP=  TIME=. _ERROR_=0 _N_=1
IDNUM=1 DIAGDATE=-2670 SEX=F RX_GRP=placebo TIME=49 _ERROR_=0 _N_=1
IDNUM=. DIAGDATE=. SEX=  RX_GRP=  TIME=. _ERROR_=0 _N_=2
IDNUM=2 DIAGDATE=1780 SEX=M RX_GRP=300 mg. TIME=37 _ERROR_=0 _N_=2
IDNUM=. DIAGDATE=. SEX=  RX_GRP=  TIME=. _ERROR_=0 _N_=3
IDNUM=3 DIAGDATE=-4286 SEX=F RX_GRP=600 mg. TIME=53 _ERROR_=0 _N_=3
IDNUM=. DIAGDATE=. SEX=  RX_GRP=  TIME=. _ERROR_=0 _N_=4
NOTE: The data set WORK.A has 3 observations and 5 variables.
NOTE: The DATA statement used 0.59 seconds.

14   run ;
15
16   proc contents ; run ;
NOTE: The PROCEDURE CONTENTS used 0.39 seconds.
Data Set Name: WORK.A                              Observations:         3
Member Type:   DATA                                Variables:            5
Engine:        V612                                Indexes:              0
Created:       11:18 Saturday, January 20, 2001    Observation Length:   42
Last Modified: 11:18 Saturday, January 20, 2001    Deleted Observations: 0
Protection:                                        Compressed:           NO
Data Set Type:                                     Sorted:               NO
Label:

-----Engine/Host Dependent Information-----
Data Set Page Size:        8192
Number of Data Set Pages:  1
File Format:               607
First Data Page:           1
Max Obs per Page:          194
Obs in First Data Page:    3

-----Alphabetic List of Variables and Attributes-----
#  Variable  Type  Len  Pos
2  DIAGDATE  Num   8    8
1  IDNUM     Num   8    0
4  RX_GRP    Char  10   24
3  SEX       Char  8    16
5  TIME      Num   8    34
PROC PRINT
IDNUM  DIAGDATE  SEX  RX_GRP   TIME
1      -2670     F    placebo  48
2      1780      M    300 mg.  36
3      -4286     F    600 mg.  52
SET statement
- reading existing SAS data
DATA Step Compile
- no input buffer
- compiler reads descriptor portion of input SAS data set to build the LPDV
- returns same variables/attributes, including new variables
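That the compiler reads the descriptor before any observation is read can be illustrated with the NOBS= option of SET, whose value is assigned during compilation. This is a sketch; the data set src and the variable n are hypothetical names.

```sas
data src ;                            /* hypothetical 3-observation data set */
   do id = 1 to 3 ;
      output ;
   end ;
run ;

data _null_ ;
   put 'src has ' n 'observations' ;  /* n is already filled in from the descriptor */
   stop ;                             /* end the step before SET ever executes */
   set src nobs=n ;                   /* compiled, never executed */
run ;
```

The same idea applies to variable attributes: a SET statement that never executes still adds the variables of src to the LPDV.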
SET
- determine which SAS data set to be read
- identify next observation to be read
- copy variable values to LPDV
Execution Loop - SAS data
data sas_a ;
   put _all_ ;
   set a ;
   tot_rec + 1 ;
   put _all_ ;
run ;
Building LPDV from descriptor portion of old SAS data set
logical program data vector
idnum     diagdate  sex   rx_grp  time     tot_rec
numeric   numeric   char  char    numeric  numeric
8         8         8     10      8        8
Building descriptor portion of new SAS data set
LPDV
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC  _N_
.      .                       .     0        1
1      -2670     F    placebo  48    1        1
1      -2670     F    placebo  48    1        2
2      1780      M    300 mg.  36    2        2
2      1780      M    300 mg.  36    2        3
3      -4286     F    600 mg.  52    3        3
3      -4286     F    600 mg.  52    3        4
(over all executions of DATA step)
LOG
idnum=. diagdate=. sex=  rx_grp=  time=. tot_rec=0 _ERROR_=0 _N_=1
idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=1
idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=2
idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=2
idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=3
idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=3
idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=4
PROC PRINT
IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC
1      -2670     F    placebo  48    1
2      1780      M    300 mg.  36    2
3      -4286     F    600 mg.  52    3
Logic of a MERGE
data left ;
   input ID X Y ;
cards ;
1 88 99
2 66 77
3 44 55
;
data right ;
   input ID A $ B $ ;
cards ;
1 A14 B32
3 A53 B11
;
proc sort data=left ; by ID ; run ;
proc sort data=right ; by ID ; run ;
data both ;
   merge left (in=inleft) right (in=inright) ;
   by ID ;
run ;
logical program data vector, first iteration: MATCH
ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
1   88  99  A14  B32  1       1        1    0
logical program data vector, second iteration: NO MATCH
ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
2   66  77            1       0        2    0
logical program data vector, third iteration: MATCH
ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
3   44  55  A53  B11  1       1        3    0
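The IN= flags traced above can drive which observations are written. The sketch below assumes the sorted data sets left and right from the preceding steps; the output data set name left_only is hypothetical.

```sas
data both left_only ;
   merge left (in=inleft) right (in=inright) ;
   by ID ;
   if inleft and inright then output both ;   /* ID 1 and ID 3: MATCH */
   else if inleft then output left_only ;     /* ID 2: NO MATCH */
run ;
```

INLEFT and INRIGHT exist only in the LPDV; they are not written to either output data set.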
Let's try this again
data left ;
   input ID X Y ;
cards ;
1 88 99
2 66 77
3 44 55
;
data right ;
   input ID A $ B $ ;
cards ;
1 A14 B32
3 A53 B11
;
proc sort data=left ; by ID ; run ;
proc sort data=right ; by ID ; run ;
data both ;
   merge left (in=inleft) right (in=inright) ;
   /* by ID ; */                   /* no BY statement: one-to-one merge */
run ;
logical program data vector, first iteration: 1:1 MATCH
ID  X   Y   A    B    _N_  _ERROR_
1   88  99  A14  B32  1    0
ID overwritten with 1: OVERWRITTEN value came from data set right
logical program data vector, second iteration: 1:1 MATCH
ID  X   Y   A    B    _N_  _ERROR_
2   66  77  A53  B11  2    0
ID overwritten with 3: OVERWRITTEN value came from data set right
logical program data vector, third iteration: 1:1 NO MATCH
ID  X   Y   A    B    _N_  _ERROR_
3   44  55            3    0
A and B MISSING: no values from right
Output SAS data set
ID  X   Y   A    B
1   88  99  A14  B32
3   66  77  A53  B11
3   44  55
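A forgotten BY statement produces this result silently. One possible guard, assuming a SAS release that supports the MERGENOBY= system option, is sketched below with the same left and right data sets.

```sas
options mergenoby=error ;     /* NOWARN is the default; WARN and ERROR are the other settings */

data both ;
   merge left right ;         /* no BY statement: the step now fails with an ERROR in the LOG */
run ;

options mergenoby=nowarn ;    /* restore the default behavior */
```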
DATA Step Conclusions
- Understanding internals and default activities allows you to
  - make informed coding decisions
  - write flexible and efficient code
  - debug and test effectively
  - interpret results readily
Remember
- We have discussed DEFAULTS
- As soon as you add options, statements, features, etc., the default actions change. TEST them!
- You can use these same tools to track what's happening.