FINAL SAS PAPER - PowerPoint PPT Presentation

Provided by: neilhoward

1
SAS Essentials
How SAS Thinks
Neil.Howard_at_amgen.com
2
The DATA step is your most powerful programming
tool. So understand and use it well.
Socrates
3
Objectives
  • understand DATA step
    • processes
    • internals
    • defaults

4
  • compilation of DATA step source code
  • execution of resultant machine code

5
  • compile and execute phases of
    • INPUT (non-SAS data)
    • SET

6
Compile Time Activities
  • syntax scan
  • source code translation to machine language
  • definition of input and output files

7
Compile Time Activities
  • input buffer
  • LPDV (logical program data vector)
  • data set descriptor information

8
Creation of LPDV
  • Variables added in the order seen by the compiler
    during parsing and interpretation of source
    statements
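A minimal sketch of this rule, assuming any Base SAS session: the compiler meets Z, then A, then M, so that is their position in the LPDV and in the output data set. The data set name `order_demo` is hypothetical.

```sas
/* order_demo is a hypothetical data set name */
data order_demo;
  z = 3;   /* first reference: Z becomes variable 1 in the LPDV */
  a = 1;   /* variable 2 */
  m = 2;   /* variable 3 */
run;

/* VARNUM lists variables by position instead of alphabetically */
proc contents data=order_demo varnum;
run;
```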

9
Compile Time Statements
  • location critical
    • BY
    • WHERE
    • ARRAY
    • ATTRIB
    • FORMAT
    • INFORMAT
    • LENGTH
  • location irrelevant
    • DROP
    • KEEP
    • LABEL
    • RENAME
    • RETAIN
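The difference can be checked with a short sketch (the data set names `too_late` and `in_time` and the variable `dummy` are hypothetical). A LENGTH statement placed after the first reference arrives too late, while a DROP statement takes effect wherever it sits.

```sas
/* too_late, in_time and dummy are hypothetical names */
data too_late;
  sex = 'F';          /* first reference: SEX is compiled as $1 */
  length sex $ 8;     /* too late: SAS logs a WARNING and SEX stays $1 */
  drop dummy;         /* location irrelevant: DUMMY is still dropped */
  dummy = 1;
run;

data in_time;
  length sex $ 8;     /* LENGTH before first use: SEX is compiled as $8 */
  sex = 'F';
run;
```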

10
Retained Variables
  • all SAS special variables
    • _N_
    • _ERROR_
  • all vars in RETAIN statement
  • all vars from SET, MERGE, or UPDATE
  • accumulator vars in SUM statement(s)
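A small sketch of two of these retained kinds, a RETAIN variable and a SUM statement accumulator. The data set `running`, its variables and its values are made up for illustration.

```sas
/* running, first_x and total are hypothetical names */
data running;
  input x;
  retain first_x;               /* RETAIN: not reset to missing between iterations */
  if _n_ = 1 then first_x = x;  /* set once, then carried forward */
  total + x;                    /* SUM statement: retained, starts at 0 */
  cards;
5
2
3
;
run;
```

FIRST_X stays 5 on every observation, and TOTAL accumulates to 10 by the last one.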

11
Variables Not Retained
  • variables read by an INPUT statement
  • user-defined variables (other than SUM statement
    accumulators)
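By contrast, a variable created by an ordinary assignment starts every iteration as missing. This hypothetical step shows the effect; the PUT _ALL_ at the top writes the freshly initialized LPDV to the log.

```sas
/* not_retained and count are hypothetical names */
data not_retained;
  put _all_;              /* top of each iteration: X and COUNT are missing */
  input x;                /* INPUT variable: set to missing at each initialization */
  count = count + 1;      /* assignment, not SUM: missing + 1 stays missing */
  cards;
1
2
;
run;
```

Replacing the assignment with the SUM statement `count + 1;` would make COUNT retained and give 1, 2.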

12
Type and Length of Variables
  • determined at compile time
    • by first reference to the compiler (in the DATA step)
  • Numerics
    • length is 8 during DATA step processing
    • length is an output property
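A sketch of "first reference wins" for character length (the data set `first_ref` is hypothetical). The first assignment the compiler sees is the two-character literal 'No', so STATUS is compiled as $2, whatever happens at execution time.

```sas
/* first_ref and status are hypothetical names */
data first_ref;
  do x = 1 to 2;                  /* numeric X: length 8 in the LPDV */
    if x = 1 then status = 'No';  /* compiler sees 'No' first: STATUS is $2 */
    else status = 'Yes';          /* stored as 'Ye': truncated to $2 */
    output;
  end;
run;
```

A `length status $ 3;` statement placed before the IF would avoid the truncation.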

13
INPUT statement
  • reading non-SAS data

14
Compile Loop and LPDV
    data a;
      put _all_;   *write LPDV to LOG;
      input idnum
            diagdate mmddyy8.
            sex $
            rx_grp $10.;
      time = intck('year', diagdate, today());
      put _all_;   *write LPDV to LOG;
      cards;
    1 09-09-52 F placebo
    2 11-15-64 M 300 mg.
    3 04-07-48 F 600 mg.
    ;
    run;

15
input buffer

logical program data vector
    Variable  idnum    diagdate  sex   rx_grp  time
    Type      numeric  numeric   char  char    numeric
    Length    8        8         8     10      8

Building descriptor portion of SAS data set
16
logical program data vector
    Variable  idnum  diagdate  sex   rx_grp  time  _ERROR_  _N_
    DKR flag  keep   keep      keep  keep    keep  drop     drop
Drop/keep/rename
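The DKR flags control only what is written out; the variable itself stays in the LPDV for the whole step. A sketch with made-up data (`trial`, `flagged`, `female` and `years` are illustrative, hypothetical names):

```sas
/* trial, flagged, female and years are hypothetical names */
data trial;
  input idnum sex $ time;
  cards;
1 F 48
2 M 36
;
run;

data flagged (rename=(time=years));
  set trial;
  drop sex;                 /* sets SEX's output flag to "drop" */
  female = (sex = 'F');     /* SEX is still in the LPDV, so it can be used */
run;
```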
17
Execution of a DATA Step
18
Execution of a DATA Step
    _N_ = 1
      -> Initialization of LPDV
      -> read input file
      -> end of file?
           Y -> termination -> next step
           N -> process statements in step
             -> implied output
             -> back to Initialization of LPDV (_N_ + 1)
19
DATA Step Execution
  • Implied read/write loop, stopped by
    • no more data to read
    • explicit STOP
    • no input data
    • some execution time errors
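Two of these stopping conditions in a short sketch. It assumes the SASHELP library that ships with Base SAS is available (SASHELP.CLASS has 19 observations); `first_two` is a hypothetical name.

```sas
/* No input data: the step executes exactly once, then stops */
data _null_;
  put 'No input data: the step runs once. ' _n_=;
run;

/* first_two is a hypothetical name; SASHELP.CLASS is SAS sample data */
data first_two;
  set sashelp.class;
  if _n_ > 2 then stop;     /* explicit STOP before the implied OUTPUT */
run;
```

FIRST_TWO ends up with two observations, because the third iteration stops before its implied OUTPUT.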

20
Execution Time Activities
  • execute initialize-to-missing (ITM)
  • read from input source
  • modify data using user-controlled statements
  • supply values of variables to LPDV
  • output observation to SAS data set

21
Initialization
  • _N_ set to loop count
  • _ERROR_ set to 0
  • user variables set to missing
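A sketch of _ERROR_ being set by invalid data and reset to 0 at the next initialization. The data set `errs`, the variable `err_flag` and the record `abc` are made up.

```sas
/* errs and err_flag are hypothetical names */
data errs;
  input x;              /* 'abc' is invalid numeric data: X missing, _ERROR_ = 1 */
  err_flag = _error_;   /* copy it: _ERROR_ itself is never written out */
  cards;
1
abc
3
;
run;
```

ERR_FLAG comes out as 0, 1, 0, and the log shows an invalid-data NOTE for the second record.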

22
Execution Loop - raw data
    data a;
      put _all_;   *write LPDV to LOG;
      input idnum
            diagdate mmddyy8.
            sex $
            rx_grp $10.;
      time = intck('year', diagdate, today());
      put _all_;   *write LPDV to LOG;
      cards;
    1 09-09-52 F placebo
    2 11-15-64 M 300 mg.
    3 04-07-48 F 600 mg.
    ;
    run;

    proc contents;
    run;

    proc print;
    run;

23
LPDV
    IDNUM  DIAGDATE  SEX  RX_GRP   TIME  _N_
    .      .                       .     1
    1      -2670     F    placebo  48    1
    .      .                       .     2
    2      1780      M    300 mg.  36    2
    .      .                       .     3
    3      -4286     F    600 mg.  52    3
    .      .                       .     4
    (over all executions of the DATA step)

24
     2    data a;
     3      put _all_;   *write LPDV to LOG;
     4      input idnum
     5            diagdate mmddyy8.
     6            sex $
     7            rx_grp $10.;
     8      time = intck('year', diagdate, today());
     9      put _all_;   *write LPDV to LOG;
    10      cards;

    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=1
    IDNUM=1 DIAGDATE=-2670 SEX=F RX_GRP=placebo TIME=49 _ERROR_=0 _N_=1
    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=2
    IDNUM=2 DIAGDATE=1780 SEX=M RX_GRP=300 mg. TIME=37 _ERROR_=0 _N_=2
    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=3
    IDNUM=3 DIAGDATE=-4286 SEX=F RX_GRP=600 mg. TIME=53 _ERROR_=0 _N_=3
    IDNUM=. DIAGDATE=. SEX= RX_GRP= TIME=. _ERROR_=0 _N_=4
    NOTE: The data set WORK.A has 3 observations and 5 variables.
    NOTE: The DATA statement used 0.59 seconds.

    14    run;
    15
    16    proc contents; run;

    NOTE: The PROCEDURE CONTENTS used 0.39 seconds.
25
    Data Set Name: WORK.A                             Observations:         3
    Member Type:   DATA                               Variables:            5
    Engine:        V612                               Indexes:              0
    Created:       11:18 Saturday, January 20, 2001   Observation Length:   42
    Last Modified: 11:18 Saturday, January 20, 2001   Deleted Observations: 0
    Protection:                                       Compressed:           NO
    Data Set Type:                                    Sorted:               NO
    Label:

    -----Engine/Host Dependent Information-----

    Data Set Page Size:        8192
    Number of Data Set Pages:  1
    File Format:               607
    First Data Page:           1
    Max Obs per Page:          194
    Obs in First Data Page:    3

    -----Alphabetic List of Variables and Attributes-----

     #    Variable    Type    Len    Pos
     2    DIAGDATE    Num       8      8
     1    IDNUM       Num       8      0
     4    RX_GRP      Char     10     24
     3    SEX         Char      8     16
     5    TIME        Num       8     34
26
PROC PRINT
    IDNUM  DIAGDATE  SEX  RX_GRP   TIME
    1      -2670     F    placebo  48
    2      1780      M    300 mg.  36
    3      -4286     F    600 mg.  52
27
SET statement
  • reading existing SAS data

28
DATA Step Compile
  • no input buffer
  • compiler reads descriptor portion of input SAS
    data set to build the LPDV
  • the new data set gets the same variables and attributes,
    plus any new variables
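Because the descriptor is read at compile time, a SET statement that never executes still shapes the LPDV, and its NOBS= variable already holds the observation count. A sketch, again assuming SASHELP.CLASS (19 observations) is available; `class_info`, `n_obs` and `total` are hypothetical names.

```sas
/* class_info, n_obs and total are hypothetical names */
data class_info (keep=total);
  if 0 then set sashelp.class nobs=n_obs;  /* never executes, but the compiler
                                              still reads the descriptor */
  total = n_obs;                           /* NOBS= is filled in at compile time */
  output;
  stop;                                    /* no SET executes, so stop explicitly */
run;
```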

29
SET
  • determine which SAS data set is to be read
  • identify next observation to be read
  • copy variable values to LPDV

30
Execution Loop - SAS data
    data sas_a;
      put _all_;
      set a;
      tot_rec + 1;
      put _all_;
    run;

31
Building LPDV from descriptor portion of old SAS data set

logical program data vector
    Variable  idnum    diagdate  sex   rx_grp  time     tot_rec
    Type      numeric  numeric   char  char    numeric  numeric
    Length    8        8         8     10      8        8

Building descriptor portion of new SAS data set
32
LPDV
    IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC  _N_
    .      .                       .     0        1
    1      -2670     F    placebo  48    1        1
    1      -2670     F    placebo  48    1        2
    2      1780      M    300 mg.  36    2        2
    2      1780      M    300 mg.  36    2        3
    3      -4286     F    600 mg.  52    3        3
    3      -4286     F    600 mg.  52    3        4
    (over all executions of the DATA step)

33
LOG
    idnum=. diagdate=. sex= rx_grp= time=. tot_rec=0 _ERROR_=0 _N_=1
    idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=1
    idnum=1 diagdate=-2670 sex=F rx_grp=placebo time=48 tot_rec=1 _ERROR_=0 _N_=2
    idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=2
    idnum=2 diagdate=1780 sex=M rx_grp=300 mg. time=36 tot_rec=2 _ERROR_=0 _N_=3
    idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=3
    idnum=3 diagdate=-4286 sex=F rx_grp=600 mg. time=52 tot_rec=3 _ERROR_=0 _N_=4

34
PROC PRINT
    IDNUM  DIAGDATE  SEX  RX_GRP   TIME  TOT_REC
    1      -2670     F    placebo  48    1
    2      1780      M    300 mg.  36    2
    3      -4286     F    600 mg.  52    3

35
Logic of a MERGE
  • compile
  • execute

36
    data left;
      input ID X Y;
      cards;
    1 88 99
    2 66 77
    3 44 55
    ;
    run;

    data right;
      input ID A $ B $;
      cards;
    1 A14 B32
    3 A53 B11
    ;
    run;
37
    proc sort data=left;
      by ID;
    run;

    proc sort data=right;
      by ID;
    run;

    data both;
      merge left (in=inleft) right (in=inright);
      by ID;
    run;
38
logical program data vector, first iteration: MATCH
    ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
    1   88  99  A14  B32  1       1        1    0
39
logical program data vector, second iteration: NO MATCH
    ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
    2   66  77            1       0        2    0
40
logical program data vector, third iteration: MATCH
    ID  X   Y   A    B    INLEFT  INRIGHT  _N_  _ERROR_
    3   44  55  A53  B11  1       1        3    0
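The IN= flags are ordinary, temporary LPDV variables, so they can drive logic inside the step. A sketch that keeps only the MATCH iterations, rebuilding LEFT and RIGHT from the slides (reading A and B as character is an assumption based on the A14/B32 values); `matches` is a hypothetical name.

```sas
data left;
  input ID X Y;
  cards;
1 88 99
2 66 77
3 44 55
;
run;

data right;
  input ID A $ B $;      /* character A and B: an assumption */
  cards;
1 A14 B32
3 A53 B11
;
run;

/* matches is a hypothetical name */
data matches;
  merge left (in=inleft) right (in=inright);
  by ID;                   /* both inputs were entered in ID order */
  if inleft and inright;   /* subsetting IF: keep MATCH iterations only */
run;
```

MATCHES holds IDs 1 and 3; INLEFT and INRIGHT are not written to it.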
41
Let's try this again
    data left;
      input ID X Y;
      cards;
    1 88 99
    2 66 77
    3 44 55
    ;
    run;

    data right;
      input ID A $ B $;
      cards;
    1 A14 B32
    3 A53 B11
    ;
    run;
42
    proc sort data=left;
      by ID;
    run;

    proc sort data=right;
      by ID;
    run;

    data both;
      merge left (in=inleft) right (in=inright);
      /* by ID;  omitted: this is a one-to-one merge */
    run;
43
logical program data vector, first iteration: 1:1 MATCH
    ID  X   Y   A    B    _N_  _ERROR_
    1   88  99  A14  B32  1    0
ID 1 OVERWRITTEN by 1: value came from data set right
44
logical program data vector, second iteration: 1:1 MATCH
    ID  X   Y   A    B    _N_  _ERROR_
    2   66  77  A53  B11  2    0
ID 2 OVERWRITTEN by 3: value came from data set right
45
logical program data vector, third iteration: 1:1 NO MATCH
    ID  X   Y   A    B    _N_  _ERROR_
    3   44  55            3    0
A, B MISSING: no values from right
46
Output SAS data set
    ID  X   Y   A    B
    1   88  99  A14  B32
    3   66  77  A53  B11
    3   44  55
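One way to catch an accidental one-to-one merge is the MERGENOBY= system option, which makes SAS issue a WARNING (or, with MERGENOBY=ERROR, an error) when MERGE has no BY statement. A sketch that reproduces the output above under that option; it rebuilds LEFT and RIGHT from the earlier slides, again reading A and B as character (an assumption).

```sas
options mergenoby=warn;    /* WARNING in the log whenever MERGE has no BY */

data left;
  input ID X Y;
  cards;
1 88 99
2 66 77
3 44 55
;
run;

data right;
  input ID A $ B $;        /* character A and B: an assumption */
  cards;
1 A14 B32
3 A53 B11
;
run;

data both;
  merge left right;        /* no BY: observations paired by position */
run;
```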
47
DATA Step Conclusions
  • Understanding internals and default activities
    allows you to
  • make informed coding decisions
  • write flexible and efficient code
  • debug and test effectively
  • interpret results readily

48
Remember
  • We have discussed DEFAULTS
  • As soon as you add options, statements, features,
    etc., the default actions change: TEST them!
  • You can use these same tools to track what's
    happening.