Use the SET statement to:

About This Presentation

Title:

Use the SET statement to:

Description:

dm log 'clear'; dm output 'clear'; options ls=80; DATA sales; ... deviation of plant height for the 'pizza hut' and 'shell island' marshes. ... – PowerPoint PPT presentation

Number of Views:39

Avg rating:3.0/5.0

Slides: 15

Provided by: darganfr

Learn more at: http://people.uncw.edu

more less

Transcript and Presenter's Notes

Title: Use the SET statement to:

1

Use the SET statement to
create an exact copy of a SAS dataset
modify an existing SAS dataset by creating new
variables, subsetting (using a subsetting IF)
concatenate datasets i.e., stack them one on top
of the other. Most useful when the datasets
contain the same variables but different
observations
interleave datasets i.e., keep sorted datasets
sorted use a BY statement with the SET
statement
the general form is
DATA new_data_set
SET old_data_set
other statements as needed

make permanent SAS dataset from the data here
time of train, of cars, total of riders
then create new dataset with new variable
average number of riders per car see page 171
1010 6 21
1215 10 56
1530 10 25
1130 8 34
1315 8 12
1045 6 13
2030 6 32
2315 6 12

CONCATENATION OF TWO DATASETS - NOTE WHAT
HAPPENS WHEN THEY HAVE DIFFERENT VARIABLES
DATA southentrance
INPUT Entrance PassNumber PartySize Age _at__at_
DATALINES
S 43 3 27 S 44 3 24 S 45 3 2
PROC PRINT DATA southentrance TITLE 'South
Entrance Data'
DATA northentr ance
INPUT Entrance PassNumber PartySize Age Lot _at__at_
DATALINES
N 21 5 41 1 N 87 4 33 3 N 65 2 67 1 N 66 2 7 1
PROC PRINT DATA northentrance TITLE 'North
Entrance Data'
DATA both
SET southentrance northentrance
IF Age . THEN AmountPaid .
ELSE IF Age
ELSE IF Age
ELSE AmountPaid 12

INTERLEAVE datasets to keep an ordering in
them
The datasets to be INTERLEAVED must be SORTED
by the interleaving variable
DATA S SET southentrance
PROC SORT DATAS BY PassNumber run
PROC PRINT DATA S TITLE 'South Entrance Data'
DATA N SET northentrance
PROC SORT DATA N BY PassNumber
PROC PRINT DATA N TITLE 'North Entrance
Data'
now put them together in order of passnumber
DATA interleave
SET N S BY PassNumber
PROC PRINT DATA interleave
TITLE 'Both Entrances, By Pass Number' RUN
QUIT
Now concatenate the two therapy datasets creating
a new total number of patients variable
first, without sorting then interleave them, BY
month.

5
One-to-one match merge

Use the MERGE statement with the BY statement to
combine two datasets with a common so-called
matching variable (the BY-variable) to uniquely
identify each observation in the datasets...
DATA new MERGE old1 old2
BY matching_vars
NOTE If the two datasets have variables with the
same names (besides the BY variables) the
variables from the second dataset will overwrite
the values of the variables in the first dataset
with the same names. Also note that all
observations from both datasets are included in
the new dataset whether they had a match or not...

6
Try to merge the demographic information about
the patients with the dataset on the visits they
made to the doctors office (ID, age, sex, date
of birth) A001 31 M 05/22/75 A002 43 M
06/15/63 A003 34 F 08/17/72 A004 . F
03/27/69 A005 54 F 02/24/52 A007 49 M
11/01/57 Next is the visits data (ID, visit,
sysBP, diasBP, Weight, date of visit)
7
A001 1 140 85 195 01/05/01 A001 2 138 90
198 10/13/01 A001 3 145 95 200 07/14/02 A002
1 121 75 168 04/14/01 A003 1 118 68 125
08/12/01 A003 2 112 65 123 11/15/01 A004 1
143 86 204 03/30/01 A005 1 132 76 174
02/27/01 A005 2 132 78 175 08/02/01 A005 3
134 78 176 12/04/01 A008 1 126 80 182
05/22/01 Whats the merging variable? Do you
need to sort by that variable before merging?
Can you merge them in descending order?? Try the
next one too - see section 6.4 on p. 176-177.
8
dm log 'clear' dm output 'clear' options
ls80 DATA sales INPUT CodeNum 1-4 PiecesSold
6-7 DATALINES C865 15 K086 9 A536 21 S163
34 K014 1 A206 12 B713 29 DATA
descriptions INPUT CodeNum 1-4 Name 6-14
Description 15-60 DATALINES A206 Mokka
Coffee buttercream in dark chocolate A536 Walnoot
Walnut halves in bed of dark chocolate B713
Frambozen Raspberry marzipan covered in milk
chocolate C865 Vanille Vanilla-flavored rolled
in ground hazelnuts K014 Kroon Milk chocolate
with a mint cream center K086 Koning Hazelnut
paste in dark chocolate M315 Pyramide White with
dark chocolate trimming S163 Orbais Chocolate
cream in dark chocolate
9

PROC SORT DATA sales
BY CodeNum
PROC SORT DATA descriptions
BY CodeNum
Merge data sets by CodeNum
DATA chocolates
MERGE sales descriptions BY CodeNum
PROC PRINT DATA chocolates
TITLE "Today's Chocolate Sales" RUN quit
NOTE the SORTing of the two datasets by the
merging variable... all observations from both
datasets are included in the merged dataset,
whether they had a match or not.

10
One-to-many match merge

Same MERGE statement is used as in one-to-one ...
but the result will be different if you have many
observations in one dataset to be matched with a
single observation in the other...
A good example of this is merging summary
statistics with original data from which the
statistics were computed. Go over sections
6.5-6.6 to see how this is done... Also note the
output (p. 181) from the PROC PRINT when both a
BY and an ID statement are used together.
Make the two shoes programs work (in section
6.5 and 6.6).... use the code below and get the
data from the book datasets and put them into
files that the INFILE statements can read

11
dm log 'clear' dm output 'clear' options
ls80 DATA shoes INFILE '' INPUT Style
1-15 ExerciseType sales PROC SORT DATA
shoes BY ExerciseType DATA discount
INFILE '' INPUT ExerciseType Adjustment
Perform many-to-one match merge DATA prices
MERGE shoes discount BY ExerciseType NewPriceRO
UND(RegularPrice-(RegularPriceAdjustment),.01) P
ROC PRINT DATA prices TITLE 'Price List for
May' RUN quit
12

Now consider an important merging application
putting summary statistics back in the same
dataset with the data used to compute the
summaries. Note this is a many-to-one merge,
since there are only a few statistics computed
(one for each by-group value) and they are to be
merged back with the many individual values used
to compute those statistics
As an example, compute the sum of the sales
variable for each of the three classes of shoes
in the previous example, and then merge that
total back into the original dataset. After this
you can compute the percentage of the total that
each group represents, for example (get the
sales data from page 180)

PROC MEANS can be used to compute a grand total
for a variable, but then it can't be merged back
into the original dataset since you don't have a
common BY variable in the two datasets Thus we
must use a different technique (see section 6.7)
it turns out that the SET statement with an IF
- THEN will do the job
DATA new if _n_1 then SET summary_data_set
SET original_data_set
The original dataset has many obs., while the
summary dataset has only one. SAS reads this one
obs. with the SET statement, but only for the
first obs (i.e., when _n_ 1) this works
because SET automatically RETAINS the
observations from the first read - go over the
example given in section 6.7 Combining a Grand
Total with the Original Data

For Monday In the padgett data, get the mean
and standard deviation of plant height for the
pizza hut" and shell island" marshes. Merge
those statistics back into the original dataset
and then calculate a z-score for each plant in
the dataset. Recall that Z (X - mean)/(s.d.) ,
where X is the original height. There are two
means and two standard deviations, one for each
of the two marshes pizza hut" and shell island
and so you want to be sure that the correct mean
and s.d. gets used with plants from the
respective marshes