Title: Use the SET statement to:
1- Use the SET statement to
- create an exact copy of a SAS dataset
- modify an existing SAS dataset by creating new
variables, subsetting (using a subsetting IF) - concatenate datasets i.e., stack them one on top
of the other. Most useful when the datasets
contain the same variables but different
observations - interleave datasets i.e., keep sorted datasets
sorted use a BY statement with the SET
statement - the general form is
- DATA new_data_set
- SET old_data_set
- other statements as needed
2- make permanent SAS dataset from the data here
- time of train, of cars, total of riders
- then create new dataset with new variable
average number of riders per car see page 171
- 1010 6 21
- 1215 10 56
- 1530 10 25
- 1130 8 34
- 1315 8 12
- 1045 6 13
- 2030 6 32
- 2315 6 12
3- CONCATENATION OF TWO DATASETS - NOTE WHAT
HAPPENS WHEN THEY HAVE DIFFERENT VARIABLES - DATA southentrance
- INPUT Entrance PassNumber PartySize Age _at__at_
- DATALINES
- S 43 3 27 S 44 3 24 S 45 3 2
-
- PROC PRINT DATA southentrance TITLE 'South
Entrance Data' - DATA northentr ance
- INPUT Entrance PassNumber PartySize Age Lot _at__at_
- DATALINES
- N 21 5 41 1 N 87 4 33 3 N 65 2 67 1 N 66 2 7 1
-
- PROC PRINT DATA northentrance TITLE 'North
Entrance Data' - DATA both
- SET southentrance northentrance
- IF Age . THEN AmountPaid .
- ELSE IF Age
- ELSE IF Age
- ELSE AmountPaid 12
4- INTERLEAVE datasets to keep an ordering in
them - The datasets to be INTERLEAVED must be SORTED
by the interleaving variable - DATA S SET southentrance
- PROC SORT DATAS BY PassNumber run
- PROC PRINT DATA S TITLE 'South Entrance Data'
- DATA N SET northentrance
- PROC SORT DATA N BY PassNumber
- PROC PRINT DATA N TITLE 'North Entrance
Data' - now put them together in order of passnumber
- DATA interleave
- SET N S BY PassNumber
- PROC PRINT DATA interleave
- TITLE 'Both Entrances, By Pass Number' RUN
QUIT - Now concatenate the two therapy datasets creating
a new total number of patients variable - first, without sorting then interleave them, BY
month.
5One-to-one match merge
- Use the MERGE statement with the BY statement to
combine two datasets with a common so-called
matching variable (the BY-variable) to uniquely
identify each observation in the datasets... - DATA new MERGE old1 old2
- BY matching_vars
- NOTE If the two datasets have variables with the
same names (besides the BY variables) the
variables from the second dataset will overwrite
the values of the variables in the first dataset
with the same names. Also note that all
observations from both datasets are included in
the new dataset whether they had a match or not...
6Try to merge the demographic information about
the patients with the dataset on the visits they
made to the doctors office (ID, age, sex, date
of birth) A001 31 M 05/22/75 A002 43 M
06/15/63 A003 34 F 08/17/72 A004 . F
03/27/69 A005 54 F 02/24/52 A007 49 M
11/01/57 Next is the visits data (ID, visit,
sysBP, diasBP, Weight, date of visit)
7A001 1 140 85 195 01/05/01 A001 2 138 90
198 10/13/01 A001 3 145 95 200 07/14/02 A002
1 121 75 168 04/14/01 A003 1 118 68 125
08/12/01 A003 2 112 65 123 11/15/01 A004 1
143 86 204 03/30/01 A005 1 132 76 174
02/27/01 A005 2 132 78 175 08/02/01 A005 3
134 78 176 12/04/01 A008 1 126 80 182
05/22/01 Whats the merging variable? Do you
need to sort by that variable before merging?
Can you merge them in descending order?? Try the
next one too - see section 6.4 on p. 176-177.
8dm log 'clear' dm output 'clear' options
ls80 DATA sales INPUT CodeNum 1-4 PiecesSold
6-7 DATALINES C865 15 K086 9 A536 21 S163
34 K014 1 A206 12 B713 29 DATA
descriptions INPUT CodeNum 1-4 Name 6-14
Description 15-60 DATALINES A206 Mokka
Coffee buttercream in dark chocolate A536 Walnoot
Walnut halves in bed of dark chocolate B713
Frambozen Raspberry marzipan covered in milk
chocolate C865 Vanille Vanilla-flavored rolled
in ground hazelnuts K014 Kroon Milk chocolate
with a mint cream center K086 Koning Hazelnut
paste in dark chocolate M315 Pyramide White with
dark chocolate trimming S163 Orbais Chocolate
cream in dark chocolate
9- PROC SORT DATA sales
- BY CodeNum
- PROC SORT DATA descriptions
- BY CodeNum
- Merge data sets by CodeNum
- DATA chocolates
- MERGE sales descriptions BY CodeNum
- PROC PRINT DATA chocolates
- TITLE "Today's Chocolate Sales" RUN quit
- NOTE the SORTing of the two datasets by the
merging variable... all observations from both
datasets are included in the merged dataset,
whether they had a match or not.
10One-to-many match merge
- Same MERGE statement is used as in one-to-one ...
but the result will be different if you have many
observations in one dataset to be matched with a
single observation in the other... - A good example of this is merging summary
statistics with original data from which the
statistics were computed. Go over sections
6.5-6.6 to see how this is done... Also note the
output (p. 181) from the PROC PRINT when both a
BY and an ID statement are used together. - Make the two shoes programs work (in section
6.5 and 6.6).... use the code below and get the
data from the book datasets and put them into
files that the INFILE statements can read
11dm log 'clear' dm output 'clear' options
ls80 DATA shoes INFILE '' INPUT Style
1-15 ExerciseType sales PROC SORT DATA
shoes BY ExerciseType DATA discount
INFILE '' INPUT ExerciseType Adjustment
Perform many-to-one match merge DATA prices
MERGE shoes discount BY ExerciseType NewPriceRO
UND(RegularPrice-(RegularPriceAdjustment),.01) P
ROC PRINT DATA prices TITLE 'Price List for
May' RUN quit
12- Now consider an important merging application
putting summary statistics back in the same
dataset with the data used to compute the
summaries. Note this is a many-to-one merge,
since there are only a few statistics computed
(one for each by-group value) and they are to be
merged back with the many individual values used
to compute those statistics - As an example, compute the sum of the sales
variable for each of the three classes of shoes
in the previous example, and then merge that
total back into the original dataset. After this
you can compute the percentage of the total that
each group represents, for example (get the
sales data from page 180)
13- PROC MEANS can be used to compute a grand total
for a variable, but then it can't be merged back
into the original dataset since you don't have a
common BY variable in the two datasets Thus we
must use a different technique (see section 6.7)
it turns out that the SET statement with an IF
- THEN will do the job - DATA new if _n_1 then SET summary_data_set
- SET original_data_set
- The original dataset has many obs., while the
summary dataset has only one. SAS reads this one
obs. with the SET statement, but only for the
first obs (i.e., when _n_ 1) this works
because SET automatically RETAINS the
observations from the first read - go over the
example given in section 6.7 Combining a Grand
Total with the Original Data
14- For Monday In the padgett data, get the mean
and standard deviation of plant height for the
pizza hut" and shell island" marshes. Merge
those statistics back into the original dataset
and then calculate a z-score for each plant in
the dataset. Recall that Z (X - mean)/(s.d.) ,
where X is the original height. There are two
means and two standard deviations, one for each
of the two marshes pizza hut" and shell island
and so you want to be sure that the correct mean
and s.d. gets used with plants from the
respective marshes