Use the SET statement to:

About This Presentation
Title:

Use the SET statement to:

Description:

dm log 'clear'; dm output 'clear'; options ls=80; DATA sales; ... deviation of plant height for the 'pizza hut' and 'shell island' marshes. ... – PowerPoint PPT presentation

Number of Views:39
Avg rating:3.0/5.0
Slides: 15
Provided by: darganfr
Learn more at: http://people.uncw.edu

less

Transcript and Presenter's Notes

Title: Use the SET statement to:


1
  • Use the SET statement to
  • create an exact copy of a SAS dataset
  • modify an existing SAS dataset by creating new
    variables, subsetting (using a subsetting IF)
  • concatenate datasets i.e., stack them one on top
    of the other. Most useful when the datasets
    contain the same variables but different
    observations
  • interleave datasets i.e., keep sorted datasets
    sorted use a BY statement with the SET
    statement
  • the general form is
  • DATA new_data_set
  • SET old_data_set
  • other statements as needed

2
  • make permanent SAS dataset from the data here
  • time of train, of cars, total of riders
  • then create new dataset with new variable
    average number of riders per car see page 171
  • 1010 6 21
  • 1215 10 56
  • 1530 10 25
  • 1130 8 34
  • 1315 8 12
  • 1045 6 13
  • 2030 6 32
  • 2315 6 12

3
  • CONCATENATION OF TWO DATASETS - NOTE WHAT
    HAPPENS WHEN THEY HAVE DIFFERENT VARIABLES
  • DATA southentrance
  • INPUT Entrance PassNumber PartySize Age _at__at_
  • DATALINES
  • S 43 3 27 S 44 3 24 S 45 3 2
  • PROC PRINT DATA southentrance TITLE 'South
    Entrance Data'
  • DATA northentr ance
  • INPUT Entrance PassNumber PartySize Age Lot _at__at_
  • DATALINES
  • N 21 5 41 1 N 87 4 33 3 N 65 2 67 1 N 66 2 7 1
  • PROC PRINT DATA northentrance TITLE 'North
    Entrance Data'
  • DATA both
  • SET southentrance northentrance
  • IF Age . THEN AmountPaid .
  • ELSE IF Age
  • ELSE IF Age
  • ELSE AmountPaid 12

4
  • INTERLEAVE datasets to keep an ordering in
    them
  • The datasets to be INTERLEAVED must be SORTED
    by the interleaving variable
  • DATA S SET southentrance
  • PROC SORT DATAS BY PassNumber run
  • PROC PRINT DATA S TITLE 'South Entrance Data'
  • DATA N SET northentrance
  • PROC SORT DATA N BY PassNumber
  • PROC PRINT DATA N TITLE 'North Entrance
    Data'
  • now put them together in order of passnumber
  • DATA interleave
  • SET N S BY PassNumber
  • PROC PRINT DATA interleave
  • TITLE 'Both Entrances, By Pass Number' RUN
    QUIT
  • Now concatenate the two therapy datasets creating
    a new total number of patients variable
  • first, without sorting then interleave them, BY
    month.

5
One-to-one match merge
  • Use the MERGE statement with the BY statement to
    combine two datasets with a common so-called
    matching variable (the BY-variable) to uniquely
    identify each observation in the datasets...
  • DATA new MERGE old1 old2
  • BY matching_vars
  • NOTE If the two datasets have variables with the
    same names (besides the BY variables) the
    variables from the second dataset will overwrite
    the values of the variables in the first dataset
    with the same names. Also note that all
    observations from both datasets are included in
    the new dataset whether they had a match or not...

6
Try to merge the demographic information about
the patients with the dataset on the visits they
made to the doctors office (ID, age, sex, date
of birth) A001 31 M 05/22/75 A002 43 M
06/15/63 A003 34 F 08/17/72 A004 . F
03/27/69 A005 54 F 02/24/52 A007 49 M
11/01/57 Next is the visits data (ID, visit,
sysBP, diasBP, Weight, date of visit)
7
A001 1 140 85 195 01/05/01 A001 2 138 90
198 10/13/01 A001 3 145 95 200 07/14/02 A002
1 121 75 168 04/14/01 A003 1 118 68 125
08/12/01 A003 2 112 65 123 11/15/01 A004 1
143 86 204 03/30/01 A005 1 132 76 174
02/27/01 A005 2 132 78 175 08/02/01 A005 3
134 78 176 12/04/01 A008 1 126 80 182
05/22/01 Whats the merging variable? Do you
need to sort by that variable before merging?
Can you merge them in descending order?? Try the
next one too - see section 6.4 on p. 176-177.
8
dm log 'clear' dm output 'clear' options
ls80 DATA sales INPUT CodeNum 1-4 PiecesSold
6-7 DATALINES C865 15 K086 9 A536 21 S163
34 K014 1 A206 12 B713 29 DATA
descriptions INPUT CodeNum 1-4 Name 6-14
Description 15-60 DATALINES A206 Mokka
Coffee buttercream in dark chocolate A536 Walnoot
Walnut halves in bed of dark chocolate B713
Frambozen Raspberry marzipan covered in milk
chocolate C865 Vanille Vanilla-flavored rolled
in ground hazelnuts K014 Kroon Milk chocolate
with a mint cream center K086 Koning Hazelnut
paste in dark chocolate M315 Pyramide White with
dark chocolate trimming S163 Orbais Chocolate
cream in dark chocolate
9
  • PROC SORT DATA sales
  • BY CodeNum
  • PROC SORT DATA descriptions
  • BY CodeNum
  • Merge data sets by CodeNum
  • DATA chocolates
  • MERGE sales descriptions BY CodeNum
  • PROC PRINT DATA chocolates
  • TITLE "Today's Chocolate Sales" RUN quit
  • NOTE the SORTing of the two datasets by the
    merging variable... all observations from both
    datasets are included in the merged dataset,
    whether they had a match or not.

10
One-to-many match merge
  • Same MERGE statement is used as in one-to-one ...
    but the result will be different if you have many
    observations in one dataset to be matched with a
    single observation in the other...
  • A good example of this is merging summary
    statistics with original data from which the
    statistics were computed. Go over sections
    6.5-6.6 to see how this is done... Also note the
    output (p. 181) from the PROC PRINT when both a
    BY and an ID statement are used together.
  • Make the two shoes programs work (in section
    6.5 and 6.6).... use the code below and get the
    data from the book datasets and put them into
    files that the INFILE statements can read

11
dm log 'clear' dm output 'clear' options
ls80 DATA shoes INFILE '' INPUT Style
1-15 ExerciseType sales PROC SORT DATA
shoes BY ExerciseType DATA discount
INFILE '' INPUT ExerciseType Adjustment
Perform many-to-one match merge DATA prices
MERGE shoes discount BY ExerciseType NewPriceRO
UND(RegularPrice-(RegularPriceAdjustment),.01) P
ROC PRINT DATA prices TITLE 'Price List for
May' RUN quit
12
  • Now consider an important merging application
    putting summary statistics back in the same
    dataset with the data used to compute the
    summaries. Note this is a many-to-one merge,
    since there are only a few statistics computed
    (one for each by-group value) and they are to be
    merged back with the many individual values used
    to compute those statistics
  • As an example, compute the sum of the sales
    variable for each of the three classes of shoes
    in the previous example, and then merge that
    total back into the original dataset. After this
    you can compute the percentage of the total that
    each group represents, for example (get the
    sales data from page 180)

13
  • PROC MEANS can be used to compute a grand total
    for a variable, but then it can't be merged back
    into the original dataset since you don't have a
    common BY variable in the two datasets Thus we
    must use a different technique (see section 6.7)
    it turns out that the SET statement with an IF
    - THEN will do the job
  • DATA new if _n_1 then SET summary_data_set
  • SET original_data_set
  • The original dataset has many obs., while the
    summary dataset has only one. SAS reads this one
    obs. with the SET statement, but only for the
    first obs (i.e., when _n_ 1) this works
    because SET automatically RETAINS the
    observations from the first read - go over the
    example given in section 6.7 Combining a Grand
    Total with the Original Data

14
  • For Monday In the padgett data, get the mean
    and standard deviation of plant height for the
    pizza hut" and shell island" marshes. Merge
    those statistics back into the original dataset
    and then calculate a z-score for each plant in
    the dataset. Recall that Z (X - mean)/(s.d.) ,
    where X is the original height. There are two
    means and two standard deviations, one for each
    of the two marshes pizza hut" and shell island
    and so you want to be sure that the correct mean
    and s.d. gets used with plants from the
    respective marshes
Write a Comment
User Comments (0)