Title: Use the UPDATE statement to:
1- Use the UPDATE statement to
- update a master dataset with new transactions (e.g., a bank account updated regularly with deposits and withdrawals). It is not used a lot, but when you need it, it's exactly what you need.
- The general form is
    DATA master_data_set;
       UPDATE master_data_set transaction_data_set;
       BY variable_list;
2- Notes on the UPDATE statement
- only two datasets can be specified (master and transactions)
- both datasets must be SORTed by their common variables
- the values of the BY variables must be unique in the master dataset (e.g., only one account per account number in the master bank dataset; there could be many transactions per account, though)
- missing values in the transaction dataset don't overwrite existing values in the master dataset.
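The last note is easiest to see with a tiny case. The following is a minimal sketch, not the book's example: the dataset names master and trans, the variables, and the inline rows are all made up for illustration. Account 102 has two transactions, and some transaction values are missing (entered as periods).

```sas
* Hypothetical master file with one observation per account;
DATA master;
   INPUT Account Name $ Balance;
   DATALINES;
101 Smith 500
102 Jones 250
;
RUN;

* Hypothetical transactions with some values missing;
DATA trans;
   INPUT Account Name $ Balance;
   DATALINES;
101 . 575
102 Jonas .
102 . 300
;
RUN;

* Both datasets are already in Account order so no PROC SORT is needed here;
DATA master;
   UPDATE master trans;
   BY Account;
RUN;

PROC PRINT DATA = master;
   TITLE 'Master After UPDATE';
RUN;
```

Under these assumptions the updated master should still hold one observation per account: account 101 keeps the name Smith with the new balance 575, and account 102 ends up with the name Jonas and the balance 300, because the missing values in the transactions leave the existing values alone.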
3- Go over the example in section 6.8 on pages 194-195
    LIBNAME perm 'c:\MySASLib';
    DATA perm.patientmaster;
       INFILE fill in here;
       INPUT Account LastName $ 8-16 Address $ 17-34
          BirthDate MMDDYY10. Sex $ InsCode $ 48-50 @52
          LastUpdate MMDDYY10.;
    RUN;
    /* Second Program */
    LIBNAME perm 'c:\MySASLib';
    DATA transactions;
       INFILE fill in here;
       INPUT Account LastName $ 8-16 Address $ 17-34
          BirthDate MMDDYY10. Sex $ InsCode $ 48-50 @52
          LastUpdate MMDDYY10.;
    PROC SORT DATA = transactions;
       BY Account;
    * Update patient data with transactions;
    DATA perm.patientmaster;
       UPDATE perm.patientmaster transactions;
       BY Account;
    PROC PRINT DATA = 'c:\MySASLib\patientmaster';
       FORMAT BirthDate LastUpdate MMDDYY10.;
       TITLE 'Admissions Data';
    RUN;
4- There are many SAS dataset OPTIONS. The list in section 6.9 is not comprehensive, but gives a flavor of what's possible
- RENAME = (oldvariable_name = newvariable_name)
  - this changes a variable's name
- FIRSTOBS = n
  - this tells SAS the observation number on which to begin reading
- OBS = n
  - this tells SAS the observation number on which to stop reading
- IN = new_variable_name
  - this tells SAS to create a new variable (temporarily) to track whether an observation comes from that dataset (value = 1) or not (value = 0). Let's try the example in section 6.10
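Before the IN= example, here is a minimal sketch of the first three options used together. The dataset scores, its variables, and its five rows are hypothetical, made up only to show where the options go (in parentheses right after the dataset name).

```sas
* Hypothetical data with five observations and two variables;
DATA scores;
   INPUT ID Score;
   DATALINES;
1 90
2 85
3 70
4 65
5 99
;
RUN;

* Read observations 2 through 4 only and rename Score to Points on the way in;
DATA subset;
   SET scores (FIRSTOBS = 2 OBS = 4 RENAME = (Score = Points));
RUN;

PROC PRINT DATA = subset;
   TITLE 'Observations 2 to 4 with Score Renamed';
RUN;
```

Note that OBS= gives the number of the last observation to read, not a count, so this sketch should keep three observations (IDs 2, 3, and 4).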
5- Here's the customer data
    101 Murphy's Sports   115 Main St.
    102 Sun N Ski         2106 Newberry Ave.
    103 Sports Outfitters 19 Cary Way
    104 Cramer Johnson    4106 Arlington Blvd.
    105 Sports Savers     2708 Broadway
Here's the orders data
    102 562.01
    104 254.98
    104 1642.00
    101 3497.56
    102 385.30
6- Here's the SAS code to find the customers who didn't place any orders
    DATA customer;
       INFILE fill-in TRUNCOVER;
       INPUT CustomerNumber Name $ 5-21 Address $ 23-42;
    DATA orders;
       INFILE fill-in;   /* why no TRUNCOVER? */
       INPUT CustomerNumber Total;
    PROC SORT DATA = orders;
       BY CustomerNumber;
    * Combine the data sets using the IN= option;
    DATA noorders;
       MERGE customer orders (IN = Recent);
       BY CustomerNumber;
       IF Recent = 0;
    PROC PRINT DATA = noorders;
       TITLE 'Customers with No Orders in the Third Quarter';
    RUN;
7- Now modify the code so you can see the effect of the IN= option
- take out the subsetting IF statement
- create a new variable whose values are those of the variable Recent (why do I have to do this?)
- PRINT the entire dataset, including this new variable made from Recent, to see its effect.
- We may use the OUTPUT statement to create more than one dataset, e.g., DATA X Y Z; INPUT ...;
- This will create 3 identical datasets (named WORK.X, WORK.Y, and WORK.Z). The next example uses IF-THEN statements to create different datasets with the OUTPUT statement.
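Before moving on to that example, the identical-copies case can be checked with a minimal sketch. The variables ID and Value and the three inline rows are made up for illustration; only the dataset names X, Y, and Z come from the slide.

```sas
* Three output datasets named on one DATA statement;
DATA X Y Z;
   INPUT ID Value;
   DATALINES;
1 10
2 20
3 30
;
RUN;

* Each of WORK.X and WORK.Y and WORK.Z should hold the same three observations;
PROC PRINT DATA = X;
   TITLE 'WORK.X';
PROC PRINT DATA = Y;
   TITLE 'WORK.Y';
PROC PRINT DATA = Z;
   TITLE 'WORK.Z';
RUN;
```

With no OUTPUT statement in the step, SAS writes each observation to every dataset named on the DATA statement, which is why the three printouts should match.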
8- /* Here's the zoo data with feeding time as the last column. Create two datasets using the OUTPUT statement, one for each of the feeding times, morning and afternoon - be sure to put the animals in both datasets if they are fed at both times */
    bears     Mammalia E2 both
    elephants Mammalia W3 am
    flamingos Aves     W1 pm
    frogs     Amphibia S2 pm
    kangaroos Mammalia N4 am
    lions     Mammalia W6 pm
    snakes    Reptilia S1 pm
    tigers    Mammalia W9 both
    zebras    Mammalia W2 am
9- The SAS code
    DATA morning afternoon;
       INFILE fill-in here;
       INPUT Animal $ 1-9 Class $ 11-18 Enclosure $ FeedTime $;
       IF FeedTime = 'am' THEN OUTPUT morning;
       ELSE IF FeedTime = 'pm' THEN OUTPUT afternoon;
       ELSE IF FeedTime = 'both' THEN OUTPUT;
    PROC PRINT DATA = morning;
       TITLE 'Animals with Morning Feedings';
    PROC PRINT DATA = afternoon;
       TITLE 'Animals with Afternoon Feedings';
    RUN;
- We may also use OUTPUT statements to generate our own data and to create datasets from raw data formatted in unusual ways (see section 6.12 and below)
10- Generating data and reading several observations from one line
    dm log 'clear'; dm output 'clear';
    options ls=80;
    DATA generate;
       DO x = 1 to 10;
          y = x**2;
          z = sqrt(x);
          OUTPUT;
       END;
    PROC PRINT DATA = generate;
    run; quit;
    /* Put this into a raw datafile */
    Jan Varsity 56723 Downtown 69831 Super-6 70025
    Feb Varsity 62137 Downtown 43901 Super-6 81534
    Mar Varsity 49982 Downtown 55783 Super-6 69800
    * now read it in properly;
    DATA theaters;
       INFILE fill-in;
       INPUT Month $ Location $ Tickets @;
       OUTPUT;
       INPUT Location $ Tickets @;
       OUTPUT;
       INPUT Location $ Tickets;
       OUTPUT;
    PROC PRINT DATA = theaters;
       TITLE 'Ticket Sales';
    RUN;
11- /* We may also convert observations to variables and vice versa */
    PROC TRANSPOSE DATA = old OUT = new;
       BY var_list;
       ID variable;
       VAR var_list;
/* Go over the example on p. 194 - here's the data: team name, player number, type of data, value of the salary or b.a. */
    Garlics 10 salary 43000
    Peaches  8 salary 38000
    Garlics 21 salary 51000
    Peaches 10 salary 47500
    Garlics 10 batavg .281
    Peaches  8 batavg .252
    Garlics 21 batavg .265
    Peaches 10 batavg .301
12- /* Here's the SAS code */
    DATA baseball;
       INFILE fill-in here;
       INPUT Team $ Player Type $ Entry;
    PROC SORT DATA = baseball;
       BY Team Player;
    PROC PRINT DATA = baseball;
       TITLE 'Baseball Data After Sorting and Before Transposing';
    * Transpose data so salary and batavg are vars;
    PROC TRANSPOSE DATA = baseball OUT = flipped;
       BY Team Player;
       ID Type;
       VAR Entry;
    PROC PRINT DATA = flipped;
       TITLE 'Baseball Data After Transposing';
    RUN;
13- BY variables are included in the new dataset, not transposed. There will be one observation for each BY level per variable transposed.
- The ID variable's values become the names of the variables in the newly transposed dataset. The ID variable's values must be unique within the BY values.
- The VAR statement names the variables whose values are going to be transposed.
- SAS creates a new variable (_NAME_) whose value is the name of the VAR variable.
- See the previous example and the graphic at the top of p. 194.
14- There are several variables that SAS creates automatically when you create a new dataset, but because they are temporary, you never see them. A short list is given on page 196:
- _N_: the number of times SAS has looped through the DATA step
- _ERROR_: 0 or 1, depending on whether there is a data error for that particular observation
- FIRST.variable and LAST.variable are created when you use a BY statement in the DATA step. FIRST.variable has the value 1 when SAS is processing the first occurrence of a new value of the BY variable and 0 otherwise. LAST.variable is similar: it has the value 1 when SAS is processing the last occurrence of a value of the BY variable and 0 otherwise.
- See the example program on pages 196-197.
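Because these variables are temporary, one way to look at them is to copy them into ordinary variables and print those. The following is a minimal sketch; the dataset demo, the variables Group and Value, and the inline rows are made up for illustration.

```sas
* Hypothetical data already sorted by Group;
DATA demo;
   INPUT Group $ Value;
   DATALINES;
a 1
a 2
b 3
c 4
c 5
;
RUN;

* Automatic variables are not written to the output dataset so copy them into ordinary variables;
DATA flags;
   SET demo;
   BY Group;
   Iteration = _N_;
   IsFirst = FIRST.Group;
   IsLast = LAST.Group;
RUN;

PROC PRINT DATA = flags;
   TITLE 'Automatic Variables Copied into Ordinary Variables';
RUN;
```

In this sketch Iteration should simply count the observations from 1 to 5. The single observation in group b should show 1 for both IsFirst and IsLast, since it is at once the first and the last occurrence of that BY value.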
15- Here's the data (entry number, age group, finishing time). We want to create a new variable whose value is the overall place in which the person finished. Note that the value of Place can be determined from the _N_ variable if the new dataset is being created from a dataset sorted by finishing time. The second part of the program uses the FIRST.AgeGroup automatic variable to pick the top finisher in each age category.
    54 youth 35.5 21 adult 21.6 6 adult 25.8 13 senior 29.0
    38 senior 40.3 19 youth 39.6 3 adult 19.0 25 youth 47.3
    11 adult 21.9 8 senior 54.3 41 adult 43.0 32 youth 38.6
16- The SAS code
    DATA walkers;
       INFILE fill in here;
       INPUT Entry AgeGroup $ Time @@;   /* note: >1 obs per line */
    PROC SORT DATA = walkers;
       BY Time;
    * Create a new variable, Place;
    DATA ordered;
       SET walkers;
       Place = _N_;
    PROC PRINT DATA = ordered;
       TITLE 'Results of Walk';
    PROC SORT DATA = ordered;
       BY AgeGroup Time;
    * Keep first observation in each age group;
    DATA winners;
       SET ordered;
       BY AgeGroup;
       IF FIRST.AgeGroup = 1;
    PROC PRINT DATA = winners;
       TITLE 'Winners in Each Age Group';
    RUN;