Title: The Information Delivery Process
1The Information Delivery Process
Manage
Organize
Exploit
2Turning Data Into Information
DATA Step
Data
Data
PROC Steps
Data
SAS Data Sets
PROC Steps
Information
Information
3Turning Data Into Information
Process of delivering meaningful information
80 Data-related
- Access
- Scrub
- Transform
- Manage
- Store and retrieve
20 Analysis
4The Raw Data
Partial fixed-column raw data file
1
1
2
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
5Browsing the Data Values
- Listing of Flight Crew
Employees - Obs empid lastname firstname jobcode
salary - 1 0031 GOLDENBERG DESIREE PILOT1
50221.62 2 0040
WILLIAMS ARLENE M. FLTAT1 23666.12
3 0071 PERRY
ROBERT A. FLTAT1 21957.71
4 0082 MCGWIER-WATTS CHRISTINA
PILOT3 96387.39
5 0091 SCOTT HARVEY F. FLTAT2
32278.40 6 0106
THACKER DAVID S. FLTAT1 24161.14
7 0275 GRAHAM
DEBORAH S. FLTAT2 32024.93
8 0286 DREWRY SUSAN
PILOT1 55377.00
9 0309 HORTON THOMAS L. FLTAT1
23705.12 10 0334
DOWN EDWARD PILOT1 56584.87
11 0347 CHERVENY
BRENDA B. FLTAT2 38563.45
12 0355 BELL THOMAS B.
PILOT1 59803.16
13 0366 GLENN MARTHA S. PILOT3
120202.38
14 0385 HOLMAN GREGORY A. PILOT2
93001.09 15 0390
NOE BARBARA E. FLTAT2 37101.32
6Reading a Raw Data File
Raw Data File
SAS Data Set
7Reading Raw Data Files
Raw Data File
SAS Data Set
DATA Step
empid lastname firstname 0031 GOLDENBERG
DESIREE0040 WILLIAMS ARLENE M. 0071 PERRY
ROBERT A. 0082 MCGWIER-WATTS CHRISTINA
data . . . infile . . . input . . .run
8Reading Raw Data Files
- In order to create a SAS data set from a raw data
file, you must - start a DATA step and name the SAS data set
being created (DATA statement) - identify the location of the raw data file to
read (INFILE statement) - describe how to read the data fields from the
raw data file (INPUT statement).
9Creating a SAS Data Set with the DATA Statement
- General form of the DATA statement
- This DATA statement creates a SAS data set called
WORK.EMPDATA - data work.empdata
DATA SAS-data-set(s)
10Pointing to a Raw Data File with the INFILE
Statement
- General form of the INFILE statement
- Examples
- OS/390 infile edc.prog1.employee
- UNIX infile /user/prog1/employee.dat
- WIN infile C\workshop\winsas\
prog1\employee.dat
INFILE filename ltoptionsgt
11Reading Raw Data Using Column Input
- General form of column input
-
- To read raw data values with column input,
- 1. name the SAS variable you want to create
- 2. use a dollar sign, , if the SAS variable is
character - 3. specify the starting column, a dash, and the
ending column of the raw data field.
INPUT variable startcol-endcol
12Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
- 0031GOLDENBERG DESIREE PILOT1 50221.62
- input empid 1-4
- lastname 5-17
13Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
- 0031GOLDENBERG DESIREE PILOT1 50221.62
- input empid 1-4
- lastname 5-17
- firstname 18-30
14Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
- 0031GOLDENBERG DESIREE PILOT1 50221.62
- input empid 1-4
- lastname 5-17
- firstname 18-30
- jobcode 31-36
15Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
- 0031GOLDENBERG DESIREE PILOT1 50221.62
- input empid 1-4
- lastname 5-17
- firstname 18-30
- jobcode 31-36
- salary 37-45
16Reading Raw Data Using Column Input
2
1
1
2
3
3
4
4
1---5----0----5----0----5----0----5----0----5
- 0031GOLDENBERG DESIREE PILOT1 50221.62
- input empid 1-4
- lastname 5-17
- firstname 18-30
- jobcode 31-36
- salary 37-45
17Business Scenario
- International Airlines is preparing to review its
flight crew. The immediate goal is to read the
Excel spreadsheet and create a SAS data set.
18What is the Import Wizard?
- A point-and-click graphical interface that
enables you to create a SAS data set from several
types of external files including - dBASE file (.DBF)
- Excel 97 Spreadsheet (.XLS)
- Microsoft Access Table
- Delimited file (.)
- Comma Separated Values (.CSV)
19The Raw Data
- The aircraft data is stored in a fixed-column
raw data file
Partial data
last maintenance date
aircraft ID
aircraft model
date in service
20Using Formatted Input
Raw Data File
JetCruise LF5200 030003 04/05/1994
03/11/2001 JetCruise LF5200 030005 02/15/1999
07/05/2001 JetCruise LF5200 030008 03/06/1996
04/02/2002
DATA Step
- The raw data file will be read with formatted
input.
data sas-data-set-name infile raw-filename
input pointer-control variable
informat-namerun
SAS Data Set
21What is a SAS Format?
- A format is an instruction that the SAS System
uses to write data values. - SAS formats have the following form
- ltgtformatltwgt.ltdgt
22SAS Formats
- Selected SAS formats
- w.d standard numeric format
- w. standard character format
- COMMAw.d commas in a number 12,234.21
- DOLLARw.d dollar signs and commas in a
- number 12,234.41
23SAS Formats
24Using Formatted Input
- General form of the INPUT statement with
formatted input - Pointer control
- _at_n moves the pointer to column n.
- n moves the pointer n positions.
INPUT pointer-control column informat ...
25Using Formatted Input
- Formatted input can be used to read non-standard
data values by - moving the input pointer to the starting position
of the field - specifying a column name
- specifying an informat.
- An informat specifies the width of the input
field and how to read the data values that are
stored in the field.
26Using Formatted Input
- General form of an informat
- indicates a character format.
- informat-name names the informat.
- w is an optional field width.
- . is the required delimiter.
- d optionally, specifies a decimal for
numeric informats.
informat-namew.d
27Selected Informats
- 7. or 7.0 reads seven columns of numeric
data. - 7.2 reads seven columns of numeric data and
inserts a decimal point in the data value. - 5. reads five columns of character data and
removes leading blanks. - CHAR5. reads five columns of character data
and preserves leading blanks.
28Selected Informats
COMMA7. reads seven columns of numeric data and
removes selected nonnumeric characters, such as
dollar signs and commas. PD4. reads four
columns of packed decimal data. MMDDYY10. reads
dates of the form 01/20/2000.
29Working with Date Values
- The raw data file contains date values. These
date values will be read with the MMDDYY10.
informat - Jetcruise LF5200 030003 04/05/1990 3/11/2001
- Jetcruise LF5200 030005 02/15/1990 7/05/2001
- Jetcruise LF5200 030008 03/06/1990 4/02/2002
1
1
2
3
2
3
4
4
1---5----0----5----0----5----0----5----0----5
30Converting Dates to SAS Date Values
- SAS uses date informats to read and convert dates
to SAS date values. For example, - Stored Value Informat Converted Value
- 10/29/1999 MMDDYY10. 14546
- 29OCT1999 DATE9. 14546
- 29/10/1999 DDMMYY10. 14546
31SAS Formats
- Selected SAS date formats
- MMDDYYw. 101692 (MMDDYY6.)
- 10/16/92 (MMDDYY8.)
- 10/16/1992 (MMDDYY10.)
- DATEw. 16OCT92 (DATE7.)
- 16OCT1992 (DATE9.)
32Locating and Browsing the Raw Data File
- Browse the raw data file and determine the column
layout and type
Partial raw data file
last maintenance date
aircraft ID
aircraft model
date in service
33Starting the DATA Step
- Use the DATA statement to begin the DATA step and
name the SAS data set - data work.aircraft
- other SAS statements
- run
- Use the INFILE statement to identify the input
raw data file - data work.aircraft infile aircraft.dat
other SAS statementsrun
34Writing the INPUT Statement
- Use the INPUT statement and pointer control to
read the record starting with the first column.
Read the value with the 16. informat and assign
it to the variable MODEL. - JetCruise LF5200 030003 04/05/1994 03/11/2001
- data work.aircraft infile aircraft.dat
input _at_1 model 16. other SAS statementsrun
35Writing the INPUT Statement
- Use the INPUT statement and pointer control to
read the record starting with column 18. Read
the value with the 6. informat and assign the
value to AIRCRAFTID. -
- JetCruise LF5200 030003 04/05/1994 03/11/2001
- data work.aircraft infile aircraft.dat
input _at_1 model 16. _at_18 aircraftid 6. other
SAS statementsrun
36Writing the INPUT Statement
- Use the INPUT statement and pointer control to
read the record starting with column 25. Read
the value with the MMDDYY10. informat and assign
the value to INSERVICE. - JetCruise LF5200 030003 04/05/1994 03/11/2001
- data work.aircraft infile aircraft.dat
input _at_1 model 16. _at_18 aircraftid 6.
_at_25 inservice mmddyy10. other SAS
statementsrun -
37Writing the INPUT Statement
- Use the INPUT statement and pointer control to
read the record starting with column 36. Read
the value with the MMDDYY10. informat and assign
the value to LASTMAINT. - JetCruise LF5200 030003 04/05/1994 03/11/2001
- data work.aircraft infile aircraft.dat
input _at_1 model 16. _at_18 aircraftid 6.
_at_25 inservice mmddyy10. _at_36 lastmaint
mmddyy10.run
38SAS Syntax Rules
SAS statements are free-format.
- They can begin and end in any column.
- One or more blanks or special characters can be
used to separate words. - A single statement can span multiple lines.
- Several statements can be on the same line.
Unconventional spacing
data work.mech_pilot infile 'c\coursedata\empli
st.dat' input lastname 1-20 firstname
21-30 jobtitle 36-43 salary 54-59 run proc
means datawork.mech_pilot n mean class
jobtitle var salaryrun
39SAS Syntax Rules
SAS statements are free-format.
- They can begin and end in any column.
- One or more blanks or special characters can be
used to separate words. - A single statement can span multiple lines.
- Several statements can be on the same line.
Unconventional spacing
data work.mech_pilot infile 'c\coursedata\empli
st.dat' input lastname 1-20 firstname
21-30 jobtitle 36-43 salary 54-59 run proc
means datawork.mech_pilot n mean class
jobtitle var salaryrun
40SAS Syntax Rules
- SAS statements
- usually begin with an identifying keyword
- always end with a semicolon.
data work.mech_pilot infile
'c\coursedata\emplist.dat' input lastname
1-20 firstname 21-30 jobtitle 36-43
salary 54-59 run proc print datawork.mech_pilo
t run proc means datawork.mech_pilot n mean
class jobtitle var salary run
41Adding a New Variable
Create a new variable by extracting the
four-digit year values from the SAS date values.
Aircraft Service Records
42Using an Assignment Statement
- An assignment statement evaluates an expression
and assigns the resulting value to a variable. - General syntax of an assignment statement
variableexpression
43Using Operators
- Selected operators for basic arithmetic
calculations in an assignment statement
44Using SAS Functions
- A SAS function is a routine that returns a value
that is determined from specified arguments. - General syntax of a SAS function
function-name(argument1,argument2, . . .)
45Using SAS Functions
- SAS functions
- perform arithmetic operations
- compute statistics (for example, mean)
- manipulate SAS dates and process character values
- perform many other tasks.
46Creating a Vertical Bar Chart
- Use the GCHART procedure and the VBAR statement
to create a vertical bar chart. - proc gchart datawork.aircraft vbar
yrbeg_service title 'Aircraft In Service, by
Year'run
47Reading a Subset of Raw Data
- Use the DATA step that was written earlier. Add
a subsetting IF statement to process only the
subset in which the value of AGE is at least 15. - data work.aircraft infile aircraft.dat
input _at_1 model 16. _at_18 aircraftid 6.
_at_25 inservice mmddyy10. _at_36 lastmaint
mmddyy10. yrbeg_serviceyear(inservice)
ageyear(today())-yrbeg_service if
agegt15run
48What Is a SAS Data Library?
49What Is a SAS Data Library?
- Regardless of which host operating system you
use, you identify SAS data libraries by assigning
each one a libref.
libref
50What Is a SAS Data Library?
- By default, SAS creates two SAS data libraries
- a temporary library called WORK
- a permanent library called SASUSER.
WORK
SASUSER
51SAS Data Libraries
You can think of a SAS data library as a drawer
in a filing cabinet and a SAS data set as one of
the file folders in the drawer.
52SAS Data Libraries
When you invoke SAS, you automatically have
access to a temporary and a permanent SAS data
library.
- SASUSER - permanent library
You can create and access your own permanent
libraries.
53Reading a SAS Data Set
- Input data set Output data set
- SET statement DATA statement
Temporary SAS data set
Temporary SAS data set
Permanent SAS data set
Permanent SAS data set
54Two-level SAS Filenames
Every SAS file has a two-level name.
libref.filename
- The first name (libref) refers to the library.
The data set MECH_PILOT is a SAS file in the WORK
library.
- The second name (filename) refers to the file in
the library.
55Browsing the Data Portion
- The PRINT procedure displays the data portion of
a SAS data set. - By default, PROC PRINT displays
- all observations
- all variables
- OBS column on the left-hand side.
56Browsing the Data Portion
- General form of the PRINT procedure
-
- Example
- proc print datawork.empdatarun
PROC PRINT DATASAS-data-setRUN
57Objectives
- Generate list reports using the PRINT procedure.
- Display selected variables in a list report using
the VAR statement. - Display selected observations in a list report
using the WHERE statement. - Sort the observations in a SAS data set using the
SORT procedure.
58Creating a List Report
PROC Step
proc print datawork.empdata var empid salary
jobcode run
59Formatting Data Values
proc print datawork.empsort format salary
dollar11.2 run
60Creating a Frequency Report
PROC Step
61Creating a Frequency Report
- The FREQ procedure displays frequency counts of
the data values in a SAS data set. - General form of a simple PROC FREQ step
-
- PROC FREQ DATASAS-data-set
- RUN
- Example
- proc freq datawork.empsort
- run
62Creating a One-Way Frequency Report
- Only variables listed on the TABLES statement are
included in the frequency counts. These are
typically variables that have a limited number of
distinct values. - General form of a PROC FREQ step
-
- PROC FREQ DATASAS-data-set
- TABLES SAS-variables
- RUN
63Calculating Job Code Frequencies
- Job Code Frequency Report
- The FREQ Procedure
- Job Code
-
Cumulative Cumulative - Job_Code Frequency Percent
Frequency Percent - ------------------------------------------------
----------------------- - Flight Attendant 199 71.33
199 71.33 - Pilot 80 28.67
279 100.00
64Calculating Salary Frequencies
- Salary Frequency Report
- The FREQ Procedure
- Annual Salary
-
- Cumulative
Cumulative - Salary Frequency Percent Frequency
Percent - ------------------------------------------------
--------------------- - Low to 25,000 41 14.70
41 14.70 - 25,000 to 50,000 172 61.65
213 76.34 - 50,000 and up 66 23.66
279 100.00
65Calculating Job Code/Salary Frequencies
- The FREQ Procedure
- Table of Job_Code by
Salary - Job_Code(Job Code) Salary(Annual
Salary) - Frequency
- Percent
- Row Pct
- Col Pct Low to 25,000
50,000 Total - 25,000 to
50,0and up - 00
- ---------------------------------------
----- - Flight Attendant 41 158
0 199 - 14.70 56.63
0.00 71.33 - 20.60 79.40
0.00 - 100.00 91.86
0.00 - ---------------------------------------
-----
66Creating a Frequency Report
- By default, PROC FREQ
- analyzes every variable in the SAS data set
- displays each distinct data value
- calculates the number of observations in which
each data value appears (and corresponding
percentage) - indicates for each variable how many
observations have missing values.
67Calculating Summary Statistics
- The MEANS procedure displays simple descriptive
statistics for the numeric variables in a SAS
data set. - General form of a simple PROC MEANS step
- PROC MEANS DATASAS-data-set RUN
- Example
- proc means dataia.aircraftcap
- run
68Calculating Summary Statistics
69Calculating Summary Statistics
- By default, PROC MEANS
- analyzes every numeric variable in the SAS
data set - prints the statistics N, MEAN, STD, MIN, and
MAX - excludes missing values before calculating
statistics.
70Selecting Variables
71Grouping Observations
proc means dataia.aircraftcap maxdec2 var
totpasscap class model run
72Calculating Capacity Statistics for Each Type of
Plane
- The SAS System
- The MEANS Procedure
- Analysis Variable totpasscap
- N
- size Obs N Mean Std Dev Minimum
Maximum - ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
ƒƒƒƒ - Large 16 16 230.13 32.39 207.00
290.00 - Medium 9 9 178.56 11.40 165.00
188.00 - Small 39 39 132.64 18.85 97.00
150.00 - ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
ƒƒƒƒ