Title: SAS Training
Agenda
- Introduction to SAS Software Program
- Data Preparation and Tabulation
- Test of Difference: t-Test and ANOVA
- Test of Association: Correlation and Regression Analysis
INTRODUCTION TO SAS SOFTWARE PROGRAM
SAS
- From traditional statistical analysis of variance
and predictive modeling to exact methods and
statistical visualization techniques, SAS/STAT
software is designed for both specialized and
enterprise-wide analytical needs. SAS/STAT
software provides a complete, comprehensive set
of tools that can meet the data analysis needs of
the entire organization.
SAS Components
SAS Enterprise Guide
Graphical user interface application for some
common basic data analysis tasks.
SAS 9.2
Command-based application for a wide variety of
data analysis tasks.
SAS Enterprise Guide
- To open the statistical software package
SAS, go to the Start Menu > All Programs
> SAS > SAS Enterprise Guide 4.3.
SAS 9.2
- To open the statistical software package
SAS, go to the Start Menu > All Programs > SAS
> SAS 9.2 (English).
What Is SAS Enterprise Guide?
- SAS Enterprise Guide is an easy-to-use Windows client
application that provides these features:
- access to much of the functionality of SAS
- an intuitive, visual, customizable interface
- transparent access to data
- ready-to-use tasks for analysis and reporting
- easy ways to export data and results to other
applications
- scripting and automation
- a program editor with syntax completion and
built-in function help
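Explore the Main Windows
[Figure: the main SAS Enterprise Guide window, with callouts 1, 2, and 3 marking the areas described next]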
Create a Project for This Tutorial
- If SAS Enterprise Guide is not open, start it
now. In the Welcome window, select New Project.
- If SAS Enterprise Guide is already open,
select File > New Project. If you already had a
project open in SAS Enterprise Guide, you might
be prompted to save the project. Select the
appropriate response.
- The new project opens with an empty Process Flow
window.
1. The Project Tree
- You can use the Project Tree window to manage the
objects in your project. You can delete, rename,
and reorder the items in the project. You can
also run a process flow or schedule a process
flow to run at a particular time.
2. Workspace and Process Flow Windows
- You can have one or more process flows in your
project. When you create a new project, an empty
Process Flow window opens. As you add data, run
tasks, and generate output, an icon for each
object is added to the process flow.
- The process flow displays the objects in a
project, any relationships that exist between the
objects, and the order in which the objects will
run when you run the process flow.
3. The Task List
- You can use tasks to do everything from
manipulating data, to running specific analytical
procedures, to creating reports.
- Many tasks are also available as wizards, which
contain a limited number of options and can
provide a quick and easy way to use some of the
tasks.
Add SAS Data to the Project
- You can add SAS data files and other types of
files, including OLAP cubes, information maps,
ODBC-compliant data, and files that are created
by other software packages, such as Microsoft
Word or Microsoft Excel.
- SAS Enterprise Guide requires all data that it
accesses to be in table format. A table is a
rectangular arrangement of rows (also called
observations) and columns (also called
variables).
Name     Gender  Age  Weight
Jones    M       48   128.6
Laverne  M       58   158.3
Jaffe    F       .    115.5
Wilson   M       28   170.1
- A column's type is important because it affects
how the column can be used in a SAS Enterprise
Guide task. A column's type can be either
character or numeric.
- Character variables, such as Name and Gender in
the preceding data set, can contain any values.
Missing character values are represented by a
blank.
- Numeric variables, such as Age and Weight in the
preceding data set, can contain only numeric
values. Currency, date, and time data are stored
as numeric variables. Missing numeric values are
represented by a period.
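The column-type and missing-value rules above can be sketched outside SAS. The Python below is only an illustration, not how SAS Enterprise Guide reads data: it parses the sample table, treats a lone period in a numeric column as missing, and skips missing values when averaging. The helper names `to_number` and `parse_table` are hypothetical.

```python
# Rows copied from the sample table; "." marks a missing numeric value.
RAW = """\
Name Gender Age Weight
Jones M 48 128.6
Laverne M 58 158.3
Jaffe F . 115.5
Wilson M 28 170.1
"""


def to_number(text):
    """Return a float, or None when the cell holds the missing-value period."""
    return None if text == "." else float(text)


def parse_table(raw):
    """Split the text into a header list and a list of row dictionaries."""
    lines = raw.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        name, gender, age, weight = line.split()
        rows.append({
            "Name": name,                 # character column
            "Gender": gender,             # character column
            "Age": to_number(age),        # numeric column, may be missing
            "Weight": to_number(weight),  # numeric column
        })
    return header, rows


header, rows = parse_table(RAW)

# Missing ages are left out of the calculation, not counted as zero.
ages = [row["Age"] for row in rows if row["Age"] is not None]
mean_age = sum(ages) / len(ages)
print(header)
print(mean_age)
```

Keeping the missing age as `None` rather than 0 is the point of the sketch: the average is taken over the three observed ages only.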
Local and Remote Data
- When you open data in SAS Enterprise Guide, you
must select whether you want to look for the data
on your local computer, a SAS server, or in a SAS
folder.
Local and Remote Data (Cont)
- If you click My Computer, you can browse the
directory structure of your computer. You can
open any type of data file that SAS Enterprise
Guide can read.
- If you click Servers, you can look for your data
on a server. A server can either be a local
server if SAS software is installed on your own
computer, or it can be a remote server if SAS
software is installed on a different computer.
Open Data from Server
- Within each server there are icons that you can
select for Libraries and Files. Libraries are
shortcut names for directory locations that SAS
knows about. Some libraries are defined by SAS,
and some are defined by SAS Enterprise Guide.
Libraries contain only SAS data sets.
- The Files folder on a server enables you to
access data files in the directory structure on
the computer where the SAS server is running. For
example, if you wanted to open a Microsoft Excel
file on a server that is defined in your
repository, you would use the Files node to
locate and open the file.
Open Data from SAS Folders
- If you click SAS Folders, you can browse the
list of SAS folders that you can access. SAS
folders are defined in the SAS Metadata Server
and can be used to provide a central location for
your stored processes, information maps, and
projects so that they can be shared with other
SAS applications. SAS folders can also contain
content that is not in the SAS Metadata Server,
such as data files.
Add SAS Data from Your Local Computer
- Select File > Open > Data. In the Open Data
window, select My Computer.
- Open the SAS Enterprise Guide samples directory
and double-click Data. By default, the sample
programs, projects, and data are located in
C:\Program Files\SAS\EnterpriseGuide\4.3\Sample.
- By default, all file types are displayed in the
window. Files with the SAS data set icon are SAS data
sets. Press CTRL and select Orders.sd2 and
Products.sas7bdat, and then click Open.
Add SAS Data from Your Local Computer (Cont)
- Shortcuts to the Products and Orders tables are
added to the project, and the data sets open in
data grids.
- By default, the tables open in read-only mode. In
this mode, you can browse, resize column widths,
hide and hold columns and rows, and copy columns
and rows to a new table.
- You cannot edit the data in the table unless you
change to edit mode. Select Edit > Remove
Protect Data.
View the Properties of a Data Set
- In the project tree, right-click Products and
select Properties from the pop-up menu. The
Properties for Products window opens. You can see
information about general properties such as the
physical location of the data and the date it was
last modified.
View the Properties of a Data Set (Cont)
- In the selection pane, click Columns. Here you
can view a list of columns in your data and the
column attributes.
Add Data from a SAS Library
- Select File > Open > Data. In the Open Data
window, select Servers.
- Double-click Libraries, and then
double-click SASHELP. As you can see, only SAS
data sets are stored in libraries.
- Scroll in the window and double-click
the PRDSALE data set. A shortcut to the data is
added to the project and the data opens in the
data grid.
Save the Project
- Select File > Save Project As.
- The Save window opens and prompts you to choose
whether to save the project on your computer or
on a server. Select My Computer.
- In the Save window, select a location for the
project. In the File name box, type your file
name. Project files are saved with the
extension .egp.
- Click Save.
Data Preparation and Tabulation
Data Input
- There are two main ways to input data:
- Manually Input Data
- Import from an External File
Manually Input Data
- Create a SAS Library
- Create a SAS Data Set
- Input data
What Is a SAS Data Library?
- A SAS data library is a collection of one or more
SAS files that are recognized by SAS and can be
referenced and stored as a unit. Each file is a
member of the library. SAS data libraries help to
organize your work. For example, if a SAS program
uses more than one SAS file, then you can keep
all the files in the same library. Organizing
files in libraries makes it easier to locate the
files and reference them in a program.
Telling SAS Where the SAS Data Library Is Located
You can tell SAS where a SAS data library is located in either of two ways:
- Directly specify the operating environment's
physical name for the location of the SAS data
library.
- Assign a SAS libref (library reference), which is
a SAS name that is temporarily associated with
the physical location name of the SAS data
library.
Using Librefs for Temporary and Permanent
Libraries
- When you start a SAS session, SAS automatically
assigns the libref WORK to a special SAS data
library. Normally, the files in the WORK library
are temporary files.
- Files that are stored in any SAS data library
other than the WORK library are usually permanent
files; that is, they endure from one SAS session
to the next. Store SAS files in a permanent
library if you plan to use them in multiple SAS
sessions.
Create a SAS Library
- Tools > Assign Project Library
Create a SAS Library: Step 1
- Specify name and server for the library
Create a SAS Library: Step 2
- Specify the engine for the library
Create a SAS Library: Step 3
- Specify options for the library
Create a SAS Library: Step 4
- Click Test Library to check that it is OK to create
this library
- Press Finish to create the library
Create a SAS Library
- Check the created library in the Server List
- When a libref is assigned to a SAS data library,
you can use the libref throughout the SAS session
to access the SAS files that are stored in that
library or to create new files.
Create SAS Data Set
Create SAS Data Set: Step 1
- Specify name TEST and location DEMO
Create SAS Data Set: Step 2
- Create columns and specify their properties
Name     Gender  Age  Weight
Jones    M       48   128.6
Laverne  M       58   158.3
Jaffe    F       .    115.5
Wilson   M       28   170.1
Input Data
Import from an External File
- The Import Data wizard enables you to create SAS
data sets from text, HTML, or PC-based database
files (including Microsoft Excel, Microsoft
Access, and other popular formats). When you use
the Import Data wizard, you can specify import
options for each file that you import.
Import Data
Import Data (Cont)
- Desktop > SAS Training > Data Advising
Survey.xls
Import Data Result
Import SPSS file
Import SPSS file: Step 1
- Select an SPSS file to import
Import SPSS file: Step 2
- Specify a name for the imported table
Import SPSS file Result
Create Format
- Tasks > Data > Create Format
Create Format (Cont)
- Set Format Name: GENDER
- Select Library: SASUSER
- Select Format Type: Character
Define Formats
- Click New Label and type a name for the label
- Click New Range, select the type of values, and
type a value that corresponds to the specified label
- Repeat the steps
- Click Run
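Conceptually, a character format such as GENDER is a lookup from stored values to display labels. The Python sketch below illustrates that idea only and is not how SAS stores formats. The labels "Male" and "Female" and the helper `apply_format` are assumptions made for the example, since the slides do not list the labels.

```python
# Hypothetical value-to-label pairs for a character format named GENDER.
GENDER_FORMAT = {"M": "Male", "F": "Female"}


def apply_format(value, fmt):
    """Return the label for a raw value, or the raw value if nothing matches."""
    return fmt.get(value, value)


raw_values = ["M", "F", "M", ""]  # the blank stands for a missing character value
labels = [apply_format(value, GENDER_FORMAT) for value in raw_values]
print(labels)
```

The stored values in `raw_values` are never changed; only the displayed labels differ, which is why a format can be applied or removed without editing the data.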
Applying User-Defined Formats
- Open a SAS Data Set
- Unprotect the data: Edit > Unprotect Data
Applying User-Defined Formats (Cont)
- Right-click the column
- Select Properties
Applying User-Defined Formats (Cont)
- In the left pane, select Formats
- In the Categories box, select User Defined
- In the Formats box, select the desired format
Applying Formats in Tasks
- Custom formats can be applied in the same places
that formats defined in SAS can be used.
SAS Tasks
- After you have data in your project, you can
create reports and run analyses on the data.
- To do this, you select a SAS task from the Task
List or from the Tasks menu. Some tasks have
wizards to guide you through the decisions that
you need to make. Wizards are available from
menus or from a link next to the related task in
the Task List.
Using Tasks in SAS Enterprise Guide
- The icon next to each variable represents the
variable's type. Country is a character variable.
Year is a numeric variable. Month is a numeric
variable in date-and-time format. Actual and
Predict are numeric variables in currency format.
One-Way Frequencies Task
- We should create One-Way Frequencies (tables and
graphs) to check our data set one last time
before we intensively analyze the data.
One-Way Frequencies
- Under Data, select Q1-Q19, Gender, Nation, Year,
and Major for Analysis variables.
- Under Plots, check Vertical for Bar chart.
- Check the frequency tables and/or bar charts for any
errors (e.g., typos). Make any necessary corrections.
Filter and Sort
- Use Tasks > Data > Filter and Sort... or Sort
data... to help you find the error(s).
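The idea behind this check can be sketched in a few lines of Python: count each distinct value, then flag any level that is not on the list of valid codes. The responses, the deliberate typo "Femal", and the helper `one_way_freq` are all hypothetical and do not come from the survey file.

```python
from collections import Counter

# Hypothetical Gender responses; "Femal" is a deliberate data-entry typo.
gender = ["Male", "Female", "Female", "Male", "Femal", "Female"]


def one_way_freq(values):
    """Return (level, count, percent) rows sorted by level."""
    counts = Counter(values)
    total = len(values)
    return [(level, n, 100.0 * n / total) for level, n in sorted(counts.items())]


table = one_way_freq(gender)
for level, n, pct in table:
    print(f"{level:<8}{n:>4}{pct:>8.1f}%")

# Any level outside the expected codes is a candidate error to filter on.
expected = {"Male", "Female"}
suspects = [level for level, _, _ in table if level not in expected]
print(suspects)
```

A level with a very small count next to a similar, much larger one is the typical signature of a typo, and filtering on that level locates the rows to correct.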
Summary Statistics Task
- The Summary Statistics task can be used to
calculate summary statistics based on groups
within the data. You can produce reports, graphs,
and data sets as output.
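As a rough illustration of what "summary statistics based on groups" means, the Python sketch below splits hypothetical (group, value) records by the classification value and computes N, mean, standard deviation, minimum, and maximum per group. The data and the helper `summarize` are assumptions. The task itself offers many more statistics and options.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (classification value, analysis value).
records = [("F", 4), ("F", 5), ("F", 3), ("M", 2), ("M", 4), ("M", 3)]


def summarize(records):
    """Group the values by classification level and summarize each group."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {
        group: {
            "N": len(values),
            "Mean": mean(values),
            "Std Dev": stdev(values),  # sample standard deviation (n - 1)
            "Minimum": min(values),
            "Maximum": max(values),
        }
        for group, values in groups.items()
    }


summary = summarize(records)
for group, stats in sorted(summary.items()):
    print(group, stats)
```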
- The Summary Statistics task has both a wizard and
the standard task dialog box that can be used to
set up the results.
Summary Statistics Task Roles
- Use the wizard to assign variables to roles.
- Compute statistics for each numeric variable in
the list.
- Specify variables whose values define subgroups.
Summary Statistics: Statistics and Results
- Choose statistics and results to include,
including a report, graphics, and an output data
set.
Summary Statistics: Advanced View
- Opening the task in Advanced View enables
additional options to further modify the output.
Summary Tables
- The Summary Tables wizard or task can be used to
generate a tabular summary report.
Summary Tables Wizard
- The Summary Tables wizard enables you to select
analysis variable(s) and statistics, assign
classification variables to define rows and
columns, and specify totals.
Test of Difference: t-Test, ANOVA, and Others
One-Sample t-Test
- Tasks > ANOVA > t Test
- Under Data, choose Q19 as the Analysis variable
task role and Gender as the Group analysis by task role.
- Under Analysis, specify the null hypothesis H0 = 3.
T-Test Output
- Since the p-value is less than 0.05, it can be
concluded that, on average, female students consider
themselves well prepared for advising
appointments (the mean is significantly higher than
3).
- Since the p-value is less than 0.05, it can be
concluded that, on average, male students also
consider themselves well prepared
for advising appointments.
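The statistic behind this output is t = (sample mean - H0 value) / (standard deviation / sqrt(n)), with n - 1 degrees of freedom. The Python sketch below computes it for hypothetical Q19 scores, not the survey data. Because the standard library has no Student's t distribution, it compares |t| with the tabulated two-sided 5% critical value for 7 degrees of freedom (2.365) instead of computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical Q19 responses on a 1-5 scale.
scores = [4, 5, 3, 4, 5, 4, 3, 5]
t, df = one_sample_t(scores, 3)

# Tabulated two-sided 5% critical value of Student's t for df = 7.
T_CRITICAL_DF7 = 2.365
reject_h0 = abs(t) > T_CRITICAL_DF7
print(t, df, reject_h0)
```

Running the task with Gender as the Group analysis by variable repeats this calculation once per gender, which is why the output has one conclusion for female students and one for male students.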
Two-Sample t-Test
- Tasks > ANOVA > t Test
- Under Data, choose Q6 as the analysis variable
task role and Gender as the classification
variable.
- Under Plots, check Summary plot, Confidence
interval plot, and Normal quantile-quantile (Q-Q)
plot.
T-Test Output
- Equal variances are assumed, so the pooled method is
used. Since the p-value is greater than 0.05, it
cannot be concluded that there is a significant
difference in Advisor Satisfaction between male
and female students.
- For the equality of variances test, the probability
is greater than 0.05, so there is no evidence that
the variances of the two groups, female students
and male students, differ.
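The two quantities in this output can be sketched as follows: the pooled t statistic, which uses a weighted average of the two group variances, and the folded F statistic (larger variance divided by smaller variance) used to test equality of variances. The scores are hypothetical, and the function names `pooled_t` and `folded_f` are illustrative only. P-values are not computed because the standard library lacks the t and F distributions.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Pooled (equal-variance) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


def folded_f(a, b):
    """Folded F statistic: larger sample variance over smaller sample variance."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


# Hypothetical Q6 (Advisor Satisfaction) scores by gender.
female = [4, 5, 3, 4, 4]
male = [3, 4, 4, 5, 3]

t, df = pooled_t(female, male)
f = folded_f(female, male)
print(t, df, f)
```

When the folded F test does reject equal variances, the unequal-variance (Satterthwaite) row of the output is the one to read instead of the pooled row.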
One-Way ANOVA
- Tasks > ANOVA > One-Way ANOVA
- Under Data, assign Q6 and Year to the task roles
of Dependent variable and Independent variable,
respectively.
- Under Tests, check Levene's test.
- Under Means Comparison, check Bonferroni t test,
Duncan's multiple-range test, and Scheffe's
multiple comparison procedure for Post Hoc tests.
- Under Plots, check Means for Plot Types.
- Then, click Run.
One-Way ANOVA Results
- Since the p-value is greater than 0.05, it can be
concluded that there is no significant
difference in average Advisor Satisfaction among
years of study. Therefore, there is no need to
check the Post Hoc tests.
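The F statistic in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. A minimal Python sketch, using hypothetical Q6 scores for three years of study and an illustrative function name, is:

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared mean difference.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical Q6 scores for three years of study.
by_year = [[4, 5, 3], [4, 4, 5], [3, 4, 4]]
f, df_between, df_within = one_way_anova(by_year)
print(f, df_between, df_within)
```

An F value well below 1, as in this made-up example, means the group means differ less than the scatter within groups would suggest, which matches a p-value greater than 0.05.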
Post Hoc Test: Bonferroni t Tests
Post Hoc Test: Scheffe's Tests
ANOVA Means Plot of Q6 by Year
Test of Association: Correlation and Regression
Analysis
Data Exploration, Correlations, and Scatter Plots
- Tasks > Multivariate > Correlations
- With Data selected at the left, assign Q1, Q2,
Q3, Q4, and Q5 to the task role of Analysis
variables and Q6 to the role of Correlate with.
Correlation Types
- In Results, check the box for Create a scatter
plot for each correlation pair. Also, check the
box at the right for Show correlations in
decreasing order of magnitude and uncheck the box
for Show statistics for each variable.
Correlation Analysis
- Since the p-values are less than 0.05, there are
significant (positive) relationships between Q6
(overall satisfaction with the advisor) and each of
Q1, Q2, Q3, Q4, and Q5.
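Each entry in the correlation output is a Pearson coefficient r, and its p-value comes from the statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The Python sketch below computes both for hypothetical responses. The function names and the data are illustrative and are not taken from the survey.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: correlation == 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical responses to Q1 and Q6.
q1 = [1, 2, 3, 4, 5]
q6 = [2, 1, 4, 3, 5]

r = pearson_r(q1, q6)
t = t_for_r(r, len(q1))
print(r, t)
```

With only five observations even a correlation this large is weak evidence. The survey's larger sample is what makes the reported p-values small.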
Linear Regression
- Tasks > Regression > Linear Regression
- Drag Q6 to the dependent variable task role and
Q1, Q2, Q3, Q4, and Q5 to the explanatory variables
task role.
Regression Model
- Model selection method: Full model fitted (the
default)
Regression Statistics
- Under Details on estimates, check Standardized
regression coefficients
- Perform some diagnostics
Regression Diagnostics
- Unusual and influential data (outliers/leverage)
- Tests on normality of residuals
- Tests on nonconstant error variance
(heteroscedasticity)
- Tests on correlations among predictors
(multicollinearity)
- Tests on nonlinearity
- Tests on dependence of residuals
(autocorrelation)
- Model specification
Diagnostics: Collinearity Analysis
- This option requests a detailed analysis of
collinearity among the regressors. This includes
eigenvalues, condition indices, and decomposition
of the variances of the estimates with respect to
each eigenvalue.
Diagnostics: Collinearity Analysis
- Check Tolerance (1/VIF) or Variance Inflation
(VIF).
- Some researchers use a more lenient VIF cutoff of
5.0 or even 10.0 to signal when multicollinearity
is a problem. The researcher may wish to drop the
variable with the highest VIF if
multicollinearity is indicated and theory
warrants.
- The condition indices are the square roots of the
ratio of the largest eigenvalue to each
individual eigenvalue. The largest condition
index is the condition number of the
scaled X matrix. Belsley, Kuh, and Welsch (1980)
suggest that, when this number is around 10, weak
dependencies might be starting to affect the
regression estimates. When this number is larger
than 100, the estimates might have a fair amount
of numerical error (although the statistical
standard error almost always is much greater than
the numerical error).
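For a predictor j, VIF is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the other predictors, and tolerance is its reciprocal. The Python sketch below covers only the simplest case of two predictors, where R_j^2 equals the squared correlation between them. That restriction, the data, and the function names are assumptions for illustration. With five predictors, each R_j^2 needs a full multiple regression, and the condition indices need an eigenvalue routine that this sketch does not attempt.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Return (VIF, tolerance) shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    tolerance = 1 - r * r  # 1 - R_j^2; with two predictors R_j^2 = r^2
    return 1 / tolerance, tolerance


# Hypothetical responses to two predictors.
q1 = [1, 2, 3, 4, 5]
q2 = [2, 1, 4, 3, 5]

vif, tolerance = vif_two_predictors(q1, q2)
print(vif, tolerance)
```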
Diagnostics: Heteroscedasticity Test
- This option tests that the first and second
moments of the model are correctly specified.
- Asymptotic covariance matrix: this option
displays the estimated asymptotic covariance
matrix of the estimates under the hypothesis of
heteroscedasticity.
Diagnostics: Durbin-Watson Statistic
- The Durbin-Watson statistic shows whether or not
the errors have first-order autocorrelation.
(This test is appropriate only for time series
data.) The sample autocorrelation of the
residuals is also produced.
- The value of d ranges from 0 to 4. Values close
to 0 indicate extreme positive autocorrelation,
values close to 4 indicate extreme negative
autocorrelation, and values close to 2 indicate no
serial autocorrelation. As a rule of thumb, d
should be between 1.5 and 2.5 to indicate
independence of observations. Positive
autocorrelation means standard errors of the b
coefficients are too small. Negative
autocorrelation means standard errors are too
large.
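The statistic itself is simple: d is the sum of squared differences between successive residuals divided by the sum of squared residuals. The Python sketch below applies it to two hypothetical residual series, one that alternates in sign and one that drifts steadily, to show the two ends of the scale. The function name and both series are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals, listed in time order.
alternating = [1, -1, 1, -1, 1, -1]  # sign flips: negative autocorrelation
trending = [-3, -2, -1, 1, 2, 3]     # slow drift: positive autocorrelation

d_alternating = durbin_watson(alternating)
d_trending = durbin_watson(trending)
print(d_alternating, d_trending)
```

Both made-up series fall outside the 1.5 to 2.5 rule-of-thumb band, on opposite sides. The order of the residuals matters, which is why the statistic is meaningful only for data with a natural time order.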
- Under Plots, select Custom list of plots under
Show plots for regression analysis. In the menu
that appears, uncheck the box for Diagnostic
plots and check the boxes for Histogram plot of the
residual, Normal quantile plot of the residual,
and Residual plots.
Regression Analysis
- The F Value and p-value test the null hypothesis
that the model does not explain the variance of the
response variable.
- R-Square is the proportion of the total
variance explained by the model.
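Both quantities come from splitting the total sum of squares into a model part and an error part: R-Square = 1 - SSE / SST, and F = (SSR / model df) / (SSE / error df). The Python sketch below shows this for a regression with a single predictor. That simplification, the data, and the function name are assumptions for illustration, since the slides fit five predictors at once.

```python
def simple_regression(x, y):
    """Least-squares fit of y on one predictor: slope, intercept, R-Square, F."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_error = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_model = ss_total - ss_error
    r_square = 1 - ss_error / ss_total
    # One predictor: model df = 1, error df = n - 2.
    f_value = (ss_model / 1) / (ss_error / (n - 2))
    return slope, intercept, r_square, f_value


# Hypothetical responses: Q1 as the predictor, Q6 as the response.
q1 = [1, 2, 3, 4, 5]
q6 = [2, 1, 4, 3, 5]

slope, intercept, r_square, f_value = simple_regression(q1, q6)
print(slope, intercept, r_square, f_value)
```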
Regression Analysis
- The t Value and p-value for each coefficient test
the null hypothesis that the coefficient is
equal to 0.
Regression Diagnostics
- Might suggest violation of normality of residuals
assumption