Title: Data Preparation for Analytics Using SAS
1 Data Preparation for Analytics Using SAS
- Gerhard Svolba, Ph.D.
- Reviewed by
- Madera Ebby, Ph.D.
2 What is the purpose of this book?
- Introduces the reader to data preparation
- Why data preparation is not only important but a must prior to data analysis
- From data preparation process to data analytics
3 The Analysis Path: From raw data to results that can be implemented
- Data sources: different data sources; relational models, star schemas
- Data preparation: merges, denormalization; derived variables, transpositions, aggregations
- Analytic modeling: modeling, parameter estimation, tuning; predictions, classifications or clustering
- Results and actions: usage of results; profiling, interpretations
4 The Analysis Path: From raw data to results that can be implemented
Clever Modeling
Adequate Preparation
Data availability
5 Four Dimensions for Analytic Data Preparation
Business and Process Knowledge
Analytical Knowledge
Analytic Data Preparation
Efficient SAS coding
Documentation and Maintenance
6 Business question: How did students who met the provincial standard in grade 3 perform in grade 6?
- Generates many other questions
- Work with people in other departments such as IT
to carry out a data analytic process
7 Why is this author qualified or not qualified to address this topic?
- He is an experienced SAS user, as exemplified by the many macros
- He addresses issues by presenting examples from different backgrounds
8 What are the strengths or weaknesses of this book?
- The book is written clearly and is easy to read
- Provides the reader with many examples of code, input, and output
9 Would you recommend this book? If so, who would you recommend it to and for what purpose?
- Those who prepare data marts for statistics, data mining, or time series analyses
- Those who provide data used in creating data marts (IT and data warehousing)
- Both new and experienced SAS users who perform data analyses using data marts
- Those who prepare data in relational databases with SQL
10 Does the book achieve its purpose?
- Absolutely! It enables one to:
- Understand the business environment in which data preparation occurs
- Extract and structure your data
- Create derived variables from different tables
- Program SAS in an efficient way
11 What is the best tip or technique addressed in this book?
- There are many new techniques that I learnt from this book. For example:
- Examine the mean scores for math by board_mident
12 Continued
    proc means data=datalib.boards noprint nway;
      class board_mident;
      var Math_score;
      output out=datalib.aggr_static(drop=_type_ _freq_)
             mean= sum= n= std= min= max= / autoname;
    run;
13 Continued
- To run the analysis by board_mident, we use a CLASS statement. A BY statement could also be used, but the data would then have to be sorted by board_mident.
- NWAY suppresses the grand total mean and all other totals, so that the output data set contains only rows for the 5 boards, which are the analysis subjects.
- We specify NOPRINT to suppress the printed output, which can run to thousands of descriptive measures when there are many analysis subjects.
- In the OUTPUT statement we specify the statistics that will be calculated. The AUTONAME option creates the new variable names in the form VARIABLENAME_STATISTIC.
- If we want to calculate different statistics for different input variables, we can specify this in the OUTPUT statement, e.g. SUM(VARIABLE)=sum_variable.
- In the OUTPUT statement we drop the _TYPE_ and _FREQ_ variables, although we could keep _FREQ_ and omit N from the statistics list.
- Source: Chapter 18, Multiple Interval-Scaled Observations per Subject, page 183.
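The BY-statement alternative and the per-variable statistics mentioned in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the book: the work.* data set names, the output variable names, and the Reading_score variable are hypothetical and introduced only for the example.

```sas
/* Sketch only: work.boards_sorted, work.aggr_by, work.aggr_mixed,
   math_mean, reading_sum and Reading_score are hypothetical names. */

/* A BY statement requires the input to be sorted by the BY variable. */
proc sort data=datalib.boards out=work.boards_sorted;
  by board_mident;
run;

/* Same aggregation as the CLASS version. NWAY is not needed here,
   because BY processing does not produce grand-total rows. */
proc means data=work.boards_sorted noprint;
  by board_mident;
  var Math_score;
  output out=work.aggr_by(drop=_type_ _freq_)
         mean= sum= n= std= min= max= / autoname;
run;

/* Different statistics for different input variables, using explicit
   output names instead of AUTONAME. */
proc means data=datalib.boards noprint nway;
  class board_mident;
  var Math_score Reading_score;
  output out=work.aggr_mixed(drop=_type_ _freq_)
         mean(Math_score)=math_mean
         sum(Reading_score)=reading_sum;
run;
```

The CLASS version avoids the extra sort step, which is one reason to prefer it when the input data set is not already ordered by board_mident.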
15 Are there other books (or sources of information) available with similar content?
- Yes, but they tend to present bits and pieces of information
- E.g. resources on the internet
- The Little SAS Book by Delwiche and Slaughter
- If so, how does this book compare?
- Comprehensive, well-illustrated presentation of material
16 What will your SAS log look like?