Data Preparation for Analytics Using SAS

1
Data Preparation for Analytics Using SAS
  • Gerhard Svolba, Ph.D.
  • Reviewed by
  • Madera Ebby, Ph.D.

2
What is the purpose of this book?
  • Introduces the reader to data preparation
  • Explains why data preparation is not only important but
    essential before data analysis
  • Covers the path from the data preparation process to data
    analytics

3
The Analysis Path: From raw data to results that
can be implemented
  • Data Sources: different data sources; relational models,
    star schemas
  • Data Preparation: merges, denormalization; derived variables;
    transpositions, aggregations
  • Analytic Modeling: modeling, parameter estimation, tuning;
    predictions, classifications or clustering
  • Results and Actions: usage of results; profiling;
    interpretations

4
The Analysis Path: From raw data to results that
can be implemented
  • Good Results
  • Clever Modeling
  • Adequate Preparation
  • Data Availability

5
Four Dimensions for Analytic Data Preparation
  • Business and Process Knowledge
  • Analytical Knowledge
  • Efficient SAS Coding
  • Documentation and Maintenance

6
Business question: How did students who met the
provincial standard in grade 3 perform in grade
6?
  • Generates many other questions
  • Requires working with people in other departments, such
    as IT, to carry out the data analysis process

7
Why is this author qualified or not qualified to
address this topic?
  • He is an experienced SAS user, as shown by his
    many macros
  • He addresses issues by presenting examples from
    different backgrounds

8
What are the strengths or weaknesses of this book?
  • The book is written clearly and is easy to read
  • Provides the reader with many examples of
    code, inputs, and outputs

9
Would you recommend this book? If so, who would
you recommend it to and for what purpose?
  • Those who prepare data marts for statistics or
    data mining or time series analyses
  • Those who provide data used in creating data
    marts (IT and data warehousing)
  • Both new and experienced SAS users who perform
    data analyses using data marts
  • Those who prepare data in relational databases
    with SQL

10
Does the book achieve its purpose?
  • Absolutely! It enables one to:
  • Understand the business environment in which data
    preparation occurs
  • Extract and structure your data
  • Create derived variables from different tables
  • Program SAS in an efficient way

11
What is the best tip or technique addressed in
this book?
  • I learnt many new techniques from this book.
    For example:
  • Examine the mean math scores by board (board_mident)

12
Continued
    /* One output row per board; no printed listing */
    proc means data=datalib.boards noprint nway;
      class board_mident;
      var Math_score;
      output out=datalib.aggr_static(drop=_type_ _freq_)
             mean= sum= n= std= min= max= / autoname;
    run;

13
Continued
  • To run the analysis by board_mident, we use a CLASS
    statement. A BY statement could also be used, but the
    data would first have to be sorted by board_mident.
  • NWAY suppresses the grand total and all other subtotals,
    so that the output data set contains only rows for the
    5 boards, which are the analysis subjects.
  • The NOPRINT option suppresses the printed output (the
    procedure listing, not the log), which can run to
    thousands of descriptive measures when there are many
    class levels.
  • In the OUTPUT statement we specify the statistics that
    will be calculated. The AUTONAME option creates the new
    variable names in the form VARIABLENAME_STATISTIC.
  • If we want to calculate different statistics for
    different input variables, we can specify this in the
    OUTPUT statement, e.g. SUM(VARIABLE)=sum_variable.
  • In the OUTPUT statement we drop the _TYPE_ and _FREQ_
    variables, although we could keep _FREQ_ and omit N
    from the statistics list.
  • Source: Chapter 18, Multiple Interval-Scaled Observations
    per Subject, page 183.
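
The BY-statement alternative and the per-variable statistic syntax described on this slide can be sketched as follows. This is a minimal sketch, not code from the book: it assumes the datalib.boards table used in the earlier example, and Reading_score, work.boards_sorted, work.aggr_by and n_rows are hypothetical names introduced only for illustration.

```sas
/* Sketch only: Reading_score, the WORK data set names and n_rows are hypothetical. */

/* A BY statement requires the input to be sorted by the BY variable. */
proc sort data=datalib.boards out=work.boards_sorted;
  by board_mident;
run;

proc means data=work.boards_sorted noprint;
  by board_mident;
  var Math_score Reading_score;
  /* Request a different statistic for each variable and name each result
     explicitly; keep _FREQ_ (renamed) instead of asking for N. */
  output out=work.aggr_by(drop=_type_ rename=(_freq_=n_rows))
         mean(Math_score)=Math_score_Mean
         sum(Reading_score)=Reading_score_Sum;
run;
```

Because the BY statement is used without a CLASS statement, the output already has one row per board, so NWAY is not needed. Note that _FREQ_ counts all observations in a group, whereas N counts only nonmissing values of the analysis variable, so the two can differ when scores are missing.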

15
Are there other books (or sources of information)
available with similar content?
  • Yes, but they tend to present only bits and pieces of
    information
  • E.g. resources on the internet
  • The Little SAS Book by Delwiche and Slaughter
  • If so, how does this book compare?
  • It offers a comprehensive, well-illustrated presentation
    of the material

16
What will your SAS log look like?