ISQS 3358, Business Intelligence Data Preparation for Analytics Using SAS

1 / 29
About This Presentation
Title:

ISQS 3358, Business Intelligence Data Preparation for Analytics Using SAS

Description:

Two-levels of SAS files names. Libref.fielname. ISQS 6347, Data & Text Mining. 18 ... Producing list reports (ch4) Enhancing output (ch5) Creating data sets (ch6) ... –

Number of Views:143
Avg rating:3.0/5.0
Slides: 30
Provided by: zlinB
Category:

less

Transcript and Presenter's Notes

Title: ISQS 3358, Business Intelligence Data Preparation for Analytics Using SAS


1
ISQS 3358, Business IntelligenceData Preparation
for AnalyticsUsing SAS
  • Zhangxi Lin
  • Texas Tech University

2
Outline
  • Exercise 9
  • An overview of data preparation for analytics
  • SAS Programming Essentials
  • Running SAS programs
  • Mastering fundamental concepts
  • SAS program debugging
  • Exercise 10

3
Exercise 9 Creating Summarized Output
  • Chapter 5 exercises p A-15 to A-20
  • Producing summary statistics
  • Creating a custom format
  • Producing a summary table
  • Optional Displaying percentage in a table
  • Producing a bar chart
  • Deliverables
  • Screenshot of the Project Designer
  • Screenshots of the above html outputs (not
    necessary to show the complete output)

4
Structure and Components of Business Intelligence
5
Overview From Data Warehousing to Data Analysis
  • Previous major topics in data warehousing (using
    SQL Server 2005)
  • Dimensional model design
  • ETL
  • Cubes design and OLAP
  • Data analysis topics (using SAS)
  • Data preparation
  • Analytic business questions
  • Data format and data conversion
  • Data quality
  • Data exploratory
  • Data analysis

6
US Car Theft
  • The number of U.S. motor vehicle thefts decreased
    by 1.9 percent from 2003 to 2004, the first
    decrease since 1999. In 2004, the value of stolen
    motor vehicles was 7.6 billion, down from 8.6
    billion in 2003. The average value of a motor
    vehicle reported stolen in 2004 was 6,143,
    compared with 6,797 in 2003.

7
2004 Theft Statistics
  • Every 26 seconds, a motor vehicle is stolen in
    the United States. The odds of a vehicle being
    stolen were 1 in 190 in 2003. The odds are
    highest in urban areas.
  • U.S. motor vehicle thefts fell 1.9 percent in
    2004 from 2003, according to the FBI's Uniform
    Crime Reports. In 2004, 1,237,114 motor vehicles
    were reported stolen.
  • The West was the only region with an increase in
    motor vehicle thefts from 2003 to 2004, up 3.2
    percent. Thefts fell 9.7 percent in the
    Northeast, 4.4 percent in the Midwest and 2.9
    percent in the South.
  • Nationwide, the 2004 motor vehicle theft rate per
    100,000 people was 421.3, down 2.9 percent from
    433.7 in 2003.
  • Only 13.0 percent of thefts were cleared by
    arrests in 2004.
  • Carjackings occur most frequently in urban areas.
    They account for only 3.0 percent of all motor
    vehicle thefts.
  • The average comprehensive insurance premium in
    the U.S. rose 11.2 percent from 1999 to 2003

8
Business Question
  • If the number of used Honda Accord thefts is
    ranked the top in auto theft, should the premium
    of insurance for Honda Accord be high enough than
    other brand of cars? Should the insurance for a
    user Honda higher than a brand new Honda?
  • Why?

9
Other Analytic Business Questions
  • How do factors such as time, branch, promotion,
    and price influence the sale of a soft drink?
  • Which customers have a high cancellation risk in
    the next month?
  • How can customers be segmented based on their
    purchase behavior?
  • Statistics showed that an online recommendation
    system may increase the sale 20, and the
    accuracy rate of the system is 40. A newer
    algorithm can increase the accuracy rate to 50.
    Should the sale be promoted to 20125 25?
  • The airline companies are considering allowing
    seats over-booked because certain percentage of
    customers will cancel their flight at the last
    minute. If the average cancellation rate is 10,
    should the over-booking rate be 10 as well? If a
    cancellation is charged 5 of the fare and how
    much should the penalty for sold-out situation
    with over-booking?

10
Analysis Process
  • Selecting an analysis method
  • Identify data source
  • Prepare the data (collecting, cleansing,
    reorganizing, extracting transforming, loading)
  • Execute the analysis
  • Interpret the analysis
  • Automate data preparation and execution of
    analysis, if the business question has to be
    answered more than once
  • ETL
  • Stored procedures
  • The above steps can also be iterated, not
    necessarily performed in sequential order
  • We focus on the data preparation step

11
Characteristics of Analytic Business Questions
  • Analysis complexity real analysis or reporting
  • Analysis paradigm statistics or data mining
  • Data preparation paradigm as much as data as
    possible or business knowledge first
  • Analysis method supervised or unsupervised
    analysis
  • Scoring needed yes/no
  • Periodicity of analysis one-shot or re-run
  • Historic data needed, yes/no
  • Data structure one row or multiple rows per
    subject
  • Complexity of the analysis team

12
Components of the SAS System
Reporting And Graphics
Data Access And Management
User Interface
Base SAS
Analytical
Application Development
Visualization And Discovery
Business Solutions
Web Enablement
13
SAS Programming Essentials
  • Find more information from
  • http//support.sas.com

14
Data-driven Tasks
  • The functionality of the SAS System is built
    around four data-driven tasks common to virtually
    any applications
  • Data access
  • Data management
  • Data analysis
  • Data presentation

15
Turning Data into Information
  • Process of delivery meaningful information
  • 80 data-related
  • Access
  • Scrub
  • Transform
  • Mange
  • Store and retrieve
  • 20 analysis

16
SAS Programs
A SAS program is a sequence of steps that the
users submits for execution
DATA steps are typically used to create SAS data
sets
Raw Data
SAS Data Set
Report
DATA Step
PROC Step
SAS Data Set
PROC steps are typically used to process SAS data
sets
17
Preparation of SAS Programming
  • Data sets \SAS-Programming
  • Create a user defined library reference
  • Statement
  • LIBNAME libref SAS-data-library ltoptionsgt
  • Example
  • LIBNAME ia c\workshop\winsas\prog1
  • Two-levels of SAS files names
  • Libref.fielname

18
SAS Programming Essentials
  • Demon c02s2d1
  • Exercise c02ex1

19
SAS Data Sets
  • SAS data sets have a description portion and a
    data portion.
  • The description portion contains the general
    information about the SAS data set and variable
    attributes
  • The CONTENTS procedure displays the descriptor
    portion of a SAS data set
  • PROC CONTENTS DATASAS-data-set
  • RUN

20
SAS Data Sets Data Portion
  • The PRINT procedure displays the data portion of
    a SAS data set
  • PROC PRINT DATASAS-data-set
  • RUN

21
SAS Data Set Terminology
  • SAS data set SAS table
  • Variable Column
  • Observation Row

22
SAS Comments
  • / begins a comment
  • / ends a comment

23
SAS Program Debugging
  • Demon c02s4d1
  • Exercise c02ex7

24
Outline of SAS Programming Level I
  • Fundamentals (ch1-3)
  • Producing list reports (ch4)
  • Enhancing output (ch5)
  • Creating data sets (ch6)
  • Data step programming (ch7)
  • Reading data
  • Creating variables
  • Conditional processing
  • Keeping and dropping variables
  • Reading Excel files
  • Combining SAS data sets (ch8)
  • Producing summary reports (ch9)
  • SAS graphing (ch10)

25
Making Use of SAS Enterprise Guide Code
  • Import a text file
  • Example Orders.txt
  • Import an Excel file
  • Example SupplyInfo.xls

26
Learn from Examples
  • SAS Help
  • Contents -gt Learning to use SAS -gt Sample SAS
    Programs -gt Base SAS
  • Base Usage Guide Examples
  • Chapter 3, 4

27
Import an Excel Sheet
  • proc import outwork.commrex
  • datafile "C\Lin\Shared\ISQS6339\Commrex_3358.xl
    s" dbmsexcel replace
  • sheet"Company"
  • getnamesyes
  • mixedno
  • scantextyes
  • usedateyes
  • scantimeyes
  • run
  • proc print datawork.commrex
  • run

28
Excel SAS/ACCESS LIBNAME Engine
  • libname xlsdata 'C\Lin\Shared\ISQS6339\Commrex_33
    58.xls'
  • proc print dataxlsdata.New1
  • run

29
Exercise 10 SAS Data Step Programming
  • Import and print text file Userlog00.txt
  • Import and print Excel sheet customers in
    Afs_customers.xls
  • Send the Word file containing the Screenshots of
    two outputs to zhangxi.lin_at_hotmail.com
Write a Comment
User Comments (0)
About PowerShow.com