Title: ISQS 3358, Business Intelligence Data Preparation for Analytics Using SAS
1ISQS 3358, Business IntelligenceData Preparation
for AnalyticsUsing SAS
- Zhangxi Lin
- Texas Tech University
2Outline
- Exercise 9
- An overview of data preparation for analytics
- SAS Programming Essentials
- Running SAS programs
- Mastering fundamental concepts
- SAS program debugging
- Exercise 10
3Exercise 9 Creating Summarized Output
- Chapter 5 exercises p A-15 to A-20
- Producing summary statistics
- Creating a custom format
- Producing a summary table
- Optional Displaying percentage in a table
- Producing a bar chart
- Deliverables
- Screenshot of the Project Designer
- Screenshots of the above html outputs (not
necessary to show the complete output)
4Structure and Components of Business Intelligence
5Overview From Data Warehousing to Data Analysis
- Previous major topics in data warehousing (using
SQL Server 2005) - Dimensional model design
- ETL
- Cubes design and OLAP
- Data analysis topics (using SAS)
- Data preparation
- Analytic business questions
- Data format and data conversion
- Data quality
- Data exploratory
- Data analysis
6US Car Theft
- The number of U.S. motor vehicle thefts decreased
by 1.9 percent from 2003 to 2004, the first
decrease since 1999. In 2004, the value of stolen
motor vehicles was 7.6 billion, down from 8.6
billion in 2003. The average value of a motor
vehicle reported stolen in 2004 was 6,143,
compared with 6,797 in 2003.
72004 Theft Statistics
- Every 26 seconds, a motor vehicle is stolen in
the United States. The odds of a vehicle being
stolen were 1 in 190 in 2003. The odds are
highest in urban areas. - U.S. motor vehicle thefts fell 1.9 percent in
2004 from 2003, according to the FBI's Uniform
Crime Reports. In 2004, 1,237,114 motor vehicles
were reported stolen. - The West was the only region with an increase in
motor vehicle thefts from 2003 to 2004, up 3.2
percent. Thefts fell 9.7 percent in the
Northeast, 4.4 percent in the Midwest and 2.9
percent in the South. - Nationwide, the 2004 motor vehicle theft rate per
100,000 people was 421.3, down 2.9 percent from
433.7 in 2003. - Only 13.0 percent of thefts were cleared by
arrests in 2004. - Carjackings occur most frequently in urban areas.
They account for only 3.0 percent of all motor
vehicle thefts. - The average comprehensive insurance premium in
the U.S. rose 11.2 percent from 1999 to 2003
8Business Question
- If the number of used Honda Accord thefts is
ranked the top in auto theft, should the premium
of insurance for Honda Accord be high enough than
other brand of cars? Should the insurance for a
user Honda higher than a brand new Honda? - Why?
9Other Analytic Business Questions
- How do factors such as time, branch, promotion,
and price influence the sale of a soft drink? - Which customers have a high cancellation risk in
the next month? - How can customers be segmented based on their
purchase behavior? - Statistics showed that an online recommendation
system may increase the sale 20, and the
accuracy rate of the system is 40. A newer
algorithm can increase the accuracy rate to 50.
Should the sale be promoted to 20125 25? - The airline companies are considering allowing
seats over-booked because certain percentage of
customers will cancel their flight at the last
minute. If the average cancellation rate is 10,
should the over-booking rate be 10 as well? If a
cancellation is charged 5 of the fare and how
much should the penalty for sold-out situation
with over-booking?
10Analysis Process
- Selecting an analysis method
- Identify data source
- Prepare the data (collecting, cleansing,
reorganizing, extracting transforming, loading) - Execute the analysis
- Interpret the analysis
- Automate data preparation and execution of
analysis, if the business question has to be
answered more than once - ETL
- Stored procedures
- The above steps can also be iterated, not
necessarily performed in sequential order - We focus on the data preparation step
11Characteristics of Analytic Business Questions
- Analysis complexity real analysis or reporting
- Analysis paradigm statistics or data mining
- Data preparation paradigm as much as data as
possible or business knowledge first - Analysis method supervised or unsupervised
analysis - Scoring needed yes/no
- Periodicity of analysis one-shot or re-run
- Historic data needed, yes/no
- Data structure one row or multiple rows per
subject - Complexity of the analysis team
12Components of the SAS System
Reporting And Graphics
Data Access And Management
User Interface
Base SAS
Analytical
Application Development
Visualization And Discovery
Business Solutions
Web Enablement
13SAS Programming Essentials
- Find more information from
- http//support.sas.com
14Data-driven Tasks
- The functionality of the SAS System is built
around four data-driven tasks common to virtually
any applications - Data access
- Data management
- Data analysis
- Data presentation
15Turning Data into Information
- Process of delivery meaningful information
- 80 data-related
- Access
- Scrub
- Transform
- Mange
- Store and retrieve
- 20 analysis
16SAS Programs
A SAS program is a sequence of steps that the
users submits for execution
DATA steps are typically used to create SAS data
sets
Raw Data
SAS Data Set
Report
DATA Step
PROC Step
SAS Data Set
PROC steps are typically used to process SAS data
sets
17Preparation of SAS Programming
- Data sets \SAS-Programming
- Create a user defined library reference
- Statement
- LIBNAME libref SAS-data-library ltoptionsgt
- Example
- LIBNAME ia c\workshop\winsas\prog1
- Two-levels of SAS files names
- Libref.fielname
18SAS Programming Essentials
- Demon c02s2d1
- Exercise c02ex1
19SAS Data Sets
- SAS data sets have a description portion and a
data portion. - The description portion contains the general
information about the SAS data set and variable
attributes - The CONTENTS procedure displays the descriptor
portion of a SAS data set - PROC CONTENTS DATASAS-data-set
- RUN
20SAS Data Sets Data Portion
- The PRINT procedure displays the data portion of
a SAS data set - PROC PRINT DATASAS-data-set
- RUN
21SAS Data Set Terminology
- SAS data set SAS table
- Variable Column
- Observation Row
22SAS Comments
- / begins a comment
- / ends a comment
23SAS Program Debugging
- Demon c02s4d1
- Exercise c02ex7
24Outline of SAS Programming Level I
- Fundamentals (ch1-3)
- Producing list reports (ch4)
- Enhancing output (ch5)
- Creating data sets (ch6)
- Data step programming (ch7)
- Reading data
- Creating variables
- Conditional processing
- Keeping and dropping variables
- Reading Excel files
- Combining SAS data sets (ch8)
- Producing summary reports (ch9)
- SAS graphing (ch10)
25Making Use of SAS Enterprise Guide Code
- Import a text file
- Example Orders.txt
- Import an Excel file
- Example SupplyInfo.xls
26Learn from Examples
- SAS Help
- Contents -gt Learning to use SAS -gt Sample SAS
Programs -gt Base SAS - Base Usage Guide Examples
- Chapter 3, 4
27Import an Excel Sheet
- proc import outwork.commrex
- datafile "C\Lin\Shared\ISQS6339\Commrex_3358.xl
s" dbmsexcel replace - sheet"Company"
- getnamesyes
- mixedno
- scantextyes
- usedateyes
- scantimeyes
- run
- proc print datawork.commrex
- run
28Excel SAS/ACCESS LIBNAME Engine
- libname xlsdata 'C\Lin\Shared\ISQS6339\Commrex_33
58.xls' - proc print dataxlsdata.New1
- run
29Exercise 10 SAS Data Step Programming
- Import and print text file Userlog00.txt
- Import and print Excel sheet customers in
Afs_customers.xls - Send the Word file containing the Screenshots of
two outputs to zhangxi.lin_at_hotmail.com