Predictive Tax Compliance
SPSS
- Benjamin Chard, Senior Solution Engineer, bchard_at_spss.com
- Sarah Mattingly, IRS Account Executive, smattingly_at_spss.com
SRA
- Ted Fischer, Project Manager, ted_fischer_at_sra.com or theodore.i.fischer_at_irs.gov, 301-731-3534
Agenda
- Introduction to Data Mining
- Predictive Tax Compliance
- Using Clementine for Audit Selection
- What's New in Clementine Version 11.1
- IRS Refund Fraud Detection Project Case Study
Where Does Data Mining Fit?
- Operational setting
  - Reporting
  - Case management
  - Claim scoring
- Data mining workbench: build models
- Existing data
  - Historical claims
  - Current claims
Data Mining vs. Query/Reporting
- Reporting (Tables, Graphics, OLAP)
  - Provides a very good view of what is happening, but only within a limited view of the data and only in models defined by the user
Statistics vs. Data Mining
- Statistics: hypothesis testing
- Three classes of data mining algorithms
- What events occur together? Given a series of actions, what action is likely to occur next?
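The first question above, "What events occur together?", is what association rules answer. As a minimal sketch only, assuming each case is reduced to a set of event labels (the labels below are invented for illustration), the code counts how often pairs of events occur together and reports each rule's support and confidence. It is plain pair counting, not the association algorithm Clementine uses, and it does not handle ordered sequences.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs.

    support    = share of transactions containing both items
    confidence = support(pair) / support(antecedent) = P(consequent | antecedent)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Emit the rule in both directions; confidence differs by direction.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical events observed on four returns.
returns = [
    {"large_refund", "new_bank_account", "address_change"},
    {"large_refund", "new_bank_account"},
    {"large_refund"},
    {"new_bank_account", "address_change"},
]
for a, b, s, c in pair_rules(returns):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Support says how common a pair is; confidence says how often the consequent appears given the antecedent. Here `address_change -> new_bank_account` has confidence 1.0 because every sample return with an address change also shows a new bank account.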
Predictive Tax Compliance
Register
Assess
Collect
- Tax Collection
- Risk Models
- Audit Selection
- Audit Models
- Non-Filer Discovery
- Soft-Matching
- Prioritization Models
Data Mining / Predictive Analytics Tools
Data Warehouse
- Right work to the right resources at the right time
Predictive Modeling
- Build a predictive profile of claims that, after investigation, were flagged as improper payments, regardless of amount.
- Select positive investigations: maximize the dollar adjustment found per audit hour.
- Minimize the number of no-change audits.
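The selection goals above (dollars found per audit hour, fewer no-change audits) can be expressed as a ranking rule. A minimal sketch, assuming each candidate case already carries a model-predicted probability of a positive adjustment, a predicted dollar amount, and an estimated audit-hour cost; the `Case` fields, the `min_p` cut-off, and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    p_positive: float        # hypothetical model output: P(positive adjustment)
    predicted_amount: float  # hypothetical predicted dollar adjustment if positive
    audit_hours: float       # hypothetical estimated audit effort


def select_cases(cases, hours_budget, min_p=0.3):
    """Greedily pick cases by expected dollars per audit hour within a budget.

    Cases with p_positive below min_p are skipped to cut no-change audits.
    """
    eligible = [c for c in cases if c.p_positive >= min_p and c.audit_hours > 0]
    eligible.sort(key=lambda c: c.p_positive * c.predicted_amount / c.audit_hours,
                  reverse=True)
    chosen, used = [], 0.0
    for c in eligible:
        if used + c.audit_hours <= hours_budget:
            chosen.append(c.case_id)
            used += c.audit_hours
    return chosen


cases = [
    Case("A", 0.9, 5000, 10),   # expected 450 dollars/hour
    Case("B", 0.2, 50000, 20),  # below min_p: skipped
    Case("C", 0.6, 8000, 8),    # expected 600 dollars/hour
    Case("D", 0.5, 3000, 15),   # expected 100 dollars/hour
]
print(select_cases(cases, hours_budget=20))  # ['C', 'A']
```

Ordering by expected value per hour is a simple greedy heuristic, not an optimal solution to the budget problem; raising `min_p` is the lever that trades coverage for fewer no-change audits.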
Anomaly Detection
- Find emerging trends in claims data: use data mining to reveal emerging patterns in current-year data. Reported results present specific cases that either:
  - exhibit a common pattern, or
  - exhibit an unusual pattern.
- Unusual cases are sent to field investigators for further analysis.
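Flagging "unusual" cases requires some measure of distance from normal. As a deliberately simple, single-variable stand-in (not the method behind Clementine's anomaly detection), the sketch below flags values whose modified z-score, computed from the median and the median absolute deviation (MAD), exceeds 3.5, a commonly used cut-off. The refund amounts are invented for illustration.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. Medians are used so that a few extreme
    claims cannot inflate the spread and hide themselves, as they can with a
    mean/standard-deviation z-score.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # most values are identical; this rule cannot rank them
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical refund amounts (dollars) for one claim segment.
refunds = [1200, 1350, 1100, 1280, 1310, 9800, 1250]
print(robust_outliers(refunds))  # [5]: the 9,800 refund
```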
Case Study: Audit Selection Goals
- Build models to predict different outcomes:
  - Positive adjustment (Y/N)
  - DPH group membership
  - Actual adjustment
- Historical cases selected for the model build:
  - Cases with a prior audit: prior audit and organizational data
  - All cases: organizational data only
- Deployment:
  - For each outcome, combine the predictions for cases with and without previous audit data.
  - For each outcome, predict using organizational data only.
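The deployment rule for the case study can be sketched as routing logic. Assumptions: each case is a dict, a missing or `None` `prior_audit` field means no previous audit data, and `with_audit` and `org_only` are hypothetical stand-ins for two trained models of the same outcome. The names and toy scoring rules are illustrative only.

```python
def score_case(case, model_with_audit, model_org_only):
    """Score one case for one outcome.

    'combined' uses the prior-audit model when prior audit data exists and the
    organizational-data-only model otherwise; 'org_only' always uses the latter.
    """
    org_score = model_org_only(case)
    if case.get("prior_audit") is not None:
        combined = model_with_audit(case)
    else:
        combined = org_score
    return {"combined": combined, "org_only": org_score}


def score_all(cases, model_with_audit, model_org_only):
    """Apply both scoring paths to every case, keeping input order."""
    return [score_case(c, model_with_audit, model_org_only) for c in cases]


# Hypothetical stand-ins for two trained models predicting P(positive adjustment).
def with_audit(case):
    return 0.8 if case["prior_audit"]["positive"] else 0.3


def org_only(case):
    return 0.6 if case["org_size"] > 100 else 0.2


cases = [
    {"prior_audit": {"positive": True}, "org_size": 50},
    {"prior_audit": None, "org_size": 500},
    {"org_size": 10},  # no prior-audit field at all
]
for scores in score_all(cases, with_audit, org_only):
    print(scores)
```

Keeping both scores per case lets the two deployment variants be compared on the same population.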
Clementine Workbench
Case Study Results
Text Mining and Linguistic Extraction
Text Mining Timeline: Text Extraction
Sample text: "Mr. Smith aka Mr. Ahmed was seen on the corner of Church St. and Magnolia Ave. on Nov 13th"
Bag of Words extraction
Mr. Smith (Person) -> aka (Alias) -> Mr. Ahmed (Person) was seen (location) -> Church and Magnolia (address) -> November 13 (Date)
Expressions extraction
Mr. Ahmed in database wanted for questioning
Suspect -> send agent to this location
Mr. Smith aka was seen with Ahmed on the corner of Church, etc.
Named Entities extraction
Mr. Smith was seen Mr. Ahmed corner Church St. Magnolia Ave. Nov 13th
Events/Sentiment Extraction
Mr. Smith -> Person, Mr. Ahmed -> Person, aka -> Alias, was seen -> location, Church St. -> Address, Magnolia Ave. -> Address, Nov 13th -> Date
Combined with structured data
Text Mining Management
- General Dictionaries
- Organization, Location, Name, Phone Number, etc.
- Custom Built Subject Dictionaries
- Tax Code, Form Names, Commodity, Business, etc.
- Interactive Synonym Dictionaries
- Exclude Dictionaries
- NEW! Classification algorithms enable you to
aggregate concepts from a wide variety of
unstructured text data and group them into a
small number of categories.
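The dictionary types listed above can be illustrated with a tiny dictionary-based tagger applied to the sample sentence from the text extraction timeline. This is a minimal sketch, not Clementine's linguistic engine: the type, synonym, and exclude dictionaries and the date pattern are hypothetical, matching is plain lower-case substring search, and only the first occurrence of each term is reported.

```python
import re

# Hypothetical type dictionary: canonical term -> entity type.
TYPE_DICTIONARY = {
    "mr. smith": "Person",
    "mr. ahmed": "Person",
    "aka": "Alias",
    "church st.": "Address",
    "magnolia ave.": "Address",
}
# Hypothetical synonym dictionary: variant spelling -> canonical term.
SYNONYMS = {"church street": "church st.", "magnolia avenue": "magnolia ave."}
# Month name (abbreviated or full) followed by a day number, e.g. "Nov 13th".
DATE = re.compile(
    r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+\d{1,2}(?:st|nd|rd|th)?\b",
    re.IGNORECASE,
)


def extract(text, exclude=frozenset()):
    """Return (term, type) pairs in order of appearance.

    Text is lower-cased and synonyms are mapped to canonical terms first;
    terms listed in `exclude` (an exclude dictionary) are dropped.
    Substring matching can also fire inside longer words.
    """
    lowered = text.lower()
    for variant, canonical in SYNONYMS.items():
        lowered = lowered.replace(variant, canonical)
    hits = []
    for term, kind in TYPE_DICTIONARY.items():
        pos = lowered.find(term)
        if pos != -1 and term not in exclude:
            hits.append((pos, term, kind))
    for m in DATE.finditer(lowered):
        hits.append((m.start(), m.group(0), "Date"))
    return [(term, kind) for _, term, kind in sorted(hits)]


sentence = ("Mr. Smith aka Mr. Ahmed was seen on the corner of "
            "Church Street and Magnolia Ave. on Nov 13th")
print(extract(sentence))
print(extract(sentence, exclude={"aka"}))
```

The input deliberately says "Church Street" so that the synonym dictionary maps it to the canonical "church st." before typing.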
What's New
Binary Classifier: Automation of Many Models
- Sophisticated users build hundreds of models through scripting
- The Binary Classifier node imitates this, but easily, with a pre-built node
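The Binary Classifier node is described as doing in one step what advanced users script by hand: build many models and compare them. A minimal sketch of that loop follows, with two deliberately simple stand-in learners (a majority-class baseline and a one-feature threshold rule). They are illustrative only, not the algorithms the node runs, and the toy data is hypothetical.

```python
from collections import Counter


def train_majority(X, y):
    """Baseline learner: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def train_stump(X, y):
    """One-feature threshold rule chosen to maximize training accuracy."""
    best = (-1, 0, 0.0, 1)  # (correct, feature, threshold, label_if_above)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):
                correct = sum((above if row[f] > t else 1 - above) == label
                              for row, label in zip(X, y))
                if correct > best[0]:
                    best = (correct, f, t, above)
    _, f, t, above = best
    return lambda row: above if row[f] > t else 1 - above


def rank_models(learners, X_train, y_train, X_test, y_test):
    """Train every learner and rank them by holdout accuracy, best first."""
    results = []
    for name, learn in learners.items():
        predict = learn(X_train, y_train)
        correct = sum(predict(row) == label for row, label in zip(X_test, y_test))
        results.append((name, correct / len(y_test)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical toy data: two numeric features per case, label 1 = positive outcome.
X_train = [[1, 5], [2, 3], [8, 4], [9, 6]]
y_train = [0, 0, 1, 1]
X_test = [[1.5, 6], [7, 2]]
y_test = [0, 1]

learners = {"majority": train_majority, "stump": train_stump}
print(rank_models(learners, X_train, y_train, X_test, y_test))
# [('stump', 1.0), ('majority', 0.5)]
```

Scoring every candidate on the same holdout set is what makes the ranking comparable; adding a learner only means adding an entry to `learners`.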
Time Series Algorithm
- ARIMA and exponential smoothing
- Expert Modeler finds the best model automatically
- Forecast multiple series at once
- Data preparation tools
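Exponential smoothing, and automatic selection of its settings, fit in a few lines. The sketch below is simple (level-only) exponential smoothing with a grid search over the smoothing weight `alpha`, keeping the value with the lowest in-sample sum of squared one-step-ahead errors. It is a stand-in for the idea of automatic model selection, not a reproduction of the Expert Modeler, and it does not cover ARIMA.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing: return (next-period forecast, SSE).

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, starting from y_0.
    SSE sums the squared one-step-ahead errors over series[1:].
    """
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2  # error of the forecast made before seeing y
        level = alpha * y + (1 - alpha) * level
    return level, sse


def best_alpha(series, grid=None):
    """Pick the smoothing weight with the lowest in-sample SSE."""
    grid = grid or [i / 10 for i in range(1, 10)]
    return min(grid, key=lambda a: ses_forecast(series, a)[1])


history = [10, 20, 30, 40]  # hypothetical monthly counts with a steady trend
alpha = best_alpha(history)
print(alpha, ses_forecast(history, alpha)[0])
```

On trending data like this the search drifts toward large `alpha`, because simple smoothing has no trend term; a model with a trend component, or scoring on a held-out final period, is the more honest comparison.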
Optimal Binning
- Splitting numeric data into sub-ranges
- New capability: make the bins optimal for prediction
Figure: existing capability (equal bins) vs. new capability (optimal bins)
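The contrast between equal and optimal bins can be shown with one cut point. A minimal sketch, assuming a binary target: equal-width cuts ignore the target, while the supervised cut is the midpoint that minimizes the weighted class entropy of the two resulting bins. This is a simplified, single-split illustration in the spirit of entropy-based supervised binning, not Clementine's algorithm; the values and labels are hypothetical.

```python
from math import log2


def entropy(labels):
    """Binary (0/1) class entropy in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))


def equal_width_cuts(values, k):
    """Equal-width binning: k intervals of the same width, target ignored."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def best_supervised_cut(values, labels):
    """Midpoint cut that minimizes weighted class entropy of the two bins."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


values = [1, 2, 3, 4, 10, 11, 12, 40]  # hypothetical numeric field
labels = [0, 0, 0, 0, 1, 1, 1, 1]      # hypothetical target (1 = positive)
print(equal_width_cuts(values, 2))        # [20.5]
print(best_supervised_cut(values, labels))  # 7.0
```

The single outlier at 40 drags the equal-width cut to 20.5, which lumps the positives 10 to 12 in with the negatives; the supervised cut at 7.0 separates the classes cleanly.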
SPSS Reporting
- SPSS statistics and graphs within Clementine
Configuration Management
- Predictive Enterprise Services (PES): top four benefits
Deployment and Integration
- Exporting Data, Models and Streams
1. Improve Collaboration
- A single project can produce a large number of models and model versions, with:
  - different output variables
  - different algorithms
  - different settings
  - different training samples
- multiplied across:
  - different data sets
  - different users
  - different locations
2. Improve Transparency
- Show which models are run on which data.
- To meet audit standards, track who made changes to a model and when.
From their desktops, your analytics team can see which models were most recently run on which data, so they can provide this information for internal audits.
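The transparency goal comes down to an append-only record of model versions and runs. A minimal sketch that assumes nothing about how PES stores this: `ModelRegistry`, its methods, and the model, user, and dataset names are all hypothetical, and records live only in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ModelHistory:
    versions: list = field(default_factory=list)  # (version, user, timestamp, note)
    runs: list = field(default_factory=list)      # (version, dataset, user, timestamp)


class ModelRegistry:
    """Hypothetical in-memory registry: who changed or ran which model, and when."""

    def __init__(self):
        self._models = {}

    def save_version(self, name, user, note):
        """Record a new model version and return its number (1, 2, ...)."""
        history = self._models.setdefault(name, ModelHistory())
        version = len(history.versions) + 1
        history.versions.append((version, user, datetime.now(timezone.utc), note))
        return version

    def record_run(self, name, dataset, user):
        """Log that the latest version of `name` was run on `dataset`."""
        history = self._models[name]  # KeyError if the model was never saved
        latest = history.versions[-1][0]
        history.runs.append((latest, dataset, user, datetime.now(timezone.utc)))

    def changes(self, name):
        """Audit view: every saved version with its author and timestamp."""
        return list(self._models[name].versions)

    def last_run(self, name):
        runs = self._models[name].runs
        return runs[-1] if runs else None


registry = ModelRegistry()
registry.save_version("refund_risk", "analyst_a", "initial build")
registry.save_version("refund_risk", "analyst_b", "new training sample")
registry.record_run("refund_risk", "current_claims", "analyst_a")
for version, user, when, note in registry.changes("refund_risk"):
    print(version, user, when.isoformat(), note)
print(registry.last_run("refund_risk")[:3])  # (2, 'current_claims', 'analyst_a')
```

Because entries are only ever appended, the history doubles as the audit trail; a real deployment would persist it and restrict who may write to it.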
3. Automate Processes
- Combine Clementine, SPSS, SAS, and other processes
4. Centralize and Control Access
Contact Information
- Project personnel
  - Ted Fischer: ted_fischer_at_sra.com or theodore.i.fischer_at_irs.gov, 301-731-3534
  - Anthony Colyandro: anthony_colyandro_at_sra.com or anthony.colyandro_at_irs.gov, 301-731-3524
- SRA Director of Business Intelligence
  - Dave Vennergrund: dave_vennergrund_at_sra.com, 703-803-1614
How do I get SPSS software?
IRS
- Cathy J. Allen, Enterprise System Management, Software Management Section, Idea Branch, MS 5850
- (304) 264-7279 (voice), (304) 279-5309 (cell), (304) 260-3033 (fax)
- cathy.j.allen_at_irs.gov
SPSS contacts
- Account Executive: Sarah Mattingly, smattingly_at_spss.com, 703-740-2446 (work), 703-389-6485 (cell)
- Account Manager: Matt Madden, 312-651-3894 (work)