Title: Data and Text Mining
1 Data and Text Mining
- 2006 Digital Now
- Orlando, FL
- Kevin Whorton, Principal
- Whorton Marketing Research
- Columbia, Maryland
- info_at_kwhorton.com
2 Overview
- Case Studies: SHRM, TMA, CRS
- General Principles
3 SHRM
4 SHRM: Business Intelligence Implementation
- Ongoing, 4-year, iterative process
- Required extensive up-front buy-in and definition building
- Forward-thinking core issues driving the BI Environment
- What problems are we trying to solve?
- What questions are we trying to answer?
5 SHRM: User Audiences for BI
- Different layers of interaction
- Executive/C-Suite
- Business leaders/Program Managers
- End consumer/Operational-level users
- These groups will have different perspectives on
what the BI Environment is and what it does, and
will create different motivations for its growth
6 SHRM: Interfaces/Tools
- Microsoft DTS as the ETL tool
- Cognos BI Toolset for reporting and analysis
- Embedded in MS SharePoint
- Designed as a one-stop shop for business intelligence
7 SHRM: Status and Lessons Learned
- Outlay has been substantial (over $2MM)
- ROI has been good
- Learning experiences
- Starting with "empowered users"
- Leaving with somewhat defined queries to ensure consistency of information
- Information "depth of use" much greater
8 TMA
9 Texas Medical Association: Data Warehouse/Mart/Mining
- Data warehouse
- Organize the information scattered among different sources and store it in a data warehouse.
- Extract scattered/incompatible data from different sources, transform by cleaning, making consistent
- Data mart
- More specialized cut from data warehouse
- Source files of transactional records
- Contain all data needed for related group of analyses
- Data is summarized in tables at appropriate levels
- Data Mining
- Enhance decision support by adding tools to access/analyze contents
- Process of selecting, exploring, and modeling large amounts of data to uncover previously unknown patterns
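The warehouse-then-mart flow described on this slide can be sketched in a few lines. This is a minimal illustration, not TMA's implementation: the two source extracts, their field names, and the `revenue_by_specialty` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source extracts with inconsistent field names and formats.
membership_rows = [
    {"MemberID": "001", "Specialty": " cardiology ", "Dues": "450.00"},
    {"MemberID": "002", "Specialty": "Pediatrics", "Dues": "300"},
]
event_rows = [
    {"member_id": 1, "specialty": "CARDIOLOGY", "fee": 125.0},
    {"member_id": 3, "specialty": "pediatrics", "fee": 95.0},
]


def transform(row, id_key, specialty_key, amount_key, source):
    """Clean one source row into the warehouse's common layout."""
    return (
        int(row[id_key]),                    # "001" and 1 become the same key
        row[specialty_key].strip().title(),  # consistent spelling and case
        float(row[amount_key]),              # text amounts become numbers
        source,
    )


def build_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(member_id INTEGER, specialty TEXT, amount REAL, source TEXT)"
    )
    cleaned = [transform(r, "MemberID", "Specialty", "Dues", "dues") for r in membership_rows]
    cleaned += [transform(r, "member_id", "specialty", "fee", "event") for r in event_rows]
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", cleaned)
    # Data mart: a summarized cut of the warehouse for one group of analyses.
    conn.execute(
        "CREATE TABLE revenue_by_specialty AS "
        "SELECT specialty, COUNT(DISTINCT member_id) AS members, SUM(amount) AS revenue "
        "FROM transactions GROUP BY specialty"
    )
    return conn


conn = build_warehouse()
mart = conn.execute(
    "SELECT specialty, members, revenue FROM revenue_by_specialty ORDER BY specialty"
).fetchall()
print(mart)
```

The cleaning step is where most of the effort usually goes: until member IDs and specialty labels agree across sources, the summarized mart table cannot be trusted.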
10 Comparison: Traditional vs. New Data Management
- Current Report Environment
- Are your reports the same as what you used 5 years ago?
- Do they lead you to ask new questions?
- If you think of new questions, do you ever get the answers?
- Decision Cubes
- Ask questions and receive immediate answers
- New information views, prompting new questions
- Compare historical information
11 TMA: Decision Cubes
- Multi-dimensional data representation
- Can be viewed from different perspectives.
- Enables manipulation of parameters
- Derive metrics about your operations/association
- Displays results immediately
- Includes graphs and charts directly from the
data
A cube aggregates the facts in each level of each
dimension
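To make "a cube aggregates the facts in each level of each dimension" concrete, here is a small sketch that pre-computes every rollup. The fact rows, the three dimensions, and the `ALL` marker are hypothetical; a product such as SQL Server Analysis Services does this at far larger scale and with hierarchies inside each dimension.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (year, region, member_type, dues_revenue).
facts = [
    (2005, "North", "Physician", 400.0),
    (2005, "South", "Physician", 350.0),
    (2005, "South", "Student", 50.0),
    (2006, "North", "Physician", 420.0),
    (2006, "South", "Student", 60.0),
]

ALL = "ALL"  # marker for "rolled up across this dimension"


def build_cube(rows):
    """Sum the measure for every combination of dimension levels."""
    cube = defaultdict(float)
    for *dims, measure in rows:
        # Each dimension contributes either its own value or the ALL rollup.
        for key in product(*[(value, ALL) for value in dims]):
            cube[key] += measure
    return dict(cube)


cube = build_cube(facts)
print(cube[(ALL, ALL, ALL)])            # grand total
print(cube[(2005, ALL, ALL)])           # one year, all regions and types
print(cube[(ALL, "South", "Student")])  # one region and type, all years
```

Because every combination is computed once up front, each new question becomes a lookup rather than a new report, which is why results appear immediately.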
12 TMA: Implementation
- System Requirements
- Microsoft SQL Server includes Analysis Services as part of license
- If unfamiliar, technology staff probably hasn't installed it!
- Up-front costs
- Direct cost of one week of decision cube consultant's time: $8,000 to $10,000
- Indirect cost for IT staff: 2 weeks' effort up front, ongoing future enhancements
13 TMA: Analytic Applications
- Applications enable users to access and manipulate warehouse data for better-informed decisions
- Demand forecasting, pricing, competitor analysis and customer segmentation
- Tools: ProClarity, Excel, Cognos
- Costs
- $700 to $35,000: ProClarity single-user and browser-based, multiple-user systems
14 ProClarity Illustration
15 TMA Analysis: Excel Illustration
16 TMA: Status and Lessons Learned
- Outlay also substantial
- Return has been even greater
- TMA's choice, but associations can achieve same impact with just the data visualization tools described
- Sophisticated target marketing ensures greater acquisition
- Market penetration steadily increasing
- Value in drilling down: 45,000 total physicians in TX
- Ability to tailor, promote targeted CME and other services
- Odd that no other associations choose to use these tools
- It really is easy to implement in far smaller associations
17 CRS
18 Need Driven by Market Size/Access
Total US - 273 Million
Total Catholic - 65 Million
Typically Attend Mass - 20 Million
CRS Aware - 14 Million
Donors - 400,000
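The audience figures above read as a narrowing funnel, and the step-to-step ratios show where the market shrinks fastest. The sketch below assumes each group is a subset of the one listed before it, which the slide implies but does not state; the function name is illustrative.

```python
# Audience sizes as listed on the slide.
funnel = [
    ("Total US", 273_000_000),
    ("Total Catholic", 65_000_000),
    ("Typically Attend Mass", 20_000_000),
    ("CRS Aware", 14_000_000),
    ("Donors", 400_000),
]


def step_rates(stages):
    """Return (stage, share of previous stage, share of first stage)."""
    top = stages[0][1]
    rates = []
    for (_, prev_size), (name, size) in zip(stages, stages[1:]):
        rates.append((name, size / prev_size, size / top))
    return rates


rates = step_rates(funnel)
for name, of_previous, of_total in rates:
    print(f"{name}: {of_previous:.1%} of previous stage, {of_total:.2%} of total US")
```

On these numbers the sharpest drop is the last one: donors are a little under 3% of those already aware of CRS.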
19 CRS: Large-Charity Illustration
- Large repository of data to analyze
- Contacts: 12MM acquisition, 8MM house annual contacts
- Categorized by content, media, vehicle
- Response: 750k gifts
- Data available: vehicle (mail, phone, online), method of payment, one-time and monthly gifts
- Donations, not memberships: varying amounts, upgrade/downgrade behavior
- Weaknesses/impediments
- Lack of strategic technical advisors
- Poor service: donor management system vendor, staff DBMS
- Limited executive understanding of issues
- Hostile environment for direct marketing in fundraising/marketing mix
- Hard to hire experienced analysts in Baltimore/non-profit market
- Limited discretionary budgets for technology
20 CRS: Assets Available to Us
- Solid report writing
- All campaign-level results available by RFM segments (recency, frequency, monetary value)
- Existing staff very tactical, great memories
- Periodic investments in "snapshot" analyses/decision tools
- Target Analysis Group: individual program assessment and benchmarking
- Amergent: list life cycle analysis (LTV of acquired donors)
- Great latitude for action
- My charge: "take $50 million program and double it in two years"
- Free to create new positions, retrain and retitle existing staff
- Strong research and branding support
- Disaster relief allowed us to make periodic major changes
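For readers new to RFM, the sketch below shows one way campaign results could be keyed to recency, frequency, and monetary value. The gift records and the cut points (12 months, 2 gifts, $100) are illustrative assumptions, not CRS's actual segment definitions.

```python
from datetime import date

# Hypothetical gift history: (donor_id, gift_date, amount).
gifts = [
    ("A", date(2005, 11, 1), 25.0),
    ("A", date(2006, 1, 15), 50.0),
    ("B", date(2004, 3, 10), 500.0),
    ("C", date(2006, 2, 1), 10.0),
]


def rfm_profile(gift_rows, as_of):
    """Summarize each donor's recency (days), frequency, and monetary value."""
    profile = {}
    for donor_id, gift_date, amount in gift_rows:
        days_ago = (as_of - gift_date).days
        if donor_id not in profile:
            profile[donor_id] = {"recency_days": days_ago, "frequency": 0, "monetary": 0.0}
        entry = profile[donor_id]
        entry["recency_days"] = min(entry["recency_days"], days_ago)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return profile


def rfm_segment(entry):
    """Label a donor with illustrative cut points (assumptions, not CRS's)."""
    recency = "0-12mo" if entry["recency_days"] <= 365 else "13+mo"
    frequency = "multi" if entry["frequency"] >= 2 else "single"
    monetary = "$100+" if entry["monetary"] >= 100 else "<$100"
    return f"{recency} / {frequency} / {monetary}"


profile = rfm_profile(gifts, as_of=date(2006, 3, 1))
segments = {donor: rfm_segment(entry) for donor, entry in profile.items()}
print(segments)
```

Once every donor carries a segment label, campaign results can be grouped by that label to compare response across segments.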
21 "Oops" Areas: Focusing on the Doable
- Established staff of "data kids"
- Self-funding: moved merge/purge in-house to save $250k per year in expenses
- Able to refine/diagnose archaic methods of campaign-level data extraction over time
- Well-trained: able to master, embed SQL queries
- Used research to provide behavioral insights
- Link giving to attitudes, motivations, other behaviors
- Necessary to generate new ideas, guide new campaigns
- Outsourced predictive models
- Worked with Genalytics to score new acquisition files, based on 40 million past acquisition contacts
22 Analysis: Program Applications
- Ad hoc capabilities make new programs possible
- Upgrade/downgrade analysis
- Creation of custom gift arrays: thorny issue with emergency giving
- All "asks" (gift arrays) built off historical giving
- Tsunami yielded average 280 temporary upgrade
- Record selection for mid-level programs
- Special, expensive multi-step direct mail campaigns intentionally asking for 3-5x highest prior contribution
- Originally based on seasonal and lifetime giving
- Over time, added overlays to measure capacity to give
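A gift array built off historical giving can be sketched as follows. Only the 3-5x multipliers for mid-level asks come from the slide; the standard multipliers, the rounding to $5, and the rule of setting aside emergency gifts are assumptions made for illustration, the last being one possible answer to the emergency-giving problem.

```python
def round_to_step(amount, step=5):
    """Round an ask to the nearest multiple of `step` dollars (at least one step)."""
    return max(step, int(amount / step + 0.5) * step)


def highest_regular_gift(gifts):
    """Highest prior gift, ignoring emergency gifts when regular gifts exist."""
    regular = [amount for amount, is_emergency in gifts if not is_emergency]
    pool = regular or [amount for amount, _ in gifts]
    return max(pool)


def gift_array(highest_prior_gift, multipliers=(1.0, 1.5, 2.0)):
    """Build an ask string from a donor's highest prior gift."""
    return [round_to_step(highest_prior_gift * m) for m in multipliers]


def mid_level_array(highest_prior_gift):
    """Mid-level appeal: ask for 3x, 4x, and 5x the highest prior gift."""
    return gift_array(highest_prior_gift, multipliers=(3, 4, 5))


# Hypothetical donor: two regular gifts and one much larger emergency gift.
history = [(25, False), (100, True), (35, False)]
base = highest_regular_gift(history)

print(gift_array(base))      # standard asks built from the regular high gift
print(mid_level_array(250))  # mid-level asks for a $250 donor
```

Basing the array on the $100 emergency gift instead would nearly triple every ask, which is the risk a temporary upgrade introduces.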
23 Analysis: Donor Management
- Strong tradition of test vs. control in direct mail
- Difficult to draw comparisons with house file mailings
- Examining mail frequency
- We mailed best house file names 24 times per year
- Attrition on an individual level vs. maximizing net revenue
- Applying "sweet spot" analysis to donors
- Mining complaint/comment data
- Donor research
- Focus groups to determine why people open, read
- Laddering interviews to determine positioning
- Online panel survey of universe to segment acquisition market
- Surveying donors to define expectations, satisfaction
24 Improved Donor/Member Relations: Jury-Rigged Use of "Interest Codes"
- Software allowed us to code member by "interest"
- Because we could export data, we categorized donors by many different variables
- After manipulating data, imported back into system
- Allowed us to be more flexible/responsive
- Extending our RFM selects
- Develop profiles of respondents to specific mailings
- Cross-tabbing donor types to learn more
- Proactively identify otherwise apparently good donors for de-selection
25 Analysis: Areas Never Analyzed
- Integration of online data
- Over time 50,000 completely new tsunami givers
- Little link between online visits, e-newsletter reading
- Understanding true payback analysis
- However, our technology investment paid for itself at least tenfold in first two years
- Modeling the cost/effort/results of contacts with first-time donors
- Never understood what truly drives attrition of one-time givers
- Linking CRS behavior to world at large
- Rented names from Target cross-organization "model"
- Never understood "share of wallet" or reasons for defection
26 Traditional DM Analysis
- Graphing linear relationships: finding sweet spots
- Very limiting if relationships are non-linear, changing over time
- And knowing when the relationships really are linear/predictive.
27 Other Analytical Tools: Common Techniques, Answers
- Cross-tabulations: visual representation showing simple relationships between variables/segments
- Complex "grids" allow analysis, audience selection
- Correlations: measure relationships between two variables
- Regressions: most powerful tool
- X = f(x, y, z), or Membership = function of dues level, presence of competition, penetration, service mix
- R² measures how much of the variation in one variable is explained by everything driving it
- Helpful for projections and forecast models
- Logistic regressions allow prediction of yes/no outcomes
- Logarithms yield coefficients explaining percentage contributions
- Dummy variables measure seasonality, time trends, one-time shifts
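Three of the techniques listed above (cross-tabulation, correlation, and a one-variable regression with its R²) are small enough to sketch directly. All data below is hypothetical, and the single-predictor fit is a simplification of the multi-variable membership model the slide mentions.

```python
from collections import Counter
from math import sqrt


def crosstab(rows, row_key, col_key):
    """Count records for each (row value, column value) pair."""
    return Counter((r[row_key], r[col_key]) for r in rows)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simple_regression(xs, ys):
    """Least-squares fit y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical member records and a hypothetical dues/membership series.
members = [
    {"segment": "Physician", "renewed": "yes"},
    {"segment": "Physician", "renewed": "no"},
    {"segment": "Student", "renewed": "yes"},
    {"segment": "Physician", "renewed": "yes"},
]
dues = [100, 150, 200, 250, 300]
membership = [980, 940, 910, 860, 830]

table = crosstab(members, "segment", "renewed")
r = pearson_r(dues, membership)
slope, intercept, r_squared = simple_regression(dues, membership)
print(table[("Physician", "yes")], round(r, 3), round(slope, 3), round(r_squared, 3))
```

With a single predictor, R² is simply the square of the correlation; the two diverge only once more variables are added to the model.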
28 Illustration: Regression Analysis
- Example: renewal program model, avg response rate of 4.25%, avg gift $36.25, revenue/name mailed of $1.54
- Final equation is PR = aR + bF + cM + dO
- Or predicted revenue is a function of donor's recency of giving, frequency, aggregate value, and "other stuff"
- Final result for a sample donor who is exactly average is 1.54 = -0.068(6.5) + 0.215(2.4) + 0.00465(156) + 0.0087(85)
- Confusing, but once the formula is derived
- Real output: scored file prioritizes every prospect, helps control your spend/return
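The arithmetic on this slide can be checked, and the idea of a scored file illustrated, with a short sketch. It reads the equation as PR = aR + bF + cM + dO with no intercept; the signs are inferred from the fact that the four terms then sum to about 1.54, which also matches 4.25% x $36.25. The units of each input are not given in the source, and the two non-average prospects are invented.

```python
# Coefficients read from the slide; the signs and the lack of an intercept
# are inferred from the arithmetic, not stated explicitly in the source.
COEFFICIENTS = {
    "recency": -0.068,
    "frequency": 0.215,
    "monetary": 0.00465,
    "other": 0.0087,
}


def predicted_revenue(donor):
    """Predicted revenue per name mailed: PR = a*R + b*F + c*M + d*O."""
    return sum(COEFFICIENTS[name] * donor[name] for name in COEFFICIENTS)


average_donor = {"recency": 6.5, "frequency": 2.4, "monetary": 156, "other": 85}

# Hypothetical prospects, invented only to show how a scored file is ranked.
prospects = {
    "lapsed_small": {"recency": 18, "frequency": 1, "monetary": 25, "other": 85},
    "average": average_donor,
    "recent_loyal": {"recency": 2, "frequency": 6, "monetary": 400, "other": 85},
}

scored = sorted(prospects, key=lambda name: predicted_revenue(prospects[name]), reverse=True)
print(round(predicted_revenue(average_donor), 2))  # revenue per name, average donor
print(round(0.0425 * 36.25, 2))                    # response rate x average gift
print(scored)
```

Sorting by predicted revenue is what turns the formula into a mailing decision: names scoring below the cost of a contact can be dropped before money is spent on them.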