Title: Know your Constituents Through Effective Data Mining
1. Know your Constituents Through Effective Data Mining
- Karen Matheson
- MR Strategic Services
- CASE District VIII Conference, February 23, 2006
2. Overview
- Definition
- Benefits
- Data Organization
- Data Collection Methods
- Data Profiling vs Data Modeling
- RFM Analysis
- Data Modeling Case Study
- Questions
3. Definition of Data Mining
"Looking for meaning and patterns within the information in your organization's database so you can unlock potential value." (Jennifer Shimp-Bowerman, Bucknell University)
4. What's So Hot About Data Mining?
- It provides a way to organize your fundraising time and resources to cultivate the most promising major gift candidates
- With data mining, you're able to identify traits and characteristics of your top donors
- It allows you to identify the events and activities that contribute most to developing positive relationships with your university or college
- It's a more reliable way of assessing constituents than subjective impressions
5. Data Mining Example
- University of Oregon: major gift predictive model and test outcome
6. How Can You Make a Numerical Judgment About a Person?
- It's all about data collection and storage
- Successful data mining involves having data from which to mine and being able to analyze it quantitatively
7. Data Storage: Show Me the Data!
- Best Database Practices
  - Centralize all advancement constituent data into one database
  - Use a database in which you can create custom fields
  - Coordinate yearly data dumps with organizations across campus:
    - Graduation records
    - Scholarships
    - Student life/activities
    - Individual schools and colleges
  - Use the central database to track events and constituent contact information
  - Build web forms to make the database more user-friendly
8. My Shoes Are in the Dresser and My Clothes Are on the Floor!
- It's extremely important for the data mining process that data is stored in the correct place
- Data in the wrong places could lead to the following problems:
  - Data could be difficult to find
  - Data could be in the wrong format
  - Data could easily be entered incorrectly, leaving hours of data clean-up
  - Data could be completely worthless
9. Common Data Storage Errors
- Inconsistent data entry methods
- Using placeholder data in empty fields
- Storing numeric variables in text format
- Storing variables in the wrong fields
10. Common Data Storage Error Examples
- Storing financial information in a text field allows any kind of formatting in a variable that should only allow a currency value
- Data entry errors, such as leaving off the dollar sign or substituting a period for a comma, can lead to hours of data clean-up
- How do we quantify "$1,000,000" or "<$500,000"?
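To make the clean-up cost concrete, here is a minimal Python sketch of what a text-stored gift amount forces you to do. `parse_currency` is a hypothetical helper, not a feature of any particular database: it accepts plain dollar amounts and returns `None` for anything it cannot turn into a single number, such as a range like "<$500,000" or a period substituted for a comma.

```python
import re
from decimal import Decimal

def parse_currency(text):
    """Convert a text-stored amount such as "$1,000,000" to a Decimal.

    Hypothetical clean-up helper. Returns None when the value cannot be
    quantified as a single amount (e.g. "<$500,000" or "1.000.000").
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    # Accept only digits with an optional cents part of one or two digits.
    if not re.fullmatch(r"\d+(\.\d{1,2})?", cleaned):
        return None
    return Decimal(cleaned)

for raw in ["$1,000,000", "1,000,000", "<$500,000", "1.000.000", "$2,500.50"]:
    print(repr(raw), "->", parse_currency(raw))
```

Each `None` is a record someone has to look up and fix by hand, which is the argument for storing gifts in a true currency field from the start.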
11. Common Data Storage Error Examples
- The Alumni Lunch does not belong in the student activities variable because it was not an activity in which a student could have participated
- Because the student activities field contains both student and non-student activities, we are not able to tally the number of student activities as a way of looking at a constituent's record
- A data miner, database manager, or database user might not know or remember which activities were student activities
The Alumni 2003 Lunch was an event for Alumni Association members (not current students).
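One way to keep the tally possible, sketched in Python under the assumption that each activity record can carry a type code (the field names and activity names below are hypothetical), is to store the activity type explicitly and count only student activities:

```python
from collections import Counter

# Hypothetical activity records: each one carries an explicit type, so
# student and non-student activities are never mixed in one field.
ACTIVITIES = [
    {"constituent_id": 1, "activity": "Marching Band", "type": "student"},
    {"constituent_id": 1, "activity": "Alumni 2003 Lunch", "type": "alumni_event"},
    {"constituent_id": 2, "activity": "Student Senate", "type": "student"},
    {"constituent_id": 2, "activity": "Debate Team", "type": "student"},
]

def student_activity_counts(records):
    """Count student activities per constituent, ignoring other event types."""
    return Counter(r["constituent_id"] for r in records if r["type"] == "student")

print(student_activity_counts(ACTIVITIES))
```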
12. What Types of Data Are Useful for Data Mining?
- Constituent's history with the institution
  - Graduation year
  - Student activities
  - Current involvement
- Demographics
  - Age
  - Gender
  - Income
  - Households with children
- Professional and community involvement
  - Board memberships
  - Club/activity memberships
- Donor history
13. Where Do I Find All This Data?
- Alumni Database
- Donor Database
- Individual Schools and Colleges
- Online/Offline Surveys
- Census Data
- Voter File Data
- Data Vendors
15. Data Profiling vs. Data Modeling
- Data Profiling
  - Can be used to describe segments of your organization's database, such as:
    - All medical school alumni
    - Donors who have made gifts of $25,000 or more
  - Uses descriptive statistics such as means (averages), medians (mid-points), and modes (most common data points)
- Data Modeling
  - Uses inferential statistics to draw inferences (conclusions)
  - Uses a random sample from your organization's database
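As a small illustration of the descriptive statistics named above, Python's standard `statistics` module computes all three. The giving totals below are made-up values for a hypothetical segment, not real donor data.

```python
import statistics

# Hypothetical lifetime giving totals (in dollars) for one profiled segment,
# e.g. donors who have made gifts of $25,000 or more.
segment_giving = [25_000, 30_000, 25_000, 48_000, 120_000, 27_500]

print("mean:  ", statistics.mean(segment_giving))    # average
print("median:", statistics.median(segment_giving))  # mid-point
print("mode:  ", statistics.mode(segment_giving))    # most common value
```

In skewed data like this, one very large donor pulls the mean well above the median, which is why it helps to report both.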
16. Donor Profiling Steps
- Figure out the set of constituents we want to study
  - Example: All donors who made past gifts totaling $25,000 to the music school
- What demographic information do we want to look at?
  - Example: Age, gender, average income
- What constituent information do we want to include?
  - Example: Type of degree, event attendance
- Analyze demographic values according to giving factors such as:
  - Gift total for the group, average lifetime giving, median gift total, last campaign giving
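The profiling steps can be sketched in Python as follows. The donor records are hypothetical, the ten-year age bands are an assumption, and the giving factors computed per band (count, group total, average, and median lifetime giving) are a subset of those listed above.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records for the segment being profiled.
donors = [
    {"age": 47, "lifetime_giving": 26_000},
    {"age": 52, "lifetime_giving": 40_000},
    {"age": 58, "lifetime_giving": 25_000},
    {"age": 61, "lifetime_giving": 90_000},
    {"age": 73, "lifetime_giving": 31_000},
]

def age_band(age):
    """Bucket an age into a ten-year band such as '45-54' (band edges are an assumption)."""
    low = 5 + 10 * ((age - 5) // 10)
    return f"{low}-{low + 9}"

def profile_by_age(records):
    """Donor count, group total, average, and median lifetime giving per age band."""
    bands = defaultdict(list)
    for r in records:
        bands[age_band(r["age"])].append(r["lifetime_giving"])
    return {
        band: {
            "donors": len(gifts),
            "total": sum(gifts),
            "average": mean(gifts),
            "median": median(gifts),
        }
        for band, gifts in sorted(bands.items())
    }

for band, stats in profile_by_age(donors).items():
    print(band, stats)
```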
17. Demographic Profile: Age
- Examining the age of the 1,000 music school donors in the $25,000 group
18. Recency, Frequency, and Monetary Value (RFM) Analysis
RFM is a useful strategy for predicting the future value of donors and their likelihood of responding to direct mail marketing efforts.
- A scoring or rating system based on:
  - How recently has a donor contributed?
  - How often does a donor give?
  - How much does a donor give?
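The three RFM inputs can be derived from a plain gift history. This Python sketch assumes a hypothetical list of (donor, gift date, amount) records and measures recency as days since the last gift, frequency as the number of gifts, and monetary value as the total given; other definitions, such as average gift size, are also possible.

```python
from datetime import date

# Hypothetical gift history: (donor_id, gift_date, amount in dollars).
gifts = [
    ("A", date(2005, 11, 1), 100),
    ("A", date(2004, 12, 15), 50),
    ("B", date(2003, 6, 30), 1_000),
    ("C", date(2005, 12, 20), 25),
    ("C", date(2005, 6, 1), 25),
    ("C", date(2004, 6, 1), 25),
]

def rfm_values(history, as_of):
    """Return {donor: (days since last gift, number of gifts, total given)}."""
    last, count, total = {}, {}, {}
    for donor, when, amount in history:
        last[donor] = max(last.get(donor, when), when)
        count[donor] = count.get(donor, 0) + 1
        total[donor] = total.get(donor, 0) + amount
    return {d: ((as_of - last[d]).days, count[d], total[d]) for d in last}

values = rfm_values(gifts, as_of=date(2006, 1, 1))
print(values)
```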
19. Limitations of RFM
- Works best on larger databases
- It only rates donors
  - Not designed to rate non-donors
- It is a fluid rating system that needs to be continually updated
20. RFM Scoring System
- RFM analysis produces "scores" that rank donors relative to each other on the likelihood that they will repeat whatever action is being profiled
- Sample scoring system:
  - Each category (recency, frequency, monetary value) receives a score of 1 to 5 based on its quintile ranking
  - The bottom 20% of each category would receive a 1; the top 20% would receive a 5
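The sample 1-to-5 quintile scoring can be sketched in Python as below. It is a minimal illustration rather than a standard implementation: ties are broken by sort order, and recency is scored so that fewer days since the last gift earns a higher score. The donor IDs and values in the usage lines are hypothetical.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each donor 1-5 by quintile rank within `values` (donor -> raw measure).

    The bottom 20% receive 1 and the top 20% receive 5. Ties keep their
    original order (sorted() is stable), a simplification for illustration.
    """
    # Order from weakest to strongest so that rank 0 is the weakest donor.
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ordered)
    return {donor: 1 + (5 * rank) // n for rank, donor in enumerate(ordered)}

def rfm_scores(recency_days, frequency, monetary):
    """Combine three quintile scores into an (R, F, M) tuple per donor."""
    r = quintile_scores(recency_days, higher_is_better=False)  # fewer days = better
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return {donor: (r[donor], f[donor], m[donor]) for donor in recency_days}

# Hypothetical raw values for ten donors, d0 ... d9.
recency = {f"d{i}": 30 * i for i in range(10)}          # days since last gift
frequency = {f"d{i}": i + 1 for i in range(10)}         # number of gifts
monetary = {f"d{i}": 100 * (i + 1) for i in range(10)}  # total dollars given
scores = rfm_scores(recency, frequency, monetary)
print(scores["d0"], scores["d9"])  # (5, 1, 1) (1, 5, 5)
```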
21. Predictive Modeling
- Predictive modeling uses inferential statistics to predict future behavior
- It is a useful tool for segmenting and ranking large amounts of data in order to:
  - Identify annual/major/planned giving prospects
  - Ensure strategic planning
22. Before You Even Start
- You should ask the following questions:
  - What do we want to predict?
  - What's the budget?
  - Do I have the statistical ability?
  - What other offices/departments need to be involved?
  - How much time will this take?
  - How will this accomplish our goals?
  - What does our donor database look like?
  - How would we implement the results of a model?
23. University of Oregon Model Example
- What do we want to predict?
  - Who is most likely to give a major gift to the University
- What's our budget?
  - Didn't have one
- Do I have the statistical ability?
  - Sure, I took some statistics courses in college and have some dusty textbooks
- What other offices or departments need to be involved?
  - Information Services
  - Records and Receipting
24. University of Oregon Model Example
- How much time do we have?
  - One year
- How will this accomplish our goals?
  - Identification of new prospects
  - Assist fundraisers with prospect prioritization
- What does our database look like?
  - Not pretty, but workable
- How would we implement the results of a model?
  - Fill in regional fundraiser appointment schedules with top-rated prospects
  - Conduct research on top-rated individuals not currently managed
25. Data Modeling Steps
- Obtain a random sample
- Split the sample in two and set the second half aside
- Using the first half of the sample, identify variables that affect major giving
- Develop a scale of measurement (model)
- Test the model on the second half of the sample
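The sampling and split steps can be sketched in Python with the standard `random` module. The population and sample sizes below mirror the University of Oregon figures (300,000 records, 10,000 sampled), but the IDs are hypothetical stand-ins for real constituent records, and the seeds exist only to make the split reproducible.

```python
import random

def split_sample(sample_ids, seed=None):
    """Shuffle a random sample and split it into a model-building half
    and a holdout half that is set aside for testing the model."""
    ids = list(sample_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical constituent IDs 1..300,000; draw 10,000 without replacement.
population = range(1, 300_001)
sample = random.Random(42).sample(population, 10_000)
build, holdout = split_sample(sample, seed=7)
print(len(build), len(holdout))  # 5000 5000
```

Keeping the holdout half untouched until the model is finished is what makes the final test honest.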
26. Obtain Random Sample
- Decide what data you'd like to pull out of the database so you can test it against the outcome you want to predict
- Figure out how many people you'll need in your sample; I'd recommend using the sample size calculator on this website:
  - http://www.surveysystem.com/sscalc.htm
- When you get the random sample back, put half of the sample aside for testing the model
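If you would rather compute a sample size yourself, online calculators of this kind typically apply the standard formula for estimating a proportion, with a finite population correction. The Python sketch below assumes a 95% confidence level (z = 1.96), a ±5-point margin of error, and p = 0.5 as defaults; these are illustrative choices, not necessarily the calculator's exact settings.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Estimate a sample size for a proportion, with finite population correction.

    margin: desired margin of error (0.05 = +/-5 percentage points)
    z: z-score for the confidence level (1.96 is about 95%)
    p: expected proportion; 0.5 is the most conservative assumption
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return math.ceil(n)

print(sample_size(300_000))  # 384
```

A tighter margin of error or a higher confidence level raises the number.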
27. Random Sample
Out of the University of Oregon's 300,000-person database, a random sample of 10,000 individuals, along with their associated data, was drawn.
28. Random Sample
29. Variables Galore!
- There are a few types of variables we're going to come across in our sample:
  - Quantitative variables
    - Variables that are measured as a number for which meaningful arithmetic operations make sense
    - Examples: height, age, GPA, salary, temperature
  - Categorical variables
    - Variables whose values fall into one of several possible categories; categorical variables have no numerical meaning
    - Examples: gender, political affiliation, field of study
  - Ordinal variables
    - A special type of categorical variable whose levels can be naturally ordered
    - Example: Taco Bell hot sauce (medium, hot, fire)
30. Statistics Software
- Slap all of the data from the random sample into statistics software:
  - Excel Data Analysis (Microsoft Office)
  - DataDesk (www.datadesk.com)
  - SPSS (www.spss.com)
  - SAS (www.sas.com)
  - Minitab (www.minitab.com)
31. Start Testing All of the Variables
- How do I do that? Here's where those dusty statistics textbooks come in handy:
  - Correlation
    - Use this test for quantitative variables
    - Examples: age, number of gifts
  - Independent samples t-test
    - Use this test for categorical variables with only two categories
    - Examples: gender, membership in the Greek system
  - One-way ANOVA
    - Use this test for categorical variables with more than two categories
    - Examples: different states, Taco Bell hot sauce
  - Linear regression
32. So We Tested and Found
- Through correlation testing, it was found that UO major giving was moderately correlated with age, number of UO events attended, and number of gifts and pledges
- Through t-tests, it was found that there were significant differences in the major gift averages of the following groups: Greek membership, Alumni Association membership, former membership in one or more student activities, and current or past membership in one or more post-college UO volunteer organizations/boards
33. Identify Scale of Measurement
A simple way of developing a rating system is to use a nominal scale:
- In this method, a one or a zero is assigned for inclusion in or exclusion from each category included in the model
- Add each individual's ones and zeros together for a total score
34. Assigning Values for Categorical Data
- In the UO example, an individual would receive a score of one for each of the following:
  - Greek membership
  - Alumni Association membership
  - Former membership in at least one student activity
  - Current or past membership in at least one post-college university volunteer organization/board
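Summing the ones and zeros is straightforward to sketch in Python. The field names below are hypothetical stand-ins for however your database records these four UO categories.

```python
# The four UO categorical predictors; these field names are hypothetical.
UO_INDICATORS = (
    "greek_member",
    "alumni_association_member",
    "former_student_activity",
    "post_college_volunteer_or_board",
)

def categorical_score(person):
    """Add one point for each indicator the person has; missing fields count as 0."""
    return sum(1 for field in UO_INDICATORS if person.get(field, False))

alice = {
    "greek_member": True,
    "alumni_association_member": True,
    "former_student_activity": False,
    "post_college_volunteer_or_board": True,
}
print(categorical_score(alice))  # 3
```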
35. Assigning Values for Quantitative Data
1. Split the quantitative variable into groups
   a. Example: age groups 45-54, 55-64
2. Determine the % of individuals in the random sample who fall in each value (A)
3. Determine the % of $20,000 donors who fall in each value (B)
4. Assign a score of 1 to those values in which B is higher than A
5. Assign a score of 0 to those values in which B is lower than A
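The quantitative scoring steps can be sketched in Python as follows. This reads B as the percentage of the $20,000 donors who fall in each group, which is an interpretation of the slide; it gives the same 1/0 decision as comparing each group's donor rate with the donor rate of the whole sample. Ties (B equal to A) are scored 0 here, which is also an assumption. The group labels and counts are hypothetical.

```python
def score_groups(sample_groups, donor_groups):
    """Assign 1 to groups over-represented among major donors, else 0.

    sample_groups: group label for every individual in the random sample
    donor_groups: group label for each $20,000 donor in that sample
    A = share of the sample in the group; B = share of donors in the group.
    Ties (B == A) score 0, an assumption the slide does not address.
    """
    scores = {}
    for group in sorted(set(sample_groups)):
        a = sample_groups.count(group) / len(sample_groups)
        b = donor_groups.count(group) / len(donor_groups)
        scores[group] = 1 if b > a else 0
    return scores

# Hypothetical age groups for a 10-person sample and its 3 major donors.
sample_ages = ["45-54"] * 4 + ["55-64"] * 4 + ["65-74"] * 2
major_donor_ages = ["45-54"] + ["55-64"] * 2
print(score_groups(sample_ages, major_donor_ages))
```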
36. Quantitative Data Example
37. Testing Predictive Model
After assigning values to all predictor variables, add up the scores for each individual and report on your findings.
UO Example
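One way to report on the holdout half, sketched in Python: group the test individuals by total score and check the share who are major donors at each score. The (score, is-major-donor) pairs below are hypothetical; if the model works, the share should rise with the score.

```python
from collections import defaultdict

def major_donor_rate_by_score(holdout):
    """Share of holdout individuals who are major donors, for each total score.

    holdout: iterable of (total_score, is_major_donor) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # score -> [major donors, people]
    for score, is_major in holdout:
        counts[score][0] += int(is_major)
        counts[score][1] += 1
    return {s: donors / people for s, (donors, people) in sorted(counts.items())}

# Hypothetical holdout results.
holdout_results = [(0, False), (0, False), (0, True), (0, False),
                   (2, True), (2, False), (4, True), (4, True)]
print(major_donor_rate_by_score(holdout_results))  # {0: 0.25, 2: 0.5, 4: 1.0}
```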
38. Testing Predictive Model
39. Implement Predictive Model
- Make the predictive model rating easy to understand
  - Example: A = great major donor prospects, B = good major donor prospects, C = unlikely major donor prospects, D = extremely unlikely major donor prospects
- Present the model to staff
  - Keep the presentation simple and avoid confusing jargon; leadership and/or fundraisers typically just want to know the predictors and how they impact giving
- Work with the Information Services department to put the rating where staff can easily access it
- Test the model in a year or two to evaluate how well it worked
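Translating total scores into letter ratings can be as simple as the Python sketch below. The cut-off scores are hypothetical: the presentation does not give them, so they would need to be set from your own holdout results.

```python
# Hypothetical cut-offs (minimum total score, grade), checked from highest down.
# Not from the presentation; set them from your own holdout results.
GRADE_CUTOFFS = ((6, "A"), (4, "B"), (2, "C"))

def letter_grade(total_score):
    """Translate a total model score into an A-D prospect rating."""
    for minimum, grade in GRADE_CUTOFFS:
        if total_score >= minimum:
            return grade
    return "D"

print([letter_grade(s) for s in (7, 5, 2, 0)])  # ['A', 'B', 'C', 'D']
```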
40. Questions?
- Contact Information
  - Karen Matheson
  - MR Strategic Services
  - 615 2nd Avenue, Suite 550
  - Seattle, WA 98104
  - (206) 447-9089
  - kmatheson_at_mrss.com