Title: Predict student behavior to increase retention
1Predict student behavior to increase retention
- Online seminar presented by
- Jing Luan, Ph.D., Cabrillo College
- Bob Valencic, SPSS Inc.
- August 22, 2002
2Seminar agenda
- Business issues in higher education
- How to predict student behavior and increase
retention?
- Data mining concepts
- Data mining methods
- Case studies
- Getting started on data mining
- Q&A
3Higher education business issues
- Institutional effectiveness
- Student learning outcome assessment
- Enrollment management
- Achieving optimum attraction, retention and
persistence goals
- Marketing
- Increasing competition for students
- Alumni
How can data mining help?
4Institutional effectiveness
Getting to know your students
- Which students make greatest use of institutional
services?
- What courses provide high full-time equivalent
students (FTES) and allow better use of space?
- What are the patterns in course taking?
- What courses tend to be taken as a group?
5Enrollment management
Helping your students succeed
- Who are our best students?
- Where do our students come from?
- Who is most likely to return for another
semester?
- Who is most likely to fail or drop out?
6Marketing
Making the best use of tight budgets
- Who is most likely to respond to our new
campaign?
- Which type of marketing/recruiting works best?
- Where should we focus our advertising and
recruiting?
7Alumni
Continuing the relationship
- What are the different types/groups of alumni?
- Who is likely to pledge, for how much, and when?
- Where and on whom should we focus our fundraising
drives?
8Our focus today: Predicting student behavior
- Acquiring new students
- Retaining students
- Increasing persistence to and beyond graduation
9Data mining defined
- The process of discovering meaningful new
correlations, patterns, and trends by sifting
through large amounts of data stored in
repositories and by using pattern recognition
technologies as well as statistical and
mathematical techniques.
- The Gartner Group
10Another definition
- Simply put, data mining is used to discover
patterns and relationships in your data in order
to help you make better business decisions.
- Robert Small, Two Crows
11CRISP-DM
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
12Two types of data mining
- Supervised
- Purpose: For classification and estimation
- Algorithms
- C5.0
- CRT
- Neural Network, etc.
- Unsupervised
- Purpose: For clustering and association
- Algorithms
- Kohonen
- K-means
- TwoStep
- GRI, etc.
13Algorithm vs. model
- Algorithm
- A technical term describing a specific
mathematically driven data mining function
- Model
- A set of representative rules, behaviors or
characteristics against which data are analyzed
to find similarities
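To make the distinction concrete, here is a minimal Python sketch. The function `learn_threshold` stands in for an algorithm, and the single number it returns stands in for a model. Both function names, the one-threshold rule, and the GPA figures are hypothetical illustrations, not material from the seminar.

```python
def learn_threshold(values, labels):
    """Algorithm: search for the cut-off that best separates the labels.

    The rule predicts True when value >= threshold. Each observed value is
    tried as a candidate threshold; the one with the most correct
    predictions on the training data is kept.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted(set(values)):
        correct = sum((v >= candidate) == y for v, y in zip(values, labels))
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def apply_model(threshold, value):
    """Model: the learned rule, applied to one new case."""
    return value >= threshold


# Hypothetical training data: GPA and whether the student returned.
gpas = [1.8, 2.1, 2.4, 3.0, 3.3, 3.7]
returned = [False, False, False, True, True, True]

model = learn_threshold(gpas, returned)  # the algorithm produces the model
```

Calling `apply_model(model, 3.5)` then scores a new student without rerunning the search; the algorithm is only needed again when the rule has to be relearned.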
14Neural networks
- A form of machine learning
- Identifies complex relations
- Somewhat difficult to interpret
- Long computation times
15Decision trees
- Easy to interpret
- income < 40K
  - job > 5 yrs then yes
  - job < 5 yrs then no
- income > 40K
  - high debt then no
  - low debt then yes
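Part of what makes a tree easy to interpret is that it reads directly as nested if/else rules. The sketch below writes the tree above as a Python function. The function and parameter names are hypothetical, and the slide does not say what happens at exactly 40K income or exactly 5 years on the job, so the handling of those boundary cases is an assumption.

```python
def classify(income, years_on_job, high_debt):
    """Follow the example tree from the root to a leaf; return "yes" or "no".

    Assumptions for boundary cases the slide leaves open: an income of
    exactly 40,000 goes down the higher-income branch, and exactly 5 years
    on the job counts as "no".
    """
    if income < 40_000:
        # Lower-income branch: the decision depends on job tenure.
        return "yes" if years_on_job > 5 else "no"
    # Higher-income branch: the decision depends on debt level.
    return "no" if high_debt else "yes"
```

For example, `classify(30_000, 6, False)` follows the lower-income branch and the tenure test, while `classify(50_000, 1, True)` is decided by debt alone.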
16Apriori
- Discovers events that occur together
- Often called market basket analysis
- Example: What groups of classes do certain students
take in the same semester that may impact
facilities and course scheduling?
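The idea of events occurring together can be sketched by counting how often pairs of classes share a schedule. This is only the pair-counting step with a minimum-support cut-off, not the full Apriori algorithm (which goes on to larger item sets and rules). The function name, data layout, and course names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs that co-occur often enough.

    baskets: list of sets, one per student-semester (assumed layout).
    min_support: minimum fraction of baskets that must contain the pair.
    """
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


# Hypothetical schedules, one set of classes per student-semester.
schedules = [
    {"MATH 1", "CHEM 1", "ENGL 1"},
    {"MATH 1", "CHEM 1"},
    {"MATH 1", "ENGL 1"},
    {"CHEM 1", "MATH 1", "HIST 1"},
]
pairs = frequent_pairs(schedules, min_support=0.5)
```

With this made-up data, the pairs that survive the cut-off are the ones worth checking against room and time-slot conflicts.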
17Kohonen network
- Seeks to describe dataset in terms of natural
clusters of cases
- Example: identify similar groups of students
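A very small one-dimensional Kohonen map can be sketched in plain Python, as below. This is not Clementine's implementation. The starting weights, the linear learning-rate decay, the neighborhood schedule, and the two made-up, already-scaled features per student are all assumptions chosen to keep the example short.

```python
def best_unit(units, case):
    """Index of the unit whose weights are closest to the case."""
    distances = [sum((w - x) ** 2 for w, x in zip(unit, case)) for unit in units]
    return distances.index(min(distances))


def train_som(data, n_units, epochs=50, start_rate=0.5):
    """Train a one-dimensional Kohonen map; return the unit weight vectors.

    Simplifications (assumptions of this sketch): units start as copies of
    the first n_units cases, the learning rate decays linearly, and direct
    neighbors of the winning unit are updated only during the first half
    of training.
    """
    units = [list(case) for case in data[:n_units]]
    for epoch in range(epochs):
        rate = start_rate * (1 - epoch / epochs)
        radius = 1 if epoch < epochs // 2 else 0
        for case in data:
            winner = best_unit(units, case)
            for i, unit in enumerate(units):
                gap = abs(i - winner)
                if gap > radius:
                    continue
                # The winner moves at the full rate, its neighbors at half.
                step = rate if gap == 0 else rate / 2
                for d, x in enumerate(case):
                    unit[d] += step * (x - unit[d])
    return units


# Hypothetical students described by two already-scaled features.
students = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
units = train_som(students, n_units=2)
clusters = [best_unit(units, s) for s in students]
```

Students that map to the same unit can be read as one natural cluster; real data would need feature scaling and more units.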
18Case study using Clementine
- Predicting student persistence
19Examining data
20Clustering using TwoStep
21Building models for persistence in streams
A node is being executed (notice the red arrows
denoting the flow of data).
22Seeing the work of neural thinking
Graphic display showing an ANN learning the
data.
23Results of neural node
These are the outputs of the Neural Networks.
Overall accuracy and significance of features
(left). Predicted number of policies using fresh
data vs. known data (above).
24Examining C5.0
The control panel of the C5.0 node (Expert).
25Results of C5.0 node
View the prediction by individual records (PNXT
vs. C-PNXT).
View the overall prediction accuracy.
26Comparing CRT and C5.0
Use the Analysis node to examine the difference
in accuracy for CRT and C5.0.
27Which one is better: CRT or C5.0?
C5.0 has an accuracy rate of 66.3% and CRT
63.7%. They agree 72% of the time.
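The two kinds of figures quoted here, per-model accuracy and between-model agreement, are simple proportions. The sketch below shows how each could be computed outside Clementine; it is not the Analysis node itself, and the function names and the eight-student data set are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of records where the prediction matches the known outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(first, second):
    """Fraction of records where two models make the same prediction."""
    return sum(x == y for x, y in zip(first, second)) / len(first)


# Hypothetical outcomes and predictions for eight students (1 = persisted).
actual = [1, 1, 0, 1, 0, 0, 1, 1]
model_a = [1, 1, 0, 0, 0, 1, 1, 1]
model_b = [1, 0, 0, 1, 1, 1, 1, 1]

acc_a = accuracy(model_a, actual)
acc_b = accuracy(model_b, actual)
agree = agreement(model_a, model_b)
```

Agreement is measured without reference to the true outcome, so two models can agree often and still both be wrong on the same records.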
28Visualizing Results
29Visualizing Results
30Scoring new data
Moment of truth. The most powerful feature of
data mining is using learned rules to predict
(score) fresh data for business purposes.
Shown here is the switch to a fresh dataset
that Clementine has not seen before.
31Using models to score new data
Model Results
Scored Results
32Additional case study
Predicting the behavior of transfer students
- How best to identify future transfer students so
the college can groom them?
- What can a community college do to increase
transfer rates?
- Using decision tree models, the top rule for
successful transfers was: taken more than 12
units, taken fewer than 5 non-transfer courses,
and taken at least one math course.
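Once a rule like this has been learned, scoring new students amounts to applying it record by record. The sketch below encodes the three conditions of the top rule. The function name, field names, and student records are hypothetical, and reading "units" as total units taken is an assumption, since the slide does not give the period.

```python
def likely_to_transfer(units, non_transfer_courses, math_courses):
    """Apply the top decision-tree rule quoted in the case study.

    Assumption: "units" means total units taken; the slide does not say
    over what period they are counted.
    """
    return units > 12 and non_transfer_courses < 5 and math_courses >= 1


# Hypothetical fresh records to score.
new_students = [
    {"id": "A", "units": 15, "non_transfer_courses": 2, "math_courses": 1},
    {"id": "B", "units": 12, "non_transfer_courses": 0, "math_courses": 2},
    {"id": "C", "units": 18, "non_transfer_courses": 6, "math_courses": 1},
    {"id": "D", "units": 20, "non_transfer_courses": 1, "math_courses": 0},
]
scored = {
    s["id"]: likely_to_transfer(s["units"], s["non_transfer_courses"], s["math_courses"])
    for s in new_students
}
```

Students flagged `False` fail at least one condition, which points to where advising could focus (more units, fewer non-transfer courses, or a math course).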
33Getting started
Evaluate data mining software
- Company stability and customer feedback
- User interface
- Scalability
- Server/Client
- Modeling capacities
- Learning curve
- Join a listserv, such as CLUG
- Cost
34Getting started
Develop a data mining plan for your institution
- Determine business needs
- Determine technology infrastructure and
management support
- Identify mining area and business problems
- Determine data source(s)
- Invite an expert to jump start
- Pilot test mining results
- CRISP-DM and Real-time data mining, Knowledge
Discovery in Databases (KDD)
35Want to Learn More?
- Full training course descriptions at
- www.spss.com/training
- Contact us or one of our other data mining
experts by calling 800-543-5815.
- Check out the Knowledge Management/Data Mining
Discussion Group
- http://www.kdl1.com/kmdm
- Obtain the book, Knowledge Management: Building
a Competitive Advantage in Higher Education,
published by Jossey-Bass
- http://josseybass.com/cda/product/0,,0787962910,00.html
- Bob Valencic rvalencic_at_spss.com
- Jing Luan jing_at_cabrillo.edu
36Thank you!
- Predict student behavior to increase retention
- 2nd Annual Public Sector Roadshow
- October 15 in Washington, D.C.
- www.spss.com/psroadshow