Title: Improving quality of graduate students by data mining
1. Improving quality of graduate students by data mining
- Asst. Prof. Kitsana Waiyamai, Ph.D.
- Dept. of Computer Engineering
- Faculty of Engineering, Kasetsart University
- Bangkok, Thailand
2. Content
- PART I
- Introduction to data mining
- Data mining technique: association rule discovery
- Data mining technique: data classification
- PART II
- Improving quality of graduate students by data mining
- Conclusion
3. What Is Data Mining?
- Knowledge Discovery from Data, KDD (Data Mining)
- The process of nontrivial extraction of patterns from data. Patterns that are:
- implicit,
- previously unknown, and
- potentially useful
- Patterns must be comprehensible for human users.
4. Knowledge Discovery Process: Iterative, Interactive Process
(Figure: knowledge discovery process diagram; visible label: Mining Objective)
5. What kind of data can be mined?
- Relational databases
- Data warehouses
- Transactional databases and Flat files
- Advanced DB systems and information repositories
- Object-oriented and object-relational databases
- Spatial databases
- Time-series data and temporal data
- Text databases, multimedia databases
- Heterogeneous and legacy databases
- World Wide Web
- Bioinformatic data
6. Two modes of data mining
- Predictive data mining
- Predict behavior based on historic data
- Use data with known results to build a model that can later be used to explicitly predict values for different data
- Methods: classification, prediction, etc.
- Descriptive data mining
- Describe patterns in existing data that may be used to guide decisions
- Methods: association rule discovery, sequence pattern discovery, clustering, etc.
7. Data Mining Techniques
- Data Clustering
- Association rule discovery
- Data Classification
- Outlier detection
- Data regression
- Etc.
9. Data Classification
- Classification is the process of assigning new objects to predefined categories or classes
- Given a set of labeled records
- Build a model
- Predict labels for future unlabeled records
- Example
- Age, Educational background, Annual income, Current debts, Housing location => Making Decision
- Degree=Master and Income=7500 => Credit=Excellent
10. Three-Step Process of Classification
- Step 1: Training Data -> Model construction
- Step 2: Testing Data -> Classifier Model -> Model Evaluation
- Step 3: Unseen Data -> Classifier Model -> Classification
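The three steps above can be sketched in a few lines of plain Python. This is only an illustration, not the tool used in the talk: the one-rule classifier, the `Degree` attribute values, and the `Fair` label are assumptions chosen to echo the credit example on the previous slide.

```python
from collections import Counter, defaultdict

def build_model(training, attribute):
    """Step 1: for each value of one attribute, remember the most common label."""
    counts = defaultdict(Counter)
    for record, label in training:
        counts[record.get(attribute)][label] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority label for attribute values never seen in training.
    default = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda record: rules.get(record.get(attribute), default)

def accuracy(model, testing):
    """Step 2: fraction of held-out labeled records the model classifies correctly."""
    return sum(model(record) == label for record, label in testing) / len(testing)

# Hypothetical labeled records (not from the talk).
training = [
    ({"Degree": "Master"}, "Excellent"),
    ({"Degree": "Master"}, "Excellent"),
    ({"Degree": "Bachelor"}, "Fair"),
    ({"Degree": "Bachelor"}, "Fair"),
    ({"Degree": "Bachelor"}, "Excellent"),
]
testing = [
    ({"Degree": "Master"}, "Excellent"),
    ({"Degree": "Bachelor"}, "Fair"),
    ({"Degree": "Bachelor"}, "Excellent"),
]

model = build_model(training, "Degree")   # Step 1: model construction
test_accuracy = accuracy(model, testing)  # Step 2: model evaluation
prediction = model({"Degree": "Master"})  # Step 3: classify an unseen record
```

A decision tree, as used later in the talk, replaces the single-attribute rule table with a sequence of tests, but the train / evaluate / classify flow stays the same.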
11. Data Mining Tools
- ANGOSS KnowledgeStudio
- IBM Intelligent Miner
- Megaputer PolyAnalyst
- SAS Enterprise Miner
- SGI Mineset
- SPSS Clementine
- Many others
- More at http://www.kdnuggets.com/software
12. Data Mining Projects
- Checklist
- Start with well-defined questions
- Define measures of success and failure
- Main difficulty: no automation
- Understanding the problem
- Data preparation
- Selection of the right mining methods
- Interpretation
13. Using Data Mining for Improving Quality of Engineering Graduates
- Objective
- Discover knowledge from large databases of engineering student records.
- Discovered knowledge is useful in:
- - Assisting in the development of new curricula,
- - Improving existing curricula,
- - Helping students to select the appropriate major
14. Using a data mining technique to help students in selecting their majors
- Motivation
- - Major selection is a very important factor in a student's success.
- - Students lack experience and information about each major.
- Solution
- - Find the profiles of good students for each major using the student profile database and the course enrollment student databases (10 years)
- - Determine the most appropriate major for each student
15. A Data Mining based Approach for Improving Quality of Engineering Graduates
(Figure: system architecture; components: User, Java Servlet, Data Mining Tool, student profile database, course enrollment student databases)
16. Data for Data Mining

Student profile database:
Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  .......  .....  ...

Course enrollment student databases:
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C
37058063  403111    1     2537  D
37058063  208111    1     2537  B
17. Data preparation: a classification model

Student profile database:
Stu_code  Sex   Address  Sch_GPA  .....  GPA
37058063  male  Bangkok  2.5      .....  2.3
37058167  male  Songkla  3.4      .....  3.2
........  ....  .......  .......  .....  ...

Course enrollment student databases:
Stu_code  Sub_code  Term  Year  Grade
37058063  204111    1     2537  C
37058063  403111    1     2537  D
37058063  208111    1     2537  B

Combined table, one row per student with one column per course:
Stu_code  Sex   204111  403111  .....  GPA
37058063  male  Medium  Low     .....  2.3
37058167  male  High    High    .....  3.2
........  ....  ......  ......  .....  ...
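The preparation step shown in these tables, turning many enrollment rows into one row per student with a discretized grade per course, might look like the sketch below. The function name `prepare` and the grade buckets are assumptions: the slide only shows that C maps to Medium and D maps to Low, so the A, B, and F entries are guesses.

```python
from collections import defaultdict

# Assumed discretization. Only C -> Medium and D -> Low are visible on the slide.
GRADE_LEVEL = {"A": "High", "B": "High", "C": "Medium", "D": "Low", "F": "Low"}

def prepare(profiles, enrollments):
    """Join profile rows with enrollment rows into one flat record per student."""
    levels = defaultdict(dict)
    for stu_code, sub_code, _term, _year, grade in enrollments:
        levels[stu_code][sub_code] = GRADE_LEVEL[grade]
    rows = []
    for profile in profiles:
        row = {"Stu_code": profile["Stu_code"], "Sex": profile["Sex"]}
        row.update(levels.get(profile["Stu_code"], {}))  # one column per course code
        row["GPA"] = profile["GPA"]
        rows.append(row)
    return rows

# Rows copied from the tables above.
profiles = [
    {"Stu_code": "37058063", "Sex": "male", "Address": "Bangkok", "Sch_GPA": 2.5, "GPA": 2.3},
]
enrollments = [
    ("37058063", "204111", 1, 2537, "C"),
    ("37058063", "403111", 1, 2537, "D"),
    ("37058063", "208111", 1, 2537, "B"),
]

table = prepare(profiles, enrollments)
```

Each resulting row has a fixed set of attributes, which is the shape a decision tree learner expects as input.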
18. Global Classification Model
A global decision tree determines which major is appropriate for which student. Each internal node represents a test on the student's profile. Each leaf node represents an appropriate major to be selected.
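To make the node structure concrete, here is a hand-written toy tree and a function that walks it. Nothing in it comes from the learned model in the talk: the tests, the branch values, and the major names ("Major A" and so on) are all invented for illustration.

```python
# Internal node: (attribute, {attribute value: subtree}). Leaf: a major name (str).
# This tree is invented; a real one would be learned from the prepared student table.
TREE = ("204111", {
    "High": "Major A",
    "Medium": ("403111", {"High": "Major B", "Medium": "Major C", "Low": "Major C"}),
    "Low": "Major D",
})

def recommend(tree, student):
    """Follow one test per internal node until a leaf (a major) is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[student[attribute]]
    return tree

major = recommend(TREE, {"204111": "Medium", "403111": "Low"})
```

Because every path ends in exactly one leaf, a tree of this kind can only ever return a single major per student.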
19. Drawbacks of Global Classification Model
- - Low precision (50%) due to the large number of majors
- - The number of students differs across departments => the model cannot correctly predict the best major to be selected.
- - The model proposes a single major; a set of possible majors ordered by appropriateness score would be preferred.
20. Classification Model for Each Major
- - A decision tree predicts whether a student is likely to be a good student in a given major.
- Good students are those who graduate within 4 years and rank in the first 40% of a given major.
- - Leaf nodes represent two classes: Good and Bad
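The Good/Bad labeling rule above could be applied to the graduates of one major roughly as follows. This is a sketch under assumptions: the slide does not say what the ranking is based on, so ranking by final GPA is a guess, the "first 40" figure is read here as a percentage, and the field name `years_to_graduate` is hypothetical.

```python
def label_students(records, top_percent=40, max_years=4):
    """Label graduates of ONE major as 'Good' or 'Bad'.

    Assumes ranking is by final GPA (not stated in the talk).
    """
    ranked = sorted(records, key=lambda r: r["GPA"], reverse=True)
    cutoff = len(ranked) * top_percent // 100  # how many students count as top-ranked
    top_codes = {r["Stu_code"] for r in ranked[:cutoff]}
    labels = {}
    for r in records:
        on_time = r["years_to_graduate"] <= max_years
        labels[r["Stu_code"]] = "Good" if on_time and r["Stu_code"] in top_codes else "Bad"
    return labels

# Hypothetical graduates of a single major.
records = [
    {"Stu_code": "s1", "GPA": 3.8, "years_to_graduate": 4},
    {"Stu_code": "s2", "GPA": 3.5, "years_to_graduate": 5},
    {"Stu_code": "s3", "GPA": 3.0, "years_to_graduate": 4},
    {"Stu_code": "s4", "GPA": 2.5, "years_to_graduate": 4},
    {"Stu_code": "s5", "GPA": 2.0, "years_to_graduate": 4},
]

labels = label_students(records)
```

Here `s2` ranks in the top 40% but took five years, so it is labeled Bad; both conditions must hold for a Good label.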
21. Advantage of Majors Classification Model
- Good precision (80%)
- The model predicts the best major to be selected even if the number of students in each major is different
- It proposes a set of possible majors to be selected, ordered by appropriateness score.
Encountered problems
- Database size
- Other factors that could affect students' decisions
- Teacher preference, etc.
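The ordered set of majors mentioned on this slide can be produced by running a student through every per-major model and sorting the results. The sketch below assumes each model returns a numeric appropriateness score; the talk does not define how that score is computed, and the three lambda "models" here are toy stand-ins, not learned trees.

```python
def rank_majors(student, models):
    """Score the student with every per-major model; return majors, best first."""
    scored = [(major, model(student)) for major, model in models.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for per-major classifiers (hypothetical scores in [0, 1]).
models = {
    "Major A": lambda s: 0.9 if s["204111"] == "High" else 0.3,
    "Major B": lambda s: 0.6,
    "Major C": lambda s: 0.8 if s["403111"] == "High" else 0.2,
}

ranking = rank_majors({"204111": "Medium", "403111": "High"}, models)
```

Since each major is scored independently, a department with few students no longer competes with larger ones for the same leaf nodes.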
22. Presentation of Discovered Knowledge
23. Applying Association Rule Discovery for Grade Prediction
24. Grade Prediction for the Coming Term
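The transcript gives no detail for this slide, so the following is only a generic illustration of how an association rule relating grades in two courses is usually evaluated, by its support and confidence. The course codes are taken from the earlier tables; the student records and the rule itself are hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# One set of (course, grade level) items per student; invented data.
transactions = [
    {("204111", "Low"), ("403111", "Low")},
    {("204111", "Low"), ("403111", "Low")},
    {("204111", "Low"), ("403111", "High")},
    {("204111", "High"), ("403111", "High")},
]

# Candidate rule: a Low grade in 204111 => a Low grade in 403111.
support, confidence = rule_stats(
    transactions, {("204111", "Low")}, {("403111", "Low")}
)
```

A rule with high confidence could then be used to warn a student about a likely grade in a course planned for the coming term.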
25. Presentation of Discovered Knowledge
26. Conclusion and Future Work
- Application of data mining in education
- Use data mining techniques for improving the quality of engineering students
- Apply data mining techniques to several other educational domains.