Title: Assessing Students using Multivariate Statistical Tools
1. Assessing Students using Multivariate Statistical Tools
- A Statistics Senior Seminar presented by Peter Butler
2. Overview
- Introduction
- Objectives and Questions
- Techniques Used, Applications, and Examples
  - PCA
  - Discriminant Analysis
  - Logistic Discrimination
  - Tree Analysis
  - Correspondence Analysis
- Results Discovered
- Conclusion
3. Introduction to Data and Variables to Consider
- Sex: male/female
- HSR: high school ranking
- ACT: ACT scores
  - English (ACTE)
  - Math (ACTM)
  - Reading (ACTRD)
  - Science (ACTSR)
  - Composite (ACTCOMP)
- Type: type of student
  - NHS: new high school
  - NAS: new advanced standing
  - IUT: inter-university transfer
  - NPS: new post-secondary
- GPA (CUMGPA): cumulative GPA
- TOTCRD: total number of credits
- Professional: whether the student plans on going to a professional school
- Decided: whether the student has decided on a major
4. Two Important Questions
- Questions
  - What are the characteristics of the different types of students?
  - How can we use what we know about a student before they come to UMM to predict how they will perform here?
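5. Principal Components Analysis
- Basic idea
  - To describe the variation of a set of multivariate data in terms of a set of uncorrelated variables, each of which is a linear combination of the original variables.
- Purpose
  - To see whether the first few components account for most of the variation in the original data.
  - To summarize the data with little loss of information.
  - To reduce dimensionality, which can simplify further analysis.
- Algebraic representation of principal component analysis
  $y_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p$
  $y_2 = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2p}x_p$
  $\vdots$
  $y_p = a_{p1}x_1 + a_{p2}x_2 + \cdots + a_{pp}x_p$
  where $\mathbf{a}_i^\top\mathbf{a}_i = 1$, $\mathbf{a}_i^\top\mathbf{a}_j = 0$ for $i \ne j$, and the $y_i$ are uncorrelated with $\operatorname{Var}(y_1) \ge \operatorname{Var}(y_2) \ge \cdots \ge \operatorname{Var}(y_p)$.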
6. PCA Relationships
- Correlation matrix (lower triangle)

                 HSR    ACTE    ACTM   ACTRD   ACTSR ACTCOMP  CUMGPA  TOTCRD
  HSR          1.000
  ACTE         0.450   1.000
  ACTM         0.501   0.799   1.000
  ACTRD        0.399   0.859   0.749   1.000
  ACTSR        0.434   0.832   0.838   0.841   1.000
  ACTCOMP      0.481   0.901   0.871   0.900   0.907   1.000
  CUMGPA       0.422   0.326   0.315   0.297   0.269   0.327   1.000
  TOTCRD       0.287   0.184   0.187   0.195   0.203   0.196   0.419   1.000

- Strongest relationships:
  - Among all the ACT variables (r from 0.749 to 0.907)
  - HSR and ACTM (r = 0.501, the largest correlation involving HSR)
  - CUMGPA and HSR (r = 0.422, the largest correlation involving CUMGPA)
7. PCA Scree Plot
- Our scree plot shows evidence that most of the variation in our data can be explained by the first three principal components.
8. Principal Component Analysis
- Component loadings

  Variable         1       2       3       4       5       6       7       8
  HSR          0.605  -0.383  -0.579  -0.380  -0.087   0.020   0.003   0.005
  ACTE         0.920   0.165   0.048   0.061  -0.141  -0.277   0.147   0.045
  ACTM         0.900   0.130  -0.042  -0.044   0.375  -0.090  -0.136   0.052
  ACTRD        0.903   0.188   0.130   0.067  -0.272   0.095  -0.198   0.071
  ACTSR        0.919   0.194   0.098  -0.029   0.091   0.241   0.196   0.052
  ACTCOMP      0.960   0.174   0.045   0.021  -0.009   0.012  -0.020  -0.211
  CUMGPA       0.460  -0.683  -0.156   0.544   0.028   0.027   0.009   0.003
  TOTCRD       0.320  -0.742   0.515  -0.284   0.007  -0.022  -0.004  -0.004

- These loadings support the evidence in our scree plot: the loadings on the first few PCs are much larger in magnitude than those on the rest, so most of the variation in our data can be explained by these first few PCs.
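A minimal NumPy sketch of the computation behind slides 6 to 8, assuming the PCA was run on the correlation matrix (the squared PC1 loadings on slide 8 sum to about 4.9, which is consistent with that). It rebuilds the matrix from the rounded slide-6 values, takes its eigen-decomposition, and turns eigenvectors into loadings by scaling each one by the square root of its eigenvalue. Because the inputs are rounded to three decimals and eigenvector signs are arbitrary, the output should match slide 8 only approximately and up to sign; this is an illustration, not the SYSTAT 11 procedure used for the slides.

```python
import numpy as np

# Variable order follows the correlation matrix on slide 6.
NAMES = ["HSR", "ACTE", "ACTM", "ACTRD", "ACTSR", "ACTCOMP", "CUMGPA", "TOTCRD"]

# Lower triangle of the slide-6 correlation matrix (values rounded to 3 decimals).
LOWER = [
    [1.000],
    [0.450, 1.000],
    [0.501, 0.799, 1.000],
    [0.399, 0.859, 0.749, 1.000],
    [0.434, 0.832, 0.838, 0.841, 1.000],
    [0.481, 0.901, 0.871, 0.900, 0.907, 1.000],
    [0.422, 0.326, 0.315, 0.297, 0.269, 0.327, 1.000],
    [0.287, 0.184, 0.187, 0.195, 0.203, 0.196, 0.419, 1.000],
]


def full_matrix(lower):
    """Rebuild the symmetric correlation matrix from its lower triangle."""
    p = len(lower)
    R = np.zeros((p, p))
    for i, row in enumerate(lower):
        for j, r in enumerate(row):
            R[i, j] = R[j, i] = r
    return R


def pca_from_correlation(R):
    """Return eigenvalues (largest first) and component loadings of R."""
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so PC1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)  # column k scaled by sqrt(eigenvalue k)
    return eigvals, loadings


R = full_matrix(LOWER)
eigvals, loadings = pca_from_correlation(R)
share = eigvals / eigvals.sum()            # total variance is the trace of R, i.e. 8
cumulative = np.cumsum(share)
for k in range(len(eigvals)):
    print(f"PC{k + 1}: eigenvalue {eigvals[k]:.3f}, "
          f"share {share[k]:.3f}, cumulative {cumulative[k]:.3f}")

# Eigenvector signs are arbitrary, so compare absolute loadings with slide 8.
print("PC1 |loadings|:", dict(zip(NAMES, np.round(np.abs(loadings[:, 0]), 3))))
```

The cumulative column is a numerical counterpart to the scree plot: with these rounded inputs the first three components should account for more than 80% of the total variance.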
9. Defining Classical Discrimination
- Definition
  - Classical discrimination is a process in which the groups are known before the investigation, and the goal is to devise rules that can allocate previously unclassified objects or individuals to these groups in an optimal fashion.
10. Fisher's Linear Discriminant Function
- In 1936, Fisher devised a solution to the discrimination problem for two groups using the linear function
  $y = a_1x_1 + a_2x_2 + \cdots + a_px_p = \mathbf{a}^\top\mathbf{x}$,
  where the coefficients $\mathbf{a}$ are chosen so that the ratio of the between-group variance of $y$ to its within-group variance is maximized.
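Fisher's maximization has a closed-form answer: the coefficient vector is proportional to $S_w^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$, where $S_w$ is the pooled within-group covariance matrix. The sketch below implements that formula with NumPy on hypothetical, simulated data (not the UMM student data) and uses the midpoint of the two projected group means as the cutoff. The function names are illustrative. In this sketch scores above the cutoff go to Group 1, whereas on the later slides the lower-scoring group happens to be labelled Group 1; only the orientation of $\mathbf{a}$ differs.

```python
import numpy as np


def fisher_discriminant(X1, X2):
    """Fisher's two-group linear discriminant.

    Returns the coefficient vector a = S_w^{-1}(m1 - m2) and a cutoff halfway
    between the two projected group means.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-group covariance matrix S_w
    S_w = ((n1 - 1) * np.cov(X1, rowvar=False)
           + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S_w, m1 - m2)   # maximizes between/within variance ratio of y = a'x
    cutoff = 0.5 * (a @ m1 + a @ m2)    # midpoint of the projected group means
    return a, cutoff


def allocate(x, a, cutoff):
    """In this sketch, scores above the cutoff go to Group 1, others to Group 2."""
    return 1 if a @ x > cutoff else 2


# Hypothetical, well-separated synthetic data (not the UMM student data).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(200, 2))

a, cutoff = fisher_discriminant(X1, X2)
correct = ([allocate(x, a, cutoff) == 1 for x in X1]
           + [allocate(x, a, cutoff) == 2 for x in X2])
accuracy = float(np.mean(correct))
print("a =", np.round(a, 3), "cutoff =", round(float(cutoff), 3), "accuracy =", accuracy)
```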
11. Classical Discrimination for Professional
- Group means

  Variable          no       yes
  HSR           74.809    80.574
  ACTE          22.657    22.859
  ACTM          22.523    23.365
  ACTRD         24.289    24.201
  ACTSR         23.304    23.744
  ACTCOMP       23.437    23.701
  CUMGPA         2.933     2.842
  TOTCRD       113.148   120.385
12. Classical Discrimination for Professional
- Canonical discriminant functions

  Constant     0.289
  HSR          0.027
  ACTE        -0.000
  ACTM         0.108
  ACTRD       -0.062
  ACTSR        0.019
  ACTCOMP     -0.060
  CUMGPA      -1.017
  TOTCRD       0.006
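13. An Example of Classifying Students
- Canonical scores of group means
  - no: -0.075
  - yes: 0.305
- Now we calculate a student's y-value from Fisher's equation and compare it with the midpoint of the two group-mean scores, (-0.075 + 0.305)/2 = 0.115:
  - If y < 0.115, then Group 1 (no)
  - If y > 0.115, then Group 2 (yes)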
14. Classifying a Student
- Evaluating student 7: y = -0.19 < 0.115, so student 7 is classified in Group 1.
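The classification on slides 13 and 14 can be sketched in Python with the canonical coefficients from slide 12 and a cutoff at the midpoint of the two group-mean scores. The dictionary layout and function names are hypothetical, and because the coefficients are rounded to three decimals the scores are only approximate: plugging in the slide-11 group means gives about -0.03 for "no" and 0.35 for "yes", not exactly the -0.075 and 0.305 reported on slide 13. Both still fall on the expected side of 0.115.

```python
# Canonical discriminant function for Professional (slide 12).
CONSTANT = 0.289
COEF = {
    "HSR": 0.027, "ACTE": 0.000,  # ACTE is reported as -0.000
    "ACTM": 0.108, "ACTRD": -0.062, "ACTSR": 0.019,
    "ACTCOMP": -0.060, "CUMGPA": -1.017, "TOTCRD": 0.006,
}

# Canonical scores of the group means (slide 13) and the midpoint cutoff.
MEAN_SCORE = {"no": -0.075, "yes": 0.305}
CUTOFF = (MEAN_SCORE["no"] + MEAN_SCORE["yes"]) / 2   # 0.115


def canonical_score(student):
    """y = constant + sum of coefficient * value over the eight variables."""
    return CONSTANT + sum(COEF[name] * student[name] for name in COEF)


def classify(y, cutoff=CUTOFF):
    """Group 1 ('no') if y < cutoff, else Group 2 ('yes'); ties go to Group 2 here."""
    return 1 if y < cutoff else 2


# Group means from slide 11, used only as illustrative inputs.
NO_MEANS = {"HSR": 74.809, "ACTE": 22.657, "ACTM": 22.523, "ACTRD": 24.289,
            "ACTSR": 23.304, "ACTCOMP": 23.437, "CUMGPA": 2.933, "TOTCRD": 113.148}
YES_MEANS = {"HSR": 80.574, "ACTE": 22.859, "ACTM": 23.365, "ACTRD": 24.201,
             "ACTSR": 23.744, "ACTCOMP": 23.701, "CUMGPA": 2.842, "TOTCRD": 120.385}

for label, means in [("no", NO_MEANS), ("yes", YES_MEANS)]:
    y = canonical_score(means)
    print(f"{label} means: y = {y:.3f} -> Group {classify(y)}")

# Student 7 (slide 14): y = -0.19 is below the cutoff.
print("student 7 -> Group", classify(-0.19))
```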
15. Classical Discrimination for Decided
- Group means

  Variable          no       yes
  HSR           67.940    78.591
  ACTE          20.743    23.343
  ACTM          20.777    23.322
  ACTRD         22.261    24.937
  ACTSR         21.408    24.047
  ACTCOMP       21.543    24.132
  CUMGPA         2.787     2.958
  TOTCRD       101.338   118.951
16. Classical Discrimination for Decided
- Canonical discriminant functions

  Constant    -3.860
  HSR          0.017
  ACTE         0.027
  ACTM        -0.009
  ACTRD       -0.029
  ACTSR        0.038
  ACTCOMP      0.072
  CUMGPA      -0.059
  TOTCRD       0.004
17. Classical Discrimination for Type
- Group means

  Variable         IUT       NAS       NHS       NPS
  CUMGPA         2.989     2.829     2.929     2.971
  TOTCRD       124.413   137.464   111.406    63.951
  HSR           54.356    63.039    78.887    75.933
  ACTE          18.169    19.573    23.383    23.457
  ACTM          18.068    19.303    23.459    22.314
  ACTRD         20.102    21.071    24.988    24.343
  ACTSR         19.000    20.277    24.099    23.267
  ACTCOMP       18.915    20.349    24.202    23.429
18. Classical Discrimination for Type
- Canonical discriminant functions

  Variable         1       2       3       4
  Constant     2.649   0.688  -1.893  -1.274
  CUMGPA       0.090   0.944   0.813   0.219
  TOTCRD       0.010  -0.012  -0.000   0.000
  HSR         -0.022  -0.007  -0.004  -0.031
  ACTE        -0.017   0.067  -0.211  -0.006
  ACTM        -0.025  -0.028   0.198  -0.036
  ACTRD        0.010   0.018   0.094   0.111
  ACTSR       -0.024   0.021   0.002   0.063
  ACTCOMP     -0.044  -0.141  -0.092  -0.011
19. Logistic Discrimination
- When the variables are not normally distributed, or some of them are binary, we need a different approach for our analysis. Logistic discrimination uses a logistic function to model directly the probability that an observation is a member of each group.
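20. Logistic Discrimination
- With two groups, the model is
  $\ln\dfrac{P(G_1 \mid \mathbf{x})}{1 - P(G_1 \mid \mathbf{x})} = \alpha_0 + \boldsymbol{\alpha}^\top\mathbf{x}$,
  so that $P(G_1 \mid \mathbf{x}) = \dfrac{\exp(\alpha_0 + \boldsymbol{\alpha}^\top\mathbf{x})}{1 + \exp(\alpha_0 + \boldsymbol{\alpha}^\top\mathbf{x})}$ and $P(G_2 \mid \mathbf{x}) = 1 - P(G_1 \mid \mathbf{x})$.
- The parameters $\alpha_0$ and $\boldsymbol{\alpha}$ in this model are estimated by maximum likelihood.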
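The maximum-likelihood step can be sketched with Newton-Raphson iterations (equivalently, iteratively reweighted least squares), one standard way to maximize the logistic log-likelihood; the slides do not say which algorithm SYSTAT uses. The data below are hypothetical and simulated from known coefficients so the fit can be checked. `fit_logistic` is an illustrative name, and Group 1 is coded 1, Group 2 coded 0.

```python
import numpy as np


def fit_logistic(X, g, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of log[P(G1|x) / (1 - P(G1|x))] = alpha0 + alpha'x.

    X: (n, p) array of predictors; g: length-n array, 1 for Group 1, 0 for Group 2.
    Uses Newton-Raphson steps (iteratively reweighted least squares).
    """
    g = np.asarray(g, dtype=float)
    Z = np.column_stack([np.ones(len(X)), X])      # intercept column for alpha0
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))      # current P(G1 | x)
        score = Z.T @ (g - p)                      # gradient of the log-likelihood
        info = Z.T @ (Z * (p * (1.0 - p))[:, None])  # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta                                    # [alpha0, alpha1, ..., alphap]


# Hypothetical data simulated from known parameters (not the UMM data).
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
true_beta = np.array([0.5, -1.0, 2.0])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
g = (rng.random(n) < p_true).astype(float)

beta_hat = fit_logistic(X, g)
print("estimated:", np.round(beta_hat, 3), "true:", true_beta)
```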
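21. Logistic Discrimination
- After estimation of the parameters, the allocation rule is:
  - Assign to Group 1 if $\hat\alpha_0 + \hat{\boldsymbol{\alpha}}^\top\mathbf{x} > 0$, that is, if $P(G_1 \mid \mathbf{x}) > 0.5$.
  - Assign to Group 2 if $\hat\alpha_0 + \hat{\boldsymbol{\alpha}}^\top\mathbf{x} < 0$.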
22. Logistic Discrimination: Decided

  No.  Parameter   Estimate    S.E.  t-ratio  p-value
  1    CONSTANT       0.543   0.154    3.533    0.000
  2    HSR           -0.007   0.001   -4.947    0.000
  3    ACTE          -0.012   0.012   -0.972    0.331
  4    ACTM           0.005   0.011    0.456    0.649
  5    ACTRD          0.013   0.011    1.185    0.236
  6    ACTSR         -0.018   0.013   -1.397    0.162
  7    ACTCOMP       -0.029   0.019   -1.570    0.116
  8    CUMGPA         0.012   0.051    0.237    0.813
  9    TOTCRD        -0.002   0.001   -3.710    0.000
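23. Interpreting Logistic Discrimination for Decided
- Constructing a logistic discriminant model for Decided from the estimates on slide 22:
  $\ln\dfrac{P(G_1 \mid \mathbf{x})}{1 - P(G_1 \mid \mathbf{x})} = 0.543 - 0.007\,\mathrm{HSR} - 0.012\,\mathrm{ACTE} + 0.005\,\mathrm{ACTM} + 0.013\,\mathrm{ACTRD} - 0.018\,\mathrm{ACTSR} - 0.029\,\mathrm{ACTCOMP} + 0.012\,\mathrm{CUMGPA} - 0.002\,\mathrm{TOTCRD}$
- Only HSR and TOTCRD (p-value = 0.000) are significant at the 5% level; the ACT scores and CUMGPA are not.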
24. Applying Logistic Discrimination to a Student
- Student 11
  - Since student 11's values satisfy the allocation rule for Group 2, we classify student 11 to Group 2.
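Applying the allocation rule with the Decided estimates from slide 22 can be sketched as follows. Two assumptions are made and flagged in the code: that the fitted model gives the log-odds of Group 1 (the slides do not say which category SYSTAT modelled), and that a student goes to Group 1 exactly when the linear predictor is positive. Student 11's scores are not shown, so the student profile below is hypothetical.

```python
import math

# Decided model estimates (slide 22).
ALPHA0 = 0.543
ALPHA = {"HSR": -0.007, "ACTE": -0.012, "ACTM": 0.005, "ACTRD": 0.013,
         "ACTSR": -0.018, "ACTCOMP": -0.029, "CUMGPA": 0.012, "TOTCRD": -0.002}


def linear_predictor(student):
    """alpha0 + sum of alpha_k * x_k for one student."""
    return ALPHA0 + sum(ALPHA[k] * student[k] for k in ALPHA)


def allocate(student):
    """Assumed rule: Group 1 if the linear predictor is positive
    (i.e. P(G1|x) > 0.5), otherwise Group 2. Returns (group, P(G1|x))."""
    eta = linear_predictor(student)
    prob_g1 = 1.0 / (1.0 + math.exp(-eta))
    return (1 if eta > 0 else 2), prob_g1


# Hypothetical student profile (student 11's actual scores are not shown).
student = {"HSR": 90, "ACTE": 28, "ACTM": 27, "ACTRD": 29, "ACTSR": 27,
           "ACTCOMP": 28, "CUMGPA": 3.5, "TOTCRD": 120}
group, prob = allocate(student)
print(f"linear predictor = {linear_predictor(student):.3f}, "
      f"P(G1|x) = {prob:.3f}, allocated to Group {group}")
```

For this hypothetical profile the linear predictor is -1.407, so the sketch allocates the student to Group 2.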
25. Logistic Discrimination: Professional

  No.  Parameter   Estimate    S.E.  t-ratio  p-value
  1    CONSTANT       1.474   0.188    7.847    0.000
  2    HSR           -0.013   0.002   -6.166    0.000
  3    ACTE          -0.003   0.014   -0.185    0.853
  4    ACTM          -0.043   0.013   -3.235    0.001
  5    ACTRD          0.022   0.013    1.691    0.091
  6    ACTSR         -0.010   0.015   -0.707    0.480
  7    ACTCOMP        0.032   0.028    1.134    0.257
  8    CUMGPA         0.406   0.058    6.938    0.000
  9    TOTCRD        -0.002   0.001   -3.646    0.000
26. Logistic Discrimination: Type
- Choice group: IUT

  No.  Parameter   Estimate    S.E.  t-ratio  p-value
  1    CONSTANT       1.916   1.619    1.183    0.237
  2    HSR            0.050   0.018    2.732    0.006
  3    ACTE          -0.086   0.229   -0.375    0.708
  4    ACTM           0.243   0.190    1.280    0.200
  5    ACTRD          0.068   0.201    0.340    0.734
  6    ACTSR         -0.009   0.228   -0.040    0.968
  7    ACTCOMP       -0.121   0.238   -0.510    0.610
  8    CUMGPA         0.614   0.621    0.989    0.323
  9    TOTCRD        -0.016   0.009   -1.709    0.087
27. Logistic Discrimination: Type
- Further analysis showed that NAS and NPS had no significant variables.
- NHS: significant variables
  - HSR, p-value = 0.000
  - ACTM, p-value = 0.047
  - TOTCRD, p-value = 0.000
28. Tree Analysis
- Tree modeling
  - An exploratory technique for uncovering structure in data.
  - Uses a series of classification rules derived from the data by a procedure known as recursive partitioning; the result is a classification tree.
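Recursive partitioning can be sketched in plain Python: at each node, try every variable and every midpoint between adjacent observed values, keep the split that most reduces an impurity measure, and repeat on each half. This sketch assumes the Gini impurity and a fixed maximum depth; SYSTAT's tree procedure may use a different criterion and stopping rule. The tiny data set, with an ACTCOMP-like and a TOTCRD-like column, is hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Try every variable and every midpoint between adjacent distinct values;
    return (variable index, threshold, weighted impurity) of the best split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] < t]
            right = [y for r, y in zip(rows, labels) if r[j] >= t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[2]:
                best = (j, t, impurity)
    return best


def grow_tree(rows, labels, depth=0, max_depth=3):
    """Recursive partitioning: split the node, then grow each half the same way."""
    if depth == max_depth or gini(labels) == 0.0 or len(set(rows)) < 2:
        return max(set(labels), key=labels.count)   # leaf: majority class
    j, t, _ = best_split(rows, labels)
    left = [i for i, r in enumerate(rows) if r[j] < t]
    right = [i for i, r in enumerate(rows) if r[j] >= t]
    return {
        "var": j,
        "threshold": t,
        "left": grow_tree([rows[i] for i in left], [labels[i] for i in left],
                          depth + 1, max_depth),
        "right": grow_tree([rows[i] for i in right], [labels[i] for i in right],
                           depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the classification rules from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["var"]] < tree["threshold"] else tree["right"]
    return tree


# Hypothetical (ACTCOMP-like, TOTCRD-like) rows with a decided/undecided label.
rows = [(5, 40), (6, 70), (7, 100), (12, 30), (15, 50), (20, 80), (25, 110), (30, 120)]
labels = ["undecided"] * 5 + ["decided"] * 3
tree = grow_tree(rows, labels)
print(tree)
print(predict(tree, (10, 60)))
```

Each internal node of the returned dictionary is one classification rule, which is how a fitted tree such as the ones on the following slides can be read.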
29. Tree Analysis: Decided
30. Interpreting the Decided Tree
- First we select a group, say the first red box on the left.
- If a student's ACTCOMP is less than 8, then (s)he is classified as an undecided student.
- If a student's ACTCOMP is greater than 8 and TOTCRD is greater than 63.5, then (s)he is classified as a decided student.
31. Tree Analysis: Professional
32. Tree Analysis: Type
33. Interpreting the Type Tree
- Again, we first select a group (say, the first red group on the left). A student is classified as NAS here if their ACTCOMP is less than 8 and their TOTCRD is greater than 31.5.
34. Correspondence Analysis
- Correspondence analysis is a technique for displaying the associations among a set of categorical variables in a type of scatter plot or map.
- Objectives
  - To reveal characteristics of the data.
  - To generate hypotheses from the data that can then be tested.
35. Correspondence Analysis (cont.)
- Correspondence Analysis can be viewed as a method
for decomposing the chi-squared statistic for a
contingency table into components corresponding
to different dimensions.
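The chi-squared decomposition can be illustrated with NumPy: the squared singular values of the matrix of standardized residuals are the principal inertias of the dimensions, and n times their sum equals the Pearson chi-squared statistic of the table. The 3-by-3 table below is hypothetical, and the function is a sketch of simple (two-way) correspondence analysis, not necessarily the multiple correspondence analysis SYSTAT would apply to several categorical variables.

```python
import numpy as np


def correspondence_analysis(N):
    """Simple correspondence analysis of a two-way contingency table N.

    Returns the principal inertias (one per dimension) and the principal
    coordinates of the rows and columns.
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                                   # correspondence matrix
    r = P.sum(axis=1)                           # row masses
    c = P.sum(axis=0)                           # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)      # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertias = sv ** 2                          # chi-squared / n, split by dimension
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return inertias, row_coords, col_coords


# Hypothetical 3-by-3 table of counts (not from the UMM data).
N = np.array([[20, 10, 5],
              [10, 20, 10],
              [5, 10, 20]])
inertias, rows_xy, cols_xy = correspondence_analysis(N)
n = N.sum()
print("chi-squared =", round(float(n * inertias.sum()), 3))
print("share of chi-squared by dimension:", np.round(inertias / inertias.sum(), 3))
print("row coordinates (dims 1-2):\n", np.round(rows_xy[:, :2], 3))
```

The first two dimensions carry essentially all of the chi-squared here, because a 3-by-3 table has at most two non-trivial dimensions; the row and column coordinates on those dimensions are what a correspondence plot displays.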
36. Correspondence Analysis
- From our correspondence plot we can see that GPA, total credits, HSR, and ACTM are the most important variables.
- Most of the ACT variables are associated with one another, but they are less important.
37. Basic Results and Conclusion
- HSR plays the largest role in determining how well a student will perform at UMM.
- Students who have decided on their major are more likely to perform better at UMM; however, we did not find enough evidence to draw any conclusions about students planning on professional school.
- We could identify which numerical variables had a significant effect on type, but more analysis is required to tell whether type has a significant effect on how well a student will do here.
38. References
- (1) Everitt and Dunn (2001). Applied Multivariate Data Analysis. Arnold, MPG Books.
- (2) D. Wright (2002). Exploring Multivariate Statistics. University Press.
- (3) M. Hveisen (2004). Discrimination Models. McGraw Hill.
- (4) All graphics and some analysis were done with SYSTAT 11.
39. Thank You!
- Engin Sungur and Jon E. Anderson