Assessing Students using Multivariate Statistical Tools - PowerPoint PPT Presentation

Provided by: stat275
Transcript and Presenter's Notes

1
Assessing Students using Multivariate Statistical Tools
  • A Statistics Senior Seminar Presented by
  • Peter Butler

2
Overview
  • Introduction
  • Objectives and Questions
  • Techniques Used, Applications, and Examples
  • PCA
  • Discriminant Analysis
  • Logistic Discrimination
  • Tree Analysis
  • Correspondence Analysis
  • Results Discovered
  • Conclusion

3
Introduction to Data and Variables to Consider
  • Sex: male/female
  • HSR: High School Ranking
  • ACT: ACT Score (English, Math, Reading, Science, Composite)
  • Type: Type of Student
  • NHS: new high school
  • NAS: new advanced standing
  • IUT: inter-university transfer
  • NPS: new post secondary
  • GPA: Cumulative GPA
  • TOTCRD: Total Number of Credits
  • Professional: Those planning on going to a
    professional school
  • Decided: Students who are or are not decided upon
    their major

4
Two Important Questions
  • Questions
  • What are the characteristics of different types
    of students?
  • How can we assess what we know about a student
    before they come to UMM, to tell how they will
    perform here?

5
Principal Components Analysis
  • Basic Idea
  • To describe the variation of a set of
    multivariate data in terms of a set of
    uncorrelated variables, each of which is a linear
    combination of the original variables.
  • Purpose
  • To see whether the first few components account
    for most of the variation in the original data.
  • Summarize data with little loss of information.
  • Reduce dimensionality, which can simplify future
    analysis.
  • Algebraic Representation of Principal Component
    Analysis
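The algebraic representation itself did not survive in this transcript. A standard way to write it is sketched below; the symbols (x_1, ..., x_q for the original variables, a_ij for the coefficients, lambda_i for the eigenvalues) are assumed notation, not taken from the slide.

```latex
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1q}x_q \\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2q}x_q \\
    &\;\;\vdots \\
y_q &= a_{q1}x_1 + a_{q2}x_2 + \cdots + a_{qq}x_q
\end{aligned}
\qquad
\mathbf{a}_i^{\top}\mathbf{a}_i = 1, \quad
\mathbf{a}_i^{\top}\mathbf{a}_j = 0 \;\; (i \neq j), \quad
\operatorname{Var}(y_i) = \lambda_i
```

Here each coefficient vector a_i is an eigenvector of the covariance (or correlation) matrix of the x variables, and the components are ordered so that lambda_1 >= lambda_2 >= ... >= lambda_q.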

6
PCA Relationships
  •           HSR    ACTE   ACTM   ACTRD  ACTSR  ACTCOMP  CUMGPA  TOTCRD
  • HSR      1.000
  • ACTE     0.450  1.000
  • ACTM     0.501  0.799  1.000
  • ACTRD    0.399  0.859  0.749  1.000
  • ACTSR    0.434  0.832  0.838  0.841  1.000
  • ACTCOMP  0.481  0.901  0.871  0.900  0.907  1.000
  • CUMGPA   0.422  0.326  0.315  0.297  0.269  0.327    1.000
  • TOTCRD   0.287  0.184  0.187  0.195  0.203  0.196    0.419   1.000
  • Strong Relationships Between
  • All ACT variables
  • HSR and ACTM
  • CUMGPA and HSR

7
PCA Scree Plot
  • Our scree plot shows evidence that most of the
    variation in our data can be explained by the
    first three principal components.

8
Principal Component Analysis
  • Component loadings
  •              1       2       3       4       5       6       7       8
  • HSR      0.605  -0.383  -0.579  -0.380  -0.087   0.020   0.003   0.005
  • ACTE     0.920   0.165   0.048   0.061  -0.141  -0.277   0.147   0.045
  • ACTM     0.900   0.130  -0.042  -0.044   0.375  -0.090  -0.136   0.052
  • ACTRD    0.903   0.188   0.130   0.067  -0.272   0.095  -0.198   0.071
  • ACTSR    0.919   0.194   0.098  -0.029   0.091   0.241   0.196   0.052
  • ACTCOMP  0.960   0.174   0.045   0.021  -0.009   0.012  -0.020  -0.211
  • CUMGPA   0.460  -0.683  -0.156   0.544   0.028   0.027   0.009   0.003
  • TOTCRD   0.320  -0.742   0.515  -0.284   0.007  -0.022  -0.004  -0.004
  • The loadings shown here support the evidence from
    our scree plot. Notice how much larger the values
    are for the first few PCs than for the rest. This
    means that most of the variation in our data can
    be explained by these first few PCs.
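The claim about the first few components can be checked with a little arithmetic. The sketch below assumes the loadings are eigenvectors scaled by the square root of their eigenvalue (the usual convention for a correlation-matrix PCA, but not stated on the slide), so the sum of squared loadings in a column approximates that component's eigenvalue. The function and variable names are illustrative.

```python
# Component loadings copied from the slide (rows: variables, columns: PC1..PC8).
LOADINGS = {
    "HSR":     [0.605, -0.383, -0.579, -0.380, -0.087,  0.020,  0.003,  0.005],
    "ACTE":    [0.920,  0.165,  0.048,  0.061, -0.141, -0.277,  0.147,  0.045],
    "ACTM":    [0.900,  0.130, -0.042, -0.044,  0.375, -0.090, -0.136,  0.052],
    "ACTRD":   [0.903,  0.188,  0.130,  0.067, -0.272,  0.095, -0.198,  0.071],
    "ACTSR":   [0.919,  0.194,  0.098, -0.029,  0.091,  0.241,  0.196,  0.052],
    "ACTCOMP": [0.960,  0.174,  0.045,  0.021, -0.009,  0.012, -0.020, -0.211],
    "CUMGPA":  [0.460, -0.683, -0.156,  0.544,  0.028,  0.027,  0.009,  0.003],
    "TOTCRD":  [0.320, -0.742,  0.515, -0.284,  0.007, -0.022, -0.004, -0.004],
}


def variance_by_component(loadings):
    """Sum of squared loadings per component.

    Assumption: loadings = eigenvector * sqrt(eigenvalue), so each column
    sum of squares approximates the eigenvalue of that component.
    """
    n_components = len(next(iter(loadings.values())))
    return [sum(row[k] ** 2 for row in loadings.values())
            for k in range(n_components)]


eigenvalues = variance_by_component(LOADINGS)
total = float(len(LOADINGS))  # trace of an 8 x 8 correlation matrix
proportions = [ev / total for ev in eigenvalues]
cumulative = [sum(proportions[: k + 1]) for k in range(len(proportions))]

for k, (ev, cum) in enumerate(zip(eigenvalues, cumulative), start=1):
    print(f"PC{k}: eigenvalue ~ {ev:.3f}, cumulative share ~ {cum:.3f}")
```

Under that assumption the first three components account for roughly 86% of the total variance, which agrees with the reading of the scree plot.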

9
Defining Classical Discrimination
  • Definition
  • Classical Discrimination is a process where we
    have groups that are known prior to investigation
    and the goal is to devise rules which can
    allocate previously unclassified objects or
    individuals into these groups in an optimal
    fashion.

10
Fisher's Linear Discriminant Function
  • Only 60 years ago Fisher devised a solution to
    the discrimination problem for two groups: a
    linear function y of the observed variables,
  • chosen so that the ratio of the between-group
    variance of y to its within-group variance is
    maximized.
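The linear function itself is missing from the transcript. In standard textbook notation (the symbols below are assumptions, not recovered from the slide), with B the between-group and S the pooled within-group covariance matrix, it can be written as:

```latex
y = a_1 x_1 + a_2 x_2 + \cdots + a_p x_p = \mathbf{a}^{\top}\mathbf{x},
\qquad
V = \frac{\mathbf{a}^{\top}\mathbf{B}\,\mathbf{a}}{\mathbf{a}^{\top}\mathbf{S}\,\mathbf{a}},
\qquad
\mathbf{a} = \mathbf{S}^{-1}\left(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2\right)
```

The ratio V is maximized by the coefficient vector a shown on the right, where the x-bar terms are the two group mean vectors. Software output such as the canonical discriminant functions on the later slides may rescale a and add a constant term.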

11
Classical Discrimination for Professional
  • Group means
  • no yes
  • HSR 74.809 80.574
  • ACTE 22.657 22.859
  • ACTM 22.523 23.365
  • ACTRD 24.289 24.201
  • ACTSR 23.304 23.744
  • ACTCOMP 23.437 23.701
  • CUMGPA 2.933 2.842
  • TOTCRD 113.148 120.385

12
Classical Discrimination for Professional
  • Canonical discriminant functions
  • Constant 0.289
  • HSR 0.027
  • ACTE -0.000
  • ACTM 0.108
  • ACTRD -0.062
  • ACTSR 0.019
  • ACTCOMP -0.060
  • CUMGPA -1.017
  • TOTCRD 0.006

13
An Example of Classifying Students
  • Canonical scores of group means
  • no -0.075
  • yes 0.305
  • Now we calculate our y-value from Fisher's
    equation.
  • The cutoff 0.115 is the midpoint of the two
    canonical group means, (-0.075 + 0.305) / 2.
  • If y < 0.115 then Group 1
  • If y > 0.115 then Group 2

14
Classifying a Student
  • Evaluating Student 7
  • y = -0.19 < 0.115, so student 7 is classified in
    Group 1.
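The classification step can be sketched in a few lines. The coefficients, group means, and canonical group-mean scores below are copied from the Professional slides; the function names are illustrative, and treating Group 1 as "no" and Group 2 as "yes" is an assumption based on the order of the canonical scores. Student 7's raw values are not in the transcript, so the group mean vectors stand in as example inputs.

```python
# Canonical discriminant function for Professional (values from the slides).
CONSTANT = 0.289
COEFFICIENTS = {
    "HSR": 0.027, "ACTE": 0.0,  # ACTE is reported as -0.000 on the slide
    "ACTM": 0.108, "ACTRD": -0.062, "ACTSR": 0.019,
    "ACTCOMP": -0.060, "CUMGPA": -1.017, "TOTCRD": 0.006,
}
GROUP_MEAN_SCORES = {"no": -0.075, "yes": 0.305}

# Group means from the slides, used here only as stand-in "students".
MEAN_NO = {"HSR": 74.809, "ACTE": 22.657, "ACTM": 22.523, "ACTRD": 24.289,
           "ACTSR": 23.304, "ACTCOMP": 23.437, "CUMGPA": 2.933,
           "TOTCRD": 113.148}
MEAN_YES = {"HSR": 80.574, "ACTE": 22.859, "ACTM": 23.365, "ACTRD": 24.201,
            "ACTSR": 23.744, "ACTCOMP": 23.701, "CUMGPA": 2.842,
            "TOTCRD": 120.385}


def discriminant_score(student):
    """y = constant + sum of coefficient * value over all variables."""
    return CONSTANT + sum(COEFFICIENTS[name] * student[name]
                          for name in COEFFICIENTS)


def cutoff(scores=GROUP_MEAN_SCORES):
    """Midpoint of the two canonical group-mean scores."""
    return (scores["no"] + scores["yes"]) / 2.0


def classify(y):
    """Group 1 if y is below the cutoff, otherwise Group 2."""
    return 1 if y < cutoff() else 2


print("cutoff:", round(cutoff(), 3))
print("student 7 (y = -0.19) -> Group", classify(-0.19))
for label, means in (("no", MEAN_NO), ("yes", MEAN_YES)):
    y = discriminant_score(means)
    print(f"mean of '{label}' group: y ~ {y:.3f} -> Group {classify(y)}")
```

Because the published coefficients are rounded to three decimals, the scores computed at the group means only approximate the reported -0.075 and 0.305, but they fall on the expected sides of the cutoff.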

15
Classical Discrimination for Decided
  • Group means
  • no yes
  • HSR 67.940 78.591
  • ACTE 20.743 23.343
  • ACTM 20.777 23.322
  • ACTRD 22.261 24.937
  • ACTSR 21.408 24.047
  • ACTCOMP 21.543 24.132
  • CUMGPA 2.787 2.958
  • TOTCRD 101.338 118.951

16
Classical Discrimination for Decided
  • Canonical discriminant functions
  • Constant -3.860
  • HSR 0.017
  • ACTE 0.027
  • ACTM -0.009
  • ACTRD -0.029
  • ACTSR 0.038
  • ACTCOMP 0.072
  • CUMGPA -0.059
  • TOTCRD 0.004

17
Classical Discrimination for Type
  • Group means
  • IUT NAS NHS NPS
  • CUMGPA 2.989 2.829 2.929 2.971
  • TOTCRD 124.413 137.464 111.406 63.951
  • HSR 54.356 63.039 78.887 75.933
  • ACTE 18.169 19.573 23.383 23.457
  • ACTM 18.068 19.303 23.459 22.314
  • ACTRD 20.102 21.071 24.988 24.343
  • ACTSR 19.000 20.277 24.099 23.267
  • ACTCOMP 18.915 20.349 24.202 23.429

18
Classical Discrimination for Type
  • Canonical discriminant functions
  •               1       2       3       4
  • Constant   2.649   0.688  -1.893  -1.274
  • CUMGPA     0.090   0.944   0.813   0.219
  • TOTCRD     0.010  -0.012  -0.000   0.000
  • HSR       -0.022  -0.007  -0.004  -0.031
  • ACTE      -0.017   0.067  -0.211  -0.006
  • ACTM      -0.025  -0.028   0.198  -0.036
  • ACTRD      0.010   0.018   0.094   0.111
  • ACTSR     -0.024   0.021   0.002   0.063
  • ACTCOMP   -0.044  -0.141  -0.092  -0.011

19
Logistic Discrimination
  • When the data are non-normal or include binary
    variables, we need a different approach for our
    analysis. This approach uses a logistic function
    to model directly the probability that an
    observation is a member of each group.

20
Logistic Discrimination
  • With two groups the model is as follows
  • The alpha parameters in this model are estimated
    by maximum likelihood.

21
Logistic Discrimination
  • After estimation of the parameters, the
    allocation rule is to assign to Group 1 if,
  • Assign to Group 2 if,
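The model and the two inequalities of the allocation rule were equations on the original slides and are missing here. A common textbook form is sketched below; the notation, and the choice of Group 1 as the group whose probability carries the exponential in the numerator, are assumptions rather than a recovery of the slide.

```latex
P(G_1 \mid \mathbf{x}) =
  \frac{\exp\!\left(\alpha_0 + \sum_{j=1}^{p} \alpha_j x_j\right)}
       {1 + \exp\!\left(\alpha_0 + \sum_{j=1}^{p} \alpha_j x_j\right)},
\qquad
P(G_2 \mid \mathbf{x}) =
  \frac{1}{1 + \exp\!\left(\alpha_0 + \sum_{j=1}^{p} \alpha_j x_j\right)}
```

```latex
\text{Assign to } G_1 \text{ if } \hat{\alpha}_0 + \sum_{j=1}^{p} \hat{\alpha}_j x_j > 0,
\qquad
\text{assign to } G_2 \text{ if } \hat{\alpha}_0 + \sum_{j=1}^{p} \hat{\alpha}_j x_j < 0
```

Under this form the rule is equivalent to assigning the observation to whichever group has estimated probability above one half.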

22
Logistic Discrimination Decided
  • Parameter    Estimate   S.E.   t-ratio  p-value
  • 1 CONSTANT     0.543   0.154    3.533    0.000
  • 2 HSR         -0.007   0.001   -4.947    0.000
  • 3 ACTE        -0.012   0.012   -0.972    0.331
  • 4 ACTM         0.005   0.011    0.456    0.649
  • 5 ACTRD        0.013   0.011    1.185    0.236
  • 6 ACTSR       -0.018   0.013   -1.397    0.162
  • 7 ACTCOMP     -0.029   0.019   -1.570    0.116
  • 8 CUMGPA       0.012   0.051    0.237    0.813
  • 9 TOTCRD      -0.002   0.001   -3.710    0.000

23
Interpreting Logistic Discrimination for Decided
  • Constructing a logistic discriminant model for
    decided

24
Applying Logistic Discrimination to a Student
  • Student 11
  • Since,
  • We classify student 11 to Group 2.
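The calculation behind this classification is not in the transcript, but its shape can be sketched from the Decided estimates. The coefficients below are copied from the parameter table; the rule "Group 1 if the linear predictor is positive, otherwise Group 2" is the standard one and is assumed here. Student 11's values are not available, so the mean vector of the decided ("yes") group from the earlier slide stands in as an example input.

```python
import math

# Logistic discrimination estimates for Decided (from the parameter table).
CONSTANT = 0.543
COEFFICIENTS = {
    "HSR": -0.007, "ACTE": -0.012, "ACTM": 0.005, "ACTRD": 0.013,
    "ACTSR": -0.018, "ACTCOMP": -0.029, "CUMGPA": 0.012, "TOTCRD": -0.002,
}

# Stand-in input: group means of decided students from the earlier slide.
DECIDED_YES_MEANS = {
    "HSR": 78.591, "ACTE": 23.343, "ACTM": 23.322, "ACTRD": 24.937,
    "ACTSR": 24.047, "ACTCOMP": 24.132, "CUMGPA": 2.958, "TOTCRD": 118.951,
}


def linear_predictor(student):
    """alpha_0 + sum of alpha_j * x_j."""
    return CONSTANT + sum(COEFFICIENTS[name] * student[name]
                          for name in COEFFICIENTS)


def prob_group1(student):
    """Logistic transform of the linear predictor (assumed to model Group 1)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(student)))


def allocate(student):
    """Assumed rule: Group 1 if the linear predictor is positive, else Group 2."""
    return 1 if linear_predictor(student) > 0 else 2


lp = linear_predictor(DECIDED_YES_MEANS)
print(f"linear predictor ~ {lp:.3f}")
print(f"P(Group 1) ~ {prob_group1(DECIDED_YES_MEANS):.3f}")
print("allocated to Group", allocate(DECIDED_YES_MEANS))
```

For this stand-in input the linear predictor is negative, so the sketch allocates to Group 2, the same group the slide reports for student 11.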

25
Logistic Discrimination Professional
  • Parameter    Estimate   S.E.   t-ratio  p-value
  • 1 CONSTANT     1.474   0.188    7.847    0.000
  • 2 HSR         -0.013   0.002   -6.166    0.000
  • 3 ACTE        -0.003   0.014   -0.185    0.853
  • 4 ACTM        -0.043   0.013   -3.235    0.001
  • 5 ACTRD        0.022   0.013    1.691    0.091
  • 6 ACTSR       -0.010   0.015   -0.707    0.480
  • 7 ACTCOMP      0.032   0.028    1.134    0.257
  • 8 CUMGPA       0.406   0.058    6.938    0.000
  • 9 TOTCRD      -0.002   0.001   -3.646    0.000

26
Logistic Discrimination Type
  • Choice Group IUT
  • Parameter    Estimate   S.E.   t-ratio  p-value
  • 1 CONSTANT     1.916   1.619    1.183    0.237
  • 2 HSR          0.050   0.018    2.732    0.006
  • 3 ACTE        -0.086   0.229   -0.375    0.708
  • 4 ACTM         0.243   0.190    1.280    0.200
  • 5 ACTRD        0.068   0.201    0.340    0.734
  • 6 ACTSR       -0.009   0.228   -0.040    0.968
  • 7 ACTCOMP     -0.121   0.238   -0.510    0.610
  • 8 CUMGPA       0.614   0.621    0.989    0.323
  • 9 TOTCRD      -0.016   0.009   -1.709    0.087

27
Logistic Discrimination Type
  • Further analysis showed that NAS and NPS had no
    significant variables.
  • For NHS, the significant variables were
  • HSR, p-value = 0.000
  • ACTM, p-value = 0.047
  • TOTCRD, p-value = 0.000

28
Tree Analysis
  • Tree modeling
  • An exploratory technique for uncovering structure
    in data.
  • Uses a series of classification rules that are
    derived from the data by a procedure known as
    recursive partitioning and the result is a
    classification tree.
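One step of recursive partitioning can be sketched as a search for the single threshold that best separates the classes. The sketch below uses Gini impurity as the splitting criterion, which is an assumption (the slides do not say which criterion the software used), and the credit totals and labels are hypothetical toy data, not values from the study.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value < t'.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, impurity of the unsplit node) if no split improves on it.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue
        threshold = (lo + hi) / 2.0
        left = [lab for v, lab in pairs if v < threshold]
        right = [lab for v, lab in pairs if v >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical toy data: total credits and whether the student is decided.
credits = [20, 35, 50, 60, 70, 90, 110, 130]
decided = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

threshold, impurity = best_split(credits, decided)
print("best threshold:", threshold, "weighted impurity:", impurity)
```

A full tree repeats this search over every variable and then again within each resulting subset until a stopping rule is met.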

29
Tree Analysis Decided

30
Interpreting the Decided Tree
  • First we select a group, say the first red box
    group on the left.
  • If a student's ACTCOMP is less than 8, then (s)he
    is classified as an undecided student.
  • If a student's ACTCOMP is greater than 8 and
    TOTCRD is greater than 63.5, then (s)he is
    classified as a decided student.
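Read as code, the two rules above become a pair of nested conditions. This is only a sketch of the branches described on the slide: the thresholds are taken exactly as they appear there, the function name is illustrative, and the remaining branch of the tree is not described, so it is left unclassified.

```python
def classify_decided(actcomp, totcrd):
    """Apply the two tree rules quoted on the slide.

    Returns "undecided", "decided", or None for branches of the tree
    that the slide does not describe.
    """
    if actcomp < 8:
        return "undecided"
    if actcomp > 8 and totcrd > 63.5:
        return "decided"
    return None  # branch not described in the transcript


print(classify_decided(7, 100))
print(classify_decided(20, 70))
print(classify_decided(20, 50))
```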

31
Tree Analysis Professional

32
Tree Analysis Type

33
Interpreting the Type Tree
  • Again, we select a group first (say the first red
    group on the left). A student is classified as a
    NAS here if their ACTCOMP is less than 8 and
    TOTCRD is greater than 31.5.

34
Correspondence Analysis
  • Correspondence Analysis is a technique for
    displaying the associations among a set of
    categorical variables in a type of scatter plot
    or map.
  • Objectives
  • To reveal characteristics of the data.
  • To generate hypotheses from the data to then
    test.

35
Correspondence Analysis (cont)
  • Correspondence Analysis can be viewed as a method
    for decomposing the chi-squared statistic for a
    contingency table into components corresponding
    to different dimensions.
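The quantity being decomposed can be computed directly. The sketch below calculates the chi-squared statistic of a contingency table, the contribution of each cell, and the total inertia (chi-squared divided by the table total), which is the amount that correspondence analysis spreads across its dimensions. The counts are hypothetical, not data from the study, and the decomposition into dimensions (a singular value decomposition of the standardized residuals) is not shown.

```python
def chi_squared(table):
    """Return (chi-squared statistic, total inertia, per-cell contributions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        contributions.append([])
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contributions[i].append((observed - expected) ** 2 / expected)
    statistic = sum(sum(r) for r in contributions)
    return statistic, statistic / n, contributions


# Hypothetical counts: rows = Decided (no, yes), columns = Professional (no, yes).
TABLE = [
    [10, 20],
    [20, 10],
]

statistic, inertia, cells = chi_squared(TABLE)
print(f"chi-squared = {statistic:.4f}, total inertia = {inertia:.4f}")
```

For a table with more rows and columns, the principal inertias of the individual dimensions add up to this total inertia.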

36
Correspondence Analysis
  • From our correspondence plot we can see GPA,
    Total Credits, HSR, and ACTM are the most
    significant variables.
  • Most of the ACT variables are correlated, but not
    as significant.

37
Basic Results and Conclusion
  • HSR plays the largest role in determining how
    well a student will perform at UMM.
  • If a student is decided upon their major they are
    more likely to perform better at UMM; however, we
    did not find enough evidence to draw any
    conclusions about Professional.
  • We could identify which numerical variables had a
    significant effect on Type, but more analysis is
    required to tell whether Type has a significant
    effect on how well a student will do here.

38
References
  • (1) Everitt and Dunn (2001). Applied Multivariate
    Data Analysis. Arnold, MPG Books.
  • (2) D. Wright (2002). Exploring Multivariate
    Statistics. University Press.
  • (3) M. Hveisen (2004). Discrimination Models.
    McGraw Hill.
  • (4) All graphics and some analysis were done by
    SYSTAT 11.

39
THANK YOU!
  • Engin Sungur and Jon E. Anderson