Title: Assessment
1. Assessment
- Testing
- Interpreting Tests
2. Two main roles
- Standardized testing for evaluation of schools
- Ongoing classroom assessment
3. Why we have standardized tests
- Because schools vary so much across the country
- An A in Alabama may not be the same as an A in Connecticut
- There is so much grade inflation
- Designed to compare students in a fair way
4. Role of Evaluation
- Placement evaluation
- Diagnostic evaluation
- Summative evaluation
- Formative evaluation
5. Placement evaluation
- Can test to see if the students need more help on basics
- Can informally figure it out as well
- May need to recommend remedial instruction for some students
6. Diagnostic evaluation
- More detailed evaluation of a student
- May require the services of specialized personnel
- May need to devise a plan to remedy serious learning problems
7. Formative evaluation
- Ongoing feedback during instruction
- Can tell what students have and haven't mastered
- Do NOT use these for grades
8. Summative evaluation
- Occurs at the end of a unit
- Determines how well students mastered the objectives
- Can be teacher-constructed tests
9. Tests
- What makes a test a test? Right and wrong answers.
- Other instruments are not tests because there is no right or wrong answer
- E.g., personality instruments
10. What makes a good test?
- Primarily reliability and validity.
- (personal opinion) Lack of bias
- (personal opinion) Suitability for intended takers.
11. Reliability
- Means the test is consistent
- Generally affected by the number of items the test has (more items generally means higher reliability).
- Does the test yield the same or similar scores?
12. Reliability
- Reliability estimates are all expressed as correlation coefficients (.70, etc.)
- The closer to 1.00, the more reliable the test.
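As a rough illustration of reliability expressed as a correlation coefficient, the sketch below correlates scores from two administrations of the same test. The scores are made-up illustration data, and `pearson_r` is a hypothetical helper name, not something from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for five students who took the same test twice.
first_try = [70, 82, 91, 65, 88]
second_try = [72, 80, 94, 63, 85]

reliability = pearson_r(first_try, second_try)
print(round(reliability, 2))
```

A value close to 1.00 here would suggest the test ranks students consistently across the two sittings.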
13. Validity
- Extent to which a test measures a particular phenomenon among a particular population.
- In other words, does the test measure what it is supposed to measure?
14. Context
- Validity means the test is appropriate for the test takers
- Appropriate for some purposes but not others
- E.g., the GRE may measure your readiness to enter grad school, but it certainly does not measure whether you should go to college or not.
15. Content Validity
- Should sample the course content or skill being assessed well
- For example, if a teacher teaches a sports history course including histories of cycling, football, basketball, and volleyball and then tests only on cycling, the test is not valid.
16. Concurrent criterion-related validity
- Often used to find a quicker test.
- Person claims their musical test works as well as a longer test.
- Have people take both tests and see how their scores compare.
17. Martin Seligman Test
- 1. You forget your spouse's (boy/girlfriend's) birthday.
- A. I am not good at remembering things.
- B. I was preoccupied with other things.
- 2. You owe the library $10 for an overdue book.
- A. When I am really involved with what I am reading, I often forget when it's due.
- B. I was so involved in writing the report, I forgot to return the book.
18. Seligman continued
- 3. You lose your temper with a friend.
- A. He or she is always nagging me.
- B. He or she was in a hostile mood.
- 4. You are penalized for returning your tax forms late.
- A. I always put off doing my taxes.
- B. I was lazy about getting my taxes done this year.
19. Seligman continued
- 5. You've been feeling run-down.
- A. I never get a chance to relax.
- B. I was exceptionally busy this week.
- 6. A friend says something that hurts your feelings.
- A. She always blurts things out without thinking of others.
- B. My friend was in a bad mood and took it out on me.
20. Final Seligman questions
- 7. You fall down a great deal while skiing.
- A. Skiing is difficult.
- B. The trails were icy.
- 8. You gain weight over the holidays and can't lose it.
- A. Diets don't work in the long run.
- B. The diet I tried didn't work.
21. Construct Validity
- Attempts to measure some trait or characteristic
- Can be a hypothetical idea (not directly observable)
- Patterns of behavior
22. Validating Constructs
- Might have Zen masters with low blood pressure take an anxiety test
- Expect them to score low to validate your test.
- Find people known for the construct to take your test.
- Ex. Mother Teresa takes the Manuel empathy test.
23. Need to find/not find relationships
- Optimism test: Eeyore better score low.
- Pollyanna better score high.
- Love test: Tin Man better score low.
- Cupid better score high.
- Wizard test: H. Potter better score high.
- Wizard of Oz, low.
24. Relationship between validity and reliability
- A test that is not reliable cannot be valid.
- Reliability is necessary but NOT SUFFICIENT for validity.
25. Norm-referenced vs criterion-referenced tests
26. Norm-referenced Tests
- NRTs: how well a student performs in comparison with others
- E.g., CSAPs, GREs, ACTs
- Usually reported in percentiles
27. NRTs
- Broad breadth of content
- Compares students to other students
- Items should be fairly hard: on average, about 50% of students get an item correct.
28. Criterion-referenced tests
- Measure course content
- CRTs are used to make classroom decisions about instruction
- What you develop to test your students are CRTs
29. CRTs
- Should aim to have about 80% of the students get an item correct
- Narrow, aimed at a few objectives
- Score means how many right and wrong
30. The Normal Distribution
31. Standard Deviation
- How variable the scores are
- How much each score differs on average from the mean
32. The Normal Curve (aka the Bell Curve)
- All normal curves share certain properties
- The percentage of people falling within different ranges of scores is always the same
- That is, 68% of people's scores fall within ±1 SD of the mean
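The fixed percentages of a normal curve can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `share_within` is hypothetical and only for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores falling within k SDs of the mean on a normal curve."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

Because the shares depend only on the number of SDs from the mean, the same figures hold for any normally distributed test, whatever its mean and SD.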
33. Standard Scores
34. Standard Scores
- Allow us to interpret a score relative to the scores of others and/or
- Compare a student's scores on various subjects.
35. Z scores
- Z scores always have a mean of 0
- Z scores always have an SD of 1
- Z scores tell us how many SDs a person's score is from the mean
36. Calculating a Z score
- Student has a score of 90.
- Test has a mean of 140.
- SD = 25.
- First think: will this be a negative Z or not?
- Z = (raw score - mean) / SD
- Z = (90 - 140) / 25 = -50 / 25 = -2
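The Z score calculation, Z = (raw score - mean) / SD, can be written as a small function. This is a minimal sketch; the function name `z_score` is an illustrative choice, and the numbers are the ones from the slide.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

# Slide example: score of 90 on a test with mean 140 and SD 25.
print(z_score(90, 140, 25))  # -2.0
```

The score is below the mean, so the Z score comes out negative, which matches the "think first" check on the slide.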
37. T Scores
- Invented primarily by people who don't like negative numbers
- T = 10z + 50
- For example, T = 10(2) + 50 = 70
- Or T = 10(-1) + 50 = 40
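The conversion T = 10z + 50 rescales Z scores so the mean becomes 50 and the SD becomes 10. A minimal sketch, with `t_score` as an illustrative function name, reproducing the two slide examples:

```python
def t_score(z):
    """Convert a Z score to a T score (mean 50, SD 10)."""
    return 10 * z + 50

print(t_score(2))   # 70
print(t_score(-1))  # 40
```

A T score only turns negative for a Z below -5, which is why T scores are effectively always positive in practice.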
38. Sample problem
- Test score is 49. Average is 40. SD = 3.
- Z = 3
- What is the T score?
39. IQ example
- IQ tests have an average of 100 and SD = 15; the person scored 95.
- What percentage of people are they smarter than? (estimate)
- What is their Z score?
- What is their T score?
40. St. Nicholas School for Deer
- Rudy brags to Comet that he is better than Comet in all their subjects, but they are not in the same class. If Rudy has a geography score of 70 (mean 60, SD = 10) and Comet has a score of 60 (mean 50, SD = 5), whose score is higher?
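One way to work this problem is to convert each raw score to a Z score, since raw scores from different classes are not directly comparable. The sketch below uses the numbers from the slide; the variable and function names are illustrative.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies from its class mean."""
    return (raw - mean) / sd

rudy_z = z_score(70, 60, 10)   # Rudy: raw 70, class mean 60, SD 10
comet_z = z_score(60, 50, 5)   # Comet: raw 60, class mean 50, SD 5

print(rudy_z, comet_z)  # 1.0 2.0
print("Comet" if comet_z > rudy_z else "Rudy", "has the higher relative score")
```

Rudy's raw score is higher, but Comet sits two SDs above his class mean while Rudy sits only one SD above his.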
41. Stanines
42. Stanines
- Widely used in schools
- Ranges or bands within which fixed percentages of scores fall
- Each is one-half an SD wide (except the open-ended 1st and 9th)
43. Stanines
- Probably widely used because they avoid overinterpretation of a score
- Student is in the 2nd, 4th, or 9th stanine
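A minimal sketch of how a Z score maps onto a stanine, assuming the common convention that the 5th stanine is centered on the mean and each band is half an SD wide. The helper name `stanine_from_z` is hypothetical.

```python
import math

def stanine_from_z(z):
    """Map a Z score to a stanine (1-9) using half-SD-wide bands centered on the mean."""
    band = math.floor(2 * z + 5.5)  # z in [-0.25, 0.25) lands in band 5
    return max(1, min(9, band))     # the 1st and 9th stanines absorb the tails

for z in (-2.0, 0.0, 1.0, 3.0):
    print(z, stanine_from_z(z))
```

Reporting the band rather than the exact score is what keeps a one- or two-point difference from being overinterpreted.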
44. Interpret these Stanine scores
- Matteo has a 6 for vocabulary, 3 for reading comprehension, 6 for math comprehension, 7 for math application, and a 4 for spelling.
- Where are his strengths? Where does he need to improve?
45. Hogwarts Example
- Harry Potter has a 3 in Divination, a 4 in Charms, a 9 in Defense Against the Dark Arts, and a 7 in Potions.
- About what percentile is he in for Potions? Divination?
46. Percentiles
47. Reported Percentiles are NOT
- Percentages. They do not say how many you got right or wrong.
- They are a rank.
48. True story
- Dr. M took a civil service test to be in the Park Service and scored 77.
- Didn't expect to get a job, but was 3rd highest in all of Yellowstone.
- Next year, scored a 95.
- And didn't get a job.
- Why? Presumably because the 77 was a high-ranked score, while the 95 on an easy test was not ranked high.
49. National Aptitude Test Problem
- Parents come to you distraught because their son,
an A and B H.S. student, scored at the 65th
percentile on a college aptitude test. They tell
you he has never gotten a D in his life. What do
you tell them?
50. GRE Problem
- A person gets a 710 verbal score on the GRE which
has a mean of 475 and a SD of 127. What
percentile is this person in (approximately)?
51. National Deer Test
- On a nationwide standardized test with a mean of 500 and an SD of 100, Rudy scores a 650. Approximately what is Rudy's percentile score?
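Percentile problems like this one can be estimated by converting the score to a Z score and reading off the area under the normal curve below it. The sketch assumes the scores are normally distributed; `percentile_rank` is a hypothetical helper built on the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

def percentile_rank(raw, mean, sd):
    """Approximate percentile rank of a raw score, assuming normally distributed scores."""
    z = (raw - mean) / sd
    return 100 * NormalDist().cdf(z)

# Rudy: 650 on a test with mean 500 and SD 100 (Z = 1.5).
print(round(percentile_rank(650, 500, 100)))  # 93
```

The result is a rank: Rudy scored higher than roughly 93 percent of test takers, which says nothing about how many items he got right.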
52. Grade Equivalent Scores
- Avoid these!
- NO ONE UNDERSTANDS THEM!
53. Grade Equivalent Scores
- A raw score on an NRT is converted to a grade equivalent score
- Reported as 4.2 (4th grade, 2nd month)
- Which is average if the student is in the 2nd month of 4th grade.
54. Grade equivalent continued
- Parents in particular do not get it
- Child in 3rd grade gets a 4.3
- Parent will say that she reads at a 4th grade level
- NOT SO: she got the score that a 4th grader in the 3rd month would have on her 3rd grade test.
- Confused yet?
55. GES
- Does NOT mean the child has mastered the 4th grade content
- Does mean she reads well
- Use T scores or Z scores with parents instead
56. Standard Error of Measurement
- How sure are we of that score?
57. Standard Error of Measurement
- True score is the score that would be obtained if there were no sources of error
- The more reliable the test, the less error
- The SEM creates a confidence interval around an obtained score
58. Practice SEM
- SEM is 6
- Student score is an 80
- Confidence interval for the true score is 74-86 (at the 68% confidence level).
59. SEM Confidence Intervals
- 68% of the time the score will be within ±1 SEM (the S.D. of the measurement error).
- 95% of the time the score will be within ±2 SEM.
- About 99% of the time the score will be within ±3 SEM.
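A confidence band around an obtained score is the score plus or minus a whole number of SEMs. The sketch below is a minimal illustration using the practice numbers (score 80, SEM 6); `confidence_band` and its parameter names are hypothetical.

```python
def confidence_band(obtained, sem, n_sems=1):
    """Return (low, high) bounds: the obtained score plus or minus n_sems * SEM."""
    return (obtained - n_sems * sem, obtained + n_sems * sem)

print(confidence_band(80, 6))     # (74, 86)   about 68% confidence
print(confidence_band(80, 6, 2))  # (68, 92)   about 95% confidence
print(confidence_band(80, 6, 3))  # (62, 98)   about 99% confidence
```

Wider bands buy more confidence at the cost of precision, so the choice of 1, 2, or 3 SEMs depends on how costly a wrong decision would be.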
60. GRE problem
- GRE has a reliability of .90
- Candidate scored 600
- GRE has an SEM of 32
- Construct the confidence interval for 68% of his/her scores
- 95%
- 99%
61. 2nd Applied Problem
- Parents ask you: should our son take the ACT again? He needs 30 in all 4 subject areas.
- ACT has scores 1-36, SEM = 2 for subject areas.
- Junior has scores of 27 in English, 22 in Math, 23 in Science Reasoning, 26 in Reading.
62. 3rd Applied Problem
- Jordan wants to get into Mensa
- J needs an IQ of 130
- J takes an IQ test and gets a score of 108
- SEM is 5
- At the 68% confidence level, can J make Mensa?
63. Test Bias
64. Bias in Testing
- Students from lower-SES and minority families typically score lower than white middle-class (WMC) students
- Content of IQ and other tests may reflect middle-class experiences
65. SAT Analogy item
- Runner-marathon
- envoy-embassy
- Martyr-massacre
- Oarsman-regatta
- Referee-tournament
- Horse-stable
66. Just a slight shift in culture
- What is a counterpane?
- What does primogeniture mean?
- What is a lorry?
- What is a toff?
- What is a vest?
- What are knickers?