Title: Developing Alternate Assessment Technical Adequacy
1Developing Alternate Assessment Technical Adequacy
- ASES DAATA Project
- Large-Scale Assessment Conference
- June 20, 2005
2Enhanced Assessment Instrument Grant
- Title VI, Part A, Subpart 1,
- Section 6112
3-
- West Virginia is the fiscal agent.
4Preview
- Overview and purposes of Project DAATA - Beth Cipoletti
- Role of ASES/CCSSO - Sandra Warren
- Content Validity Paper/Generalizability Study/Website - Jerry Tindal
- Handbook Prospectus - Pat Almond
- State Perspective - Dan Farley
- Reflections/Questions and Answers - Jan Barth
5- DAATA Project Overview
- Beth Cipoletti
- DAATA Project Director
- Office of Student Assessment Services, WVDE
6Overview of Project DAATA
- Develop systems for states to engage in self-studies in order to determine the degree of technical merit in their assessment instruments
- Assemble exemplary measures, forms, and protocols to help states develop or adopt the requisite instrumentation to conduct a self-study on their alternate assessments
- Propose a reporting system to communicate results and to use them to inform decisions
7Purposes of Project DAATA
- Support state efforts to prepare for and respond to the NCLB reviews
- Develop a process for states to ensure the technical adequacy of their alternate assessments so that measurement has implications for instruction
- Address three types of alternate assessments (performance events, portfolios, and observations) to assure products are applicable to all states
- Disseminate products via NASDSE, NCEO, RRCs, and CCSSO
8Work Scope
9DAATA Project Topics for Review
- Content validity
- The development of measures that are aligned with learning and lead to valid inferences within specific domains of performance
- Generalizability
- The partitioning of variance into sources, or facets, that help explain performance; facets can include the type of task used to measure students, the raters who judge them, or the multiple occasions on which students are tested (a variance decomposition for one such design is sketched below)
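As an illustration of this partitioning (a minimal sketch, assuming a fully crossed person x task x rater design rather than the specific design any participating state uses), observed-score variance decomposes as:

```latex
% Variance components for a fully crossed p (person) x t (task) x r (rater) design
\sigma^2(X_{ptr}) = \sigma^2_{p} + \sigma^2_{t} + \sigma^2_{r}
                  + \sigma^2_{pt} + \sigma^2_{pr} + \sigma^2_{tr}
                  + \sigma^2_{ptr,e}
```

The person component is the signal of interest; the remaining components quantify how much tasks, raters, and their interactions with persons contribute to measurement error.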
10Topics (cont.)
- Reliability
- The consistency of measures over multiple administrations and scorings (over time, occasion, form, person, or item), which is a necessary prerequisite to establishing the validity of interpretations
- Criterion and predictive validity
- The relationship among measures of performance that form a coherent and interpretable construct and help define the meaning of a measure in both convergent and divergent ways
11Topics (cont.)
- Consequential validity
- The impact of measures within the context of practice, addressing individual and social outcomes from the use of assessment systems to make decisions about students, teachers, administrators, institutions, and programs
12Accomplishments to Date
13Content Validity Study
- Active Participating States (APS) included Arkansas, Maryland, Michigan, New Mexico, Washington, and West Virginia
- The APS shared the following alternate assessment materials: administration manual, scoring manual, technical reports, and samples of alternate assessments
14Content Validity (cont.)
- The actual study of content-related validity evidence supporting alternate assessments was conducted in April and May. Approximately 65 teachers in 7 states collected classroom artifacts (instructional program plans, student work samples, perception surveys) and submitted alternate assessments. These materials are being analyzed for alignment and for consistency in opportunity to learn.
15Case Studies
- Researchers at the University of Oregon's Behavioral Research and Teaching (BRT) analyzed the data collected from the APS to assemble case studies documenting the breadth and depth of different states' alternate assessments.
- Each APS verified the accuracy of the researchers' analysis during the January 2005 meeting in Orlando.
16Case Studies (cont.)
- Each case study includes the following
- Section I: Test Development, Administration, and Scoring
- Perspective/theory: background
- Overview: purpose
- Definitions/glossary: key words and terms
- Type of Assessment
- Domain-Sampling Plan: description of all possible tasks
- Test Specifications/Blueprint: description of target population, format of tasks, and content
17Case Studies (cont.)
- Administrative Procedures: directions to collect student work, purpose
- Items/Tasks (format and amount): setting, context
- Scoring: method to assign value to a student's response
- Score Metric: method to aggregate and implement decision rules to combine scores
18Case Studies (cont.)
- Section II: Study of Alignment with State Content Standards
- Application of alignment procedures (an illustrative computation follows this slide)
- Categorical Concurrence
- Depth of Knowledge
- Range of Knowledge
- Balance of Representation
- Standard Setting
- Analysis and interpretation of findings from the study, with recommendations for next steps
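The criteria above follow the general logic of Webb-style alignment reviews. The sketch below is purely illustrative: the item codings are invented and the thresholds shown (e.g., at least six items per standard for categorical concurrence, 50% of items at or above the objective's depth-of-knowledge level) are commonly cited rules of thumb, not necessarily the cut points used in the DAATA case studies.

```python
"""Illustrative computation of Webb-style alignment indices.

Hypothetical item codings and commonly cited thresholds; the procedures
and cut points used in the actual DAATA case studies may differ.
"""
from collections import defaultdict

# Each item is coded to a standard, an objective, and a depth-of-knowledge level.
items = [
    {"standard": "Reading 1", "objective": "R1.a", "dok": 2},
    {"standard": "Reading 1", "objective": "R1.b", "dok": 1},
    {"standard": "Reading 1", "objective": "R1.b", "dok": 3},
    {"standard": "Reading 1", "objective": "R1.c", "dok": 2},
    {"standard": "Reading 1", "objective": "R1.a", "dok": 2},
    {"standard": "Reading 1", "objective": "R1.c", "dok": 1},
    {"standard": "Reading 2", "objective": "R2.a", "dok": 2},
    {"standard": "Reading 2", "objective": "R2.a", "dok": 2},
]

# DOK level assigned to each objective by the content reviewers.
objective_dok = {"R1.a": 2, "R1.b": 2, "R1.c": 3, "R2.a": 2}

# All objectives under each standard (whether or not any item assesses them).
objectives_by_standard = {
    "Reading 1": ["R1.a", "R1.b", "R1.c", "R1.d"],
    "Reading 2": ["R2.a", "R2.b"],
}

items_by_standard = defaultdict(list)
for item in items:
    items_by_standard[item["standard"]].append(item)

for standard, its in items_by_standard.items():
    n = len(its)

    # Categorical concurrence: enough items (often >= 6) target the standard.
    categorical_concurrence = n >= 6

    # Depth-of-knowledge consistency: share of items at or above the DOK level
    # of the objective they measure (often judged against 50%).
    at_or_above = sum(1 for i in its if i["dok"] >= objective_dok[i["objective"]])
    dok_consistency = at_or_above / n

    # Range of knowledge: share of the standard's objectives hit by at least
    # one item (often judged against 50%).
    hit = {i["objective"] for i in its}
    range_of_knowledge = len(hit) / len(objectives_by_standard[standard])

    # Balance of representation: 1 - sum(|1/n_obj - n_k/n|) / 2 over the
    # objectives that were hit, where n_k is the item count for objective k.
    counts = defaultdict(int)
    for i in its:
        counts[i["objective"]] += 1
    n_obj = len(hit)
    balance = 1 - sum(abs(1 / n_obj - c / n) for c in counts.values()) / 2

    print(f"{standard}: items={n}, CC={categorical_concurrence}, "
          f"DOK={dok_consistency:.2f}, ROK={range_of_knowledge:.2f}, "
          f"BOR={balance:.2f}")
```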
19Content Validity Study (cont.)
- The content validity technical paper was revised to provide a more focused direction for states
- Three dimensions were highlighted
- Domain for sampling tasks and behaviors, including the specifications for representation
- Alignment between alternate assessments and state content standards
- Linkages with classroom opportunity to learn, and the overlap or underlap between the tasks on the alternate assessment and those practiced in the classroom
20Content Validity Study (cont.)
- The draft content validity technical paper has undergone an electronic review
- Reviewers
- Eileen Ahearn, NASDSE, ASES
- Sue Bechard, Measured Progress, ASES
- Dan Farley, New Mexico, ASES
- Aran Felix, Alaska, ASES
- Carolee Gunn, Utah, ASES
- Laurie Davis, Pearson, TILSA
- Gretchen Ridgeway, DODEA, TILSA
21Content Validity Study (cont.)
- Content validity technical paper Focus Group Reviewers
- Eileen Ahearn, NASDSE, ASES
- Sue Bechard, Measured Progress, ASES
- Carolee Gunn, Utah, ASES
22Generalizability Study
- The study involved 65 teachers and 75 students in seven states. The generalizability study will provide estimates of reliability associated with specific facets of our measurement process. A careful empirical study of dominant types of alternate assessments will provide comparative estimates of measurement validity.
23Generalizability Study
- Active Participating States: Alaska, Iowa, New Mexico, Oregon, Utah, Washington, and West Virginia
- Teachers were asked to collect the following data
- Copies of IEPs
- Completed Instructional Surveys
- Student classroom work
- Copies of Alternate Assessments
24Generalizability Study (cont.)
- Language Perception Assessment Survey
- Rates students on communication skills commonly used during daily living and attendance at school
- Four skill levels
- Traditional
- Beginning
- Emerging
- Pre-Emergent
- Modes of communication may include verbalizations, sign language, and/or augmentative and alternative communication systems
25Generalizability Study (cont.)
- Reading Performance Assessment
- Teachers administered the assessment to students in May
- Common set of tasks that reflect both receptive and expressive dimensions
- Letter and word recognition
- Comprehension
- Students will be administered both types of tasks
- Performance will be rated by trained judges on proficiency of reading skill
- Order of tasks and forms was randomly assigned
26Generalizability Study (cont.)
- Data obtained from the study will be used to
- Correctly estimate the performance error term
- Identify the sources of error variance associated with each of the study facets
27Reviewers for Generalizability Technical Paper
- Mary Roan, North Carolina
- Sharon Hall, Maryland
- Brian Touchette, Delaware
- Sue Bechard, Measured Progress
- Betsy Case, Harcourt
- Sheryl Lazarus, NCEO
28Reliability Study
- APS include Michigan, Connecticut, New Mexico, West Virginia, Maryland, Texas, and Delaware
- APS will participate in the following ways
- Supply samples of training and scoring materials
- Administer the alternate assessment in the fall and re-administer it at a later date in order to measure stability
- Submit extant data stripped of personally identifiable information
29Reliability Study (cont.)
- Results for each type of alternate assessment (portfolio, observation, and performance) will be analyzed for the following reliability characteristics
- Quality Administration
- Fidelity of Administration
- Inter-rater Reliability
- Internal Consistency
30Reliability Study (cont.)
- Quality Administration
- Analysis of teacher training materials and scoring training materials (where applicable)
- Fidelity of Administration
- Administration procedural conformity
- Inter-rater Reliability
- Scoring Agreement (an illustrative sketch follows this slide)
- Internal Consistency
- Item cohesiveness in measuring a single construct
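As a concrete illustration of the scoring-agreement piece, the short sketch below summarizes agreement between two hypothetical raters on a 0-4 rubric using exact agreement, adjacent agreement, and Cohen's kappa; the specific indices reported in the DAATA reliability study may differ.

```python
"""Illustrative inter-rater agreement summary for rubric scores.

Made-up ratings from two judges on a 0-4 rubric; the agreement statistics
reported in the DAATA reliability study may differ.
"""
from collections import Counter

rater_a = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2, 1, 3]
rater_b = [3, 2, 3, 1, 4, 2, 0, 4, 3, 1, 1, 3]
n = len(rater_a)

# Exact agreement: both judges assign the same score.
exact = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Adjacent agreement: scores differ by no more than one rubric point.
adjacent = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b)) / n

# Cohen's kappa: agreement corrected for chance, based on the marginal
# score distributions of the two judges.
p_o = exact
count_a, count_b = Counter(rater_a), Counter(rater_b)
categories = set(rater_a) | set(rater_b)
p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
kappa = (p_o - p_e) / (1 - p_e)

print(f"exact = {exact:.2f}, adjacent = {adjacent:.2f}, kappa = {kappa:.2f}")
```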
31- Role of ASES and CCSSO
- Sandra Warren
- CCSSO Consultant
32Organization Chart
33ASES DAATA Project Members
- ASES Study Groups
- DAATA Project Director, Beth Cipoletti, West Virginia
- Researcher, Gerald Tindal, University of Oregon/Behavioral Research and Teaching (BRT)
- Researcher, Pat Almond
- BRT Research Staff
- EdProgress
- Technical Advisory Committee
- ASES Coordinator (CCSSO), Sandra Warren
- CCSSO, Mary Yakimowski
34DAATA Management Team
- Role
- Oversee research and work of the project
- Members
- Beth Cipoletti, Project Director
- Gerald Tindal, Researcher
- Pat Almond, Researcher
- Sandra Warren, CCSSO Consultant
35ASES Member Involvement
- Roles
- Study group members
- Research
- Professional Development and Communication
- Policy to Practice
- Active Participating States (APS)
- Reviewers
36DAATA Technical Advisory Committee
- Members
- Diane Browder, University of North Carolina, Charlotte
- Tom Haladyna, Arizona State University West
- Naomi Zigmond, University of Pittsburgh
37Next Steps
38Project DAATA Schedule
39- Content Validity, Research, and Website
- Jerry Tindal
- BRT, University of Oregon
40Research Components of DAATA
41State and Student Case Studies
- Student-level cases provide rich descriptions of students within the context of state standards, instructional programs, and alternate assessments.
- State-level cases provide contextual information on development and implementation of an alternate assessment.
42State Case Study
- Assemblage of Content Evidence Descriptors
- Definition of (conceptual/theoretical) approach
- Definition of items
- Procedural evidence and documents
- Alignment of assessment with standards
43Evidence based on Content: Student Case Studies
- Instructional Program Form
- Instructional Survey (21 items)
- Collection of Work Samples
- Alternate Assessments
- Language Survey
44Example Student Descriptors
- A happy child who is very expressive and willing to try new activities
- Has an interest in computers, movies, and music, and lives in a rural area where he has a horse
- Spends almost half his day removed from the general education classroom
- Has a seizure disorder and shunts that need to be monitored at all times, as well as safety in his environment
- Requires a health plan and special transportation
45Example Student Educational Program
- He spent 60 minutes per day in a special education setting and was provided accommodations, specially designed instruction, supplementary aids and services, supports for school personnel, and support for related services.
- He was given small-group and individual instruction in reading activities that were broken down and repeated, and an associate to provide guidance and monitor his seizures and toileting.
- Mike is given picture cues to help him transition throughout the day.
46Example Student Skills
- Could recognize a few names of family members and his own name, but had to be prompted because he wanted to name just the first letter
- Could recognize 4 words at 80% accuracy (and had a goal to identify 28 words)
47Example Student IEP
- 1. In 36 weeks, in any given setting, __ will transition from activities and places in the building without exhibiting behaviors.
- 2. Student will demonstrate and state quantity, spatial relationships, and attributes at 80%.
- 3. Student will identify the ending sounds and sound out beginning reading words at 80%.
- 4. Student will answer comprehension questions and be able to retell a story in sequential order.
- 5. Student will demonstrate and verbalize math concepts: beginning addition, telling time, identifying coins and values.
- 6. Student will write upper case and lower case letters without a model and write reading words.
- 7. Student will follow directions with 80% compliance throughout the school day.
48District Standards
- Read words using suffixes, prefixes, and context clues.
- Reads, interprets, and responds to a variety of literary and informational texts, with the district age-appropriate grade-level benchmark requiring him to analyze story elements (e.g., characters and settings); finally, the benchmark or extended benchmark was for the student to identify story elements.
49Example Student Alt Assess
- Judgment of reading reflects no achievement in reading; breadth and depth age-appropriate and curriculum-based in difficulty
- Exhibits 81-100% independence in use of adaptations, demonstration of self-determination, and transfer or generalization to 4 or more settings
50Example Student Alt Assess
- A list of 6 words (fish, see, and, ball, car, and yellow)
- Six cards with both a phrase and a picture (a yellow car, a car, a yellow horse, a horse and a car, a horse, and a car and a horse)
- A demonstration sheet in reading showed that the student could attend to a literacy activity, read the summary of the first section, find 4/5 different words, identify the main idea or character, find 5 different words in the second section 5/5 times, tell what the story was about, and self-evaluate on finding words
- When given a passage and asked questions, the student had to be prompted to name two girls and boys; he was correct in stating his age, the color of his hair and shirt, and describing what he does in P.E.
51Example Student Work Samples
- Two pages with a sentence (A boy and I see the airplane) with words listed below in three columns (box, green, in, chicken, little, put) and nonsense words
- A sheet with the words and phrases written: the ball, I see a little car, a horse and a little horse, yellow fish, a boy in yellow, fish, see the airplane, a box, a boy and a horse, and a boy and a little boy
- A sheet with a picture of a girl handing a horse a carrot and a boy holding a chicken out of an open cage; two sentences appear below the picture: A put the chicken in the yellow box. I see the little girl and a horse.
- A sheet with a list of words: little green, airplane, see, chicken, yellow, the fish, put, a girl, car, ball, I, and box
- A sheet with beginning consonants; each set of 3 consonants had a picture above them. For example, p, m, n had a caricature of the moon; v, t, s had a saddle.
52Example Student Program Form
53Generalizability Study
- All students take all types (expressive, receptive), and all types have both forms (A, B). Six raters trained on state standards
- Tasks: symbol meaning, letter names, word reading, sentence reading, passage reading (including syntax), and passage comprehension
- Facets: Tasks, Forms (occasions), and Raters (see the simplified sketch below)
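As a simplified illustration of how a crossed design yields variance components and a generalizability coefficient, the sketch below works through a single-facet persons x raters analysis on made-up ratings; the actual DAATA analysis crosses additional facets (tasks and forms/occasions) and uses the project's own data.

```python
"""Minimal one-facet generalizability (persons x raters) sketch.

Made-up ratings; the DAATA study crosses additional facets (tasks,
forms/occasions) and uses its own estimation procedures.
"""
import numpy as np

# scores[p, r]: rating given to person p by rater r (fully crossed design)
scores = np.array([
    [3, 4, 3],
    [1, 2, 1],
    [4, 4, 5],
    [2, 3, 2],
    [0, 1, 1],
    [3, 3, 4],
], dtype=float)
n_p, n_r = scores.shape
grand = scores.mean()

# ANOVA sums of squares for persons, raters, and the residual
ss_p = n_r * np.sum((scores.mean(axis=1) - grand) ** 2)
ss_r = n_p * np.sum((scores.mean(axis=0) - grand) ** 2)
ss_total = np.sum((scores - grand) ** 2)
ss_pr = ss_total - ss_p - ss_r

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))

# Variance components from the expected mean squares
var_pr = ms_pr                   # person-by-rater interaction plus error
var_p = (ms_p - ms_pr) / n_r     # universe-score (person) variance
var_r = (ms_r - ms_pr) / n_p     # rater variance

# Generalizability (relative) and dependability (absolute) coefficients
# for a decision based on the mean over n_r raters
g_coef = var_p / (var_p + var_pr / n_r)
phi_coef = var_p / (var_p + (var_r + var_pr) / n_r)

print(f"var(person)={var_p:.3f}  var(rater)={var_r:.3f}  var(p x r,e)={var_pr:.3f}")
print(f"G coefficient={g_coef:.3f}  Phi={phi_coef:.3f}")
```

Adding task and form facets follows the same expected-mean-squares logic, with more components to separate.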
54State Standards
- Analyze words, recognize words, and learn to read grade-level text fluently across the subject areas
- Listen to, read, and understand a wide variety of informational and narrative text across the subject areas at school and on their own, applying comprehension strategies as needed
- Demonstrate word knowledge through systematic vocabulary development; determine the meaning of new words by applying knowledge of word origins, word relationships, and context clues
- Demonstrate general understanding of grade-level informational text across the subject areas
- Develop an interpretation of grade-level informational text across the subject areas
- Examine content and structure of grade-level informational text across the subject areas
55Reliability
- Standard 2.4
- Each method of quantifying the precision or consistency of scores should be described clearly and expressed in terms of statistics appropriate to the method. The sampling procedures used to select examinees for reliability analyses and descriptive statistics on these samples should be reported (p. 32).
- Standard 2.5
- A reliability coefficient or standard error of measurement based on one approach should not be interpreted as interchangeable with another derived by a different technique unless their implicit definitions of measurement error are equivalent (p. 32).
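One reason such coefficients are not interchangeable (Standard 2.5) is that each implies its own standard error of measurement. Under classical test theory the two are linked by the familiar relation below (a standard textbook formula, not one specific to DAATA):

```latex
% Standard error of measurement implied by a reliability coefficient \rho_{XX'}
SEM = \sigma_X \sqrt{1 - \rho_{XX'}}
```

The same scores paired with a test-retest, alternate-form, or internal consistency coefficient therefore yield different error bands, because each coefficient treats different sources of inconsistency as error.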
56Reliability Studies
- Inter-judge agreements
- Interview a sub-sample of teachers on the test administration in the field (Administration)
- Analyze results for each type of measure (portfolio, observation, and performance) for administration conditions (Administration)
- For each type of alternate assessment (portfolio, observation, and performance), independently obtain another sample (Test-Retest)
- Rescore alternate assessment protocols (Alternate Form)
57Reliability Studies
- What kind of reliability evidence supporting alternate assessments can be documented by states? (a) coefficients derived from parallel forms in independent testing sessions (alternate-form coefficients); (b) coefficients obtained by administration of the same instrument on separate occasions (test-retest or stability coefficients); and (c) coefficients based on the relationships among scores derived from individual items or subsets of the items within a test, all data accruing from a single administration (internal consistency coefficients) (Educational Standards, 1999, p. 27). The three coefficient types are illustrated in the sketch below.
- Agree to participate by taking a survey (ALL states in ASES)
- Agree to pony up a directory and sample of records
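To make the three coefficient types named in the excerpt concrete, the sketch below computes each from made-up data: a correlation between parallel forms, a correlation between a fall administration and a later re-administration, and coefficient alpha from item scores in a single administration. The data, sample sizes, and score scales are illustrative only.

```python
"""Illustrative computation of the three coefficient types named in the
Standards excerpt: alternate-form, test-retest, and internal consistency.
All data are made up.
"""
import numpy as np

rng = np.random.default_rng(0)

# Alternate-form: total scores of the same students on Forms A and B
form_a = np.array([12, 18, 9, 22, 15, 17, 11, 20], dtype=float)
form_b = np.array([13, 17, 10, 21, 14, 18, 12, 19], dtype=float)
alternate_form_r = np.corrcoef(form_a, form_b)[0, 1]

# Test-retest: fall administration and a later re-administration
fall = np.array([30, 25, 41, 18, 36, 29, 33, 22], dtype=float)
retest = np.array([32, 24, 40, 20, 35, 31, 30, 23], dtype=float)
test_retest_r = np.corrcoef(fall, retest)[0, 1]

# Internal consistency: Cronbach's alpha from an items-by-students matrix.
# Simulate correlated items as latent ability plus item-specific noise.
ability = rng.normal(0.0, 1.0, size=8)        # one latent score per student
noise = rng.normal(0.0, 0.7, size=(10, 8))    # item-specific error
items = ability + noise                        # items[i, j]: student j on item i

k = items.shape[0]
item_vars = items.var(axis=1, ddof=1)
total_var = items.sum(axis=0).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"alternate-form r = {alternate_form_r:.2f}")
print(f"test-retest r    = {test_retest_r:.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```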
58Validity Studies
- Internal Structures
- Response Processes
- Nomological Networks
- Consequences
59WEBSITE
63Web Site Information
- http://www.DAATA.org
- Final DAATA Documents
- Links to Measurement on Alternate Assessment
- Assessing Special Education Students (ASES) Membership
- Minutes from Leadership and Management Teams (secure)
- Calendar of Upcoming Events
- State Technical Adequacy Documents
64- Handbook Prospectus
- Patricia Almond, PhD
- Associate in Research
- Behavioral Research and Teaching
65Handbook
66Handbook Prospectus
- Handbook for
- Developing Alternate Assessment Technical Adequacy (DAATA)
- Producing Documentation for States
- Alternate Assessments for
- Students with Significant Cognitive Disabilities
67Purpose for Handbook
- To assist states in documenting the technical adequacy of alternate assessments in order to
- respond to the technical adequacy requirements in federal legislation
- establish an ongoing continuous improvement cycle with evidence to monitor assessment quality
68Intended Audience
- Several specific groups were considered in developing the contents of this handbook
- State Education Offices or Divisions Responsible for Large-Scale Assessment
- Special Education Offices or Divisions
- Assessment Technical Advisory Committees
- Special Education Advisory Committees
- State Vendors
69Table of Contents
- Section I: Technical Documentation for Alternate Assessment
- Section II: What to do and how to proceed - Detailed Guidance for States
- Section III: Implications for Continuous Improvement and Informing Policy
70Section I: Technical Studies for Alternate Assessment - Chapters
- 1 Applying Testing Standards in a Fresh Context
- 2 The Challenges of NCLB and IDEA
- 3 Construct Validity - the Organizing Concept
- 4 Content Validity
- 5 Sources of Variance and Generalizability
- 6 Reliability - Rater Agreement and More
- 7 Criterion and Predictive Validity
- 8 Consequential Validity
71Section II: What to do and how to proceed - Chapters
- 9 Step-by-Step Self-Study Guides
- 10 Alignment to Standards Plus
- 11 Addressing Variance and Generalizability
- 12 Reliability - Rater Agreement, Internal Consistency, and Fidelity
- 13 Criterion and Predictive Validity - Interpreting the Results: Achievement and Growth
- 14 Consequential Validity - So What? Benefits for Students and Educators
- 15 Stories from the Street - State Examples
72Section III: Continuous Improvement and Informing Policy - Chapters
- 16 Making a Plan for Communication
- 17 Presenting Findings to Your Technical
Advisory Committee - 18 Continuous Improvement Cycle
- 19 Technical Report
73Process
- Each study group oversees one section
- A writer will produce components for review and feedback.
- Study group members provide input at regularly scheduled meetings or via email, conference call, or the website.
74Review cycle
- Develop Draft
- Submit to Study Group for Review
- Capture comments and recommendations
- Revise draft based on comments and recommendations
- Repeat Cycle
75DAATA Website will post
- Components as they are ready
- Related resources
- Updates to the Table of Contents
- Newsletters and progress updates
www.daata.org
76State Perspective
77Project DAATA: A Perspective from New Mexico
- Dan Farley
- Assessment Consultant
- Special Education Bureau
- June 20, 2005
78NM Alternate Assessments
- Original NM Alternate Assessment
- Language Arts (Reading and Writing)
- Mathematics
- Science
- Social Studies
- NM Alternate Assessment for Reading
- DIBELS?
- NM Alternate Assessment for Writing (NMAC)
79Benefits of Participation
- Provides NM with useful technical adequacy reports (did anyone say Peer Review?!?!)
- Allows us to build stronger connections with teachers and Department personnel (now I have some names)
- Professional development opportunity for everyone involved (mostly myself and probably not Jerry)
- Contextualizes technical vocabulary words that I've seen defined in a myriad of ways
- Because the final product of the grant will influence, if not define, technical adequacy requirements for alternate assessments, why not get involved on the front end? (did anyone say future Peer Reviews?!?)
80Content Related Evidence
- Gathered the following information to submit to DAATA
- Grade-level content standards (with Expanded Performance Standards)
- Performance descriptors
- Alternate achievement standards (cut scores)
- Standard setting report
- Test Administration manual
- Sample score report forms
- Decision rules (scoring metric)
- Online training course
- Alternate Assessment FAQs document
81Content Related Evidence
- What NMPED received in return
- Content related evidence report
- Assessment Development
- Elaborated what the researchers found regarding our NM Alternate Assessment perspective
- Overview
- Whoops, no NMPED glossary.
- Thank goodness for ASES, who just published one!
82Content Related Evidence
- What NM received in return (cont.)
- Content related evidence report
- Instrumentation
- Type of assessment
- Domain sampling plan
- Test specifications-blueprint
- Administration
- Items (format and amount)
- Scoring
- Metric
- Standard Setting
83Content Related Evidence
- What NM received in return (cont.)
- Content related evidence report
- Alignment with standards
- Categorical Concurrence
- Depth of knowledge
- Range of knowledge
- Balance of representation
- Reporting Level and student reports
- Reporting system
- Student report
- Interpretation guide
- Report process and protocol
84Moving forward
- New Mexico is also participating in the DAATA sources of variance study; see me if you have any state-related questions.
- Get involved in EAGs in your state; the learning curve is steep, but it needs to be traversed.
- If you're an ASES state, please assist DAATA with the Reliability study!!!
- Thank you!
85- Reflections/Q & A
- Jan Barth, Executive Director
- Office of Student Assessment Services, WVDE