1 EDUCATION RESEARCH MEETS THE GOLD STANDARD:
STATISTICS, EDUCATION, AND RESEARCH METHODS AFTER
NO CHILD LEFT BEHIND
- Mack C. Shelley, II
- Iowa State University
- mshelley@iastate.edu
- Presented at the Joint Statistical Meetings,
August 7-11, 2005, Minneapolis, MN
2 Background
- This session is meant to help inform the national
debate over the role of scientific standards for
research in education, particularly as those
standards are influenced by statistical methods
and theory.
- This session builds on a National Science
Foundation award to me and Brian Hand
(University of Iowa).
3 Background
- The panel is designed to meld research interests
in statistics, education, and related disciplines,
and to discuss the dramatically changing context
of contemporary education research.
- Why, exactly, is the context changing for
statistical research in education?
4 Background
- Standards for acceptable research in education
are affected greatly by
- the recent creation of the Institute of Education
Sciences in the U.S. Department of Education
- passage of the No Child Left Behind Act of 2001, and
- passage of the Education Sciences Reform Act
(H.R. 3801) in 2002
5 Background
- Together, these developments
- have reconstituted federal support for research
and dissemination of information in education
- are meant to foster scientifically valid
research, and
- have established what is referred to as the gold
standard for research in education.
6 Background
- These and other developments mean that greater
emphasis in education research is now placed on
- quantification,
- the use of randomized trials, and
- the selection of valid control groups.
- This panel is intended to be part of a sustained
and expanded dialogue - between the statistical community and those who
implement the education research agenda - through a discussion of whether and how to
implement the new standards for statistical work
in the field of education research
8 What Is The Gold Standard?
- U.S. Department of Education, Institute of
Education Sciences, National Center for Education
Evaluation and Regional Assistance
- Identifying and Implementing Educational
Practices Supported by Rigorous Evidence: A User
Friendly Guide
- http://www.ed.gov/about/offices/list/ies/news.html
9 What Is The Gold Standard?
- This publication emphasizes
- evidence-based interventions
- educational outcomes that have been found to be
effective in randomized controlled trials
- research's gold standard for establishing what works
- following patterns of evidence use in medicine
and welfare policy
10 What Is The Gold Standard?
- The quality of studies needed to establish
strong evidence: randomized controlled trials
that are well-designed and implemented.
- The quantity of evidence needed: trials showing
effectiveness in two or more typical school
settings,
- including a setting similar to that of your own
schools/classrooms.
11 What Is The Gold Standard?
- Possible evidence may include
- randomized controlled trials whose
quality/quantity are good but fall short of
strong evidence, and/or
- comparison-group studies in which the
intervention and comparison groups are very
closely matched
- in academic achievement, demographics, and other
characteristics
12 What Is The Gold Standard?
- Evaluating whether an intervention is backed by
strong evidence of effectiveness hinges on
- well-designed and well-implemented randomized
controlled trials
- demonstrating that there are no systematic
differences between intervention and control
groups before the intervention
- the use of measures and instruments of proven validity
- real-world objective measures of the outcomes
the intervention is designed to affect
- attrition of no more than 25% of the original sample
- effect size combined with statistical
significance (see the sketch after this list)
- an adequate sample size to achieve statistical
significance
- controlled trials implemented in more than one
site, in schools that represent a cross-section
of all schools
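- A minimal sketch, in Python, of pairing an effect-size estimate
(Cohen's d) with a significance test, as the guide's criteria require;
the scores below are simulated and the group sizes are arbitrary, not
taken from any study discussed here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=0.30, scale=1.0, size=120)  # hypothetical post-test scores
control = rng.normal(loc=0.00, scale=1.0, size=120)

# Cohen's d: mean difference over the pooled standard deviation
n_t, n_c = len(treatment), len(control)
pooled_sd = np.sqrt(((n_t - 1) * treatment.var(ddof=1) +
                     (n_c - 1) * control.var(ddof=1)) / (n_t + n_c - 2))
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

# The significance test that accompanies the effect size
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"d = {cohens_d:.2f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```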
13 No Child Left Behind
- Public Law 107-110 (H.R. 1)
- signed into law on January 8, 2002
- "An Act to close the achievement gap with
accountability, flexibility, and choice, so that
no child is left behind"
- the No Child Left Behind Act of 2001 (NCLB)
- established standards for academic assessments in
mathematics, reading or language arts, and science
- multiple up-to-date measures of student academic
achievement, including measures that assess
higher-order thinking skills and understanding
- These requirements for program assessment lead to
many opportunities and circumstances for the
application of statistical methods.
14 No Child Left Behind
- The research program under NCLB was designed to
examine the effect of the assessment and
accountability systems on students, teachers,
parents, families, schools, school districts, and
States, including correlations between such
systems and
- student academic achievement
- progress toward meeting the State-defined level
of proficiency
- progress toward closing the achievement gap
- changes in course offerings, teaching practices,
course content, and instructional material
- teacher, principal, and pupil-services personnel
turnover rates
- student dropout, grade-retention, and graduation rates
- students with disabilities
- student socioeconomic status
- level of student English proficiency
- student ethnicity and race
15 The Education Sciences Reform Act and IES
- The Education Sciences Reform Act
- "An Act to provide for improvement of Federal
education research, statistics, evaluation,
information, and dissemination, and for other
purposes"
- H.R. 3801, passed January 23, 2002
- reconstituted federal support for research and
dissemination of information in education, to
foster scientifically valid research
- established the Institute of Education Sciences (IES)
- replacing the Office of Educational Research and
Improvement
- part of the Department of Education but
functioning separately from it
16 The Education Sciences Reform Act and IES
- IES is the research arm of the Department of Education
- Mission is to expand knowledge and provide
information on
- the condition of education
- practices that improve academic achievement
- the effectiveness of Federal and other education programs
- Goal
- the transformation of education into an
evidence-based field in which decision makers
routinely seek out the best available research
and data before adopting programs or practices
that will affect significant numbers of students
- Consists of
- Grover J. (Russ) Whitehurst, first Director,
since November 2002
- Office of the Director
- National Center for Education Research
- National Center for Education Statistics
- National Center for Education Evaluation and
Regional Assistance
- National Center for Special Education Research
17 The Education Sciences Reform Act and IES
- H.R. 3801 defined "scientifically based research
standards" to
- apply rigorous, systematic, and objective
methodology to obtain reliable and valid
knowledge relevant to education activities and
programs
- present findings and make claims that are
appropriate to and supported by the methods that
have been employed
18 The Education Sciences Reform Act and IES
- Scientifically based research also includes
- employing systematic, empirical methods that draw
on observation or experiment
- involving data analyses that are adequate to
support the general findings
- relying on measurements or observational methods
that provide reliable data
- making claims of causal relationships only in
random-assignment experiments or other designs
(to the extent such designs substantially
eliminate plausible competing explanations for
the obtained results)
- ensuring that studies and methods are presented
in sufficient detail and clarity to allow for
replication or, at a minimum, to offer the
opportunity to build systematically on the
findings of the research
- obtaining acceptance by a peer-reviewed journal
or approval by a panel of independent experts
through a comparably rigorous, objective, and
scientific review
- using research designs and methods appropriate to
the research question posed
19 The Education Sciences Reform Act and IES
- "Scientifically valid education evaluation" means
an evaluation that
- adheres to the highest possible standards of
quality with respect to research design and
statistical analysis
- provides an adequate description of the programs
evaluated and, to the extent possible, examines
the relationship between program implementation
and program impacts
- provides an analysis of the results achieved by
the program with respect to its projected effects
- employs experimental designs using random
assignment, when feasible, and other research
methodologies that allow for the strongest
possible causal inferences when random assignment
is not feasible
- may study program implementation through a
combination of scientifically valid and reliable
methods
20 What Works
- What Works Clearinghouse (WWC)
- established in 2002 by IES
- to provide educators, policymakers, and the
public with a central and trusted source of
scientific evidence of what works in education
- administered by the U.S. Department of Education
through a contract to a joint venture of the
American Institutes for Research and the Campbell
Collaboration
- reviews and reports on existing studies of
interventions (education programs, products,
practices, and policies) in selected topic areas
- applies standards that follow scientifically
valid criteria for determining the effectiveness
of these interventions
- Technical Advisory Group (TAG)
- leading experts in research design, program
evaluation, and research synthesis
- advises on the standards for evaluation research
reviews
- monitors and informs the methodological aspects
of WWC reviews and reports
- www.whatworks.ed.gov
21 What Works - TAG
- Dr. Larry V. Hedges, Chairperson, Stella M.
Rowley Professor of Education, Psychology, Public
Policy Studies, and Sociology, University of
Chicago, and editorial board member of the
American Journal of Sociology, the Review of
Educational Research, and Psychological Bulletin.
- Dr. Betsy Jane Becker, Professor of Measurement
and Quantitative Methods, College of Education,
Michigan State University.
- Dr. Jesse A. Berlin, Professor of Biostatistics,
University of Pennsylvania School of Medicine,
and Director of Biostatistics at the university's
Comprehensive Cancer Center.
- Dr. Douglas Carnine, Professor of Education,
University of Oregon, and Director of the
National Center to Improve the Tools of Educators.
- Dr. Thomas D. Cook, Professor of Sociology,
Psychology, Education and Social Policy,
Northwestern University, and Faculty Fellow at
the Institute for Policy Research.
- Dr. David J. Francis, Professor of Quantitative
Methods, Chairman of the Department of
Psychology, and Director of the Texas Institute
for Measurement, Evaluation, and Statistics,
University of Houston.
- Dr. Robert L. Linn, Distinguished Professor of
Education, University of Colorado at Boulder, and
Co-Director of the National Center for Research
on Evaluation, Standards, and Student Testing.
- Dr. Mark W. Lipsey, Senior Research Associate,
Vanderbilt Institute for Public Policy Studies,
and Director of the Center for Evaluation
Research and Methodology.
- Dr. David Myers, Senior Fellow, Mathematica
Policy Research, and former Director of the U.S.
Department of Education's national evaluation of
Upward Bound.
- Dr. Andrew C. Porter, Patricia and Rodes Hart
Professor of Educational Leadership and Policy
and Director of the Learning Sciences Institute
at Vanderbilt University.
- Dr. David Rindskopf, Professor of Psychology and
Educational Psychology, City University of New
York Graduate Center, and elected Fellow of the
American Statistical Association.
- Dr. Cecilia E. Rouse, Professor of Economics and
Public Affairs, and joint appointee in the
Economics Department and Woodrow Wilson School,
Princeton University.
- Dr. William R. Shadish, Founding Faculty and
Professor of Social Sciences, Humanities, and
Arts at the University of California, Merced.
22 What Works Current Topics
- The What Works Clearinghouse (WWC) prioritizes
topics based on the following criteria:
- potential to improve important student outcomes
- applicability to a broad range of students or to
particularly important subpopulations
- policy relevance and perceived demand within the
education community, and
- likely availability of scientific studies.
- Specifically, the topics were selected from
nominations received through
- emails from the public
- meetings and presentations sponsored by the What
Works Clearinghouse
- the What Works Network
- suggestions presented by senior members of
education associations, policymakers, and the
U.S. Department of Education, and
- reviews of existing research.
23 What Works Current Topics
- Topics include
- Math: Curriculum-Based Interventions for
Increasing Middle School Math
- Reading: Interventions for Beginning Reading
- Character Education: Comprehensive Schoolwide
Character Education Interventions' Benefits for
Character Traits, Behavioral, and Academic
Outcomes
- Dropout Prevention: Interventions for Preventing
High School Dropout
- English Language Learning: Interventions for
Elementary School English Language Learners,
Increasing English Language Acquisition and
Academic Achievement
- Math: Curriculum-Based Interventions for
Increasing Elementary School Math
- Early Childhood: Interventions for Improving
Preschool Children's School Readiness
- Delinquent, Disorderly, and Violent Behavior:
Interventions to Reduce Delinquent, Disorderly,
and Violent Behavior in Middle and High Schools
- Adult Literacy: Interventions for Increasing
Adult Literacy
- Peer-Assisted Learning: Peer-Assisted Learning
Interventions in Elementary Schools: Reading,
Mathematics, and Science Gains
24 Does Not Meet Evidence Screens
- Studies may not pass WWC screening requirements
for the following reasons:
- Evaluation research design. The study did not
meet certain design standards. Study designs that
provide the strongest evidence of effects include
- randomized controlled trials
- regression discontinuity designs
- quasi-experimental designs (must use a similar
comparison group and have no attrition or
disruption problems)
- single-subject designs
- Topic area definition. The study did not meet the
intervention definition developed by the WWC for
a particular topic.
- Time period definition (generally, the last 20 years)
- Relevant outcome
- academic outcomes, not, for example, student
self-confidence
- a study needs only one relevant outcome to pass
this screen
- test reliability or validity
- a sample or description of relevant test items if
a study's outcome test is not known or available
- Relevant student sample
25 A Real Live Current Example
- MATHEMATICS AND SCIENCE EDUCATION RESEARCH GRANTS
PROGRAM
- CFDA (Catalog of Federal Domestic Assistance)
NUMBER: 84.305
- RELEASE DATE: May 6, 2005
- REQUEST FOR APPLICATIONS NUMBER: NCER-06-02,
Mathematics and Science Education Research Grants
Program
- http://www.ed.gov/about/offices/list/ies/programs.html
- LETTER OF INTENT RECEIPT DATE: September 12, 2005
- APPLICATION RECEIPT DATE: November 3, 2005, 8:00
p.m. Eastern time
26 A Real Live Current Example
- REVIEW CRITERIA FOR SCIENTIFIC MERIT
- Significance
- Does the applicant make a compelling case for the
potential contribution of the project to the
solution of an education problem?
- Does the applicant present a strong rationale
justifying the need to evaluate the selected
intervention (e.g., does prior evidence suggest
that the intervention is likely to substantially
improve student learning and achievement)?
- Research Plan
- Does the applicant present
- (a) clear hypotheses or research questions
- (b) clear descriptions of and strong rationales
for the sample, measures (including information
on reliability and validity), data collection
procedures, and research design
- (c) a detailed and well-justified data analysis
plan?
- Does the research plan meet the requirements
described in the section on the Requirements of
the Proposed Research?
- Is the research plan appropriate for answering
the research questions or testing the proposed
hypotheses?
27 A Real Live Current Example
- Applications under Goal Three (Efficacy and
Replication Trials)
- Under Goal Three, the Institute requests
proposals to test the efficacy of fully developed
interventions that already have evidence of
potential efficacy.
- By "efficacy," the Institute means the degree to
which an intervention has a net positive impact
on the outcomes of interest in relation to the
program or practice to which it is being compared.
28 A Real Live Current Example
- Methodological requirements
- (i) Sample
- The applicant should define, as completely as
possible, the sample to be selected and the
sampling procedures to be employed for the
proposed study. Additionally, the applicant
should describe strategies to ensure that
participants will remain in the study over the
course of the evaluation.
29 A Real Live Current Example
- (ii) Design
- Applicants should describe how potential threats
to internal and external validity will be
addressed.
- Studies using randomized assignment to treatment
and comparison conditions are strongly preferred.
- When a randomized trial is used, the applicant
should clearly state the unit of randomization
(e.g., students, classroom, teacher, or school).
- The choice of randomizing unit or units should be
grounded in a theoretical framework.
- Applicants should explain the procedures for
assignment of groups (e.g., schools, classrooms)
or participants to treatment and comparison
conditions, as in the sketch below.
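- A minimal sketch, in Python, of random assignment at the school
(cluster) level, one of the units of randomization the RFA names; the
school names and counts are invented for illustration.

```python
import random

schools = [f"School_{i:02d}" for i in range(1, 21)]  # 20 hypothetical schools
random.seed(2005)
random.shuffle(schools)  # randomize the order of the clusters

# Assign the first half to treatment and the rest to control
half = len(schools) // 2
assignment = {s: "treatment" for s in schools[:half]}
assignment.update({s: "control" for s in schools[half:]})

for school, condition in sorted(assignment.items()):
    print(school, condition)
```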
30 A Real Live Current Example
- (ii) Design (continued)
- Only in circumstances in which a randomized trial
is not possible may alternatives be employed that
substantially minimize selection bias or allow it
to be modeled. Applicants must make a compelling
case that randomization is not possible.
- Acceptable alternatives include appropriately
structured regression-discontinuity designs
(sketched below) or other well-designed
quasi-experimental designs that come close to
true experiments in minimizing the effects of
selection bias on estimates of effect size.
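- A minimal sketch of a sharp regression-discontinuity estimate, one of
the acceptable alternatives named above; the assignment rule, cutoff,
bandwidth, and data are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
pretest = rng.uniform(0, 100, size=500)   # running variable
treated = (pretest < 50).astype(float)    # sharp assignment rule at the cutoff
outcome = 20 + 0.5 * pretest + 5.0 * treated + rng.normal(0, 3, 500)

# Local linear fit on each side of the cutoff, within a bandwidth;
# the coefficient on `treated` is the jump in the outcome at the cutoff.
cutoff, bandwidth = 50.0, 15.0
centered = pretest - cutoff
window = np.abs(centered) <= bandwidth
X = np.column_stack([np.ones(window.sum()),
                     treated[window],
                     centered[window],
                     treated[window] * centered[window]])
beta, *_ = np.linalg.lstsq(X, outcome[window], rcond=None)
print(f"Estimated effect at the cutoff: {beta[1]:.2f}")  # true effect is 5
```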
31 A Real Live Current Example
- (ii) Design (continued)
- A well-designed quasi-experiment substantially
reduces the potential influence of selection bias
on membership in the intervention or comparison
group. This involves
- demonstrating equivalence between the
intervention and comparison groups at program
entry on the variables measuring program outcomes
(e.g., math achievement test scores), or
obtaining such equivalence through statistical
procedures such as propensity score balancing or
regression (see the sketch after this list)
- demonstrating equivalence or removing
statistically the effects of other variables on
which the groups may differ and that may affect
intended outcomes of the program being evaluated
(e.g., demographic variables, experience and
level of training of teachers, motivation of
parents or students)
- a design for the initial selection of the
intervention and comparison groups that minimizes
selection bias or allows it to be modeled
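- A minimal sketch of propensity score balancing, one of the statistical
procedures named above; the covariates (a pretest score and a
socioeconomic index), the selection model, and the greedy 1:1 matching
rule are all assumptions made for illustration, not RFA requirements.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
pretest = rng.normal(50, 10, n)   # baseline achievement score
ses = rng.normal(0, 1, n)         # socioeconomic index

# Selection into the intervention depends on the covariates --
# exactly the bias a propensity model is meant to absorb.
p_select = 1 / (1 + np.exp(-(0.05 * (pretest - 50) + 0.5 * ses)))
treated = rng.random(n) < p_select

# Estimate each unit's probability of treatment from the covariates
X = np.column_stack([pretest, ses])
pscore = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Greedy 1:1 nearest-neighbor matching on the propensity score
t_idx = np.where(treated)[0]
c_idx = list(np.where(~treated)[0])
matches = []
for i in t_idx:
    if not c_idx:
        break  # no unmatched comparison units left
    j = min(c_idx, key=lambda j: abs(pscore[i] - pscore[j]))
    matches.append((i, j))
    c_idx.remove(j)

# Re-check baseline equivalence after matching
t_rows = [i for i, _ in matches]
c_rows = [j for _, j in matches]
print("Pretest gap before matching:",
      round(pretest[treated].mean() - pretest[~treated].mean(), 2))
print("Pretest gap after matching:",
      round(pretest[t_rows].mean() - pretest[c_rows].mean(), 2))
```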
32 A Real Live Current Example
- (iii) Power
- Applicants should clearly address the power of
the evaluation design to detect a reasonably
expected and minimally important effect.
- In determining the sample size, applicants need
to consider the number of clusters, the number of
individuals within clusters, the potential
adjustment from covariates, the desired effect,
the intraclass correlation (i.e., the variance
between clusters relative to the total variance
between and within clusters), the desired power
of the design, one-tailed vs. two-tailed tests,
repeated observations, attrition of participants,
etc. (see the sketch below).
- Applicants should anticipate the degree to which
the magnitude of the expected effect may vary
across the primary outcomes of interest.
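- A minimal sketch of the cluster-trial power arithmetic listed above:
the intraclass correlation inflates the variance by a design effect,
1 + (m - 1) * ICC for clusters of size m, which raises the minimum
detectable effect size (MDES). The normal-approximation formula and
all of the numbers below are illustrative assumptions, not RFA figures.

```python
from scipy.stats import norm

def mdes(n_clusters_per_arm, cluster_size, icc, alpha=0.05, power=0.80):
    """Approximate MDES (in SD units) for a two-arm cluster trial,
    ignoring covariate adjustment and degrees-of-freedom corrections."""
    deff = 1 + (cluster_size - 1) * icc            # design effect
    n_per_arm = n_clusters_per_arm * cluster_size
    multiplier = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # two-tailed
    return multiplier * (2 * deff / n_per_arm) ** 0.5

# 20 schools per arm, 25 students per school, ICC = 0.15
print(f"MDES = {mdes(20, 25, 0.15):.2f} SD")
# The same students treated as if independently randomized (ICC = 0)
print(f"Naive MDES = {mdes(20, 25, 0.0):.2f} SD")
```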
33 A Real Live Current Example
- (iv) Measures
- Investigators should include
- relevant standardized measures of student
achievement (e.g., standardized measures of
mathematics achievement)
- other measures of student learning and
achievement (e.g., researcher-developed measures)
- measures of teacher practices
- information on the reliability, validity, and
appropriateness of proposed measures (see the
sketch below)
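- A minimal sketch of one common reliability statistic, Cronbach's
alpha, of the kind an applicant might report for a researcher-developed
measure; the item responses below are simulated.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(7)
ability = rng.normal(0, 1, 200)                        # latent trait
items = ability[:, None] + rng.normal(0, 1, (200, 8))  # 8 noisy items
print(f"alpha = {cronbach_alpha(items):.2f}")
```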
34 A Real Live Current Example
- (v) Fidelity of implementation of the
intervention
- The applicant should
- specify how the implementation of the
intervention will be documented and measured
- either indicate how the intervention will be
maintained consistently across multiple groups
(e.g., classrooms and schools) over time, or
describe the parameters under which variations in
the implementation may occur
- propose research designs that permit the
identification and assessment of factors
affecting the fidelity of implementation
35 A Real Live Current Example
- (vi) Comparison group, where applicable
- The applicant should
- describe strategies to avoid contamination
between treatment and comparison groups
- include procedures for describing practices in
the comparison groups
- be able to compare intervention and comparison
groups on the implementation of key features of
the intervention
- Using a business-as-usual comparison group is
acceptable, but
- applicants should specify the treatment or
treatments received in the comparison group, and
- applicants should account for the ways in which
what happens in the comparison group is
important to understanding the net impact of the
experimental treatment.
36 A Real Live Current Example
- (vii) Mediating and moderating variables
- Mediating and moderating variables that are
measured in the intervention condition and that
are also likely to affect outcomes in the
comparison condition should be measured in the
comparison condition (e.g., student time-on-task,
teacher experience/time in position).
- The evaluation should account for sources of
variation in outcomes across settings (i.e., to
account for what might otherwise be part of the
error variance).
- (viii) Data analysis
- specific statistical procedures should be
described
- the relation between hypotheses, measures, and
independent and dependent variables should be
clear
- the effects of clustering must be accounted for
in the analyses, even when individuals are
randomly assigned to condition (see the sketch
below)
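- A minimal sketch of one way to account for clustering in the
analysis: a mixed model with a random intercept for classroom, fit
with statsmodels. The data, the classroom-level assignment, and the
variance components are all invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_classes, n_students = 30, 20
classroom = np.repeat(np.arange(n_classes), n_students)
treat = np.repeat(rng.integers(0, 2, n_classes), n_students)  # class-level assignment
class_effect = np.repeat(rng.normal(0, 0.5, n_classes), n_students)
score = 50 + 2.0 * treat + class_effect + rng.normal(0, 2, n_classes * n_students)

df = pd.DataFrame({"score": score, "treat": treat, "classroom": classroom})

# Random intercept per classroom, so the treatment standard error
# reflects between-class variance rather than treating the 600
# students as independent observations.
model = smf.mixedlm("score ~ treat", df, groups=df["classroom"]).fit()
print(model.summary())
```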