There is no doubt that teacher evaluation systems in the U.S. are broken: Teachers, administrators, parents, and policymakers agree that most districts fail to measure teaching well, help teachers improve, or dismiss those who are failing. Most teachers are tenured without a rigorous examination of their competence, and those who are struggling are often left to struggle indefinitely, while their students suffer. The vast majority of teachers, who are working hard and want to continue to improve, get little help to do so.
In a report by a group of Accomplished California Teachers, Jane Fung, an award-winning 20-year veteran of the Los Angeles Unified School District, described the experience of many teachers: “I have had administrators who never came into my classroom for formal observations or asked me for anything more than the initial planning/goal sheet. I have had administrators observe a formal lesson and put the feedback sheet in my box without ever having spoken to me about the lesson, and I have had years where I am just asked to sign the end-of-the-year evaluation sheet [without being observed].”
Given this sorry situation, some reformers are enthusiastic about measuring teachers’ effectiveness based on their students’ test score gains, now that such data are becoming more available. Ironically, though, relying on such “value-added” measures could undermine, rather than improve, the overall quality of teaching – especially for the highest-need students.
How could this be?
First, test score gains are not accurate measures of teachers’ quality. When tied to individual teachers, they are notoriously unstable and prone to wide margins of error, largely because they depend on the composition of students in a class: whether those students attend school regularly, have stable home lives, and get help from parents or tutors, and what kind of education they have had previously. It is nearly impossible to disentangle the effects of an individual teacher from these factors, or from the effects of other current and former teachers, curriculum materials, class sizes, and school leadership decisions. Out-of-school time matters too. Summer learning loss, which especially hurts low-income students, accounts for about half the achievement difference between rich and poor students.
It is not surprising, then, that research shows that the same teacher typically looks more effective on value-added measures when she is teaching more advantaged students – and less effective when she is assigned more students who are low-income, who are new English learners, or who have special education needs. This reality creates disincentives for teachers to take on students who struggle to learn, just as New York State’s short-lived accountability scheme that rated cardiac surgeons on their patients’ mortality rates caused doctors to turn away patients who were very ill. Some of our best teachers, those who reach out to work with special education students and new English learners, will be at risk of being fired, and others will increasingly avoid these students by choosing schools, classes, and fields where they are less likely to encounter them.
Second, most U.S. tests are exceptionally narrow, focused mostly on multiple-choice questions assessing low-level skills in reading and math. Research also shows that attaching high-stakes decisions to these tests has already caused schools to teach less history, science, and art, and to engage students in less writing, research, and complex problem-solving – the very skills they need to become truly ready for college and careers. As teachers focus more intensely on these tests, we can expect teaching and curriculum to suffer even more.
Furthermore, if teachers are ranked against one another, we can expect the collaboration that characterizes great schools to be replaced by competition that discourages teachers from sharing their expertise. Where this happens, students are the ultimate losers. This was the conclusion of a recent study reporting achievement declines in Portuguese schools that tied test scores to teacher evaluation and pay.
Interestingly, at a recent international summit on the teaching profession, Education Ministers from top-achieving nations were clear that they do not think it makes sense to evaluate teachers based on student test scores, or to rank teachers against one another. While evidence of student learning may play a role in teacher evaluation, we need to look at many things that students and teachers do to get an accurate picture of teacher quality.
Better systems exist – like the rigorous performance assessments used for National Board Certification, which have been found to predict teachers’ effectiveness. These measures look at student learning in context, linking it to what teachers do in teaching specific curriculum. Observations and feedback based on professional standards and administered by trained evaluators, along with classroom work showing how teachers contribute to student learning, are used successfully in schools that are part of the Teacher Advancement Program, and in cities like Denver, Colo., and Rochester, N.Y. The best systems also look at how teachers contribute to the expertise of their colleagues and the improvement of the entire school, building on the knowledge that teaching is a team sport. And in all these cases, evaluation is linked to coaching and professional development, so that teaching always improves.
Getting teacher evaluation right is critically important. Doing it wrong could hurt both teachers and students. Smart evaluation will put test score data in its rightful place – as a small part of a much more comprehensive picture of what teachers do to foster engaging and important learning with all of the diverse learners we need them to serve well.
Linda Darling-Hammond is Charles E. Ducommun Professor of Education at Stanford University and co-Director of the Stanford Center for Opportunity Policy in Education.
All statements and opinions expressed on this blog are those of the individual contributors, and not of the Bill & Melinda Gates Foundation or NBC News.