“Everything has been said before, but since nobody listens, we have to keep going back and begin again.” – André Gide
This epigraph begins Larry Cuban’s paper “Reforming again, again, and again,” published in 1990. As various reforms have reappeared, Cuban extended his analysis again (“High School Reform Again, Again, and Again”) and again (“Fixing Schools Again and Again”). Cuban speculates that this reform recycling is not a problem we can solve; it’s a condition created by the institutional and political realities that we continually have to deal with.
Just as it is possible to predict that reform initiatives will return again and again, it is also possible to predict – even before these initiatives are implemented – some of the factors that will make it difficult for the initiatives to take hold and to achieve their goals. The efforts to transform teacher evaluation that took off with the Obama administration’s Race to the Top initiative in 2009 provide a recent case in point.
Those policies made their way into the news again this past week thanks to a report from the National Council on Teacher Quality. The press release (“States Bid Hasty Retreat from Their Own Attempts to Overhaul Educator Evaluation”) and the coverage highlight the ways in which states appear to be retreating from their teacher evaluation policies (“Most States Have Walked Back Tough Teacher-Evaluation Policies,” Education Week; “No Thanks, Obama: 9 States No Longer Require Test Scores Be Used To Judge Teachers,” Chalkbeat). Although these developments are newsworthy, they come as no surprise. Previous reports have noted problems with the design and execution of recent efforts to transform teacher evaluation, and even those who have noted some positive outcomes have highlighted implementation challenges as well.
Building on Cuban’s work with his colleague David Tyack in Tinkering Toward Utopia and on further analyses by David Cohen and Jal Mehta in “Why Reform Sometimes Succeeds,” my colleagues and I have been looking at some of the reasons that so many policies and reform initiatives fail to produce the fundamental changes in schools and classrooms that they seek. In a nutshell, this work suggests that too often the goals, capacity demands, and values of reform proposals do not match the common needs, existing capabilities, and dominant values in the schools and districts they are supposed to help.
Admittedly, this is a simple heuristic, but it provides one quick way to anticipate some implementation challenges and to explain how reform initiatives evolve. Although this example is drawn from the US, the basic approach to identifying the challenges of improvement and implementation can be applied in many settings outside the US as well.
Is there a fit between reform proposals and the needs, capabilities and values “on the ground”?
Asking a succinct set of questions provides one quick way to gauge the “fit” between reform proposals and the conditions in the schools and communities where those proposals are supposed to be implemented:
- How widely shared is the “problem” that the initiative is supposed to address?
- What has to change for the initiative to take hold in schools and classrooms and have an impact?
- To what extent do teachers, administrators and schools have the capabilities they need to make the changes?
- How likely is it that the key ideas and practices of the initiative will be consistent with socio-cultural, technological, political, and economic trends in the larger society?
What’s the problem the initiative is designed to solve and who has “it”?
When problems are widely shared by many of the stakeholders involved, initiatives that address those problems are more likely to be seen as necessary and worth pursuing – a key indicator of whether those “on the ground” are likely to do what the initiative requires.
In the case of the teacher evaluation reforms, proposals for changing evaluation procedures grew along with concerns that the emphases on accountability and teacher quality in the No Child Left Behind Act of 2001 were not yielding the desired improvements in outcomes in reading and mathematics (which was also predictable even before NCLB passed into law but that’s a different blog post…). Those concerns came together with increasing interest in looking at growth in student learning through “value-added” measurement approaches and with the observation popularized by the New Teacher Project’s report on “The Widget Effect” that almost all teachers were given satisfactory evaluation ratings.
For whom was the system of teacher evaluation a problem? Policymakers, funders, and some administrators seized upon teacher evaluation as a critical problem. These “policy elites,” however, are primarily those engaged with managing the education system; “fixing” teacher evaluation did not appear to be at the top of the list of concerns for many teachers, parents, and students, or for major stakeholder groups like teachers’ unions. As a consequence, considerable resistance should have been expected.
What has to change? To what extent do teachers, principals, and schools have the capabilities to make the changes?
The more complicated and demanding the changes are, the more difficult they will be to put in place. Simply put, the likelihood of implementing a policy or improvement initiative effectively drops the more steps and the more convoluted the plan; the more time, money, resources, and people involved; and the more that everyday behaviors and beliefs have to change.
At a basic level, the “logic” of the teacher evaluation reforms seemed fairly straightforward:
If we create better estimates of teacher quality and create more stringent evaluation systems…
… then education leaders can provide better feedback to teachers, remove ineffective teachers, and reward more effective teachers…
… and student learning/outcomes will improve.
However, by unpacking exactly what has to happen for these results to be achieved, the complications and predictable difficulties quickly become apparent. Among the issues:
- New instruments have to be created, criteria agreed upon, new observations and assessments deployed, and trainings developed
- Principals/observers have to have time for training and to carry out observations/assessments
- Principals and other observers have to be able to give meaningful feedback
- Teachers need to be able to change their instruction in ways that yield measurable improvements on available assessments of student performance
Of course, these developments are supposed to take place in every single school and district covered by the new policy, and, at the school and classroom level, these new procedures, observation criteria, and feedback mechanisms have to be developed for every teacher, at every level, in every subject.
In addition to highlighting the enormity of the task, this analysis also makes visible critical practical and logistical issues. In this case, for example, the new evaluation procedures are supposed to be based to a large extent on measuring growth of student learning on standardized tests. Yet, the policy is also supposed to apply to the many teachers who do not teach “tested subjects” and for whom standardized tests are not adequate for assessing student learning and development.
But even if all the logistical and practical problems are addressed, to be effective, the policy still requires administrators and teachers to develop new skills and knowledge: Administrators have to improve their ability to observe instruction and to provide meaningful feedback (in many different subjects/levels); Teachers have to know how to use that feedback to make appropriate changes in their instruction that lead to improved performance on available measures. Further, even if administrators were able to put in place new evaluation procedures and develop the capabilities to deploy them, using the results to sanction or reward individual teachers conflicts with the prevailing attitudes, beliefs, and norms of behavior in many schools.
(Among others, Michael McShane draws on Pressman & Wildavsky’s 1984 book Implementation to highlight the issues related to reform complexity; David Cohen, Jim Spillane, and Don Peurach have written extensively about the need to develop a much stronger “infrastructure” to support the development of educators’ knowledge and skills and to improve instruction across classrooms and schools; and Rick Hess cites James Q. Wilson’s work to stress the difficulty of counteracting local incentives and prevailing institutional cultures.)
How do the proposed changes fit with the values, trends, and developments of the time?
Proposed changes that reflect enduring values, as well as socio-cultural, political, technological, and economic trends, can take off in concert with other developments in society. Conversely, conflicts over basic values and shifts in trends can mean that support and public opinion wane relatively quickly, before changes have time to take root.
In this case, the teacher evaluation policies evolved as conflicting trends were emerging. On the one hand, the new approaches to teacher evaluation fit with long-standing concerns about the efficiency of education as well as with the development of new technologies, new approaches to data use, and interest in performance accountability among leaders in business, government and other fields. On the other hand, those policies also had to be implemented in a context where concerns about academic pressure and the extent of testing were growing among many parents and educators and where advocates for local control of education were becoming more concerned and more vocal about their opposition to the development of the Common Core Learning Standards.
What would you predict?
This quick survey provides one view of the challenges faced by efforts to change teacher evaluations:
- A lack of a shared problem
- Requirements for massive, complex, and coordinated changes at every level of the education system
- Demands for the development of new knowledge, skills, attitudes and norms of behavior
- In a context of conflicting trends and values
Under these circumstances, the prognosis for effective implementation was never good. Of course, the hope was that the new policies could set in motion the desired changes and encourage the kinds of interactions between administrators and teachers that would improve student learning. Given the challenges laid out here, the fact that some aspects of teacher evaluations across the US appear to have changed could be seen as remarkable. In fact, the NCTQ report makes clear that states and districts did respond to the policies. In particular, many more states are now requiring multiple observations of some or all teachers, and more than half of all states now require that all teachers get annual summative feedback.
However, the NCTQ report also explains that elements of the policy critical to the basic logic are falling by the wayside. Ten states have dropped requirements for using “objective evidence of student learning” (though 2 states have added such a requirement), and “No fewer than 30 states have recently withdrawn at least one of the evaluation reforms that they adopted during a flurry of national activity between 2009 and 2015.” The Education Week coverage also notes that states like New Mexico have rolled back tough accountability provisions. New Mexico had instituted a student-growth score that accounted for 50% of a teacher’s overall rating but has since dropped that requirement after “more than a quarter of the state’s teachers were labeled as ‘minimally effective’ or ‘ineffective.’ Educators (including highly rated teachers) hated the system, with some burning their evaluations in protest in front of the state education department’s headquarters.”
Notably, this analysis also highlights that the policies were largely indirect: They were designed to develop an elaborate apparatus to measure teachers’ performance, with the hope that those changes would eventually affect instruction. Yet there was relatively limited investment in figuring out specifically what teachers could do to improve and the kind of feedback and support that would make those improvements possible. Under these circumstances, one could anticipate that many districts and schools would make some effort to introduce new observation and evaluation procedures, but that those new procedures would be grafted onto old ones, shedding the most complicated and controversial propositions in the process (providing another example of what Tyack and Cuban describe as a process of “schools changing reforms”).
The lesson from all this is not for the advocates to lament this rollback or the critics to revel in it. Nor is it to abandon ambitious visions for rethinking and transforming the school system we have because the work that needs to be done is difficult or controversial. The point is to use our knowledge and understanding of why changing schools is so difficult so that we can design improvement initiatives that take the predictable stumbling blocks into account. It means building common understanding of the key problems that need to be addressed, coming to terms with the concrete changes that have to be made in classrooms and schools, and building the capacity to make those changes over time.