A Focus on Measuring Outcomes and Impact
Another major trend that emerged in evaluation during the 1990s is the emphasis on measuring outcomes and using evaluation for purposes of accountability. The United States began evaluation in what Ray Rist and his colleagues (1999) called “first wave” evaluation with a focus on innovative experimentation and collecting data to improve programs and test new interventions. However, in many ways, the United States has transformed to a “second wave” country with a focus on evaluation for accountability and, at least the claim is, for using results to make summative and budgetary decisions about program continuation and expansion. The outcomes focus began in the early 1990s and continues unabated today.
In education, the foundation for the current standards-based outcome focus began in 1983 with the publication of A Nation at Risk (National Commission on Excellence in Education, 1983). That report expressed serious concerns about the state of education in the United States and provided the impetus for change. The message, which continues today, was that education in the United States was broken and that the federal government needed to become more involved to fix it. The nature of that action was not determined for a few years, but gradually a federal role with a focus on accountability emerged. Historically, local school districts and, to a lesser extent, the states have been responsible for schools in the United States. Therefore, an increased federal role in an issue that had historically been based on
54 Part I • Introduction to Evaluation
local community needs was somewhat controversial. However, in 1989, the National Governors Association met with then-President George H.W. Bush at the President’s Educational Summit with Governors and endorsed national goals for education while still maintaining state and local control. Later, President Clinton, who had led the National Governors Association in meeting with President Bush at the 1989 summit, greatly increased both the role of the federal government in education and the emphasis on standards with six major pieces of legislation that he signed in 1994. Press releases indicated that “not since the 1960s has so much significant education legislation been enacted” and that the six acts “promise to alter the landscape of American education in important and lasting ways” (http://www.ed.gov/PressReleases/10-1994/legla.html). The legislation included the Improving America’s Schools Act (IASA), an amendment to the old 1965 Elementary and Secondary Education Act that had marked the beginning of modern evaluation, and the Goals 2000: Educate America Act. Among other things, these acts provided financial support and incentives for states to develop high standards for academic achievement, to guide learning, and to monitor schools’ progress toward achieving these standards. By the end of 1994, 40 states had applied for planning funds to begin developing standards. The argument was that local authority would be maintained by having states develop their own standards; the federal government’s role was to require standards and to provide fiscal incentives for doing so. In 2001, under President George W. Bush’s leadership, Congress passed legislation that has been the focus of educational reform ever since: the No Child Left Behind (NCLB) Act.
This legislation greatly increased the federal role by establishing more requirements for student performance, testing, and teacher training, and by adding fiscal sanctions and corrective action when goals were not achieved.5 Of course, standards and methods of assessment vary greatly across the 50 states, but, in each state, standards and their means of assessment serve as the focus for educational reform and much of educational evaluation today. Lauren Resnick writes, “Test-driven accountability has become a reality [in education]” (2006, p. 33), adding that “enormous weight is placed on tests and accountability formulas” (2006, p. 37).6
These policies have greatly changed the role of evaluation in public schools in the United States. Standards and their assessment receive much public attention and, in most states, are a significant driver of educational policies, practices, and evaluation. Evaluation in K–12 education in the United States today focuses on several related issues: developing appropriate means for assessing students and their progress, identifying successful schools and schools that are failing, and identifying practices that can help bring students’ performance up to the standards. As schools that do not meet the standards can be closed or faculty and administrators changed, the evaluation focus is both summative (Should a school continue or not? Be re-staffed or closed?) and formative (Which students in a given school are failing to meet a standard? What have
5President Obama has now proposed changing No Child Left Behind, but no specific legislation has yet been passed on the issue.
6Resnick’s special issue of Educational Measurement: Issues and Practice focuses on case studies of four states and how standards and measures of assessment have been put into practice and used.
Chapter 2 • Origins and Current Trends in Modern Program Evaluation 55
been their experiences? What are the experiences of similar students who succeed? What types of interventions may be most appropriate to help those students who have not met the standard?). Such evaluation efforts can, of course, improve schools, but the focus on standards, and their assessment, also holds risks. It has changed the focus of evaluation in education to standards and accountability at a time when resources are scarce and many school evaluation efforts are able to focus on little else.
Reactions and Policy Statements. In recent years, the American Evaluation Association (AEA) has taken its first policy positions on the issues of testing and education accountability. In 2000, AEA President James Sanders, our co-author, appointed a Task Force on High Stakes Testing in K–12 Education to review the research and to develop a statement of the organization’s position. The AEA Position Statement on High Stakes Testing in PreK–12 Education was passed by the AEA Board in 2002 and can be found on the AEA web site at www.eval.org/hst3.htm. The statement summarizes research on the risks and benefits of high stakes testing, concluding that “evidence of the impact of high stakes testing shows it to be an evaluative practice where the harm outweighs the benefits” (2002, p. 1). The Task Force wrote:
Although used for more than two decades, state mandated high stakes testing has not improved the quality of schools; nor diminished disparities in academic achievement along gender, race, or class lines; nor moved the country forward in moral, social, or economic terms. The American Evaluation Association (AEA) is a staunch supporter of accountability, but not test driven accountability. AEA joins many other professional associations in opposing the inappropriate use of tests to make high stakes decisions. (2002, p. 1)
The Task Force presents other avenues for improved evaluation practice, including better validation of current tests for the purposes for which they are used, use of multiple measures, and consideration of a wide range of perspectives, including those of professional teachers, to assess student performance. In 2006, the AEA Board approved a second policy statement on the issue of educational accountability (see http://www.eval.org/edac.statement.asp). This statement expresses concerns with three major issues:
Overreliance on standardized test scores that are not necessarily accurate measures of student learning, especially for very young and for historically underserved students, and that do not capture complex educational processes or achievements;
Definitions of success that require test score increases that are higher or faster than historical evidence suggests is possible; and
A one-size-fits-all approach that may be insensitive to local contextual variables or to local educational efforts (American Evaluation Association, http://www.eval.org/edac.statement.asp, 2006, p. 1).
This AEA policy statement encourages use of multiple measures, measures of individual student progress over time, context-sensitive reporting, use of data to consider resource allocations for teachers and schools, accessible appeals processes, and public participation and access.
Choice in Education. Another factor influencing evaluation in the educational environment today is school choice. Choice is represented in many different ways across the country. Some cities (Washington, D.C., and Milwaukee, Wisconsin, being prominent examples) have had voucher and choice systems for some time, and much research has been conducted on these systems (Buckley & Schneider, 2006; Goldring & Shapira, 1993; Hoxby, 2000). In many school districts, parents now are able to send their child to another public school within the district or, in some cases, outside the district. Districts across the United States have many different choice plans, from traditional neighborhood schools to magnet schools, charter schools, and, in some areas, vouchers to private schools. The choice environment in K–12 education has, of course, influenced evaluation practice. The theory of choice is based on the market theory that competition improves performance; therefore, giving parents a choice of schools will inspire schools to become more competitive, which will improve school performance and student achievement (Chubb & Moe, 1990).
In some districts, evaluation plays a role in helping educational administrators and teachers in individual schools or groups of schools to consider how they want to market their school to recruit other students. New programs emerge; old ones are put aside. At minimum, schools struggle with predicting their enrollments and planning to staff their schools adequately. In addition, school administrators and teachers work to develop and implement new programs designed to improve learning or draw more, and sometimes better, students. Such choices, which are new to public school administrators, present challenging decision demands. What programs, curricula, or interventions will improve the schools’ scores on standards? What programs, curricula, or interventions will attract more students to the school? Traditional evaluation methods can be, and are, used to help teachers and administrators deal with such decisions and provide opportunities for evaluation to serve new uses. For example, Fitzpatrick has been involved in studies that examine how low-income parents, who are perhaps most likely to lose out in choice environments, have learned about school choice and made choices for their children (Teske, Fitzpatrick, & Kaplan, 2006). These studies are designed to help school districts better inform parents about choices. In this environment, there is much that evaluators can do to help teachers and administrators adapt to change and improve learning. (See Rodosky and Munoz [2009] for an example of how one urban school district manages its evaluation responsibilities for accountability.)
Performance Monitoring in Other Governmental Sectors. Just as education was becoming concerned with standards, their assessment, and evaluation for accountability in the late 1990s and early part of this century, other government entities and nonprofit organizations also began focusing on performance monitoring and evaluating outcomes.7 The early influences in the trend to measure outcomes in
7Although the history in these other arenas is a little different from that of education, the theory and approach behind the focus on outcomes in both education and other sectors are the same. Therefore, it is helpful for those in both arenas, education and agencies that deliver other services, to be aware of the similar pressures to measure outcomes and the forces that influence each.
government came from New Public Management, a movement in public administration and management, and the related call to “reinvent government.” In 1992, David Osborne and Ted Gaebler authored the popular and influential book, Reinventing Government, which urged public policymakers and managers to build on the successes of the private sector that was then experimenting with re-engineering and Total Quality Management (TQM). Osborne and Gaebler advocated an entrepreneurial, consumer-driven government in which managers viewed citizens as “consumers” and government managers became more entrepreneurial in developing and experimenting with programs, policies, and interventions.8 Reinventing government was not without its critics. (See, for example, deLeon and Denhardt [2000] and their concerns with how the economic-based, market model of reinventing government and viewing citizens as consumers might neglect the broader public interest.) However, reinventing government and its principles were widely implemented in many state and local governments as well as at the federal level. During the Clinton administration, Vice-President Al Gore authored the National Performance Review, a government report to guide change, based on Osborne and Gaebler’s principles of reinvention (National Performance Review, 1993). The report and its recommendations were intended to encourage public managers to be entrepreneurial to deal with budget constraints and to become more efficient but, at the same time, to meet citizen needs.
An important part of reinventing government was, of course, accountability or collecting data to see what worked and what didn’t. Therefore, the Clinton administration also proposed the Government Performance and Results Act (GPRA) to address concerns about accountability with these new initiatives (Radin, 2006). (See OMB Watch [2000] http://www.ombwatch.org/node/326 for more on GPRA.) GPRA was an example of the performance monitoring and measurement being advocated and implemented by several countries, including Canada and Australia, in the late 1990s (Perrin, 1998; Winston, 1999). Joseph Wholey, a prominent leader in evaluation in the U.S. government in the 1970s, was involved in the development of GPRA and was a leader in performance measurement (Wholey, 1996). Passed in 1993 with implementation beginning in 1997, GPRA required all federal agencies to produce a strategic plan and to measure progress toward meeting the goals and objectives delineated in the plan with performance data. Thus, GPRA was the first major federal government mandate to measure program or policy outcomes. Government employees across the country became well acquainted with GPRA and its requirements as different levels of government responded to the requirements to identify and measure outcomes.
8Note the similarity between the theories of reinventing government and the theories concerning school choice. Both emerge from concepts about the market and the “success” in the private sector and a belief that public institutions can become more successful by becoming more like the private sector or businesses. Managers and school principals become “entrepreneurs” and clients, parents, and students become “consumers” or “customers” who are making choices and decisions about services. Given the economic failures of the private sector seen in the United States and around the world in 2008 and 2009, we have chosen to use quotation marks around the word success because economists and citizens are now not so certain about the successes of the private sector. Entrepreneurial behavior, without regulation, appears to have prompted the housing crisis and many problems with banks and security firms.
9OMB’s advocacy of randomized control trials will be discussed in Chapter 15 on design. Randomized experiments are certainly one way of establishing causality, but, along with the American Evaluation Association, we believe there are many established approaches to determining causality and the one selected should be appropriate for the context of the program and the judgments and decisions to be drawn from the evaluation.
The Bush administration continued the emphasis on performance-based management and measuring outcomes with its own measure to replace GPRA, the Program Assessment Rating Tool (PART) (OMB, 2004). PART is a 25-item questionnaire designed to obtain information on program performance. Scores are calculated for each program based on agencies’ responses, and one-half of the PART score is based on results or outcomes. Each year, the Office of Management and Budget (OMB) obtains PART scores from 20% of all government programs; programs are required to complete PART on a rotating basis, so that all programs are reviewed within five years. By 2008, 98% of federal programs had completed PART forms and been reviewed. (See http://www.whitehouse.gov/omb/part/.) Just as scores on standards-based tests can influence the staffing and even the continuation of individual schools, PART scores are intended to be used to make budgetary decisions. As in education, instances of the dramatic use of PART scores to slash funding for progra