IIER 6: Wyatt (1996) - school effectiveness research

Issues in Educational Research, 6(1), 1996, 79-112.

School effectiveness research: Dead end, damp squib or smouldering fuse?

Tim Wyatt
University of Western Sydney - Nepean

This paper begins by asking what have we learned from two decades of studying school effectiveness, then goes on to define a research agenda in the NSW context that can begin to answer some of the questions raised about the basic parameters of "effectiveness". Several intersecting lines of evidence suggest that the current paradigm of research into school effects may have reached a dead end. This evidence includes the results of evaluation studies demonstrating the failure of most projects based on the so-called correlates of school effectiveness; the repeated finding that only a small proportion of variance in student performance can be explained by schools; and more recent research pointing to the greater importance of class membership in explaining student performance.
Recent initiatives by the NSW government to implement performance based funding and to provide the community with more extensive information about school performance raises the stakes in identifying effective schools to new heights. The task of identifying effective schools is not an easy one, either conceptually, technically or politically. Handled sensitively, the use of school performance information has the potential to contribute considerably to the improvement of schooling outcomes for students. Handled ineptly, the contribution of school effectiveness research will either be irrelevant, or create a conflagration all would rather have avoided.

Introduction

After more than two decades of research into school effectiveness, it is proper to question what it has achieved in terms of improving outcomes for students. Several intersecting lines of evidence appear to suggest that the current paradigm of research into school effects might have reached a dead end. The first of these lines of evidence draws on the conclusion of several large scale evaluations of school improvement projects based on the correlates of presumed effective schools — that despite massive funding, and zealous efforts by legislators, administrators and teachers, these projects have not generally led to any significant and sustainable improvements in student outcomes.

A second line of evidence is the continued finding that the size of the effect of schools on student learning is disappointingly small, with a recent meta-analysis of over 80 British and Dutch studies suggesting that, on average, only around 9 per cent of the total variance in student performance can be explained by the effects of attending different schools (Bosker and Witziers, 1995). Further, recent sophisticated multi-level research by Hill and Rowe (1994) and other suggests that most of even this small amount of variance is actually explained by differences attributable to membership of individual classes, not schools. They suggest that teachers, not schools, "make the difference" in student learning.

Underpinning both of these arguments has been the inability of research to produce consistent and unambiguous findings of school effectiveness across different domains of performance or over time. Despite the voluminous literature on school effectiveness, we are not much further advanced from the state of affairs described in Ralph and Fennessy's (1983) critique: much of the literature takes the form of reviews of reviews, with only a small number of highly influential empirical studies providing the "evidence" cited in paper after paper. Reynolds (1992) concludes that in many ways we have come full circle back to the interpretation of the 1966 Coleman report — that schools don't make a difference.

So, has school effectiveness research fizzled out? An overall evaluation of the available data on the size and stability of school effects leads to the conclusion (in line with Scheerens, 1992) that school effectiveness models are not as shaky as certain critics would have it , but at the same time not established as firmly as some enthusiastic school improvers would treat them. Recent initiatives by the NSW government to implement performance based funding, to provide the community with "fairer school information", and to establish "charter schools" (those identified as failing) raises the stakes in identifying effective schools to new heights. Governments elsewhere also continue to legislate the publication of more and varied information about school performance. Given the consequences for a school of being designated effective or ineffective, it is fair to ask what is meant by those terms. How might we know if a school is effective? Do schools remain effective over longer or shorter periods of time? Are they equally effective in all areas? Are they equally effective for all their students? There would seem to be little point in attempting to identify schools at one point in time as "effective" in a global sense, if in fact such a notion is a false one.

The task of identifying school effectiveness is not an easy one, either conceptually, technically or politically. Handled sensitively, the use of school performance information has the potential to contribute considerably to the improvement of schooling outcomes for students. Handled ineptly, the contribution of school effects research will either be irrelevant, or create a conflagration all would have rather avoided.

The context for school effectiveness research

The history of the research into school effectiveness has been extensively described elsewhere (Reynolds, 1992; Brophy & Good, 1986: Scheerens, 1992: Creemers, 1994) and it is not necessary to go into it in any great depth here. It is sufficient to note that the major impetus for developments in North America and British research came about as a reaction to deterministic interpretations of findings by Coleman et al (1966); Jencks et al (1972); Plowden, (1967) and others that family and neighbourhood characteristics have a greater impact on student performance than individual schools. The popular (if incorrect) interpretation was that schools do not make a difference. Subsequent research (Rutter et al, 1979; Mortimore et al 1988; Mortimore 1993), examining the relative progress made by students concluded that while background variables are important, schools can have a significant impact. Scheerens (1992) notes that many school effectiveness studies carried out prior to the mid 1980s were hampered by the limited statistical techniques available to them. The development of more sophisticated techniques and software that allows greater separation of the effects of students and schools (Goldstein, 1987) has led to an explosion in the number of studies, conducted in a variety of contexts, on different age groups, and different countries which confirms the existence of both statistically and educationally significant differences between schools in students' achievements.1

However, the picture is not nearly as clear cut as this impressive body of evidence would indicate at first glance. Very recent research suggests that the notion that schools can be placed on a continuum from effective to ineffective may be inappropriate (see Hargreaves, 1995), and indeed that effectiveness itself may not be a unitary concept. There are questions as to whether schools are differentially effective for all of their students, whether they are equally effective across all curriculum areas, and whether they remain effective over time. To those closely involved with schools this would seem to be merely common sense, but empirical demonstration is another matter. The following sections of this paper examine the evidence relating to these issues in more detail.

Perhaps more importantly, recent work by Hill and his colleagues with data from Victorian primary schools suggests that when information about class membership is considered, the additional proportion of variance accounted for by the school is very small. There has not yet been widespread replication of this finding, other than a limited demonstration of greater within school effects than between school effects in an international test (Scheerens, Vermeulen and Pelgrum, 1989), although the possibility of greater class effects was raised earlier by Fredrickson (n.d). Despite the lack of replication, it is clear that the implication of this work is that we are faced by the same conclusions as researchers in the 1960s, that schools do not make (much of) a difference. This may be overly pessimistic, as Reynolds (1992) notes, we are considerably wiser for the journey, and it is clear that individual teachers can make a difference to student's learning outcomes. Indeed there is cause for hope, because of all the variables that have been associated with effective schools, the quality of teaching has both the most consistently demonstrated impact on student learning and is within the power of schools to do something about.

It would seem necessary for future research to pay closer attention to the issue of teacher effectiveness for there to be significant advances in our understanding of what makes schools effective. Hargreaves (1995) discusses the need to consider the cultural dimension to school improvement. The question to be answered is this: is an effective school more than a collection of effective classrooms, or is there some cultural influence operating over and above the contribution of individuals. While resolution of this issue is beyond the scope of this paper, interested readers are referred to reviews of teacher effectiveness in Witrock (1986); Walberg, (1986); and Hopkins, Ainscow and West, (1994).

A further question to be asked is whether the findings of school effects research in other countries have any relevance to the Australian and in particular the NSW context. The aims, organisation content and delivery of schooling vary greatly from country to country. Few countries have the extensive private school sector that operates in Australia, or academically selective schools within the general education stream. What effects might these have for the interpretation of results? It has been suggested that the focus of much of the US work, in particular, which has focussed on attempts to identify correlates of highly effective inner city, high minority enrolment schools, has very little relevance to the majority of schools in countries such as Australia and the Netherlands.

Research by McGaw et al (1992) has suggested that Australian school communities have little appetite for the narrow focus on multiple choice tests of literacy and numeracy that is the standard fare in many North American studies. Instead, the outcomes most highly valued by Australian communities included less tangible attributes such as the development of a positive self-concept, a sense of self-discipline and self-worth, becoming a productive and confident member of the adult world and the development of appropriate value system.

Whether this reflects a more sophisticated understanding of the purposes of schooling or simply the lack of a tradition of testing is hard to determine. Surprisingly little is known about the parameters of performance of Australian schools and school systems. There has been a long history of opposition to standardised testing and the public comparison of schools, witnessed by the opposition to the ASSP project in the 1970s (Keeves and Bourke, 1976; Bourke et al, 1981) and the continued opposition of the NSW Teachers Federation to the Basic Skills Tests (Byrne, 1997). While public examinations have been part of the educational landscape in NSW since the establishment of public schools more than a century ago, only the most cursory information has been publicly available about the performance of schools on these examinations (an exception is Williams and Carpenter, 1987). The most widely known information has been the number of students in the top 1000 places in terms of the Tertiary Entrance Rank, a very imperfect indicator at best. The current NSW government promised as part of its election platform that it would publish "fairer school information" (Carr, 1995) but while prototype models of school annual reports have been disseminated for discussion (DSE, 1996), there has been no final agreement on what form these will take, and at the time of writing are the subject of a ban by the NSW Teachers Federation. If little is known about school performance in absolute terms, almost nothing is known about the comparative value added by schools of various kinds and in various locations over and above what might be expected given the intake characteristics of their students.

So, if effectiveness is to prove to be an elusive concept, is there any point in continuing? Has school effectiveness fizzled out or has it reached a dead end from which we need to retrace our steps for a while and try a new direction?

Schools directly or indirectly touch on the life of almost everyone in modern society, whether as student, parent, teacher, employer or consumer of the goods and services produced by school leavers. Education is a major undertaking of governments around the world. Schools account for a substantial proportion of public and private expenditure, averaging around 4 percent of GDP in OECD countries. The NSW Department of School Education is one of the largest employers in Australia, and has a budget of well over $3 billion annually. In return for this investment, high hopes are held for education as an instrument of social and economic policy for the betterment of individual, community and national well-being. It therefore should come as no surprise that there is intense interest in knowing whether schools are delivering value for money — how effective schooling is and how it can be improved (Hill, Rowe and Holmes-Smith, 1995).

The re-emergence of belief in "market forces" as the dominant economic model in countries including the UK, USA, Australia and New Zealand in the 1980s has also forced schools and school systems, for better or for worse, to operate in a quasi-market environment in which they must actively seek to satisfy client expectations and compete for student enrolments and thus resources. They must not only be effective, but they must also market themselves on the basis of what they do especially well (Hill, 1995a).

Such interest is not new, but may have become more intense, as moves to "reconstruct" poor performing schools takes hold (see for example, North Carolina State Board of Education, 1997). The education indicators movement of the late 1980s (see OECD, 1994; Smith, 1988; Wyatt and Ruby, 1988) refocussed attention on the need for both educational accountability and improvement to be based on accurate, reliable and defensible collection, dissemination and utilisation of information. The measurement of student outcomes as a reflection of school effectiveness is an essential and integral part of such information systems. Hill (1995a) notes that the need for reliable information and measurement has been understood for some time by those in industry and business, and the message is becoming increasingly clear within education. This is not to say that it is yet universally accepted across the education sector.

Reynolds and Packer (1992) argue that several converging forces make it likely that the need for research and development in the general area of school effectiveness and school improvement will be even greater in the 1990s than it has been thus far. They hypothesise that in addition to the increased pressure for education systems to demonstrate results, school systems are likely to become more heterogeneous in quality, encouraged by policies that promote greater school differentiation (such as greater school choice and the establishment of more specialist and selective schools). Policies encouraging greater self management will also mean that schools are more dependent on their own resources, which are not equally distributed. They believe that the likely result of these policies, in the short term, will be a substantial variation in the quality of schools, since the common factors provided by districts and local education authorities to all schools are simply being removed.

In addition, the nature of the current school population is changing, firstly as a consequence of policies of "mainstreaming" children with special educational needs (either as a result of physical, behavioural or learning problems), and also as a result of the retention into senior secondary schooling of a large number of students not necessarily interested in or inclined towards the traditional academic offerings. As these young people are highly sensitive to the quality of what they are offered within their educational setting, the influence of schools is likely to increase (Graham, 1988). Those that adapt best to working with a diverse range of students will be demonstrably more effective than those that cling to the ways of the past.

Slightly further into the future, demographic changes mean that governments in all major industrialised nations will be faced with greatly reduced cohorts of young people leaving school as a consequence of dramatically reducing birth rates in the 1960s and 1970s. In the United Sates, each retired person is currently supported by 17 workers. By 2010, if current trends continue, the proportion will be reduced to 1:4, and one of these four will be Hispanic or black, groups not traditionally well served by the schools of today. Assuming the demand for labour remains constant, no society in the future will be able to tolerate the large number of students who "drop out" or leave schools without formal qualifications as at present.2

In these circumstances, the need for governments to be able to identify effective schools and to take active steps to encourage greater effectiveness becomes self evident.

What have we learnt from two decades of school effectiveness research?

Perhaps the most important outcome of research into school effectiveness has not been the finding that schools can influence their students' learning, or that some school related factors seem to lead to better student outcomes than others (Mortimore et al, 1988; Bosker and Scheerens, 1992). Murphy (1992, pp. 94-96) identifies four aspects that he believes are the real legacy of the effective school movement:

The educability of learners. At the heart of the effective schools movement is an attack on the prevailing notion of the distribution of achievement according to a normal curve. There is a clear demonstration that all students can learn.
A focus on outcomes. For a variety of reasons, educators have tended to avoid serious inspection of the educational process. Effective schools advocates, however, argued persuasively that rigorous assessments of schooling were needed and that one could judge the quality of education only by examining student outcomes. Equally important, they defined success not only in absolute terms, but as the value added to what students bring to the educational setting.
Taking responsibility for students. The third major contribution of the effective schools movement is an attack on the practice of blaming the victim for the shortcomings of the school itself. It means an end to the philosophy of "I taught them but they didn't learn". The movement has been insistent that the school community takes a fair share of the responsibility for what happens to the youth in its care.
Attention to consistency throughout the school. One of the most powerful and enduring lessons from all the research on effective schools is that the better schools are more tightly linked — structurally, symbolically and culturally — than the less effective ones. They operate more as an organic whole and less as a loose collection of sub-systems. An overarching sense of consistency and coordination is a key element that cuts across the effectiveness correlates and permeates our better schools.

The legacy of the effective schools movement as outlined by Murphy (1992) leads into the related subject of school improvement. School effectiveness research may have intrinsic interest for some, but is ultimately of little value unless it produces something of policy relevance that can help to make schools better. Likewise, Murphy contends, school improvement efforts that lack substantive content, or focus on single curriculum initiatives or isolated teaching practices rather than whole school development are doomed to failure.

For some time, research into school improvement was seen as a somewhat different discipline to school effectiveness, but of late there has been a drawing together of the two traditions (Reid, Hopkins and Holly, 1987; Reynolds and Creemers, 1990; Levine, 1992). Many school effectiveness researchers are profoundly concerned with the implications of their research for policy makers, schools and students, indeed many are employees of school systems (e.g. Nuttall 1989; Fetler, 1989; Webster et al 1994) or have been commissioned by school authorities to carry out work on their behalf (e.g. Schagen, 1994; Thomas et al, 1994). The work of the ALIS centre at the University of Newcastle (UK) is an example of how the results of methodological and theoretical advances have been put into practice for the purposes of informing school improvement (Tymms, 1995).

The mechanisms by which the results of school effectiveness research are put into practice have also been the subject of considerable research (Fullan, 1991; Louis and Miles, 1991). There are some important lessons to be learnt from these reviews. While an extended discussion of these is beyond the scope of this paper, it is important to note that there is little support for those who would apply the findings of school effectiveness research mechanistically, without reference to a school's history and context. Rather, the approach advocated by Sammons et al (1995) and adopted by educational authorities in Scotland, the UK and NSW, is that they can be a useful starting point for school self evaluation and review. While many examples of the deleterious effects of testing programs based on minimum competency testing have been identified (eg. Madaus 1988), it would also appear that the worst fears of those who vehemently oppose the publication of student results seldom come to pass.

A large number of initiatives around the world have attempted to put into practice the findings of the school effectiveness research. Mann (1992) summarises this work as follows.

Since the early 1970s, school practitioners and education researchers in Great Britain and the United States started to document the 'within-school' characteristics that could 'add value', that is, help children achieve over and above what would be predicted given their family backgrounds, chiefly social and economic status.
Effective schools advocates believed that enough was known about the best teaching practices to help children from low-income families learn the same basic things as other, more privileged students. The basic assertion of effective schools advocates was that 'compensatory education' was possible, that there was a set of practices that did not depend on extra money or new grants of authority. If those factors were maximised, simultaneously and persistently, then the neediest children would be able to acquire the same basic skills (literacy, numeracy) as their more privileged peers. Various researchers have created different lists of these factors. Most can be resolved into the five used by the late Ron Edmonds:

Strong leadership at the building level.

"Best practice" teaching.

An organizational climate that supports good work by teachers.

Curriculum that fosters an "instructional emphasis" or an "academic press."

A pupil progress measurement system that is geared more to the next lesson's teaching than the next grade's promotion.

That model continues to be widely used (and widely resisted) in state and local school reform movements around the U.S. and abroad. Some advocates (including this author) have argued that the preferable but unused knowledge base created a moral imperative that should compel teachers to change their teaching. The rhetoric was close to both the goals of a democratic public school and a profession of teaching. The goals lost (p. 224).

The outcome of these efforts, the Hudson Institute concludes, has been a "$500 billion flop". In the US, the average reading achievements of 9, 13 and 17 years old students tested in the National Assessment of Educational Progress have not increased over the last 17 years; the national trends appear as flat lines. As Mann describes it:

If those lines were an electrocardiogram, the doctor would go talk to the family. Since they measure education, school people blithely substitute excuses for action: the tests count the wrong things, those are someone else's kids, and so on.

Less emotively, the 1985 Commonwealth Quality of Education Review Committee (QERC) in Australia concluded that assessing effectiveness of schools was constrained by the absence of unanimity of what students should achieve, the lack of effective measures of achievement across the spectrum of educational objectives and the difficulty of separating the effects of schooling from those of the complexity of social processes experienced by learners. QERC noted that physical provision for schooling was qualitatively better than before, and that the qualifications of teachers were higher, but it was unable to provide evidence that the cognitive outcomes of schooling had become better or worse.

Chapman (1988) describes the outcomes of a large scale school improvement effort in Victoria. By the end of 1986, approximately half of Victorian Government schools had joined the School Improvement Plan (SIP). A review conducted in that year found that the overwhelming majority of schools (93%) were satisfied with their overall progress in school improvement. However, the researchers undertaking the study (Robinson and Rocher, 1987 in Chapman, 1988) concluded from the evidence available "the link between school improvement and student outcomes appears rather weak". The State Board of Education concluded that the SIP was limited because it paid relatively slight attention to questions of comparisons of schools performance and to thinking about questions of efficiency, effectiveness and cost effectiveness. Nor did it meet the problems of detecting and dealing with schools performing unsatisfactorily.

Not surprisingly, the school effectiveness movement and school effectiveness research has come under considerable criticism from Australian academics and policy analysts. Questionable methodological procedures, narrow concepts of effectiveness, the emphasis on standardised achievement, the danger of recreating the dream of the efficient one-best system of instruction and the conservative and simplistic prescriptions for effectiveness, improved standards and excellence are identified as contributing to a movement which is socially conservative and educationally regressive (Angus, 1986).

Measuring school effectiveness

How the effectiveness of schools can best be measured and reported is a question that perhaps has no single correct answer. One difficulty has been that there is little agreement on what is meant by effectiveness. Reid, Hopkins and Holly (1987) concluded that while all reviews assume that effective schools can be differentiated from ineffective ones, there is no consensus yet on just what constitutes an effective school.

Most current definitions have in common a focus on student outcomes, and in particular the concept of the value added by the school (McPherson, 1992). This focus implies that a school's performance is to be judged not on results alone but on the school's contribution to these results. The definition adopted by an international study of the quality in schooling by the OECD (see Chapman, 1991) encapsulates these elements:

An effective school is one that promotes the progress of its students in a broad range of intellectual, social and emotional outcomes, taking into account socio-economic status, family background and prior learning (p.1).

A study of the perceptions of what constitutes an effective school by the Australian Council for Educational Research (ACER) found that rather than the narrow concentration on test results in literacy and numeracy commonly found in overseas studies, Australian school communities valued most highly the following:

positive relationship with learning
development of a positive self-concept
sense of self-discipline and self-worth
student's living skills · becoming a productive and confident member of the adult world in time.
the development of appropriate value systems; and
the preparation of the student for the next stage of learning (McGaw et al, 1992).

However, the bulk of current school effects research accepts an operational definition of an effective school as one in which students progress further than might be expected from consideration of its intake (Mortimore, 1991). In this paper the term value added is used to mean the extent to which students may, over a given period, have exceeded or fallen below the expected progression from a given starting point. A value-added measure is one which attempts to describe the educational value that the school adds over and above that which would have been predicted given the backgrounds and prior attainments of the students within the school (Hill, 1995a).

Wyatt and Ruby (1988) reviewed a number of different strategies used to construct indicators of school effectiveness, including comparisons against standards, comparison of actual against expected scores, data envelope analysis, gain scores and cluster analysis. Cuttance (1987) and Gray et al (1990) have also reviewed a number of models for reporting the value added by schools. Of these, models based on the statistical technique of regression have been the most widely used. There is a general consensus in the academic literature that multi-level regression models (that is, those that take into account the hierarchically nested nature of the data) provide the best estimates of the sources of variation in performance between individual students, class, schools or higher units of aggregation (see for example, Goldstein, Bryk and Raudenbush, 1992; Hill et al, 1994).

However, education authorities have, in the main, tended to opt for the use of simpler linear regression models.3 The reasons for this are not clear. It may be that the spread of the knowledge and software needed to operate multi-level models in practice has lagged well behind their theoretical development. But considerations for the need to present information in more accessible form to audiences broader than highly trained specialists and academics may also play a role. The UK Schools and Curriculum Authority (SCAA) Report on Value Added argues that the models used must be as simple and straightforward as possible while still maintaining accuracy. They recommend that, in the first instance, a simple linear regression of student intake scores on student outcome scores on two public examinations be used to derive an effectiveness score. Thomas and Goldstein (1995) on the other hand, argue that empirical interpretation of value-added is anything but simple and straightforward. An example of the presentation of value added results which avoids the league table approach is shown below.

Presentation of value-added results

Bosker and Witziers (1995) identify three kinds of value added measures. The first Hill (1995) describes as "Unpredicted Achievement", described as the level of attainment of students adjusted for family background characteristics, such as socio-economic status or ethnicity, and student general ability (such as measured by IQ tests).4 The second value added measure might be termed "Learning Gain", which is defined as the level of achievement of students adjusted for their prior levels of achievement. The third is "Net Progress", namely, level of student achievement adjusted for family background, ability and initial level of achievement.

Figure 1: Value added results compared to raw results

Of these options, the Net Progress or fully specified model is generally accepted to provide the most accurate measure of value-added (see Sammons et al 1994b). However, even within these fully specified models, prior achievement has been consistently identified to be the most important determinant of later achievement. Gray et al (1990) reviewed a large number of value-added studies and found that correlations between examination outcomes and student's social backgrounds were typically around 0.35 whereas the correlations between examination results and finely differentiated prior attainment was typically about 0.7.

The SCAA review identified potential input variables and listed them in order of usefulness in predicting later achievement.

The most useful are finely differentiated measures of prior achievement.
Grouped measures of prior achievement.
Social background variables for each individual student.
Average social or academic background of the school population.
Social and other characteristics of the neighbourhood in which the school is situated or the catchment area from which its students are drawn.

Thomas and Goldstein (1995) argue that while value-added measures are clearly preferable to raw results as indicators of the progress pupils make in schools, league tables or rankings of value added scores are no better than rankings of raw results. They claim that it would be irresponsible to publish league tables without taking account of the accuracy of the data. Goldstein et al, (1993) and Goldstein and Healy (1995) demonstrate that about two-thirds of all the value added comparisons between pairs of schools are too imprecise to provide a fine separation of institutions in terms of GCSE results. This is illustrated in Figure 2. Schools can be judged as statistically different (at the 5% level) only when the bars for a pair of schools do not overlap. Goldstein argues that it only makes sense to identify schools at the extreme. This in fact has been standard practice in many systems around the world — schools are typically banded together in broad groupings, although the basis for the grouping differs from place to place. The SCAA report in the UK for example, recommends reporting school scores in terms of placement within quartiles, while in California (Fetler, 1989) a comparison group is formed for each school on the basis of the 10 percent of schools immediately above and ten percent of schools immediately below the school ranked according to a composite index of student achievement and family background factors.

Figure 2: Confidence limits around value-added comparisons

In practice, there are several reasons why single measures of value added in themselves are no more satisfactory than raw measures in representing school effectiveness. A good deal of recent research suggests that schools are differentially effective (Nuttall, et al, 1989; Sammons et al, 1994) that is, they enhance the performance of certain kinds of students, say those from high SES backgrounds, but not others. There is also evidence that schools may be differentially effective in some subject areas but not others (Nuttall et al, 1989; Sammons et al, 1993). Further, school effects may not be stable over even relatively short periods of time, that is, they may be effective one year, but not the next (see Thomas, Sammons and Mortimore, 1995; Teddlie and Stringfield, 1993). Tabberer (1994) concludes that the evidence of differential effectiveness brings the consideration of single feature measures, such as those that underpin league tables, even further into question.

The issue of differential school effectiveness is considered in greater detail below.

Stability of school effects over time

It would seem obvious that one of the fundamental requirements for a school to be judged "effective" would be that the outcomes achieved by its students did not fluctuate greatly from year to year, but relatively little research has been conducted to examine the stability of individual school effects over time. Virtually all of the published research has been cross sectional in nature. A few studies have examined student growth in academic achievement (Hoffer, Greeley and Coleman, 1985; Willms, 1984), but few have examined year to year variations in school performance.

The studies which have reported in this area have produced inconsistent findings. Early research suggested considerable stability over a period of years (Rutter, et al, 1979), but in more recent work it appears that schools can vary quite markedly in their performance even over a period as short as 2 to 3 years (Goldstein, 1987; Nuttall, et al, 1989). Nuttall et al (1989) conclude that these results give rise to a note of caution about any study of school effectiveness that relies on measures of outcome in just a single year, or a single cohort of students.

Bosker and Scheerens (1992) reviewed a number of studies, and found that estimates of the stability of school effects over time (as measured by between year correlations) were between .35 and .65 for primary schools and .70 and .95 for secondary schools. Some studies in the US (Mandeville and Anderson, 1986) have found even lower correlations (around .10) for elementary school mathematics and language.

Gray et al (1995) identified eleven studies which had collected data on more than one cohort of students. Four of these looked at British secondary schools. Of these, three reported inter-year correlations that were greater than 0.9 (Willms and Raudenbush, 1989; Nuttall et al, 1989; Sime and Gray, 1990), and one reported 'middling ' correlation (between 0.5 and 0.9). One study of Dutch secondary schools (Bosker and Guldemond, 1991) also estimated high correlations while another (Roeleveld, de Jong and Koopman, 1990) reported middling ones. Correlations in primary schools studies appear to be lower. Mandeville (1988) and Rowan and Denk (1982) in the USA report low to middling correlations as did Blok and Hoeksma (1993) in the Netherlands.

The widely cited study by Teddlie and Stringfield in Louisiana (1993) followed up eight pairs of outlier schools over a number of years. It is not possible to describe their data in the same way as the other studies. They claim, however, the over the period of their study, four of the schools were stable and effective, four were stable and ineffective, five were improving and three were declining.

It may be that the inadequacy of the statistical models used in many of these studies may have led to an under-estimation of the stability of effects, since they did not separate out sample variance from true parameter variance. Willms and Raudenbush (1988) found that measurement noise can account for up to 80% of the observed year variance.

Instability of estimates of effectiveness over time may have several causes. First, the instruments used to measure student outcomes may not test the same thing in successive years or may not adequately measure what was intended. Second the statistical methods used may produce spurious results. Thirdly, the world may be inherently unstable, and the school effectiveness scores simply reflect this.

Willms and Raudenbush (1988) hypothesise that the explanation for the finding of instability may lie in the manner in which school organisational structure and classroom practices contribute to student performance. These factors can change over time as teachers and principals transfer from one school to another. The impact of staff turnover may be felt more keenly in the generally smaller primary schools than secondary schools, explaining some of the discrepancy in the estimates for the two sectors. Year to year performance may also vary depending on the success of local improvement initiatives or the manner and extent to which wider reforms are implemented. The extent to which school performance is influenced by wider social, economic and political factors is largely unknown. These factors may also vary across communities and change over time.

On the other hand, schools which have developed a positive school climate and in which a tradition of high achievement has been enculturated may experience greater stability than others. The question of whether school effects are stable has implications for accountability schemes based on the presumed ability to identify school effectiveness. Without an understanding of how much the variation between schools is attributable to school practice versus social factors and how much is simply random fluctuation, there is potential for misuse of indicators based on single year measures. Goldstein et al (1988) point to the need for caution in interpreting school effects at any one point in time:

It is clear that the uncertainty attached to individual school estimates, at least based on a single years data, is such that fine distinctions and detailed rank orderings are statistically invalid.

Gray et al (1995) propose that a new way of looking at the data is needed. They note that when school effectiveness researchers have been in a position to collect a second years' data they have sought to replicate their previous findings. A lack of stability between years was interpreted as a threat to the validity of their findings. This orientation may have inadvertently inhibited development in this area. School improvement is essentially about creating change in levels of school performance. If we continue to view instability negatively we may be missing an important point.

For satisfactory studies of changes in school performance over time, Gray et al (1995) argue that a number of factors must be brought together, including:

measures of outcome and prior attainment on individual students;
data on a minimum of three cohorts and preferably more;
multi-level statistical analysis; and
an orientation towards examining the data for systematic changes in school performance over time.

To test this framework, Grey et al analysed data from three successive cohorts in 30 English secondary schools. They found that only a small proportion of the schools in the study (between a fifth and a quarter) were improving or deteriorating in terms of their effectiveness. A particularly striking finding was that whilst several schools improved in effectiveness, only one initially ineffective school did so consistently. As with any pioneering work, the study is not without its critics. Tymms (1995) queries several methodological aspects, which require further exploration. A later section of this paper suggests some areas for research in the NSW context that can help to advance knowledge in this area.

Are schools equally effective across the curriculum?

As with the work on the stability of school effects, the research on differential school effects is both limited and contradictory. The early work of Rutter, et al (1979), and Reynolds (1976) reported high intercorrelations between school's academic effects and their social effects. Mandeville and Anderson (1986) investigated the inter-correlation between primary school effect indices for different subject areas (maths and reading) and found indices near .70. Brandsma and Knuver (1989) arrived at much the same figure (.72) in Dutch elementary schools. Cuttance (1987) reports correlations for secondary schools in Scotland of .47 and .74 for English and arithmetic respectively.

More recent work suggests schools may be differentially effective in different areas. Mortimore, et al (1988) found substantial variations between schools' effectiveness on one academic outcome, such as oracy (heavily school influenced), and another, such as reading (less heavily influenced). Smith and Tomlinson (1989) also report substantial variation in the departmental success rates of different schools at public examinations, with these differences being more than simply a function of the school's overall effectiveness. FitzGibbon et al (1990) report similar findings between English and Mathematics departments.

Nuttall, et al (1993) using data from the ILEA Junior School Project found that some schools were more effective in raising pupil's performance in one cognitive area rather than another. Only a few schools in their sample had a marked positive or negative effect on both reading and mathematics. The project's findings, they conclude, is evidence that no simplistic division of schools into good or bad is possible, even on the basis of results in basic subjects such as reading and mathematics.

Hill, (1995) reports the only Australian data in this area. He found from the Victorian Quality Schools Project that the correlation between value-added measure for 51 primary schools in English and mathematics to be 0.64, indicating that primary schools are by no means equally effective across these two core areas of the curriculum. In addition, because in every case the students in these 51 schools were taught by the same teacher for both English and mathematics, it is evident that teachers are also differentially effective across the curriculum. Figure 3 illustrates the relationship between performance in English and mathematics in Hill's study. The possibility of replication of this kind of work with the NSW Basic Skills Tests in Literacy and Numeracy is readily apparent.

Figure 3: Relationship between value added measures of English and mathematics achievement

Analyses by the Victorian Board of studies using 1994 VCE data reveal the same phenomenon of differential effectiveness across the curriculum. Of the total of 190 schools, only 22 are in the top 10% for four or more subjects. Intercorrelations between subjects ranged from 0.32 to 0.71.

This method can also be applied at the course or KLA level using NSW Higher School Certificate results (see Wyatt, 1995). The figure below illustrates how the relative performance of various curriculum areas within a school might be compared, when both the level of difficulty of the examination and the quality of the candidature for each course has been taken into account.

Figure 4: Relative effectiveness across curriculum areas

Only a small number of studies have examined whether schools are equally effective at promoting both social and academic outcomes. Reynolds (1976) shows small academic, but large behavioural and attitudinal differences in the same school. The ILEA study (Mortimore et al 1988) showed that schools can be differentially effective with respect to their student's academic and social outcomes. Given the emphasis many schools place on the social development of their students, it would seem to be an appropriate area for further investigation.

Do schools have the same effect on all of their students?

Once again, the findings in this area are unclear. There is some evidence in the international literature that schools are not equally effective for different groups of students within the same school, such as students from different ethnic groups, ability ranges and socio-economic status. Nuttall, et al (1993) using data from ILEA primary schools found that effective schools appear to raise their reading attainment scores for all pupils irrespective of initial attainment level. Conversely, less effective schools seem to depress later attainment scores for all. Gray, et al (1990) also found little evidence of differential effectiveness in their study of schools in a wide range of LEAs. This effect is shown in the figure below.

Figure 5: Plot of school slopes showing predicted Year 5 reading scores

However, Aitken and Longford (1986) found that schools can differ in their regression slopes, suggesting that some may be more effective for pupils of a certain ability level that others, a finding supported by many others (Nuttall, et al (1989); McPherson and Willms (1987); Willms and Cuttance (1985) Smith and Tomlinson (1989). In the Victorian Quality Schools Project, Hill found that in the case of primary school English attainment, girls make greater progress than boys, students from high socio-economic backgrounds make greater progress than students from low socio-economic status, and classes with a high proportion of non-English speaking (NESB) students make less progress than classes with low proportions of NESB students. For mathematics, the significant factors are also gender, student SES and NESB, but in this case girls make rather less progress than boys.

To sum up these confusing findings, the following table drawn from Bosker and Scheerens (1992, p.749) is helpful.

Table 1: Range of Stability Estimates for School Effects

	Primary	Secondary
Across years Across grades Across classes Across subjects Across criteria	.35 - .65 .10 - .65 .45 - 1.00 .70 - .75 .00 - .05	.70 - .95 .25 - .90 — .45 - .75 .35 - .70

What might explain these conflicting findings?

Levine and Lezotte reach the following conclusions about why there is so much variability in the school effects research.

1. Different achievement criteria and different data analysis methods frequently lead to conflicting conclusions about whether a school is effective or ineffective. Judgements depend highly on the grade and subject tested, and whether the test is norm referenced or criterion referenced.

The tendency to draw conclusions about school effectiveness from data derived from low level learning tasks that are the most easily taught and measured by teachers and are the most easily memorised and regurgitated by students is one of the most distressing of these methodological uncertainties, according to Levine. This tendency is particularly evident in the many studies which depend mainly on outcome criteria such as primary grade test scores, which are generally limited in the extent to which they assess independent learning, conceptual application, thinking and higher order tasks and skills. Also, when scores are aggregated across grade levels , poor performance on higher order tasks in later grades can be masked by acceptable performance in lower grades. Similarly, when scores are aggregated across tests and sub-tests, mechanical skills such as punctuation, spelling and mathematical computation can disguise unacceptable performance on fundamental reading and maths skills such as comprehension and problem solving (Levine and Stephenson, 1987).

2. Identification of a school as effective is highly dependent on methodological variations, such as whether or not, and in what way socio-economic status measures are used to control for student's background. The regression models used are highly sensitive to the way data is aggregated and disaggregated. Judgements about effectiveness are sometimes made on the basis of only slight achievement differentials. When large sample numbers are involved, trivial differences in achievement can be statistically significant.

The criteria and analytic methods for classifying a school or schools as effective or ineffective have been topics of considerable debate. Many studies contrast the differences obtained when value-added measures are used in place of raw results, but only a few studies have directly contrasted the implications of using alternative methodologies for calculating value added. Marko (1974) was one of the first to contrast different methods of aggregating data, and found little difference in his sample. However, this research was limited because analysis using multi-level models had not been developed at that time. Gray, Jesson and Jones (1986) searched for fairer ways to compare schools' examination results. The research of Goldstein, (1987) is considered to be the authoritative work in this area, but there is still scope for considerable work on how the complex results of multi-level models can be presented in an easily understandable form by lay audiences. The following figure illustrates how two different methods of calculating and reporting value-added can give very different impressions of the state of affairs in a school. Figure 6 (a) shows the difference between expected and actual scores based on performance at the mean, whereas Figure 6 (b) shows the same value added scores for students in the bottom quarter, middle half and top quarter. Figure 6 (a) would suggest that the school is doing slightly better than would be expected, but when disaggregated as in Figure 6 (b) it is apparent that the school is doing much worse than expected for its lower ability students, but much better for its more able students.

Figure 6: Comparison of different methods for reporting value-added

An agenda for school effectiveness research

Several suggestions for further research have been mentioned above (see also Reynolds and Packer 1992). The databases relating to student achievement held by various school authorities within NSW offer a significant opportunity to contribute to the international knowledge base on effective schooling. There are a number of technical problems that need to be resolved before this potential can be realised, not least being the task of integrating disparate data sets to construct longitudinal records. There are also a number of legal, industrial and ethical issues that also must be addressed before such work can be undertaken. The agenda for further research sketched below assumes that such issues will be resolved in time.

Several recent initiatives, if they are successfully implemented, are of particular interest in terms of providing something new. Recording of information of student outcomes against curriculum profiles may provide much more information about student progress in a wider variety of subject areas and in greater detail than has typically been studied in the past. They also provide the possibility of measurement of the extent to which students can demonstrate mastery of a variety of skills rather than subject matter knowledge, and thus better reflect the multiple goals of schooling expressed in school and systemic mission statements.

There is also much that more traditional analysis of Basic Skills Test and public examinations such as the School Certificate and Higher School Certificate can reveal about some of the enduring questions about a school effectiveness. The size of the samples, the breadth of curriculum tested, and the relatively long periods of time over which data have been collected are all superior to the bulk of studies reported in the literature, and provide the potential if not for definitive answers, at least answers with a high degree of confidence within the NSW context. Issues concerning the size of school effects, their consistency over time, their consistency across different kind of school output, their consistency for different types of pupil and the applicability of findings across international settings fall into this category.

What needs to be done? One of the shortcomings of the British and North American literature arises from its historical focus on identifying schools that have been effective in teaching disadvantaged youth. Most of these are inner city schools, which have no counterpart in many countries. We need research undertaken in more typical samples of schools.

Larger sample sizes are also needed. More studies of secondary school effectiveness are also needed, particularly since the literature on school effectiveness from the Netherlands and the United States concentrate heavily upon research in elementary schools.

Some British studies have been highly defective in their measurements of pupil intakes into schools, which may have led to invalid assumptions being made about schools or education systems being more effective simply because full allowance had not been made for the intake quality of their pupils. What is needed in the future is multiple indicators of intake, covering a range of pupil academic and social factors, as in the study by Mortimore et al. (1988).

The methodology of measuring 'value added' also needs to be further explored. The early studies using 'means-on-means' analyses, where school averages for all pupils are used, as in Reynolds' (1979, 1982) work, make it impossible to analyse the school experience of different groups of pupil and also have lower explanatory variance. Individual pupil level data rather than group data are now widely agreed to be necessary, both on intake and at outcome (Aitken and Longford, 1986), to permit the appropriate use of multi-level techniques of analysis, which can nest pupils within classrooms and classrooms within schools and the schools within the context of outside-school factors.

There is a need also to broaden the investigation of effectiveness to include social outcomes from schools, which may be independent of academic outcomes. Only a small number of studies thus far have included even very limited measures of behaviour and attendance. Not only do we not know what the parameters of effectiveness are in this domain are, we do not know how it interacts with other domains. Some outcomes may partially determine, as well as being partially determined by, the academic outcomes of schooling. It seems strange that almost every education system around the world places great emphasis on the importance of non-cognitive outcomes for students, but few appear to have ever attempted to determine how effective they are in this area.

Reynolds (1992) outlines some of this further work required. He notes that we are still not completely sure which processes are associated with effectiveness, and also how the school organisational factors have their effects — through their effects upon pupil self-concepts or by direct modelling, for example? We need to know what creates the organisational factors, which may require a degree of historical study since there are those who insist that what makes an effective school is in part the history of being an effective school. There is a need also, he argues to lift the level of abstraction from mere empiricism to a more conceptual level. This list of research topics is not an exhaustive one. There is a great deal of qualitative investigation needed over and above the statistical analysis referred to above. The scope of the research agenda outlined above suggests that far from being a dead issue, there is considerable work yet to be done in school effectiveness. The challenge is to ensure that this work leads to a better understanding of school performance and how it can be improved.

Notes

There have been several dozen, if not hundreds of individual school effectiveness studies conducted in the past decade. Many of these have had quite small samples or are limited to a particular education authority. Some of the more widely cited studies include the following: Reynolds (1976, 1982), Gray (1981), Edmonds (1979), Brookover, et al, (1979), Cuttance (1987), Smith and Tomlinson, (1989), Willms and Raudenbush, (1989), Nuttall, et al (1989), Gray et al (1990), Daly, (1991), FitzGibbon, 1991, Jesson and Gray, 1991, Stringfield et al, (1992), Goldstein et al, (1993), Sammons et al (1994a, 1994b, 1994c), Thomas and Mortimore (1994), Thomas et al (1994), Bondi (1994), Hill et al (1994).

Similar sentiments are expressed in the Commonwealth Government's major education policy document Strengthening Australia's Schools (Dawkins, 1988).

See Salganik (1994) for a discussion of the methods used to identify effective schools in ten US state accountability schemes.

General ability measures and standardised test instruments have been found to be poorer predictors of later achievement than curriculum relevant tests (see Madaus, Kelleghan, Rakow and King, 1979 ).

References

Angus, L.B. (1986). The Risk of School Effectiveness: A Comment on Recent Education Reports. The Australian Administrator. Deakin University, Australia, Vol 7, No. 3, June.

Aitken, M.A. and Longford, N.T. (1986). Statistical Modelling Issues in School Effectiveness Studies. Journal of the Royal Statistical Society A, 149, 1-26.

Blok, H. and Hoeksma, J.B. (1993). The stability of school effects over time: an analysis based on the final test of primary education. Tijdschrift voor Onderwijsresearch, 18 (6) pp 331-334.

Bosker, R. and Witziers, R. (1995). School Effects: Problems solutions and a meta analysis. Paper presented at the Eighth Annual International Congress for School Effectiveness and Improvement, CHN, Leeuwarden, The Netherlands, January.

Bosker, R. and Scheerens, J. (1992). Issues in the Interpretation of the Results of School Effectiveness Research. Chapter 4 in Creemers, B and Scheerens J (Eds).

Bosker, R.J. and Guldemond, H. (1991). Interdependency of Perrformance Indicators: an empirical study in a categorical school system. in S. Raudenbusch and J.D. Willms (Eds.), Schools, Classrooms and Pupils. New York: Academic Press.

Bourke, S.F., Mills, J.M., Stanyon, J. and Holzer, F. (1981). Performance in literacy and numeracy, 1981. Australian Government Publishing Service, Canberra.

Brandsma, H.P. and Knuver, J.W.M. (1988). Organisatorische verschillen tussen basisscholen en hun effect op leeringprestaties. Tijdschrift voor Onderwijsresearch, 13 (14) pp. 201-212.

Brookover, W., Beady, C., Flood, P., Schweitzer, J. and Wisenbaker, J. (1979). School social systems and student achievement. schools can make a difference, New York- Praeger.

Brophy, J. and Good, T. (1986). Teacher Behaviour and Student Achievement, Ch 12 in M C Wittrock (Ed) Handbook of Research on Teaching, New York: Macmillan.

Bryk, A. and Raudenbush, S. (1992). Hierachical Linear Models. Newbury Park, California: Sage.

Chapman, J. (1988). School Improvement and School Effectiveness in Australia. Paper presented at International Congress of Effective Schools.

Byrne, R. (1997). Why We Should Ban the BST. Education (Journal of the NSW Teachers Federation). Feb 17, p.15.

Carr, R. (1995). Labor's Plans for School Education. Sydney: NSW Labour Party policy document.

Coleman, J.S., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfield, F. and York, R. (1966). Equality of Educational Opportunity, Washington: US Government Printing Office.

Coleman, J., Hoffer, T. and Kilgore, S. (1981). Public and private schools, Chicago: National Opinion Research Center.

Coleman, J., Hoffer, T. and Kilgore, S. (1982). Cognitive outcomes in public and private schools, Sociology of Education, 55, (213): 65-76.

Creemers, B.P.M. (1994). The History, Value and Purpose of School Effectiveness Studies in D Reynolds et al (Eds) Advances in School Effectiveness Research and Practice, Oxford: Pergamon.

Cuttance, P. (1987). Modelling variation in the effectiveness of schooling, Edinburgh: Centre for Educational Sociology.

Cuttance, P.F. (1994). Integrating Best Practice and Performance Indicators to Benchmark the Performance of a School System. Conference Paper, Sydney.

Cuttance, P.F. (1994). Monitoring Educational Quality through Performance Indicators for School Practice. School Effectiveness and School Improvement, Vol 5, No.2, pp.101-126.

Daly, P. (1991). How Large are Secondary School Effects in Northern Ireland? School Effectiveness and School Improvement, 2, (4): 305-323.

Edmonds, R. (1979). Effective Schools for the Urban Poor, Educational Leadership, 37, (1): 15-27.

Fetler, M. (1989). A method for the construction of differentiated school norms. Paper presented at the annual meeting of the American Education Research Association, San Francisco, CA (ERIC No Ed312302).

FitzGibbon, C.T. (1991). Multilevel modelling in an indicator system, Chapter 6 in S.W. Raudenbush and J.D. Willms (Eds) Schools, Classrooms and Pupils International Studies of Schooling from a Multilevel Perspective, San Diego: Academic Press.

FitzGibbon, C.T. (1992). School Effects at A level: Genesis of an Information System in D. Reynolds and P. Cuttance (Eds) School Effectiveness Research Policy and Practice, London: Cassell.

FitzGibbon, C.T., Tymms, P.B. and Hazlewood, R.D. (1990). Performance Indicators and Information Systems. in D. Reynolds, B. Creemers and T. Peters (Eds.) School Effectiveness and Improvement. Groningen: RION.

Frederickson, J.R. (n.d.). Models for Determining School Effectiveness. Harvard University mimeo.

Fullan, M (1991). The new meaning of educational change, London: Cassell.

Goldstein, H. (1987). Multilevel Models in Educational and Social Research, London: Charles Griffin & Co.

Goldstein, H. and Healy, M.J.R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, A, 158, pp175-177.

Goldstein, H., Rasbash, J., Yang, M., Woodhouse, G., Pan, K., Nuttall, D. and Thomas, S. (1993). A Multilevel Analysis of School Examination Results, Oxford Review of Education, 19, (4): 425-433.

Graham, J. (1988). Schools, Disruptive Behaviour and delinquency: A review of the research. London: HSMO.

Gray, J. (1981). A Competitive Edge: examination results and the probable limits of secondary school effectiveness, Educational Review, 33, (1): 25-35.

Gray, J. (1990). The quality of schooling: frameworks for judgements, British Journal of Educational Studies, 38, (3): 20.4-233.

Gray, J., Jesson, D. and Jones, B. (1986). The search for a fairer way of comparing schools' examination results, Research Papers in Education, 1, (2): 91-122.

Gray, J., Jesson, J. and Sime, N. (1990). Estimating Differences in the Examination Performances of Secondary Schools in Six LEAs: a multi-level approach to school effectiveness, Oxford Review of Education, 16, (2): 137-158.

Gray, J., Jesson, D., Goldstein, H., Hedger, K. and Rasbash, J. (1993). A Multi-Level Analysis of School Improvement: Changes in Schools' Performance Over Time, paper presented at the 5th European Conference of the European Association for Research on Learning and Instruction, 3 September, Aix-en-Provence, France.

Hargreaves, D. (1995). School Culture, School Effectiveness and School Improvement. School Effectiveness and School Improvement, Vol 6., No. 1, March. p.23

Hill. P.W. (1995a). School Effectiveness and Improvement. Inaugural Professorial Lecture, 24 May, 1995 Faculty of Education, University of Melbourne.

Hill, P.W. (1995b). Value Added Measures of Achievement. IARTV Seminar Series No. 44, Melbourne.

Hill, P., and Rowe, K. (1994). Multilevel Modelling of School Effectiveness Research. Paper presented at the seventh International Congress for School Effectiveness and Improvement, Melbourne.

Hill, P., Rowe, K. and Holmes-Smith, P. (1995). Factors Affecting Students' Educational Progress: Multilevel modelling of Educational Effectiveness. Paper presented at the 8th ICSIE, Leewarden. Netherlands.

Hoffer, T., Greeley, A., and Coleman, J. (1985). Achievement growth in public and Catholic schools. Chicago, IL: University of Chicago, National Opinion Research Center.

Hopkins. D., Ainscow, M., and West, M. (1994). School Improvement in an Era of Change. London: Cassell.

Jencks, C., Smith, M., Ackland, H., Bane, M., Cohen, D. Gintis, H., Heyns, B. and Michelson, S. (1972). Inequality: A reassessment of the effects of family and schooling in America. New York: Basic Books.

Keeves, J.P. and Bourke, S.F. (1976). Australian Studies in School Performance Volume 1: Literacy and numeracy in Australian schools: A first report. Australian Government Publishing Service, Canberra.

Levine, D.U. (1992). An Interpretive Review of US Research and Practice Dealing with Unusually Effective Schools., in Reynolds, D. and Cuttance, P. (1992) (Eds), School Effectiveness Research, Policy and Practice, London: Cassell.

Levine, D.U. and Lezotte, L. (1990). Unusually Effective Schools: A review and Analysis of research and practice. Madison, WI: National Center for Effective Schools Research and Development.

Levine, D.U. and Stephenson, R.S. (1987). Are effective or meritorious schools meretricious? The Urban Review, 11 (2) pp.63-80.

Lezotte, L. (1989). School improvement based on the effective schools research, International Journal of Educational Research, 13, (7): 815-825.

Louis, K.S. and Miles, M.B. (1991). Toward Effective Urban High Schools: The Importance of Planning and Coping, Chapter 7 in J R Bliss, W A Firestone and C E Richards (Eds) Rethinking Effective Schools. Research and Practice, Englewood Cliffs, New Jersey: Prentice Hall.

Louis, K. and Miles, M. (1992). Improving the urban high school what works and why, London: Cassell.

Luyten, H. (1994). Stability of School Effects in Secondary Education: The impact of variance across subjects and years, paper presented at the annual meeting of the American Educational Research Association, 4-8 April, New Orleans.

Madaus, G.G., Kellaghan, T., Rakow., E.A. and King, D. (1979). The sensitivity of measures of school effectiveness, Harvard Educational Review, 49, 207-230.

Madaus, G.F. (1988). Critical Issues in Curriculum: The Influence of Testing on the Curriculum. National Society for the Study of Education. 87(10, pp. 83-121.

Mandeville, G.K. (1988). School effectiveness indicators revisited: cross year stability. Journal of Educational Measurement, 25, pp. 349-366.

Mandeville, G.K. and Anderson, L. W. (1986). A study of the stability of school effectiveness measures across grades and subjects. AERA paper, San Francisco.

Mann, D. (1992). School Reform in the United States: A National Policy Review 1965-91. School Effectiveness and School Improvement, Vol 3, No. 3, pp. 216-230.

Marko, G.L. (1974). A Comparison of Selected School Effectiveness Measures Based on Longitudinal Data. Journal of Educational Measurement Vol 11, No.4., Winter.

McGaw, B., Piper, K., Banks, D. and Evans, B. (1992). Making Schools More Effective. Hawthorn, Vic: ACER.

McPherson, A. (1992). Measuring added value in schools, National Commission on Education Briefing No 1, February 1992, London.

McPherson, A. and Willms, J.D. (1987). Equalisation and Improvement: some effects of comprehensive reorganisation in Scotland. Paper presented to the Annual Meeting of the American Research Association, May.

Mortimore, P. (1993). School Effectiveness and the Management of Effective Learning and Teaching, School Effectiveness and School Improvement, 4, (4): 290-310.

Mortimore, P., Sammons, P., Stoll, L., Lewis, D. and Ecob, R. (1988a). School Matters: The Junior Years, Wells: Open Books.

Mortimore, P., Sammons, P., Stoll, L., Lewis., D. and Ecob, R (1988b). The effects of school membership on pupils' educational outcomes, Research Papers in Education, 3, (1): 3-26.

Mortimore, P., Sammons, P. and Thomas, S. (1995). School Effectiveness and Value Added Measures, Paper presented at the Desmond Nuttall Memorial Conference 10.6.94, Assessment in Education: Principle Policy and Practice, 1, (3): 315-332.

Murphy, J. (1992). Effective Schools: Legacy and Future Directions. in Reynolds, D and Cuttance, P. (1992) (Eds) School Effectiveness Research, Policy and Practice, London: Cassell.

NSW Department of School Education (DSE) (1996). School Accountability and Improvement Model: A Rationale. Sydney: DSE.

North Carolina State Board of Education (1997). North Carolina Education Standards and Accountability Commission Report,

Nuttall, D.L. (1989). Differential School Effectiveness. Paper presented at the American Education Research Association, San Francisco, March.

Nuttall, D., Goldstein, H., Prosser, R. and Rasbash, J. (1989). Differential School Effectiveness, Chapter 6 in International Journal of Educational Research, special issue Developments in School Effectiveness Research, 13, 769-776.

OECD (1994). Making Education Count: Developing and Using International Indicators, Paris: OECD.

Purkey, S.C. and Smith, M.S. (1983). Effective Schools: A Review, Elementary School Journal, 83, (4): 427-452. Quality of Education Review Committee (QERC) (1985).

Quality of Education in Australia. Report of the Review Committee, Canberra: Australian Government Printing Service.

Ralph, J. H. and Fennessy, J. (1983). Science or Reform: Some Questions about the Effective Schools Model. Phi Delta Kappan, 64, (10): 589-694.

Raudenbush, S.W. and Bryk, A.S. (1989). Quantitative methods for Estimating Teacher and School Effectiveness, in Bock, R.D. (Ed) Multilevel Analysis of Educational Data., Academic Press, New York.

Reid, K., Hopkins, D. and Holly, P. (1987). Towards the Effective School, Oxford. Blackwell.

Reynolds, D. (1976). The delinquent school in P Woods (Ed) The Process of Schooling, London: Routledge & Kegan Paul.

Reynolds, D. (1982). The Search for Effective Schools, School Organisation, 2, (3): 215-237.

Reynolds, D. (1992). School Effectiveness and School Improvement An Updated Review of the British Literature in D Reynolds and P Cuttance (Eds) School Effectiveness Research, Policy and Practice, London: Cassell.

Reynolds, D. and Creemers, B. (1990). School Effectiveness and School Improvement: A Mission Statement, School Effectiveness and School Improvement, 1, (1): 1-3.

Reynolds, D. and Cuttance, P. (1992). (Eds) School Effectiveness Research, Policy and Practice, London: Cassell.

Reynolds, D., Creemers, B., Nesselrodt, P.S., Schaffer, E.C., Stringfield, S. and Teddlie, C. (1994). Advances in School Effectiveness Research and Practice, Oxford: Pergamon.

Reynolds, D et al (1994). School Effectiveness Research: A Review of the International Literature in D Reynolds, B P M Creemers, P S Nesselrodt, E.C. Schaffer, S. Stringfield and C. Teddlie (Eds) Advances in School Effectiveness Research and Practice, Oxford: Pergamon.

Reynolds, D. and Packer, A. (1992). School Effectiveness and School Improvement in the 1990s. in Reynolds, D and Cuttance, P (1992). (Eds) School Effectiveness Research, Policy and Practice, London: Cassell.

Roeleveld, J., de Jong, U., and Koopman, P. (1990). Stabiliteteit van schooleffecten. Tijdschrift voor Onderwijsresearch, 15 (5), pp.301-316.

Rowan, B. and Denk, C.E. (1982). Modelling the academic performance of schools using longitudinal data: an analysis of school effectiveness measures and school and principal effects on school-level achievement. Instructional Management Program Publications. San Francisco: Far West Laboratories.

Rutter, M. (1983). School effects on pupil progress - findings and policy implications, Child Development, 54, (1): 1-29.

Rutter, M., Maughan, B., Mortimore, P. and Ouston, J. (1979). Fifteen Thousand Hours. Secondary Schools and their Effects on children, London: Open Books.

Sammons, P., Hillman, J. and Mortimore, P. (1995). Key Characteristics of Effective Schools: A review of the school effectiveness research. Institute of Education, University of London, London.

Sammons, P. (1987). Findings from School Effectiveness Research., A Framework for school improvement, keynote paper presented to the Annual Convention of the Prince Edward Island Teachers' Federation on 'School Atmosphere: The Barometer of Success', Charlottetown, Prince Edward Island, Canada, 29-30.10.87.

Sammons, P., Mortimore, P. and Thomas, S. (1993a). Do schools perform consistently across outcomes and areas? Paper presented to the ESRC Seminar Series "School Effectiveness and School Improvement', July 1993, University of Sheffield.

Sammons, P., Nuttall, D. and Cuttance, P. (1993b). Differential School Effectiveness: Results from a reanalysis of the Inner London Education Authority's Junior School Project data, British Educational Research Journal, 19, (4): 381405.

Sammons, P., Cuttance, P., Nuttall, D. and Thomas, S. (1994a). Continuity of School Effects: A longitudinal analysis of primary and secondary school effects on GCSE performance. Paper originally presented at the sixth International Congress for School Effectiveness and improvement, Norrkoping, Sweden and revised version submitted to 'School Effectiveness and School Improvement".

Sammons, P., Thomas, S., Mortimore, P., Owen, C. and Pennell, H. (1994b). Assessing School Effectiveness.- Developing Measures to put School Performance in Context, London: Office for Standards in Education (OFSTED).

Schagen, I. (1994). Multilevel Analysis of the Key Stage 1 National Curriculum Assessment Data in 1991 and 1992). Oxford Review of Education, Vol 20, No. 2, p.163.

Scheerens, J. (1992). Effective schooling: Research, theory and practice, Cassell, London.

Scheerens, J., Vermeulen, C.J. and Pelgrum, W.J. (1989). Generalisability of Instructional and School Effectiveness Indicators across Nations. International Journal of Educational Research, Vol. 13 No. 7, pp. 789-99.

Sime, N. and Gray, J. (1991). Struggling for Improvement: Some estimates of the contribution of school effects over time. Paper presented to the Annual Conference of the British Education Research Association.

Smith, M.S. (1988). Education Indicators, Phi Delta Kappan, Vol 69 (7), pp. 487-491.

Smith, D.J. and Tomlinson, S. (1989). The School Effect: A Study of Multi-Racial Comprehensives, London: Policy Studies Institute.

Stoll, L. and Fink, D. (1994). Views from the field: linking school effectiveness and school improvement School Effectiveness and School Improvement, 5, (2), 149-177.

Tabberer, R. (1994). School and Teacher Effectiveness. Slough: NFER.

Teddlie, C. and Stringfield, S. (1993). Schools Make a Difference: Lessons learned from a 10 year study of school effects, New York: Teachers College Press.

Thomas, S. and Goldstein, H. (1995) Value-added, What Next? University of London. internal working paper.

Thomas, S. and Mortimore, P. (1994). Report on Value Added Analysis of the 1993 GCSE examination Results in Lancashire, Institute of Education, University of London, London.

Thomas, S. and Mortimore, P. (1994). Report on value added analysis of 1993 GCSE examination results in Lancashire (in press) Research Papers in Education.

Thomas, S., Sammons, P. and Mortimore, P. (1994). Stability and Consistency in Secondary Schools' Effects on Students' GCSE Outcomes, paper presented at the annual conference of the British Educational Research Association, 9 September, St Anne's College, University of Oxford.

Tymms, P. (1995). A comment on Gray, Jesson, Goldstein, Hedger and Rasbash. School Effectiveness and School Improvement, Vol. 6, No. 2, pp.115-117.

WaIberg, H. J. (1986). Syntheses of Research on Teaching, Chapter 7 in M C Wittrock (Ed) Handbook of Research on Teaching, New York: Macmillan.

Webster, W. J., Mendro, R.L. and Almaguer, T.O. (1994). Effectiveness Indicators: A Value-Added Approach to Measuring School Effectiveness. Studies in Educational Evaluation, Vol 20, PP. 113-145.

Williams, T. and Carpenter, P. (1987). Private Schooling and Public Achievement Paper presented at the AARE conference, 3-6 January.

Willms, J. D. (1988). Estimating the stability of school effects with a longitudinal, hierarchical linear model. AERA paper, New Orleans.

Willms, J. D. (1992). Monitoring School Performance: A Guide for Educators, London: Falmer.

Willms, J. D. and Cuttance, P. (1985). School Effects in Scottish secondary schools. British Journal of Sociology of Education. 6 (3) pp.289-306.

Willms, J. D. and Raudenbush, S. W. (1989). A longitudinal hierarchical linear model for estimating school effects and their stability, Journal of Educational Measurement, 26, (3): 209-232.

Witziers, B. (1994). Coordination in secondary schools and its implications for student achievement, paper presented at the annual conference of the American Educational Research Association, 4-8 April, New Orleans.

Wyatt. T. J. and Ruby, A. (1988). Using Performance Indicators to judge the effectiveness of schools. Reporting Educational Progress Monograph series. Australian Conference of Directors-General of Education, Sydney.

Wyatt, T. J. (1994). Developing performance indicators to improve quality performance in education. Paper prepared for IIR Conference, Sydney, December, 1994.

Wyatt, T.J. (1995). Models for the Analysis of Student Outcomes Data: A Discussion Paper. Quality Assurance Directorate, NSW DSE, internal working document.

Author: Mr Tim Wyatt is an EdD student at the University of Western Sydney - Nepean Campus. He was formerly a Director in the NSW Premier's Department and is now employed in the NSW Department of School Education.

Please cite as: Wyatt, T. (1996). School effectiveness research: Dead end, damp squib or smouldering fuse? Issues In Educational Research, 6(1), 79-112. http://www.iier.org.au/iier6/wyatt.html

[ IIER Vol 6, 1996 ] [ IIER Home ]

© 1996 Issues in Educational Research
Last revision: 22 Oct 2013. This URL: http://www.iier.org.au/iier6/wyatt.html
Previous URL: http://education.curtin.edu.au/iier/iier6/wyatt.html
Previous URL from 21 Jan 1998 to 2 Aug 2001: http://cleo.murdoch.edu.edu.au/gen/iier/iier6/wyatt.htm
HTML : Clare McBeath [c.mcbeath@bigpond.com] and Roger Atkinson [rjatkinson@bigpond.com]