Aggregating school based findings to support decision making: Implications for educational leadership
Theodore S. Kaniuka
Fayetteville (NC) State University
Michael R. Vitale
East Carolina University
Nancy R. Romance
Florida Atlantic University
Successful school reform is dependent on the quality of decisions made by educational leaders. In such decision making, educational leaders are charged with using sound research findings as the basis for choosing school reform initiatives. As part of the debate regarding the usability of various evaluative research designs in providing information in support of decision making, randomised field trials (RFT) have been advanced as the only valid way of determining program effectiveness. This paper presents a methodological rationale that would apply multi-level statistical analysis to aggregated, pre-post intervention data readily available from multiple school sites within a multiple-baseline design framework to provide educational leaders with a valid alternative to RFT designs. Presented and discussed is an illustrative application demonstrating the potential value of such a design in establishing the effectiveness of a cluster of reading programs, in a form that is directly applicable by educational decision makers considering reform initiatives involving developmental reading.
This paper argues that the use of research to support educational decision making can be enhanced by applying the logical framework of multiple-baseline designs through the large-scale multilevel statistical analysis (Raudenbush & Bryk, 2001) of school-based evaluative data that report pre- and post-intervention achievement findings. By applying multiple-baseline design logic to such multi-site data, the findings of multilevel statistical analyses could provide the type of information about innovation effectiveness that, ultimately, is of greater direct utility to educational leaders than RFT experimental studies, which, because of their cost, are highly limited in number. Therefore, the question raised in this paper is about constructing an informational framework that educational decision makers can use as a valid alternative to RFT. And, by implication, if such a valid informational framework exists, are RFT studies necessary (vs. just sufficient) for making sound instructional decisions?
The use of research by educational leadership to support policy decisions and school reform has a problematic history (Lagemann, 2002) that has resulted in the educational profession being accused of not being a research-based profession (see Hess, 2007). Recently, Levin (2010) suggested that to increase the use of research by school leadership, focusing on the social aspect of how leaders access and evaluate information is an important element of the decision making process. The social aspect of how people access information has been studied in depth. For example, in summarising research on the diffusion of innovations, Rogers (2003) commented that people often rely on non-scientific sources as they make decisions about adopting innovations. Other researchers supporting this view found that managers and other leaders rely on their own experiences and the opinions of colleagues more than on research evidence (Dobbins et al., 2007; Maynard, 2007). When making decisions, educational leaders behave similarly, seeking the advice of colleagues, reflecting on their own experiences, and depending on localised knowledge (Kochanek & Clifford, 2011). Also contributing to the reluctance of many practitioners to use research and scientific knowledge is that they often find the style of research presentations difficult to understand, regard the way research is conducted as de-contextualised, and see its relevance as severely limited (Fusarelli, 2008). The question, then, is what can researchers do to improve the access, understanding, and ultimate use of research findings by educational leaders?
Considering the above, Coburn, Honig and Stein (2008) offered a comprehensive review of the literature on district-level use of research and evidence during decision making. While they argued that researchers needed to continue to provide high quality research, they also noted that researchers needed to consider district variables that often determine if and how research evidence is used. Specifically, in support of this view, Coburn et al. suggested that researchers needed to adopt the important role of "supporting the development of district capacity to effectively engage in research activities so that district personnel attend to and access this research" (p. 29).
While central to the purpose of educational research, causal arguments can only yield value if they motivate people to action. Wiliam and Lester (2008) offered the view that to actualise research into action, moving away from the generation of "knowledge" and theories toward inspiring individuals to action is fundamental. The problem for educational researchers, then, is to present the information they produce in a manner that inspires educational practitioners into action. This idea was discussed by Flyvbjerg (2001), who argued that research needs to be considered a phronetic activity, that is, an activity that induces action. He stated:
... phronetic social science is problem-driven and not methodology-driven, in the sense that it employs those methods which for a given problematic best help answer the four value-rational questions [Where are we going? Who gains and who loses, and by which mechanisms of power? Is this development desirable? What, if anything, should we do about it?]. (p. 196)

Clearly, one interpretation of the overly-strong emphasis on RFT is that of methodology over problem-solving, which calls into question the relevance of adhering to this methodology alone when alternatives exist that also offer valid answers which motivate people to action. Furthermore, the engendering of actions through the dynamics of knowledge must necessarily incorporate the circumstances in which that knowledge is developed if it is to be transferable across educational contexts and to meet the localised requirements many educational leaders need in order to see research as relevant.
Beyond the relevancy concern is one of professional ethics. For example, in addition to providing an approach for documenting the effectiveness of educational interventions by using aggregated school-based evaluation data, the methodological approach presented in this paper also provides a localised context for practitioners' interpretation of the findings. In doing so, positive achievement outcomes resulting from the use of this methodology would allow practitioners to adopt interventions validated as effective, rather than denying such services to students because the standards of alternative approaches (e.g. RFT) have not been met. Bulterman-Bos (2008) suggested that researchers must adopt a view of education as moral practice. Her belief was that the way researchers conceptualise education dictates how they practise research and also how the results of their work are communicated. In this regard, even the results of well-designed RFT studies cannot be universally applied with the expectation that such results will be transferable. Rather, all research, including RFT studies, requires extensive replication in order to determine the degree to which contextual factors potentially affecting fidelity of implementation result in divergent outcomes that cause practitioners to question the utility of the original research. In contrast, the use of multiple-baseline design logic conceptualises education as a system made up of multiple loosely-coupled systems (Weick, 1976) that are real-world representations of the contexts in which large-scale reform occurs. As a result, findings obtained across such diverse contexts are better able to immediately and accurately communicate relevant information to educational decision makers.
As an example illustrating multiple-baseline design logic, an experimental study might involve three experimental units (e.g., schools) from which baseline data would be obtained for a series of intervals (e.g., tests by years), after which the experimental intervention would be introduced to one randomly-selected unit while data collection continued for all three. Then, after the experimental effect of the intervention stabilised, the intervention would be continued with the first unit and implemented with a second randomly-selected unit. Again, data collection would continue from all three sites until the effect of the second treatment stabilised as well.
The point of such an experimental study would be to demonstrate that the experimental effect observed (in comparison to the performance obtained without the treatment being implemented) would result after the intervention was implemented. And, the resulting inferential form of "causal" conclusion would be that the experimental effect could be produced by implementing the instructional treatment. Of course, if the expected effect did not occur, then the study would be classified as a failure. But the important points from the standpoint of experimental design are (a) that the emphasis in the overall design is on the time-lagged replication of effects and (b) that the logical scope of the design subsumes that of randomised field trials (RFT), for which the randomised assignment of treatment is not time-lagged (i.e., treatments are implemented at the same point in time).
While logically powerful, implementing multiple-baseline experimental or quasi-experimental designs is typically not feasible in school evaluation settings because of the resource-intensive, multiyear implementation and instrumentation requirements. However, multiple-baseline design logic is readily applicable to evidence-based conclusions that result from the aggregation of school-based pre-post evaluative data when the instructional intervention of interest has been implemented in a time-lagged fashion across schools in different settings. The point of this paper is that, when appropriate existing forms of evaluative data that meet the multiple-baseline design logic requirements can be obtained and analysed, the results of aggregating such data can provide informational support to decision makers forced to make decisions in "real time" regarding the potential effectiveness of an instructional intervention.
Given the preceding, the benefits to educational leaders of using such a multiple-baseline design framework are the following. Although all experimental studies require implementation of an intervention under the control of the researcher(s), a pattern of data following the logic of a multiple-baseline design can yield meaningful conclusions as long as (a) comparable data can be used to integrate the effects of an intervention across different (and independent) sites and (b) the adoption of the interventions is time-staggered. Such a rationale provides the means to integrate independent components that, without such integration, would not yield interpretable data. In effect, the argument advanced is: if an intervention can be shown to produce results in a before-after intervention context across many independent sites, then, from an evaluative standpoint, the resulting data can be interpreted as "causal", subject to probabilistic qualifications.
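The aggregation rationale above can be sketched in a short simulation. Everything below (the school names, adoption years, and the 0.5 effect size) is fabricated for illustration; the point is only that time-staggered adoption across independent sites lets a pooled pre-post contrast recover an intervention effect.

```python
import random

random.seed(1)

# Hypothetical schools adopting the intervention in time-staggered years
# (an assumption for illustration; real data would come from school
# evaluation reports collected across sites).
adoption_year = {"School A": 2004, "School B": 2006, "School C": 2008}
TRUE_EFFECT = 0.5  # effect in within-school standard deviation units

records = []  # (school, year, treated, achievement z-score)
for school, start in adoption_year.items():
    for year in range(2002, 2011):
        treated = 1 if year >= start else 0
        score = TRUE_EFFECT * treated + random.gauss(0, 0.1)
        records.append((school, year, treated, score))

# Pool pre- and post-intervention observations across all sites; because
# adoption is staggered, any calendar-year shock would hit the sites at
# different points relative to their interventions.
pre = [s for (_, _, t, s) in records if t == 0]
post = [s for (_, _, t, s) in records if t == 1]
pooled_diff = sum(post) / len(post) - sum(pre) / len(pre)
print(round(pooled_diff, 2))  # close to TRUE_EFFECT
```

The staggered onsets are what allow the pooled difference to be read as replication of the effect across sites rather than a one-time coincidence.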
In the following example, publicly available publisher's data were used to illustrate the use of multiple-baseline designs as an evaluative tool, drawing on existing pre-intervention and post-intervention performance trends associated with the implementation of school reform reading initiatives across a wide range of schools in diverse contexts over time. While interpreting such findings raises questions regarding sampling bias (i.e., only positive findings are reported), the collection of findings across schools represents a correlational form of a multiple-baseline design with missing data that, as such, is amenable to multi-level statistical analysis. From the standpoint of replicability, the resulting form of statistical analysis provides a meaningful framework for the evaluation of any new specific intervention implemented in multiple school sites.
In the 2-level HLM statistical model used, school demographic characteristics (percent of minority students, percent of students on free/reduced lunch) were coded at level 2 and the multi-year information nested within schools was coded at level 1. In assigning the level 1 variables for the HLM model, a dummy variable for treatment indicated whether achievement data were obtained prior to or after the intervention (0 = prior, 1 = after) and served as a test for the pre/post intervention effect. Although the majority of schools reported achievement data by grade, in a few schools achievement data were reported for a group of grades (e.g., for grades 3-4-5). For those schools, the mean grade was used for the grade-level predictor. In addition, to assess the effects of the reading intervention after it was initiated, the number of years using the reading program was coded as a second treatment predictor (e.g., initial/first year of implementation = 0, second year = 1, etc., with years prior to the reading intervention coded as -1, -2, etc.). In addition to the two coded treatment variables, grade level and standardised within-school reading achievement were included as Level 1 variables in the HLM model.
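The coding scheme just described can be made concrete with a small helper. The function below is hypothetical (not taken from the paper's analysis) but implements the same rules: a 0/1 pre/post dummy and a years-since-implementation counter.

```python
def code_treatment(observation_year, implementation_year):
    """Return (Test_1, Test_2) for one school-year record.

    Test_1: 0 for years prior to the intervention, 1 from the first
            implementation year onward (the pre/post dummy).
    Test_2: years since implementation (first year = 0, second = 1, ...),
            with pre-intervention years coded -1, -2, etc.
    """
    offset = observation_year - implementation_year
    test_1 = 1 if offset >= 0 else 0
    return test_1, offset

# A school that adopted the reading program in 2005:
print(code_treatment(2003, 2005))  # (0, -2)  two years before adoption
print(code_treatment(2005, 2005))  # (1, 0)   first implementation year
print(code_treatment(2007, 2005))  # (1, 2)   third implementation year
```

Because each school's `implementation_year` differs, applying this coding across sites produces exactly the time-staggered structure the multiple-baseline logic requires.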
Specifically, the two-level HLM model used was the following:
Level-1 Model
Reading_Achievement_ij = β0j + β1j*(Test_1_ij) + β2j*(Test_2_ij) + β3j*(Grade_Level_ij) + r_ij

Level-2 Model

β0j = γ00 + γ01*(Pct_Min_j) + γ02*(Pct_FRL_j) + u0j
β1j = γ10 + u1j
β2j = γ20 + u2j
β3j = γ30 + u3j

The rationale for the HLM model used was to consider schools as if they were individuals for whom repeated measures were available for analysis (i.e., considering such repeated measures as nested within individuals or, in the present application, as years within schools). In testing model components, HLM computes regression coefficients appropriate for variables at each level. In the case of the major treatment variable, the HLM coefficient for treatment (coded as 0 or 1) indicated the overall effect of the reading program. In addition, the HLM coefficient for the second (post-treatment) variable indicated whether the effect of the treatment accelerated after initial implementation.

For Level 1 variables: Test_1 = 0 if prior to the intervention, 1 if after; Test_2 = linear coefficients assigned to years after implementation; Grade_Level = grade level for the data (in a few cases, for schools having multiple grade levels, average grade was used); Reading_Achievement = standardised within-school reading achievement.

For Level 2 variables: Pct_Min = percent of minority students in a school; Pct_FRL = percent of students in a school receiving free or reduced lunch. (Note: r_ij and u0j, u1j, u2j, u3j are Level 1 and Level 2 error terms, respectively.)
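Dedicated HLM software estimates the two-level model above directly. As a rough, self-contained sketch of the underlying logic, the two-step routine below first computes a within-school pre/post coefficient (the Level-1 slope for Test_1) and then pools those coefficients across schools (an approximation of the fixed effect γ10). The school names and scores are fabricated, and the unweighted pooling is a simplification of the empirical Bayes estimation HLM actually performs.

```python
from statistics import mean

# Fabricated within-school standardised scores, split into pre- and
# post-intervention years for each of three hypothetical schools.
schools = {
    "School A": {"pre": [-0.4, -0.2, -0.3], "post": [0.3, 0.4, 0.2]},
    "School B": {"pre": [-0.1, 0.0],        "post": [0.5, 0.6, 0.4]},
    "School C": {"pre": [-0.5, -0.3, -0.4], "post": [0.1, 0.2]},
}

# Step 1 (Level 1): within each school, regress achievement on the pre/post
# dummy; with a single 0/1 predictor, the OLS slope is simply the
# difference between the post-intervention and pre-intervention means.
slopes = {name: mean(d["post"]) - mean(d["pre"]) for name, d in schools.items()}

# Step 2 (Level 2): pool the school-level coefficients; the unweighted
# mean plays the role of the fixed effect gamma_10 in the model above.
gamma_10 = mean(slopes.values())
print(round(gamma_10, 3))  # → 0.567
```

A full HLM analysis would additionally weight schools by the precision of their Level-1 estimates and model the Level-2 error terms, but the nesting logic (years within schools) is the same.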
Table 1: Estimation of fixed effects

Fixed effect | Coefficient | Standard error | T-ratio | Approx. d.f. | P-value
For INTRCPT1, β0: INTRCPT2, γ00 | -1.017 | 0.33 | -3.05 | 71 | 0.00
Pct_Min, γ01 | 0.003 | 0.002 | 1.35 | 71 | 0.18
Pct_FRL, γ02 | 0.006 | 0.003 | 2.25 | 71 | 0.03
For Test_1 slope, β1: INTRCPT2, γ10 (a) | 0.570 | 0.108 | 5.29 | 573 | 0.00
For Test_2 slope, β2: INTRCPT2, γ20 (b) | 0.220 | 0.025 | 8.66 | 573 | 0.00
For Grade_Level slope, β3: INTRCPT2, γ30 | -0.040 | 0.021 | -1.87 | 573 | 0.06

a. 95% confidence interval for the Test_1 treatment effect: [+.36, +.78]
b. 95% confidence interval for the Test_2 effect: [+.17, +.27]
The statistical analysis of these school-based, school-level evaluative data reported by SRA showed that the introduction of the direct instruction reading programs across a wide variety of school sites was followed by a significant improvement in student reading achievement, that achievement levels increased with continued use of the programs, and that these effects were consistent across grade levels. In interpreting these findings (see Table 1), educational decision makers could expect that the initial year of implementation of such programs would result in a pre-post implementation increase of .57 standard deviations in their within-school standardised reading achievement level (i.e., z equivalent = .57). In addition, as shown in Figure 1, beginning with year 2 of implementation, the .57 achievement expectation would increase by an additional .22 standardised units per year (i.e., effect for Year 2 = .79; for Year 3 = 1.01; for Year 4 = 1.23).
Figure 1: Effect of reading curriculum on reading achievement over a four year period
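The year-by-year expectations described above follow directly from the two treatment coefficients in Table 1 (.57 for the pre/post jump, .22 per additional year of use). A small helper, illustrative only, makes the arithmetic explicit.

```python
def expected_effect(implementation_year, initial=0.57, annual_gain=0.22):
    """Expected cumulative gain (in within-school SD units) in the given
    year of implementation, using the Table 1 fixed-effect estimates:
    a .57 pre/post increase plus .22 per additional year of use."""
    return initial + annual_gain * (implementation_year - 1)

for year in range(1, 5):
    print(year, round(expected_effect(year), 2))
# Year 1 = 0.57, Year 2 = 0.79, Year 3 = 1.01, Year 4 = 1.23
```

These are the same values plotted in Figure 1, expressed as a simple linear projection a decision maker could apply to any planning horizon.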
Since these data represented multi-year implementation periods, it is possible that the increased achievement trends reflected both improved teacher skills in program delivery as well as the cumulative effect of the reading curriculum on student achievement levels as they entered succeeding grades. Because, consistent with a multiple-baseline design requirement, the initiation of the treatments was distributed across a multi-year time span, the group of findings reported across the multiple schools demonstrated effective replication of the effect of the programs (see also Adams & Engelmann, 1996).
District administrators appear to place a higher value on information coming from other districts than that coming from the research community or state education agency. When discussing sources of information for these two decision points and the frequent use of other districts as models, respondents talked about the value they placed on using the work of those who have actually put ideas into practice. (p. 13)

Additionally, the diverse sources of local data used in the methodological approach presented in this paper incorporate the points raised by Kochanek and Clifford (2011) regarding connecting the findings to practitioner decision makers.
The study example in this paper illustrated the use of data that were generated as a result of a series of interventions in working school settings. A broader perspective is that, with the emphasis on local school evaluation, the approach advocated in this paper may become more important in supporting school administrators in their decision making efforts than RFT. This perspective is consistent with the research on knowledge diffusion and context (see Rogers, 2003) and the idea of relevancy (Fusarelli, 2008). Additionally, Coburn et al. (2008) posited that interpretation is one of three critical stages that educators transition through when using research. As shown here, multiple-baseline design logic can use existing data in a manner that is more readily interpretable because of how the data were generated and the ease with which educational leaders could, through follow-up contacts with other educators, estimate future results from adopting the intervention with fidelity.
The use of the multiple-baseline framework presented suggests the usefulness of aggregating disparate evaluative data into patterns focusing on replication of findings. Because of the emphasis in the design upon inter-site replication of time-distributed interventions, educational decision makers are able to consider such findings as providing a "causal" conclusion of program effectiveness that is far stronger than a "proof-of-concept" demonstration based on pre-post data alone. Clearly, such findings provide an evidence-based perspective from which to consider the feasibility of adopting such interventions that is far better than the "no information available" that would result from waiting for studies to be conducted that meet RFT methodological requirements. Given the importance of addressing educational needs, the emphasis on the replicability of evaluative findings arranged according to multiple-baseline design logic has the potential to be a useful form of evaluation information for such decision makers.
The engagement in research by local educators has been argued (see Cooper, Levin, & Campbell, 2009; Honig, 2007; Education Week, 2013) as a basis for forming partnerships to promote the use of research in real school settings. The formation of such partnerships is important because it potentially provides the means through which local school districts could identify and pool existing pre-post implementation achievement data for analysis using the combination of multiple-baseline design and multilevel analysis methodology presented in this paper. Further, through such partnerships, a database for the cumulative collection of such evaluative data could be established. Overall, such partnerships would provide the means for collaborative data collection, data analysis, and dissemination of findings. And, as part of such disseminations, educational leaders using findings would be able to pursue follow-up contact with other leaders in demographically similar school districts to identify implementation requirements. Overall, as pre-post achievement data from multiple content-similar school settings are obtained, the model presented here has promising implications for advancing sound, evidence-based school decision-making.
Burch, P. (2007). Educational policy and practice from the perspective of institutional theory: Crafting a wider lens. Educational Researcher, 36(2), 84-95. http://dx.doi.org/10.3102/0013189X07299792
Bulterman-Bos, J. (2008). Response to comments: Clinical study: A pursuit of responsibility as the basis of educational research. Educational Researcher, 37(7), 439-445. http://dx.doi.org/10.3102/0013189X08326296
Chatterji, M. (2008). Comments on Slavin: Synthesizing evidence from impact evaluations in education to inform action. Educational Researcher, 37(1), 23-26. http://dx.doi.org/10.3102/0013189X08314287
Coburn, C. E., Honig, M. & Stein, M. K. (2008). What is the evidence on districts' use of evidence? In L. Gomez, J. Bransford & D. Lam (Eds.), Research and practice: The state of the field. Cambridge, MA: Harvard Education Press.
Cooper, A., Levin, B. & Campbell, C. (2009). The growing (but still limited) importance of evidence in education policy and practice. The Journal of Educational Change, 10(2-3), 159-171. http://dx.doi.org/10.1007/s10833-009-9107-0
Dobbins, M., Rosenbaum, P., Plews, N., Law, M. & Fysh, A. (2007). Information transfer: What do decision makers want and need from researchers? Implementation Science, 2:20. http://www.implementationscience.com/content/2/1/20
Education Week (2013). Spotlight: On data driven decision making. http://www.edweek.org/ew/marketplace/products/spotlight-data-driven-decisionmaking-v2.html?cmp=EB-SPT-032113
Ellis, A. (2005). Research on educational innovations. Larchmont, NY: Eye on Education.
Flyvbjerg, B. (2001). Making social science matter. Why social inquiry fails and how it can succeed again. Cambridge, UK: Cambridge University Press.
Fusarelli, L. (2008). Flying (partially) blind: School leaders' use of research in decision-making. Phi Delta Kappan, 89(5), 365-368. http://www.kappanmagazine.org/content/89/5/365.abstract
Henig, J. R. (2008/2009). The spectrum of educational research. Educational Leadership, 66(4), 6-11. http://www.ascd.org/publications/educational-leadership/dec08/vol66/num04/The-Spectrum-of-Education-Research.aspx
Hess, F. M. (2008/2009). The new stupid. Educational Leadership, 66(4), 12-17. http://www.ascd.org/publications/educational-leadership/dec08/vol66/num04/The-New-Stupid.aspx
Hess, F. (2007). When research matters. Cambridge, MA: Harvard Education Press.
Honig, M. I. & Coburn, C. E. (2007). Evidence-based decision-making in school district central offices: Toward a policy and research agenda. Educational Policy, 22(4), 578-608. http://dx.doi.org/10.1177/0895904807307067
IES (Institute of Education Sciences) (2013). Requests for applications. U.S. Department of Education. http://ies.ed.gov/funding/14rfas.asp
Kochanek, J. & Clifford, M. (2011). Refining a theory of knowledge diffusion among district administrators. Paper presented at the American Educational Research Association Annual Meeting, New Orleans, LA, 9 April 2011.
Lagemann, E. (2002). Usable knowledge in education research. New York: Spencer Foundation.
Lesik, S. A. (2006). Applying the regression-discontinuity design to infer causality with non-random assignment. The Review of Higher Education, 30(1), 1-19. http://muse.jhu.edu/journals/review_of_higher_education/toc/rhe30.1.html
Levin, B. (2010). Leadership for evidence-informed education. School Leadership & Management, 30(4), 303-315. http://dx.doi.org/10.1080/13632434.2010.497483
Marchand-Martella, N. E., Slocum, T. A. & Martella, R. C. (2003). Introduction to direct instruction. Columbus, OH: Allyn & Bacon.
Maynard, A. (2007). Translating evidence into practice: Why is it so difficult? Public Money and Management, 27(4), 251-256. http://dx.doi.org/10.1111/j.1467-9302.2007.00591.x
Moss, B. & Yeaton, W. (2006). Shaping policies related to developmental education: An evaluation using the regression-discontinuity design. Educational Evaluation and Policy Analysis, 28(3), 215-229. http://dx.doi.org/10.3102/01623737028003215
No Child Left Behind Act of 2001 Pub. L. No. 107-110, 115 Stat. 1425 (2002). http://www.ed.gov/policy/elsec/leg/esea02/
Nutley, S., Walter, I. & Davies, H. T. O. (2007). Using evidence. Bristol: The Policy Press.
Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501-525. http://dx.doi.org/10.1146/annurev.psych.52.1.501
Raudenbush, S. W. & Bryk, A. S. (2001). Hierarchical linear models: Applications and data analysis methods. (2nd ed.). Thousand Oaks, CA: Sage.
Rickinson, M. (2005). Practitioners' use of research: A research review for the National Evidence for Education Portal (NEEP) Development Group. (Working Paper). London: National Educational Research Forum. http://www.eep.ac.uk/nerf/word/WP7.5-PracuseofRe42d.doc?version=1?
Ronka, D., Lachat, M. A., Slaughter, R. & Meltzer, J. (2008/2009). Answering the questions that count. Educational Leadership, 66(4), 18-24. http://www.ascd.org/publications/educational-leadership/dec08/vol66/num04/Answering-the-Questions-That-Count.aspx
Rogers, E. (2003). Diffusion of innovations (5th ed.). New York: Free Press.
Scientific Research Associates (n.d.). What is direct instruction? https://www.mheonline.com/assets/sra_download/ReadingMasterySignatureEdition/MoreInfo/DI_Method_2008.pdf
Sidman, M. (1960). Tactics of scientific research. New York: Basic Books.
Slavin, R. E. (2008a). Response to comments: Evidence-based reform in education: Which evidence counts? Educational Researcher, 37(1), 47-50. http://dx.doi.org/10.3102/0013189X08315082
Slavin, R. E. (2008b). Perspectives on evidence-based reform in education - What works? Issues in synthesizing educational program evaluations. Educational Researcher, 37(1), 5-14. http://dx.doi.org/10.3102/0013189X08314117
Sloane, F. (2008). Comments on Slavin: Through the looking glass: Experiments, quasi-experiments, and the medical model. Educational Researcher, 37(1), 41-46. http://dx.doi.org/10.3102/0013189X08314835
Stuart, E. A. (2007). Estimating causal effects using school-level data sets. Educational Researcher, 36(4), 187-198. http://dx.doi.org/10.3102/0013189X07303396
Van der Heyden, A., Witt, J. & Gilbertson, D. (2007). A multi-year evaluation of the effects of a Response to Intervention (RTI) model on identification of children for special education. Journal of School Psychology, 45(2), 225-256. http://dx.doi.org/10.1016/j.jsp.2006.11.004
Weick, K. (1976). Educational organizations as loosely-coupled systems. Administrative Science Quarterly, 21(1), 1-9. http://www.jstor.org/stable/2391875
West, S., Duan, N., Pequegnat, W., Gaist, P., Des Jarlais, D., Holtgrave, D., Szapocznik, J., Fishbein, M., Rapkin, B., Clatts, M. & Mullen, P. (2008). Alternatives to the randomized controlled trial. American Journal of Public Health, 98(8), 1359-1366. http://dx.doi.org/10.2105/AJPH.2007.124446
Wiliam, D. & Lester, F. K. Jr. (2008). On the purpose of mathematics education research: Making productive contributions to policy and practice. In L. D. English (Ed.), Handbook of international research in mathematics education (2nd ed., pp. 32-48). New York: Routledge.
Authors: Dr Theodore S. Kaniuka is Associate Professor of Educational Leadership in the Department of Educational Leadership at Fayetteville State University. His present research interests are in the areas of high school reform (in particular, early college high schools), research methods and program evaluation, and educational policy. Email: tkaniuka@uncfsu.edu

Dr Michael R. Vitale is Professor of Educational Research in the Department of Curriculum and Instruction at East Carolina University. His research includes the development of models and operational instructional initiatives to raise school achievement expectations, as well as the application of instructional design principles to undergraduate teacher education programs. Email: vitalem@ecu.edu

Dr Nancy R. Romance is Professor of Science Education in the Department of Teaching and Learning at Florida Atlantic University. She holds degrees in Educational Leadership and in Science. Her research interests include a K-5 model that integrates literacy within in-depth science instruction and student vocabulary development. Email: romance@fau.edu

Please cite as: Kaniuka, T. S., Vitale, M. R. & Romance, N. R. (2013). Aggregating school based findings to support decision making: Implications for educational leadership. Issues in Educational Research, 23(1), 69-82. http://www.iier.org.au/iier23/kaniuka.html