
The unit of analysis "problem" in educational research

B. E. Tainton


Research in and on educational institutions is complicated by the multiplicity of contexts in which educational activities occur, and by the hierarchical nature of these contexts: students within instructional groups; groups within classes; classrooms and teachers within schools; schools within regions, and so on. Given the accountability pressures on education systems in the western world to acquire, analyse and interpret pertinent information on educational inputs, processes and outcomes, it is perhaps timely that attention is drawn once again to the problems that researchers, evaluators and others should consider in order to provide valid indicators of system performance.

Within the confines of one short paper, the purpose of this article is to demonstrate the need for researchers to make informed decisions regarding the levels at which educational data are conceptualised, measured and analysed by: (a) discussing some of the fundamental concepts and issues associated with the unit of analysis problem; (b) drawing attention to potential difficulties of interpretation when data are analysed at different levels, by presenting data on the structure and climate of Queensland primary schools; and (c) discussing briefly some theoretical and empirical implications.

In general, a non-technical approach has been adopted. More complex statistical treatments and examples of the issues may be found in Burstein (1980), Cronbach (1976), Knapp (1977), Langbein and Lichtman (1978), Sirotnik (1980) and Cooley, Bond and Mao (1981).

Background

In generating and analysing data to discern educational effects or relationships among salient variables, researchers should be ever vigilant that concepts and measurement are considered at appropriate 'levels'. Evidence that analyses based on different units or contexts can lead to very different interpretations has accumulated in the forty years since Robinson's (1950) seminal paper on the analysis of social data at various levels of aggregation. Burstein (1980) has provided a good review of the progress made in considering such issues as adequate analytic models, appropriate measurement units, conceptualisation of 'group' and 'individual' variables, and inferential problems of data aggregation. On a more empirical note, Sirotnik (1980) and his associates in their study of American schooling grappled with the theoretical and practical issues of analysing data generated by teachers and students in varying educational contexts and at individual person and school levels of aggregation. In the past two decades, articles in the annual Review of Research in Education have increasingly acknowledged, either directly or indirectly, the multilevel nature of educational data.

Whatever the progress, there is also evidence that the recognition and accommodation of units of analyses in educational research require greater efforts. Knapp (1982), for instance, reviewed articles published in the Educational Administration Quarterly and concluded that conceptual and analytical considerations had been handled with 'varying qualities of methodological rigour' (p.1). Miskel and Sandlin (1981) found no articles in the Journal of Educational Administration in which the units of analyses had been explicitly identified. Such a situation is by no means confined to the discipline of educational administration, and the famous admonition of Cronbach (1976) that the majority of studies of educational effects 'have collected and analysed data in ways that conceal more than they reveal' (p.1) is indicative of researchers' difficulties in coming to grips with unit of analysis and related issues.

In contrast to the advances in fundamental statistical knowledge on the analysis of data at different levels of aggregation and disaggregation, there appears to be less accessible literature which gives guidance to educational researchers in making what are often difficult decisions about the nature of their variables and the appropriate level(s) at which the data should be analysed, or which demonstrates the empirical consequences of making 'wrong' decisions.

The Nature of the 'Problem'

At the outset, it is necessary to define carefully terms which, unfortunately, have been confused with one another in the research literature.

The unit of analysis is the entity on which there are data and which will be subjected to statistical analysis; for example, the classroom, in an investigation of differences among classrooms for which there are various measures of aspects of the learning environment.

The unit of observation is the entity on which the original measurements are made; in the previous example, student perceptions of certain dimensions of the learning environment may be the means for acquiring information about the classroom.

The context of the analysis refers to the theoretical framework within which the analysis will be conducted; for example, an investigation of the impact of learning environments on student performance would require an understanding of the nature of the variables, a perspective on whether data aggregation or disaggregation would be meaningful, and a recognition of appropriate relationships between constructs and their measurement.

Frequently the unit of observation and the unit of analysis are the same. Such would be the case if relationships between student achievement and student attitudes were to be examined regardless of classroom membership. There are many situations, however, when they are not the same, and the researcher, therefore, is confronted with a number of important decisions.

In addition to the multiplicity of contexts, much educational research is correlational in nature. Indices of relationships among variables may be computed at three different 'levels'. The total analysis is based on the aggregate of the observations, regardless of group factors such as classrooms or schools. The within analysis is also an individually-based approach, but indices of association among the variables are computed separately for each group and then averaged across the groups. In the between analysis, the group is the unit of analysis whereby scores on the variables are computed for each group, and then interrelated.
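The three indices can be illustrated with a minimal sketch. The data below are hypothetical (three small groups, two variables), and the function names are illustrative only; the within index follows the description above, that is, a correlation computed separately for each group and then averaged.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def three_level_correlations(groups):
    """groups: dict mapping a group label to a list of (x, y) pairs."""
    # Total analysis: pool all individuals, ignoring group membership.
    pooled = [pair for members in groups.values() for pair in members]
    total = pearson([x for x, _ in pooled], [y for _, y in pooled])

    # Within analysis: one correlation per group, then averaged.
    within = mean([
        pearson([x for x, _ in members], [y for _, y in members])
        for members in groups.values()
    ])

    # Between analysis: correlate the group means.
    between = pearson(
        [mean([x for x, _ in members]) for members in groups.values()],
        [mean([y for _, y in members]) for members in groups.values()],
    )
    return total, within, between

# Hypothetical scores: negative association inside every group,
# positive association across the group means.
example_groups = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}
total_r, within_r, between_r = three_level_correlations(example_groups)
```

For these contrived scores the within index is -1, the between index is +1, and the total index is .80, so the same data support three quite different statements about the relationship.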

The relationship among these three indices, the statistical procedures for their computation, and issues of interpretation, have all attracted attention since Pearson (1896) who reported correlations between breadth and length of human skulls to be .09 for males, -.14 for females and .20 for the combination of the two sexes. For the next five decades the literature on the effects of social data aggregation appears to have been dominated by sociological interests. Robinson (1950) demonstrated mathematically the anomalies that can occur when computing correlations within groups, between groups and across the totality of individuals regardless of groups. In answer to the question of the relationship between literacy in the USA and skin colour, Robinson arrived at a correlation of .20 between the two variables when the person was the unit of observation and the unit of analysis, .77 when the state was the unit of analysis, and .95 with major geographical areas as the unit of analysis. Furthermore, he found that correlations between literacy and nativity computed for the three units of analysis were .12, -.53 and .62 respectively, a warning to other sociologists of the 'ecological fallacy' and the possible differences in relationships both in magnitude and sign for levels of data aggregation. More recent research (Cronbach, 1976; Knapp, 1977) has demonstrated the confounding of between-aggregate and within-aggregate effects in deriving correlation coefficients at the total level of analysis.
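The confounding of between and within effects in the total correlation can be made concrete with the standard decomposition of sums of squares and cross-products. The sketch below is an illustration under stated assumptions: the data are hypothetical, the between correlation is weighted by group size, and the within correlation is the pooled within-group correlation (not the average of separate group correlations described earlier). Under those definitions, r_total = eta_x * eta_y * r_between + sqrt(1 - eta_x^2) * sqrt(1 - eta_y^2) * r_within, where eta^2 is the proportion of a variable's variance lying between groups.

```python
from math import sqrt

def correlation_decomposition(groups):
    """Split the total correlation into between and pooled-within parts.

    groups: dict mapping a group label to a list of (x, y) pairs.
    """
    pairs = [pair for members in groups.values() for pair in members]
    n = len(pairs)
    grand_x = sum(x for x, _ in pairs) / n
    grand_y = sum(y for _, y in pairs) / n

    # Total sums of squares and cross-products about the grand means.
    txx = sum((x - grand_x) ** 2 for x, _ in pairs)
    tyy = sum((y - grand_y) ** 2 for _, y in pairs)
    txy = sum((x - grand_x) * (y - grand_y) for x, y in pairs)

    bxx = byy = bxy = 0.0  # between groups, weighted by group size
    wxx = wyy = wxy = 0.0  # pooled within groups
    for members in groups.values():
        m = len(members)
        mean_x = sum(x for x, _ in members) / m
        mean_y = sum(y for _, y in members) / m
        bxx += m * (mean_x - grand_x) ** 2
        byy += m * (mean_y - grand_y) ** 2
        bxy += m * (mean_x - grand_x) * (mean_y - grand_y)
        for x, y in members:
            wxx += (x - mean_x) ** 2
            wyy += (y - mean_y) ** 2
            wxy += (x - mean_x) * (y - mean_y)

    eta2_x = bxx / txx  # share of x variance lying between groups
    eta2_y = byy / tyy
    r_between = bxy / sqrt(bxx * byy)
    r_within = wxy / sqrt(wxx * wyy)
    rebuilt = (sqrt(eta2_x * eta2_y) * r_between
               + sqrt((1 - eta2_x) * (1 - eta2_y)) * r_within)
    return {
        "total": txy / sqrt(txx * tyy),
        "between": r_between,
        "within": r_within,
        "rebuilt": rebuilt,
    }

# Hypothetical groups of unequal size.
example = {
    "g1": [(1, 2), (2, 1), (3, 3)],
    "g2": [(4, 5), (6, 4)],
    "g3": [(7, 9), (8, 7), (9, 8), (10, 10)],
}
parts = correlation_decomposition(example)
```

The `rebuilt` value reproduces the directly computed `total`, which shows why the total correlation cannot be interpreted without knowing how much of each variable's variance lies between groups.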

Educational concern for estimating association indices from aggregated data increased from the 1970s as program evaluation and research on educational effects and processes gained prominence. Greater sophistication in computational procedures has been applied by Burstein (1980), Cronbach (1976), Knapp (1977, 1982) and others who have grappled with the issues that concerned Pearson so long ago. There has been evidence also of a greater awareness of the need to extend bivariate analytical models to accommodate the multivariate and complex nature of educational data.

One of the most significant developments in the past two decades has been a greater focus on the selection of appropriate units of analysis, with a gradual realisation that the choice should be based on substantive as well as statistical considerations. Researchers such as Burstein (1980), Cronbach (1976) and Sirotnik (1980) would regard the levels of statistical treatment to be interesting and perhaps potentially conflicting, but of dubious consequence until one addresses the substantive question: What property of what object is being measured in the research? If the property is basically 'systemic' (that is, intrinsic to the group), then the between analysis is probably the more appropriate, and averaging scores across individuals within groups is one way of providing an indication of a system variable. If the property is fundamentally 'phenomenological' (that is, intrinsic to the individual), but within a group context, then the within analysis is more appropriate. The total analysis becomes meaningful only when the nature of the group is entirely irrelevant to what is being measured.

Identifying the properties of concepts clearly as systemic or phenomenological can often be a difficulty for researchers, given the range of sociological and psychological perspectives underlying much of educational data. Furthermore, whether variables measured at different levels of aggregation or disaggregation are indicators of the same, different or meaningless constructs depends very much upon the theoretical perspectives being deployed.

Sirotnik's (1980) distinction between 'systemic' and 'phenomenological' constructs also has implications for their measurement. Suppose the following items, for teachers as respondents, have been designed to provide indicators of the concept of co-operation:

  1. I am usually a co-operative kind of person.
  2. I co-operate with other teachers at this school.
  3. We generally co-operate with one another at this school.
  4. You co-operate with others at this school.
  5. Teachers work with one another at this school.

What is being measured by each of these items? Is it something about the teacher, the staff, or the school? Or is it more than one concept being measured? If scores were computed for a school, what meaning could be imputed to such scores? Would the scores represent an intrinsic property of the school or a description of the 'average' teacher at the school? Is it meaningful to derive a scale for these particular five items?

Without theoretical perspectives to guide interpretation, results of multilevel analyses can appear sometimes to be paradoxical, while even the most sophisticated conceptualisation of a research investigation may be constrained or adversely affected by measurement problems. Appeal to both substantive and methodological criteria in analysing educational data and in understanding what properties of what constructs are being investigated is therefore essential.

To summarise, it is important in the analysis of educational data that appropriate recognition is given to the unit of observation, unit of analysis and research context. Serious problems can arise when: (a) data are collected or analysed for the wrong unit or context; (b) the unit and context are correct, but the interpretations are made for a different unit or context; and (c) units and/or contexts are mixed. To illustrate how important it is to resolve these problems, data from a study of organisational structure and organisational climate in Queensland primary schools (Tainton, 1979) will be analysed at two different levels - school and individual teacher.

Relationships Between Organisational Structure and Organisational Climate

For the above study of Queensland primary school environments, perceptions of four dimensions of organisational structure as identified by Mackay (1964) - Impersonality, Centralization, Organizational Constraint and Standardization - were obtained from 670 teachers in 45 schools. Thus the teacher was the unit of observation. Teachers also provided responses to an adapted version of Halpin and Croft's (1963) Organizational Climate Description Questionnaire on four dimensions of teacher behaviour - Esprit, Hindrance, Intimacy and Disengagement - and four dimensions of principal behaviour - Professional Leadership, Production Emphasis, Supportiveness and Aloofness.

To pursue the relationships between structure and climate, teacher responses on each variable may be aggregated for each school, mean scores for each variable computed for each of the 45 schools, and the school scores intercorrelated (the between analysis). The resultant correlation matrix is shown in Table 1.

Let us now consider a different analysis. Suppose that instead of generating school mean scores, the 670 teacher responses on each variable were taken individually, regardless of school membership, and intercorrelated (the total analysis). These correlations are also displayed in Table 1 (in parentheses).

Table 1: Intercorrelations Among Structure and Climate Dimensions for Different Units of Analyses*

| Dimension | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. Impersonality | - | | | | | | | | | | | |
| 2. Centralization | .05 (.01) | - | | | | | | | | | | |
| 3. Organizational Constraint | .01 (-.02) | .21 (.00) | - | | | | | | | | | |
| 4. Standardization | .29 (.01) | .04 (-.00) | -.19 (-.00) | - | | | | | | | | |
| 5. Esprit | -.77 (.59) | -.05 (.01) | -.22 (.20) | -.20 (.01) | - | | | | | | | |
| 6. Hindrance | -.14 (-.10) | .32 (-.27) | .14 (-.20) | -.29 (.08) | .13 (.00) | - | | | | | | |
| 7. Intimacy | .19 (.04) | -.24 (.17) | -.21 (.04) | -.11 (.05) | -.32 (-.00) | .02 (.01) | - | | | | | |
| 8. Disengagement | -.34 (-.10) | -.00 (-.07) | .08 (.01) | .10 (-.02) | -.37 (-.01) | .16 (.01) | .06 (.00) | - | | | | |
| 9. Professional Leadership | -.66 (-.46) | -.26 (-.18) | -.49 (.30) | .15 (-.05) | .68 (.50) | -.06 (-.24) | .02 (.04) | -.34 (-.07) | - | | | |
| 10. Production Emphasis | -.43 (.34) | -.36 (-.33) | .27 (-.12) | -.11 (-.02) | .22 (.18) | .26 (.15) | -.09 (-.01) | .03 (.07) | -.02 (-.01) | - | | |
| 11. Supportiveness | -.51 (.14) | -.18 (.14) | .20 (.05) | -.10 (-.01) | .54 (.30) | .04 (-.04) | -.23 (.13) | -.17 (-.02) | .25 (-.00) | .22 (-.01) | - | |
| 12. Aloofness | 0.3 (-.31) | .47 (-.31) | -.00 (-.06) | -.14 (.02) | .05 (-.04) | .21 (.27) | -.04 (-.00) | .21 (.15) | -.02 (.00) | -.05 (.00) | -.03 (-.00) | - |

* Correlations in parentheses are based on the individual teacher as the unit of analysis (N=670). Other correlations are based on the school as the unit of analysis (N=45).

Has the unit of analysis made any difference in the nature of the bivariate relationships between the structure and climate variables? Examination of the two correlation matrices reveals: (a) a tendency for the between correlations to be higher than the total correlations, which is consistent with the research literature identified by Cronbach (1976), Knapp (1977) and Sirotnik (1980); and (b) some of the correlations change in sign when the unit of analysis changes. The more 'interesting' pairs of correlations have been highlighted in the table.

A more parsimonious representation of the relationships between organisational structure and organisational climate may be investigated in multivariate terms by undertaking separate canonical correlation analyses on the two matrices. Canonical correlation analysis may be used to examine relationships between two sets of variables by determining linear combinations of the variables in the first set which are maximally correlated with linear combinations of variables in the second set.
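As a rough illustration of what such an analysis computes, the following sketch obtains canonical correlations with NumPy by orthonormalising each centred variable set (QR decomposition) and taking the singular values of the cross-product of the two orthonormal bases. This is one standard numerical route and is not necessarily the procedure used in the original study; the function name and the simulated data are assumptions for illustration only.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two variable sets.

    X, Y: arrays of shape (n_cases, n_variables), assumed to have
    full column rank after centring.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)         # orthonormal basis for each set
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)      # guard against rounding above 1

# Simulated data: 45 cases, a 2-variable set and a 3-variable set that
# share one variable exactly, so the first canonical correlation is 1.
rng = np.random.default_rng(0)
set_a = rng.normal(size=(45, 2))
set_b = np.column_stack([set_a[:, 0], rng.normal(size=(45, 2))])
rho = canonical_correlations(set_a, set_b)
```

The number of canonical correlations equals the number of variables in the smaller set, which is why four roots are reported when four structure dimensions are related to eight climate dimensions.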

The resultant canonical analyses are shown in Table 2, and as before, the statistics for the total analysis are shown in parentheses. For both levels of analysis, there were three significant canonical correlations, indicating that the relationship between structure and climate is complex. As could have been expected from an inspection of the correlation matrix, the strength of the relationships at the school level was greater than that for the individual teacher level.

Table 2: Canonical Correlation Analyses*

| Roots | Canonical Correlation | Chi-Square | df | P |
|---|---|---|---|---|
| 1. | .92 (.82) | 138.8 (1005.6) | 32 | .000 (.000) |
| 2. | .81 (.55) | 68.1 (272.9) | 21 | .000 (.000) |
| 3. | .61 (.20) | 27.8 (34.7) | 12 | .005 (.001) |
| 4. | .49 (.10) | 10.4 (7.6) | 5 | .07 (.18) |

* Statistics in parentheses are based on N=670 teachers. Other statistics are based on N=45 schools.
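The degrees of freedom and chi-square values in Table 2 are consistent with Bartlett's sequential chi-square approximation for canonical correlations, although the article does not name the test, so its use here is an assumption. A minimal sketch, taking the rounded canonical correlations from the table as input:

```python
from math import log

def bartlett_sequence(canon_r, n, p, q):
    """Bartlett's approximate chi-square tests for canonical correlations.

    canon_r: canonical correlations in descending order
    n: number of cases; p, q: number of variables in each set
    Returns a list of (chi_square, df), one entry per root, where entry k
    tests whether roots k+1 onwards are all zero.
    """
    factor = n - 1 - (p + q + 1) / 2
    results = []
    for k in range(len(canon_r)):
        wilks = 1.0
        for r in canon_r[k:]:
            wilks *= 1 - r * r           # Wilks' lambda for remaining roots
        chi_square = -factor * log(wilks)
        df = (p - k) * (q - k)
        results.append((chi_square, df))
    return results

# Four structure and eight climate dimensions; correlations as tabled.
school_level = bartlett_sequence([.92, .81, .61, .49], n=45, p=4, q=8)
teacher_level = bartlett_sequence([.82, .55, .20, .10], n=670, p=4, q=8)
```

With p=4 and q=8 the degrees of freedom are 32, 21, 12 and 5, as tabled, and the chi-square values fall close to those reported, with the remaining differences attributable to the two-decimal rounding of the correlations. The sketch also makes plain how strongly the statistic depends on N: the teacher-level chi-squares are large mainly because 670 cases are treated as independent observations.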

In canonical correlation analysis, the nature of the relationships between two sets of variables is explored by examining the canonical variates, the linear composites of the variables in each set which are maximally correlated. Interpretation of these variates, or 'factors', may be based on the size and sign of the coefficients (the correlation of each variable with its 'factor'). Table 3 presents the three ways in which organisational structure and organisational climate are related, and as previously, the statistics for the total analysis are given in parentheses. Conventionally, coefficients with absolute values of less than .30 are considered to make little contribution to the description of a variate. As before, 'interesting' pairs of coefficients have been highlighted.

Table 3: Coefficients for Canonical Variates*

| Organisational Structure Dimensions | Structure Variate I | Structure Variate II | Structure Variate III |
|---|---|---|---|
| Impersonality | .88 (.85) | -.42 (-.41) | -.07 (.32) |
| Centralization | .30 (.28) | .83 (.88) | .07 (.39) |
| Organizational Constraint | .38 (.33) | .28 (.29) | .77 (-.87) |
| Standardization | -.12 (.04) | -.39 (-.03) | .41 (.21) |

| Organisational Climate Dimensions | Climate Variate I | Climate Variate II | Climate Variate III |
|---|---|---|---|
| Esprit | -.78 (.76) | .36 (-.31) | -.20 (-.39) |
| Hindrance | .08 (-.31) | .52 (-.45) | -.11 (-.48) |
| Intimacy | .12 (.12) | -.31 (.26) | -.47 (-.45) |
| Disengagement | .33 (-.12) | -.19 (-.04) | .08 (.11) |
| Professional Leadership | -.93 (.73) | -.06 (.09) | -.23 (.02) |
| Production Emphasis | -.24 (.20) | .63 (-.82) | .41 (.33) |
| Supportiveness | -.47 (.23) | .09 (.15) | .46 (.08) |
| Aloofness | .19 (-.48) | -.54 (-.30) | -.27 (-.26) |

* Statistics in parentheses are based on N=670 teachers. Other statistics are based on N=45 schools.

For the individual teacher analysis, the climate 'factors' in particular are either heavily loaded by many variables with coefficients opposite in sign to the respective school level coefficients, or they have non-significant coefficients. In fact, some 14 of the 24 pairs of coefficients for the climate 'factors' are sufficiently different across the two levels of analysis to result in conflicting interpretations of how structure and climate are related.
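One simple way of screening the climate coefficients for conflicts is sketched below. The rule is an assumption made for illustration (a pair is flagged when the two coefficients differ in sign and at least one reaches the conventional .30 cut-off); the criterion actually used to arrive at the count of 14 is not stated, so the result need not match it. Coefficients are transcribed from Table 3, with the school-level value first in each pair.

```python
SALIENT = 0.30  # conventional cut-off for a meaningful coefficient

# (school-level, teacher-level) pairs for climate variates I, II and III.
# The printed "-47" for Supportiveness is read as -.47.
climate = {
    "Esprit": [(-.78, .76), (.36, -.31), (-.20, -.39)],
    "Hindrance": [(.08, -.31), (.52, -.45), (-.11, -.48)],
    "Intimacy": [(.12, .12), (-.31, .26), (-.47, -.45)],
    "Disengagement": [(.33, -.12), (-.19, -.04), (.08, .11)],
    "Professional Leadership": [(-.93, .73), (-.06, .09), (-.23, .02)],
    "Production Emphasis": [(-.24, .20), (.63, -.82), (.41, .33)],
    "Supportiveness": [(-.47, .23), (.09, .15), (.46, .08)],
    "Aloofness": [(.19, -.48), (-.54, -.30), (-.27, -.26)],
}

def conflicting(school, teacher, cutoff=SALIENT):
    """Flag a pair whose signs differ when at least one value is salient."""
    opposite_sign = school * teacher < 0
    salient = max(abs(school), abs(teacher)) >= cutoff
    return opposite_sign and salient

flagged = [
    (dimension, variate)
    for dimension, pairs in climate.items()
    for variate, (school, teacher) in zip(("I", "II", "III"), pairs)
    if conflicting(school, teacher)
]
```

Under this rule 10 of the 24 pairs are flagged. A broader rule, for example one that also counts pairs in which only one level reaches the cut-off, would flag more.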

Since the two interpretations of the relationships between structure and climate cannot be reconciled, which one should prevail? From the exposition of the unit of analysis 'problem' in the previous section, the seeming paradox cannot be resolved solely on statistical criteria, and therefore some of the theoretical perspectives underlying the research on structure and climate have to be considered.

Following Sirotnik's (1980) classification, what is the nature of the structure and climate constructs? Whether they are 'systemic' and refer to some attributes of the organisation, or are individuals' psychological reactions to their personal environment, has been the subject of considerable and continual debate (see Guion, 1973; House & Rizzo, 1972; Howe & Gavin, 1974; Jones & James, 1976; Mansfield & Payne, 1977; Schneider & Bartlett, 1968). On balance, there would appear to be support for the view that organisational climate and psychological climate are separate concepts, and that their inappropriate measurement has contributed to conceptual confusion. On this view, aggregation of psychological climate scores to the organisational level would not provide an indicator of organisational climate.

Inspection of the items measuring the structure and climate dimensions in the study of Queensland schools would reveal a reference to descriptions of organisational situations, events and practices, rather than to individual teacher reactions to, or feelings about, their psychological environment. The items therefore tend to measure 'systemic' features of the school and an estimate of these organisational attributes is the mean of the teachers' responses for each school. From this perspective, the between analysis would be the more appropriate.

However, there is a further debate in the literature regarding the appropriateness of aggregating individual perceptions of organisational constructs to provide a group measure (see Anderson & Walberg, 1974). How to interpret, and account for, variability in individual perceptions of the same organisation has implications for the selection of an appropriate analytical model. The issue basically amounts to whether the question of interest is the differences between groups based on single indicators of a systemic concept, or whether variability in perceptions is considered as item unreliability or as an indication of the construct not being valid for the particular situation. In the present study, alpha coefficients across schools were fairly similar, indicating that a between analysis was reasonable.
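The comparison of alpha coefficients across schools can be sketched as follows, assuming that 'alpha' refers to Cronbach's coefficient alpha computed separately within each school for the items of a scale. The school labels and item scores are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of respondents, each a list of k item scores (k >= 2).
    """
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(k)
    ]
    # Sample variance of the respondents' total scale scores.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical three-item scale answered by four teachers in each school.
school_responses = {
    "school_a": [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]],
    "school_b": [[1, 2, 1], [4, 4, 5], [3, 3, 3], [5, 4, 5]],
}
alphas = {
    school: cronbach_alpha(rows)
    for school, rows in school_responses.items()
}
```

Markedly lower alphas in some schools than in others would suggest that the items do not hang together as a scale in those settings, which bears on whether a school mean can be read as an indicator of the same systemic construct everywhere.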

Clearly, not all the issues can be explored here. If, in this empirical study, the canonical analysis of the variables measured at the school level is considered to be the more 'proper' solution given the nature of the measured constructs, then conclusions drawn from the data analysis at the individual teacher level would have to be regarded as highly questionable.

Theoretical and Empirical Implications

The potential paradox that psychometric treatments of data generated at one level but aggregated to another level can yield different interpretations has important implications for educational theory and the practice of educational research. Resolution of interpretative dilemmas, as the present study demonstrates, rests on both substantive and methodological criteria. Therefore, it is imperative that researchers ask the right questions at the appropriate level, and come to terms with what properties of what variables are being measured. Further, it is equally important that a research framework for dealing with theoretical issues is proposed and relationships between the units of analysis, units of observation and research context are considered before the 'appropriate' analysis model is determined and results interpreted.

There is mounting evidence, supported by these Queensland data, of the inappropriateness of investigating organisational constructs at the total level of analysis. Much of the previous research on structure and climate in educational and other institutions has produced unexpectedly weak or inconclusive relationships (see Tainton (1979) for a review). Since examples of data analyses conducted at other than the total level are difficult to locate in such studies, one wonders how many of these indeterminate results could have been attributed to methodological artefacts. Perspectives on the relationship between structure and climate have roots in theories of organisational behaviour and systems theory, and the strong associations found in this study from the school level analyses are consistent with much of the literature relating to professionals working in bureaucratic environments.

In view of the increasing need to generate reliable and valid measures of organisational processes and outcomes, consideration of the implications of the unit of analysis issues for instrument development should also be of critical concern to researchers. Exploration of the dimensionality of a set of organisational variables, or of the construct validity of indicators of organisational attributes, obviously requires appropriate levels of analyses. Yet rarely in the organisational research literature has the instrument development phase proceeded with data from a sufficient number of institutions to permit something besides a correlation matrix being computed across individual respondents (see Sirotnik, 1980).

In fairness to educational researchers, there are often pragmatic reasons for developing instrumentation or analysing data from a small sample of contexts. However, as information on the nature of multi-level educational data has gradually built up, researchers should now begin to consider more seriously the consequences of their conceptualisation, operationalisation and data analysis decisions, and improve the quality of both educational inquiry and indicators of system performance.

References

Anderson, G.J. & Walberg, H.J. (1974) 'Learning environments' in Evaluating Educational Performance, ed H.J. Walberg, Berkeley, California: McCutchan.

Burstein, L. (1980) 'The analysis of multilevel data in educational research and evaluation', in Review of Research in Education, vol. 8, ed D.C. Berliner, American Educational Research Association.

Cooley, W.W., Bond, L. & Mao, B. (1981) 'Analyzing multilevel data', in Educational Evaluation Methodology: The State of the Art, ed R.A. Berk, Baltimore: Johns Hopkins University Press.

Cronbach, L.J. (1976) Research on Classrooms and Schools: Formulation of Questions, Design, and Analysis, Occasional paper, Stanford: Stanford Evaluation Consortium, Stanford University.

Guion, R.M. (1973) 'A note on organizational climate', Organizational Behavior and Human Performance, 9, 120-125.

Halpin, A.W. & Croft, D.B. (1963) The Organizational Climate of Schools, Chicago: Midwest Administration Center, University of Chicago.

House, R.J. & Rizzo, J.R. (1972) 'Towards the measurement of organizational practices: Scale development and validation', Journal of Applied Psychology, 56, 388-396.

Howe, J.G. & Gavin, J.F. (1974) Organizational Climate: A Review and Delineation, Technical Report No. 7402, Fort Worth, Texas: Texas Christian University, Institute of Behavior Research.

Jones, A.D. & James, L.R. (1976) Psychological and Organizational Climate: Dimensions and Relationship, Technical Report No.76-4, Fort Worth, Texas: Texas Christian University, Institute of Behavior Research.

Knapp, T.R. (1977) 'The unit-of-analysis problem in applications of simple correlation analysis to educational research', Journal of Educational Statistics, 2(3), 171-186.

Knapp, T.R. (1982) 'The unit and context of the analysis for research in educational administration', Educational Administration Quarterly, 18(1), 1-13.

Langbein, L.I. & Lichtman, A.J. (1978) Ecological Inference, Beverly Hills, California: Sage Publications.

Mackay, D.A. (1964) An Empirical Study of Bureaucratic Dimensions and Their Relation to Other Characteristics of School Organizations, unpublished doctoral dissertation, University of Alberta.

Mansfield, R. & Payne, R.L. (1977) 'Correlates of variance in perceptions of organizational climate', in Organizational Behavior in its Context, eds D.S. Pugh & R.L. Payne, Westmead, England: Saxon House, Teakfield Limited.

Miskel, C. & Sandlin, T. (1981) 'Survey research in educational administration', Educational Administration Quarterly, 17, 1-20.

Pearson, K. (1896) 'Mathematical contributions to the theory of evolution III regression, heredity and panmixia', Philosophical Transactions of the Royal Society, 187, 253-318.

Robinson, W.S. (1950) 'Ecological correlations and the behavior of individuals', American Sociological Review, 15, 351-357.

Schneider, B. & Bartlett, C.J. (1968) 'Individual differences and organizational climate: I the research plan and questionnaire development', Personnel Psychology, 21, 323-333.

Sirotnik, K.A. (1980) 'Psychometric implications of the unit-of-analysis problem (with examples from the measurement of organizational climate)', Journal of Educational Measurement, 17(4), 245-282.

Sirotnik, K.A., Nides, M.A. & Engstrom, G.A. (1980) 'Some methodological issues in developing measures of classroom learning environment: A report of work-in-progress', Studies in Educational Evaluation, 6, 279-289.

Tainton, B.E. (1979) Educational Environment in Queensland Primary Schools: Dimensions, and Relationships with Selected Contextual and Organizational Characteristics of Schools, unpublished master's thesis, University of Queensland.

Please cite as: Tainton, B. E. (1990). The unit of analysis "problem" in educational research. Queensland Researcher, 6(1), 4-19. http://www.iier.org.au/qjer/qr6/tainton.html

