Trevor G Johnson and John P Keeves
This paper applies multilevel analytic procedures to data from the 1997 Course Experience Questionnaire and Selected Higher Education Finance Statistics 1997 to demonstrate a link between expenditure decisions taken within the 40 DETYA funded universities and graduates' perceptions of the quality of teaching in, and overall satisfaction with, their recently completed courses. There is little between-university variation to explain, but variation in expenditure accounts for approximately 50 per cent of it. The additional student services financed by within-university budget decisions can be linked to more favourable Good Teaching Scale and Overall Satisfaction Index opinions.
In 1997 the total operating expenses for the 40 DETYA funded universities in Australia was $7.7 billion (DETYA, 1998). By any standards the tertiary education sector is big business, and the Federal Government is rightly concerned with evaluation studies of how the universities spend the very substantial sums of money that they receive in the form of direct grants from government sources. Furthermore, since such monies are combined with the payments made directly by students, the income received from endowments and investments, and the tax-free donations provided by individuals and organisations, there is concern for the major items of expenditure incurred by universities.
Of particular interest in the late 1990s is the provision of services to assist students in their university life. The present government is now challenging the compulsory nature of the student union fee, and some universities seem reluctant to fund many student services from the monies they collect directly from students.
The tertiary institutions are already providing some services directly to students out of their overall budgets. There is, however, no routine evaluation of the general effectiveness of the partitioning of the overall budget into different line items, although, for the purposes of international advertising, some of the more entrepreneurial institutions are now seeking Standard & Poor's credit ratings of their operating performance. Furthermore, both DETYA and the Australian Vice-Chancellors' Committee (AVCC) support the regular evaluation of the effectiveness of the teaching services through the analysis of national data collected by the Graduate Destination Survey and Course Experience Questionnaire (CEQ).
Annually since 1993, recent graduates have been invited by their university and the Graduate Careers Council of Australia (GCCA) to answer a questionnaire summarising their recently completed course experiences. The results are reported in newspapers and student guides, and used in institutional publicity, with strong claims being made by those universities that are apparently successful, and implied reprimands for institutions that are apparently not. The weaknesses of the current version of the questionnaire are recognised and development of the instrument is an ongoing process. Nevertheless, the CEQ is the only national survey of recent graduates' perceptions of their course experiences, and it is a source of valuable information.
The purpose of this article is to explore the possibility of using this instrument to evaluate the effect of variation in budget expenditures on graduates' CEQ opinions. University graduates are not the only client group in a university. However, they are a major group and their views regarding the conduct of university affairs must be given some credence.
It is clearly not possible to set up an experimental study to examine the effects of variation in university budgetary policies on graduates' views. However, the number of tertiary institutions in Australia is sufficient to use the natural variation that exists in the allocation of monies to particular budget lines across universities to explain the variation both between and within institutions on CEQ respondents' perceptions. While strict causality linking budgetary policies and graduate opinions cannot be claimed, the undertaking of regression analyses in order to account for variation necessarily implies the construction of a model in which causal influences are hypothesised. The analysis tests whether the model hypothesised is consistent with the observed data, and leads to the estimation of the parameters of the model. Thus, the finding of significant relationships would provide evaluative evidence of the effects of different financial policies.
Very little work has been done in Australia using data collected through the employment of view scales as measures of specific criterion variables or educational outcomes. Consequently, the procedures employed in the analyses presented in this paper may be considered by some to be unusual. The meaningful nature and strength of the resultant findings should demonstrate the suitability of the approach employed. Nevertheless, although view scales have a history going back more than 60 years, this application has not attracted wider attention; the reasons are discussed in the following section.
View scales have been used in numerous other investigations, for example, the First IEA Mathematics Study (Wolf, 1967; Keeves, 1968), the First and Second IEA Science Studies (Comber and Keeves, 1973; Keeves, 1992), the Harvard Project Physics Study (Anderson and Walberg, 1974), and the series of classroom environment measures developed by Trickett and Moos (1973) and Fraser (1993). However, these applications of view scales all encountered serious problems in data analysis because multilevel procedures were not available. It is only since the development of multilevel methods that the appropriate use of view scales at the between- and within-institution levels has started to emerge. There are two prerequisites for the use of view scales. First, there is a need to ground the development of view scales on established theory, as was done by Pace and Stern (1958). Second, there is a need for organisational researchers to recognise that the responses of members of organisations to such instruments can be used to investigate organisational characteristics.
The items employed in the CEQ are descriptive or view scale items; they are not attitudinal in nature. Nevertheless, Ramsden (1998) has pointed out that quality teaching creates the conditions under which students develop effective learning strategies, and this leads to higher levels of student satisfaction, the characteristic measured by the CEQ. The version of the instrument used in this study comprised 24 items. Item 16 did not cluster unambiguously and was dropped from the analyses. The final item assessed respondents' overall levels of satisfaction with their recently completed courses. The remaining items formed five dimensions or scales.
In the social sciences the use of SRS (simple random sample) formulas on data from complex samples is now the most frequent source of gross mistakes in the construction of confidence statements and tests of hypotheses. (Kish, 1957: 157)

Computation of standard errors would not be a problem if the CEQ data were collected in accordance with a known sampling design and without losses. However, they are not. The dilemma is that we have neither the entire population nor a genuine sample. Nevertheless, from a national perspective, and within universities, there is growing interest in comparisons of CEQ data. Decisions regarding the statistical significance of differences are required, and it is essential that the methods used to estimate standard errors are soundly based.
CEQ analysts who believe that there is an underlying structure to the data (e.g. students grouped in courses, courses grouped in institutions) will, depending on their purposes, employ programs like WesVarPC, Stata, or multilevel software such as HLM or MLwiN to estimate the statistics of interest and compute appropriate standard errors. Applications of the simple random sample formulae, employed in software such as SPSS, SAS or Excel, to grouped data underestimate standard errors and confidence intervals. As a consequence, it may be concluded that significant differences exist where there are none.
The design effect (deff) is an indicator of the influence of the sample design on error estimation. It is defined as the ratio of the variance estimate of a complex sample to the variance estimate of a simple random sample of the same size. The value of the square root of deff (usually called deft) is the factor by which sampling errors calculated using textbook simple random sample formulae must be multiplied in order to obtain estimates that reflect the clustering effect of students in courses or universities. Table 1 records the intraclass correlation coefficients (ρ) and deffs for the CEQ scales at the course and institution levels. Data from the 1997 questionnaire are used and, irrespective of whether a finite population correction (fpc) is applied or not, the magnitudes of the deffs highlight the need for the more appropriate error estimates computed by multilevel software.
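Kish's well-known approximation, deff ≈ 1 + (m − 1)ρ, where m is the average cluster size, illustrates why even a small intraclass correlation matters when clusters are large. The sketch below uses illustrative values, not the observed 1997 CEQ estimates:

```python
import math

def design_effect(rho: float, avg_cluster_size: float) -> float:
    """Kish's approximation: deff = 1 + (m - 1) * rho, where m is the
    average number of respondents per cluster and rho is the intraclass
    correlation coefficient."""
    return 1 + (avg_cluster_size - 1) * rho

# Illustrative values only -- not the observed CEQ estimates.
rho = 0.02   # a small intraclass correlation at the institution level
m = 1000     # average respondents per institution
deff = design_effect(rho, m)
deft = math.sqrt(deff)  # multiply SRS standard errors by deft

print(f"deff = {deff:.2f}, deft = {deft:.2f}")  # deff = 20.98, deft = 4.58
```

Even with ρ as small as 0.02, clusters of this size inflate simple random sample standard errors by a factor of more than four.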
The estimates of the intraclass correlations recorded in Table 1 at the institution and course levels provide evidence of limited variability between institutions, and more substantial variability between courses. In addition, the substantial design effects merit serious consideration by those who seek to demonstrate CEQ-related differences between institutions or courses because they have a marked effect on certain aspects of significance testing.
Table 1: Intraclass correlations (ρ) and design effects (deff), with and without the finite population correction (fpc), for the CEQ scales at the institution (N = 41) and course (N = 168) levels
The intraclass correlations, although small, are of sufficient magnitude to warrant an examination of differences between institutions. The present investigation seeks to identify institutional factors that account for the variability in CEQ scores between universities. Subsequent analyses might well be undertaken to identify general factors that would account for variability between courses.
If the aim of any CEQ analysis is, simply, to describe respondents' opinions using measures of central tendency, measures of dispersion and respondent numbers then the potential for bias that exists when response rates are low is not an issue. On the other hand, if it is intended that statistics generated from the survey are used to draw inferences about the quality of teaching within and between courses and institutions then more rigorous collection and analytical procedures are required. If, for example, subsequent funding were to be based, in part, on the results of CEQ analyses it would be essential to replace the current collection of voluntary responses with an appropriate sampling design.
At present it is understood that a score of 100 representing strongly agree is more than a score of 50 representing agree, and that the score of 50 is more than the score of zero representing ambivalence, apathy, inapplicability or uncertainty. However, it is not known whether the interval between strongly agree and agree is the same as the interval between agree and uncertain. Item response theory (IRT) techniques convert ordinal responses to interval measures of respondents' opinions. Once this has been accomplished, the magnitudes of the differences between CEQ scores can be specified more precisely on linear and interval scales.
Initial investigations of CEQ data from the 1996 survey using the computer program ConQuest (Wu, Adams & Wilson, 1997) were directed towards assessing the usefulness of two derivatives of the Rasch model (Andrich, 1978; Masters, 1982). The intention was to identify the most appropriate model to use with the data in order to minimise the level of misclassification and reduce measurement error. In addition, one of the outcomes of this approach is the generation of continuous interval measures of respondent opinion that would enable statistically acceptable comparisons of the opinions of respondents from different fields of study and different institutions, if comparisons of this nature were considered appropriate.
The two item response models that may be appropriate for use with the CEQ data are: (i) a Rating Scale model (Andrich, 1978), and (ii) a Partial Credit model (Masters, 1982). A characteristic of the Rating Scale model is that the increment in the underlying trait required to choose between one score category and the next (e.g. from agree to strongly agree) is the same for all items. However, this increment may vary between the different score categories. In the Partial Credit model the increment in the underlying trait required to choose between one score category and the next can vary between both items and categories.
Johnson and Congdon (1997) analysed the opinions of the 55,570 bachelor degree respondents to the 1996 CEQ who answered all items. They reached three conclusions. First, the computer program ConQuest readily enables the ordinal scale CEQ scores to be converted to interval scale measures that are more appropriate for subsequent parametric analyses. Second, the IRT approach confirms the findings of classical factor analyses that the CEQ data are best considered as five scales rather than a single measure. Third, a Partial Credit model fits the CEQ data better than does a Rating Scale model.
Finally, if the traditional ordinal scale estimates of the Good Teaching Scale (GTS) means are compared with interval scale estimates derived from a Partial Credit model, the results suggest that, while it may be more appropriate to base statistical analyses on interval scale measures, the assumption of linearity does not seriously distort respondent opinion at the institutional level.
The 10 DETYA defined broad fields of study are agriculture, architecture, business studies, education, engineering, health, humanities and social sciences, law, science and veterinary science. The courses of study were classified in terms of these broad fields rather than the possible maximum of 188 specific fields of study for two main reasons: (i) simplicity of analysis and presentation of results, and (ii) at the specific field level of aggregation, data were not available from some institutions.
In order to use categorical data, such as fields of study, in regression analyses, either: (i) dummy coding, (ii) effect coding, or (iii) orthogonal coding is employed. Dummy coding is the simplest of the three methods, and Pedhazur (1982: 274-329) demonstrates that the end results of multiple regression analyses of the same data coded by each of these techniques are identical. Consequently, in the analyses that follow, dummy variables are used to control statistically for the known CEQ opinion differences between respondents from the humanities, and engineering or architecture respondents in particular. For a fully specified model, N-1 dummy variables are included, where N is the number of categories (10 in this example). Nine of the 10 broad fields are included; the humanities and social sciences category is excluded. Thus, the coefficients in this section of the model indicate opinion differences between the various broad fields and the excluded category, humanities and the social sciences.
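A minimal sketch of N-1 dummy coding for the 10 broad fields, with humanities and social sciences as the excluded reference category (the field labels are abbreviated and the helper function is illustrative, not part of the original analysis):

```python
FIELDS = ["agriculture", "architecture", "business", "education",
          "engineering", "health", "humanities", "law", "science",
          "veterinary science"]
REFERENCE = "humanities"  # excluded category: coded as all zeros

def dummy_code(field: str) -> list[int]:
    """Return N-1 (here 9) indicator variables. Because the reference
    category is all zeros, each regression coefficient is a contrast
    between that field and humanities and social sciences."""
    return [1 if field == f else 0 for f in FIELDS if f != REFERENCE]

print(dummy_code("engineering"))  # a single 1 in the engineering position
print(dummy_code("humanities"))   # all zeros: the reference category
```

This is why only nine coefficients appear in the model: the tenth category is fully determined once the other nine indicators are known.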
The initial analyses focus on a two-level model. The nine dummy course variables and the five student background variables (sex, age, mode of attendance, non-English speaking background, and permanent residence) are grouped at Level 1.
In the analysis of the first model reported in Table 2a, Good Teaching Scale score is the dependent variable. At the student level, the estimated coefficients for sex, age, and permanent residence are all positive, indicating that older females whose permanent residence is overseas tend to hold more favourable Good Teaching Scale views. The coefficient for mode of attendance is negative, indicating that part-time or external graduates express more negative views than do their colleagues who attend on a full-time basis. Similarly, the coefficient for non-English speaking background is slightly negative (NESB = 1, ESB = 2), indicating that NESB graduates tend to hold slightly more favourable Good Teaching Scale opinions than their colleagues from an English speaking background.
The levels of significance for the various broad fields are not presented because, in this two-level model, the data have been disaggregated from the group to the individual level and the estimates of error have been calculated with an inappropriate number of cases. Nevertheless, it should be noted that the coefficients are all negative, indicating that the Good Teaching Scale opinions of students from the nine broad fields included in the analysis tend to be more negative than the corresponding opinions of students from the humanities and social sciences broad field.
At the institution level, population size is a factor that influences CEQ opinions. The differences are statistically significant for three of the five scales and the Overall Satisfaction Index. The actual values of the size coefficients are of the order of ±0.0002 depending on the outcome variable, but these coefficients must be multiplied by the institutional populations to gain an appreciation of the magnitude of the size effect.
Table 2a: The influence of student and institutional factors on the Good Teaching Scale and Clear Goals & Standards Scale mean scores
Table 2b: The influence of student and institutional factors on the AWS and AAS mean scores
Populations were in the range 31-39,742 so that the effect of university size, or factors associated with size, may be associated with differences of up to eight points on the CEQ scale. With the exception of the Appropriate Workload Scale, these analyses suggest that, at the institution level, bigger is not necessarily better when it comes to CEQ views.
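The eight-point figure can be checked directly from the coefficient magnitude and population range reported above:

```python
coef = 0.0002                    # reported order of magnitude of the size coefficient
smallest, largest = 31, 39_742   # reported range of institutional populations

# Maximum difference in predicted CEQ score attributable to size alone:
max_difference = coef * (largest - smallest)
print(f"{max_difference:.1f}")   # roughly eight points on the CEQ scale
```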
Finally, after controlling for the Level 1 predictors respondent age, sex, mode of attendance, NESB, permanent residence and field of study there is evidence that some CEQ opinions are more favourable in institutions that allocate a higher proportion of their budgets to student services. The relationship is positive and statistically significant for the Good Teaching Scale, Clear Goals and Standards Scale, Generic Skills Scale and the Overall Satisfaction Index. On the other hand, the relationship is negative for the Appropriate Assessment Scale.
Table 2c: The influence of student and institutional factors on the Overall Satisfaction Index mean scores (** = significant at 0.05 level; * = significant at 0.10 level)
Other factors being equal it is estimated that a one percentage point increase in expenditure on student services is likely to be associated with a 0.51 point increase in the university mean Good Teaching Scale score and a 0.92 point increase in the mean Overall Satisfaction Index score. The influence of this predictor on student opinion may seem rather modest at first glance. However, the minimum and maximum proportions of total expenditure allocated to student services varied between one and 18 percentage points. Other factors remaining equal it can be appreciated that differences in institutional means of up to 8-9 points on the Good Teaching Scale and 15-16 points on the Overall Satisfaction Index may be linked to expenditure decisions taken within individual universities.
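The institutional differences quoted above follow from the coefficients and the 17 percentage point spread in the student services share of expenditure:

```python
gts_coef, osi_coef = 0.51, 0.92   # CEQ points per percentage point of expenditure
expenditure_range = 18 - 1        # min and max shares of total expenditure (per cent)

print(f"GTS: up to {gts_coef * expenditure_range:.2f} points")  # 8.67
print(f"OSI: up to {osi_coef * expenditure_range:.2f} points")  # 15.64
```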
Moreover, it can be seen from the null and final model variance estimates in Table 3 that university population size and the expenditure variables in the final model produce reductions of 14-67 per cent in the unexplained variance at Level 2, depending on the outcome variable.
|Variation between university means (τ00, null model)|38.33|12.87|20.50|32.82|8.93|27.88|
|Variation between university means (τ00, final model)|12.77|7.55|9.93|12.30|7.69|9.69|
|Variance explained by Level 2 variables (%)|67|41|52|63|14|65|
In the case of the Overall Satisfaction Index, for example, the reduction in unexplained variance at Level 2 is given by:
100 x (27.9 - 9.7)/27.9 = 65%
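Applying the same formula to all six columns of Table 3 reproduces the percentages in its bottom row:

```python
null_var  = [38.33, 12.87, 20.5, 32.82, 8.93, 27.88]  # tau00, null model
final_var = [12.77, 7.55, 9.93, 12.3, 7.69, 9.69]     # tau00, final model

# Proportional reduction in unexplained Level 2 variance, per outcome:
explained = [round(100 * (n - f) / n) for n, f in zip(null_var, final_var)]
print(explained)  # [67, 41, 52, 63, 14, 65]
```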
Clearly, there is not much between-university variation to explain, but variation in expenditure accounts for much of it. The additional student services financed by within-university budget decisions can be linked to more favourable GTS and OSI opinions.
These results might have been hypothesised in advance. A high level of support for the library relates to a high degree of satisfaction on the Good Teaching Scale. Stronger support for academic salaries and research is related to higher levels of general satisfaction on the Overall Satisfaction Index. University administrators should be mindful of these findings. The results are clear and the implications obvious. Any reduction in support for student services is likely to lead to a drop in student satisfaction. In turn, it could be hypothesised that this would contribute to a decline in enrolments.
However, most analyses to date have failed to account for the nested nature of the CEQ data. Multilevel procedures have not been employed and, as a consequence, inappropriate tests of significance and biased estimates of effects are computed. This article has used a multilevel approach to examine the effect of characteristics of institutions on students' views as measured by the CEQ. While it would not be sound to draw immoderate conclusions from the findings summarised here, evidence is presented of what would appear to be significant relationships between the ways in which institutions spend their monies and the views of their graduates. It would appear that both the size of the institution and the allocation of resources to student services influence graduates' views, particularly with respect to the Good Teaching Scale and the Overall Satisfaction Index. The relationships would appear both meaningful and soundly based.
Anderson, G.J. & Walberg, H.J. (1974). Assessing Classroom Learning Environment. In K. Marjoribanks (ed.) Environments for Learning. NFER, Windsor.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.
Bryk, A.S. & Raudenbush, S.W. (1992). Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, Beverly Hills, California.
Bryk, A.S., Raudenbush, S.W. & Congdon, R. (1996). HLM: Hierarchical Linear and Non Linear Modeling with HLM/2L and HLM/3L Programs. Scientific Software, Chicago, Illinois.
Comber, L.C. & Keeves, J.P. (1973). Science Education in Nineteen Countries: An Empirical Study. Almqvist and Wiksell, Stockholm.
DETYA (1998). Selected Higher Education Finance Statistics 1997. AGPS, Canberra.
Entwistle, N.J. & Ramsden, P. (1983). Understanding Student Learning. Croom Helm, London.
Fraser, B.J. (1993). Context: classroom and school climate. In D. Gabel (ed.) Handbook of Research in Science Teaching and Learning. Macmillan, New York.
Johnson, T.G. & Congdon, P. (1997). The Course Experience Questionnaire Presentation (Draft Report). ACER, Camberwell.
Keeves, J.P. (1967). Students' attitudes concerning mathematics. Unpublished M.Ed. Thesis, University of Melbourne.
Keeves, J.P. (ed.) (1992). The IEA Study of Science III: Changes in Science Education and Achievement: 1970-1984. Pergamon Press, Oxford.
Kish, L. (1957). Confidence intervals for clustered samples. American Sociological Review, 22, 154-165.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Pace, C.R. & Stern, G.C. (1958). An approach to the measurement of psychological characteristics of college environments. Journal of Educational Psychology, 49, 269-277.
Pedhazur, E.J. (1982). Multiple Regression in Behavioral Research: Explanation and Prediction (2nd ed.). Holt, Rinehart and Winston, New York.
Ramsden, P. (1991). Report on the Course Experience Questionnaire trial. In: R.Linke Performance Indicators in Higher Education, Vol 2. AGPS, Canberra.
Ramsden, P. (1998). The CEQ: Looking back – and forward. Paper presented at the Course Experience Questionnaire Symposium, 1998, University of New South Wales, Sydney, 29-30 November, 1998.
Rasch, G. (1980). Probabilistic models of some intelligence and attainment tests. University of Chicago Press, Chicago. (Originally published by The Danish Institute for Educational Research, Copenhagen, 1960).
Stern, G.C. (1970). People in Context. Wiley, New York.
Trickett, E.J. & Moos, R.H. (1973). Social environments of junior high and high school classrooms. In K. Marjoribanks (ed.) Environments for Learning. NFER, Windsor.
Wolf, R.M. (1967). Construction of descriptive and attitude scales. In T. Husen (ed.) International Study of Achievement in Mathematics. Almqvist and Wiksell, Stockholm.
Wu, M., Adams, R.J. & Wilson, M.R. (1997). ConQuest: Generalised Item Response Modelling Software. ACER, Camberwell.
Authors: Dr Trevor Johnson is a Research Fellow at the Australian Council for Educational Research, Camberwell, Victoria. His research interests include sampling and educational research methodology.
Dr John Keeves is a Professorial Fellow within the School of Education, the Flinders University of South Australia, Bedford Park. His research interests are wide and varied. From a very strong initial interest in Mathematics and Science Education, he has extended his field of inquiry in these areas from Australia to a cross-national and comparative perspective. As a consequence, he has developed a keen interest in educational research methodology and measurement.
Please cite as: Johnson, T. G. and Keeves, J. P. (2000). Spending on the selling of wisdom. Issues in Educational Research, 10(1), 39-54. http://www.iier.org.au/iier10/johnson.html
© 2000 Issues in Educational Research
Last revision: 4 Sep 2013. URL: http://www.iier.org.au/iier10/johnson.html