Issues In Educational Research, 10(1), 2000, 39-54.

Spending on the selling of wisdom

Trevor G Johnson
Australian Council for Educational Research

John P Keeves
Flinders University

This paper applies multilevel analytic procedures to data from the 1997 Course Experience Questionnaire and Selected Higher Education Finance Statistics 1997 to demonstrate a link between expenditure decisions taken within the 40 DETYA funded universities and graduates' perceptions of the quality of teaching in, and overall satisfaction with, their recently completed courses. There is little between university variation to explain but variation in expenditure accounts for approximately 50 per cent of it. The additional student services financed by within university budget decisions can be linked to more favourable Good Teaching Scale and Overall Satisfaction Index opinions.

In 1997 the total operating expenses for the 40 DETYA funded universities in Australia were $7.7 billion (DETYA, 1998). By any standards the tertiary education sector is big business, and the Federal Government is rightly concerned with evaluation studies of how the universities spend the very substantial sums of money that they receive in the form of direct grants from government sources. Furthermore, since such monies are combined with the payments made directly by students, the income received from endowments and investments, and the tax-free donations provided by individuals and organisations, there is concern for the major items of expenditure incurred by universities.

Of particular interest in the late 1990s is the provision of services to assist students in their university life. The present government is now challenging the compulsory nature of the student union fee, and some universities seem reluctant to fund many student services from the monies they collect directly from students.

The tertiary institutions are already providing some services directly to students out of their overall budgets. There is, however, no routine evaluation of the general effectiveness of the partitioning of the overall budget into different line items, although, for the purposes of international advertising, some of the more entrepreneurial institutions are now seeking Standard and Poor's credit ratings of their operating performance. Furthermore, both DETYA and the Australian Vice-Chancellors' Committee (AVCC) support the regular evaluation of the effectiveness of the teaching services through the analysis of national data collected by the Graduate Destination Survey and Course Experience Questionnaire (CEQ).

Every year since 1993, recent graduates have been invited by their university and the Graduate Careers Council of Australia (GCCA) to answer a questionnaire summarising their recently completed course experiences. The results are reported in newspapers and student guides, and used in institutional publicity, with strong claims being made by those universities that are apparently successful, and implied reprimands for institutions that are apparently not successful. The weaknesses of the current version of the questionnaire are recognised and development of the instrument is an ongoing process. Nevertheless, the CEQ is the only national survey of recent graduates' perceptions of their course experiences. It is a source of valuable information.

The purpose of this article is to explore the possibility of using this instrument to evaluate the effect of variation in budget expenditures on graduates' CEQ opinions. University graduates are not the only client group in a university. However, they are a major group and their views regarding the conduct of university affairs must be given some credence.

It is clearly not possible to set up an experimental study to examine the effects of variation in university budgetary policies on graduates' views. However, the number of tertiary institutions in Australia is sufficient to use the natural variation that exists in the allocation of monies to particular budget lines across universities to explain the variation both between and within institutions on CEQ respondents' perceptions. While strict causality linking budgetary policies and graduate opinions cannot be claimed, the undertaking of regression analyses in order to account for variation necessarily implies the construction of a model in which causal influences are hypothesised. The analysis tests whether the model hypothesised is consistent with the observed data, and leads to the estimation of the parameters of the model. Thus, the finding of significant relationships would provide evaluative evidence of the effects of different financial policies.

Very little work has been done in Australia using data collected through the employment of view scales as measures of specific criterion variables or educational outcomes. Consequently, the procedures employed in the analyses presented in this paper may be considered by some to be unusual. The meaningful nature and strength of the resultant findings should demonstrate the suitability of the approach employed. Although view scales have a history going back more than 60 years, this application of them has not attracted wider attention; the reasons are discussed in the following section.

View scales and their use

The Eight-Year Study into the introduction of democratic processes and the removal of authoritarian control in United States high schools demonstrated that the educational climate of a school could be assessed using information obtained from students by means of descriptive or view scales (Aikin, 1942). Subsequently, Pace and Stern (1958) developed the College Characteristics Index and used it to evaluate colleges in the Bennington study. Later, Stern (1970) reported on the use of the High School Characteristics Index, and an Organizational Climate Index. Stern (1970: 27) recognised that any estimates of the reliability of such instruments would be adversely affected by high degrees of agreement by the student body within an institution. The problem demanded a multilevel analysis of data undertaken at both within group and between group levels. Unfortunately, lack of suitable software and readily accessible computing power meant that analyses of this nature were not possible at that time.

View scales have been used in numerous other investigations (e.g., the First IEA Mathematics Study (Wolf, 1967; Keeves, 1968), the First and Second IEA Science Studies (Comber and Keeves, 1973; Keeves, 1992), the Harvard Project Physics Study (Anderson and Walberg, 1974), and the series of classroom environmental measures developed by Trickett and Moos (1973), and Fraser (1993)). However, these applications of view scales all encountered serious problems in data analysis because multilevel procedures were not available. It is only since the development of multilevel methods that the appropriate use of view scales at the between- and within-institution levels is starting to emerge. There are two pre-requisites for the use of view scales. First, there is a need to ground the development of view scales on established theory as was done by Pace and Stern (1958). Second, there is a need for organisational researchers to recognise that the responses of members of organisations to such instruments can be used to investigate organisational characteristics.

Origin and development of the CEQ

The CEQ was developed to measure graduate satisfaction with the quality of the teaching and learning experiences encountered while undertaking their recently completed tertiary courses. The first version of the questionnaire was constructed by Ramsden (1991) based on work undertaken at the University of Lancaster in the 1980s into student approaches to learning. The Approaches to Studying Inventory (Entwistle and Ramsden, 1983) sought to measure different forms of motivation, study methods and attitudes. However, it did not start from an analysis of teaching quality or institutional support that would seem necessary for the formal evaluation of courses.

The items employed in the CEQ are descriptive or view scale items; they are not attitudinal in nature. Nevertheless, Ramsden (1998) has pointed out that quality teaching creates the conditions under which students develop effective learning strategies, and this leads to higher levels of student satisfaction, the characteristic measured by the CEQ. The version of the instrument used in this study comprised 24 items. Item 16 did not cluster unambiguously and was dropped from the analyses. The final item assessed respondents' overall levels of satisfaction with their recently completed courses. The remaining items formed five dimensions or scales.

The CEQ has been developed within a research tradition that appears to be unaware of the problems associated with single level analyses of educational data collected under natural conditions. Even though there is a clear link between these levels, it is argued that the questionnaire is designed and employed to assess the effects of courses on individuals, rather than the characteristics of individuals. However, there is ample evidence in some publications and some university publicity that the CEQ data are being used to assess outcomes at the institutional level as well as at the course and student level. This use of the CEQ data does not recognise that student level analyses provide limited information with respect to discrimination between courses and institutions, and that multilevel analyses should be employed. A well designed instrument should also show a high degree of homogeneity of responses within courses and a high degree of heterogeneity of responses between courses and institutions for effective multilevel analysis.

Erroneous errors

There is another important reason why it is preferable to employ a multilevel approach. Consider the problem of estimating standard errors. Kish (1957) discussed the consequences of applying the usual standard error formulas found in textbooks to data obtained from complex samples and concluded that:

In the social sciences the use of SRS (simple random sample) formulas on data from complex samples is now the most frequent source of gross mistakes in the construction of confidence statements and tests of hypotheses. (Kish, 1957: 157)

Computation of standard errors would not be a problem if the CEQ data were collected in accordance with a known sampling design and without losses. However, they are not. The dilemma is that we have neither the entire population nor a genuine sample. Nevertheless, from a national perspective, and within universities, there is growing interest in comparisons of CEQ data. Decisions regarding the statistical significance of differences are required, and it is essential that the methods used to estimate standard errors are soundly based.

CEQ analysts who believe that there is an underlying structure to the data (e.g. students grouped in courses, courses grouped in institutions) will, depending on their purposes, employ programs like WesVarPC, Stata, or multilevel software such as HLM or MLwiN to estimate the statistics of interest and compute appropriate standard errors. Applications of the simple random sample formulae, employed in software such as SPSS, SAS or Excel, to grouped data underestimate standard errors and confidence intervals. As a consequence, it may be concluded that significant differences exist where there are none.

The design effect (deff) is an indicator of the influence of the sample design on error estimation. It is defined as the ratio of the variance estimate of a complex sample to the variance estimate of a simple random sample of the same size. The value of the square root of deff (usually called deft) is the factor by which sampling errors calculated using textbook simple random sample formulae must be multiplied in order to obtain estimates that reflect the clustering effect of students in courses or universities. Table 1 records the intraclass correlation coefficients (ρ) and deffs for the CEQ scales at the course and institution levels. Data from the 1997 questionnaire are used and, irrespective of whether a finite population correction (fpc) is applied or not, the magnitudes of the deffs highlight the need for the more appropriate error estimates computed by multilevel software.

The estimates of the intraclass correlations recorded in Table 1 at the institution and course levels provide evidence of limited variability between institutions, and more substantial variability between courses. In addition, the substantial design effects merit serious consideration by those who seek to demonstrate CEQ-related differences between institutions or courses because they have a marked effect on certain aspects of significance testing.

Table 1: Intraclass correlations at institutional and course levels

Statistic          Good Teaching   Clear Goals & Standards

Institutions (N = 41)
ρ                    0.02    0.01    0.01    0.02    0.01    0.01
deff (fpc=1)         35.7    17.2    26.3    25.2    16.6    25.4
deff (fpc=0.4)       14.3     6.9    10.5    10.1     6.6    10.2
Courses (N = 168)
ρ                    0.08    0.04    0.09    0.14    0.04    0.04
deff (fpc=1)         94.5    34.2    71.2   147.0    39.1    34.5
deff (fpc=0.4)       37.8    13.7    28.5    58.8    15.7    13.8

fpc = finite population correction
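
The relationship between deff, deft and the fpc can be illustrated with a short sketch. It copies the institution-level deff values from Table 1; the function names are hypothetical, and the assumption that the fpc simply multiplies the variance ratio is inferred from the table (the fpc = 0.4 row matches 0.4 times the fpc = 1 row to within rounding), not stated by the authors.

```python
import math

# deff values (fpc = 1) copied from the Institutions rows of Table 1.
DEFF_INSTITUTIONS = [35.7, 17.2, 26.3, 25.2, 16.6, 25.4]

def deft(deff):
    """Square root of the design effect: the multiplier for an SRS standard error."""
    return math.sqrt(deff)

def adjusted_se(srs_se, deff, fpc=1.0):
    """Inflate a simple-random-sample standard error for clustering.

    Assumption: the fpc scales the variance, so it enters under the square root.
    """
    return srs_se * math.sqrt(deff * fpc)

# Scaling each deff by fpc = 0.4 reproduces the second institution row of Table 1.
deff_with_fpc = [round(d * 0.4, 1) for d in DEFF_INSTITUTIONS]
print(deff_with_fpc)

# A textbook standard error of 0.5 points becomes roughly 3 points when deff = 35.7.
print(round(adjusted_se(0.5, 35.7), 2))
```

With a deft of about 6, a difference that looks highly significant under simple random sample formulae can fall well inside the corrected confidence interval.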

The intraclass correlations, although small, are of sufficient magnitude to warrant an examination of differences between institutions. The present investigation seeks to identify institutional factors that account for the variability in CEQ scores between universities. Subsequent analyses might well be undertaken to identify general factors that would account for variability between courses.

Data collection

In 1997 there were a total of 155,137 higher education award course completions. Questionnaires were mailed to 149,768 recent graduates (96.5 per cent of the completions population) and 97,437 (65.1 per cent of those contacted, 62.8 per cent of the target population) responded. The CEQ data are not collected in accordance with the constraints of a sampling design. They are the opinions of recent graduates who have responded voluntarily to their university's invitation to participate in the survey. Tardy graduates generally receive two or three mailed follow-up messages, and some institutions issue final reminders by telephone. However, no large-scale survey ever obtains a complete response from its target population. The AVCC and GCCA consider that a response rate of 70 per cent is desirable and achievable. But, this would require additional resources. Nevertheless, it should not be overlooked that the current CEQ data represent the opinions of more than 60 per cent of the entire 1997 graduate population.
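
The percentages quoted above follow directly from the three counts, as this small check shows. The variable and function names are illustrative only.

```python
# Counts reported for the 1997 survey.
completions = 155_137   # higher education award course completions
mailed = 149_768        # questionnaires mailed
responded = 97_437      # questionnaires returned

def percentage(part, whole):
    """Share of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

coverage = percentage(mailed, completions)          # share of completions contacted
rate_of_contacted = percentage(responded, mailed)   # response rate among those contacted
rate_of_target = percentage(responded, completions) # response rate for the target population

print(coverage, rate_of_contacted, rate_of_target)
```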

If the aim of any CEQ analysis is, simply, to describe respondents' opinions using measures of central tendency, measures of dispersion and respondent numbers then the potential for bias that exists when response rates are low is not an issue. On the other hand, if it is intended that statistics generated from the survey are used to draw inferences about the quality of teaching within and between courses and institutions then more rigorous collection and analytical procedures are required. If, for example, subsequent funding were to be based, in part, on the results of CEQ analyses it would be essential to replace the current collection of voluntary responses with an appropriate sampling design.

Nature of the data

A second issue for consideration is the nature of the data collected. Currently, the CEQ raw data are collected on a 1-5 point scale, and reported using a -100 to +100 point scale to represent opinions ranging from strongly disagree to strongly agree respectively. The rationale underlying the transformation is that positive mean scores reflect general agreement, and negative mean scores reflect general disagreement, with the items forming the five CEQ scales. The scales are ordinal in nature and it is assumed that equal differences in scale scores represent equal increments in opinion change (i.e. the scales are linear). Some critics of the current methods question the validity of this assumption, and argue that item response theory (IRT) techniques should be applied to the CEQ analyses.

At present it is understood that a score of 100 representing strongly agree is more than a score of 50 representing agree, and that the score of 50 is more than the score of zero representing ambivalence, apathy, inapplicability or uncertainty. However, it is not known whether the interval between strongly agree and agree is the same as the interval between agree and uncertain. IRT techniques convert ordinal responses to interval measures of respondents' opinions. Once this has been accomplished the magnitudes of the differences between CEQ scores can be specified more precisely on linear and interval scales.
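
The linear recoding described here can be sketched as follows. The text gives the scores for strongly agree (100), agree (50) and the midpoint (0); the values of -50 and -100 for the two disagree categories are assumed by symmetry, and the function names are hypothetical.

```python
def rescale(raw_response):
    """Map a raw 1-5 response onto the reported scale, assuming equal 50-point steps."""
    if raw_response not in (1, 2, 3, 4, 5):
        raise ValueError("raw response must be an integer from 1 to 5")
    return (raw_response - 3) * 50

def scale_mean(raw_responses):
    """Mean reported score for a set of raw responses."""
    scores = [rescale(r) for r in raw_responses]
    return sum(scores) / len(scores)

# One respondent each choosing uncertain, agree and strongly agree.
print(scale_mean([3, 4, 5]))
```

The equal 50-point steps are exactly the linearity assumption that the IRT critics question: the recoding treats the move from uncertain to agree as the same size as the move from agree to strongly agree.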

Initial investigations of CEQ data from the 1996 survey using the computer program Conquest (Wu, Adams & Wilson, 1997) were directed towards assessing the usefulness of two derivatives of the Rasch model (Andrich, 1978; Masters, 1982). The intention was to identify the most appropriate model to use with the data in order to minimise the level of misclassification and reduce measurement error. As well, one of the outcomes of this approach is the generation of continuous interval measures of respondent opinion that would enable statistically acceptable comparisons of the opinions of respondents from different fields of study and different institutions, if comparisons of this nature were considered appropriate.

The two item response models that may be appropriate for use with the CEQ data are: (i) a Rating Scale model (Andrich, 1978), and (ii) a Partial Credit model (Masters, 1982). A characteristic of the Rating Scale model is that the increment in the underlying trait required to choose between one score category and the next (e.g. from agree to strongly agree) is the same for all items. However, this increment may vary between the different score categories. In the Partial Credit model the increment in the underlying trait required to choose between one score category and the next can vary between both items and categories.
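
The contrast between the two models can be made concrete with a minimal sketch. It uses the standard adjacent-category formulation in which the probability of a score is proportional to the exponential of the cumulative sum of (trait minus step difficulty). The function names, trait value and step values are hypothetical illustrations; they are not CEQ estimates and not Conquest output.

```python
import math

def category_probabilities(theta, step_difficulties):
    """Partial Credit model: probabilities of scores 0..m for one item.

    step_difficulties[k] is the difficulty of moving from score k to score k + 1.
    """
    cumulative = [0.0]  # score 0 corresponds to an empty sum
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def rating_scale_steps(item_location, shared_steps):
    """Rating Scale model: every item uses the same step pattern, shifted by its location."""
    return [item_location + tau for tau in shared_steps]

# Rating Scale: two items share one step pattern and differ only in location.
shared = [-1.5, -0.5, 0.5, 1.5]
item_a = rating_scale_steps(0.0, shared)
item_b = rating_scale_steps(0.8, shared)

# Partial Credit: each item may have its own unevenly spaced steps.
item_c = [-2.0, -0.2, 0.1, 1.9]

for steps in (item_a, item_b, item_c):
    print([round(p, 3) for p in category_probabilities(0.5, steps)])
```

Under this parameterisation the Rating Scale model is the special case of the Partial Credit model in which all items share the same step spacing.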

Johnson and Congdon (1997) analysed the opinions of the 55,570 bachelor degree respondents to the 1996 CEQ who answered all items. They reached three conclusions. First, the computer program Conquest readily enables the ordinal scale CEQ scores to be converted to interval scale measures that are more appropriate for subsequent parametric analyses. Second, the IRT approach confirms the findings of classical factor analyses that the CEQ data are best considered as five scales rather than a single measure. Third, a Partial Credit model fits the CEQ data better than does a Rating Scale model.

Finally, if the traditional ordinal scale estimates of the Good Teaching Scale (GTS) means are compared with interval scale estimates derived from a Partial Credit model, the results suggest that, while it may be more appropriate to base statistical analyses on interval scale measures, the assumption of linearity does not seriously distort respondent opinion at the institutional level.

The data and the analytic procedures

The analyses reported in this paper were undertaken using the personal computer program HLM 4.01 (Bryk, Raudenbush and Congdon, 1996). Other software packages that could be used include MLwiN 1.02 (Rasbash, Healy, Browne and Cameron, 1998) and Proc Mixed, a SAS procedure. All three are designed to take account of the multilevel nature of the data, that is, the clustering of students within courses and within universities.

The 10 DETYA defined broad fields of study are agriculture, architecture, business studies, education, engineering, health, humanities and social sciences, law, science and veterinary science. The courses of study were classified in terms of these broad fields rather than the possible maximum of 188 specific fields of study for two main reasons: (i) simplicity of analysis and presentation of results, and (ii) at the specific field level of aggregation, data were not available from some institutions.

In order to use categorical data, such as fields of study, in regression analyses either: (i) dummy coding, (ii) effect coding, or (iii) orthogonal coding is employed. Dummy coding is the simplest of the three methods and Pedhazur (1982: 274-329) demonstrates that the end results of multiple regression analyses of the same data coded by each of these techniques are identical. Consequently, in the analyses that follow, dummy variables are used to control statistically for the known CEQ opinion differences between respondents from the humanities, and engineering or architecture respondents in particular. For a fully specified model N-1 dummy variables can be included in a model where N is the maximum number of categories (10 in this example). Nine of the 10 broad fields are included. The humanities and social sciences category is excluded. Thus, the coefficients in this section of the model indicate opinion differences between the various broad fields and the excluded category, humanities and the social sciences.
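
The dummy coding described above can be sketched in a few lines. The field names come from the list of 10 broad fields; the function and constant names are hypothetical.

```python
FIELDS = [
    "agriculture", "architecture", "business studies", "education",
    "engineering", "health", "humanities and social sciences",
    "law", "science", "veterinary science",
]
REFERENCE = "humanities and social sciences"  # the excluded category

# N - 1 = 9 dummy variables, one for every field except the reference.
DUMMY_FIELDS = [f for f in FIELDS if f != REFERENCE]

def dummy_code(field):
    """Return the nine 0/1 indicators for a respondent's broad field of study."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field}")
    return [1 if field == f else 0 for f in DUMMY_FIELDS]

print(dummy_code("engineering"))
print(dummy_code(REFERENCE))  # all zeros: the baseline against which coefficients are read
```

Because the reference category is coded as all zeros, each field coefficient in the fitted model is the estimated difference between that field and the humanities and social sciences.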

The initial analyses focus on a two-level model. The nine dummy course variables and the five student background variables that follow are grouped at Level 1:

  1. sex (1 = males, 2 = females)
  2. age (1 = under 25, 2 = 25-34, 3 = 35 and over)
  3. attendance (1 = full-time, 2 = part-time, 3 = external)
  4. non-English speaking background (NESB, 1 = yes, 2 = no)
  5. country of permanent residence (1 = Australia, 2 = overseas).
At Level 2, the eight institution level variables included in the model are:
  1. size (student population: range = 31-39,742)
  2. salaries/research - proportion of budget allocated to academic salaries and research: 31-78%
  3. library - proportion of budget allocated to library: 3-8%
  4. academic support - proportion of budget allocated to academic support: 1-20%
  5. student services - proportion of budget allocated to student services: 1-18%
  6. public services - proportion of budget allocated to public services: 0-8%
  7. buildings and grounds - proportion of budget allocated to buildings and grounds: 3-12%
  8. administration - proportion of budget allocated to administration: 8-30%.
The 59,594 bachelor degree respondents in this analysis were located in 10 broad fields of study within 39 higher education institutions. In view of the underlying structure of the data it may be argued that a three-level model is more appropriate because it is less likely to under-estimate standard errors at Level 2. A three-level analysis with respondent variables at Level 1, dummy variables representing nine of the 10 broad fields at Level 2 (as before, the humanities and social sciences broad field was omitted from the analysis), and institution variables at Level 3 was carried out. A comparison of the standard errors computed by the approaches indicated that the choice of a two- or three-level model is considerably less important than the single-level or multilevel decision. Consequently, to save space, only the results of the two-level analyses are reported in this paper.
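
For readers unfamiliar with the approach, the two-level model can be written out in conventional hierarchical linear model notation. This is a sketch under stated assumptions: the symbols are the usual textbook ones and are not taken from the paper, i indexes respondents and j institutions, the 14 Level 1 predictors are the nine field dummies plus the five background variables, and the 8 Level 2 predictors are the institution variables listed above. The paper does not state whether the Level 1 slopes were treated as fixed or random, so only the intercept equation is shown at Level 2.

```latex
\begin{align}
\text{Level 1:}\quad Y_{ij} &= \beta_{0j} + \sum_{q=1}^{14} \beta_{qj} X_{qij} + r_{ij},
  & r_{ij} &\sim N(0, \sigma^{2}) \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \sum_{s=1}^{8} \gamma_{0s} W_{sj} + u_{0j},
  & u_{0j} &\sim N(0, \tau_{00})
\end{align}
```

Here the variance of the institution-level residual, written \tau_{00}, is the between-university variation that the institution variables are asked to explain.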

The analyses and the results

In order to conserve space an explanation of the model building process is omitted. No attempt is made to construct a parsimonious model for each of the CEQ indicators. For the purposes of comparison all Level 1 and Level 2 variables listed above are included for every scale and the Overall Satisfaction Index. The six analyses were carried out using HLM and the results are summarised in Tables 2a, 2b and 2c. Asterisks denote statistically significant differences.

In the analysis of the first model reported in Table 2a, Good Teaching Scale score is the dependent variable. At the student level, the estimated coefficients for sex, age, and permanent residence are all positive, indicating that older females whose permanent residence is overseas tend to hold more favourable Good Teaching Scale views. The coefficient for mode of attendance is negative, indicating that part-time or external graduates express more negative views than do their colleagues who attend on a full-time basis. The coefficient for non-English speaking background (NESB) is also slightly negative. Because the variable is coded NESB = 1 and ESB = 2, this indicates that NESB students tend to hold slightly more favourable Good Teaching Scale opinions than their colleagues from an English speaking background.

The levels of significance for the various broad fields are not presented because, in this two level model, the data have been disaggregated from the group to the individual level and the estimates of error have been calculated with an inappropriate number of cases. Nevertheless, it should be noted that the coefficients are all negative indicating that the Good Teaching Scale opinions of students from the nine broad fields included in the analysis tend to be more negative than the corresponding opinions of students from the humanities and social science broad field.

At the institution level population size is a factor that influences CEQ opinions. The differences are statistically significant for three of the five scales and the Overall Satisfaction Index. The actual values of the size coefficients are of the order of 0.0002 depending on the outcome variable. But, these coefficients must be multiplied by the institutional populations to gain an appreciation of the magnitude of the size effect.

Table 2a: The influence of student and institutional factors on the GTS and CGS mean scores

                        Good Teaching Scale                 Clear Goals & Standards Scale
Fixed Effect            Coeff    SE  T-ratio  P-value       Coeff    SE  T-ratio  P-value

Intercept               13.59  0.67    20.14    0.000 **    20.58  0.51    40.37    0.000 **
Size (population)       -0.00  0.00    -5.46    0.000 **    -0.00  0.00    -1.63    0.114
Salaries/research       -0.15  0.11    -1.28    0.211        0.06  0.11     0.56    0.580
Library                  0.73  0.42     1.75    0.091 *     -0.16  0.40    -0.39    0.698
Academic support         0.10  0.17     0.57    0.572        0.07  0.16     0.45    0.656
Public services         -0.93  0.31    -2.99    0.006 **    -0.55  0.30    -1.86    0.073 *
Buildings & grounds     -0.07  0.24    -0.30    0.770        0.60  0.22     2.71    0.011 **
Administration           0.06  0.13     0.45    0.656        0.13  0.12     1.08    0.289
Student services         0.51  0.21     2.37    0.024 **     0.41  0.20     2.01    0.053 *
Age                      3.72  0.39     9.64    0.000 **     1.01  0.34     3.00    0.006 **
Sex                      0.35  0.43     0.81    0.424        0.56  0.39     1.44    0.161
Attendance              -1.82  0.68    -2.68    0.012 **    -1.43  0.55    -2.58    0.015 **
NESB                    -0.05  0.50    -0.10    0.925        1.26  0.46     2.73    0.011 **
Permanent residence      7.79  0.84     9.31    0.000 **     0.30  0.80     0.38    0.708
Agriculture             -5.07  1.75                         -4.02  2.07
Architecture           -15.36  2.27                        -14.08  1.93
Business Studies       -16.83  1.27                         -1.94  1.07
Education               -5.82  0.86                         -2.08  1.39
Engineering            -22.28  1.13                         -8.81  1.00
Health                 -15.98  1.50                        -10.64  1.41
Law                    -15.00  2.81                         -5.17  2.41
Science                 -7.20  0.96                          0.02  0.76
Veterinary Science      -5.19  4.55                         13.14  4.38

** = significant at 0.05 level
* = significant at 0.10 level

Table 2b: The influence of student and institutional factors on the AWS and AAS mean scores

                        Appropriate Workload Scale          Appropriate Assessment Scale
Fixed Effect            Coeff    SE  T-ratio  P-value       Coeff    SE  T-ratio  P-value

Intercept                4.13  0.73     5.66    0.000 **    28.91  0.62    46.83    0.000 **
Size (population)        0.00  0.00     1.24    0.224       -0.00  0.00    -2.26    0.031 **
Salaries/research       -0.22  0.13    -1.61    0.118        0.12  0.11     1.11    0.275
Library                 -0.38  0.51    -0.75    0.461        0.26  0.42     0.61    0.545
Academic support        -0.17  0.21    -0.82    0.418        0.30  0.17     1.73    0.094 *
Public services         -0.29  0.39    -0.75    0.458        0.16  0.33     0.50    0.623
Buildings & grounds      0.11  0.29     0.37    0.711        0.36  0.25     1.45    0.157
Administration          -0.23  0.16    -1.43    0.165       -0.14  0.14    -1.04    0.308
Student services        -0.05  0.26    -0.19    0.851       -0.44  0.22    -2.02    0.052 *
Age                     -2.20  0.36    -6.17    0.000 **     3.88  0.41     9.51    0.000 **
Sex                     -3.31  0.45    -7.28    0.000 **     4.80  0.42    11.31    0.000 **
Attendance               3.86  0.59     6.53    0.000 **     4.56  0.81     5.66    0.000 **
NESB                     6.39  0.71     9.06    0.000 **     5.64  0.72     7.88    0.000 **
Permanent residence     -3.15  0.95    -3.30    0.003 **     0.74  1.15     0.64    0.528
Agriculture             -7.31  2.01                        -12.19  3.69
Architecture           -22.25  2.95                          1.21  3.50
Business Studies        -5.19  1.25                        -24.46  1.34
Education               -0.81  1.65                         -2.56  1.14
Engineering            -26.80  2.17                        -17.56  1.45
Health                 -16.86  1.88                        -19.79  1.95
Law                    -16.87  2.52                        -12.56  2.73
Science                 -9.45  0.96                        -19.63  1.41
Veterinary Science     -42.68  6.27                        -31.22  5.98

** = significant at 0.05 level
* = significant at 0.10 level

Populations were in the range 31-39,742 so that the effect of university size, or factors associated with size, may be associated with differences of up to eight points on the CEQ scale. With the exception of the Appropriate Workload Scale, these analyses suggest that, at the institution level, bigger is not necessarily better when it comes to CEQ views.
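
The figure of up to eight points follows from multiplying a size coefficient of the order of 0.0002 by the range of institutional populations. A minimal check, using that order-of-magnitude coefficient (sign ignored) in place of the exact per-scale estimates, which the tables report only to two decimal places:

```python
# Order of magnitude of the size coefficient quoted in the text (sign ignored).
SIZE_COEFFICIENT = 0.0002
SMALLEST_POPULATION = 31
LARGEST_POPULATION = 39_742

def size_contribution(coefficient, population):
    """Contribution of institutional size to the predicted CEQ score."""
    return coefficient * population

# Difference in predicted score between the largest and smallest institutions.
spread = (size_contribution(SIZE_COEFFICIENT, LARGEST_POPULATION)
          - size_contribution(SIZE_COEFFICIENT, SMALLEST_POPULATION))
print(round(spread, 1))
```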

Finally, after controlling for the Level 1 predictors respondent age, sex, mode of attendance, NESB, permanent residence and field of study there is evidence that some CEQ opinions are more favourable in institutions that allocate a higher proportion of their budgets to student services. The relationship is positive and statistically significant for the Good Teaching Scale, Clear Goals and Standards Scale, Generic Skills Scale and the Overall Satisfaction Index. On the other hand, the relationship is negative for the Appropriate Assessment Scale.

Table 2c: The influence of student and institutional factors on the GSS and OSI mean scores

                        Generic Skills Scale                Overall Satisfaction Index
Fixed Effect            Coeff    SE  T-ratio  P-value       Coeff    SE  T-ratio  P-value

Intercept               34.69  0.44    78.77    0.000 **    38.11  0.58    65.60    0.000 **
Size (population)       -0.00  0.00    -2.13    0.041 **    -0.00  0.00    -3.49    0.002 **
Salaries/research       -0.10  0.08    -1.36    0.185        0.23  0.11     2.12    0.043 **
Library                  0.37  0.29     1.26    0.217       -0.53  0.41    -1.28    0.209
Academic support        -0.05  0.12    -0.44    0.663        0.24  0.17     1.40    0.172
Public services          0.11  0.22     0.50    0.619       -0.24  0.32    -0.75    0.457
Buildings & grounds      0.07  0.16     0.42    0.678        0.02  0.23     0.07    0.949
Administration          -0.03  0.09    -0.28    0.782        0.14  0.13     1.12    0.271
Student services         0.25  0.15     1.73    0.093 *      0.92  0.22     4.28    0.000 **
Age                      0.45  0.33     1.36    0.183        0.89  0.46     1.96    0.059 *
Sex                      2.37  0.32     7.50    0.000 **     1.58  0.57     2.79    0.010 **
Attendance              -4.92  0.59    -8.33    0.000 **     0.82  0.70     1.18    0.249
NESB                     0.84  0.39     2.18    0.038 **     2.04  0.59     3.44    0.002 **
Permanent residence      1.92  0.80     2.40    0.023 **     5.13  1.12     4.58    0.000 **
Agriculture              7.92  1.43                          2.10  2.36
Architecture             0.47  1.69                        -17.09  3.35
Business Studies        -2.32  1.00                         -3.74  1.62
Education               -4.21                               -6.67  1.74
Engineering              3.53  1.13                         -6.82  1.66
Health                  -4.13  0.92                        -10.34  1.91
Law                      1.49  1.36                         -3.80  2.87
Science                  0.59  0.74                          1.17  0.90
Veterinary Science       0.62  4.95                         17.11  8.45

** = significant at 0.05 level
* = significant at 0.10 level

Other factors being equal it is estimated that a one percentage point increase in expenditure on student services is likely to be associated with a 0.51 point increase in the university mean Good Teaching Scale score and a 0.92 point increase in the mean Overall Satisfaction Index score. The influence of this predictor on student opinion may seem rather modest at first glance. However, the minimum and maximum proportions of total expenditure allocated to student services varied between one and 18 percentage points. Other factors remaining equal it can be appreciated that differences in institutional means of up to 8-9 points on the Good Teaching Scale and 15-16 points on the Overall Satisfaction Index may be linked to expenditure decisions taken within individual universities.
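
The ranges of 8-9 and 15-16 points can be reproduced by multiplying each student services coefficient by the 17 percentage point spread between the lowest and highest allocations. The names below are illustrative; the coefficients are those reported for student services in Tables 2a and 2c.

```python
# Student services coefficients reported for the two outcomes.
STUDENT_SERVICES_COEFFICIENT = {
    "Good Teaching Scale": 0.51,
    "Overall Satisfaction Index": 0.92,
}
LOWEST_SHARE = 1    # per cent of budget allocated to student services
HIGHEST_SHARE = 18

def implied_difference(coefficient, low, high):
    """Difference in predicted institutional mean between two budget shares."""
    return coefficient * (high - low)

implied = {
    outcome: round(implied_difference(coeff, LOWEST_SHARE, HIGHEST_SHARE), 2)
    for outcome, coeff in STUDENT_SERVICES_COEFFICIENT.items()
}
print(implied)
```

This is a linear extrapolation across the full observed range, holding the other predictors constant, so it indicates an upper bound on the association and not a guaranteed return on additional spending.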

Moreover, it can be seen from the null and final model variance estimates in Table 3 that university population size and the expenditure variables in the final model reduce the unexplained variance at Level 2 by 14-67 per cent, depending on the outcome variable.

Table 3: Variance explained by level 2 predictors


Variation between university means (τ00 null model) 38.33 12.87 20.5 32.82 8.93 27.88
Variation between university means (τ00 final model) 12.77 7.55 9.93 12.3 7.69 9.69
Variance explained by Level 2 variables (%) 67 41 52 63 14 65

In the case of the Overall Satisfaction Index, for example, the reduction in unexplained variance at Level 2 is given by:

100 x (27.9 - 9.7)/27.9 = 65%
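
The same calculation can be repeated for every column of Table 3. The Python sketch below is a minimal check, assuming the six values in each row are listed in the same outcome order and that the last column is the Overall Satisfaction Index, as the worked example above implies. The names `TAU_NULL`, `TAU_FINAL` and `variance_explained` are illustrative.

```python
# Between-university variance estimates copied from Table 3,
# one value per outcome variable, in the order given in the table.
TAU_NULL = [38.33, 12.87, 20.5, 32.82, 8.93, 27.88]
TAU_FINAL = [12.77, 7.55, 9.93, 12.3, 7.69, 9.69]


def variance_explained(tau_null, tau_final):
    """Percentage reduction in unexplained Level 2 variance."""
    return 100.0 * (tau_null - tau_final) / tau_null


# Round to whole percentages, as reported in the last row of Table 3.
percentages = [
    round(variance_explained(null, final))
    for null, final in zip(TAU_NULL, TAU_FINAL)
]

if __name__ == "__main__":
    print(percentages)
```

Rounded to whole numbers, the results match the final row of Table 3.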

Clearly, there is not much between university variation to explain but variation in expenditure accounts for much of it. The additional student services financed by within university budget decisions can be linked to more favourable GTS and OSI opinions.

These results might have been hypothesised in advance. A high level of support for the library relates to a high degree of satisfaction on the Good Teaching Scale. Stronger support for academic salaries and research is related to higher levels of general satisfaction on the Overall Satisfaction Index. University administrators should be mindful of these findings. The results are clear and the implications obvious. Any reduction in support for student services is likely to lead to a drop in student satisfaction. In turn, it could be hypothesised that this would contribute to a decline in enrolments.


Since 1993, the Course Experience Questionnaire has been used to assess the course experiences of recent graduates in Australian universities. Five scales and an index of overall satisfaction are developed from the 24 items used. Under these circumstances the strength of the scales and the index rests on their ability to measure the views of students as they are clustered together within courses and within institutions. Consequently, it must be emphasised that the scales should exhibit a high degree of homogeneity of student responses within courses and within institutions, and a high degree of heterogeneity of responses between courses and between institutions.

However, most analyses to date have failed to account for the nested nature of the CEQ data. Multilevel procedures have not been employed and, as a consequence, inappropriate tests of significance and biased estimates of effects have been computed. This article has used a multilevel approach to examine the effect of characteristics of institutions on students' views as measured by the CEQ. While it would not be sound to draw immoderate conclusions from the findings summarised here, evidence is presented of what would appear to be significant relationships between the ways in which institutions spend their monies and the views of their graduates. It would appear that both the size of the institution and the allocation of resources to student services influence graduates' views, particularly with respect to the Good Teaching Scale and the Overall Satisfaction Index. The relationships would appear both meaningful and soundly based.


Aikin, W.M. (1942). Adventure in American Education, Vol. 1: The Story of the Eight Year Study. Harper, New York.

Anderson, G.J. & Walberg, H.J. (1974). Assessing Classroom Learning Environment. In K. Marjoribanks (ed.) Environments for Learning. NFER, Windsor.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561-573.

Bryk, A.S. & Raudenbush, S.W. (1992). Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, Beverly Hills, California.

Bryk, A.S., Raudenbush, S.W. & Congdon, R. (1996). HLM: Hierarchical Linear and Non Linear Modeling with HLM/2L and HLM/3L Programs. Scientific Software, Chicago, Illinois.

Comber, L.C. & Keeves, J.P. (1973). Science Education in Nineteen Countries: An Empirical Study. Almqvist and Wiksell, Stockholm.

DETYA (1998). Selected Higher Education Finance Statistics 1997. AGPS, Canberra.

Entwistle, N.J. & Ramsden, P. (1983). Understanding Student Learning. Croom Helm, London.

Fraser, B.J. (1993). Context: classroom and school climate. In D. Gabel (ed.) Handbook of Research in Science Teaching and Learning. Macmillan, New York.

Johnson, T.G. & Congdon, P. (1997). The Course Experience Questionnaire Presentation (Draft Report). ACER, Camberwell.

Keeves, J.P. (1967). Students' attitudes concerning mathematics. Unpublished M.Ed. Thesis, University of Melbourne.

Keeves, J.P. (ed.) (1992). The IEA Study of Science III: Changes in Science Education and Achievement: 1970-1984. Pergamon Press, Oxford.

Kish, L. (1957). Confidence intervals for clustered samples. American Sociological Review, 22, 154-165.

Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

Pace, C.R. & Stern, G.C. (1958). An approach to the measurement of psychological characteristics of college environments. Journal of Educational Psychology, 49, 269-277.

Ramsden, P. (1991). Report on the Course Experience Questionnaire trial. In R. Linke (ed.) Performance Indicators in Higher Education, Vol. 2. AGPS, Canberra.

Ramsden, P. (1998). The CEQ: Looking back and forward. Paper presented at the Course Experience Questionnaire Symposium, 1998, University of New South Wales, Sydney, 29-30 November, 1998.

Rasch, G. (1980). Probabilistic models of some intelligence and attainment tests. University of Chicago Press, Chicago. (Originally published by The Danish Institute for Educational Research, Copenhagen, 1960).

Stern, G.C. (1970). People in Context. Wiley, New York.

Trickett, E.J. & Moos, R.H. (1973). Social environments of junior high and high school classrooms. In K. Marjoribanks (ed.) Environments for Learning. NFER, Windsor.

Wolf, R.M. (1967). Construction of descriptive and attitude scales. In T. Husen (ed.) International Study of Achievement in Mathematics. Almqvist and Wiksell, Stockholm.

Wu, M., Adams, R.J. & Wilson, M.R. (1997). ConQuest: Generalised Item Response Modelling Software. ACER, Camberwell.

Authors: Dr Trevor Johnson is a Research Fellow at the Australian Council for Educational Research, Camberwell, Victoria. His research interests include sampling and educational research methodology.

Dr John Keeves is a Professorial Fellow within the School of Education, the Flinders University of South Australia, Bedford Park. His research interests are wide and varied. From a very strong initial interest in Mathematics and Science Education, he has extended his field of inquiry in these areas from Australia to a cross-national and comparative perspective. As a consequence, he has developed a keen interest in educational research methodology and measurement.

Please cite as: Johnson, T. G. and Keeves, J. P. (2000). Spending on the selling of wisdom. Issues in Educational Research, 10(1), 39-54.

© 2000 Issues in Educational Research