
Reflections on "Comments on Tertiary
Entrance in Queensland: A Review"


Graham Maxwell
Assessment and Evaluation Research Unit
Department of Education
The University of Queensland
[Graham Maxwell was a consultant to the Working Party on Tertiary Entrance and continues to act as an honorary consultant to the Board of Senior Secondary School Studies. The views expressed in this paper are personal and should not be taken to constitute official opinions of the Board of Senior Secondary School Studies.]


"Comments on Tertiary Entrance in Queensland: A Review" (McGaw, 1989) represents an overwhelming endorsement of the overall thrust and most of the details of the report Tertiary Entrance in Queensland: A Review (Working Party on Tertiary Entrance, 1987). This endorsement by one of the most respected authorities on tertiary entrance in Australia should not be taken lightly. Tertiary entrance procedures involve matters of great complexity and subtlety. It is clear that the Working Party exercised considerable care and diligence in identifying the key issues, exploring alternative policies and procedures, and arriving at sensible recommendations. It is to be hoped that further deliberations about future arrangements concerning the interface between secondary schools and tertiary institutions will take as their starting point the analyses and recommendations of this report.

This paper offers some reflections on McGaw's comments. For the most part these reflections reinforce or elaborate those comments. In a few instances I take issue with McGaw's interpretation and proposed modification of the Working Party's recommendations, though it should be clear that we are substantially in agreement on the overall direction of change and differ only on matters of detail. I would rather the recommendations were implemented with McGaw's suggested modifications than not implemented at all.

Curriculum Diversity

McGaw points to the diversity of curriculum offerings in Queensland. As he suggests, there would seem to be an interrelationship between retention rates and curriculum diversity, as Williams (1987) has shown for the ACT. It would seem not to be accidental that the ACT and Queensland have both the highest retention rates and the broadest curriculum provisions. Queensland's school-based approach to the Senior curriculum and methods of assessment appears to have allowed a more flexibly adaptive response to increased retention and in turn to have encouraged more students to feel that it is worthwhile staying on, that is, that there is provision for their needs. That this breadth of provision works is attested by the flat distribution of subject choice in Queensland, that is, more students take up more of the options. In contrast, in Victoria the distribution of choice is much narrower (80 per cent of students choosing the top 15 subjects) and has not changed in the past 30 years (Ministerial Review of Post-Compulsory Schooling, 1985, p. 8).

The diversity of Senior options and choice is actually greater than represented in Table One of the Working Party's report (Working Party on Tertiary Entrance, 1987, p. 131). This is so in two ways: on the one hand, the eligibility rules for the TE Score require a minimum of only three subjects to be taken for a full four semesters (accounting for 12 of the 20 required units), with the possibility of taking each of the remaining eight units in a variety of subjects; on the other hand, students can go beyond the minimum requirement of 20 units by taking subjects other than Board Subjects, that is, Board Registered Subjects (called Board Registered School Subjects at the time of the report), School Subjects, or TAFE Subjects. The recommendation of the report that as few as three subjects count for the tertiary entrance profile would allow even greater diversity of choice and was clearly intended to encourage students to take more non-Board subjects. This would have special advantages for the TAFE sector.

It is often suggested that the exclusion of non-Board subjects from the TE Score is unfair. There is nothing in principle to prevent the inclusion of Board Registered, School and TAFE subjects provided that certain criteria are met to ensure that the coherence of the system is maintained. The minimum requirements for inclusion of a subject would seem to be that the subject should involve an underlying component of intellectual skills including skills of analysis, reasoning, reading and writing such as are typical of other subjects included in the scaling, and that the subject should be submitted to the full accreditation and certification requirements of the Board (including monitoring and review procedures). There would, of course, be resource implications in any expansion of subjects beyond the current list of Board subjects and these additional resources would have to be assured.

Trade-off of Average Achievement and Number of Subjects

McGaw recognises the point and importance of calculating the recommended Composite Achievement Indicator in such a way that there is reasonable compensation between "how well" and "how many" (Working Party on Tertiary Entrance, 1987, Recommendation 39 and Supplementary Paper ix). In fact, Western Australia has recently found, as the Working Party anticipated, that the system described by McGaw, resulting from the recommendations of the Ministerial Working Party on School Certification and Tertiary Admissions Procedures (1984), has distorted the curriculum choices of students and produced certain anomalies (Andrich, 1989). The Queensland Working Party's suggested formula (1987, p. 158) might look fearsome, but some such formula is clearly needed if an option to count as few as three subjects is not to contract the curriculum for everyone to just three subjects (or at least to just three subjects taken seriously). McGaw endorses the principle of a trade-off formula and appropriately states the principle that the bonus for taking extra subjects must be "not too great" but "not too small". Ultimately, the test of efficacy is not whether the formula is easily understood but whether it produces results that most informed observers consider fair and encourages most students to take more than the minimum number of subjects. Some examples which seemed reasonable to the Working Party are given in the report (Working Party on Tertiary Entrance, 1987, p. 157).
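The Working Party's actual formula is given in the report (1987, p. 158) and is not reproduced here. Purely as an illustration of the trade-off principle, the following Python sketch shows the general shape such a formula might take: the best three subjects count in full, extra subjects carry a reduced weight, and each subject beyond the core earns a small flat bonus. The function name and all parameter values are invented for this example.

    def composite_achievement(scaled_scores, core=3, extra_weight=0.5, bonus=2.0):
        """Hypothetical 'how well'/'how many' trade-off (not the Working
        Party's formula): a weighted mean of all subjects, with the best
        `core` subjects at full weight and the rest at `extra_weight`,
        plus a flat `bonus` for each subject beyond the core."""
        if len(scaled_scores) < core:
            raise ValueError("fewer subjects than the eligibility minimum")
        ranked = sorted(scaled_scores, reverse=True)
        extras = ranked[core:]
        weighted_mean = (sum(ranked[:core]) + extra_weight * sum(extras)) \
                        / (core + extra_weight * len(extras))
        return weighted_mean + bonus * len(extras)

    print(round(composite_achievement([70, 68, 66]), 2))        # 68.0
    print(round(composite_achievement([70, 68, 66, 60]), 2))    # 68.86
    print(round(composite_achievement([70, 68, 66, 40]), 2))    # 66.0

With these (invented) settings a sound fourth subject improves the composite while a very weak one does not, which is precisely the behaviour that a bonus "not too great" but "not too small" is meant to secure.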

Tertiary Prerequisites

McGaw notes that the most constraining influence on choice of subjects continues to be the prerequisites of some tertiary courses. In an effort to keep their options open, many students take a pattern of subjects which satisfies prerequisites for the most prescriptive tertiary courses. The result is that many students take as many as four predominantly mathematical subjects (that is, Mathematics I, Mathematics II, Physics and Chemistry) when their interests, and the interests of society, would probably be better served by a more balanced curriculum. The implications might include more precise definition of prerequisite knowledge and skills, revised tertiary curricula, bridging courses, alternative routes to the completion of degrees, or even professional training at postgraduate level as in the USA. Increased retention rates and social pressures for secondary schools to provide a sound, well-rounded education are placing enormous pressures on the interface with tertiary education. The need for tertiary courses to review their prerequisites is urgent and it is good to see that McGaw has urged such a review.

Multidimensionality

McGaw canvasses the multidimensionality issue without offering a clear resolution. In fact, of course, there is no resolution, only compromises which are more or less satisfactory. Definitions of what is satisfactory depend on the scope of the outcomes considered. The Working Party chose, although their report does not make this explicit, to take a broad view involving not just consideration of psychometric issues but also social and curriculum consequences, particularly in terms of backwash effects on the way in which students are likely to exercise their choices. Some of the reasoning in favour of retaining a single main index is explored in Supplementary Paper (i) of the Working Party's report. Nothing that McGaw says denies the validity of that reasoning.

It must be realised, however, that any selection decision necessarily involves the placing of multidimensional information onto a single dimension (even if there are only two points on it, as there are eventually: either 1 = select or 0 = reject). There can be no question about the necessity, only about what information is included and how it is combined. Furthermore, though this is a different issue, there are always people who just miss out (even by a whisker) and who might have replaced those who just squeaked in if the type of information, the circumstances under which it was obtained, and the way it was aggregated had been slightly different.
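The logic of this point can be made concrete in a few lines. In the following sketch (with weights and cutoff invented purely for illustration), any selection rule, however rich the profile it consumes, ends by mapping each applicant onto a dimension with exactly two points; what varies is only which information is included and how it is combined.

    def select(profile, weights, cutoff):
        """Collapse a multidimensional profile to one of two points:
        1 = select, 0 = reject."""
        aggregate = sum(w * s for w, s in zip(weights, profile))
        return 1 if aggregate >= cutoff else 0

    # The same profile, combined differently, yields different decisions.
    print(select([70, 55, 80], weights=[0.5, 0.2, 0.3], cutoff=65))   # 1
    print(select([70, 55, 80], weights=[0.2, 0.6, 0.2], cutoff=65))   # 0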

The Working Party obviously considered the possibility of reporting simply a subject performance profile, leaving the aggregation problem, and therefore the multidimensionality problem, to those responsible for the selection decisions. Where such a system has been tried, notably in New South Wales, the tertiary institutions have not in general sought to use the available information in more differentiated ways but have continued to calculate a general performance index (the HSC Score). It is a clear case of how pressures of time, cost and feasibility play a big role in tertiary selection procedures. The system has become less, rather than more, sensitive to local circumstances; such circumstances can be dealt with through anomaly identification and appeals procedures operated by a separate certifying authority such as the Queensland Board of Senior Secondary School Studies.

Further to the provision of a profile of subject results, the need for comparability of results within a subject across schools would require at least a partial return to public examinations (subject reference tests for scaling school-based subject assessments). An omnibus scaling test such as is presently used can provide a suitable basis for scaling across subjects within each school (and then across schools) but would be completely inadequate for scaling each subject across schools. Currently, the adjusted Special Subject Assessments (SSAs) resulting from the first stage of scaling provide estimates of each student's general performance within the school, not estimates of each student's subject performance across schools. This point is discussed in greater detail in Appendix 1 of the Working Party's report.

Bias

The question of whether there is an inbuilt bias in the calculation of TE Scores has been debated and researched for several years. Some recent research indicates that the matter of gender bias is extremely complex, and not all in the one direction, but that "when the differences in choice of subject combinations are accounted for, much of the variation in outcomes by gender is removed" (Allen, 1988). It is, at least, by no means clear that any patterns of difference in scaled results between boys and girls can be attributed to problems of multidimensionality and aggregation, as McGaw suggests was demonstrated for the ACT by Masters and Beswick (Masters & Beswick, 1986; Committee for the Review of Tertiary Entrance Score Calculations in the Australian Capital Territory, 1986). In a recent report Daley (1989) disputes this claim and identifies such differences as being related to the differences between the types of tasks found on the scaling test and in school assessments. Adams (1984) showed, however, that the situation is even more complicated than that, and, as Allen (1988) says, the difficulty is distinguishing "real" differences from "unfair" ones. The Working Party's proposals were not specifically directed towards resolving such issues, though there are two recommendations of indirect relevance: Recommendation 37 on the addition of a Writing Task to ASAT, and Recommendation 38 on the introduction of anomaly detection procedures. Both of these recommendations have been implemented and their effects should be monitored, though the effects on gender differences are likely to be marginal. More research is obviously needed on this issue.

Subscales

Mention is made of the recommendation of the Committee for the Review of Tertiary Entrance Score Calculations in the Australian Capital Territory (1986) that the ACT calculate two aggregates based on different groupings of subjects scaled separately to the so-called quantitative and verbal subscales of ASAT (which lack any demonstrated construct validity for such a purpose). This has proved to be quite a confused proposal with unfortunate consequences. Far from solving the perceived multidimensionality problem, it has made it worse. Implementation of that proposal, as should have been expected, has not been quite as recommended. An aggregate is still calculated, but with different subjects scaled to different parts of ASAT. As Daley (1989) has shown, this is conceptually incoherent and worsens the very effects that it seeks to reduce, that is, it introduces some biases.

The Working Party's recommendation on subscales, as McGaw recognises, is an entirely different proposal aimed at providing additional information about student performance but without a sectioning of subjects. McGaw is quite right in claiming that there is no psychometric justification for the nesting of subscales within the Overall Achievement Positions (OAPs) and in recognising that this was a matter of "deliberate policy". The Working Party's concern was that one of the subscales (probably the "symbolising" dimension) might become the dominant scale, overriding even the OAP in importance and introducing unfortunate "backwash effects" on student choices of subjects. McGaw's analysis that such effects would occur anyway appears unassailable. Consequently, I would not wish to persist in arguing for the merit of nesting. Even so, it is important to draw attention to the possibility of untoward effects in the use of profiles in selection procedures and the need for widespread discussion before any such procedures are put in place.

The more fundamental issue, which informed the comments of Maxwell and Allen (1987) about "global" versus "regional" to which McGaw refers, is the matter of banding and the levels of precision in reporting the scaled results. This needs further discussion (see Banding, below).

Scaling

The necessity for some form of scaling needs to be continually stressed. McGaw rightly does not mince his words: "The abolition of scaling would, however, re-introduce the inequities that characterised the earlier system when there were clear benefits in taking subjects in the company of less able students."

Explanations of the present scaling procedures for TE Scores have been provided elsewhere (Maxwell, 1987; Maxwell, 1988). The general principle is to make sure that each student's TE Score (or OAP) is as independent as possible of which subjects that student chooses to take and of who else chooses to take those subjects. Paradoxically, of course, this requires that the general ability of students in each subject and school be taken into consideration. The scaling procedures can be seen as directed at removing those components of each student's scores that are arbitrarily related to the company they have kept (that is, the performance of other students in those subjects and that school). The Board of Senior Secondary School Studies may unwittingly contribute to popular misconceptions on this matter by publishing the means and standard deviations on ASAT for statewide subject groups. Such statistics are strictly meaningless in terms of the scaling procedures. What is taken into consideration is the distribution of ability of the group of students taking each subject within each school, which can vary from school to school and year to year (even though the statewide data are remarkably stable). The first stage of scaling can also be thought of (but not so accurately) as estimating what the subject results would have been like if everyone in the school had taken that subject.
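To make the principle concrete, here is a schematic sketch of the first stage of scaling, not the Board's actual calculation: within one school, each subject group's assessments are linearly mapped onto the ASAT distribution of the students in that group. Two groups with identical raw assessments but different general ability then receive different adjusted results, which is exactly the removal of 'company kept' effects described above. The linear mean-and-spread matching and all the numbers are assumptions for illustration.

    import statistics

    def scale_subject_group(ssas, group_asat):
        """Linearly rescale one subject group's SSAs to the mean and
        spread of that same group's ASAT scores (a simple linear-scaling
        scheme, assumed here for illustration)."""
        m_s, d_s = statistics.mean(ssas), statistics.pstdev(ssas)
        m_a, d_a = statistics.mean(group_asat), statistics.pstdev(group_asat)
        return [m_a + (x - m_s) * d_a / d_s for x in ssas]

    # Two subject groups in one school: identical raw SSAs, but the
    # groups differ in general ability as measured by ASAT.
    raw_ssas      = [8, 10, 12, 14, 16]
    stronger_asat = [60, 65, 70, 75, 80]
    weaker_asat   = [40, 45, 50, 55, 60]

    print([round(v, 1) for v in scale_subject_group(raw_ssas, stronger_asat)])
    # [60.0, 65.0, 70.0, 75.0, 80.0]
    print([round(v, 1) for v in scale_subject_group(raw_ssas, weaker_asat)])
    # [40.0, 45.0, 50.0, 55.0, 60.0]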

Ongoing analyses show that the current scaling procedures actually work remarkably well. One analysis has shown generally strong relationships within schools between students' average levels of achievement (AVLAs) and rescaled aggregates (RAGs). Departures from strong relationships are now identified through a set of anomaly detection procedures and referred for special consideration to an appeals committee. The anomaly detection procedures also involve other tests of possible lack of fit of the scaling model in particular situations, but the total number of such cases is currently quite small.

Two points about the anomaly detection procedures should be emphasised. One is that they are simply part of the ongoing exploration of ways in which the intentions of the scaling system can be better realised. Over the years various procedural and calculational changes have been introduced, both minor and substantial, as new understandings have been reached about how the process can be improved. The scaling system is known to work better now than it did in the past. The second point is that further improvements, both in anomaly detection and in other technical matters, can almost certainly be made, and could be made more rapidly if additional resources were found for the necessary research.

Banding

The question of banding is the one on which different people appear to take fundamentally different positions, though it may be possible to reconcile them. McGaw's position would seem to be that there is an underlying, infinitely divisible, continuous dimension of general performance and that the rescaled aggregates taken to the nearest unit represent a meaningful level of precision with which to locate each student's position on that dimension. An alternative proposition is that the scaling is discrete rather than continuous and that we might look to the characteristics of the data themselves to decide what level of discreteness in the output data is most consistent with the level of discreteness of the input data and the effects of the scaling. This is, no doubt, a shocking proposition for adherents of classical measurement theories. However, it is possible to show that the assumptions of continuity and discreteness have different consequences for interpretation of the rescaled aggregates (or the recommended composite achievement index).

Lately I have been involved with several other people in analysing the characteristics of the scaling system through a process which we have termed 'perturbation analysis'. The essence of the process is to perturb the data a little and see what happens. The central proposition is that a reasonable level of output precision is one where, on the one hand, a small change in the input data produces little or no change in the output data and, on the other hand, a large change in the input data produces a noticeable change in the output data. It is necessary, of course, to operationalise the meaning of 'small' and 'large' in these contexts. A complete explanation must await a full report on these studies; a general overview will have to suffice here.

A variety of small discrete changes in the input data have been investigated. These have included arbitrary changes to some of the data at the level of the minimum change possible (one point on the input scales), the introduction of a twin with identical results, and the removal of the top or bottom 5 per cent of a school. What is examined is the effect on the other students. As the number of bands increases from 20 (the proposed OAPs) to 100 (the present TE Scores) to 1000 (RAGs), the effects become more substantial; that is, the number and size of changes of classification become larger. The consequences for many students' TE Scores of such marginal changes in the input data relating to other students are alarmingly large, and the RAGs are even more wildly unstable. With 20 bands, on the other hand, such perturbations produce what would generally be considered a reasonable number of changes by one band and essentially no changes by more than one band. There is considerable noise in the scaling system, but the 'signal to noise' ratio would seem to be recognisably appropriate at about 20 equal-size bands. (The second requirement, of noticeable change in output for a large change in input, such as a change for a student by two levels of achievement in half their subjects, is also satisfied by 20 bands.)
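The flavour of these analyses can be conveyed by a minimal sketch, with the details invented: the Board's actual iterative scaling is replaced by simple rank-ordering of simulated aggregates, 'equal-size bands' is read as equal-count bands, and the perturbation is the smallest possible change to a single student's input. Even this bare skeleton reproduces the qualitative pattern: the finer the banding, the more other students change classification.

    import random

    def to_bands(values, k):
        """Assign each value to one of k equal-count bands by rank."""
        order = sorted(range(len(values)), key=values.__getitem__)
        bands = [0] * len(values)
        for rank, idx in enumerate(order):
            bands[idx] = rank * k // len(values) + 1   # bands 1..k
        return bands

    random.seed(1)
    aggregates = [random.gauss(170, 25) for _ in range(1000)]  # mock aggregates

    for k in (20, 100, 1000):   # proposed OAPs, present TE Scores, RAGs
        before = to_bands(aggregates, k)
        perturbed = aggregates[:]
        perturbed[0] += 1.0     # minimum possible change to one student's input
        after = to_bands(perturbed, k)
        moved = sum(b != a for b, a in zip(before[1:], after[1:]))
        print(f"{k:4d} bands: {moved} other students change classification")

Because this sketch omits the propagation of the perturbation through group means and standard deviations, it understates the instability of the real system; the ordering of the three band counts is nonetheless the point.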

Three further points need to be made. First, the use of a more fine-grained banding, as with TE Scores, may itself be a source of much of the current public dissatisfaction with the system; parents and students are well aware, even if measurement experts are not, that the present system produces unstable results, and much of this dissatisfaction could be expected to dissipate if results were reported to a reasonable degree of precision, one that could be seen to actualise the aim of each student's final result being unaffected by the vagaries of the performance of other students in the same subject and school. Second, contrary to the claims of both McGaw (1989) and Sadler (1987), neither of whom has analysed real data, the perturbation analyses show that the instability at the top end is not less but greater than that in the middle, and that equal-size bands are more reasonable than unequal-size bands at all levels of overall achievement. Third, there seems no justification for going beyond the tolerance of the data for any purpose; we would not do so with scientific data. Clearly, TE Scores, and a fortiori RAGs, are beyond the level of meaningful tolerance in the data and bring the system into disrepute when used for selection decisions.

Banding on Subscales

The point needs to be made that more research would need to be done on the recommended supplementary scales before they could be implemented. However, the general idea still seems feasible despite some initial negative reactions. It may not be necessary to adhere to the bands-within-bands proposal, though the cautions of the report about the possible repercussions of competing scales should be noted. Further, it should be noted that the correlations between these scales and the main scale can be expected to be much lower than has been suggested: in a trial run for the Working Party the correlations were about .5 to .6. It must be remembered that each SAP depends on the choice of subjects (whether those subjects have high or low weightings on the relevant dimension) as well as on the quality of performance.

Reporting of Information

McGaw discusses what he describes as a 'potential conflict' between the criteria- and standards-based information of the levels of achievement and the explicitly normative information of TE Scores or OAPs. Rather than seeing this as a conflict, it would seem better to stress the different but complementary information provided. Levels of achievement indicate relative performance within each subject, and to some extent within each school (being defined by expectations deemed appropriate within that subject, and to some extent within that school), whereas TE Scores or OAPs indicate relative performance overall (independent of subjects taken and school attended).

Furthermore, it must be remembered that standards for the levels of achievement need to be defined so as to represent an anticipated range of student performance. As such they cannot escape their underlying normative base; no useful set of standards can. SSAs, moreover, require finer discriminations within levels of achievement. Such finer discriminations are in any case necessary as part of the system of monitoring and review.

The real conflict would appear to occur when the adjusted SSAs are interpreted erroneously as indicating comparative performance within the subject across all schools (rather than comparison across all subjects within a school). For this reason, it would appear preferable to require all SSAs to be reported to students and parents on a standard scale (say, with mean and standard deviation of 62 and 12 respectively). Some schools have adopted the practice of prestandardising each set of SSAs to the group mean and standard deviation on ASAT, leading to interpretive confusion.
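A sketch of the suggested reporting convention follows; the target mean and standard deviation of 62 and 12 are those suggested above, while the simple linear standardisation is an assumption about how the convention would be realised.

    import statistics

    def standardise_ssas(ssas, target_mean=62.0, target_sd=12.0):
        """Re-express one set of SSAs on a standard reporting scale."""
        mu, sd = statistics.mean(ssas), statistics.pstdev(ssas)
        return [target_mean + (x - mu) * target_sd / sd for x in ssas]

    print([round(v, 1) for v in standardise_ssas([8, 10, 12, 14, 16])])
    # [45.0, 53.5, 62.0, 70.5, 79.0]

Reported in this way, every subject group's SSAs carry the same within-group meaning, removing the temptation to read them as comparisons of subject performance across schools.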

Implementation

So far, several of the Working Party's recommendations could be said to have been implemented: Recommendations 11, 12 and 45 on QTAC and the timing of the first round of tertiary offers; Recommendations 36 and 37 on the addition of a Writing Task to ASAT (the two components together constituting what is now called the Common Scaling Test); and that part of Recommendation 50 relating to the detection and correction of scaling anomalies. I agree with McGaw that more of the recommendations ought to be implemented and that it is not necessary to argue for all or nothing. Finally, the Working Party's report warns us that the issues concerning tertiary entrance are complex, that the present system has actually worked fairly well, that simple affordable alternatives are difficult to invent, and that alternatives whose effects are not carefully thought through before implementation may have serious consequences for the quality of education for many years. Whatever decisions are taken for the future, the importance of ongoing research is paramount.

References

Adams, R.J. (1984). Sex Bias in ASAT? (ACER Monograph No. 24). Hawthorn, Vic.: Australian Council for Educational Research.

Allen, J.R. (1988). ASAT and TE Scores: A Focus on Gender Differences. Brisbane: Board of Secondary School Studies.

Andrich, D. (1989). Upper-Secondary Certification and Tertiary Entrance: Review of Upper-Secondary Certification and Tertiary Entrance Procedures commissioned by the Minister for Education in Western Australia. Perth: (mimeo).

Committee for the Review of Tertiary Entrance Score Calculations in the Australian Capital Territory. (1986). Making admission to higher education fairer. (Chair: Dr Barry McGaw). Canberra: Australian Capital Territory Schools Authority, Australian National University, Canberra College of Advanced Education.

Daley, D.J. (1989). Determining Relative Academic Achievement for Fair Admission to Higher Education. (Report to a Joint Committee of the Australian National University, the Canberra College of Advanced Education, and the ACT Schools Authority appointed to supervise research into Tertiary Entrance Score calculations). Canberra: (mimeo).

Masters, G.N. & Beswick, D.G. (1986). The construction of tertiary entrance scores: Principles and issues. Melbourne: Centre for the Study of Higher Education, University of Melbourne.

Maxwell, G.S. (1987). Scaling school-based assessments for calculating overall achievement positions. Appendix 1 in Working Party on Tertiary Entrance, Tertiary entrance in Queensland: A review. (Chair: Mr John Pitman). Brisbane: Joint Advisory Committee on Post-Secondary Education and Board of Secondary School Studies (pp. 190-200). [Also in The Tertiary Entrance Score - A Technical Handbook of Procedures. Brisbane: Board of (Senior) Secondary School Studies, 1988 (pp. 44-52).]

Maxwell, G.S. (1988). The how and why of TE Scores. The Graduate Connection, Queensland edition, September 3-16 and 25-26.

Maxwell, G.S. & Allen, J.R. (1987). A rejoinder to the paper by D. R. Sadler: "An analysis of certain proposals contained in 'Tertiary Entrance in Queensland: A Review'...". Brisbane: Board of Secondary School Studies.

McGaw, B. (1989). Comments on Tertiary Entrance in Queensland: A Review. Queensland Researcher, 5(2), 25-44. http://www.iier.org.au/qjer/qr5/mcgaw.html

Ministerial Review of Post-Compulsory Schooling. (1985). Report Volume 1. (Chair: Ms Jean Blackburn). Melbourne: Ministerial Review of Post-Compulsory Schooling.

Ministerial Working Party on School Certification and Tertiary Admissions Procedures. (1984). Assessment in the upper secondary school in Western Australia. (Chair: Dr Barry McGaw). Perth: Western Australian Government Printer.

Sadler, D.R. (1987). An Analysis of certain proposals contained in Tertiary Entrance in Queensland: A Review with particular reference to the achievement position profile and step-wise selection. St Lucia: Assessment and Evaluation Research Unit, Department of Education, The University of Queensland.

Williams, T. (1987). Participation in education (ACER Research Monograph No. 30). Hawthorn, Vic.: Australian Council for Educational Research.

Working Party on Tertiary Entrance. (1987). Tertiary entrance in Queensland: A review. (Chair: Mr John Pitman). Brisbane: Joint Advisory Committee on Post-Secondary Education and Board of Secondary School Studies.

Please cite as: Maxwell, G. (1989). Reflections on "Comments on Tertiary Entrance in Queensland: A Review". Queensland Researcher, 5(1), 45-60. http://www.iier.org.au/qjer/qr5/maxwell.html

