Assessing literacy: Beyond the frosted pane

Lenore Ferguson
Graduate School of Education
University of Queensland
Literacy is a highly complex skill. In today's world it is deemed that all citizens require these skills from an early age. Surveys to measure literacy standards, especially of school children, have been conducted in recent times. One of these, released in September 1997 by the Commonwealth of Australia, reported that approximately 30 per cent of students in grades 3 and 5 failed to meet 'proper' standards of literacy. A detailed analysis was conducted of student texts provided to the public domain from the survey data. This analysis shows that a singular feature consistently distinguishes texts regarded as 'well above standard' from those regarded as 'well below standard' - consistent use of conventional spelling. Qualities that were apparently less valued include following directions, keeping to the topic, maintaining a flow of ideas, taking an authoritative stance, and controlling quite complex grammatical structures. Attention should be given to success in the more demanding aspects of literacy as well as in control of surface features.

Literacy is an ability that human beings developed about 3000 BC to assist record keeping for trading activities in Sumeria (Gelb, 1963). In the interim it has grown into a complex, multifaceted ability encompassing an expanding range of media and demanding an increasing array of communication skills and understanding (Resnick & Resnick, 1997). Today, western society and developing economies have become dependent on all citizens having advanced literacy skills (Lo Bianco & Freebody, 1997).

One of the consequences of this expectation is the felt need of politicians to measure the literacy levels of all - students and adults alike. In the last twenty years in Australia there have been several major surveys to measure literacy performance in adults and children, three of them in September 1997 (Australian Bureau of Statistics, 1997; Bourke & Keeves, 1977; Bourke, Mills, Stanyon & Holzer, 1981; Commonwealth of Australia, 1997a, 1997b; Wickert, 1989).

A complicating factor is the competition over what kind of measures are suitable, valid and reliable. The surveys listed above used different materials and measures thereby making it difficult to make valid comparisons. Queensland, like many other school systems, has monitored students' literacy for over 60 years, using replicated and therefore comparable tests, to show that literacy levels were maintained or increased over that time (Duck, 1979; Jacobson, 1978; Peckman, Fifoot, & Byrne, 1988; Review and Evaluation Directorate, 1993). Since the National and Agreed Goals of Schooling were signed by State Education Ministers in Hobart in 1989, several States and the Federal government have introduced standardised testing on students' basic literacy skills. The results of these are generally reported widely in local and national newspapers.

Today, co-operative efforts are being made to establish literacy benchmarks to be used in schools across all Australian states and territories. As part of this national project, data were collected from 7,454 students in 400 schools across Australia in September 1996.


The Federal Minister for Schools, Education and Training, Dr David Kemp, commissioned the Australian Council for Educational Research to use these data in preparing a report on standards of literacy in Australian schools. The report was released on Monday, 15 September 1997. According to this report, Literacy Standards in Australia, 28 per cent of Year 3 students and 33 per cent of Year 5 students did not meet the writing standards of their respective Years. For reading, 27 per cent and 29 per cent of students in Years 3 and 5, respectively, failed to meet the standards described as 'proper'. Samples of students' work were included in the report to illustrate levels of performance at, above and below the expected standards in reading and writing for children in Years 3 and 5.

To help the general public appreciate the nature of the 'literacy problem', Dr Kemp provided writing samples of pupils in Years 3 and 5 for publication in The Australian on Tuesday 16 September 1997. They appeared in the article 'A teaching method fails its children'. These samples, additional to those provided in the published report, were used to give the public an opportunity to determine what is meant by children writing 'properly'.

The sample texts were taken from three tasks that were common to the respective Year levels (Commonwealth of Australia, 1997b). These student samples were used to illustrate standards that were deemed to be 'well above' and 'well below' Year 3 and Year 5 standards of writing. These tasks directed students to:


Year 3: Write a letter to a magazine answering the question, 'Should birds be kept in cages?'.

Well above standard (Text A): I think birds should not be cept in a cage because they need to fly and they need to get some air. And it needs to look around the earth.

Well below standard (Text B): Birds shuold be cerd in cajr bcerce they are nisd pers and they hap you go to pars and I like birds very marg.

Year 3: Write a narrative about an adventure with a legendary creature (such as Roc, Big Foot or a Griffin).

Well above standard (Text C): One day I was walking up a mountain in Port Stephens. I was walking slowly, breathing in the sweet aroma of a purple vilot. It smelt good. I was nearlly at the top of the mountain when suddenly I heard a strange noise. It sounded like a Roc screaching high up in the mountains.

Well below standard (Text D): I went on a adventre with a Roc. it flue me & evry one to school & home. it took me on meney advents. I tock him in for nesw. & he skead the tchors out of ther with. Thay were so skead that ther skeltion com out.

Year 5: Explain your view on the topic, 'Kids and Money'.

Well above standard (Text E): Parent's often complain about having to fork out pocket money. But I think the parents should stop and think. If they did't give out pocket money they would be the ones paying for CD's, books, toys, etc.

Well below standard (Text F): I think that if you won't to be paid you shuold work for it for igsampal you should mo the lown wosh the car wask up after tea and all uther things.


The focus of this paper is not on the standards themselves. The focus is on a search for the qualities used in the survey to determine that one text of each pair is considered superior to the other. Texts can be assessed in a number of ways. Outcomes from several approaches can be compared to determine what it is about the texts that each values.

First, a traditional approach once used unquestioningly was a simple word count. Using this quantitative technique, Texts A, C and E (30, 53, 37 words) outperform Texts B, D and F (24, 48, 32 words), confirming the survey's results. Another abandoned approach from the United States is the T Unit count, that is, the number of principal clauses in a text (Hunt, 1965). This technique shows Texts A and B to be of equal levels with 3 principal or independent clauses each, Text C with 5 principal clauses to be inferior to Text D with 6 principal clauses, and Text E with 3 principal clauses to be superior to Text F with 1 principal clause. As this technique was abandoned because it ignored the complexity of texts with subordinate and embedded clauses, it would be appropriate to compare the total number of clauses in each pair of texts. On this basis, Text A with 5 clauses is seen to be superior to Text B with 4 clauses; Text C with 6 clauses is inferior to Text D with 7 clauses; and Text E with 5 clauses is inferior to Text F with 6 clauses.

Already it is clear that the choice of methods used to measure literacy performance can influence the results. Therefore statements about literacy standards need to be supported with precise explanations about assessment methods.
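The divergence among these simple measures can be tabulated directly. The following is a minimal sketch in Python that uses only the counts reported above; the names MEASURES, PAIRS and favoured are illustrative assumptions and are not part of the survey's method.

```python
# Counts as reported in the preceding paragraph for each sample text.
MEASURES = {
    "word count": {"A": 30, "B": 24, "C": 53, "D": 48, "E": 37, "F": 32},
    "principal clauses": {"A": 3, "B": 3, "C": 5, "D": 6, "E": 3, "F": 1},
    "total clauses": {"A": 5, "B": 4, "C": 6, "D": 7, "E": 5, "F": 6},
}

# Each pair lists the 'well above standard' text first.
PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def favoured(measure):
    """Return, for each pair, the text with the higher count ('=' for a tie)."""
    counts = MEASURES[measure]
    result = []
    for above, below in PAIRS:
        if counts[above] > counts[below]:
            result.append(above)
        elif counts[above] < counts[below]:
            result.append(below)
        else:
            result.append("=")
    return result


for name in MEASURES:
    print(name, favoured(name))
```

Under these reported counts, only the word count favours all three 'well above standard' texts; the two clause-based measures each favour at least one 'well below standard' text.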

In the United Kingdom it was traditional to use global or impression marking (Hartog & Rhodes, 1936; Wiseman, 1961). Such marking, using implicit criteria, has been shown to be influenced heavily by surface features such as handwriting, spelling and punctuation (Charney, 1984; Roach, 1971; Wood, 1991). It would appear from a glance at the texts, therefore, that this method could result in the same outcome as the total word count, making Texts A, C and E superior to Texts B, D and F.

These approaches demonstrate that the most superficial of measures, word count and surface features, may have led to the findings reported in Literacy Standards in Australia. Measures that consider communicative understanding and complexity of thought, both represented through complex language structures, could support this outcome or produce different findings. Further examination is warranted.

An approach currently favoured in the United States, in the United Kingdom, and in Australia, is to assess components of a text individually and then assign an holistic or aggregated assessment by trading off strengths and weaknesses (Diedrich, 1974). Judgments on the components are usually made against predetermined criteria, the method being named for this process - criterion-referenced or standards-referenced assessment (Sadler, 1987). Studies show that this method forces markers to look more closely at aspects of texts (Gipps, 1994). Aspects overlooked in superficial impression marking are taken into account, thus altering the marker's assessment of the quality of a text.

Page 18 of the report (Commonwealth of Australia, 1997a) states that the texts in the Australian survey were assessed on three features: the quality of thought (including students' abilities to express ideas, to write imaginatively, to develop an argument clearly and logically, and to support a point of view); language control (including spelling, punctuation, and vocabulary); and sense of purpose and audience. Following assessment, each text was determined to have scored above or below a set 'cut score'. It is unclear what marking procedure was employed. It may have been impression marking with the identified features used as a guide, or profiling against all features, or criterion-referenced assessment with markers trading off strengths against weaknesses.

As the relative values of these pairs of sample texts have been pre-determined, it should be possible to analyse the texts into components that are comparable and thus identify the features, or aspects of them, that have been used to distinguish 'above standard' from 'below standard' texts, that is, to determine criteria for the 'cut score'.


The preface of the report indicates that the focus on literacy ability and standards is to improve the functional capacity of young people. This is an affirmation of the theoretical basis of the national framework for assessment in English, English: A curriculum profile for Australian schools, and reveals the source of the features selected for assessment. The theoretical basis draws on Halliday's (1985) systemic functional model of language. It would be appropriate, therefore, to select this functional model as a basis for the analysis. Drawing on Halliday (1994), and using his non-technical terms, I use for analysis the functional components of Things, Events, Circumstances and Connectives that comprise the expression of ideas in all texts. Each text will be laid out in terms of these components. This will enable equitable comparisons to be made in functional terms and will also provide the means to make equitable comparisons in structural terms.

Year 3: Letter to a magazine (Text A) (well above standard)

I | think | birds should not be cept in a cage
because | they | need to fly

and | they | need to get | some air.
And | it | needs to look
around the earth.

Year 3: Letter to a magazine (Text B) (well below standard)


Birds | shoudd be cerd
in cajr
bcerce | they | are | nisd pers
and | they | hap you go
to pars
and | I | like | birds | very marg.

Year 3: Narrative adventure (Text C) (well above standard)

I

I | was walking
One day up a mountain in Port Stephens.

I | was walking

breathing in the sweet aroma of purple vilot.

nearly at the top of the mountain
when | heard | a strange noise. | suddenly

like a Roc screaching high up in the mountains.

Year 3: Narrative adventure (Text D) (well below standard)


on a adventre with a Roc.

it | flue | me and evreyone | to school & home.

it | tock | me | on meney adventrs.

I | took | him | in for nesw.
& | he | scead | the tchors | out of ther with.

Thay | were | so skead that ther skeltion com out.

Year 5: Explanation of view (Text E) (well above standard)


Parent's | often complain about | having to fork out pocket money.
But | I | think | the parents should stop and think.
If | they | did not give out | pocket money

they | would be | the ones paying for CD's, books, toys, etc.

Year 5: Explanation of view (Text F) (well below standard)


I | think | that you shuold work for it for igsampal you should mo the lown wosh the car wash up after tea and all uther things.
if | you | won't | to be paid


In the following discussion, the analyses of the three pairs of student texts will be drawn upon, where appropriate, as the texts are compared in terms of the quality of thought, language control, and sense of purpose and audience.

Texts A and B: letter to a magazine answering the question 'Should birds be kept in cages?'

There are no apparent differences in the quality of ideas presented in the two texts. Both texts address the topic and, as asked, present a personal view of the kind likely to be found in a letter to a magazine. The development of the central concept, birds, is negligible in both cases. Neither text identifies particular kinds of birds although they each offer one qualifying descriptor - need air in Text A, and are (nice pets) in Text B. This is clearly seen in those parts of the text categorised as 'Thing', above. Text components categorised as 'Event' describe what birds do. In Text A the birds fly and look; in Text B they have (to) go. Both texts are organised conventionally for letters expressing personal opinions: each writer makes a statement and then provides three arguments to support it.

The two writers show sensitivity to the context although they express their understanding differently. The writer of Text A takes a negative stance, opening the text with a statement of opinion expressed in the subjective first person, I think ... . In contrast, the writer of Text B takes an affirmative position, opening the text with an authoritative assertion expressed in the objective third person, Birds should ... . Their differing perspectives are captured by I think in Text A and I like in Text B. The writer of Text A assumes a conservationist role focusing on the needs of birds, while the writer of Text B speaks as a pet lover focusing primarily on his/her own feelings. These personal and social perspectives are presumably intended to catch the sympathy of, or to persuade, readers of the magazine.

While the two texts are not grammatically identical, they parallel each other in clause structures. Each has one independent/principal clause, and three dependent/subordinate clauses. The dependent clauses are linked logically to the independent clause with the same conjunctions in the same sequence (because, and, and), indicating similar patterns of thinking. There is a minor problem with cohesion in Text A in the cohesive chain, birds, they, they, it; and a related difficulty in the generalised birds being kept in a cage. On the other hand, Text A uses an embedded noun clause, (that) birds should not be cept in a cage, as object of think. Both texts indicate attempts to make fine distinctions in meaning in that each text includes five instances of words expressing personal opinion to differentiate them from neutral description; that is, think, should, need to, need to, need to in Text A, and should, nice, (have to), like, very in Text B. It could be argued that some air in Text A refers more accurately to some space. Similarly, the repetition of need to could be debated: is the repetition deliberate for reasons of emphasis, or is it the result of an inability to find synonyms?

With respect to conventions of spelling and punctuation, there are considerable differences between the two texts. Text A of 30 words has 3.3 per cent spelling errors, while Text B of 24 words has 36 per cent spelling errors. The patterns of errors in Text B suggest that the writer could have a hearing impairment that impacts on spelling ability if a phonic orientation is used almost exclusively. The writer of Text A has apparently included the second sentence as an afterthought and thereby given it the erroneous status of a separate sentence, punctuating it as such. Conversely, the final clause of Text B, a personal appraisal, could have been represented in a separate sentence.
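The percentages above are simple proportions of misspelled words to total words. As a rough sketch (not the survey's procedure), the figure for Text A can be reproduced as follows; the function name spelling_error_rate and the decision to treat cept as the only unconventional spelling in Text A are assumptions made for illustration.

```python
import string

# Text A as published, with the student's spelling retained.
TEXT_A = (
    "I think birds should not be cept in a cage because they need to fly "
    "and they need to get some air. And it needs to look around the earth."
)


def spelling_error_rate(text, misspelled):
    """Return (total words, misspelled words, percentage misspelled).

    Words are split on whitespace; surrounding punctuation is stripped and
    case is ignored before each word is checked against `misspelled`.
    """
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    errors = sum(1 for w in words if w in misspelled)
    return len(words), errors, round(100 * errors / len(words), 1)


# Assumption: 'cept' is the single unconventional spelling in Text A.
print(spelling_error_rate(TEXT_A, {"cept"}))  # (30, 1, 3.3)
```

The same function could be applied to the other texts, but the result depends on which words a marker judges to be misspelled, so any such list would itself be an assumption.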

In summary, there are no features that distinguish the merits of the two texts in terms of the quality of their ideas. There are differences in language control. Text A shows minor lapses in cohesion, and in discriminating choice of vocabulary. It does, however, show some comparative complexity in incorporating an embedded noun clause. In the use of spelling conventions Text A is far superior. In trading off strengths and weaknesses, the markers appear to have placed high value on conventional spelling and the use of a noun clause to the extent that Text A is awarded 'well above standard' even though it exhibits lapses in cohesion and vocabulary choice. Perhaps the lapses were cancelled out by the use of a noun clause. Text B shows no grammatical lapses. Apart from that, it differs from Text A in two respects: it has no embedded noun clause and its rate of spelling errors is about 33 percentage points higher. One must presume that Text B is well below standard because it has no noun clause and/or it has a large proportion of spelling errors.

Texts C and D: narrative about an adventure with a legendary creature (such as Roc, Big Foot or a Griffin)

Narrative adventures require lots of action that brings excitement to the characters and, vicariously, to readers. The task asks the writers to include an imaginary creature as one of the characters and implies that at least one other obligatory character could be the writer.

Text C establishes a setting, introduces something of a character, I, and mentions the legendary creature, Roc, in a phrase in the last sentence. As there are no interactions between the characters, there is no adventure and so the task directions are not followed properly. There is a narrative sequence of Events, but these events concern only one character, I. Text D is quite different. Characters include I, Roc, everyone (at school and home) and teachers. I initiated actions twice and Roc three times; the teachers reacted emotionally in being scared. Text D follows the task directions fully.

Development of ideas in a narrative adventure generally centres firstly around a range of interactive activities or Events, and then around Circumstances of places in which Events occur. In Text C the Events associated with the one character are limited to was walking, was walking, was, heard. Three of the seven Circumstances develop the mountain setting, the remaining four describing the 'how' of Events. In Text D the Events list the actions of three groups of characters: went, flue, tock, took, scead, were. The five Circumstances identify the various locations of five of these adventurous activities. Only Text D observes the standard conventions of narrative adventures.

The social purpose of narrative adventures is to relate an exciting tale where a problem arises and the character/s find a way to resolve it. Text C has no such problem unless hearing a strange noise can be counted as one. Text D describes types of activities involving the author and his/her friends, his/her teachers and a Roc. The problem arises when the Roc is taken to school for 'news' and the teachers are scared. The satisfying conclusion for the writer is that the teachers were so sceard that ther skeltion com out.

The contextual sensitivities of Texts C and D are also quite different. Text C is introspective, orientated as if from a dreamy wanderer who seems surprised at being interrupted by a strange noise. The anticipated audience seems to be a similarly reflective reader. Text D evokes a sense of shared fun. It is overtly written from the perspective of a young person of school age and intended for a peer audience who would equally enjoy the spectacle of teachers being scared.

The language control of the two texts is variable. Both texts present coherent ideas in six major clauses, all independent ones except for one dependent clause in Text C. Text D has a minor clause embedded in a postmodifying position, that ther skeltion com out. Text C includes two postmodifying phrases, breathing in the sweet aroma of vilot and screaching high up in the mountains. Both texts use repetition. In Text C was walking is used once to identify where, and the second time to explain how. In Text D took is repeated to highlight a reverse parallelism to show that taking the Roc to class was seen as a 'problem' for that context. Sceard is also repeated to reinforce the planned achievement in scaring the teachers.

Technical conventions are observed in Text C except for three misspellings (vilot, nearlly and screaching), 9 per cent in 53 words. Spelling conventions in Text D are not met in 33 per cent of 48 words, & is used instead of and, and three capital letters are missed at the beginning of marked sentences.

Overall, then, this is an interesting situation where Text C does not meet task specifications but is smooth and technically correct except for a few words spelled incorrectly. Text D meets all task and communication specifications, controls complex postmodifying structures, but fails to observe spelling conventions. One can only presume that correctness in surface features is valued over meeting task and communicative expectations.

Texts E and F: explain your view on the topic 'Kids and Money'

These texts reveal similarities and differences in the quality of the ideas presented. Text E changes the topic to Parents and Pocket Money in an assertion in the opening sentence. This assertion is then challenged, the challenge being supported by a reason which is internally illogical and based on an apparently inalienable right of children to have the CDs, books, toys they want.

Ignoring the content, the overall organisation of the text is designed to present a conditional point of view. Text F addresses the topic as given, presenting a conditional point of view that is supported by illustrative examples of work that children could be expected to do.

For this task, no context was provided beyond the directive 'your view', so the writers were forced to imagine a purpose and audience or to assume that the test setters/markers were the only intended audience. As neither text includes Circumstances, it would appear that both writers took the second option. The writer of Text E takes the role of champion of children's rights. In contrast, Text F adopts the role of ethical mediator on the concept of children's responsibilities as members of a community.

Language control in the two texts varies slightly in quality. Each text comprises complete clauses that are linked with appropriate conjunctions. Text E uses a simple set of structures whereas Text F successfully incorporates layers of embedded noun clauses and an interpolated conditional clause within the first noun clause. In addition, the writer of Text F has effectively elided conjunctions, subjects, and a verb to enhance the flow of the text. The writer of Text E has chosen to use colloquial expressions, fork out and give out, when describing the distribution of pocket money by parents. This could be a response to the relatively informal structure of the task.

It is in the area of social consideration through language conventions that greater differences are evident. Spelling errors occur in 5 per cent of the 37 words in Text E, and in 27 per cent of the 32 words in Text F.

In brief, Text E does not address the topic, uses simple clause structures but spells most words correctly. Text F sticks to the topic, uses very complex clause structures but makes many spelling errors.


The issue at stake is crucial. How are these various aspects of literacy ability weighted in the trading off of strengths and weaknesses to arrive at a grade for literacy performance? The relative strengths and weaknesses described in the previous section are presented, summarily, in Table 1: Comparisons of Pairs of Texts. Taking the information from the earlier analyses, superior performance in each named criterion is indicated by +, inferior performance by -, and equivalent performance is indicated by =.

Table 1: Comparisons of Pairs of Texts


The relative weightings given to these four general aspects of literacy are self-evident. The first three of these, that is, keeping to the topic, being sensitive to the context, and using standard grammatical patterns, are generally considered more demanding than the fourth. In each pair of texts those deemed to be 'below standard' in fact outperform those considered to be 'above standard' on these three aspects. Text A does use complex grammatical structures but Text B outperforms it in other areas of grammar. It is only in the fourth aspect, surface conventions, that the 'above standard' texts are superior - and only in the use of standard spelling.
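How the weighting of criteria can reverse a ranking may be illustrated with a small sketch. It uses only the summary judgments stated earlier for Texts E and F (keeping to the topic, clause structures, spelling). The numeric marks and both sets of weights are hypothetical; they are not taken from the report and are not offered as the markers' actual procedure.

```python
# Summary judgments from the prose comparison of Texts E and F:
# +1 = stronger text of the pair on that criterion, -1 = weaker.
PROFILES = {
    "Text E": {"keeps to topic": -1, "clause structures": -1, "spelling": +1},
    "Text F": {"keeps to topic": +1, "clause structures": +1, "spelling": -1},
}

# Hypothetical weightings, chosen only to show the effect of the trade-off.
EQUAL_WEIGHTS = {"keeps to topic": 1, "clause structures": 1, "spelling": 1}
SPELLING_HEAVY = {"keeps to topic": 1, "clause structures": 1, "spelling": 5}


def aggregate(profile, weights):
    """Collapse a profile into a single score by weighted sum."""
    return sum(weights[criterion] * mark for criterion, mark in profile.items())


def preferred(weights):
    """Return the text with the higher aggregate score under `weights`."""
    return max(PROFILES, key=lambda name: aggregate(PROFILES[name], weights))


print(preferred(EQUAL_WEIGHTS))   # Text F
print(preferred(SPELLING_HEAVY))  # Text E
```

With equal weights the text that keeps to the topic and controls complex clauses is preferred; only when spelling is weighted heavily does the ranking match the published grading. A single aggregated score conceals which of these weightings was applied, whereas the profile itself does not.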

By analogy, one can reject a house as unacceptable if it is clad in unfashionable materials or colours, but such a personal opinion does not alter the functional viability of the building. It is when the cladding is functionally unsuitable that there is some cause for concern. Greater concern is reserved for buildings that are not built to specifications.

Texts B, D and F are functional because, fundamentally, their writers are communicatively competent, building on some spelling knowledge, including grapho-phonic knowledge - albeit insufficient. What is of greater concern is that the 'above standard' texts are structurally inferior, and in two of the three cases fail to follow directions.

The 'substandard' texts illustrate the heart of literacy in action. Those deemed to be 'above standard' reveal the artificiality of literacy tests such as the one that caused these texts to be produced. They also challenge the validity of the results, given the well-documented circumstance of students writing longer and higher quality texts when they choose or manipulate the topic (Collins, 1993; Fine, 1985; Gradwohl & Schumacher, 1989).

These analyses bring into question the adequacy of the system used to evaluate their quality and, especially, raise questions about the capacity of any system to define literacy in simple terms. If competence in spelling is the goal, then more direct measures can be taken.

Literacy is about communicating effectively. Surely it is important that a text does not simply have more words and look good on the surface, but that it is also responsive to complex demands of the task and the context (Beaugrande, 1984). To assess standards of literacy ability on the quality of surface features alone is akin to appreciating the value of a picture window which boasts a frosted pane.


Putting to one side the issues of cut scores and a system to define them, what clear message do these gradings send to our youngsters? Young writers may be deemed to be performing well below a Year level standard if they:

According to the gradings illustrated, they must ignore these crucial aspects of effective writing to concentrate on:

If our students follow this advice they will ignore directions, stick to simple grammatical structures, and avoid words they are unsure of spelling correctly. This is a perfect recipe for inhibiting literacy development!

No one will deny that these young writers who have been deemed to be well below standard need to focus their attention on spelling. But they should be commended for what they can do in responding directly to a task requirement and in controlling complex grammatical structures, not roundly condemned primarily on the basis of poor skills in spelling.

As learners everywhere will realise, too, there is another possible interpretation for the relative differences between these pairs of texts. While concentrating on new skills of grammatical embedding, the writers of the 'substandard' texts would typically demonstrate a temporary poorer performance than usual in other areas such as spelling. These fluctuations occur until complementary skills are synchronised, through practice, to a point where they become virtually automatic. Anybody who has ever learned multifaceted skills, like driving a car, will appreciate the effect. It could be, then, that these students are being penalised, and criticised, for attempting to develop their literacy ability!


Where the decision to use a cut score is explained in the report, the public is explicitly identified as the audience. The simplistic reduction of the complex ability, literacy, to one aspect of one feature (spelling) misleads the public and misrepresents the ability.

Issues surrounding literacy can be understood and addressed more effectively if a profile shows the extent to which the identified features are controlled by young people. The general public should have no difficulty in understanding that students may demonstrate different levels of ability across the key areas. Profiles of individuals, the whole group, and sub-groups could show how well students communicated:

Results reported in this way have the capacity to acknowledge achievement where it exists as well as to highlight areas that need attention. This kind of reporting also has the capacity to enhance children's confidence in some aspects of literacy and so encourage them to improve others that need attention. In addition to a measure of performance, it gives a direction to future learning.


Author details: Lenore Ferguson teaches English curriculum to preservice secondary teachers. In addition to being passionate about the creative aspects of English curriculum, she is interested in the teaching and learning of literacy at all age levels. The analytic approach used in this paper was developed as part of her current doctoral studies.

Please cite as: Ferguson, L. (1997). Assessing literacy: Beyond the frosted pane. Queensland Journal of Educational Research, 13(1), 71-90. http://education.curtin.edu.au/iier/qjer/qjer13/ferguson.html

