The University of Glasgow
assessment code requires that assessment items for which
‘true’ numerical marks cannot be given should be marked in
“bands”. ‘True’ numerical assessment means
that there is a clear achievement difference between consecutive
marks, so that getting 45/50 means something different from getting
44/50 or 46/50 (examples include multiple choice questions, and
programs assessed against a suite of test cases). Items where
numerical marking is inappropriate include essays, design exercises
and presentations – in these cases it is hard to say that there is
a clear achievement difference between getting 44/50 and 45/50. These
assessment items must, under the university guidelines, be marked in
grades with secondary bands. There are 23 bands in all (22 from A1 to G2, plus H):
A1 | A2 | A3 | A4 | A5 | B1 | B2 | B3 | C1 | C2 | C3 | D1 | D2 | D3 | E1 | E2 | E3 | F1 | F2 | F3 | G1 | G2 | H
I welcomed this change when it was introduced five years ago, but
it presented us with a new problem: how to create a marking scheme for
an assessment item that is to be marked directly in bands. For
numerical assessment, we can associate marks with every evident unit of
achievement and add them all up at the end; we cannot do the same with
bands. In addition, providing feedback on how numerical marks have
been awarded is easy, as students can see where they have gained and
lost marks. Simply giving students a single band for their work does
not help them understand where it was deficient.
The attached ‘bundle’ shows the method I have used for
such non-numerical assessment in HCI classes, through the use of a
matrix of criteria and achievement levels. This method allows for the
recording of students’ achievement in each of the important
criteria, and gives students useful feedback on how their overall band
has been determined. (See http://www.cs.kent.ac.uk/national/EPCOS/essay.pdf
for a description of bundles and their motivation.)
Bundle Title: Who needs numbers anyway?
Problem Statement: It is difficult to mark a piece of assessment directly into grades and secondary bands.
The Bundle: Assessing a piece of
assessment for which numerical marks are inappropriate.
The way it works is: A matrix is
drawn up, with the important criteria for the assessment item as the
column headings. If necessary, a proportional weight is associated
with each criterion. There are eight rows, each associated with a
grade, A through H. Each cell contains a description of the
performance expected for the criterion in the column for the grade in
the row; for example, a B level of achievement for the
‘Presentation Skills’ criterion might be “Fluent,
easy to understand”. While marking the submitted assessment
item, the marker places a tick in each column, indicating the extent of
achievement for each criterion. Ticks may be placed at the bottom of
the cell (to show that the work has only just reached
that level of achievement), or at the top (to show that the
student’s work nearly made it to the next grade). At the end of
the assessment process, the marker can write general comments below
the matrix. Once all criteria have been assessed, an overall band can
be awarded by observation of the pattern of ticks. This is not a
computational process: it is done by observation, consideration of
criteria weighting, and the marker’s judgement.
It works better if: The students have been given the
assessment matrix in advance.
It doesn't work if: There are more
than eight or so important criteria, as this can make it difficult to
come up with an overall judgement at the end. If there are many
criteria, it might be possible to create a numerical marking
scheme.
Solution Statement: Using achievement descriptors for each grade and
for each criterion enables systematic non-numeric marking, and
provides useful feedback.
An example of this bundle in practice is
shown below, for an HCI design and evaluation assignment at the MSc
level (with all criteria of equal weight):
| Grade | Evidence of iterative process | Design documents | Prototypes | Evaluation methods | Design process | Introduction, conclusion, reflective discussion |
|---|---|---|---|---|---|---|
| A | At least three complete iterations, and all iterations completely documented | Sufficient for prototype development | Clearly adequate for the purposes of evaluation | Appropriate choice of methods, well conducted, results clearly stated and discussed | Appropriate, complete, and thoughtful use of evaluation results | Insightful, addressing several relevant issues |
| B | At least three complete iterations, and all iterations completely documented | Mostly sufficient for prototype development | Mostly adequate for the purposes of evaluation | Appropriate choice of methods, mostly well conducted, results clearly stated and discussed | Appropriate and complete use of evaluation results | Reasonable attempt, but omitting some important points |
| C | Two iterations, or three iterations incompletely documented | Almost sufficient for prototype development | Almost adequate for the purposes of evaluation | Mostly reasonable choice of methods, mostly well conducted, results unclear | Appropriate use of some of the evaluation results | Some relevant issues highlighted and discussed |
| D | Two iterations, and iterations incompletely documented | Not sufficient for prototype development | Not adequate for the purposes of evaluation | Mostly reasonable choice of methods, poorly conducted, results unclear | Inappropriate or incomplete use of evaluation results | Limited reflective discussion |
| E, F, G | Fewer than two iterations, and iterations incompletely documented | Not sufficient for prototype development | Not adequate for the purposes of evaluation | Inappropriate choice of methods, poorly conducted, results unclear | No obvious consideration of evaluation results | No appropriate reflective discussion |
| H | No iterations | No obvious effort | Wholly inadequate | No evaluation | No obvious effort | No reflective discussion |
Note that the requirements for A and B under "Evidence of iterative process" are identical; this is deliberate, and does not cause a problem. It means that no more than the three iterations that were requested in the assignment specification are expected for at least a B grade: more iterations will not increase the grade for this criterion, but better documentation will. In practice, it gives the marker a greater range with which to assess the extent of achievement with respect to documentation.
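For anyone who wants to keep such a matrix electronically, the sketch below shows one possible way to record it. It is only an illustration, not part of the bundle itself: the names `Criterion`, `MarkingSheet`, `column` and `feedback` are hypothetical, and the "middle" tick position is an assumption (the bundle only describes ticks at the bottom or top of a cell). The descriptors are copied from two columns of the example matrix. In keeping with the bundle, the overall band is entered by the marker and is never calculated from the ticks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

GRADES = ("A", "B", "C", "D", "E", "F", "G", "H")
# Where in the cell the tick sits; "middle" is an assumed default.
POSITIONS = ("bottom", "middle", "top")


@dataclass
class Criterion:
    name: str
    descriptors: Dict[str, str]  # grade letter -> expected performance
    weight: float = 1.0          # proportional weight (equal by default)


def column(name, a, b, c, d, efg, h, weight=1.0):
    """Build a criterion whose E, F and G rows share one descriptor."""
    descriptors = {"A": a, "B": b, "C": c, "D": d, "H": h}
    descriptors.update({grade: efg for grade in ("E", "F", "G")})
    return Criterion(name, descriptors, weight)


@dataclass
class MarkingSheet:
    criteria: Dict[str, Criterion]
    ticks: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    comments: str = ""
    overall_band: Optional[str] = None  # the marker's judgement, not computed

    def tick(self, criterion: str, grade: str, position: str = "middle") -> None:
        """Record one tick per criterion, replacing any earlier tick."""
        if criterion not in self.criteria:
            raise KeyError(f"unknown criterion: {criterion}")
        if grade not in GRADES or position not in POSITIONS:
            raise ValueError(f"invalid tick: {grade} / {position}")
        self.ticks[criterion] = (grade, position)

    def feedback(self) -> str:
        """List each tick with its descriptor, then the band and comments."""
        lines = []
        for name, crit in self.criteria.items():
            if name not in self.ticks:
                lines.append(f"{name}: not yet assessed")
                continue
            grade, position = self.ticks[name]
            lines.append(
                f"{name}: {grade} ({position} of cell) - {crit.descriptors[grade]}"
            )
        if self.overall_band:
            lines.append(f"Overall band: {self.overall_band}")
        if self.comments:
            lines.append(f"Comments: {self.comments}")
        return "\n".join(lines)


iterations = column(
    "Evidence of iterative process",
    a="At least three complete iterations, and all iterations completely documented",
    b="At least three complete iterations, and all iterations completely documented",
    c="Two iterations, or three iterations incompletely documented",
    d="Two iterations, and iterations incompletely documented",
    efg="Fewer than two iterations, and iterations incompletely documented",
    h="No iterations",
)
prototypes = column(
    "Prototypes",
    a="Clearly adequate for the purposes of evaluation",
    b="Mostly adequate for the purposes of evaluation",
    c="Almost adequate for the purposes of evaluation",
    d="Not adequate for the purposes of evaluation",
    efg="Not adequate for the purposes of evaluation",
    h="Wholly inadequate",
)

sheet = MarkingSheet(criteria={c.name: c for c in (iterations, prototypes)})
sheet.tick("Evidence of iterative process", "B", "top")  # nearly an A
sheet.tick("Prototypes", "C")
sheet.overall_band = "B3"  # entered by the marker after looking at the ticks
print(sheet.feedback())
```

Because the A and B descriptors for "Evidence of iterative process" are identical, the tick position is what distinguishes better documentation from merely adequate documentation in this record.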
Below is an example of one of these matrices in use. In this case it was used for evaluating student presentations in a General Readings course, when the students chose their own topic for their presentation. [Note: although the copy quality is unfortunately poor, this example does help to illustrate the overall idea.]
Students typically welcome
the feedback that is provided by these matrices, but they are often
unsure how to use them in advance of submission. The matrices are very
easy to use when marking. It can be slightly tricky to determine an overall band if the criteria weightings vary
greatly, but this is not usually the case. This method has been
adopted by several colleagues in my department.