Session 1

What is in the assessor’s mind? The merits of Machine Learning to code decision statements when doing pairwise comparisons
Sven De Maeyer (University of Antwerp, Belgium), Renske Bouwer (University of Antwerp, Belgium) and Marije Lesterhuis (University of Antwerp, Belgium)

Comparative judgement (CJ) is increasingly used as a method for assessing text quality. In comparative judgement, assessors receive randomly composed pairs of texts and indicate which of the two is better. This method is a promising alternative to analytic rubrics and leads to highly reliable scores (Pollitt, 2012). However, our understanding of the validity of the results remains limited.
Rich information on the validity of CJ can be obtained by analysing assessors’ statements on why they chose one text over the other. These decision statements offer opportunities to generate feedback for the writers of the texts (which aspects did assessors take into account?), but also for the assessors (which aspects did you take into account when judging?). The only bottleneck to using these decision statements in implementations of CJ is that they have to be coded manually, which is a tedious and time-consuming task.

In this study we explore how well different machine learning (ML) algorithms reproduce the manual coding of decision statements. We used 2599 decision statements from a CJ assessment in which 64 assessors assessed 405 argumentative texts. These decision statements were manually coded on 7 aspects of text quality. We compared 3 types of ML algorithms (‘k-Nearest Neighbours’, ‘decision tree’ and ‘support vector machine’) on their accuracy in replicating the manual coding. The results are very promising: a ‘support vector machine’ algorithm results in accuracy measures ranging from .95 to .99. In this presentation we will discuss the opportunities and caveats of using machine learning in this context.
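
To make the idea of replicating manual codes concrete, the following is a minimal sketch of one of the three algorithm types, k-Nearest Neighbours, applied to a single aspect. It is not the study's implementation: the toy statements, the bag-of-words features, the cosine similarity, the value of k and all function names are assumptions chosen for illustration, and treating each aspect as a separate yes/no coding is likewise an assumption.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lower-cased word counts; a deliberately crude feature representation (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, statement, k=3):
    """Majority manual code among the k training statements most similar to `statement`."""
    query = bag_of_words(statement)
    ranked = sorted(train, key=lambda pair: cosine(bag_of_words(pair[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Share of held-out statements whose predicted code equals the manual code."""
    hits = sum(knn_predict(train, text, k) == label for text, label in test)
    return hits / len(test)


# Invented toy data (not from the study): 1 = the statement refers to the
# hypothetical aspect "argumentation", 0 = it does not.
TRAIN = [
    ("the arguments are stronger and better supported", 1),
    ("this text has more convincing arguments", 1),
    ("weak arguments without any support", 1),
    ("the structure is clearer with good paragraphs", 0),
    ("many spelling mistakes in this text", 0),
    ("better title and clearer layout", 0),
]
TEST = [
    ("more convincing arguments in this one", 1),
    ("clearer structure and fewer spelling mistakes", 0),
]

print(accuracy(TRAIN, TEST))
```

Under these assumptions, the same accuracy computation would be repeated for each of the 7 aspects and for each algorithm type, which is one way to arrive at a per-aspect range of accuracy values.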

Pollitt, A. (2012). Comparative judgement for assessment. International Journal of Technology and Design Education, 22(2), 157-170.