Stress-testing the Dawid–Skene model: Analysis of repeated categorical ratings

A common task in many disciplines is classifying items into one of several categories, often by trained experts and sometimes by non-experts. Examples include making judgements from medical images and diagnosing a disease from a series of medical tests. Such judgements are often hard to make and error-prone, so they are frequently repeated, either by the same expert or by different ones, to lend extra credence to the judgement and to any decisions subsequently based on it.

A popular approach to analysing such data is the Dawid–Skene model, which is presented in the Stan manual. We explored this model and some of its variants to answer several practical questions, including:

1. How does the model compare with naive approaches such as taking the ‘majority vote’ across multiple ratings? Surprisingly, not as favourably as you might expect. We explain this through comparisons with related, simpler models that show where its strengths and weaknesses lie.

2. How do we set a good prior? We present an alternative to the approach in the Stan manual. Our approach offers more flexibility and is easier to interpret in the context of a real problem, and we show that it can usually be expected to perform better in practice.

3. Is using optimisation mode enough? It is certainly much faster than MCMC. We show what it does well and what is lost.

4. How does performance vary with the number of ratings per item? The more ratings the better, of course, but we show that having at least three per item gives substantially higher accuracy than having only one or two, irrespective of the method used.
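To make the comparison in question 1 concrete, here is a minimal sketch in Python with NumPy of majority voting and of the Dawid–Skene model. In the model, each item has a latent true class drawn with prevalence pi, and each rater j has a confusion matrix theta[j] giving the probability of each reported rating given the true class; majority voting, by contrast, treats every rater as equally reliable. This is not the code from the rater package or the Stan manual, which fit the model in Stan. It instead uses the expectation-maximisation algorithm from Dawid and Skene's original 1979 paper to obtain point estimates, which is closer in spirit to optimisation mode than to MCMC. The function names, the long (item, rater, rating) data layout and the add-alpha smoothing are illustrative assumptions.

```python
import numpy as np

# Ratings in "long" format: one (item, rater, rating) triple per judgement,
# with all indices starting at 0. (Illustrative layout, an assumption.)


def majority_vote(ratings, n_items, n_classes):
    """Most frequent rating for each item (ties go to the lowest class index)."""
    counts = np.zeros((n_items, n_classes))
    for i, _, k in ratings:
        counts[i, k] += 1
    return counts.argmax(axis=1)


def dawid_skene_em(ratings, n_items, n_raters, n_classes, n_iter=100, alpha=1.0):
    """Point estimates for the Dawid-Skene model by expectation-maximisation.

    Returns (pi, theta, T):
      pi[k]          prevalence of true class k
      theta[j, k, l] probability that rater j reports l when the true class is k
      T[i, k]        posterior probability that item i belongs to class k
    alpha is a hypothetical add-alpha pseudo-count that keeps estimated
    probabilities away from zero; it is not a recommended prior.
    """
    # Initialise the class posteriors from normalised vote counts.
    T = np.full((n_items, n_classes), 1e-6)
    for i, _, l in ratings:
        T[i, l] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: re-estimate class prevalence and per-rater confusion matrices.
        pi = T.sum(axis=0) + alpha
        pi /= pi.sum()
        theta = np.full((n_raters, n_classes, n_classes), alpha)
        for i, j, l in ratings:
            theta[j, :, l] += T[i]
        theta /= theta.sum(axis=2, keepdims=True)

        # E-step: posterior over each item's true class, computed in log space.
        log_T = np.tile(np.log(pi), (n_items, 1))
        for i, j, l in ratings:
            log_T[i] += np.log(theta[j, :, l])
        log_T -= log_T.max(axis=1, keepdims=True)
        T = np.exp(log_T)
        T /= T.sum(axis=1, keepdims=True)

    return pi, theta, T


# Toy data: 4 items, 3 raters, 2 classes; rater 2 disagrees on items 2 and 3.
ratings = [
    (0, 0, 0), (0, 1, 0), (0, 2, 0),
    (1, 0, 1), (1, 1, 1), (1, 2, 1),
    (2, 0, 0), (2, 1, 0), (2, 2, 1),
    (3, 0, 1), (3, 1, 1), (3, 2, 0),
]
votes = majority_vote(ratings, n_items=4, n_classes=2)
pi, theta, T = dawid_skene_em(ratings, n_items=4, n_raters=3, n_classes=2)
print("majority vote:", votes)
print("Dawid-Skene posterior:\n", T.round(3))
```

On this toy data both methods pick the same class for every item; the difference lies in the estimated confusion matrices, where rater 2 comes out as less accurate than raters 0 and 1 and so carries less weight in the posterior. The model's extra structure only changes the answer when raters differ in reliability enough to overturn a vote.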

Our results can help inform the analysis of repeated categorical rating data, as well as the design of studies that collect such ratings.

Documentation: https://github.com/jeffreypullin/rater/

Presenter biography:
Damjan Vukcevic

Damjan Vukcevic is a Senior Lecturer and Group Leader within Melbourne Integrative Genomics at the University of Melbourne. He is also the President of the Victorian Branch of the Statistical Society of Australia. He obtained his doctorate at the University of Oxford and was one of the authors of the first large-scale genome-wide association study, published in Nature in 2007. While specialising in statistical genomics, Damjan maintains a broad interest in the applications of statistics. He has worked on projects in many other areas, including respiratory health, life insurance, medical imaging, astrophysics, and election auditing.