Abstract

Manual sleep stage annotation is a time-consuming but often essential step in the analysis of sleep data. To address this bottleneck, several algorithms have been proposed that automate this process, reporting performance levels that are on par with manual annotation according to measures of inter-rater agreement. Here we first demonstrate that inter-rater agreement can provide a biased and imprecise measure of annotation quality. We therefore develop a principled framework for assessing performance against a consensus annotation derived from multiple experienced sleep researchers. We then construct a new sleep stage classifier that combines automated feature extraction using linear discriminant analysis with inference based on vigilance-state-dependent contextual information using a hidden Markov model. This produces automated annotation accuracies that exceed expert performance on rodent electrophysiological data. Furthermore, our classifier is shown to be robust to errors in the training data, robust to experimental manipulations, and compatible with different recording configurations. Finally, we demonstrate that the classifier identifies both successful and failed attempts to transition between vigilance states, which may offer new insights into the occurrence of short awake periods between REM and NREM sleep. We call our classifier ‘Somnotate’ and make an implementation available to the neuroscience community.
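To make the idea of scoring against a consensus annotation concrete, the following is a minimal sketch and not the Somnotate implementation. It assumes a simple per-epoch majority vote as the consensus rule, which the abstract does not specify; the function names `consensus` and `accuracy` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def consensus(annotations):
    """Per-epoch majority vote across raters (assumed rule, not from the paper).

    Epochs without a strict majority are marked None and later excluded.
    """
    result = []
    for labels in zip(*annotations):
        top, count = Counter(labels).most_common(1)[0]
        result.append(top if count > len(labels) / 2 else None)
    return result

def accuracy(annotation, reference):
    """Fraction of epochs matching the reference, ignoring epochs marked None."""
    pairs = [(a, r) for a, r in zip(annotation, reference) if r is not None]
    return sum(a == r for a, r in pairs) / len(pairs)

# Toy data: three hypothetical raters annotating six epochs.
raters = [
    ["wake", "wake", "nrem", "nrem", "rem", "wake"],
    ["wake", "nrem", "nrem", "nrem", "rem", "rem"],
    ["wake", "wake", "nrem", "rem", "rem", "nrem"],
]

reference = consensus(raters)
scores = [accuracy(r, reference) for r in raters]
print(reference)
print(scores)
```

In this sketch each rater also contributes to the consensus they are scored against, which inflates their accuracy; a fairer comparison would presumably build the consensus from the remaining raters only.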