K-Nearest Neighbors as a Transparent Baseline for Automated EEG Sleep Staging

Authors

  • Ahmet Sertol Köksal, Yozgat Bozok University

Keywords

Distance metric, EEG, KNN, Sleep staging

Abstract

Sleep staging plays a key role in evaluating sleep health, but manually annotating polysomnography data is both time-consuming and prone to inconsistencies. This study presents the first comprehensive baseline evaluation of the K-Nearest Neighbors (KNN) algorithm for EEG-based sleep staging, using a nested subject-wise cross-validation approach. We assessed 24 configurations combining six data scalers and four distance metrics on the ISRUC dataset. Overall, KNN delivered stable performance, with macro-F1 scores between 0.59 and 0.62 and Cohen’s κ ranging from 0.55 to 0.57. Among scalers, the Normalizer consistently performed the worst (macro-F1 ≈ 0.52), while the Power transform, Standard, and Quantile scalers produced more reliable outcomes. The choice of distance metric had a relatively minor impact, but Euclidean distance offered the best trade-off, slightly improving accuracy while delivering runtimes up to five times faster than Cosine. Hyperparameter tuning consistently favored k ≈ 30 with distance weighting, indicating that extensive parameter searches may not be necessary. At the individual class level, the Wake (F1 ≈ 0.79) and N3 (F1 ≈ 0.82) stages were identified with high accuracy, whereas N2 (F1 ≈ 0.68) was moderately accurate. REM (F1 ≈ 0.56) and particularly N1 (F1 ≈ 0.27) remained difficult to classify, though some setups improved performance by up to 8%. In summary, while KNN does not match deep learning in raw accuracy, it provides valuable benefits in terms of transparency, reproducibility, and interpretability. As a practical and interpretable baseline for future EEG-based sleep analysis, we recommend using Euclidean distance and k ≈ 30 with distance weighting, and avoiding the Normalizer.
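The recommended configuration from the abstract (Standard scaling, Euclidean distance, k ≈ 30 with distance weighting, subject-wise cross-validation) can be sketched with scikit-learn. This is a minimal illustration, not the paper's code: the feature matrix, label array, and subject grouping below are synthetic placeholders, and a real pipeline would extract per-epoch EEG features from the ISRUC recordings and use the paper's nested tuning loop rather than this single outer loop.

```python
# Sketch of the abstract's recommended KNN baseline (assumed setup, not the
# authors' implementation): Standard scaling, Euclidean distance, k = 30 with
# distance weighting, evaluated subject-wise with macro-F1.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))           # 600 epochs x 20 features (synthetic stand-in)
y = rng.integers(0, 5, size=600)         # 5 stages: W, N1, N2, N3, REM (synthetic labels)
groups = np.repeat(np.arange(10), 60)    # 10 hypothetical subjects, 60 epochs each

clf = make_pipeline(
    StandardScaler(),                                  # avoid Normalizer (worst scaler in the study)
    KNeighborsClassifier(n_neighbors=30,               # k ~ 30, as tuning consistently favored
                         weights="distance",           # distance weighting
                         metric="euclidean"),          # fastest metric with comparable accuracy
)

# GroupKFold keeps each subject's epochs in a single fold, mirroring the
# subject-wise evaluation; the paper additionally nests an inner tuning loop.
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5),
                         groups=groups, scoring="f1_macro")
print(round(float(scores.mean()), 3))
```

With real EEG features the macro-F1 would land in the reported 0.59–0.62 range; on the random data above it only demonstrates that the pipeline runs end to end.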

Published

31.12.2025