A step towards better sleep science: a large sleep-wake scoring benchmark

For many years, sleep-wake scoring algorithms were created and validated on small, private datasets with no more than one hundred participants. In this paper, we assembled the largest dataset to date for the sleep-wake classification problem and analyzed how well popular traditional algorithms and state-of-the-art machine learning techniques perform on it. With this dataset made public, researchers can use our data and results as a benchmark when developing newer algorithms.

We spend a substantial part of our lives sleeping, and among our many activities sleep is one of the keys to overall mental and physical well-being, especially for those living with a chronic health condition. One of the challenges humanity must tackle in the 21st century is the spread of diseases linked to our busy and stressful lifestyles.

Technology can be an ally in this fight. Although we now have a plethora of gadgets to monitor sleep and physical activity, analyzing sleep data still poses many technical challenges, including the lack of harmonization in how such data are analyzed. That is the main motivation behind our research.

It is well established (for example, in this paper) that short and poor sleep contributes to chronic diseases such as obesity, insulin resistance, and diabetes. Naturally, to reach such conclusions, researchers need to conduct studies that monitor and measure people's sleep quality.

The best-known method for studying sleep is polysomnography (PSG). Although PSG provides accurate and detailed information, a person who undergoes PSG has to sleep the whole night with a large number of wires and sensors attached to them, as depicted below.

PSG provides, among other results, information on whether a person is asleep or awake in a given window of time (typically 15 or 30 seconds). This information is important for diagnosing sleep disorders and measuring sleep quality. However, it can also be obtained more easily from wearables, such as smartwatches and actigraphy devices. Wearables are an attractive alternative to PSG because, apart from being cheaper and unobtrusive, they allow doctors and researchers to study people's daily activity over an extended period of time, whereas studies in sleep labs monitor only a single night.

Wearables, however, need to be "calibrated" correctly, i.e., they need an algorithm to match the activity information captured by them to the sleep-wake states measured by PSG.
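To make the idea of "calibration" concrete, here is a minimal sketch of the general shape many traditional scoring rules take: each epoch is labeled from a weighted sum of the activity counts around it, compared against a threshold. The function name `score_sleep_wake`, the weights, and the threshold are hypothetical values chosen for illustration; they are not the coefficients of any published algorithm or of the methods evaluated in our paper.

```python
from typing import List, Sequence


def score_sleep_wake(
    activity: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> List[int]:
    """Label each epoch 1 (sleep) or 0 (wake).

    Each epoch's score is a weighted sum of the activity counts in a
    window centered on it. Low scores (little movement) are labeled sleep.
    Weights and threshold are illustrative assumptions, not published values.
    """
    half = len(weights) // 2
    labels = []
    for i in range(len(activity)):
        total = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            # Epochs outside the recording contribute nothing to the sum.
            if 0 <= j < len(activity):
                total += w * activity[j]
        labels.append(1 if total < threshold else 0)
    return labels


# Hypothetical activity counts, one value per epoch.
counts = [0, 0, 2, 50, 80, 3, 0, 0]
labels = score_sleep_wake(counts, weights=[0.25, 0.5, 0.25], threshold=10.0)
print(labels)  # [1, 1, 0, 0, 0, 0, 1, 1]
```

In practice, the weights and threshold would be fitted so that the labels agree as closely as possible with the PSG sleep-wake states for the same epochs.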

Many algorithms have been developed over the past decades to determine whether a person is asleep in a given time interval using only the activity measured by the device. However, an existing challenge for actigraphy studies is comparing the performance of different algorithms, because standardized datasets are lacking. Our paper provides such a standardized, large dataset, built from the data collected in the MESA study. The dataset aims to make sleep science research more accessible, as it fosters the creation and validation of new algorithms. The MESA data can be downloaded, upon approval, from the sleepdata.org website, and after a preprocessing step with our code, researchers can readily use and improve state-of-the-art algorithms for this task.
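As an illustration of how two algorithms might be compared on a shared dataset, the sketch below scores predicted labels against PSG labels epoch by epoch. It assumes labels are encoded as 1 for sleep and 0 for wake, and it treats sleep as the positive class; the function name `evaluate` and the example labels are hypothetical and are not taken from our code or from the MESA data.

```python
from typing import Dict, Sequence


def evaluate(psg: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compare predicted sleep-wake labels with PSG labels, epoch by epoch.

    Assumes 1 = sleep and 0 = wake, with sleep as the positive class.
    """
    if len(psg) != len(predicted):
        raise ValueError("psg and predicted must have the same number of epochs")

    pairs = list(zip(psg, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sleep scored as sleep
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # wake scored as wake
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wake scored as sleep
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sleep scored as wake

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (no epochs of that class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # share of sleep epochs detected
        "specificity": ratio(tn, tn + fp),  # share of wake epochs detected
    }


# Hypothetical labels for eight epochs.
psg_labels = [1, 1, 1, 0, 0, 0, 1, 1]
predicted_labels = [1, 1, 0, 0, 0, 0, 1, 1]
result = evaluate(psg_labels, predicted_labels)
print(result)
```

Reporting specificity alongside accuracy matters here because most epochs in a night are sleep, so an algorithm that labels everything as sleep can still look accurate.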

P.S. The publication of this article was funded by the Qatar National Library.