In the wake of the COVID-19 pandemic, mass coronavirus testing has proven essential to governments in monitoring the spread of the disease, isolating infected individuals, and effectively “flattening the curve” of infections over time. However, the standard oropharyngeal swab test is physically invasive and must be performed by a trained clinician. Ideally, testing would be performed noninvasively, at no cost, and in the homes of potential patients to minimize contamination risk.
The World Health Organization (WHO) has reported that 67.7% of COVID-19 patients exhibit a “dry cough”, meaning that no mucus is produced, unlike the typical “wet cough” that occurs during a cold or allergies. Dry coughs can be distinguished from wet coughs by the sound they produce, which raises the question of whether analyzing cough sounds can provide insight into COVID-19. Such cough sound analysis has proven successful in diagnosing respiratory conditions like pertussis, asthma, and pneumonia.
At the Embedded Systems Laboratory (ESL) at EPFL, we propose to leverage signal processing, pervasive computing, and machine learning (ML) to develop an Android application and website that automatically screen for COVID-19 from the comfort of people’s homes. Test subjects will be able to simply download a mobile application, enter their symptoms, record an audio clip of their cough, and upload the data anonymously to our servers. We will then use audio signal processing and ML techniques to evaluate whether automatic or assisted COVID-19 screening is feasible.
The objective of this website is to collect a large number of sample recordings from patients who are known to have COVID-19. That is why we ask everyone who can provide us with a few seconds of cough sound to contribute. It’s so easy!
All the data collected by this website is anonymous, and it is stored on a private server on the premises of the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. Each recording is associated with the timestamp at which it was received and, if the user grants the corresponding permission, with geolocation information.
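As an illustration of the record described above, the following minimal Python sketch stores a recording with its reception timestamp and keeps the location only when permission was granted. The names CoughSubmission and make_submission, and all field names, are hypothetical and chosen for this example; they do not describe the actual server implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class CoughSubmission:
    """Hypothetical anonymous record; no user identifier is stored."""
    audio: bytes                                     # raw bytes of the recording
    received_at: datetime                            # time the server received it
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude) or None


def make_submission(
    audio: bytes,
    location_permitted: bool,
    location: Optional[Tuple[float, float]] = None,
    now: Optional[datetime] = None,
) -> CoughSubmission:
    """Build a record, dropping the location unless permission was granted."""
    received_at = now or datetime.now(timezone.utc)
    kept_location = location if location_permitted else None
    return CoughSubmission(audio=audio, received_at=received_at, location=kept_location)
```

In this sketch the location is discarded at construction time, so a record built without permission never carries coordinates, even if the caller supplied them.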
The data will be used exclusively for research purposes, and under no circumstances will it be sold or shared with third parties. Eventually, the dataset will be made publicly available to the research community.
Open Database Access
All of the cough recordings we received up to December 1, 2020 have been published in a Zenodo dataset that is available to the public and can be used for training and validating ML models for COVID-19 detection from cough sounds. A private testing dataset has been excluded from publication, so please contact us for more information about testing your models.
Open Code Access
We provide our cough preprocessing code, including a cough detection ML model, in our c4science repository, which is also available to the public.
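To give a rough idea of what preprocessing a cough recording can involve, here is a minimal Python sketch that normalizes a signal to unit peak amplitude and marks loud regions with a simple energy threshold. This is not the code from our repository, and it is not the ML-based cough detector mentioned above; the function names, the frame length, and the threshold are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def peak_normalize(samples: Sequence[float]) -> List[float]:
    """Scale the signal so that its largest absolute value is 1.0."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input is returned unchanged
    return [x / peak for x in samples]


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive non-overlapping frames."""
    values = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values


def loud_segments(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the threshold."""
    rms = frame_rms(samples, frame_len)
    segments = []
    start = None
    for i, value in enumerate(rms):
        if value > threshold and start is None:
            start = i * frame_len            # a loud run begins here
        elif value <= threshold and start is not None:
            segments.append((start, i * frame_len))
            start = None                     # the loud run has ended
    if start is not None:                    # signal ended while still loud
        segments.append((start, len(rms) * frame_len))
    return segments
```

A thresholding approach like this only finds loud regions; it cannot tell a cough from speech or background noise, which is why a trained detection model is used for that step.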
To learn more about how to use the COUGHVID dataset and preprocessing code, please consult our publication in Nature Scientific Data.
For any questions or suggestions, please email us at: email@example.com