Comparison of CNN-Learned vs. Handcrafted Features for Detection of Parkinson’s Disease Dysgraphia in a Multilingual Dataset

Authors

Galáž, Zoltán
Drotár, Peter
Mekyska, Jiří
Gazda, Matej
Mucha, Ján
Zvončák, Vojtěch
Smékal, Zdeněk
Faúndez Zanuy, Marcos
Castrillon, Reinel
Orozco-Arroyave, Juan Rafael

Publisher

Frontiers

Abstract

Parkinson’s disease dysgraphia (PDYS), one of the earliest signs of Parkinson’s disease (PD), has been researched as a promising biomarker of PD and as the target of a noninvasive and inexpensive approach to monitoring the progress of the disease. Several approaches to supportive PDYS diagnosis have been proposed, mainly based either on handcrafted features (HF) extracted from online handwriting or on deep neural networks. However, it remains unclear which approach provides the highest discrimination power and how well these approaches transfer between different datasets and languages. This study compares classification performance based on two types of features: features automatically extracted by a pretrained convolutional neural network (CNN) and HF designed by human experts. Both approaches are evaluated on a multilingual dataset collected from 143 PD patients and 151 healthy controls in the Czech Republic, United States, Colombia, and Hungary. The subjects performed the spiral drawing task (SDT; a language-independent task) and the sentence writing task (SWT; a language-dependent task). Models based on logistic regression and gradient boosting were trained in three scenarios: single language (SL), leave one language out (LOLO), and all languages combined (ALC). For the SWT, the HF slightly outperformed the CNN-extracted features in all considered evaluation scenarios, with the following balanced accuracy (BACC) scores: SL: 0.65 (HF), 0.58 (CNN); LOLO: 0.65 (HF), 0.57 (CNN); and ALC: 0.69 (HF), 0.66 (CNN). For the SDT, however, features extracted by a CNN provided competitive results: SL: 0.66 (HF), 0.62 (CNN); LOLO: 0.56 (HF), 0.54 (CNN); and ALC: 0.60 (HF), 0.60 (CNN). In summary, for the SWT, the HF outperformed the CNN-extracted features by about 6 percentage points in mean BACC (0.66 for HF and 0.60 for CNN). For the SDT, both feature sets provided almost identical classification performance (mean BACC of 0.60 for HF and 0.58 for CNN).

Citation

Frontiers in Neuroinformatics. 2022, vol. 16, issue 1, p. 1-18.
https://www.frontiersin.org/articles/10.3389/fninf.2022.877139/full

Document type

Peer-reviewed

Document version

Published version

Language of document

en

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International