Enhanced metabolomic predictions using concept drift analysis: identification and correction of confounding factors

Authors

Schwarzerová, Jana
Olešová, Dominika
Šabatová, Kateřina
Kvasnička, Aleš
Koštoval, Aleš
Friedecký, David
Sekora, Jiří
Dluhá, Jitka
Provazník, Valentýna
Popelinsky, Lubos

Publisher

Oxford Academic

Abstract

Motivation: The increasing use of big data and optimized prediction methods in metabolomics requires techniques aligned with biological assumptions to improve early symptom diagnosis. One major challenge in predictive data analysis is handling confounding factors, that is, variables that influence predictions but are not directly included in the analysis.

Results: Detecting and correcting confounding factors enhances prediction accuracy and reduces the false negatives that contribute to diagnostic errors. This study reviews concept drift detection methods in metabolomic predictions and selects the most appropriate ones. We introduce a new implementation of concept drift analysis in predictive classifiers using metabolomics data. Known confounding factors were confirmed, validating our approach and aligning it with conventional methods. Additionally, we identified potential confounding factors that may influence biomarker analysis and could introduce bias and affect model performance.

Availability and implementation: Based on biological assumptions supported by detected concept drift, these confounding factors were incorporated into the correction of prediction algorithms to enhance their accuracy. The proposed methodology has been implemented in the Semi-Automated Pipeline using Concept Drift Analysis for improving Metabolomic Predictions (SAPCDAMP), an open-source workflow available at https://github.com/JanaSchwarzerova/SAPCDAMP.
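To make the idea of concept drift detection in a classifier more concrete, the following is a minimal sketch, not the SAPCDAMP implementation. It assumes that samples have been ordered by a candidate confounding factor (for example age, a hypothetical choice) and that a classifier's per-sample mistakes are available as a 0/1 stream. The detector shown is the classic Drift Detection Method (DDM), chosen here purely for illustration; the abstract does not state which detectors the authors selected. The function name, parameters, and synthetic stream are all assumptions.

```python
import math


def ddm_drift_index(errors, min_samples=30, drift_level=3.0):
    """Return the index where DDM signals drift, or None if it never does.

    `errors` is an iterable of 0/1 values (1 = the classifier misclassified
    that sample). Names and defaults are illustrative assumptions.
    """
    n = 0
    n_errors = 0
    p_min = math.inf  # error rate at the best point seen so far
    s_min = math.inf  # its standard deviation
    for i, err in enumerate(errors):
        n += 1
        n_errors += err
        p = n_errors / n                # running error rate
        s = math.sqrt(p * (1 - p) / n)  # binomial standard deviation
        if n < min_samples:
            continue                    # too few samples to judge
        if p + s < p_min + s_min:
            p_min, s_min = p, s         # remember the best state so far
        # Strict inequality avoids a spurious signal when s == 0.
        if p + s > p_min + drift_level * s_min:
            return i
    return None


# Synthetic error stream (assumed data): about 10% errors for the first
# 200 samples, then about 50% errors, mimicking a shift along the ordering.
stream = [1 if i % 10 == 0 else 0 for i in range(200)]
stream += [1 if i % 2 == 0 else 0 for i in range(200, 400)]

idx = ddm_drift_index(stream)
print("drift signalled at index:", idx)
```

Under these assumptions, a drift signal shortly after the point where the error rate changes would suggest that the ordering variable deserves a closer look as a potential confounding factor; a stream with a stable error rate should produce no signal.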

Description

Document type

Peer-reviewed

Document version

Published version

Language of document

en

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International