Interpretable machine learning methods for predictions in systems biology from omics data

Sidak, David; Schwarzerová, Jana; Weckwerth, Wolfram; Waldherr, Steffen

doi:10.3389/fmolb.2022.926623

Files

fmolb09926623.pdf (2.93 MB)

Date

2022-10-17

Authors

Sidak, David

Schwarzerová, Jana

Weckwerth, Wolfram

Waldherr, Steffen

Publisher

Frontiers

ORCID

0000-0003-2918-9313

Abstract

Machine learning has become a powerful tool for systems biologists, from diagnosing cancer to optimizing kinetic models and predicting the state, growth dynamics, or type of a cell. Potential predictions from complex biological data sets obtained by “omics” experiments seem endless, but are often not the main objective of biological research. Often we want to understand the molecular mechanisms of a disease to develop new therapies, or we need to justify a crucial decision that is derived from a prediction. In order to gain such knowledge from data, machine learning models need to be extended. A recent trend to achieve this is to design “interpretable” models. However, the notions around interpretability are sometimes ambiguous, and a universal recipe for building well-interpretable models is missing. With this work, we want to familiarize systems biologists with the concept of model interpretability in machine learning. We consider data sets, data preparation, machine learning methods, and software tools relevant to omics research in systems biology. Finally, we try to answer the question: “What is interpretability?” We introduce views from the interpretable machine learning community and propose a scheme for categorizing studies on omics data. We then apply these tools to review and categorize recent studies where predictive machine learning models have been constructed from non-sequential omics data.

Keywords

multi-omics, interpretable machine learning, deep learning, explainable artificial intelligence, metabolomics, proteomics, transcriptomics

Citation

Frontiers in Molecular Biosciences. 2022, vol. 9, issue October 2022, p. 1-28.
https://www.frontiersin.org/articles/10.3389/fmolb.2022.926623/full

Document type

Peer-reviewed

Document version

Published version

Language of document

en

DOI

10.3389/fmolb.2022.926623

URI

http://hdl.handle.net/11012/208577

Collections

Ústav biomedicínského inženýrství

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution 4.0 International
