Navržení booleovské sítě na základě dat genové exprese u nemodelových organismů (Design of a Boolean Network Based on Gene Expression Data in Non-Model Organisms)

Authors

Breda, Maximilian

Mark

E

Publisher

Brno University of Technology (Vysoké učení technické v Brně), Faculty of Electrical Engineering and Communication

Abstract

Boolean network inference has become an important method for studying gene regulation in biological systems. This thesis develops a comprehensive pipeline for inferring genome-wide Boolean networks from RNA-Seq data in the non-model organism Clostridium beijerinckii NRRL B-598. The main aim is to construct a complete regulatory network without relying on pre-existing biological knowledge. The study seeks to identify regulatory connections involved in solvent production by assessing gene expression patterns across different experimental conditions. The pipeline has two stages: R-based preprocessing, including fractional counting of multi-mapped reads, and Python-based Boolean rule inference with decision tree classifiers. Bioconductor tools were used for alignment processing, while scikit-learn and custom algorithms were used to build the Boolean rules. Inference succeeded for all 5,530 genes and yielded 17,935 regulatory edges, 99.98% of which met the quality criteria. The network shows biologically plausible features, such as sparse connectivity (density = 0.00059) and a high proportion of activating links (97.2%), which suggest coordinated regulatory mechanisms. The study nonetheless has limitations. Binary discretization of continuous expression data, together with the analysis of only one experimental condition, risks oversimplifying complex regulatory mechanisms. Moreover, the lack of rigorous experimental validation data, which is typical of research on non-model organisms, prevents biological confirmation of the predicted relationships. Future experimental work is therefore needed to verify the computational predictions. This research lays the groundwork for a genome-wide Boolean regulatory network of C. beijerinckii and offers a scalable framework that can be applied to other non-model organisms.
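The abstract names several computational steps: fractional counting of multi-mapped reads, binary discretization, decision-tree Boolean rule inference, and network density. The pure-Python sketch below illustrates each of them under stated assumptions; it is not the thesis's implementation, which used R/Bioconductor for preprocessing and scikit-learn decision trees for rule inference. The median threshold used for binarization, the depth-1 decision stump restricted to identity/negation rules (predicting a target from regulator states in the same sample), and all function names (`fractional_counts`, `binarize`, `best_stump`, `density`) are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fractional_counts(alignments):
    """Fractional counting: a read aligned to k distinct genes adds 1/k to each."""
    genes_per_read = defaultdict(set)
    for read_id, gene in alignments:
        genes_per_read[read_id].add(gene)
    counts = defaultdict(float)
    for genes in genes_per_read.values():
        share = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += share
    return dict(counts)


def binarize(profile):
    """Assumed discretization rule (hypothetical): 1 if above the gene's median, else 0."""
    threshold = median(profile)
    return [1 if value > threshold else 0 for value in profile]


def best_stump(target, candidates):
    """Depth-1 decision tree restricted to target = x (activation) or target = NOT x (inhibition).

    Returns (regulator, sign, accuracy) for the best-scoring candidate.
    """
    best = None
    n = len(target)
    for name, states in candidates.items():
        agree = sum(t == s for t, s in zip(target, states))
        for sign, hits in (("activation", agree), ("inhibition", n - agree)):
            accuracy = hits / n
            if best is None or accuracy > best[2]:
                best = (name, sign, accuracy)
    return best


def density(n_edges, n_nodes):
    """Edge density of a directed graph without self-loops."""
    return n_edges / (n_nodes * (n_nodes - 1))


if __name__ == "__main__":
    reads = [("r1", "geneA"), ("r2", "geneA"), ("r2", "geneB"), ("r3", "geneB")]
    print(fractional_counts(reads))  # r2 is split 0.5/0.5 between geneA and geneB
    target = binarize([1.0, 5.0, 3.0, 8.0])  # -> [0, 1, 0, 1]
    regulators = {"geneX": binarize([2.0, 9.0, 1.0, 7.0]), "geneY": binarize([9.0, 1.0, 8.0, 2.0])}
    print(best_stump(target, regulators))
    print(round(density(17935, 5530), 5))  # 0.00059
```

Assuming a directed graph without self-loops, 17,935 edges among 5,530 genes give 17,935 / (5,530 × 5,529) ≈ 0.00059, which matches the density reported in the abstract. Counting activating versus inhibiting signs across all inferred rules is one way a proportion such as the reported 97.2% could be tallied.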

Citation

BREDA, M. Navržení booleovské sítě na základě dat genové exprese u nemodelových organismů [online]. Brno: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií. 2025.

Language of document

en

Study field

No specialization

Committee

doc. Ing. Radovan Jiřík, Ph.D. (chair), Ing. Martin Mézl, Ph.D. (vice-chair), Ing. Oto Janoušek, Ph.D. (member), Ing. Jiří Chmelík, Ph.D. (member), Ing. Martin Králík (member)

Date of acceptance

2025-08-29

Defence

The student presented the results of his master's thesis, and the committee members were acquainted with the reviews. Doc. Jiřík asked: Was the master's thesis written under time pressure? The student defended the master's thesis with reservations and answered the questions.

Result of defence

The thesis was successfully defended.
