Možnosti neuronových sítí využívajících transformery pro zpracování medicínských obrazů (Possibilities of Transformer-Based Neural Networks for Medical Image Processing)

This master's thesis explores the use of neural networks based on the transformer architecture for medical image processing. The main objective was to compare the performance of the ResNet18 and Vision Transformer (ViT-B-16) models on two distinct datasets, Intel Image Classification and ChestXray. The models were optimized with the Optuna framework, and each was then trained ten times to make the results robust. The results show that the Vision Transformer models achieve higher weighted F1 scores than the ResNet18 models. Specifically, ViT-B-16 reached a best weighted F1 score of 0.939 on the Intel Image dataset and 0.907 on the ChestXray dataset, whereas ResNet18 reached 0.883 and 0.885, respectively. Statistical analysis using the Wilcoxon test confirmed that the performance differences between the models are statistically significant, which suggests an advantage of Vision Transformers for these tasks. The thesis also analyses computational cost, which is much higher for ViT.
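The abstract reports weighted F1 scores. As a minimal sketch of how that metric is commonly defined (per-class F1 averaged with weights proportional to each class's support), the following pure-Python function may help. The function name `weighted_f1` and the toy labels are assumptions made for illustration only; they are not taken from the thesis, which may well have used a library implementation instead.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        # True positives: samples of this class predicted as this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        # False positives: samples of other classes predicted as this class.
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        # False negatives: samples of this class predicted as something else.
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Toy labels invented purely for illustration (not data from the thesis).
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.75
```

In this toy case class 0 has F1 = 2/3 with weight 3/8 and class 1 has F1 = 0.8 with weight 5/8, giving 0.25 + 0.5 = 0.75. Weighting by support keeps the score from being dominated by a small class, which matters when class sizes are unequal.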

Keywords

machine learning, transformer, neural network, medical image processing, self-attention, vision transformer

Citation

VALÍK, T. Možnosti neuronových sítí využívajících transformery pro zpracování medicínských obrazů [online]. Brno: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií. 2024.

Language of document

cs

Study field

no specialization

Committee

prof. Ing. Martin Augustynek, Ph.D. (chair); Ing. Martin Mézl, Ph.D. (vice-chair); Ing. Vratislav Harabiš, Ph.D. (member); Ing. Filip Plešinger, Ph.D. (member); Ing. Tomáš Vičar, Ph.D. (member); prof. Ing. Valentýna Provazník, Ph.D. (member)

Date of acceptance

2024-06-11

Defence

The student presented the results of his thesis, and the committee was informed of the reviewers' reports. Ing. Vičar asked: Why do you use ResNet18? The student defended the master's thesis and answered the questions of the committee members and the opponent.

Result of defence

The thesis was successfully defended.

URI

http://hdl.handle.net/11012/247172

Collections

2024
