Reinforcement Learning for Playing Robotic Football

Harag, Miroslav

but.committee	prof. Dr. Ing. Jan Černocký (chair) doc. Ing. Lukáš Burget, Ph.D. (member) doc. Ing. Vladimír Janoušek, Ph.D. (member) Ing. Michal Hradiš, Ph.D. (member) Ing. Jaroslav Rozman, Ph.D. (member) Ing. František Grézl, Ph.D. (member)	en
but.defence	The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's evaluation and the opponent's review. The student subsequently answered the opponent's questions and further questions from those present. Based on the opponent's review, the supervisor's evaluation, the presentation, and the student's answers to the questions raised, the committee decided to grade the thesis A.	en
but.jazyk	Czech
but.program	Information Technology and Artificial Intelligence	en
but.result	the thesis was successfully defended	en
dc.contributor.advisor	Smrž, Pavel	cs
dc.contributor.author	Harag, Miroslav	cs
dc.contributor.referee	Fajčík, Martin	cs
dc.date.created	2025	cs
dc.description.abstract	Táto práca sa venuje posilňovanému učeniu a jeho aplikácii na vytvorenie agenta pre robotický futbal. Zameriava sa na málo preskúmaný fenomén – vzťahy medzi hodnotami akcií v tom istom stave. Štandardné algoritmy tieto hodnoty považujú za nezávislé, čo však nezohľadňuje realitu prostredí, kde rôzne akcie často vedú do podobných stavov. V práci je zavedený nový koncept konvergencie trajektórií, ktorý formálne popisuje podobnosť akcií na základe ich následných stavov. Na jeho základe je odvodený vzťah bočný odhad, umožňujúci rozšírenie získaných znalostí aj na nezvolené akcie. Tento prístup vedie k efektívnejšiemu využitiu skúseností, rýchlejšiemu učeniu a zníženiu výpočtovej náročnosti. Navrhnutá metóda Shift Tree Backup využíva tieto nové poznatky. Súčasťou návrhu je aj nový mechanizmus tvorby politiky nazvaný investičné prehľadávanie, ktorý ponúka alternatívny prístup k riadeniu rovnováhy medzi prieskumom a využívaním. Metóda bola experimentálne overená v komplexnom prostredí Google Research Football – Academy, kde v niektorých scenároch výrazne prekonala existujúce referenčné metódy ako PPO a IMPALA. Výsledky potvrdzujú potenciál navrhnutého prístupu a motivujú ďalší výskum v tejto oblasti.	cs
dc.description.abstract	This thesis focuses on reinforcement learning and its application to the development of an agent for robotic football. It addresses a rarely explored phenomenon – the relationships between the values of actions within the same state. Standard algorithms typically consider these values to be independent, which does not reflect the reality of environments where different actions often lead to similar states. The thesis introduces a novel concept called trajectory convergence, which formally describes the similarity between actions based on the states that follow them. Based on this concept, a relationship called lateral estimation is derived, allowing the extension of knowledge to actions that were not selected. This approach enables more efficient use of experience, faster learning, and reduced computational cost. The proposed method, Shift Tree Backup, incorporates these new insights. The design also includes a novel policy generation mechanism called investment-based exploration, which offers an alternative approach to balancing exploration and exploitation. The method was experimentally validated in the complex environment of the Google Research Football – Academy, where it significantly outperformed existing reference methods such as PPO and IMPALA in several scenarios. The results confirm the potential of the proposed approach and encourage further research in this area.	en
dc.description.mark	A	cs
dc.identifier.citation	HARAG, M. Posilované učení pro hraní robotického fotbalu [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií. 2025.	cs
dc.identifier.other	161825	cs
dc.identifier.uri	http://hdl.handle.net/11012/254936
dc.language.iso	cs	cs
dc.publisher	Vysoké učení technické v Brně. Fakulta informačních technologií	cs
dc.rights	Standard license agreement - unrestricted access to the full text	en
dc.subject	posilňované učenie	cs
dc.subject	Google Research Football	cs
dc.subject	konvergencia trajektórií	cs
dc.subject	Shift Tree Backup	cs
dc.subject	Tree Backup	cs
dc.subject	investičné prehľadávanie	cs
dc.subject	reinforcement learning	en
dc.subject	Google Research Football	en
dc.subject	trajectory convergence	en
dc.subject	Shift Tree Backup	en
dc.subject	Tree Backup	en
dc.subject	investment-based exploration	en
dc.title	Posilované učení pro hraní robotického fotbalu	cs
dc.title.alternative	Reinforcement Learning for RoboCup	en
dc.type	Text	cs
dc.type.driver	masterThesis	en
dc.type.evskp	master's thesis	en
dcterms.dateAccepted	2025-06-24	cs
dcterms.modified	2025-06-24-15:02:07	cs
eprints.affiliatedInstitution.faculty	Fakulta informačních technologií	cs
sync.item.dbid	161825	en
sync.item.dbtype	ZP	en
sync.item.insts	2025.08.27 02:04:22	en
sync.item.modts	2025.08.26 19:53:41	en
thesis.discipline	Machine Learning	en
thesis.grantor	Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií	cs
thesis.level	Master's (Inženýrský)	en
thesis.name	Ing.	cs

Files

Original bundle

Name: final-thesis.pdf
Size: 1.61 MB
Format: Adobe Portable Document Format
Description: file final-thesis.pdf

Name: appendix-1.pdf
Size: 369.23 KB
Format: Adobe Portable Document Format
Description: file appendix-1.pdf

Name: review_161825.html
Size: 11.01 KB
Format: Hypertext Markup Language
Description: file review_161825.html

Collections

2025