Aligning pre-trained models for spoken language translation

Sedláček, Šimon

but.committee	prof. Dr. Ing. Jan Černocký (chair); doc. Ing. Lukáš Burget, Ph.D. (member); doc. Mgr. Lukáš Holík, Ph.D. (member); doc. RNDr. Pavel Smrž, Ph.D. (member); doc. Ing. Vítězslav Beran, Ph.D. (member); Ing. František Grézl, Ph.D. (member)	cs
but.defence	The student first presented the results achieved in the thesis. The committee then reviewed the supervisor's evaluation and the reviewer's report. The student subsequently answered questions from those present. Based on the reviewer's report, the supervisor's evaluation, the presentation, and the student's answers to the questions raised, the committee decided to grade the thesis A.	cs
but.jazyk	English
but.program	Information Technology and Artificial Intelligence	cs
but.result	the thesis was successfully defended	cs
dc.contributor.advisor	Kesiraju, Santosh	en
dc.contributor.author	Sedláček, Šimon	en
dc.contributor.referee	Beneš, Karel	en
dc.date.created	2024	cs
dc.description.abstract	This thesis investigates a new end-to-end approach to spoken language translation (ST) that uses pre-trained models for automatic speech recognition (ASR) and machine translation (MT), connected by a small connector module (Q-Former, STE). The connector bridges the gap between the speech and text modalities by mapping the ASR encoder's embedding representations into the latent representation space of the MT model. During training, the chosen ASR and MT models are frozen and only the connector parameters are tuned. Training and evaluation are carried out on the How2 dataset, which contains English-to-Portuguese ST data. In our experiments, most of the aligned systems outperform the reference cascade ST system while using the same foundation models. Moreover, with the connector size kept constant and comparatively small (10M parameters), larger and stronger ASR and MT models universally improve translation results. We find that the connectors can also serve as domain adapters for the chosen foundation systems, significantly improving translation results in the aligned ST setting, even compared with the bare MT performance of the given MT model. Finally, we propose a connector pre-training procedure with the potential to reduce the amount of ST data needed to train similar aligned systems.	en
dc.description.abstract	In this work, we investigate a novel approach to end-to-end speech translation (ST) by leveraging pre-trained models for automatic speech recognition (ASR) and machine translation (MT) and connecting them with a small connector module (Q-Former, STE). The connector bridges the gap between the speech and text modalities, transforming the ASR encoder embeddings into the latent representation space of the MT encoder. During training, the foundation ASR and MT models are frozen, and only the connector parameters are tuned, optimizing for the ST objective. We train and evaluate our models on the How2 English to Portuguese ST dataset. In our experiments, aligned systems outperform our cascade ST baseline while utilizing the same foundation models. Additionally, while keeping the size of the connector module constant and small in comparison (10M parameters), increasing the size and capability of the ASR encoder and MT decoder universally improves translation results. We find that the connectors can also serve as domain adapters for the foundation models, significantly improving translation performance in the aligned ST setting, compared even to the base MT scenario. Lastly, we propose a pre-training procedure for the connector, with the potential for reducing the amount of ST data required for training similar aligned systems.	cs
dc.description.mark	A	cs
dc.identifier.citation	SEDLÁČEK, Š. Aligning pre-trained models for spoken language translation [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií. 2024.	cs
dc.identifier.other	157031	cs
dc.identifier.uri	http://hdl.handle.net/11012/248577
dc.language.iso	en	cs
dc.publisher	Vysoké učení technické v Brně. Fakulta informačních technologií	cs
dc.rights	Standard license agreement - unrestricted access to the full text	cs
dc.subject	spoken language translation	en
dc.subject	speech translation	en
dc.subject	model alignment	en
dc.subject	automatic speech recognition	en
dc.subject	machine translation	en
dc.subject	transfer learning	en
dc.subject	transformers	en
dc.subject	Q-Former	en
dc.subject	domain adaptation	en
dc.subject	spoken language translation	cs
dc.subject	speech translation	cs
dc.subject	model alignment	cs
dc.subject	automatic speech recognition	cs
dc.subject	machine translation	cs
dc.subject	transfer learning	cs
dc.subject	transformers	cs
dc.subject	Q-Former	cs
dc.subject	domain adaptation	cs
dc.title	Aligning pre-trained models for spoken language translation	en
dc.title.alternative	Aligning pre-trained models for spoken language translation	cs
dc.type	Text	cs
dc.type.driver	masterThesis	en
dc.type.evskp	master's thesis	cs
dcterms.dateAccepted	2024-06-17	cs
dcterms.modified	2024-06-17-14:21:40	cs
eprints.affiliatedInstitution.faculty	Fakulta informačních technologií	cs
sync.item.dbid	157031	en
sync.item.dbtype	ZP	en
sync.item.insts	2025.03.26 15:38:06	en
sync.item.modts	2025.01.17 12:20:26	en
thesis.discipline	Machine Learning	cs
thesis.grantor	Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií	cs
thesis.level	Master's (Inženýrský)	cs
thesis.name	Ing.	cs
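
The abstract above describes a small connector that maps ASR encoder embeddings into the representation space expected by the MT model, while both foundation models stay frozen. The following is a minimal, illustrative sketch of that idea only, not the thesis implementation: a single-head cross-attention layer with learnable query vectors, loosely in the spirit of a Q-Former-style connector. The class name `ToyConnector`, the helper functions, and all dimensions are hypothetical, and the random matrix standing in for the frozen ASR encoder output is an assumption made purely so the example runs without any external library.

```python
import math
import random


def linear(rows, weight):
    """Multiply each row vector (length d_in) by a d_in x d_out weight matrix."""
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
        for row in rows
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, n_rows, n_cols, scale=0.1):
    """Random n_rows x n_cols matrix; stands in for learned or encoder values."""
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)] for _ in range(n_rows)]


class ToyConnector:
    """Hypothetical single cross-attention layer with learnable queries."""

    def __init__(self, d_asr, d_mt, n_queries, seed=0):
        rng = random.Random(seed)
        # These three matrices are the only trainable parameters in this sketch;
        # the ASR and MT models themselves would remain frozen.
        self.queries = random_matrix(rng, n_queries, d_mt)
        self.w_key = random_matrix(rng, d_asr, d_mt)
        self.w_value = random_matrix(rng, d_asr, d_mt)

    def __call__(self, asr_states):
        keys = linear(asr_states, self.w_key)      # T x d_mt
        values = linear(asr_states, self.w_value)  # T x d_mt
        d_mt = len(self.queries[0])
        scale = math.sqrt(d_mt)
        outputs = []
        for q in self.queries:
            # Each query attends over all ASR frames.
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
            weights = softmax(scores)
            outputs.append(
                [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_mt)]
            )
        return outputs  # n_queries x d_mt, independent of the number of frames T

    def num_parameters(self):
        return sum(len(m) * len(m[0]) for m in (self.queries, self.w_key, self.w_value))


rng = random.Random(1)
# Assumed stand-in for frozen ASR encoder output: 50 frames of dimension 8.
asr_states = random_matrix(rng, 50, 8, scale=1.0)
connector = ToyConnector(d_asr=8, d_mt=6, n_queries=4)
mt_inputs = connector(asr_states)
print(len(mt_inputs), len(mt_inputs[0]))  # prints "4 6"
print(connector.num_parameters())         # prints "120"
```

In this sketch the connector turns a variable number of speech frames into a fixed number of vectors in the (assumed) MT embedding dimension, and it is the only component with parameters to train. A real connector would typically stack several layers and use a deep learning framework for optimization; those details are not specified in this record.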

Files

Original bundle

Name: final-thesis.pdf
Size: 2.26 MB
Format: Adobe Portable Document Format
Description: file final-thesis.pdf

Name: review_157031.html
Size: 9.93 KB
Format: Hypertext Markup Language
Description: file review_157031.html

Collections

2024