Rozpoznávácí sítě založené na konečných stavových převodnících pro dopředné a zpětné dekódování v rozpoznávání řeči

Hannemann, Mirko

but.jazyk	angličtina (English)
but.program	Výpočetní technika a informatika	cs
but.result	práce byla úspěšně obhájena	cs
dc.contributor.advisor	Burget, Lukáš	en
dc.contributor.author	Hannemann, Mirko	en
dc.contributor.referee	Schlüter, Ralf	en
dc.contributor.referee	Novák, Miroslav	en
dc.date.created		cs
dc.description.abstract	Pomocí matematického formalismu váhovaných konečných stavových převodníků (weighted finite state transducers WFST) může být formulována řada úloh včetně automatického rozpoznávání řeči (automatic speech recognition ASR). Dnešní ASR systémy široce využívají složených pravděpodobnostních modelů nazývaných dekódovací grafy nebo rozpoznávací sítě. Ty jsou z jednotlivých komponent konstruovány pomocí WFST operací, např. kompozice. Každá komponenta je zde zdrojem znalostí a omezuje vyhledávání nejlepší cesty ve složeném grafu v operaci zvané dekódování. Využití koherentního teoretického rámce garantuje, že výsledná struktura bude optimální podle definovaného kritéria. WFST mohou být v rámci daného polookruhu (semi-ring) optimalizovány pomocí determinizace a minimalizace. Aplikací těchto algoritmů získáme optimální strukturu pro prohledávání, optimální distribuce vah je pak získána aplikací "weight pushing" algoritmu. Cílem této práce je zdokonalit postupy a algoritmy pro konstrukci optimálních rozpoznávacích sítí. Zavádíme alternativní weight pushing algoritmus, který je vhodný pro důležitou třídu modelů -- převodníky jazykového modelu (language model transducers) a obecně pro všechny cyklické WFST a WFST se záložními (back-off) přechody. Představujeme také způsob konstrukce rozpoznávací sítě vhodné pro dekódování zpětně v čase, které prokazatelně produkuje ty samé pravděpodobnosti jako dopředná síť. K tomuto účelu jsme vyvinuli algoritmus pro exaktní reverzi back-off jazykových modelů a převodníků, které je reprezentují. Pomocí zpětných rozpoznávacích sítí optimalizujeme dekódování: ve statickém dekodéru je využíváme pro dvoustupňové dekódování (dopředné a zpětné vyhledávání). Tento přístup --- "sledovací" dekódování (tracked decoding) --- umožňuje zahrnout výsledky vyhledávání z prvního stupně do druhého stupně tak, že se sledují hypotézy obsažené v rozpoznávacím grafu (lattice) prvního stupně. 
Výsledkem je podstatné zrychlení dekódování, protože tato technika umožňuje prohledávat s variabilním prohledávacím paprskem (search beam) -- ten je povětšinou mnohem užší než u základního přístupu. Ukazujeme rovněž, že uvedenou techniku je možné využít v dynamickém dekodéru tím, že postupně zjemňujeme rozpoznávání. To navíc vede i k částečné paralelizaci dekódování.	en
dc.description.abstract	Many tasks can be formulated in the mathematical framework of weighted finite state transducers (WFSTs). This is also the case for automatic speech recognition (ASR). Nowadays, ASR makes extensive use of composed probabilistic models called decoding graphs or recognition networks. They are constructed from the individual components via WFST operations such as composition. Each component is a probabilistic knowledge source that constrains the search for the best path through the composed graph, a search known as decoding. The use of a coherent framework guarantees that the resulting automata are optimal in a well-defined sense. WFSTs can be optimized with the help of determinization and minimization in a given semi-ring. Applying these algorithms yields the optimal structure for search, and the optimal distribution of weights is achieved by applying a weight pushing algorithm. The goal of this thesis is to further develop the recipes and algorithms for the construction of optimal recognition networks. We introduce an alternative weight pushing algorithm that is suitable for an important class of models -- language model transducers, or more generally cyclic WFSTs and WFSTs with failure (back-off) transitions. We also present a recipe for constructing recognition networks that are suitable for decoding backwards in time and that are guaranteed to give exactly the same probabilities as the forward recognition network. For that purpose, we develop an algorithm for exact reversal of back-off language models and their corresponding language model transducers. We apply these backward recognition networks in an optimization technique: in a static network decoder, we use them for a two-pass decoding setup (forward search and backward search). This approach, called tracked decoding, makes it possible to incorporate the first-pass decoding into the second-pass decoding by tracking hypotheses from the first-pass lattice. 
This technique results in significant speed-ups, since it allows decoding with a variable beam width, which is most of the time much smaller than the baseline beam. We also show that it is possible to apply the algorithms in a dynamic network decoder by using an incrementally refining recognition setup. This additionally leads to a partial parallelization of the decoding.	cs
dc.description.mark	P	cs
dc.identifier.citation	HANNEMANN, M. Rozpoznávácí sítě založené na konečných stavových převodnících pro dopředné a zpětné dekódování v rozpoznávání řeči [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií. .	cs
dc.identifier.other	128165	cs
dc.identifier.uri	http://hdl.handle.net/11012/187246
dc.language.iso	en	cs
dc.publisher	Vysoké učení technické v Brně. Fakulta informačních technologií	cs
dc.rights	Standardní licenční smlouva - přístup k plnému textu bez omezení	cs
dc.subject	Automatické rozpoznávání řeči	en
dc.subject	dekódování řeči	en
dc.subject	rozpoznávací sítě	en
dc.subject	váhované konečné stavové automaty	en
dc.subject	jazykové modely	en
dc.subject	Automatic speech recognition	cs
dc.subject	LVCSR decoding	cs
dc.subject	recognition networks	cs
dc.subject	weighted finite state transducers	cs
dc.subject	N-gram language models	cs
dc.subject	weight pushing	cs
dc.title	Rozpoznávácí sítě založené na konečných stavových převodnících pro dopředné a zpětné dekódování v rozpoznávání řeči	en
dc.title.alternative	Finite-state based recognition networks for forward-backward speech decoding	cs
dc.type	Text	cs
dc.type.driver	doctoralThesis	en
dc.type.evskp	dizertační práce	cs
dcterms.modified	2020-05-10-17:46:43	cs
eprints.affiliatedInstitution.faculty	Fakulta informačních technologií	cs
sync.item.dbid	128165	en
sync.item.dbtype	ZP	en
sync.item.insts	2025.03.27 12:12:17	en
sync.item.modts	2025.01.15 19:18:01	en
thesis.discipline	Výpočetní technika a informatika	cs
thesis.grantor	Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií	cs
thesis.level	Doktorský	cs
thesis.name	Ph.D.	cs