Extrakce informací z biomedicínských textů

Knoth, Petr

Extrakce informací z biomedicínských textů

but.jazyk	čeština (Czech)
but.program	Informační technologie	cs
but.result	práce byla úspěšně obhájena	cs
dc.contributor.advisor	Smrž, Pavel	cs
dc.contributor.author	Knoth, Petr	cs
dc.contributor.referee	Burget, Radek	cs
dc.date.created		cs
dc.description.abstract	V poslední době bylo vynaloženo velké úsilí k tomu, aby byly biomedicínské znalosti, typicky uložené v podobě vědeckých článků, snadněji přístupné a bylo možné je efektivně sdílet. Ve skutečnosti ale nestrukturovaná podstata těchto textů způsobuje velké obtíže při použití technik pro získávání a vyvozování znalostí. Anotování entit nesoucích jistou sémantickou informaci v textu je prvním krokem k vytvoření znalosti analyzovatelné počítačem. V této práci nejdříve studujeme metody pro automatickou extrakci informací z textů přirozeného jazyka. Dále zhodnotíme hlavní výhody a nevýhody současných systémů pro extrakci informací a na základě těchto znalostí se rozhodneme přijmout přístup strojového učení pro automatické získávání exktrakčních vzorů při našich experimentech. Bohužel, techniky strojového učení často vyžadují obrovské množství trénovacích dat, která může být velmi pracné získat. Abychom dokázali čelit tomuto nepříjemnému problému, prozkoumáme koncept tzv. bootstrapping techniky. Nakonec ukážeme, že během našich experimentů metody strojového učení pracovaly dostatečně dobře a dokonce podstatně lépe než základní metody. Navíc v úloze využívající techniky bootstrapping se podařilo významně snížit množství dat potřebných pro trénování extrakčního systému.	cs
dc.description.abstract	Recently, there has been much effort in making biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. As a matter of fact, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units with semantic information in these texts is the first step to make the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. Then we discuss the main benefits and disadvantages of the state-of-art information extraction systems and, as a result of this, we adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a huge amount of training data, which can be sometimes laborious to gather. In order to face up to this tedious problem, we investigate the concept of weakly supervised or bootstrapping techniques. Finally, we show in our experiments that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially bring down the amount of labeled data needed for training of the extraction system.	en
dc.description.mark	A	cs
dc.identifier.citation	KNOTH, P. Extrakce informací z biomedicínských textů [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií. .	cs
dc.identifier.other	25149	cs
dc.identifier.uri	http://hdl.handle.net/11012/52673
dc.language.iso	cs	cs
dc.publisher	Vysoké učení technické v Brně. Fakulta informačních technologií	cs
dc.rights	Standardní licenční smlouva - přístup k plnému textu bez omezení	cs
dc.subject	extrakce informací	cs
dc.subject	strojové učení	cs
dc.subject	zpracování přirozeného jazyka	cs
dc.subject	information extraction	en
dc.subject	machine learning	en
dc.subject	natural language processing	en
dc.title	Extrakce informací z biomedicínských textů	cs
dc.title.alternative	Information Extraction from Biomedical Texts	en
dc.type	Text	cs
dc.type.driver	masterThesis	en
dc.type.evskp	diplomová práce	cs
dcterms.modified	2020-05-09-23:40:33	cs
eprints.affiliatedInstitution.faculty	Fakulta informačních technologií	cs
sync.item.dbid	25149	en
sync.item.dbtype	ZP	en
sync.item.insts	2025.03.26 15:03:01	en
sync.item.modts	2025.01.15 19:34:13	en
thesis.discipline	Inteligentní systémy	cs
thesis.grantor	Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií	cs
thesis.level	Inženýrský	cs
thesis.name	Ing.	cs

Files

Original bundle

Now showing 1 - 2 of 2

Name:: final-thesis.pdf
Size:: 672.33 KB
Format:: Adobe Portable Document Format
Description:: file final-thesis.pdf

Download

Name:: review_25149.html
Size:: 1.43 KB
Format:: Hypertext Markup Language
Description:: file review_25149.html

Download

Collections

2007