Metody extrakce informace z textových dokumentů (Methods of Information Extraction from Text Documents)

Sychra, Tomáš

Files

final-thesis.pdf (1.58 MB)

review_25003.html (1.44 KB)

Authors

Sychra, Tomáš

Advisor

Bartík, Vladimír

Referee

Burget, Radek

Mark

A

Publisher

Vysoké učení technické v Brně. Fakulta informačních technologií

Abstract

Knowledge discovery in text documents is a subset of general data mining. Text documents, however, have properties that differ from those of conventional databases. This thesis gives an overview of methods that can be used for mining information from text. The most frequently used mining task is classification, and I describe the possible approaches to classifying documents. Finally, I present the Winnow algorithm, which is expected to achieve good classification results in comparison with other algorithms. The thesis also includes a description of a Winnow implementation and an overview of the results obtained.

Keywords

text documents, extraction, information extraction, knowledge discovery, classification, categorization, linear classification, Winnow, Balanced Winnow, Positive Winnow

Citation

SYCHRA, T. Metody extrakce informace z textových dokumentů [online]. Brno: Vysoké učení technické v Brně. Fakulta informačních technologií.

Language of document

Czech (cs)

Study field

Information Systems (Informační systémy)

Result of defence

The thesis was successfully defended.

URI

http://hdl.handle.net/11012/53244

Collections

2007
