The Classification of Documents in Malay and Indonesian Using the Naive Bayesian Method Uses Words and Phrases as a Training Set

Authors

Wijaya, Marvin Chandra

Publisher

Institute of Automation and Computer Science, Brno University of Technology

Abstract

Malay and Indonesian are two closely related languages that share much of their vocabulary and grammar. Because the two languages are so similar, classifying them automatically with a tool is a challenge. The classification method most widely used today is the Naive Bayesian method, but it needs to be implemented in a particular way to raise classification accuracy. This study uses a new approach: the training set consists of both words and phrases rather than words alone. With this approach, the classification accuracy for the two languages increases.
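The idea in the abstract can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing, whitespace tokenization, and two-word sequences standing in for "phrases"; the abstract specifies none of these. The names `extract_features` and `NaiveBayesClassifier` are hypothetical, and the four training sentences are toy examples invented for illustration, not data from the study.

```python
import math
from collections import Counter


def extract_features(text, use_phrases=True):
    """Return word features, plus adjacent word pairs when use_phrases is True."""
    tokens = text.lower().split()
    features = list(tokens)
    if use_phrases:
        # Assumption: a "phrase" is approximated by two consecutive words.
        features += [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return features


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, use_phrases=True, alpha=1.0):
        self.use_phrases = use_phrases
        self.alpha = alpha
        self.feature_counts = {}      # label -> Counter of feature frequencies
        self.doc_counts = Counter()   # label -> number of training documents
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.doc_counts[label] += 1
            counts = self.feature_counts.setdefault(label, Counter())
            for feature in extract_features(text, self.use_phrases):
                counts[feature] += 1
                self.vocabulary.add(feature)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.feature_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + self.alpha * len(self.vocabulary)
            for feature in extract_features(text, self.use_phrases):
                score += math.log((counts[feature] + self.alpha) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training sentences (invented for this sketch, not from the paper).
train_docs = [
    "saya hendak pergi ke pejabat",
    "kereta itu sangat laju",
    "saya mau pergi ke kantor",
    "mobil itu sangat cepat",
]
train_labels = ["malay", "malay", "indonesian", "indonesian"]

model = NaiveBayesClassifier(use_phrases=True).fit(train_docs, train_labels)
print(model.predict("dia pergi ke pejabat"))
print(model.predict("mobil itu cepat"))
```

Passing `use_phrases=False` gives the words-only baseline, so the two settings can be compared on the same data. On a toy set this small the comparison says nothing about real accuracy; it only shows where phrase features enter the model.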

Citation

Mendel. 2020, vol. 26, no. 2, pp. 23-28. ISSN 1803-3814
https://mendel-journal.org/index.php/mendel/article/view/116

Document type

Peer-reviewed

Document version

Published version

Language of document

en

Creative Commons license

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license