Studies in Phonetics, Phonology and Morphology (음성음운형태론연구), Vol. 29, No. 3, 박선우 (Park Sunwoo)

2024.01.31 09:05

The purpose of this study is to test models that automatically classify Korean nouns into native Korean, Sino-Korean, and loanwords using a machine learning method, naïve Bayes classification. In this study, 500 native Korean words, Sino-Korean words, and loanwords were collected, romanized, and decomposed into bigram and trigram lists, which were then entered into the naïve Bayes classifier. We tested models with and without syllable boundaries and found that both the bigram and trigram models were over 80% accurate. Contrary to the expectation that performance would improve as more information about Korean phonotactics was included in the training and validation data, the difference in performance between the bigram and trigram models was not significant. The model that included syllable boundaries in the phoneme sequence information had slightly higher accuracy than the model without syllable boundary information. Among the classification results of all five models, the bigram model with syllable boundaries had the best accuracy, 83.55%. For now, we have limited the model to phoneme sequence information and syllable boundaries, but its accuracy is expected to improve if it is trained without the bigrams and trigrams that occur in similar proportions in all categories, and if the size of the data is increased.
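
The pipeline described above (romanize, split into n-grams with or without syllable boundaries, classify with naïve Bayes) can be illustrated with a minimal Python sketch. This is not the author's implementation. The romanization scheme, the "." boundary marker, the "#" edge padding, the add-one smoothing, and the six toy words are all assumptions made for illustration only; they are not the study's data or settings.

```python
import math
from collections import Counter, defaultdict


def ngrams(syllables, n, keep_boundaries=True):
    """Split a romanized word, given as a list of syllables, into character n-grams."""
    # Assumption: "." marks a syllable boundary and "#" pads the word edges.
    sep = "." if keep_boundaries else ""
    s = "#" + sep.join(syllables) + "#"
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over n-gram counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed; not stated in the abstract)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(docs, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_label / total)  # log prior
            for f in feats:
                # Smoothed log likelihood of each n-gram given the class.
                score += math.log((counts[f] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy training set (hypothetical, far smaller than the study's data).
toy_words = [
    (["sa", "ram"], "native"),
    (["ha", "neul"], "native"),
    (["hak", "gyo"], "sino"),
    (["guk", "ga"], "sino"),
    (["keo", "pi"], "loan"),
    (["beo", "seu"], "loan"),
]

docs = [ngrams(syllables, 2, keep_boundaries=True) for syllables, _ in toy_words]
labels = [label for _, label in toy_words]
model = NaiveBayes().fit(docs, labels)

# Classify an unseen word using the same feature extraction.
prediction = model.predict(ngrams(["keo", "teun"], 2, keep_boundaries=True))
```

Switching `n` to 3 or setting `keep_boundaries=False` gives the other model variants compared in the abstract. The suggestion to drop n-grams that occur in similar proportions in all categories would amount to filtering the feature lists before `fit`.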

Keywords

phonotactics, native Korean, Sino-Korean, loanword, machine learning, naïve Bayes classification, bigram model, trigram model
