DOI: http://dx.doi.org/10.17959/sppm.2023.29.3.329


Machine learning classification of native Korean words, Sino-Korean words, and loanwords based on phonotactic information

박선우 (Keimyung University)

Abstract

This study tests models that automatically classify Korean nouns as native Korean,
Sino-Korean, or loanwords using a machine learning method, naïve Bayes
classification. We collected 500 native Korean words, Sino-Korean words, and
loanwords, romanized them, decomposed them into bigram and trigram lists, and
entered the bigrams and trigrams into the naïve Bayes classifier. We tested models
with and without syllable boundaries and found that both the bigram and trigram
models were over 80% accurate. Contrary to the expectation that performance would
improve as more information about Korean phonotactics was included in the training
and validation data, the difference in performance between the bigram and trigram
models was not significant. The model that included syllable boundaries in the
phoneme sequence information was slightly more accurate than the model without
syllable boundary information. Among all five models compared, the bigram model
with syllable boundaries achieved the highest accuracy, 83.55%. For now, the models
consider only phoneme sequence information and syllable boundaries, but accuracy is
expected to improve if the models are trained without the bigrams and trigrams that
occur in similar proportions in all categories, and if the size of the data is
increased.
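
The pipeline summarized in the abstract (romanized words split into bigrams or trigrams, with or without syllable boundaries, then passed to a naïve Bayes classifier) might be sketched roughly as follows. This is only an illustration, not the study's implementation: the nine-word toy list, the `.` boundary symbol, the choice to treat each Roman letter as one symbol, the add-one smoothing, and the names `ngrams` and `NaiveBayes` are all assumptions made here.

```python
import math
from collections import Counter, defaultdict


def ngrams(syllables, n, boundaries=True):
    """Split a romanized word (a list of syllables) into letter n-grams.

    Assumption: each Roman letter counts as one symbol, and "." marks a
    syllable boundary when boundaries=True.
    """
    seq = (("." if boundaries else "")).join(syllables)
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class NaiveBayes:
    """Multinomial naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(Counter)  # label -> n-gram frequencies
        self.words = Counter()              # label -> number of training words
        self.vocab = set()

    def fit(self, samples):
        for feats, label in samples:
            self.words[label] += 1
            self.counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        n_words = sum(self.words.values())
        scores = {}
        for label in self.words:
            total = sum(self.counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.words[label] / n_words)
            for f in feats:
                score += math.log((self.counts[label][f] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy data (Revised Romanization, already syllabified).
TOY = [
    (["ha", "neul"], "native"), (["ba", "ram"], "native"),
    (["sa", "ram"], "native"),
    (["hak", "gyo"], "sino"), (["hak", "saeng"], "sino"),
    (["seon", "saeng"], "sino"),
    (["keo", "pi"], "loan"), (["keom", "pyu", "teo"], "loan"),
    (["pi", "a", "no"], "loan"),
]


def train(n=2, boundaries=True):
    """Train one model variant, e.g. bigrams with syllable boundaries."""
    data = [(ngrams(syl, n, boundaries), label) for syl, label in TOY]
    return NaiveBayes().fit(data)


if __name__ == "__main__":
    model = train(n=2, boundaries=True)
    for word in (["hak", "ja"], ["pi", "ja"]):
        print(".".join(word), model.predict(ngrams(word, 2, True)))
```

Switching `n` between 2 and 3 and toggling `boundaries` gives the kind of model variants the abstract compares; with only nine toy words, the output says nothing about the accuracies reported in the study.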

Keywords
phonotactics, native Korean, Sino-Korean, loanword, machine learning, Naïve Bayes classification, bigram model, trigram model 