2022.01.10 10:53
Cho, Hyesun. 2021. Predicting the gender of Korean personal names using fastText. Studies in Phonetics, Phonology and Morphology 27.3. 483-500.
Male and female names tend to have distinct phonotactic characteristics in many languages. This paper explores the use of fastText, a neural-network text classifier that uses sub-word information, to predict the gender of Korean personal names, and compares its results with those of a maximum-entropy model of phonotactics (Hayes and Wilson 2008). In this study, fastText is trained on 6400 Korean personal names, each labeled as male or female, and tested on 35 Korean names. The fastText predictions were positively correlated with Korean speakers’ ratings of the gender of the names. fastText outperformed the maximum-entropy model both in correlation with human ratings and in label accuracy. However, while the maximum-entropy model uses OT-style constraints that allow generative linguists to interpret its results, fastText offers no such interpretability. An error analysis using OT-style constraints is presented for the names that the models predicted incorrectly.
Keywords: name, gender, phonotactics, maximum-entropy, neural network, fastText, sound symbolism
[pdf]
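As a rough illustration of the "sub-word information" mentioned in the abstract, the sketch below extracts boundary-marked character n-grams in the style of fastText's sub-word features. It is not the paper's code: the function name `char_ngrams`, the n-gram range, and the romanized example name are all assumptions made for illustration, and the abstract does not say how the names were encoded for training.

```python
def char_ngrams(word, min_n=2, max_n=3):
    """Return character n-grams of `word`, padded with '<' and '>' boundary marks.

    Hypothetical helper for illustration; the n-gram range is an assumption,
    not a setting reported in the abstract.
    """
    padded = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the padded string.
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


# Hypothetical romanized name, used only as sample input.
print(char_ngrams("mina"))
```

Boundary marks let a classifier distinguish, for example, a name-final "a>" from a name-medial "a", which is the kind of positional phonotactic cue that may differ between male and female names.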