Artificial neural network detection of the Korean glide /w/:
Minimal spectral sampling combined with contextual predictors in spontaneous speech
Soonhyun Hong (Inha University)
Abstract
This study investigates acoustic and contextual predictors for detecting the Korean glide /w/ in spontaneous speech, distinguishing /w/-vowel sequences from vowel-only sequences with supervised machine learning methods. Artificial neural network classifiers were trained and evaluated using five-fold stratified cross-validation on spectral features (the formants F1, F2, and F3) sampled at different temporal points within vowel and /w/-vowel tokens. The results show that simple models using only F2 values sampled at vowel onset and at 20% of vowel duration, or at 20% and 50% of /w/-vowel duration, perform comparably to more complex models that use full dynamic formant trajectories. However, classifiers relying exclusively on temporal spectral cues achieve limited detection accuracy (maximum F1-score of 0.651). Incorporating key contextual predictors, including vowel identity, preceding consonant place and manner, and word-internal position, considerably improves performance: early-sampled models reach an F1-score of 0.81 and outperform later-sampled models. Feature importance analyses underscore the critical roles of spectral and contextual predictors, whereas prosodic and demographic variables (F0, gender, (w)V duration) contribute only minimally. The findings indicate that accurate and efficient detection of the Korean glide /w/ can be achieved by integrating minimal spectral information with essential contextual features.
Keywords
/w/ detection, Seoul Corpus, vowel identity, formant transitions, neural network
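The evaluation scheme named in the abstract (five-fold stratified cross-validation scored by F1-score) can be illustrated with a minimal sketch. This is not the study's code: the function names `stratified_folds` and `f1_score`, the label coding (1 for a /w/-vowel token, 0 for a vowel-only token), and the toy data are assumptions made for illustration only. Note that the F1-score here is the classification metric (the harmonic mean of precision and recall), not the first formant F1.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split token indices into k folds with (near-)equal class ratios.

    Hypothetical helper: labels are assumed to be 1 (/w/-vowel) or 0 (vowel-only).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's shuffled indices round-robin into the folds,
        # so every fold receives about the same share of that class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def f1_score(y_true, y_pred, positive=1):
    """F1-score for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): 10 /w/-vowel tokens and 40 vowel-only tokens.
toy_labels = [1] * 10 + [0] * 40
toy_folds = stratified_folds(toy_labels, k=5)
```

In each round of cross-validation one fold would serve as the test set and the remaining four as training data; the per-fold F1-scores are then typically averaged. Reporting F1-score rather than raw accuracy is a common choice when one class (plausibly /w/-vowel tokens here) is much rarer than the other.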