OVERCOMING OVERFITTING IN MONKEY VOCALIZATION CLASSIFICATION: USING LSTM AND LOGISTIC REGRESSION

Suryasatriya Trihandaru; Hanna Arini Parhusip; Abdiel Wilyar Goni

doi:10.30598/barekengvol19iss2pp973-986

Suryasatriya Trihandaru, Master of Data Science Study Program, Faculty of Science and Mathematics, Universitas Kristen Satya Wacana, Indonesia. ORCID: https://orcid.org/0000-0002-7147-1673
Hanna Arini Parhusip, Master of Data Science Study Program, Faculty of Science and Mathematics, Universitas Kristen Satya Wacana, Indonesia. ORCID: https://orcid.org/0000-0002-0129-830X
Abdiel Wilyar Goni, Master of Data Science Study Program, Faculty of Science and Mathematics, Universitas Kristen Satya Wacana, Indonesia. ORCID: https://orcid.org/0009-0004-5859-6174

Keywords: MFCC, Classification, LSTM, Logistic Regression

Abstract

This study addresses overfitting in the classification of animal vocalizations from squirrel monkeys, golden lion tamarins, and tailed macaques. Mel-frequency cepstral coefficients (MFCCs) were extracted from the audio data as acoustic features. Classification was first performed with long short-term memory (LSTM) models, but several LSTM architectures overfit the data. To overcome this, a logistic regression model was used, which reached a classification accuracy of 100%. These results indicate that, for this classification problem, logistic regression may be more appropriate than the more complex LSTM architectures. Several LSTM architectures are presented to give an overview of the challenges observed. Although LSTMs are well suited to sequential data, the results indicate that simpler models can sometimes be preferable. The study uses a single dataset, so the findings may not generalize to other domains. The work offers insight into model selection for audio classification tasks and highlights the trade-off between model complexity and performance.
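
The comparison described in the abstract rests on two ideas: a linear classifier fitted to fixed-length acoustic feature vectors, and a check of training accuracy against held-out accuracy to detect overfitting. The following is a minimal sketch of both, not the authors' implementation. Everything in it is an assumption made for illustration: the features are synthetic Gaussian clusters standing in for per-clip MFCC summary vectors (13 values per clip, three classes), the model is a softmax (multinomial logistic) regression trained by plain batch gradient descent, and the function names, learning rate, epoch count, and L2 penalty are hypothetical choices. It uses only the Python standard library.

```python
import math
import random


def make_synthetic_features(n_per_class, n_features, n_classes, rng):
    """Assumed stand-in for MFCC summary vectors: one Gaussian cluster per class."""
    data = []
    for label in range(n_classes):
        center = [rng.uniform(-3.0, 3.0) for _ in range(n_features)]
        for _ in range(n_per_class):
            x = [c + rng.gauss(0.0, 0.5) for c in center]
            data.append((x, label))
    rng.shuffle(data)
    return data


def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities from one linear score per class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train_softmax_regression(train, n_features, n_classes,
                             lr=0.05, epochs=200, l2=1e-3):
    """Batch gradient descent on the cross-entropy loss with an L2 penalty."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    n = len(train)
    for _ in range(epochs):
        grad_w = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in train:
            p = predict_proba(weights, biases, x)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)  # dLoss/dscore_k
                grad_b[k] += err
                for j in range(n_features):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                weights[k][j] -= lr * (grad_w[k][j] / n + l2 * weights[k][j])
    return weights, biases


def accuracy(weights, biases, data):
    """Fraction of samples whose most probable class matches the label."""
    correct = 0
    for x, y in data:
        p = predict_proba(weights, biases, x)
        if p.index(max(p)) == y:
            correct += 1
    return correct / len(data)


rng = random.Random(0)
data = make_synthetic_features(n_per_class=40, n_features=13, n_classes=3, rng=rng)
split = int(0.75 * len(data))          # 75% train, 25% held out
train, test = data[:split], data[split:]

weights, biases = train_softmax_regression(train, n_features=13, n_classes=3)
train_acc = accuracy(weights, biases, train)
test_acc = accuracy(weights, biases, test)

# A large positive gap (train much higher than test) is the usual sign of overfitting.
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

On real recordings, the synthetic generator would be replaced by features computed from audio, for example with a library such as librosa, and the hand-written trainer by an established implementation such as scikit-learn's `LogisticRegression`. The same train-versus-held-out comparison can then be applied to an LSTM model to see whether its gap is larger.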
