OVERCOMING OVERFITTING IN MONKEY VOCALIZATION CLASSIFICATION: USING LSTM AND LOGISTIC REGRESSION
Abstract
This study addresses overfitting in the classification of animal vocalizations from three species: squirrel monkeys, golden lion tamarins, and tailed macaques. Mel-frequency cepstral coefficients (MFCCs) were extracted from the audio data as acoustic features. Classification was first performed with Long Short-Term Memory (LSTM) models, but several LSTM architectures overfit the data. To overcome this, a logistic regression model was used, which achieved a classification accuracy of 100%. These results indicate that, for this classification problem, logistic regression may be more appropriate than the more complex LSTM architectures. Several LSTM architectures are presented to give an overview of the challenges observed. Although LSTMs are promising for sequential data, the results indicate that simpler models are sometimes preferable. Because the work uses a single dataset, the findings may not generalize well to other domains. The work offers insight into model selection for audio classification tasks and highlights the trade-off between model complexity and performance.
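As an illustration of the simpler-model idea only, the following minimal Python sketch trains a multinomial logistic regression on fixed-length feature vectors and compares training accuracy with held-out accuracy, since a large gap between the two is the usual sign of overfitting. Everything in it is an assumption made for illustration: the data are synthetic stand-ins for MFCC matrices, averaging the coefficients over time is one possible way to obtain a fixed-length input, and the function names and hyperparameters are hypothetical. It is not the authors' pipeline and does not reproduce their results.

```python
import math
import random

def make_clip(class_mean, n_frames, rng):
    """Synthetic stand-in for an MFCC matrix (n_frames x n_coefficients)."""
    return [[m + rng.gauss(0.0, 1.0) for m in class_mean] for _ in range(n_frames)]

def mean_pool(clip):
    """Average each coefficient over time: one fixed-length vector per clip."""
    n = len(clip)
    return [sum(frame[d] for frame in clip) / n for d in range(len(clip[0]))]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_proba(W, b, x):
    logits = [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))]
    return softmax(logits)

def fit(X, y, n_classes, lr=0.5, epochs=200, l2=1e-3):
    """Multinomial logistic regression, full-batch gradient descent, L2 penalty."""
    dim, n = len(X[0]), len(X)
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            p = predict_proba(W, b, x)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == t else 0.0)  # cross-entropy gradient
                gb[k] += err
                for d in range(dim):
                    gW[k][d] += err * x[d]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for d in range(dim):
                W[k][d] -= lr * (gW[k][d] / n + l2 * W[k][d])
    return W, b

def accuracy(W, b, X, y):
    hits = 0
    for x, t in zip(X, y):
        p = predict_proba(W, b, x)
        hits += int(p.index(max(p)) == t)
    return hits / len(X)

def make_split(class_means, clips_per_class, n_frames, rng):
    X, y = [], []
    for label, mean in enumerate(class_means):
        for _ in range(clips_per_class):
            X.append(mean_pool(make_clip(mean, n_frames, rng)))
            y.append(label)
    return X, y

rng = random.Random(0)
# Three hypothetical classes with well-separated means (assumed, not measured).
class_means = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
X_train, y_train = make_split(class_means, 30, 20, rng)
X_test, y_test = make_split(class_means, 10, 20, rng)

W, b = fit(X_train, y_train, n_classes=3)
train_acc = accuracy(W, b, X_train, y_train)
test_acc = accuracy(W, b, X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

A model with few parameters, such as this one, has little capacity to memorize individual clips, so its training and held-out accuracies tend to stay close. The same comparison applied to a recurrent model would show whether it overfits.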
Copyright (c) 2025 Suryasatriya Trihandaru, Hanna Arini Parhusip, Abdiel Wilyar Goni

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution license that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g. acknowledgement of its initial publication in this journal).
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.