PERFORMANCE COMPARISON OF GRADIENT-BASED CONVOLUTIONAL NEURAL NETWORK OPTIMIZERS FOR FACIAL EXPRESSION RECOGNITION
Abstract
A convolutional neural network (CNN) is a machine learning model that has achieved excellent results in recognizing human facial expressions. Technological developments have produced many optimizers that can be used to train CNN models. This study therefore implements and compares 14 gradient-based CNN optimizers for classifying facial expressions in two datasets, namely the Advanced Computing Class 2022 (ACC22) and the Extended Cohn-Kanade (CK+) datasets. The 14 optimizers are classical gradient descent, traditional momentum, Nesterov momentum, AdaGrad, AdaDelta, RMSProp, Adam, RAdam, AdaMax, AMSGrad, Nadam, AdamW, OAdam, and AdaBelief. The study also reviews the mathematical formulation of each optimizer. Using the best default parameters of each optimizer, the CNN model is trained on the training data for up to 100 epochs to minimize the cross-entropy loss, and the accuracy of the trained model is then measured on both the training and testing data. The results show that the Adam, Nadam, and AdamW optimizers perform best in both training and testing, in terms of minimizing cross-entropy and of the accuracy of the trained model: all three reach a cross-entropy of around 0.1 at the 100th epoch with an accuracy of more than 90% on both training and testing data. Furthermore, the Adam optimizer gives the best accuracy on the testing data for the ACC22 and CK+ datasets, at 100% and 98.64%, respectively. Therefore, the Adam optimizer is the most appropriate optimizer for training the CNN model for facial expression recognition.
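To illustrate the kind of experiment the abstract describes, the sketch below wires up a small CNN and a dictionary of the gradient-based optimizers under comparison in PyTorch. This is a minimal sketch, not the authors' code: the architecture, the 48×48 grayscale input, the seven expression classes, and all hyperparameters are assumptions for demonstration only, and OAdam and AdaBelief are omitted because they are not part of torch.optim.

```python
# Illustrative sketch (not the study's actual configuration): comparing
# several gradient-based optimizers on a small CNN with cross-entropy loss.
# Input shape (1x48x48), class count (7), and hyperparameters are assumed.
import torch
import torch.nn as nn

def make_cnn(num_classes=7):
    # Tiny CNN for 48x48 grayscale faces; two conv blocks then a classifier.
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
        nn.Linear(128, num_classes),
    )

# OAdam and AdaBelief would need third-party implementations and are left out.
optimizers = {
    "SGD":      lambda p: torch.optim.SGD(p, lr=0.01),
    "Momentum": lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "Nesterov": lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9, nesterov=True),
    "AdaGrad":  lambda p: torch.optim.Adagrad(p, lr=0.01),
    "AdaDelta": lambda p: torch.optim.Adadelta(p),
    "RMSProp":  lambda p: torch.optim.RMSprop(p, lr=0.001),
    "Adam":     lambda p: torch.optim.Adam(p, lr=0.001),
    "RAdam":    lambda p: torch.optim.RAdam(p, lr=0.001),
    "AdaMax":   lambda p: torch.optim.Adamax(p, lr=0.002),
    "AMSGrad":  lambda p: torch.optim.Adam(p, lr=0.001, amsgrad=True),
    "Nadam":    lambda p: torch.optim.NAdam(p, lr=0.002),
    "AdamW":    lambda p: torch.optim.AdamW(p, lr=0.001),
}

def train(model, loader, optimizer, epochs=100):
    # Minimize cross-entropy on the training data for a fixed number of epochs.
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return loss.item()

# Usage (train_loader is a hypothetical DataLoader over one of the datasets):
# for name, make_opt in optimizers.items():
#     model = make_cnn()
#     final_loss = train(model, train_loader, make_opt(model.parameters()))
```

A comparison like this would then evaluate each trained model's accuracy on held-out test data, which is how the abstract's training versus testing results are obtained.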
Copyright (c) 2022 Sri Nurdiati, Mohamad Khoirun Najib, Fahren Bukhari, Refi Revina, Fitra Nuvus Salsabila
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution license that allows others to share the work with an acknowledgement of the work's authorship and its initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.