• Sri Nurdiati, Department of Mathematics, Faculty of Mathematics and Natural Sciences, IPB University
• Mohamad Khoirun Najib, Department of Mathematics, Faculty of Mathematics and Natural Sciences, IPB University
• Fahren Bukhari, Department of Mathematics, Faculty of Mathematics and Natural Sciences, IPB University
• Refi Revina, Department of Mathematics, Faculty of Mathematics and Natural Sciences, IPB University
• Fitra Nuvus Salsabila, Department of Mathematics, Faculty of Mathematics and Natural Sciences, IPB University
Keywords: AlexNet architecture, confusion matrix, convolutional neural network, deep learning, facial expression recognition, gradient-based optimizer


A convolutional neural network (CNN) is one of the machine learning models that has achieved excellent results in recognizing human facial expressions. Many optimizers are now available for training a CNN model. This study therefore implements and compares 14 gradient-based CNN optimizers for classifying facial expressions in two datasets, the Advanced Computing Class 2022 (ACC22) and Extended Cohn-Kanade (CK+) datasets. The 14 optimizers are classical gradient descent, traditional momentum, Nesterov momentum, AdaGrad, AdaDelta, RMSProp, Adam, RAdam, AdaMax, AMSGrad, Nadam, AdamW, OAdam, and AdaBelief. The study also reviews the mathematical formulas of each optimizer. Using the best default parameters of each optimizer, the CNN model is trained on the training data to minimize the cross-entropy for up to 100 epochs. The accuracy of the trained model is then measured on both the training and testing data. The results show that the Adam, Nadam, and AdamW optimizers perform best in model training and testing, both in minimizing cross-entropy and in the accuracy of the trained model. The three resulting models produce a cross-entropy of around 0.1 at the 100th epoch with an accuracy of more than 90% on both training and testing data. Furthermore, the Adam optimizer gives the best accuracy on the testing data, 100% for the ACC22 dataset and 98.64% for the CK+ dataset. The Adam optimizer is therefore the most appropriate of the compared optimizers for training the CNN model for facial expression recognition.
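
As an illustration of the kind of update rule being compared, the following is a minimal sketch of the standard Adam step in plain Python. It is not the authors' implementation: the helper names (`adam_step`, `toy_loss`, `toy_grad`, `run_adam`), the toy quadratic objective, and the learning rate of 0.05 used in the demo loop are assumptions chosen only to keep the example self-contained. The default hyperparameters in the function signature are the commonly cited Adam defaults, which may differ from the settings used in the study.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to flat lists of floats.

    t is the 1-based step counter. Returns (new_params, new_m, new_v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g       # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g   # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)            # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** t)            # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def toy_loss(params):
    """Hypothetical stand-in objective: squared distance from 3.0."""
    return sum((p - 3.0) ** 2 for p in params)


def toy_grad(params):
    """Gradient of toy_loss with respect to each parameter."""
    return [2.0 * (p - 3.0) for p in params]


def run_adam(steps=2000, lr=0.05):
    """Minimize toy_loss with Adam from a fixed starting point."""
    params = [0.0, 10.0]
    m = [0.0] * len(params)
    v = [0.0] * len(params)
    for t in range(1, steps + 1):
        params, m, v = adam_step(params, toy_grad(params), m, v, t, lr=lr)
    return params


if __name__ == "__main__":
    final = run_adam()
    print("final parameters:", final)
    print("final loss:", toy_loss(final))
```

In a CNN setting the gradients would come from backpropagating the cross-entropy loss rather than from `toy_grad`, and the same per-parameter update would be applied to every weight. Several of the other optimizers listed above, such as AdaMax, AMSGrad, and Nadam, are variations on this step.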





How to Cite
S. Nurdiati, M. Najib, F. Bukhari, R. Revina, and F. Salsabila, “PERFORMANCE COMPARISON OF GRADIENT-BASED CONVOLUTIONAL NEURAL NETWORK OPTIMIZERS FOR FACIAL EXPRESSION RECOGNITION”, BAREKENG: J. Math. & App., vol. 16, no. 3, pp. 927-938, Sep. 2022.