EFFECTIVENESS OF DIMENSIONALITY REDUCTION METHODS ON DATA WITH NON-LINEAR RELATIONSHIPS

  • Lukmanul Hakim Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia https://orcid.org/0000-0001-7152-3624
  • Asep Saefuddin Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia https://orcid.org/0000-0002-1694-9515
  • Kusman Sadik Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia https://orcid.org/0000-0001-8361-8057
  • Anwar Fitrianto Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia https://orcid.org/0000-0001-7050-3082
  • Bagus Sartono Statistics and Data Science, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia https://orcid.org/0000-0003-1115-4737
Keywords: Autoencoders, Missing Values, Neural Network, Nonlinear, Outliers, Principal Component Analysis

Abstract

The phenomenon of big data presents distinct challenges in the analysis process, especially when the data contain a very large number of variables. High complexity, potential redundancy, and the risk of overfitting are major issues that must be addressed through dimensionality reduction techniques. Principal Component Analysis (PCA) is a common method that is effective for data with linear relationships but has limited ability to identify nonlinear patterns. This research aims to improve classification performance by introducing autoencoders to handle nonlinear relationships, data noise, missing values, outliers, and data with mixed measurement scales. The study employs a quantitative approach, analyzing both simulated data and empirical data in the form of the Village Development Index from the Central Statistics Agency, which contains variables measured on various scales. Both dimensionality reduction methods, PCA and neural network-based autoencoders, are tested across a range of data scenarios. Evaluation is based on how well each method preserves the data structure and on the Mean Squared Error (MSE) of the reconstruction. The results indicate that PCA excels in computational efficiency and accuracy for data with linear relationships. In contrast, the autoencoder performs better at detecting nonlinear patterns, achieving lower MSE values with stable MSE standard deviations. The autoencoder also proves more robust than PCA in handling missing values and outliers. The choice of dimensionality reduction method therefore depends strongly on the characteristics of the analyzed data: autoencoders are a superior alternative for complex, nonlinear data, although they require tuning of model parameters. Further research is recommended to explore how autoencoder network architecture and training strategies influence dimensionality reduction performance.
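The evaluation the abstract describes, reducing data to a low-dimensional representation, reconstructing it, and scoring the reconstruction by MSE, can be sketched minimally as below. This is an illustrative sketch only, not the paper's simulation design: the quadratic/sine curve used as the nonlinear scenario, the 3-4-1-4-3 autoencoder architecture, the learning rate, and the iteration count are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data on a nonlinear 1-D curve embedded in 3-D
# (an illustrative stand-in for the paper's nonlinear scenarios).
n = 500
t = rng.uniform(-1.0, 1.0, size=(n, 1))
X = np.hstack([t, t ** 2, np.sin(np.pi * t)]) + 0.01 * rng.normal(size=(n, 3))
X = X - X.mean(axis=0)                       # centre the data, as PCA assumes

# --- PCA: keep one component, reconstruct, measure reconstruction MSE ---
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X_pca = (X @ Vt[:1].T) @ Vt[:1]              # rank-1 linear reconstruction
mse_pca = float(np.mean((X - X_pca) ** 2))

# --- Tiny autoencoder 3 -> 4 -> 1 -> 4 -> 3 with tanh, full-batch GD ---
def init(shape):
    return 0.5 * rng.standard_normal(shape)

W1, b1 = init((3, 4)), np.zeros(4)
W2, b2 = init((4, 1)), np.zeros(1)
W3, b3 = init((1, 4)), np.zeros(4)
W4, b4 = init((4, 3)), np.zeros(3)

def forward(X):
    h1 = np.tanh(X @ W1 + b1)
    z = h1 @ W2 + b2                         # 1-D bottleneck code
    h2 = np.tanh(z @ W3 + b3)
    Xhat = h2 @ W4 + b4
    return h1, z, h2, Xhat

lr = 0.2
loss0 = float(np.mean((forward(X)[3] - X) ** 2))   # loss before training
for _ in range(5000):
    h1, z, h2, Xhat = forward(X)
    dXhat = 2.0 * (Xhat - X) / X.size        # gradient of MSE w.r.t. output
    dW4, db4 = h2.T @ dXhat, dXhat.sum(0)
    dh2 = dXhat @ W4.T
    dp3 = dh2 * (1.0 - h2 ** 2)              # back through tanh
    dW3, db3 = z.T @ dp3, dp3.sum(0)
    dz = dp3 @ W3.T
    dW2, db2 = h1.T @ dz, dz.sum(0)
    dh1 = dz @ W2.T
    dp1 = dh1 * (1.0 - h1 ** 2)
    dW1, db1 = X.T @ dp1, dp1.sum(0)
    for P, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2),
                 (W3, dW3), (b3, db3), (W4, dW4), (b4, db4)):
        P -= lr * g                          # in-place gradient step

mse_ae = float(np.mean((forward(X)[3] - X) ** 2))
print(f"PCA reconstruction MSE:         {mse_pca:.4f}")
print(f"Autoencoder reconstruction MSE: {mse_ae:.4f}")
```

On data like this, where the variables are nonlinear functions of a single latent factor, the one-component PCA reconstruction leaves a fixed residual, while the autoencoder's nonlinear bottleneck can in principle drive the MSE toward the noise floor; how close it gets depends on initialization and on tuning the architecture, learning rate, and training length, which is the tuning cost the abstract notes.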



Published
2026-04-08
How to Cite
[1]
L. Hakim, A. Saefuddin, K. Sadik, A. Fitrianto, and B. Sartono, “EFFECTIVENESS OF DIMENSIONALITY REDUCTION METHODS ON DATA WITH NON-LINEAR RELATIONSHIPS”, BAREKENG: J. Math. & App., vol. 20, no. 3, pp. 2507-2522, Apr. 2026.