OUTLIER DETECTION ON HIGH DIMENSIONAL DATA USING MINIMUM VECTOR VARIANCE (MVV)

  • Andi Harismahyanti A., Department of Statistics and Data Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University
  • Indahwati Indahwati, Department of Statistics and Data Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University
  • Anwar Fitrianto, Department of Statistics and Data Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University
  • Erfiani Erfiani, Department of Statistics and Data Science, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University
Keywords: outlier detection, high dimension, Mahalanobis distance, Minimum Vector Variance

Abstract

High-dimensional data arise in practice when the number of variables p exceeds the number of observations n. A common problem is that, as dimensions are added, data points increasingly behave like outliers. Outliers are observations that do not follow the pattern of the data distribution and lie far from the data center. They need to be detected because they can distort the results of an analysis. One method for detecting outliers is the Mahalanobis distance, and the Minimum Vector Variance (MVV) method can be used to obtain a robust version of it. This study compares the MVV method with the classical Mahalanobis distance method in detecting outliers in non-invasive blood glucose level data, for both p>n and n>p. The results show that the MVV method performs better when n>p. MVV is more effective than the classical method at identifying the minimum data group and the outlying data points.
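
The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual definition of vector variance as Tr(S^2) for a covariance matrix S, searches for a low-vector-variance subset of h points with simple concentration steps, and omits any consistency correction. The function names, the subset size h, the number of random starts, the synthetic data, and the 0.975 chi-square cutoff are all assumptions made for illustration. The sketch also assumes n > p with a nonsingular covariance matrix.

```python
import numpy as np


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from `center`."""
    diff = X - center
    # Solve cov @ z = diff.T instead of forming the inverse explicitly.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))


def classical_distances(X):
    """Distances based on the ordinary sample mean and covariance."""
    return mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))


def mvv_distances(X, h=None, n_starts=20, max_iter=100, seed=0):
    """Simplified MVV-style robust distances (illustrative only).

    Among several random starts, keep the h-subset whose covariance S
    has the smallest vector variance Tr(S^2) after concentration steps.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = h or (n + p + 1) // 2  # assumed default subset size
    best_vv, best_center, best_cov = np.inf, None, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=h, replace=False)
        for _ in range(max_iter):
            center = X[idx].mean(axis=0)
            cov = np.cov(X[idx], rowvar=False)
            # Concentration step: keep the h points closest to the current fit.
            new_idx = np.argsort(mahalanobis_sq(X, center, cov))[:h]
            if np.array_equal(np.sort(new_idx), np.sort(idx)):
                break
            idx = new_idx
        center = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        vv = np.trace(cov @ cov)  # vector variance of this subset
        if vv < best_vv:
            best_vv, best_center, best_cov = vv, center, cov
    return mahalanobis_sq(X, best_center, best_cov)


# Synthetic example (hypothetical data, not the blood glucose data):
# 50 regular points plus a tight cluster of 10 shifted points, p = 2.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(size=(50, 2)),
    rng.normal(loc=8.0, scale=0.3, size=(10, 2)),
])

# For p = 2 the chi-square 0.975 quantile has the closed form -2*ln(0.025).
cutoff = -2.0 * np.log(1.0 - 0.975)

d2_classical = classical_distances(X)
d2_mvv = mvv_distances(X)
flag_classical = d2_classical > cutoff
flag_mvv = d2_mvv > cutoff
print("flagged (classical):", int(flag_classical.sum()))
print("flagged (MVV sketch):", int(flag_mvv.sum()))
```

In a setup like this, the shifted cluster inflates the classical mean and covariance, so its classical distances tend to be small (the masking effect), whereas distances computed from the low-vector-variance subset separate the cluster clearly. For general p, the cutoff would normally come from a chi-square quantile with p degrees of freedom.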

Published
2022-09-01
How to Cite
[1]
A. A., I. Indahwati, A. Fitrianto, and E. Erfiani, “OUTLIER DETECTION ON HIGH DIMENSIONAL DATA USING MINIMUM VECTOR VARIANCE (MVV)”, BAREKENG: J. Math. & App., vol. 16, no. 3, pp. 797-804, Sep. 2022.