PRINCIPAL COMPONENT ANALYSIS-VECTOR AUTOREGRESSIVE INTEGRATED (PCA-VARI) MODEL USING DATA MINING APPROACH TO CLIMATE DATA IN THE WEST JAVA REGION

  • Devi Munandar Mathematics Department, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran
  • Budi Nurani Ruchjana Mathematics Department, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran
  • Atje Setiawan Abdullah Computer Science Department, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran https://orcid.org/0000-0002-3877-3522
Keywords: climate, data mining, forecasting, IRF, KDD, PCA-VARI, variable reduction

Abstract

Over long periods, atmospheric changes are driven by natural phenomena. This study combines Principal Component Analysis (PCA) with the Vector Autoregressive Integrated (VARI) model, yielding the PCA-VARI model, within a data mining approach. PCA reduces ten climate variables to two principal components, using climate data for 2001-2020 from NASA Prediction Of Worldwide Energy Resources (POWER). VARI is a multivariate time series model for non-stationary data: it models two or more mutually influencing variables after a differencing process. The Knowledge Discovery in Databases (KDD) method was used for the empirical analysis. Pre-processing consists of analyzing the raw climate data. In the data mining stage, the proportion of variance explained by each principal component is determined, and the selected components are used as variables in the VARI model. Post-processing consists of visualizing and interpreting the PCA-VARI model. Solar radiation and precipitation are strongly correlated with the data at each measurement location. The Impulse Response Function (IRF) visualization shows the forecast interaction of variables between locations: the climates of locations in the West Java region, especially Lembang and Bogor, respond strongly to and influence each other.
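The PCA-VARI pipeline described in the abstract (reduce the variables with PCA, difference the retained components, fit a vector autoregression, then read off impulse responses) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes NumPy, PCA on the correlation matrix, first differencing (d = 1), a VAR of lag order 1 estimated by ordinary least squares, and non-orthogonalized impulse responses. The helper names (`pca_scores`, `fit_var1`, `impulse_response`) are hypothetical, and the data are synthetic rather than NASA POWER observations.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA on standardized columns (equivalently, on the correlation matrix).

    Returns the scores of the first `n_components` components and the
    proportion of total variance explained by every component, largest first.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]               # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    proportion = eigvals / eigvals.sum()
    scores = Z @ eigvecs[:, :n_components]
    return scores, proportion


def fit_var1(Y):
    """OLS estimate of a VAR(1) with intercept: y_t = c + A y_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])   # rows: [1, y_{t-1}]
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    c, A = B[0], B[1:].T                                  # transpose back to A
    return c, A


def impulse_response(A, horizon):
    """Non-orthogonalized IRF of a VAR(1): Psi_h = A^h for h = 0..horizon.

    Entry [h, i, j] is the response of variable i, h steps after a unit
    shock to variable j.
    """
    out = [np.eye(A.shape[0])]
    for _ in range(horizon):
        out.append(A @ out[-1])
    return np.array(out)


# Hypothetical synthetic data: 10 variables driven by 2 random-walk factors,
# so the series are non-stationary and a differencing step is needed.
rng = np.random.default_rng(0)
T, n_vars = 240, 10
factors = np.cumsum(rng.normal(size=(T, 2)), axis=0)
loadings = rng.normal(size=(2, n_vars))
X = factors @ loadings + rng.normal(scale=0.5, size=(T, n_vars))

scores, proportion = pca_scores(X, n_components=2)
print("variance proportion of PC1, PC2:", proportion[:2].round(3))

dY = np.diff(scores, n=1, axis=0)        # "integrated" step: first differences
c, A = fit_var1(dY)
irf = impulse_response(A, horizon=10)
print("response of PC2 to a unit shock in PC1, h=0..3:", irf[:4, 1, 0].round(3))
```

Here the differencing order and lag order are fixed for brevity; in practice the differencing order would be chosen with a unit-root test such as the (Augmented) Dickey-Fuller test, and the lag order with an information criterion.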

Published
2022-03-21
How to Cite
D. Munandar, B. Ruchjana, and A. Abdullah, “PRINCIPAL COMPONENT ANALYSIS-VECTOR AUTOREGRESSIVE INTEGRATED (PCA-VARI) MODEL USING DATA MINING APPROACH TO CLIMATE DATA IN THE WEST JAVA REGION”, BAREKENG: J. Math. & App., vol. 16, no. 1, pp. 99-112, Mar. 2022.