TIME SERIES IMPUTATION USING VAR-IM (CASE STUDY: WEATHER DATA IN METEOROLOGICAL STATION OF CITEKO)

  • Muhammad Edy Rizal, Department of Statistics, FMIPA, IPB University
  • Aji H Wigena, Department of Statistics, FMIPA, IPB University
  • Farit M Afendi, Department of Statistics, FMIPA, IPB University
Keywords: multivariate imputation, VAR-IM, weather data

Abstract

Univariate imputation methods estimate missing values using only the information in the target variable. They are convenient and flexible because no other variable is required, but multivariate imputation methods can improve imputation accuracy when the other variables are relevant to the target variable. Many multivariate imputation methods have been proposed, one of which is the Vector Autoregression Imputation Method (VAR-IM). This study compares the imputation results of VAR-IM-based methods and univariate imputation methods on time series data, specifically on long-lag seasonal data such as daily weather data. Three modified VAR-IM methods were studied using simulations consisting of three steps: deletion, imputation, and evaluation. The deletion step was conducted using six different schemes with six missing proportions. The simulations were conducted on secondary daily weather data collected at the Citeko meteorological station from January 1, 1991, to June 22, 2013. Nine weather variables were examined: minimum, maximum, and average temperature; average humidity; rainfall rate; duration of solar radiation; maximum and average wind speed; and wind direction at maximum speed. The simulation results show that the three modified VAR-IM methods improve accuracy in around 75% of cases. They also show that, as the missing proportion increases, the accuracy of VAR-IM-based imputation tends to remain more stable than that of univariate imputation methods.
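
The three simulation steps named in the abstract (deletion, imputation, evaluation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic two-variable series, the single random deletion scheme, the linear-interpolation baseline, and the simplified lag-1 VAR refit loop. It is not the VAR-IM algorithm or any of the modified methods studied in the paper, and the function names are hypothetical.

```python
import numpy as np

def delete_at_random(data, column, proportion, rng):
    """Deletion step: set a random share of one column to NaN (hypothetical scheme)."""
    masked = data.copy()
    n_missing = int(round(proportion * data.shape[0]))
    idx = rng.choice(data.shape[0], size=n_missing, replace=False)
    masked[idx, column] = np.nan
    return masked, idx

def impute_linear(series):
    """Univariate baseline: linear interpolation over the time index."""
    t = np.arange(series.size)
    observed = ~np.isnan(series)
    return np.interp(t, t[observed], series[observed])

def impute_var1(masked, n_iter=10):
    """Multivariate sketch: alternate between fitting a VAR(1) by least squares
    and replacing the missing entries with its one-step-ahead predictions."""
    missing = np.isnan(masked)
    # Initial guess: interpolate every column separately.
    filled = np.column_stack([impute_linear(col) for col in masked.T])
    for _ in range(n_iter):
        # Regressors: intercept plus all variables at lag 1.
        X = np.column_stack([np.ones(len(filled) - 1), filled[:-1]])
        coef, *_ = np.linalg.lstsq(X, filled[1:], rcond=None)
        pred = X @ coef  # predictions for time points 1..n-1
        rows, cols = np.nonzero(missing[1:])
        filled[rows + 1, cols] = pred[rows, cols]
    return filled

def rmse(truth, estimate):
    """Evaluation step: root mean squared error on the deleted entries."""
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))

# Synthetic seasonal data standing in for two related weather variables.
rng = np.random.default_rng(0)
n = 500
x = np.sin(2 * np.pi * np.arange(n) / 50) + 0.1 * rng.standard_normal(n)
y = 0.8 * x + 0.1 * rng.standard_normal(n)
data = np.column_stack([x, y])

masked, idx = delete_at_random(data, column=1, proportion=0.1, rng=rng)
linear_est = impute_linear(masked[:, 1])
var_est = impute_var1(masked)[:, 1]

print("RMSE, linear interpolation:", rmse(data[idx, 1], linear_est[idx]))
print("RMSE, VAR(1) sketch:", rmse(data[idx, 1], var_est[idx]))
```

Because the deleted values are known, the error is computed only on the deleted positions. Repeating the run over several deletion schemes and missing proportions would give the kind of comparison the abstract describes, although which method scores better here depends entirely on the synthetic data.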

Published
2022-12-15
How to Cite
M. Rizal, A. Wigena, and F. Afendi, “TIME SERIES IMPUTATION USING VAR-IM (CASE STUDY: WEATHER DATA IN METEOROLOGICAL STATION OF CITEKO)”, BAREKENG: J. Math. & App., vol. 16, no. 4, pp. 1373-1384, Dec. 2022.