TIME SERIES IMPUTATION USING VAR-IM (CASE STUDY: WEATHER DATA IN METEOROLOGICAL STATION OF CITEKO)
Abstract
Univariate imputation methods estimate missing values using only the information in the target variable. They are convenient and flexible because no other variable is required, but multivariate imputation methods can improve imputation accuracy when the other variables are relevant to the target variable. Many multivariate imputation methods have been proposed, one of which is the Vector Autoregression Imputation Method (VAR-IM). This study compares the imputation results of VAR-IM-based methods and univariate imputation methods on time series data, specifically long-lag seasonal data such as daily weather data. Three modified VAR-IM methods were studied using simulations with three steps: deletion, imputation, and evaluation. The deletion step was conducted using six different schemes with six missing proportions. The simulations were conducted on secondary daily weather data collected at the Citeko meteorological station from January 1, 1991, to June 22, 2013. Nine weather variables were examined: minimum, maximum, and average temperature; average humidity; rainfall rate; duration of solar radiation; maximum and average wind speed; and wind direction at maximum speed. The simulation results show that the three modified VAR-IM methods can improve imputation accuracy in around 75% of cases. They also show that the accuracy of VAR-IM-based methods tends to remain more stable than that of univariate imputation methods as the missing proportion increases.
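The three simulation steps named in the abstract (deletion, imputation, evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than the Citeko records, deletion uses a single random scheme at one missing proportion, the univariate baseline is linear interpolation, and `var1_impute` is a heavily simplified VAR(1) refinement loop, not the VAR-IM algorithm or any of the three modified methods studied here. All function names are hypothetical.

```python
import numpy as np


def delete_at_random(data, proportion, rng):
    """Deletion step: blank out a proportion of the target column (column 0)."""
    masked = data.copy()
    n = data.shape[0]
    n_missing = int(round(proportion * n))
    # Keep the first and last rows observed so interpolation has anchors.
    idx = rng.choice(np.arange(1, n - 1), size=n_missing, replace=False)
    masked[idx, 0] = np.nan
    return masked, idx


def interpolate_linear(series):
    """Univariate baseline: linear interpolation over the time index."""
    t = np.arange(series.size)
    observed = ~np.isnan(series)
    return np.interp(t, t[observed], series[observed])


def var1_impute(masked, n_iter=10):
    """Simplified multivariate imputation (illustrative assumption only).

    Start from linear interpolation, then repeatedly fit a VAR(1) model
    y_t = c + A y_{t-1} + e_t by least squares and overwrite the originally
    missing entries with the one-step-ahead predictions.
    """
    missing = np.isnan(masked)
    filled = masked.copy()
    for j in range(filled.shape[1]):
        if missing[:, j].any():
            filled[:, j] = interpolate_linear(masked[:, j])
    for _ in range(n_iter):
        X = np.hstack([np.ones((filled.shape[0] - 1, 1)), filled[:-1]])
        Y = filled[1:]
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        pred = X @ coef  # predictions for rows 1..n-1
        rows, cols = np.nonzero(missing[1:])
        filled[rows + 1, cols] = pred[rows, cols]
    return filled


def rmse(truth, estimate):
    """Evaluation step: root mean squared error on the deleted entries."""
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))


# Synthetic two-variable daily series with an annual cycle (not real data).
rng = np.random.default_rng(0)
n = 730
t = np.arange(n)
season = np.sin(2 * np.pi * t / 365)
temperature = 20 + 3 * season + rng.normal(0, 0.5, n)
humidity = 80 - 5 * season + rng.normal(0, 1.0, n)
data = np.column_stack([temperature, humidity])

masked, idx = delete_at_random(data, 0.10, rng)   # deletion
uni = interpolate_linear(masked[:, 0])            # univariate imputation
multi = var1_impute(masked)[:, 0]                 # VAR(1)-based imputation

rmse_uni = rmse(data[idx, 0], uni[idx])           # evaluation
rmse_multi = rmse(data[idx, 0], multi[idx])
print("RMSE, linear interpolation:", rmse_uni)
print("RMSE, VAR(1) sketch:", rmse_multi)
```

Which method scores better in this toy setup depends on the synthetic data and says nothing about the study's findings; the sketch only shows how the three steps fit together.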
Copyright (c) 2022 Muhammad Edy Rizal, Aji H Wigena, Farit M Afendi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.