PREDICTION INTERVALS IN MACHINE LEARNING: RESIDUAL BOOTSTRAP AND QUANTILE REGRESSION FOR CASH FLOW ANALYSIS

Wa Ode Rahmalia Safitri; Farit Mochamad Afendi; Budi Susetyo

doi:10.30598/barekengvol19iss3pp1625-1636

Wa Ode Rahmalia Safitri, Statistics and Data Science Department, Faculty of Mathematics and Natural Sciences, IPB University, Indonesia, https://orcid.org/0009-0005-2641-867X
Farit Mochamad Afendi, Statistics and Data Science Department, Faculty of Mathematics and Natural Sciences, IPB University, Indonesia, https://orcid.org/0009-0006-9172-9455
Budi Susetyo, Statistics and Data Science Department, Faculty of Mathematics and Natural Sciences, IPB University, Indonesia, https://orcid.org/0000-0001-7772-3897

Keywords: Bootstrap Method, CatBoost, LightGBM, Prediction Intervals, Quantile Regression, XGBoost

Abstract

Time series forecasting often struggles to produce reliable predictions because dynamic systems are inherently uncertain. Point predictions are commonly used, but they may not adequately capture this uncertainty, especially in financial systems where forecasting accuracy directly affects decision-making. Prediction intervals address this by providing a range of likely outcomes rather than a single-point estimate. This study implements multivariate time series forecasting using gradient boosting algorithms (XGBoost, CatBoost, and LightGBM) to predict cash flow patterns in banking transactions, focusing on constructing reliable prediction intervals. Using transaction data from Bank Rakyat Indonesia (BRI), the research analyzes both office and e-channel transactions, with lag structures that differ according to Granger causality tests. Model performance was evaluated using RMSLE, MAE, and MAPE, with RMSLE chosen as the primary metric because of its ability to handle skewed distributions. LightGBM achieved the best performance for office cash-in transactions (RMSLE: 0.2395), while CatBoost outperformed the other models for office cash-out (RMSLE: 0.2848), e-channel cash-in (RMSLE: 0.3946), and e-channel cash-out (RMSLE: 0.4221). For prediction intervals, two methods were compared: Residual Bootstrap with 500 samples and Quantile Regression. Residual Bootstrap generally produced coverage probabilities closer to the nominal 80% level (i.e., the 10–90% prediction interval), especially for office transactions, while maintaining narrower interval widths. In contrast, Quantile Regression tended to generate wider intervals and often overestimated uncertainty, resulting in overly high coverage in some cases. However, both methods showed clear limitations when applied to e-channel transactions, particularly e-channel cash-in, where coverage probabilities fell below 50% due to high volatility and irregular transaction patterns.
Unlike previous work focused only on point forecasts, this study offers insights into forecast uncertainty by evaluating how well each method quantifies it, providing practical guidance for financial institutions aiming to improve risk management through interval-based forecasting.
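The two interval methods and the evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so the code below assumes one common variant of the residual bootstrap: residuals from a fitted model are resampled with replacement, added to the point forecasts, and the 10th and 90th percentiles of the simulated values form the interval. The function names, the synthetic data, and the use of the pinball loss as the quantile regression objective are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def residual_bootstrap_interval(point_forecast, residuals, n_boot=500,
                                lower=0.10, upper=0.90, seed=0):
    """Assumed variant: add resampled residuals to each point forecast."""
    rng = np.random.default_rng(seed)
    point_forecast = np.asarray(point_forecast, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    # Draw residuals with replacement: one row per bootstrap sample.
    draws = rng.choice(residuals, size=(n_boot, point_forecast.size), replace=True)
    simulated = point_forecast + draws
    # Empirical percentiles across bootstrap samples give the interval bounds.
    lo = np.quantile(simulated, lower, axis=0)
    hi = np.quantile(simulated, upper, axis=0)
    return lo, hi


def coverage_probability(y_true, lo, hi):
    """Share of observations that fall inside their interval."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_true >= lo) & (y_true <= hi)))


def mean_interval_width(lo, hi):
    """Average distance between upper and lower bounds."""
    return float(np.mean(np.asarray(hi, dtype=float) - np.asarray(lo, dtype=float)))


def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss that quantile regression minimizes for level q."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


if __name__ == "__main__":
    # Synthetic stand-in data (hypothetical, not the BRI transactions).
    rng = np.random.default_rng(42)
    y_test = rng.normal(loc=100.0, scale=10.0, size=200)
    forecast = np.full(200, 100.0)                 # hypothetical point forecasts
    train_residuals = rng.normal(0.0, 10.0, 500)   # hypothetical model residuals

    lo, hi = residual_bootstrap_interval(forecast, train_residuals)
    print("coverage:", coverage_probability(y_test, lo, hi))
    print("mean width:", mean_interval_width(lo, hi))
    print("pinball loss at q=0.9:", pinball_loss(y_test, hi, 0.9))
```

A coverage value near 0.80 with a small mean width corresponds to the behavior the abstract reports for Residual Bootstrap on office transactions; coverage well above 0.80 with wide intervals corresponds to the overestimated uncertainty reported for Quantile Regression.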


