COMPARISON OF DOUBLE RANDOM FOREST AND LONG SHORT-TERM MEMORY METHODS FOR ANALYZING ECONOMIC INDICATOR DATA

ABSTRACT


INTRODUCTION
Time series data consist of sequences of observations collected over time [1]. Many methods for analyzing time series data have been developed, including several machine learning algorithms [2]. Machine learning is an alternative approach to time series analysis and has been reported to be powerful when the data are nonlinear [3]. Machine learning is commonly divided into two types, supervised and unsupervised learning. Supervised learning builds a statistical model to predict or estimate an outcome based on one or more inputs, while unsupervised learning finds useful patterns in a dataset without output variables [4].
One study compared machine learning with ARIMA and regression models for modeling and forecasting new COVID-19 cases in Nigeria [5]. Mualifah et al. compared the performance of GARCH, LSTM, and hybrid GARCH-LSTM models for analyzing the dynamic pattern of stock price volatility of PT Bumi Resources Minerals Tbk in 2022 [6]. Both studies showed that machine learning based methods outperform the other methods. Tree-based methods have also been applied to time series data: one study compared ARIMA, Decision Tree (DT), Random Forest (RF), and Gradient Boosted Trees (GBT) for predicting monthly gold prices and found RF superior to the other methods [7]. Sunwoo Han, Hyunjoong Kim, and Yung-Seop Lee developed a new supervised learning method called Double Random Forest (DRF) in 2020. This method was developed to overcome the limitation of RF when it under-fits the data. When RF performs best at the minimum node size, it may be under-fitting, that is, its trees may not be large enough. Their study compared DRF and RF on 34 datasets for which RF performed best at the minimum node size, i.e., datasets on which RF under-fits, and found that DRF outperforms RF [8].
Among other machine learning methods, Neural Networks (NNs) can model nonlinear data with high forecasting accuracy and minimal initial assumptions [2]. Long Short-Term Memory Networks (LSTMs) are a type of NN capable of analyzing time series data. The main advantage of LSTMs is their ability to retain information over long time periods [9]. Previous research used LSTMs to forecast COVID-19 outbreaks in Egypt [10]. In addition, a comparative study found that LSTMs outperform ARIMA in forecasting cryptocurrency prices [11].
DRF and LSTMs are designed to handle under-fitting and nonlinear data, respectively. Therefore, this study presents a comparison of DRF and LSTMs using under-fitting, non-underfitting, and nonlinear data from Indonesia's economic indicators. The indicators used in this study are Export, Import, Official Reserve Assets, and Exchange Rates. These indicators were chosen because they show fluctuating patterns that tend to be nonlinear, which makes them suitable for analysis with machine learning methods. The best-performing method was chosen as the one with the lowest Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE).

RESEARCH METHODS
This study used Indonesia's economic indicators. The indicators are Export, Import, Official Reserve Assets in million USD, and Exchange Rates, obtained from the websites of Bank Indonesia and the International Monetary Fund. These are monthly time series data from January 2010 to December 2021. Both DRF and LSTMs require input and output variables. The input variable in this study is the observation at the previous lag (t-1), and the output variable is the observation at time t. The hyperparameters optimized in DRF are node size, defined as the minimum number of samples a terminal node should hold, and ntree, defined as the number of trees to be created [8]. The hyperparameters optimized in LSTMs are epoch, defined as the number of times the algorithm passes over the entire data [12], and learning rate, defined as a parameter that controls how much the weights are updated during training in response to the error. The learning rate can affect the convergence time of gradient descent [4].
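The lag-based input and output variables described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name `make_lagged_pairs` and the sample values are hypothetical.

```python
def make_lagged_pairs(series, lag=1):
    """Build supervised-learning pairs from a univariate time series.

    The input for time t is the observation at t - lag and the output is
    the observation at t, so the first `lag` observations have no pair.
    """
    inputs = [[series[t - lag]] for t in range(lag, len(series))]
    outputs = [series[t] for t in range(lag, len(series))]
    return inputs, outputs


# Hypothetical monthly values, used only to show the shape of the result.
X, y = make_lagged_pairs([10, 12, 11, 15])
print(X)  # one single-feature row per usable time step
print(y)  # the value one step ahead of each row
```

With a lag of 1, a series of n observations yields n - 1 input/output pairs.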
The analysis stages in this study are as follows.
1. Examining each indicator used in this study to determine whether the data is considered under-fitting, non-underfitting, or nonlinear.
2. Carrying out the training process on the training data of all split groups of the time series cross-validation, using each hyperparameter combination of DRF and LSTMs.
3. Calculating RMSE, MAPE, and MAE from the test data of all split groups of the time series cross-validation for each hyperparameter combination. The model with the optimal hyperparameters is the one with the lowest average RMSE, MAPE, and MAE over all split groups.
4. Using the best hyperparameters selected in step 3 to build the DRF and LSTMs models, with 83% of the data for training and 17% for testing.
5. Comparing the performance of DRF and LSTMs.
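The hyperparameter selection in step 3 can be sketched as below. The sketch assumes a single error metric per split (for example RMSE); how the three metrics are combined when they disagree is not stated in the text, so that part is left out. The function name and the score values are hypothetical.

```python
from statistics import mean


def select_best_hyperparameters(cv_scores):
    """Pick the hyperparameter combination with the lowest average error.

    cv_scores maps a hyperparameter tuple, e.g. (node_size, ntree), to the
    list of errors obtained on each cross-validation split.
    """
    averages = {params: mean(errors) for params, errors in cv_scores.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]


# Hypothetical per-split errors for two (node_size, ntree) combinations.
scores = {(5, 100): [1.0, 2.0, 3.0], (10, 500): [1.5, 1.5, 1.5]}
best_params, best_error = select_best_hyperparameters(scores)
```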

Double Random Forest (DRF)
An ensemble method combines the prediction results of several baseline models to obtain enhanced performance [13]. Double Random Forest is a new ensemble method developed by Sunwoo Han, Hyunjoong Kim, and Yung-Seop Lee in 2020. This method can improve the performance of Random Forest (RF) when RF is under-fitting, because it can grow bigger trees, which reduces the bias in prediction. Under-fitting in RF can be identified using the relative test MAPE, defined as (the MAPE value of RF under the given node size) / (the MAPE value of RF when node size is set to its default). If the relative MAPE is greater than 1, RF with the largest trees is more accurate than RF with smaller trees, which means that RF may under-fit. The algorithm of DRF is as follows [8].
1. Each regression tree in DRF is created by using all the training data (D) with size $n$.
2. Steps of selecting the best split:
a. For a given node t, a bootstrap sample is drawn from the data at node t if the number of samples at node t is greater than $n \times 0.1$. Otherwise, no bootstrap is done.
b. Randomly select $m \approx p/3$ or $\sqrt{p}$ features, where $p$ is the number of features.
c. Find the best split feature and cut point using the random feature subset.
d. Send down the data using the best split feature and cut point.
e. Steps a to d are repeated until the stopping rules are met to obtain the estimation result from a regression tree.
3. Steps 1-2 are repeated until DRF creates b regression trees.
4. The prediction result of DRF is the average of the predictions of the individual trees.
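The steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the reference implementation of [8]: the bootstrap sample at a node is assumed to have the same size as the node, the data sent down in step d is assumed to be the original (non-bootstrapped) node data, the stopping rule is assumed to be "node holds node_size or fewer samples, or is pure", and all function names are hypothetical.

```python
import random
from statistics import mean


def best_split(X, y, features):
    """Return (feature, cut) minimizing the children's squared error, or None."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= cut]
            right = [yi for row, yi in zip(X, y) if row[j] > cut]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, cut)
    return None if best is None else best[1:]


def grow_tree(X, y, n_total, node_size, mtry, rng):
    # Assumed stopping rule: small or pure nodes become leaves.
    if len(y) <= node_size or len(set(y)) == 1:
        return mean(y)
    # Step 2a: bootstrap inside the node only if it holds more than 0.1 * n samples.
    if len(y) > 0.1 * n_total:
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
    else:
        idx = list(range(len(y)))
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    # Steps 2b-2c: random feature subset, best split found on the (bootstrapped) sample.
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(Xb, yb, features)
    if split is None:
        return mean(y)
    j, cut = split
    # Step 2d: send down the original node data (assumption, see lead-in).
    left = [i for i in range(len(y)) if X[i][j] <= cut]
    right = [i for i in range(len(y)) if X[i][j] > cut]
    if not left or not right:
        return mean(y)
    return (j, cut,
            grow_tree([X[i] for i in left], [y[i] for i in left],
                      n_total, node_size, mtry, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right],
                      n_total, node_size, mtry, rng))


def predict_tree(node, row):
    # Internal nodes are tuples (feature, cut, left, right); leaves are numbers.
    while isinstance(node, tuple):
        j, cut, left, right = node
        node = left if row[j] <= cut else right
    return node


def drf_fit(X, y, ntree=50, node_size=5, mtry=None, seed=0):
    rng = random.Random(seed)
    mtry = mtry or max(1, len(X[0]) // 3)  # roughly p/3 for regression
    # Step 1 and step 3: every tree starts from all the training data.
    return [grow_tree(X, y, len(y), node_size, mtry, rng) for _ in range(ntree)]


def drf_predict(forest, row):
    # Step 4: average the predictions of the individual trees.
    return mean([predict_tree(tree, row) for tree in forest])
```

With the single lag-1 input used in this study there is only one feature, so `mtry` is 1 and the randomness comes entirely from the node-level bootstrap.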

Long Short-Term Memory Networks (LSTMs)
Long Short-Term Memory Networks were introduced by Sepp Hochreiter and Jürgen Schmidhuber in 1997 [14]. LSTMs are a special kind of Recurrent Neural Network (RNN) and are capable of learning long-term temporal or sequential dependencies [15]. LSTMs contain an input gate, an output gate, a forget gate, and an internal state (cell memory) [16]. Figure 1 shows the architecture of LSTMs [14]. The training stages of LSTMs are as follows.
1. Computing the outputs of the gates and the internal state in a forward pass, using the weight matrices of the input gate, output gate, forget gate, and internal state together with the corresponding recurrent weights.
2. Computing the error between the resulting data and the input data of each layer.
3. Propagating the error backward to the input gate, cell, and forget gate.
4. Updating the weights of each gate with an optimization algorithm based on the error term.
5. Repeating stages 1-4 for a set number of iterations until the optimal values of the weights and biases are obtained.
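For reference, the forward computation of an LSTM cell is commonly written as below. This is the standard textbook formulation and not necessarily the exact notation of the original equations; the symbols $x_t$, $h_t$, $c_t$, $b$, and the subscripted weight names are assumptions introduced here.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + W_{hi} h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f x_t + W_{hf} h_{t-1} + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\left(W_o x_t + W_{ho} h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + W_{hc} h_{t-1} + b_c\right) && \text{(candidate internal state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(internal state)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

Here $W_i, W_o, W_f, W_c$ play the role of the weight matrices of the input gate, output gate, forget gate, and internal state, $W_{hi}, W_{ho}, W_{hf}, W_{hc}$ the role of the recurrent weights, $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication.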

Measure of Forecast Accuracy
The performance of DRF and LSTMs is assessed by three measures of forecast accuracy: RMSE, MAPE, and MAE. The model with the lowest RMSE, MAPE, and MAE is considered the best-performing model. RMSE is the standard deviation of the model prediction errors [17]. MAPE is the mean absolute percentage error between the predictions and the eventual outcomes, so it expresses the error as a percentage [19]. MAE is the mean of the absolute errors. The formulas of RMSE, MAPE, and MAE can be written as follows [18].
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}, \qquad \mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{e_t}{Y_t}\right| \times 100\%, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|e_t\right|$$
with $e_t = Y_t - \hat{Y}_t$ the error value at time t, $Y_t$ the actual value at time t, $\hat{Y}_t$ the forecasted value at time t, and $n$ the number of observations.
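The three accuracy measures can be computed with a few lines of Python. The sketch below uses the usual definitions (error = actual value minus forecasted value); the function names and the sample numbers are illustrative assumptions, not values from this study.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(ratios)


def mae(actual, forecast):
    """Mean absolute error."""
    errors = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


# Hypothetical actual and forecasted values.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(rmse(actual, forecast), mape(actual, forecast), mae(actual, forecast))
```

RMSE and MAE are in the units of the data, while MAPE is scale-free, which makes it the natural choice for comparing indicators measured on different scales.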

Under-fitting/Non-underfitting Data
Under-fitting data can be detected using Random Forest. If RF gives its best performance at the default node size of 5, that is, if the largest tree is more accurate than the smaller ones, RF may under-fit. Table 2 shows the MAPE value for each node size on the Export, Import, Official Reserve Assets, and Exchange Rates data. The relative MAPE for each dataset is also calculated and shown in Figure 2. A relative MAPE greater than 1 means that RF may under-fit, because the largest tree gives a smaller MAPE value. The relative MAPE values in Figure 2 show that RF under-fits on the Official Reserve Assets data and does not under-fit on the Export, Import, and Exchange Rates data.
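The relative MAPE check can be sketched as follows. The sketch assumes that the default node size is the smallest one tried and flags possible under-fitting when every larger node size has a relative MAPE above 1; the function names and the MAPE values are hypothetical and are not taken from Table 2.

```python
def relative_mape(mape_by_node_size, default_node_size=5):
    """Divide the MAPE at each node size by the MAPE at the default node size."""
    base = mape_by_node_size[default_node_size]
    return {size: value / base for size, value in sorted(mape_by_node_size.items())}


def may_underfit(mape_by_node_size, default_node_size=5):
    """True when all non-default node sizes have a relative MAPE above 1."""
    relative = relative_mape(mape_by_node_size, default_node_size)
    others = [r for size, r in relative.items() if size != default_node_size]
    return bool(others) and all(r > 1 for r in others)


# Hypothetical MAPE values (in percent) for three node sizes.
print(may_underfit({5: 2.0, 10: 2.2, 20: 2.5}))
print(may_underfit({5: 2.0, 10: 1.8, 20: 1.9}))
```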

Linear/Nonlinear Data
The Teräsvirta test was used in this study to detect nonlinearity in the data. The significance level used in this test is 0.05, and the null hypothesis states that the data has a linear relationship with its lag. Table 3 shows the result of the Teräsvirta test for the Export, Import, Official Reserve Assets, and Exchange Rates data.

The Results of DRF and LSTMs
Before comparing the performance of DRF and LSTMs, the optimal hyperparameters for each model were selected using time series cross-validation. Table 4 shows the list of hyperparameters used in this study. Time series cross-validation (TSCV) splits the time series data in sequence into K complementary partitions. In one round of TSCV, the model is validated on a test set and trained on the other partitions (referred to as training sets). The validation process is repeated for K-1 rounds using different time series partitions to decrease the model variability. The performance of the model is then averaged across the K-1 validation partitions [21]. Table 5 shows the data partition for time series cross-validation with K = 6.
This study used the RMSE, MAPE, and MAE resulting from time series cross-validation to evaluate the optimal hyperparameters. Table 6 shows the optimal hyperparameters of DRF and LSTMs for the Export, Import, Official Reserve Assets, and Exchange Rates data. After the optimal hyperparameters were selected, the performance of DRF and LSTMs was evaluated for each dataset by splitting the data into training and testing data with proportions of 83% and 17%, respectively.
According to the results of the under-fitting and linearity checks above, the Export data is considered non-underfitting and nonlinear, the Import data non-underfitting and linear, the Official Reserve Assets data under-fitting and linear, and the Exchange Rates data non-underfitting and nonlinear. The performance comparison of DRF and LSTMs for each dataset is shown in Table 7, Table 8, and Table 9. The actual and forecasted values for the training and testing data are visualized in Figures 3-6. In the figures, black represents the actual data, green the forecasted values of the training data, and red the forecasted values of the testing data.
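The data partitioning can be sketched as follows. The cross-validation function assumes an expanding-window scheme, in which the series is cut into K consecutive blocks and round r trains on the first r blocks and tests on the next one, giving K-1 rounds; the actual partition in Table 5 may differ. The function names are hypothetical.

```python
def time_series_cv_splits(n_obs, k):
    """Return (train_indices, test_indices) pairs for K-1 expanding-window rounds."""
    bounds = [round(i * n_obs / k) for i in range(k + 1)]
    splits = []
    for r in range(1, k):
        train = list(range(0, bounds[r]))              # all blocks before the test block
        test = list(range(bounds[r], bounds[r + 1]))   # the next block in time
        splits.append((train, test))
    return splits


def chronological_split(series, train_fraction=0.83):
    """Split a series into training and testing parts without shuffling."""
    cut = round(train_fraction * len(series))
    return series[:cut], series[cut:]


# 144 monthly observations correspond to January 2010 through December 2021.
splits = time_series_cv_splits(144, 6)
train, test = chronological_split(list(range(144)))
print(len(splits), len(train), len(test))
```

Keeping the observations in time order in both functions ensures that a model is never evaluated on data that precedes its training period.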
According to the measures of forecast accuracy and the visualization of actual and forecasted data in Figures 3-6, LSTMs outperform DRF whether the data is under-fitting, non-underfitting, or nonlinear.

CONCLUSIONS
This study used Indonesia's economic indicators, which were classified as under-fitting, non-underfitting, and nonlinear data. These indicators are Export, Import, Official Reserve Assets, and Exchange Rates. The performance comparison using RMSE, MAPE, and MAE showed that LSTMs perform better than DRF on every dataset.