APPLICATION OF EXTREME LEARNING MACHINE METHOD ON STOCK CLOSING PRICE FORECASTING

ABSTRACT


INTRODUCTION
Investment is the commitment of a certain amount of capital to obtain future benefits, with the aim of developing or expanding assets to meet future needs [1]. According to data from the Indonesian Central Securities Depository (KSEI), the number of capital market investors exceeded 9.54 million in August 2022, an increase of around 27.38% compared to December 2021. The Financial Services Authority (OJK) also recorded an eightfold increase in the number of capital market investors over the past five years.
Capital market investment options vary widely, one of which is stocks. Shares are defined as proof of capital ownership in a company, stating the nominal value, the company name, and the rights and obligations of shareholders as described by the company [2]. According to the 2022 World Bank report, economic growth in Indonesia is only around 5.2%, with projected inflation of 4.2%. Investing in stocks can help offset inflation, since the value of the currency depreciates when the prices of necessities rise [3].
Investors must consider risk in order to prevent and minimize possible losses while still obtaining large profits. Therefore, stocks can be selected based on the stock indices compiled by the Indonesia Stock Exchange (IDX). Stock indices measure the price performance of stocks with high liquidity, good company fundamentals, and large market capitalization. One of the stocks that meets the best index criteria is PT Aneka Tambang Tbk (ANTM). PT ANTM Tbk is a mining company in Indonesia. The company won a special award in the "IDX Best Blue" category in 2016 for having the best performance growth and being the most attractive to investors, with good share price growth [4].
Forecasting is needed to analyze the risks that arise from the volatility of stock prices. Forecasting is a process used to make predictions [5], and it helps determine the decisions, policies, and strategies to be implemented in the future. The data used in forecasting is time series data. However, the method must suit the conditions and patterns of the data for good results to be obtained.
Artificial Neural Network (ANN) is a modeling method that can capture and represent complex input and output relationships. Many ANN methods applied in forecasting have low learning speeds, which leads to computational delays, because they use gradient-based training algorithms and determine their parameters iteratively [6]. Extreme Learning Machine (ELM) is a newer ANN method developed by Guang-Bin Huang.
In 2019, Alfiyatin et al. predicted the inflation rate in Indonesia using the ELM method. The results showed an accuracy value of 0.0202, which is better than the Backpropagation method with an accuracy value of 1.1604. The results show that the ELM method uses computing time efficiently in the training process, while its weakness is that the number of neurons in the hidden layer must be determined through a trial-and-error process. The ELM method was also applied in research by Wati et al. in 2019 on forecasting gold prices for investors [7]. That research produced an average error value of 0.29%.
Guang-Bin Huang developed ELM to overcome the weaknesses of existing ANNs, especially in terms of learning speed. The ELM method is applied to train single-hidden layer feedforward networks (SLFNs) and uses random weights and biases between the input and hidden layers. Its advantages in learning speed and good generalization make it better than other forecasting methods, such as the Support Vector Machine method [8]. Forecasting methods other than ELM include the Moving Average method, Backpropagation ANN, Regression Analysis, Fuzzy Time Series, and other Machine Learning methods. The ELM method is faster than these commonly used forecasting methods because its hidden-layer parameters are selected randomly rather than tuned iteratively. It also generalizes well without overtraining problems and can provide good results for the inputs used. In this study, the ELM method is applied to forecast the closing price of PT ANTM Tbk shares and to analyze the accuracy of the resulting forecasts.

Extreme Learning Machine
Extreme Learning Machine (ELM) is a learning method for networks with a single hidden layer, known as Single Hidden Layer Feedforward Neural Networks (SLFNs). The ELM method was developed to overcome the weaknesses of feedforward artificial neural networks, especially their very low learning speeds [9].
In the ELM method, the hidden-layer parameters, namely the input weights and hidden biases, are selected randomly, and an activation function is applied in the hidden layer.
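In general, the ELM method is modeled mathematically as follows. Consider $N$ different samples $(x_j, t_j)$, $j = 1, 2, \dots, N$, and a network with $\tilde{N}$ neurons in the hidden layer.
Standard SLFNs with $\tilde{N}$ hidden layer neurons and activation function $g(x)$ can be modeled as follows:
$$\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \dots, N$$
where,
$w_i$: the weight vector connecting the $i$-th hidden neuron with the input neurons ($i = 1, 2, \dots, \tilde{N}$),
$\beta_i$: the weight vector connecting the $i$-th hidden neuron with the output neurons ($i = 1, 2, \dots, \tilde{N}$),
$b_i$: the bias value of the $i$-th hidden neuron,
$\tilde{N}$: number of hidden neurons,
$o_j$: output value, and
$x_j$: input value with $j = 1, 2, \dots, N$, where $N$ is the amount of data.
Standard SLFNs can approximate the $N$ samples with zero error, that is $\sum_{j=1}^{N} \| o_j - t_j \| = 0$, so that $o_j = t_j$ and there exist $\beta_i$, $w_i$, and $b_i$ resulting in the following equation:
$$\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \dots, N$$
which can be written as
$$H\beta = T$$
where,
$$H(w_1, \dots, w_{\tilde{N}}, b_1, \dots, b_{\tilde{N}}, x_1, \dots, x_N) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}$$
$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}$$
$H$: the output matrix of the hidden layer, whose $i$-th column is the output of the $i$-th hidden neuron for the inputs $x_1, x_2, \dots, x_N$.
$\beta$: matrix of output weights.
$T$: matrix of desired outputs.
The ELM method uses input weights and biases in the hidden layer that are determined randomly, so that the output weights connecting the hidden layer with the output layer can be determined through the following equation:
$$\beta = H^{+} T$$
$H^{+}$: pseudo-inverse of matrix $H$ [10].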
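The training rule described in this section can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the seed, and the toy data are hypothetical, the random weights are drawn from [0, 1] following the range mentioned for the training stage, and the output weights are obtained from ridge-regularized normal equations, which stand in for the pseudo-inverse (the two agree when the hidden layer output matrix has full column rank and the ridge term tends to zero).

```python
import math
import random


def sigmoid(z):
    """Binary sigmoid activation g(z) with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(X, W, b):
    """Hidden layer output matrix: H[j][i] = g(w_i . x_j + b_i)."""
    return [[sigmoid(sum(w * x for w, x in zip(W[i], xj)) + b[i])
             for i in range(len(W))] for xj in X]


def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def elm_fit(X, T, n_hidden, low=0.0, high=1.0, ridge=1e-8, seed=0):
    """Draw random input weights and biases, then solve for the output weights."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    n_inputs = len(X[0])
    W = [[rng.uniform(low, high) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.uniform(low, high) for _ in range(n_hidden)]
    H = hidden_output(X, W, b)
    # Normal equations (H^T H + ridge * I) beta = H^T T, a stand-in for beta = H^+ T.
    HtH = [[sum(row[i] * row[k] for row in H) + (ridge if i == k else 0.0)
            for k in range(n_hidden)] for i in range(n_hidden)]
    HtT = [sum(row[i] * t for row, t in zip(H, T)) for i in range(n_hidden)]
    beta = solve_linear(HtH, HtT)
    return W, b, beta


def elm_predict(X, model):
    """Network output o_j = sum_i beta_i * g(w_i . x_j + b_i)."""
    W, b, beta = model
    return [sum(h * bt for h, bt in zip(row, beta))
            for row in hidden_output(X, W, b)]


# Toy single-input, single-output data (hypothetical, already in the 0-1 range).
X = [[i / 20] for i in range(21)]
T = [x[0] ** 2 for x in X]
model = elm_fit(X, T, n_hidden=5)
pred = elm_predict(X, model)
```

Only the output weights are solved for; the hidden-layer weights and biases stay at their random values, which is why no iterative training loop appears.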

Data Pre-processing and Post-processing
Data pre-processing normalizes the data by transforming the values into the 0-1 range. The goal is to bring the input data and the output data onto the same scale. The method used in data normalization is Min-Max Normalization [11]. The data normalization is formulated as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
with,
$x'$: data normalization results,
$x$: actual stock data, and
$x_{\min}$, $x_{\max}$: minimum and maximum data values.
Meanwhile, data post-processing is a denormalization process that returns previously normalized values to their actual scale, based on the forecasting results [12]. Data denormalization can be done based on the following formula:
$$x = x'(x_{\max} - x_{\min}) + x_{\min}$$
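As an illustration, Min-Max normalization to the 0-1 range and its inverse can be sketched as follows. The function names and the sample prices are hypothetical and serve only to show the round trip.

```python
def min_max_normalize(values):
    """Scale values to the 0-1 range; also return the minimum and maximum used."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are equal; min-max scaling is undefined")
    scaled = [(x - x_min) / span for x in values]
    return scaled, x_min, x_max


def denormalize(scaled, x_min, x_max):
    """Map normalized values back to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]


prices = [1850.0, 1900.0, 1875.0, 2000.0]  # hypothetical closing prices
scaled, lo, hi = min_max_normalize(prices)
restored = denormalize(scaled, lo, hi)
```

The minimum and maximum must be kept from the normalization step, since the same two values are needed to denormalize the forecasts later.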

Forecasting Accuracy
Forecasting results certainly have a measure of error that indicates the accuracy of a forecast. The smaller the error value, the higher the forecasting accuracy [13]. Forecasting accuracy is seen based on the Mean Absolute Percentage Error (MAPE) value.

Mean Absolute Percentage Error (MAPE)
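MAPE measures the level of forecasting accuracy by comparing the difference between the forecast value and the actual value, relative to the actual value:
$$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%$$
where $y_t$ is the actual value, $\hat{y}_t$ is the forecast value, and $n$ is the number of data.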
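A small sketch of the MAPE calculation is given here for illustration. The function name and the sample values are hypothetical; the function returns a fraction, so a result of 0.0274 corresponds to 2.74%.

```python
def mape(actual, forecast):
    """Mean of |actual - forecast| / |actual|, returned as a fraction."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return total / len(actual)


actual = [100.0, 200.0]    # hypothetical actual values
forecast = [110.0, 190.0]  # hypothetical forecasts
error = mape(actual, forecast)  # (0.10 + 0.05) / 2, about 0.075
```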
The MAPE assessment criteria [14] are shown in Table 1 and are used to judge the level of accuracy of the resulting model. The best accuracy criterion is a MAPE below 10%, while the worst is a MAPE above 50%.

Extreme Learning Machine Process
The ELM workflow is divided into two parts: the training process and the testing process. In general, the more training data are used, the better the resulting forecasts, as indicated by higher forecasting accuracy or smaller error values [15].

Training Process
The training process aims to form a model in the ELM method [16]. It is also carried out to obtain the output weight values used in the next process. The stages in the training process are:
a. Input and pre-process the training data.
b. Identify the model based on PACF plots of the normalized training data by paying attention to significant lags.
c. Initialize the input-to-hidden weights and the hidden-node biases with random numbers in the range [0,1], in accordance with the range of the binary sigmoid activation function [17].
d. Each input unit ($x_1, \dots, x_n$) receives a signal from the input and forwards it to all units of the layer above it, the hidden units.
e. Calculate the output value in the hidden layer according to the binary sigmoid activation function.
f. Calculate the weight values connecting the hidden layer with the output layer.
g. Calculate all outputs generated.
h. Perform data post-processing.
After the final training output value has been obtained, accuracy testing is carried out to see the MAPE training value. This is done to determine whether the final weights and biases meet the criteria to be the best model used in the testing process.

Testing Process
The testing process is carried out to evaluate the performance of the ELM method based on the results of the training process carried out previously. The steps in the testing process are:
a. Input and pre-process the testing data.
b. Process the testing data using the optimal weight and bias initialization of the best model obtained during the training process.
c. After the output values of the testing data, that is the forecasting results, have been obtained, carry out data post-processing.
d. After the data is denormalized, test the accuracy by looking at the MAPE value.
e. Finally, forecast according to the desired forecasting period.

RESULTS AND DISCUSSION
This study uses secondary data obtained from the site https://id.investing.com. The data used is daily data from the closing price of PT ANTM Tbk shares from January 1, 2018 to October 31, 2022.

Data Splitting
In forecasting the closing price of PT ANTM Tbk shares, the data is divided into two parts: training data and testing data. The composition used is as follows. The training data is 95% of the total data, which is 1150 observations. Meanwhile, the testing data is 5% of the total data, which is 60 observations.
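The 95%/5% composition can be sketched as below. This assumes the split is chronological, with the most recent observations held out for testing, which is the usual choice for time series but is not stated explicitly here; the function name and the placeholder series are hypothetical.

```python
def chronological_split(series, test_percent=5):
    """Hold out the last test_percent of the observations as testing data."""
    n_test = len(series) * test_percent // 100
    if n_test == 0:
        raise ValueError("series is too short for the requested test share")
    return series[:-n_test], series[-n_test:]


series = list(range(1210))  # placeholder for 1150 + 60 daily closing prices
train, test = chronological_split(series)
# len(train) is 1150 and len(test) is 60
```

Integer arithmetic is used for the test size so that the 1150/60 composition is reproduced without floating-point rounding issues.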

ELM Training Process
Training data is used in model building and testing data is used to see the accuracy of the model and perform forecasting for the next period.

Data Pre-processing
The first step in the forecasting process with the ELM method is data normalization. Data normalization is carried out in accordance with the range of the binary sigmoid activation function, which is 0-1. The results of normalizing the training data used in model building can be seen in Table 2. Table 2 shows the normalized training data, which makes up 95% (1150 observations) of the total closing price data of PT ANTM Tbk shares.

Model Identification with PACF Plot
Model identification is carried out to obtain the determination of input variables based on significant lags with the help of PACF plots. The PACF plot of the closing price of PT ANTM Tbk shares can be seen in Figure 2.

Figure 2. PACF Plot Model Identification
Based on Figure 2, the PACF plot shows that the lags with a significant influence are lag 1 and lag 4, so the closing price of PT ANTM Tbk shares at time t can be modeled as influenced by the prices at t − 1 and t − 4. Therefore, the input variables used in the training process using the ELM method are the closing price of PT ANTM Tbk shares with lag t − 1 and lag t − 4, and the output variable is the closing price of PT ANTM Tbk shares.
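For illustration, significant lags can be identified from a partial autocorrelation function computed with the Durbin-Levinson recursion. This is a sketch under assumptions: the approximate 95% bound of 1.96 divided by the square root of the sample size is a common convention and is not stated in the study, and the simulated AR(1) series, the seed, and all function names are hypothetical.

```python
import math
import random


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = autocorrelations(x, max_lag)
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    out = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Simulated AR(1) series with coefficient 0.7 (hypothetical data).
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.7 * series[-1] + rng.gauss(0, 1))

values = pacf(series, max_lag=5)
bound = 1.96 / math.sqrt(len(series))  # assumed significance bound
significant = [k for k, v in enumerate(values, start=1) if abs(v) > bound]
```

Lags whose partial autocorrelation exceeds the bound in absolute value are candidates for input variables, in the same way that lag 1 and lag 4 are read from the PACF plot.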

ELM Training Architecture
After determining the input variables, the hidden nodes are determined. The ELM method does not specify an exact upper limit on the number of hidden nodes that can be used. In forecasting the closing price of PT ANTM Tbk shares, the number of hidden nodes is varied over the range 5-50 [18]. The variables used in the study are presented in Table 3.

Table 3. Variables used in the study
| Variable | Category |
| The closing price of PT ANTM Tbk shares with lag t − 1 | Input |
| The closing price of PT ANTM Tbk shares with lag t − 4 | Input |
| Hidden nodes range 5-50 | Hidden nodes |
| PT ANTM Tbk stock closing price | Output |

Based on Table 3, three combinations of the input layer with hidden nodes can be formed: input t − 1 alone, input t − 4 alone, and inputs t − 1 and t − 4 together. Each combination is run in the ELM training process with an input-hidden nodes-output architecture. For example, the 1-5-1 architecture has one input, five hidden nodes, and one output. Likewise, the 2-5-1 architecture has two inputs, five hidden nodes, and one output.
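The input combinations and architectures can be enumerated with a short sketch. It assumes that every integer number of hidden nodes from 5 to 50 is tried, which the stated range suggests but does not confirm; the function and variable names, and the short sample series, are hypothetical.

```python
def lagged_dataset(series, lags):
    """Build input rows [y(t - lag) for each lag] and the matching targets y(t)."""
    max_lag = max(lags)
    inputs = [[series[t - lag] for lag in lags]
              for t in range(max_lag, len(series))]
    targets = series[max_lag:]
    return inputs, targets


lag_combinations = [(1,), (4,), (1, 4)]  # t-1 alone, t-4 alone, both
hidden_node_range = range(5, 51)         # assumed: every integer from 5 to 50

# Each entry: (lags, hidden nodes, "inputs-hidden-outputs" label).
architectures = [
    (lags, hidden, f"{len(lags)}-{hidden}-1")
    for lags in lag_combinations
    for hidden in hidden_node_range
]

# Small worked example with a hypothetical series and both lags.
inputs, targets = lagged_dataset([10, 11, 12, 13, 14, 15], (1, 4))
# inputs is [[13, 10], [14, 11]] and targets is [14, 15]
```

With lag t − 4 in use, the first four observations have no complete input row, so the lagged dataset is four rows shorter than the series.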

ELM Method Data Processing
The ELM parameters set through the nnfor package in the RStudio software include:
1. Hidden nodes (hd), adjusted to the number of neurons used in the study.
2. Repetitions (reps), the number of times the ELM method is trained on the data. In forecasting the closing price of PT ANTM Tbk shares, 30 reps are used for each architecture in the training process.
3. Lags, adjusted to the input variables obtained from the PACF plot.
An example of the ELM training results obtained with the architectures and parameters described above is presented in Table 4. Based on Table 4, training with input t − 1 produces the smallest MAPE value, 0.0274, at 31 hidden nodes, with a computation time of 6.7574 s.
In the ELM training process, the weights are initialized randomly so that the output of the hidden layer can be calculated. The number of initialized input weights equals the number of inputs multiplied by the number of hidden nodes in the architecture. After the weights are initialized, the output of the hidden layer is calculated using the activation function, and the final weights between the hidden layer and the output layer are then computed. Next, the output value is calculated to obtain forecasting results on the training data. An example of the output (forecast) values of the 1-31-1 architecture with input t − 1, generated before accuracy testing, is presented in Table 5. Table 5 shows the forecasting output values generated with the architecture and parameters applied in the study.
After that, data post-processing is carried out, returning the data to its original scale through denormalization. The denormalized forecasts are used to calculate the accuracy value of each architecture, and these values are then compared. The model with the lowest error rate, and therefore the highest accuracy, becomes the best model used in the testing process. This process is applied to every architecture, for each combination of inputs and repetitions used in the study.

Forecasting Accuracy using MAPE
After obtaining the training results for each architecture with the hidden nodes and repetition parameters used, the next step is to compare the resulting MAPE values, computed with the MLmetrics package in the RStudio software. The architecture whose hidden nodes and repetitions give the smallest MAPE value becomes the best model for the forecasting process. A summary of the accuracy values for each architecture is presented in Table 6. Table 6 shows a summary of the forecasting accuracy of each architecture formed with hidden nodes ranging from 5-50 and the input variables used in the study. The resulting MAPE values vary greatly. However, all of them are below 10%, which according to the MAPE criteria in Table 1 means that the models formed have excellent and accurate forecasting capabilities.

Best Model
Based on Table 6, the best model is the 1-31-1 architecture with input t − 1. The resulting MAPE value is 0.0274, which means that the forecasting error of the 1-31-1 architecture is 2.74%. Because this is below 10%, the model has excellent forecasting capabilities. The computation time of the 1-31-1 architecture is also very good at only 6.7574 s. The ELM model with the 1-31-1 architecture can be seen in Figure 3.

ELM Testing Process
The testing process in the ELM method is carried out on 5% of the total data. The data is first subjected to data pre-processing. After the testing data is normalized, the testing process is carried out by directly applying the input values and the weights obtained from the training process. The architecture used in the testing process is the best model architecture formed in the training process. After the forecast values are generated, data post-processing is carried out in the form of data denormalization, so that forecasting accuracy can be calculated before the model is used for forecasting. The results of forecasting and denormalization of the testing data are presented in Table 7. Based on Table 7, the forecasting accuracy in terms of MAPE is 0.0358 (3.58%). The comparison between the testing data and the forecasting results can be seen in Figure 4. Figure 4 shows that the ELM forecasting results are close to the actual data. It can therefore be said that the best model, based on the 1-31-1 architecture, is well suited for continuing the forecasting process.

Forecasting
The closing price of PT ANTM Tbk shares was forecast for the next 10 days using the ELM method with the best architecture, 1-31-1, and the results are presented in Table 8. Based on Table 8, the forecast closing price of PT ANTM Tbk shares is quite stable. The lowest forecast price was 1856.4270 on November 14, 2022. However, the forecast price declines slightly over the period. The graph of the 10-day forecast of the closing price of PT ANTM Tbk shares using the ELM method is presented in Figure 5.
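A multi-day forecast from a one-step-ahead model is often produced recursively, with each forecast fed back as an input for the next step. Whether the study used this strategy is not stated, so the sketch below is an assumption; the function names are hypothetical and the linear stand-in model only illustrates the mechanics, not the trained ELM.

```python
def recursive_forecast(history, lags, predict_one, horizon):
    """Forecast `horizon` steps ahead, feeding each forecast back as input."""
    if len(history) < max(lags):
        raise ValueError("history is shorter than the largest lag")
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        features = [window[-lag] for lag in lags]  # y(t - lag) for each lag
        y_hat = predict_one(features)
        forecasts.append(y_hat)
        window.append(y_hat)
    return forecasts


def stand_in_model(features):
    """Hypothetical one-step model: y(t) = 0.5 * y(t - 1) + 1."""
    return 0.5 * features[0] + 1.0


forecasts = recursive_forecast([4.0], lags=(1,), predict_one=stand_in_model, horizon=3)
# forecasts is [3.0, 2.5, 2.25]
```

In practice, `predict_one` would wrap normalization, the trained network, and denormalization, so that the forecasts come out on the original price scale.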

CONCLUSIONS
Based on the results of the research conducted, it can be concluded:
1. The results of forecasting the closing price of PT ANTM Tbk shares show the highest forecast price of 1864.5150 on November 1, 2022. Meanwhile, the lowest forecast price was 1856.4270 on November 14, 2022. Over the next 10 days, PT ANTM Tbk shares are forecast to be quite stable and able to survive in the capital market because the decline is relatively small.
2. The forecasting accuracy generated with the 1-31-1 architecture as the best model in the ELM method is very good. This is evidenced by the resulting MAPE value on the testing data of 0.0358, or 3.58%. Forecasting accuracy with the ELM method falls within the below-10% criterion, which means that the forecasting ability is very good and accurate. For every architecture formed in the ELM training process, the MAPE value is below 10%. This shows that the ELM method is well suited to forecasting. In addition, the computation time required to train the best model is only 6.7574 seconds.