PERFORMANCE COMPARISON OF SARIMA INTERVENTION AND PROPHET MODELS FOR FORECASTING THE NUMBER OF AIRLINE PASSENGER AT

The COVID-19 pandemic has had a significant impact on the air transportation sector, particularly at Soekarno-Hatta (Soetta) International Airport. The number of passengers at Soetta Airport decreased because of the pandemic, but flight activities have continued to this day. An accurate forecasting model is therefore needed to predict the number of airline passengers at Soetta Airport with the COVID-19 pandemic treated as an intervention. In this study we compare the performance of two models, SARIMA intervention and Prophet, in forecasting the number of domestic passengers at Soetta Airport. The results show that the best SARIMA intervention model is SARIMA(0,1,1)(1,0,0)_12 with intervention orders b = 0, s = 20, r = 0, with a Mean Absolute Percentage Error (MAPE) of 28% and a Root Mean Square Error (RMSE) of 433473. The Prophet model, on the other hand, yields a MAPE of 37% and an RMSE of 497154. In terms of both MAPE and RMSE, the SARIMA intervention method provides better results than the Prophet model in forecasting the number of domestic passengers at Soetta Airport.


INTRODUCTION
Time series data can be affected by external events, such as natural disasters, government policies, national holidays, financial crises, and regime changes. These events are referred to as interventions in time series data analysis. If time series data analysis is conducted without considering the impact of interventions, it will result in significant model errors. The larger the error value, the less accurate the generated model will be, and the less capable the model will be of depicting the observed data. Therefore, it is important to consider interventions in time series data modeling to ensure more accurate and relevant analysis results.
Interventions in time series data, as mentioned, can have a profound impact on the accuracy of the models employed. In particular, the COVID-19 pandemic serves as a significant example of such an intervention, disrupting various aspects of daily life, including the aviation industry. This context sets the stage for understanding how interventions can lead to substantial changes in time series data, as we will explore in the case of airline passengers at Soekarno-Hatta (Soetta) International Airport.
Soekarno-Hatta (Soetta) International Airport is Indonesia's largest and main airport, serving busy domestic and international traffic to various destinations. In 2018, it ranked 18th in the world and 1st in Southeast Asia as the busiest airport, serving thousands of passengers daily [1]. However, the COVID-19 pandemic has significantly impacted the air transportation sector, including Soetta Airport. The COVID-19 pandemic began in Indonesia in March 2020. The government implemented a series of measures to contain the spread of the virus, including Large-Scale Social Restrictions (PSBB) and Community Activity Restrictions (PPKM). These measures sharply reduced the number of airline passengers during several periods. Additionally, the requirement to provide expensive rapid test results and the fear of contracting the virus further discouraged people from flying. A study found that at least 34 of Indonesia's 187 airports suspended passenger flights during a ban on homecoming travel from April 24 to June 1, 2020. This was also reflected in the revenue decline of PT Garuda Indonesia, one of Indonesia's major airlines, in the first quarter of 2020 compared to the previous year. Additionally, there was no increase in passenger numbers during high-traffic periods such as Eid homecoming, school holidays, and the Hajj and Umrah pilgrimages in 2020, resulting in a significant decline in revenue [2]. The COVID-19 pandemic is therefore considered an intervention that disrupted the time series data of passenger numbers at Soetta Airport.
Time series forecasting is the process of predicting future values in a time series based on existing patterns and trends in the data. The goal is to provide accurate and useful estimates of how the time series will develop in the future. Forecasting often combines mathematical models with the insights and considerations of a manager [3]. In a previous study, [4] compared SARIMA and SARIMA Intervention for forecasting the number of domestic passengers at Soekarno-Hatta Airport and concluded that the SARIMA Intervention method was the best. Another study, conducted by [5], compared ARIMA, ARIMA with intervention, and Extreme Learning Machine for forecasting the number of passengers at Domine Eduard Osok Sorong Airport and concluded that ARIMA with intervention reduces MAPE by up to 45%. In addition, [6] compared the SARIMA method and the step function intervention model for forecasting the number of visitors to the Londa tourist attraction and concluded that the best model is SARIMA with intervention. Expanding on this previous research, this study compares the SARIMA intervention model with a relatively new machine learning model called Prophet, which is claimed to be a powerful forecasting tool. The goal is to predict the number of passengers at Soekarno-Hatta Airport while considering the impact of COVID-19 as an intervention.
SARIMA, or Seasonal Autoregressive Integrated Moving Average, is a statistical model used to analyze and predict data with seasonal patterns and trends. The SARIMA model is an extension of the Autoregressive Integrated Moving Average (ARIMA) model, with the addition of seasonal components [7]. Seasonal models have observation series that depend on previous and repetitive observations, forming cyclic patterns. The general notation for SARIMA is ARIMA(p,d,q)(P,D,Q)_s, where (p,d,q) represents the non-seasonal order of the model, (P,D,Q) represents the seasonal order, and s represents the number of periods per season [8].
Interventions are assumed to occur at a known point in time and cause a change in the pattern of time series data. Intervention analysis is conducted to determine the extent to which the intervention effect influences the pattern of the time series data. An ARIMA time series that exhibits seasonal patterns and includes interventions is referred to as a SARIMA Intervention model [6]. Intervention events can result in two different types of impacts: temporary intervention events that occur within a specific period (pulse function) and intervention events that have long-term impacts (step function) [5].
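The two intervention inputs described above can be written as simple indicator series. The following is a minimal sketch, assuming the common textbook definitions in which a pulse is active only at the intervention time T and a step stays active from T onward; the function names and the example length are hypothetical.

```python
def pulse(n, T):
    """Pulse function: 1 only at time T (1-indexed), 0 elsewhere."""
    return [1 if t == T else 0 for t in range(1, n + 1)]


def step(n, T):
    """Step function: 0 before time T, 1 from time T onward."""
    return [1 if t >= T else 0 for t in range(1, n + 1)]


# Hypothetical series of 8 observations with an intervention at T = 5.
print(pulse(8, 5))  # [0, 0, 0, 0, 1, 0, 0, 0]
print(step(8, 5))   # [0, 0, 0, 0, 1, 1, 1, 1]
```

In an intervention model, one of these series is used as the input whose effect on the observed data is estimated.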
The Prophet model, also known as FB Prophet, is a forecasting tool developed by Facebook's data science team. It is designed for forecasting time series data with a simple and user-friendly approach. The Prophet model consists of three main components: seasonality, trend, and holidays. Prophet incorporates time series interventions within the holiday component, where they are referred to as special events [9]. Prophet has several advantages over classical time series models like ARIMA. It can handle missing data, work with various seasonal patterns, and easily incorporate new components such as special events. Prophet is also easy to use, even for analysts who may not have an in-depth understanding of time series forecasting techniques.
This research aims to find the best model between SARIMA intervention and Prophet for forecasting seasonal time series data, such as the number of airline passengers at Soekarno-Hatta International Airport, considering the intervention of the COVID-19 pandemic. It is expected that the results of this research can be used to forecast seasonal time series data with interventions more accurately.

Data
The data used is the number of domestic airline passengers at Soetta International Airport, obtained from the website of the Biro Pusat Statistik (BPS) [11]. The data covers a monthly period from January 2010 to March 2023. There is an intervention period, or special event, due to the COVID-19 pandemic that affects the time series data; it started in March 2020 and lasted until December 2021.

Stages of Data Analysis
The stages of data analysis carried out in this study are as follows:
1. Explore the data through time series plots to understand its characteristics and patterns.
Estimate the parameters of the model using the Maximum Likelihood Estimation (MLE) method. The significance of each estimate is tested with the hypotheses H0: the parameter is equal to 0 (non-significant parameter) and H1: the parameter is not equal to 0 (significant parameter) at a significance level of 5%. A good model is indicated when all parameter estimates are significant.
Perform a diagnostic check of the model to assess whether the residuals of the fitted model exhibit the properties of white noise. Diagnostic tests on the model include tests for independence and normality of the residuals. The independence of the residuals is tested using the Ljung-Box test statistic with hypotheses H0: there is no serial autocorrelation in the time series data, and H1: there is serial autocorrelation in the time series data. Normality is tested using the Kolmogorov-Smirnov test with hypotheses H0: the data follows a normal distribution, and H1: the data does not follow a normal distribution. The model meets the assumptions if the p-value > α (5%) [14].
iii. To obtain alternative models, perform model overfitting by gradually increasing the orders p, q, P, and Q. The selected overfitting model is the one in which all coefficient estimates are statistically significant.
iv. Evaluate the model using the Akaike Information Criterion (AIC), given by [15]:
$$\mathrm{AIC} = -2\ell + 2k$$
where ℓ is the log-likelihood of the model and k is the number of parameters. The best SARIMA model is the one with the smallest AIC value. Because the criterion penalizes each additional parameter, such a model fits the data well while being less likely to overfit.
b. Evaluate the model obtained from step 3a(vi) with the training data during the intervention period.
c. Plot the residuals of the pre-intervention SARIMA model to identify the intervention order from the residuals that exceed the ±3σ̂ limit.
d. Identify the intervention response, that is, the orders b, s, and r, based on the residual graph. The parameters b, s, and r represent the start time of the intervention, the duration of the intervention until stability is restored, and the pattern of the intervention effect, respectively.
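The forecast accuracy measures are
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2},$$
where A_t is the actual value, F_t is the forecast value, and n is the number of data points.
6. The model with the smallest MAPE and RMSE values is selected as the best forecast model [17].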
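The error measures used to select the best model can be computed as in the following minimal sketch. It assumes the standard definitions of MAPE and RMSE; the sample values are hypothetical and are not taken from the passenger data.

```python
import math


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n


def rmse(actual, forecast):
    """Root Mean Square Error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


# Hypothetical actual and forecast values for three periods.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
print(round(mape(actual, forecast), 2))  # 6.67
print(round(rmse(actual, forecast), 2))  # 12.91
```

MAPE is scale-free, while RMSE is expressed in passengers, which is why the two measures are reported together.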
The flowchart of the research method is illustrated in Figure 1.
The box plot in Figure 3 illustrates that the data reaches peaks in July and December within one seasonal cycle. The highest passenger count occurred in 2018. However, when the COVID-19 pandemic hit and the PPKM policy was implemented, a significant decrease in passenger count can be observed starting from March 2020 (T=123). The decline persisted for some time before the passenger count returned to pre-pandemic levels. Therefore, the training data is divided into two parts: before intervention (January 2010 - February 2020) and after intervention (March 2020 - December 2021). The data from January 2022 to March 2023 is used as the test data for cross-validation.
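The sizes of the three data segments follow directly from the dates given above. The sketch below, using a hypothetical helper `month_index`, reproduces the intervention time T and the number of observations in each segment.

```python
from datetime import date


def month_index(d, start):
    """1-based position of month d in a monthly series beginning at start."""
    return (d.year - start.year) * 12 + (d.month - start.month) + 1


start = date(2010, 1, 1)                             # first observation
T = month_index(date(2020, 3, 1), start)             # first intervention month
train_end = month_index(date(2021, 12, 1), start)    # last training month
series_end = month_index(date(2023, 3, 1), start)    # last observation

n_pre = T - 1                         # pre-intervention training data (Nt)
n_intervention = train_end - T + 1    # intervention training data (It)
n_test = series_end - train_end       # test data

print(T, n_pre, n_intervention, n_test)  # 123 122 22 15
```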

Checking Stationarity in Mean
Stationarity was first tested on the pre-intervention training data (Nt), which consists of 122 observations. The ACF plot (Figure 4) shows a slow decrease, indicating that the data is not stationary. This is confirmed by the ADF test, whose p-value of 0.1987 is greater than the significance level of 0.05. The null hypothesis therefore cannot be rejected, and the data is concluded to be non-stationary.

Identification of Model
The SARIMA model is suitable for the number of airline passengers because the data exhibit seasonality: a regular pattern of ups and downs that repeats every 12 months. To identify the appropriate SARIMA model, we examine the ACF and PACF plots of the stationary data, shown in Figure 7. The ACF plot cuts off after lag 1 and decreases gradually at the seasonal lags (12, 24, 36, and so on). The cut-off suggests a non-seasonal moving average term of order 1, while the gradual decay at the seasonal lags suggests a seasonal autoregressive term. The PACF plot decreases gradually at the non-seasonal lags and cuts off after the first seasonal lag (lag 12), which is consistent with a seasonal autoregressive term of order 1. Based on this information, the tentative models are SARIMA(0,1,1)(1,0,0)_12 and SARIMA(2,1,0)(1,0,0)_12.
After identifying the two tentative models, the next step is to estimate their parameters. The estimated parameters for both models are shown in Table 1. The p-values of all parameters in both models are less than α (0.05), which indicates that all the parameter estimates are significant.
The diagnostic tests presented in Figure 8 and Figure 9 show that neither model exhibits autocorrelation in its residuals, since the ACF plots of the residuals remain within the significance bounds. Additionally, the residual plots and Q-Q plots show that the residuals of both models are centered around the mean value. This is further supported by the Kolmogorov-Smirnov test, which yields a p-value greater than α, indicating that the residuals are normally distributed. Based on these results, it can be inferred that the residuals of both models meet the assumption of white noise. However, based on the criterion of the smallest AIC, the best model for Nt is SARIMA(0,1,1)(1,0,0)_12.
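The AIC comparison between the two tentative models can be sketched as follows, assuming the standard definition AIC = 2k - 2ℓ. The log-likelihood values below are hypothetical placeholders, not the values estimated in this study, and the parameter counts assume that the error variance is counted as a parameter.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion for log-likelihood l and k parameters."""
    return 2 * k - 2 * log_likelihood


# Hypothetical log-likelihoods; k counts the ARMA coefficients plus the variance.
candidates = {
    "SARIMA(0,1,1)(1,0,0)12": {"loglik": -1500.0, "k": 3},
    "SARIMA(2,1,0)(1,0,0)12": {"loglik": -1501.5, "k": 4},
}

scores = {name: aic(c["loglik"], c["k"]) for name, c in candidates.items()}
best = min(scores, key=scores.get)

print(scores)  # {'SARIMA(0,1,1)(1,0,0)12': 3006.0, 'SARIMA(2,1,0)(1,0,0)12': 3011.0}
print(best)    # SARIMA(0,1,1)(1,0,0)12
```

With these placeholder numbers the first model wins because it reaches a slightly higher likelihood with one parameter fewer.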

Overfitting
The overfitting results for the pre-intervention SARIMA model are presented in Table 2. None of the overfitted models has all of its parameter estimates significant at the 5% significance level. Therefore, the best model for Nt remains SARIMA(0,1,1)(1,0,0)_12.

Identification of Intervention Response
The initial step in identifying the intervention order is to observe the difference between the forecasted values of Nt for t ≥ T and the observed values Zt. The forecasting is performed for the duration of the intervention, from March 2020 to December 2021 (22 observations). Figure 10 compares Zt with the values of Nt forecast by the SARIMA(0,1,1)(1,0,0)_12 model.
The outlier effect plot in Figure 11 shows that the intervention occurs abruptly and has a temporary impact. This indicates that the appropriate intervention pattern is an abrupt temporary intervention.

Figure 11. The plot of the intervention pattern
The order of intervention can be identified by observing where the residual plot exceeds the ±3σ̂ limit. The residual plot in Figure 12 shows that the residual first crosses the ±3σ̂ limit in March 2020 (T=123). This indicates that the intervention begins at that time, suggesting that the order b is 0. The order s is determined by how long the intervention lasts until the data returns to normal. The initial assumption for the order s is 20. The order r is determined by the presence or absence of a clear pattern in the residual plot: r is 0 if the residual pattern is not clear, and 1 if there is a clear pattern. The tentative model for parameter estimation will therefore have b = 0, s = 20, and either r = 0 or r = 1.
The estimated parameter values for the intervention model are presented in Table 3. The intervention model in which all estimated parameters are statistically significant at the 5% level is the one with b=0, s=20, and r=0. This means that the intervention had a significant impact on the data, and that its effect was visible for 20 time periods after the intervention.
The diagnostic tests of the model, examining the independence and normality of the residuals, are shown in Figure 13. The ACF plot of the residuals shows that no lag exceeds the significance boundary, indicating no significant autocorrelation. The Q-Q plot also reveals that the residuals are closely scattered around the mean value. The Ljung-Box and Kolmogorov-Smirnov tests yield p-values greater than the significance level, indicating that the residuals of the intervention model with b=0, s=20, and r=0 satisfy the assumptions of independence and normality. The model is therefore a good fit for the data, and its residuals are not influenced by any systematic pattern.
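The ±3σ̂ rule used to locate the start of the intervention can be sketched as below. This is a minimal illustration with hypothetical residuals; it assumes that σ̂ is estimated as the root mean square of the pre-intervention residuals and that the delay b is the number of periods between T and the first residual outside the limit.

```python
import math


def sigma_hat(residuals):
    """Root mean square of the residuals (assumed estimator of sigma)."""
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def first_exceedance(errors, limit, start_index):
    """Time index of the first error whose magnitude exceeds the limit."""
    for offset, e in enumerate(errors):
        if abs(e) > limit:
            return start_index + offset
    return None  # no error leaves the band


# Hypothetical pre-intervention residuals and intervention-period forecast errors.
pre_residuals = [10.0, -12.0, 8.0, -9.0, 11.0, -10.0]
intervention_errors = [-80.0, -150.0, -140.0]
T = 123  # first month of the intervention period (March 2020)

limit = 3 * sigma_hat(pre_residuals)
first = first_exceedance(intervention_errors, limit, start_index=T)
b = first - T

print(round(limit, 2), first, b)  # 30.25 123 0
```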

Overfitting the Intervention Order
The next step is to try different values for the intervention order to see whether any of them improves the model. The results are shown in Table 4. Neither the model with intervention order b=0, s=21, r=0 nor the model with b=0, s=21, r=1 has all parameter estimates significant, so these models are not adopted. The best SARIMA intervention model for the number of domestic passengers at Soetta Airport is therefore SARIMA(0,1,1)(1,0,0)_12 with intervention order b=0, s=20, and r=0.

Prophet Model
The Prophet method is a forecasting method that is designed to be simple and easy to use. It is implemented in the Prophet library for Python. Unlike the ARIMA method, Prophet does not require differencing or Box-Cox transformation of non-stationary data. The first step in using Prophet is to define the COVID-19 intervention period, from March 2020 to December 2021, as a special event. This separates the effect of the intervention period from the trend and seasonality of the resulting model, as can be seen from the holidays component in Figure 14.
The residuals of the Prophet model were analyzed to check whether they meet the assumption of white noise. The ACF plot showed no significant lags, and the Ljung-Box test resulted in a p-value greater than the significance level. This indicates that the residuals are not autocorrelated. The Q-Q plot showed that the distribution of the residuals is tightly clustered around the middle value of 0. However, the Kolmogorov-Smirnov test resulted in a p-value less than the significance level, indicating that the residuals are not normally distributed.
The Central Limit Theorem implies that, for a sample size of 30 or more, the sampling distribution of the mean can be approximated by a normal distribution [18]. In this study a sample size of 144 is used, so the normality assumption is treated as satisfied. Therefore, the residuals of the Prophet model are considered to meet the assumption of white noise.
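The special-event setup described in this section could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' script: the helper names and the event label `covid19` are hypothetical, the training frame is assumed to have Prophet's usual `ds` and `y` columns, and the `prophet` and `pandas` packages are assumed to be installed (they are imported only inside `fit_prophet`).

```python
def intervention_rows(start_year, start_month, n_months, name="covid19"):
    """One special-event row per month of the intervention period."""
    rows = []
    year, month = start_year, start_month
    for _ in range(n_months):
        rows.append({"holiday": name, "ds": f"{year:04d}-{month:02d}-01"})
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return rows


def fit_prophet(train_df, events, horizon=15):
    """Fit Prophet with the events as holidays and forecast `horizon` months.

    train_df is assumed to be a pandas DataFrame with columns ds and y.
    """
    import pandas as pd
    from prophet import Prophet

    holidays = pd.DataFrame(events)
    holidays["ds"] = pd.to_datetime(holidays["ds"])
    model = Prophet(holidays=holidays)
    model.fit(train_df)
    future = model.make_future_dataframe(periods=horizon, freq="MS")
    return model, model.predict(future)


# March 2020 to December 2021 is 22 monthly observations.
events = intervention_rows(2020, 3, 22)
print(len(events), events[0]["ds"], events[-1]["ds"])  # 22 2020-03-01 2021-12-01
```

Marking every month of the period as the same named event lets the holidays component absorb the drop in passengers, so the trend and seasonality are estimated mainly from the remaining data.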

Determine The Best Model
After building the SARIMA intervention and Prophet models, the next step is to use them to forecast the test data (15 data points). Figure 17 shows a plot that compares the cross-validated forecasts of both models with the actual test data. The best model is selected by comparing the MAPE (Mean Absolute Percentage Error) and RMSE (Root Mean Square Error) scores, two common metrics for evaluating forecasting models. MAPE is the average of the absolute percentage errors between the predicted and actual values. RMSE is the square root of the average of the squared errors between the predicted and actual values.
Prophet is a newer model that was developed specifically for forecasting social and economic data. As a result, SARIMA may have been better able to capture the unique characteristics of airline passenger data during the COVID-19 period. Secondly, SARIMA allows for the use of intervention terms, which account for sudden changes in the data that are not captured by the model's underlying trend and seasonality components. The COVID-19 pandemic caused a sudden and significant decrease in airline passenger demand. SARIMA's ability to account for this change may have led to more accurate forecasts than Prophet, which does not model intervention terms explicitly and instead represents such events through its holiday component. Thirdly, Prophet is a good model for forecasting long-term trends but is not as well suited to short-term ones, whereas SARIMA models can effectively capture and predict patterns and fluctuations in the data over a relatively short period [20]. The COVID-19 pandemic caused a sharp decline in airline passenger demand that lasted for several months. Prophet may have struggled to accurately forecast this short-term trend, while SARIMA may have been better able to do so.
In this case, the SARIMA intervention model has a MAPE of 28% and an RMSE of 433473. This means that the model's forecasts are, on average, 28% off the actual values, with a typical forecast error of about 433473 passengers. While this is not perfect accuracy, it is still reasonably good, especially considering the impact of COVID-19 on the airline industry.

CONCLUSIONS
The best model for forecasting the number of domestic passengers at Soetta Airport is SARIMA(0,1,1)(1,0,0)_12 with intervention orders b=0, s=20, and r=0. This model achieved a MAPE of 28% and an RMSE of 433473. These results were obtained from a short-term forecast of 15 data points. Future research should extend the intervention period to December 2022, coinciding with the end of the PPKM measures, and obtain a sufficiently long test data set. This is expected to improve the accuracy of the forecasts. However, the relevance of these recommendations may vary depending on the evolving conditions in the aviation industry and the global health situation. Continuous monitoring and periodic updates to the forecasting model will be essential to adapt to changing circumstances.

2. Split the data into two sets: the training data, which spans from January 2010 to December 2021, and the testing data, which spans from January 2022 to March 2023. The training data (Zt) is further divided into two subsets: the pre-intervention training data (Nt) and the intervention training data (It). The intervention period begins in March 2020, so the intervention training data is defined as the period from March 2020 to December 2021.
3. Develop an intervention SARIMA model, with the following steps:
a. Develop a SARIMA model for the pre-intervention training data (Nt).
i. Check for stationarity in mean and variance. If the data is not stationary in the mean, differencing is performed. Stationarity in variance is examined using the Box-Cox plot. The transformation criterion is based on the λ value obtained from the plot. If λ=1, no Box-Cox transformation is necessary [12].
ii. After making the time series stationary, use the ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots to identify the appropriate SARIMA model [13].

Figure 2. A plot of Time Series Data For Domestic Passengers at Soetta Airport

Figure 4. (a) ACF and (b) PACF plots of the Nt time series
After performing first-order differencing (d=1) and conducting the ADF test, the obtained p-value (0.01) is less than α (0.05), which indicates that the differenced data is now stationary. The plot of the differenced data is shown in Figure 5.
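First-order differencing, as applied here with d=1, replaces each observation by its change from the previous one. A minimal sketch with hypothetical values (the `lag` argument is an added convenience for seasonal differencing, which this study does not use since D=0):

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t with a valid predecessor."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical values with an upward trend.
y = [10, 13, 12, 16, 20]
print(difference(y))         # [3, -1, 4, 4]
print(difference(y, lag=2))  # [2, 3, 8]
```

The differenced series is one observation shorter than the original for each unit of lag.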

Figure 5. A plot of Differenced Time Series Nt (d=1)

Checking Stationarity in Variance
Based on the Box-Cox plot shown in Figure 6, the obtained λ value is close to or exceeds 1, suggesting that the data is already stationary in variance. Therefore, there is no need to perform the Box-Cox transformation.
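The reason λ = 1 means that no transformation is needed can be seen from the Box-Cox formula itself. The sketch below assumes the standard one-parameter form for positive data; the input values are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of positive values: log for lam = 0, power otherwise."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


x = [1, 2, 4]
print(box_cox(x, 1))    # [0.0, 1.0, 3.0]  (only shifted by 1: shape unchanged)
print(box_cox(x, 0.5))  # [0.0, 0.8284271247461903, 2.0]
```

With λ = 1 the transform only subtracts 1 from every value, so the variance structure of the series is left as it is.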

Figure 6. A Box-Cox plot of the differenced Nt

Figure 8. Model diagnostic plots for SARIMA(0,1,1)(1,0,0)_12: (a) residual plot, (b) Q-Q plot, and (c) ACF plot of the residuals

Figure 10. Plot of the forecast values of Nt and the observed values of Zt

Figure 14. Holidays Component of The Prophet Model
Figure 15 shows the plot of the fitted Prophet model compared to the actual data. It can be observed that the Prophet modeling results are in line with the actual data before the intervention period. During the intervention period, the Prophet model captures a similar pattern to the actual data.

Figure 15. Plot of the Fitted Prophet Model and the Actual Data
Next, the model is diagnosed by examining the autocorrelation and distribution of residuals, as shown in Figure 16.
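The autocorrelation check on the residuals rests on the sample autocorrelations and the Ljung-Box Q statistic. The following is a minimal sketch of both, assuming the standard formula Q = n(n+2) Σ ρ_k² / (n - k); the residuals are hypothetical, and the p-value step (comparison with a chi-square distribution) is left out.

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical, strongly alternating residuals (clearly autocorrelated).
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(autocorr(residuals, 1))     # -0.875
print(ljung_box_q(residuals, 1))  # 8.75
```

A large Q relative to the chi-square critical value leads to rejecting the hypothesis of no serial autocorrelation; for white-noise residuals Q stays small.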

Figure 17. A plot of Test Data and Forecasting Data Using the SARIMA Intervention and Prophet Model
Figure 18 shows a bar chart that compares the MAPE and RMSE scores of both models.

Figure 18. (a) MAPE and (b) RMSE values of the SARIMA Intervention and Prophet Models
The SARIMA intervention model outperforms Prophet in terms of forecasting accuracy, as indicated by its lower MAPE and RMSE values. Several factors may contribute to this. Firstly, SARIMA is a more established model than Prophet. SARIMA has been around for decades and has been used to forecast a variety of different types of data [19].