AN ITERATIVE PROCEDURE FOR OUTLIER DETECTION IN GSTAR(1;1) MODEL

Outliers are observations that differ significantly from the rest of the data; they can affect the estimation results of a model and reduce the accuracy of the estimator. A common way to deal with outliers is to remove them from the data. However, outliers sometimes contain important information, so eliminating them can lead to misinterpretation. There are two types of outliers in time series models: the Innovative Outlier (IO) and the Additive Outlier (AO). In the GSTAR model, outliers can also be detected alongside spatial and time correlations. We introduce an iterative procedure for detecting outliers in the GSTAR model. The first step is to fit a GSTAR model without outlier factors. Outliers are then detected from the residuals of this model. If an outlier is detected, an outlier factor is added to the initial model and the parameters are re-estimated, which gives a new GSTAR model and new residuals. Detecting outliers and adding them to the model is repeated until a GSTAR model with no detected outliers is obtained. As a result, outliers are neither removed nor ignored but are represented by outlier factors in the GSTAR model. This paper presents a case study of Dengue Hemorrhagic Fever cases in five locations in West Kalimantan Province, modeled with the GSTAR model with outlier factors. The result is that the iterative procedure for detecting outliers based on the GSTAR model residuals provides better accuracy than the regular GSTAR model (without outlier factors). The outlier problem can thus be handled without removing outliers from the data, by adding outlier factors to the model. In this way, the critical information in the outliers is not lost, and a more accurate model is obtained.


INTRODUCTION
Unexpected, extraordinary observations that look different from most observations in a data set are often encountered in various kinds of data analysis. Outliers are observations that differ significantly from the others. Their presence is unexpected because many factors can cause them. Commonly, outliers are removed. In this paper, outliers are not removed; instead, they are handled by adding an outlier factor to the model. For the Auto-regressive Integrated Moving Average (ARIMA) time series model, Chang et al. developed an iterative procedure for detecting outliers and handling their presence by adding outlier factors to the ARIMA model.
Space-time analysis is no exception: outliers can be detected there as well. In this paper, we propose an iterative procedure for detecting outliers in a space-time model, the Generalized Space-Time Autoregressive (GSTAR) model. The GSTAR model is a space-time model used to model and forecast spatial time series data. The GSTAR model is being actively developed in Indonesia. Several developments of the GSTAR(1;1) model have been made, such as a new procedure for Generalized STAR modeling using the Inverse Auto-covariance Matrix (IAcM) approach [1]. In terms of weighting, GSTAR has been modeled using a weighted average based on the fuzzy sets concept and applied to oil palm production [2]. Yundari et al. (2017) studied error assumptions in the GSTAR model [3]. More recently, Yundari et al. (2018) studied spatial weight determination for the GSTAR(1;1) model using a kernel function [4]. In applications, the GSTAR model has been widely used to forecast Gross Domestic Product (GDP) in Western Europe [5], chili prices in Bandung's markets [6], and criminality [7]. GSTAR modeling has also been combined with the variogram from spatial analysis [8]. Although spatial analysis is an older field than space-time analysis, the development of spatial models is still ongoing, with several applications. A bootstrap approach has been used to estimate the parameters of the isotropic semivariogram [9]. Moreover, the effect of spatial aggregation on the space-time model has been investigated [10].
Dengue fever is transmitted by Aedes aegypti mosquitoes. Aedes aegypti is one of the dengue mosquitoes that thrives in a warm climate. The case study in this paper concerns dengue fever cases in five locations in West Kalimantan. Significant climate change has occurred, and the resulting sharp increases in cases may appear as outliers in the GSTAR model. This paper proposes an iterative procedure for outlier detection in the GSTAR model and applies it to the case study. The paper is divided into six sections. Section two briefly explains the GSTAR model. The definition and types of outliers are discussed in section three. The iterative procedure for outlier detection is discussed in section four. The application of the GSTAR model with outlier factors is discussed in section five. Conclusions and remarks are given in section six.

GSTAR Model
The GSTAR model is a development of the STAR model. In the STAR model, all locations share the same autoregressive parameters, so the locations are assumed to be homogeneous and the model can only be used for uniform locations. In reality, locations are heterogeneous, so a model that can handle this is needed. The GSTAR model captures phenomena with heterogeneous location characteristics by allowing the parameters to differ from one location to another. Consider a random vector $Z_t = (Z_{1,t}, Z_{2,t}, \ldots, Z_{N,t})'$ that follows the GSTAR($p; \lambda_1, \lambda_2, \ldots, \lambda_p$) model. It can be stated as [11]

$$Z_t = \sum_{k=1}^{p} \left[ \Phi_{k0} + \sum_{l=1}^{\lambda_k} \Phi_{kl} W^{(l)} \right] Z_{t-k} + e_t$$

where $\Phi_{kl}$ is the diagonal matrix of parameters in the GSTAR model, $W^{(l)}$ is the weight matrix, $e_t = (e_{1,t}, e_{2,t}, \ldots, e_{N,t})'$ is the vector of residuals in the GSTAR model, and $e_t \sim N(0, \sigma^2 I)$. For example, the GSTAR model with autoregressive order 1 and spatial order 1, GSTAR(1;1), can be written as

$$Z_t = \Phi_0 Z_{t-1} + \Phi_1 W Z_{t-1} + e_t \qquad (1)$$

The model in Eq. (1) is called the GSTAR(1;1) model without outlier factor.
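To make Eq. (1) concrete, the following minimal sketch computes the one-step conditional mean of a GSTAR(1;1) model with NumPy. The function name `gstar11_predict`, the three-location weight matrix, and all parameter values are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
import numpy as np

def gstar11_predict(z_prev, phi0, phi1, W):
    """One-step GSTAR(1;1) mean: Phi0 z_{t-1} + Phi1 W z_{t-1}.

    phi0 and phi1 hold the diagonal entries of Phi0 and Phi1,
    one pair of parameters per location.
    """
    Phi0 = np.diag(phi0)
    Phi1 = np.diag(phi1)
    return Phi0 @ z_prev + Phi1 @ (W @ z_prev)

# Hypothetical example with three locations (illustrative values only).
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
phi0 = np.array([0.4, 0.3, 0.5])   # own-lag parameters per location
phi1 = np.array([0.2, 0.1, 0.3])   # spatial-lag parameters per location
z_prev = np.array([10.0, 20.0, 30.0])

z_hat = gstar11_predict(z_prev, phi0, phi1, W)
```

Because each location has its own pair of parameters, the same neighbouring observations can influence different locations with different strength, which is the heterogeneity the STAR model cannot express.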
The distinctive feature of the GSTAR model is the weight matrix. The weights are defined based on the relationship between one location and another. The following weights are used in the GSTAR model [4]:

1. Uniform weight gives the same weight to each location. It is therefore often used for homogeneous data or when the distances between locations are equal. The uniform location weights are calculated as $w_{ij}^{(l)} = 1 / n_i^{(l)}$, where $w_{ij}^{(l)}$ is the weight between locations $i$ and $j$, and $n_i^{(l)}$ is the number of locations adjacent to the $i$-th location at spatial lag $l$.

2. Binary weight takes only the values zero and one. Two geographically adjacent cities are assigned $w_{ij} = 1$, whereas two cities that are geographically far apart are assigned $w_{ij} = 0$.

3. Inverse distance weight is based on the actual distance between locations. The weights are obtained by normalizing the inverse of the actual distances. The first step is to calculate the actual distances between locations, collected in the matrix $D = (d_{ij})$, where $d_{ij}$ is the distance between locations $i$ and $j$; $D$ is a symmetric matrix. The matrix is then standardized into the form $W$ with $\sum_{j=1}^{N} w_{ij}^{(l)} = 1$, where $i \neq j$. In general, the inverse distance weight for each location is given by

$$w_{ij} = \frac{1/d_{ij}}{\sum_{k \neq i} 1/d_{ik}}, \quad i \neq j$$

The diagonal of the inverse distance weight matrix is zero, because a location has no distance to itself. The inverse distance weight matrix is not symmetric.
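The three weighting schemes above can be sketched as follows. This is an illustrative NumPy sketch under the row-normalization described in the text; the helper names, the adjacency matrix, and the distance matrix are hypothetical.

```python
import numpy as np

def uniform_weights(adjacency):
    """Uniform weights: w_ij = 1 / n_i for each neighbour j of location i."""
    A = np.asarray(adjacency, dtype=float)
    n = A.sum(axis=1, keepdims=True)          # number of neighbours per row
    return np.divide(A, n, out=np.zeros_like(A), where=n > 0)

def inverse_distance_weights(D):
    """Row-normalized inverse distances with a zero diagonal."""
    D = np.asarray(D, dtype=float)
    inv = np.zeros_like(D)
    off_diag = ~np.eye(len(D), dtype=bool)
    inv[off_diag] = 1.0 / D[off_diag]         # skip the zero diagonal
    return inv / inv.sum(axis=1, keepdims=True)

# Hypothetical adjacency (this matrix is itself the binary weight matrix).
W_binary = np.array([[0, 1, 1],
                     [1, 0, 0],
                     [1, 0, 0]])
W_uniform = uniform_weights(W_binary)

# Hypothetical symmetric distance matrix between three locations.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 4.0],
              [2.0, 4.0, 0.0]])
W_inv = inverse_distance_weights(D)
```

In this example `D` is symmetric but `W_inv` is not, because each row is divided by its own sum of inverse distances.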

ARIMA(1,0,0) and GSTAR(1;1) Model
Let $\{Z_t\}$ be a sequence of random variables that follows the ARIMA(1,0,0) model. Then $Z_t$ can be defined as

$$Z_t = \phi Z_{t-1} + e_t$$

where $\phi$ is the autoregressive parameter and $e_t$ is the error term at time $t$. If we instead have a vector $Z_t = (Z_{1,t}, Z_{2,t}, \ldots, Z_{N,t})'$ of sequences of random variables, we can define

$$Z_t = \Phi Z_{t-1} + e_t$$

This is the VAR(1) model. Further, adding the spatial factor to the VAR(1) model gives the GSTAR(1;1) model (see Eq. 1). The GSTAR(1;1) model can be stated as a VAR(1) model with $\Phi = \Phi_0 + \Phi_1 W$. We conclude that the structure of the AR(1) model is not much different from that of the GSTAR(1;1) model.
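As a short worked step, the VAR(1) form follows by factoring the lagged vector out of the GSTAR(1;1) model. The second line writes the model for a single location $i$, which shows the AR(1)-like structure directly. The entry names $\phi_{0i}$ and $\phi_{1i}$, denoting the $i$-th diagonal entries of $\Phi_0$ and $\Phi_1$, are introduced here for illustration, and the sum over $j \neq i$ assumes a weight matrix with zero diagonal.

```latex
\begin{align*}
Z_t &= \Phi_0 Z_{t-1} + \Phi_1 W Z_{t-1} + e_t
     = (\Phi_0 + \Phi_1 W)\, Z_{t-1} + e_t, \\
Z_{i,t} &= \phi_{0i}\, Z_{i,t-1}
     + \phi_{1i} \sum_{j \neq i} w_{ij}\, Z_{j,t-1} + e_{i,t}.
\end{align*}
```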

Outlier
Outliers are observations that are inconsistent with the rest of the data because of unforeseen events such as political turbulence or economic crises [12]. Outliers can make estimators unreliable and invalid, so outlier detection needs to be carried out. Outlier detection was first introduced by Fox (1972) [13]. There are two types of outliers:

1. Additive Outlier (AO) is an event that affects the time series at a single time point. The additive outlier model is defined as follows [13]:

$$Z_t = X_t + \omega I_t^{(T)}$$

where $X_t$ is the outlier-free series with $\phi(B) X_t = \theta(B) e_t$, $\omega$ is the size of the outlier, and $I_t^{(T)}$ equals 1 if $t = T$ and 0 otherwise.

2. Innovative Outlier (IO) is an event whose effect propagates to the observations after it occurs. The innovative outlier model is defined as follows [13]:

$$Z_t = X_t + \frac{\theta(B)}{\phi(B)} \omega I_t^{(T)}$$

If the outlier is an AO, its effect occurs only at the time of the observation, $T$. If the outlier is an IO, its effect extends to all subsequent observations $Z_T, Z_{T+1}, \ldots$ In general, time series data can contain several outliers of different types. The general outlier model is as follows [12]:

$$Z_t = \sum_{j=1}^{k} \omega_j \nu_j(B) I_t^{(T_j)} + X_t \qquad (2)$$

where $\nu_j(B) = 1$ for an AO and $\nu_j(B) = \theta(B)/\phi(B)$ for an IO at time $T_j$.

After obtaining the time series model with outlier factors, the iterative outlier detection procedure is started again based on that model. If no outliers are found, the iterative procedure stops. Otherwise, the estimation stage is repeated, with the newly identified outliers incorporated into the model (Eq. 2), until no more outliers can be found and all the outlier effects have been estimated simultaneously with the time series parameters. This paper adopts the types of outliers, the iterative outlier detection procedure, and the model with outlier factors from the time series setting. In the space-time setting, the observations are recorded at several locations that are spatially correlated. Therefore, outliers may also be detected in the GSTAR model, where they are related across locations through the spatial correlation.
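The difference between the two outlier types can be illustrated with a small simulation. The sketch below assumes a plain AR(1) series (so that $\theta(B)/\phi(B) = 1/(1 - \phi B)$); the function name, parameter values, and outlier size are hypothetical. It contaminates the same series once with an AO and once with an IO and records the resulting deviation from the clean series.

```python
import random

def simulate_ar1(phi, n, seed=0, io_time=None, omega=0.0):
    """Simulate an AR(1) series; optionally add a shock of size omega
    to the innovation at time io_time (an innovative outlier)."""
    rng = random.Random(seed)
    series, prev = [], 0.0
    for t in range(n):
        a = rng.gauss(0.0, 1.0)
        if t == io_time:
            a += omega              # the shock enters through the innovation
        prev = phi * prev + a
        series.append(prev)
    return series

phi, n, T, omega = 0.6, 12, 5, 10.0   # illustrative values only

clean = simulate_ar1(phi, n)
# AO: the shock is added to the observation at time T only.
ao = [v + (omega if t == T else 0.0) for t, v in enumerate(clean)]
# IO: the shock is added to the innovation at time T and then propagates.
io = simulate_ar1(phi, n, io_time=T, omega=omega)

ao_effect = [a - c for a, c in zip(ao, clean)]
io_effect = [i - c for i, c in zip(io, clean)]
```

In this setup `ao_effect` is nonzero only at time `T`, while `io_effect` decays geometrically as `omega * phi ** (t - T)` for `t >= T`, which matches the description of the two types above.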

An Iterative Procedure for Outlier Detection in GSTAR Model
This procedure is developed from the iterative outlier detection procedure for the ARIMA model. It begins by modelling the original series under the assumption that there are no outliers and estimating the parameters of the GSTAR model without outlier factors. Let $Z_t$ be a stochastic process following a GSTAR(1;1) model, i.e.

$$(I - \Phi_0 B - \Phi_1 W B) Z_t = e_t$$

where $B$ is the backshift operator such that $B Z_t = Z_{t-1}$. The following is the iterative procedure for outlier detection in the GSTAR(1;1) model.
1.

4. If an AO or IO is identified in the previous step, recompute $\hat{\lambda}_{1,t}$ and $\hat{\lambda}_{2,t}$ based on the same initial estimates of the parameters, but using the modified residuals $\check{e}_t$ and $\check{\sigma}^2$.

5. Repeat the third and fourth steps until no further outlier candidates can be identified.

The flowchart in Fig. 1 illustrates the iterative outlier detection procedure in the GSTAR(1;1) model. The output of this procedure is the set of times at which outliers are detected. In general, space-time data can contain several outliers of different types. To address the main problem of this paper, we add the outlier factors $\omega_h \nu_h(B) I_t^{(T_h)}$, $h = 1, \ldots, k$, to the GSTAR(1;1) model, where $k$ is the number of outliers and $T_h$ is the time at which the $h$-th outlier is detected. In general, the GSTAR model with outlier factors (Eq. 5) is obtained by adding these factors to the GSTAR(1;1) model in Eq. (1). After obtaining the GSTAR model with outlier factors, the iterative outlier detection procedure is restarted based on that model. If no outliers are found, the iterative procedure stops. Otherwise, the estimation stage is repeated, with the newly identified outliers incorporated into the model (Eq. 5), until no more outliers can be found and all the outlier effects have been estimated simultaneously with the time series parameters. The model in Eq. (5) is called the GSTAR(1;1) model with outlier factors.
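The fit, detect, add factor, and refit cycle described above can be sketched as a loop. The sketch below is a simplified stand-in, not the procedure of this paper: it flags the largest standardized residual above an assumed threshold `C = 3` instead of computing the AO and IO test statistics, and it models every detected outlier as an additive pulse at one location. All function names, the simulated data, and the parameter values are hypothetical.

```python
import numpy as np

def fit_gstar11(Z, W, outliers):
    """Per-location least squares for GSTAR(1;1) plus pulse dummies.

    Z is an (n, N) array; outliers is a set of (time, location) pairs.
    Returns the (n - 1, N) residual matrix.
    """
    n, N = Z.shape
    lagged = Z[:-1]                 # rows are Z_{t-1}
    spatial = lagged @ W.T          # rows are W Z_{t-1}
    resid = np.empty((n - 1, N))
    for i in range(N):
        cols = [lagged[:, i], spatial[:, i]]
        for (t, j) in sorted(outliers):
            if j == i:
                pulse = np.zeros(n - 1)
                pulse[t - 1] = 1.0  # time t is row t - 1 after lagging
                cols.append(pulse)
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, Z[1:, i], rcond=None)
        resid[:, i] = Z[1:, i] - X @ beta
    return resid

def detect_outliers(Z, W, C=3.0, max_iter=20):
    """Iterate: fit, flag the largest standardized residual, refit."""
    outliers = set()
    for _ in range(max_iter):
        resid = fit_gstar11(Z, W, outliers)
        score = np.abs(resid) / resid.std(axis=0, ddof=1)
        r, i = np.unravel_index(np.argmax(score), score.shape)
        candidate = (int(r) + 1, int(i))
        if score[r, i] <= C or candidate in outliers:
            break                   # no further outlier candidates
        outliers.add(candidate)
    return sorted(outliers), resid

# Simulated GSTAR(1;1) data with one injected additive outlier.
rng = np.random.default_rng(0)
N, n = 3, 60
W = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
Z = np.zeros((n, N))
for t in range(1, n):
    Z[t] = 0.4 * Z[t - 1] + 0.2 * (W @ Z[t - 1]) + rng.normal(size=N)
Z[30, 1] += 50.0                    # outlier at time 30, location 1

detected, resid = detect_outliers(Z, W)
```

The loop stops as soon as no standardized residual exceeds the threshold, mirroring the stopping rule of the iterative procedure; the detected times and locations are returned together with the residuals of the final model.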

Descriptive Analysis
The data used in this paper are secondary data on dengue fever cases in West Kalimantan. This paper uses five locations, namely cities A, B, C, D, and E. The data were obtained from the Health Office of West Kalimantan Province and cover January 2015 to June 2018 (42 monthly observations) [14]. Fig. 2 shows the time series plot of the data in each location. The plots show the presence of outliers in the dengue fever cases. The red bullets are the outliers detected based on the boxplot. Descriptive statistics such as the mean and the first, second, and third quartiles are also shown in the boxplot. Visually, outliers appear to occur at several time points in each location. More outliers were detected from the beginning of 2018 until mid-2018, which makes this case interesting to analyze. One of the assumptions of space-time analysis is the correlation of events between locations. Table 1 shows the correlation of dengue fever cases between locations. Based on Table 1, B and E show a strong correlation, i.e., 0.76. In contrast, C and D show a weaker correlation than the other pairs of locations. These differences are attributed to the distances between locations. The higher the correlation, the stronger the relationship between the cases, and the farther apart two locations are, the smaller the correlation.

An Iterative Procedure for Outlier Detection in GSTAR(1;1) Model
The following is the procedure for outlier detection in the GSTAR(1;1) model.

1. Fit the GSTAR(1;1) model under the assumption that no outliers are present. The weight matrix used is the inverse distance weight matrix. This model assumes that the dengue fever cases in one location are affected by the distance or closeness to other locations.

5. Applied to the model in Eq. (1), the iterative outlier detection procedure correctly identifies the times at which outliers occurred. We have four models: the GSTAR(1;1) model without outlier factors and the first, second, and third models. The first model is the GSTAR(1;1) model with outlier factors estimated from the residuals of the GSTAR(1;1) model without outlier factors. The second model is obtained after one further iteration (outlier factors based on the residuals of the first model), and the third model after two further iterations (outlier factors based on the residuals of the second model). Table 2 shows the times identified by the iterative procedure, and the outlier parameters are shown in Table 3. These parameters are obtained by fitting the third model using least squares. The third model is chosen because no more outliers are detected from its residuals.
The Mean Square Error (MSE) is computed to select the best model. Adding outlier factors to the model increases its accuracy. The results are presented in Table 4. Therefore, the best model is the third model.
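The model comparison by MSE can be sketched in a few lines. The numbers below are hypothetical and are not the values of Table 4; they only illustrate how a model that accommodates an outlying observation obtains a smaller MSE.

```python
def mse(actual, fitted):
    """Mean Square Error between observations and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)

# Hypothetical observations with one outlying value (40).
actual = [3, 5, 40, 6]
fitted_without_factor = [4, 4, 10, 8]   # model that ignores the outlier
fitted_with_factor = [4, 4, 38, 7]      # model with an outlier factor

mse_without = mse(actual, fitted_without_factor)
mse_with = mse(actual, fitted_with_factor)
best = "with outlier factor" if mse_with < mse_without else "without outlier factor"
```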

Diagnostic Checking Model
Diagnostic checking examines whether the model assumptions are fulfilled. The basic assumption of the model is that the residuals are white noise [15]. This is checked by verifying that the residuals are normally distributed and independent across time lags. Visually, the residuals are independent if the ACF plot of the model residuals shows no correlation between time lags (see Fig. 3). The plots, connected by arrows from left to right, show the ACF of the GSTAR(1;1) model without outlier factors → the first model → the second model → the third model (the best model).
The normality of the residuals is tested with the Kolmogorov-Smirnov (KS) test at a significance level (α) of 0.05. The p-values of the KS test for cities A, B, C, D, and E are 1.253, 0.655, 0.517, 0.717, and 0.850, respectively. Since all of them exceed α, we conclude that the residuals are normally distributed. Fig. 4 plots the fitted values of the GSTAR(1;1) model with outlier factors against the observations. The plot shows that this model is able to reach the outlying observations. Indirectly, this helps reduce the model error in approximating the actual data without discarding the important information contained in the outliers.
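The two diagnostic checks can be sketched with the standard library only. The sketch below computes the sample ACF, with the usual approximate white-noise bound of $\pm 1.96/\sqrt{n}$, and the KS statistic against a normal distribution with given mean and standard deviation. The function names and the residual values are hypothetical, and the sketch returns the KS statistic only, not a p-value.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    denom = sum(v * v for v in xc)
    return [sum(xc[t] * xc[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def ks_statistic_normal(x, mu, sigma):
    """KS distance between the empirical CDF of x and N(mu, sigma^2)."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        cdf = 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical residuals for one location (illustrative values only).
resid = [0.3, -1.1, 0.8, -0.2, 1.5, -0.7, 0.1, -0.9, 0.6, -0.4]

acf = sample_acf(resid, 3)
bound = 1.96 / math.sqrt(len(resid))          # approximate 95% bound
lags_outside = [k for k in range(1, 4) if abs(acf[k]) > bound]
ks_d = ks_statistic_normal(resid, 0.0, 1.0)
```

An empty `lags_outside` list corresponds to an ACF plot with no significant spikes, and a small `ks_d` indicates that the empirical distribution of the residuals is close to the normal reference.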