CLUSTER ANALYSIS FOR DISTRICT/CITY GROUPING BASED ON VARIABLES AFFECTING POVERTY IN ACEH PROVINCE USING AVERAGE LINKAGE METHOD

ABSTRACT


INTRODUCTION
Poverty is a problem that still occurs in many countries, including Indonesia. It remains a multidimensional problem and is therefore a development priority. Poverty is a fundamental problem because it concerns the fulfillment of the most basic needs of life, and a global problem because it is faced by many countries [1]. According to BPS, poverty is the economic inability to meet basic food and non-food needs, as measured from the expenditure side. Poverty can therefore be understood as the inability of a person or household to meet basic needs in everyday life.
Aceh is one of the provinces in Indonesia that still faces the problem of poverty. According to BPS [2], the number of poor people in Aceh in September 2021 was recorded at 850.26 thousand (15.53%), an increase of about 16 thousand compared with March 2021, when the number was 834.24 thousand (15.33%). This means that during the period March 2021 to September 2021 the percentage of poor people in Aceh increased.
To curb the increase in the number of poor people, poverty alleviation efforts such as development programs are carried out. Two strategies must be pursued to reduce poverty: first, protecting poor families and groups by meeting their needs in various fields; second, providing training so that they have the ability to prevent new poverty [3]. Efforts to reduce poverty require information on the level of poverty in each district/city in Aceh Province, bearing in mind that geographical conditions differ between districts/cities, which causes the distribution of poverty to differ as well. A study is therefore needed that can group districts/cities with a similar poverty profile.
Cluster analysis is a multivariate analysis used to group objects into several clusters according to the similarity of the variables studied, so that objects in the same cluster are more similar to each other than to objects in different clusters [4]. The main objective of cluster analysis is to classify objects into relatively homogeneous groups based on the set of variables under study. In general, cluster analysis is divided into two methods, namely the hierarchical method and the nonhierarchical method [5]. In this study, cluster analysis was used to group the 23 districts/cities in Aceh Province based on indicators that affect poverty in 2021 and to examine the characteristics of poverty in each resulting cluster. Clustering was done using the average linkage method, which was chosen because it is considered to have better accuracy than other hierarchical methods [6].

Poverty
Poverty is a state of inability to meet basic needs in everyday life. This situation is caused by income that is too low to meet the necessities of life such as clothing, shelter and food. It can also have an adverse impact on other aspects of living standards such as education and health. Poverty is understood in different ways. The main understanding includes, first, the description of material shortages, which usually covers daily food needs, clothing, housing, and health services [7]. There are four forms of poverty, namely absolute poverty, relative poverty, cultural poverty and structural poverty [8].

Factor Analysis
Factor analysis is a technique for reducing a set of variables to a simpler form, using the relationships between the variables studied to summarize them into a number of factors. In principle, factor analysis is used for data reduction, namely the process of summarizing a number of variables into fewer ones, which are then called factors [9].

Sample Adequacy Test
To assess the adequacy of the sample, the Kaiser-Meyer-Olkin (KMO) test was carried out. The KMO value is needed to find out whether the data is representative of the existing population [10]:
Hypothesis:
H0 : The data is feasible to be analyzed
H1 : The data is not feasible to be analyzed
Test statistic:
$$KMO = \frac{\sum_{i}\sum_{j \neq i} r_{ij}^{2}}{\sum_{i}\sum_{j \neq i} r_{ij}^{2} + \sum_{i}\sum_{j \neq i} a_{ij}^{2}} \qquad (1)$$
Information:
$r_{ij}$ : Correlation between variables i and j
$a_{ij}$ : Partial correlation between variables i and j
The sample can be said to represent the existing population if the KMO value is > 0.5. KMO standard values can be seen in Table 1.
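The KMO statistic in Equation (1) can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure used in the study: it assumes NumPy is available, obtains the partial correlations from the inverse of the correlation matrix, and uses a hypothetical 3-variable correlation matrix `R`. The function name `kmo_from_corr` is an illustrative choice. The same sums taken per variable give the measure of sampling adequacy (MSA) of each variable.

```python
import numpy as np

def kmo_from_corr(corr):
    """Return (overall KMO, per-variable MSA) for a correlation matrix."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    # Partial correlations a_ij derived from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    # Keep only the off-diagonal terms (i != j), as in the double sums.
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.where(off_diag, corr ** 2, 0.0)
    a2 = np.where(off_diag, partial ** 2, 0.0)
    overall = r2.sum() / (r2.sum() + a2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + a2.sum(axis=0))
    return overall, per_variable

# Hypothetical correlation matrix, for illustration only (not the study data).
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
kmo, msa = kmo_from_corr(R)
print("KMO:", round(kmo, 3))
print("MSA per variable:", np.round(msa, 3))
```

A KMO value above 0.5 would then be read against the criterion stated above.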

Bartlett's test
The Bartlett test is used to see whether there is a relationship (correlation) between variables in the multivariate case [11]. The Bartlett test is carried out using the following equation [12]:
Hypothesis:
H0 : R = I (the correlation matrix is equal to the identity matrix)
H1 : R ≠ I (the correlation matrix is not equal to the identity matrix)
Test statistic:
$$\chi^{2} = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln|R|$$
Information:
|R| : The determinant of the correlation matrix
n : The number of observations
p : The number of variables
Reject H0 if the p-value ≤ α, where α is the significance level. Rejection means the variables are correlated with each other, so the data is feasible to analyze.
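As a rough sketch of how the Bartlett test could be computed outside SPSS, the Python code below evaluates the usual sphericity statistic from |R|, n and p and compares it with a chi-square distribution with p(p - 1)/2 degrees of freedom. It assumes NumPy and SciPy are available. The function name `bartlett_sphericity` and the correlation matrix `R` are hypothetical, and n = 23 is used only because it matches the number of districts/cities.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(corr, n):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    # slogdet is numerically safer than log(det(R)) for near-singular R.
    _, log_det = np.linalg.slogdet(corr)
    statistic = -((n - 1) - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    p_value = chi2.sf(statistic, df)
    return statistic, df, p_value

# Hypothetical correlation matrix, for illustration only (not the study data).
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
statistic, df, p_value = bartlett_sphericity(R, n=23)
print("chi-square:", round(statistic, 3), "df:", df, "p-value:", round(p_value, 4))
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")
```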

Cluster Analysis With The Average Linkage Method
Cluster analysis, or group analysis, is a data analysis technique that aims to classify individuals or objects into several groups whose characteristics differ from group to group, so that the individuals or objects within one group are relatively homogeneous [13]. Cluster analysis using the average linkage method is a frequently used hierarchical clustering method. In this method the distance between two clusters is measured by the average distance between an object in one cluster and an object in the other cluster [14].

$$d_{(UV)W} = \frac{\sum_{i}\sum_{k} d_{ik}}{N_{(UV)}\, N_{W}} \qquad (5)$$
Information:
$d_{ik}$ : the distance between the i-th object in cluster (UV) and the k-th object in cluster W
$N_{(UV)}$ : number of objects in cluster (UV)
$N_{W}$ : number of objects in cluster W
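Equation (5) can be illustrated with a short Python sketch that averages all pairwise distances between the members of two clusters. It assumes the object-to-object distances are already available as a square matrix. The function name `average_linkage_distance` and the small matrix `D` are hypothetical and chosen only for illustration.

```python
def average_linkage_distance(dist, cluster_uv, cluster_w):
    """Average of d_ik over all i in cluster (UV) and all k in cluster W.

    dist is a square matrix (list of lists) of object-to-object distances;
    cluster_uv and cluster_w are lists of object indices.
    """
    total = sum(dist[i][k] for i in cluster_uv for k in cluster_w)
    return total / (len(cluster_uv) * len(cluster_w))

# Hypothetical distances between three objects (not the study data).
D = [[0.0, 2.0, 3.0],
     [2.0, 0.0, 5.0],
     [3.0, 5.0, 0.0]]

# Objects 0 and 1 form cluster (UV); object 2 forms cluster W.
d_uv_w = average_linkage_distance(D, [0, 1], [2])
print(d_uv_w)  # (3.0 + 5.0) / (2 * 1) = 4.0
```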

Selection of Distance Measurement Methods
The similarity between two objects is indicated by the distance between them. The smaller the distance between two objects, the greater their similarity [15]. The Manhattan distance is used if the observed variables are correlated or not independent [16]. In this study the distance used is the Manhattan distance because the research variables are correlated.
The Manhattan distance can be formulated as follows:
$$d_{ij} = \sum_{k=1}^{p} |x_{ik} - x_{jk}|$$
Information:
$d_{ij}$ : the distance between object i and object j
$x_{ik}$ : the value of object i on the k-th variable
$x_{jk}$ : the value of object j on the k-th variable
p : the number of observed variables
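The Manhattan distance described above can be sketched in plain Python. The helper names `manhattan` and `distance_matrix` and the example rows are hypothetical. In practice each row would hold the values of one district/city on the observed variables.

```python
def manhattan(x, y):
    """Sum of absolute differences between two objects over all variables."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    return sum(abs(a - b) for a, b in zip(x, y))

def distance_matrix(rows):
    """Symmetric matrix of Manhattan distances between all pairs of objects."""
    return [[manhattan(a, b) for b in rows] for a in rows]

# Hypothetical objects with p = 3 variables each (not the study data).
rows = [[1.0, 2.0, 3.0],
        [4.0, 0.0, 3.0],
        [1.0, 2.0, 5.0]]
D = distance_matrix(rows)
for row in D:
    print(row)
```

In this toy example object 0 is closer to object 2 than to object 1, so objects 0 and 2 would be considered more similar.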

Cluster Assumptions
In cluster analysis, there are two assumptions that must be met. In this study, the KMO test and Bartlett test carried out on 8 variables showed that the variables are correlated. A factor analysis was therefore carried out to reduce the variables by examining the MSA value of each variable. The SPSS output shows that 3 variables have an MSA value of < 0.5, so these variables must be eliminated and the cluster assumption tests repeated.
The results of the KMO and Bartlett tests for variables X1, X2, X3, X5 and X8 are given in Table 2. Table 2 shows that the KMO value for variables X1, X2, X3, X5 and X8 is 0.875, which is greater than 0.5, meaning that H0 is accepted, that is, the data for the 23 districts/cities in Aceh Province is feasible for analysis. The Bartlett test gives a significance level of 0.000, which is less than 0.05, meaning that H0 is rejected, that is, there is a relationship (correlation) between the study variables. Because both assumptions are met, the next step of the analysis can be carried out, namely examining the MSA values after variables X4, X6 and X7 are eliminated.
The results of the MSA test for variables X1, X2, X3, X5 and X8 are given in Table 3. Table 3 shows that the MSA values of all variables are greater than 0.5, so 5 of the 8 variables are suitable for further analysis. Table 4 shows that there is 1 principal component with a characteristic root (eigenvalue) greater than 1, so 1 factor is formed.
Based on the results of the factor analysis above, the variables that can be used in the cluster analysis are X1 (households with a floor area of < 10 m²), X2 (households with a residential building floor made of soil/bamboo), X3 (households with a type of shelter made of bamboo/thatch/wood), X5 (households with a source of drinking water from unprotected wells/springs/rivers/rainwater), and X8 (households whose head did not attend school, did not finish elementary school, or completed only elementary school).
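The eigenvalue criterion applied above (retain components whose eigenvalue exceeds 1) can be sketched in Python as follows. This assumes NumPy and a hypothetical correlation matrix `R`. The function name `count_factors` is illustrative, and the sketch does not reproduce the SPSS output of the study.

```python
import numpy as np

def count_factors(corr, threshold=1.0):
    """Return eigenvalues (descending), number above threshold, variance share."""
    corr = np.asarray(corr, dtype=float)
    # eigvalsh suits symmetric matrices and returns ascending eigenvalues.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    n_factors = int(np.sum(eigenvalues > threshold))
    explained = eigenvalues / eigenvalues.sum()
    return eigenvalues, n_factors, explained

# Hypothetical correlation matrix, for illustration only (not the study data).
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
eigenvalues, n_factors, explained = count_factors(R)
print("eigenvalues:", np.round(eigenvalues, 3))
print("factors retained:", n_factors)
print("share of variance:", np.round(explained, 3))
```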

Selection of Distance Measurement Methods
In this study the distance used is the Manhattan distance because the research variables are correlated. The resulting Manhattan distances between the objects are as follows:

Table 5. Manhattan Distance Matrix
The following is an example of a calculation using the Manhattan distance formula. The distance between Simeulue District and Aceh Singkil District (objects 1 and 2) is 1.417, while the distance between Simeulue District and South Aceh District (objects 1 and 3) is 1.977. This example shows that Simeulue has characteristics that are more similar to Aceh Singkil than to South Aceh, because the distance between Simeulue and Aceh Singkil (1.417) is smaller than the distance between Simeulue and South Aceh (1.977).

Grouping with the Average Linkage Method
The average linkage method groups the 23 districts/cities in Aceh Province based on the average distance between all members of one cluster and all members of another cluster. Grouping begins by finding the shortest distance between two objects in the Manhattan distance matrix obtained earlier. The clustering results of the average linkage method are shown as a dendrogram in Figure 1. The dendrogram is read from left to right: the vertical lines indicate the clusters that are merged, while the position on the scale shows the distance at which the clusters are combined.
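The grouping procedure described above could be reproduced outside SPSS with SciPy's hierarchical clustering routines. This is a minimal sketch under stated assumptions: the six two-variable objects in `X` are hypothetical, the Manhattan distance is requested through SciPy's `cityblock` metric, and the tree is cut into 3 clusters to mirror the number of clusters reported in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical standardized data: 6 objects, 2 variables (not the study data).
X = np.array([[0.0, 0.1],
              [0.2, 0.0],
              [0.1, 0.2],
              [5.0, 5.1],
              [5.2, 4.9],
              [10.0, 0.0]])

# Condensed matrix of pairwise Manhattan (city block) distances.
distances = pdist(X, metric="cityblock")

# Agglomerative clustering: cluster distance = average pairwise distance.
Z = linkage(distances, method="average")

# Cut the tree so that at most 3 clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Each row of `Z` records one merge step (the two clusters joined, the distance at which they were joined, and the size of the new cluster), which is the information a dendrogram displays.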

Determine The Number Of Clusters And Their Members
Details of the number of clusters and their members can be seen in the SPSS cluster membership output for the average linkage method. From Table 6 it can be concluded that the members of each cluster are:

Cluster Interpretation
This stage identifies the specific characteristics of each cluster formed. The characteristics of a cluster are determined from the centroid (average) value of each variable in that cluster. The following is a table of average values in each cluster:
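The centroid-based interpretation described above can be sketched in plain Python: average each variable within a cluster, then read off the variable with the highest and the lowest mean. The helper names, the example rows and the cluster labels are hypothetical, and only the variable names X1, X2, X3, X5 and X8 are taken from the study.

```python
from collections import defaultdict

def cluster_centroids(rows, labels):
    """Mean of every variable within each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in groups.items()}

def highest_and_lowest(centroid, names):
    """Names of the variables with the highest and lowest centroid value."""
    order = sorted(range(len(centroid)), key=lambda k: centroid[k])
    return names[order[-1]], names[order[0]]

names = ["X1", "X2", "X3", "X5", "X8"]
# Hypothetical values and cluster labels (not the study data).
rows = [[1.0, 3.0, 9.0, 4.0, 5.0],
        [3.0, 3.0, 7.0, 4.0, 5.0],
        [6.0, 5.0, 8.0, 1.0, 4.0]]
labels = [1, 1, 2]

centroids = cluster_centroids(rows, labels)
for label, centroid in sorted(centroids.items()):
    high, low = highest_and_lowest(centroid, names)
    print(f"Cluster {label}: centroid={centroid}, highest={high}, lowest={low}")
```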

CONCLUSIONS
Based on the results of the data analysis, two conclusions are obtained. First, cluster analysis using the average linkage method formed 3 clusters from the 23 districts/cities in Aceh Province. Cluster 1, with the lowest poverty rate, consists of 17 districts/cities. Cluster 2, with the highest poverty rate, consists of 2 districts/cities. Cluster 3, with a moderate poverty rate, consists of 4 districts/cities. Second, the clusters are characterized by the variables that are dominant and non-dominant in affecting the poverty rate. In clusters 1, 2 and 3 the poverty rate is dominantly influenced by variable X3, which means that there are still many households that have houses with inadequate wall types. In clusters 1 and 3 the poverty rate is not dominantly influenced by variable X1, which means that many households already have a house with a proper floor type. In cluster 2 the poverty rate is not dominantly influenced by variable X5, which means that many households consume drinking water from cleaner and more protected sources.

Table 1. Characteristics of KMO Values

Table 7. Cluster Centroid (Mean) Value
Cluster 1 has a high value on variable X3 and the lowest value on variable X1.
Cluster 2 has a high value on variable X3 and the lowest value on variable X5.
Cluster 3 has a high value on variable X3 and the lowest value on variable X1.