SENTIMENT ANALYSIS OF OMNIBUS LAW USING SUPPORT VECTOR MACHINE (SVM) WITH LINEAR KERNEL

ABSTRACT


INTRODUCTION
In 2020, the Indonesian government prepared the Job Creation Bill (RUU) using the omnibus law concept to build the Indonesian economy and attract investors [1]. The bill consists of 11 clusters, including three laws combined into one: Law Number 13 of 2003 concerning employment, Law Number 40 of 2004 concerning the social security system, and Law Number 24 of 2011 concerning social security administering bodies. The government aims to harmonize these laws so that investors face streamlined regulations without overlapping rules or the losses they cause [2].
However, the Job Creation Law faced public opposition, with many critics arguing that it reduces severance pay for workers terminated by their company and removes maternity leave [3]. From 2021 to 2022, tens of thousands of workers joined mass labor demonstrations demanding the repeal of the law [4]. The essence of the problem lies in Article 89 paragraph 45 of the Job Creation Bill, which revises the provisions in Article 156 of Law Number 13 of 2003 concerning Manpower [5], [6].
A recent case in Wadas village, Purworejo district, Central Java, concerns the community's rejection of a national strategic mining project, which is also an impact of the Job Creation Law. The most recent case occurred on Labor Day 2022, which coincided with Eid al-Fitr 1443 H, when workers also took action regarding the omnibus law [7], [8], [9].
To weigh the pros and cons of the Job Creation Law, research is needed to gather public opinion on its impact. One method of obtaining information on public opinion is text mining, which uses sentiment analysis to detect complaints, perceptions of products or policies, and perceptions of certain brands [10], [11].
Sentiment analysis, also known as opinion mining, is an automated process of comprehending, extracting, and processing textual data in order to determine the sentiment expressed in a sentence, specifically whether it conveys a positive or negative opinion [12], [13]. This research uses the Support Vector Machine (SVM) algorithm. SVM was selected for its demonstrated accuracy in classification tasks, its ability to identify optimal hyperplanes as separators, its efficient learning process, and its effectiveness as a text classification method. SVMs are particularly well suited to large feature spaces and exhibit strong generalisation capabilities [14], [15].
Based on this context, the researcher employs the Support Vector Machine (SVM) technique to ascertain public responses and opinions regarding the Job Creation Law in Indonesia via the Twitter social network. The objective of this study is to achieve an accurate and effective sentiment classification, thereby offering insights that can serve as a basis for governmental recommendations aimed at addressing the issues arising from the implementation of the Job Creation Law in Indonesia.

Support Vector Machine (SVM)
Support Vector Machine (SVM) is a widely used and influential method for solving classification problems. It was first introduced in 1992 at the Workshop on Computational Learning Theory and is based on computational theories such as hyperplane margins. SVM is a linear classifier-based classification method whose training problem is solved through quadratic programming, using the Lagrangian (dual) form of the problem. It is considered an alternative to standard SVM due to its efficiency in processing large-scale data [15].
The SVM algorithm discriminates between classes using the concept of margins. The margin is the distance between the hyperplane and the closest data points of each class, while the hyperplane is the separator between the classes [16].
In Figure 1, the hyperplane with the best margin provides useful generalizations for better classification results. Figure 1 illustrates the two conditions of the hyperplane, as well as the corresponding margin distances. Both hyperplanes separate the data groups correctly. However, classification is more reliable with the hyperplane that has the wider margin than with the hyperplane that has the smaller margin. Typically, the SVM algorithm seeks the hyperplane that maximizes the margin distance, known as the Maximum Marginal Hyperplane (MMH) [18].

SVM is a classification method that partitions data into two distinct classes. The formula is derived using the notation $x_i$ for the i-th data point and $y_i \in \{-1, +1\}$ for its class [19]:

\[ w \cdot x_i + b \ge +1 \ \text{ for } y_i = +1, \qquad w \cdot x_i + b \le -1 \ \text{ for } y_i = -1 \]

Here $w$ is the normal to the plane and $b$ determines the position of the plane relative to the coordinate centre. The distance of the boundary plane from the centre point is $|b| / \|w\|$. The two inequalities above combine into the single condition below, which is used to identify the optimal plane with the largest margin for linear classification in primal space [20]:

\[ y_i (w \cdot x_i + b) \ge 1 \]

The objective function $\frac{1}{2}\|w\|^2$ is to be minimised subject to the constraint $y_i (w \cdot x_i + b) \ge 1$.
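The relationship between the constraint and the margin can be illustrated with a minimal Python sketch. The toy points and the two candidate hyperplanes are hypothetical and are not taken from the study; the sketch only checks the constraint $y_i (w \cdot x_i + b) \ge 1$ and evaluates the margin width $2 / \|w\|$, it does not train an SVM.

```python
import math

# Hypothetical 2-D toy data: (point, class label), labels are +1 or -1.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]

def decision(w, b, x):
    """Value of w . x + b for a single point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def is_feasible(w, b, data):
    """True when every point satisfies y_i (w . x_i + b) >= 1."""
    return all(y * decision(w, b, x) >= 1 - 1e-12 for x, y in data)

def margin_width(w):
    """Distance between the planes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Two candidate hyperplanes (assumed for illustration); both separate the data.
narrow = ((2.0, 0.0), -3.0)   # plane x1 = 1.5, larger ||w||, smaller margin
wide = ((0.5, 0.5), -1.0)     # plane x1 + x2 = 2, smaller ||w||, larger margin

for name, (w, b) in (("narrow", narrow), ("wide", wide)):
    print(name, is_feasible(w, b, points), round(margin_width(w), 4))
```

Both candidates satisfy the constraint, but the second has the smaller $\|w\|$ and therefore the wider margin, which is the hyperplane the minimisation of $\frac{1}{2}\|w\|^2$ prefers.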

Evaluation of Accuracy Classification
The confusion matrix is a widely employed technique for evaluating the accuracy of data mining algorithms. It tabulates the counts of correctly and incorrectly classified test data for each class [21]. The confusion matrix is shown in Table 1. A false positive is commonly referred to as a type 1 error, which arises when a negative case is incorrectly classified as positive. A false negative is known as a type 2 error, which occurs when a positive case is mistakenly classified as negative [22]. The accuracy on the minority class can be represented by the true positive rate or recall, also known as sensitivity. Under class imbalance, predictors can be evaluated more comprehensively with metrics such as G-mean and AUC. Machine learning performance on imbalanced data can be assessed with metrics such as specificity (true negative rate), sensitivity (true positive rate), APER, total accuracy rate (1 − APER), G-mean, precision, and F-measure, all of which are computed from the entries of Table 1. The formulas are presented below [23], [24].
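\[ \text{True Positive Rate (Acc}^{+}\text{), recall or sensitivity} = \frac{TP}{TP + FN} \]

\[ \text{True Negative Rate (Acc}^{-}\text{) or specificity} = \frac{TN}{TN + FP} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{G-mean} = \sqrt{\text{Acc}^{+} \times \text{Acc}^{-}} \]

\[ \text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

\[ \text{False Positive Rate} = \frac{FP}{FP + TN} \tag{11} \]

\[ \text{Area Under the ROC Curve} = \frac{1 + \text{True Positive Rate} - \text{False Positive Rate}}{2} \]

\[ \text{Total accuracy rate} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13} \]

AUC summarises the overall performance of a classifier and allows it to be compared with other classifiers.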
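The metrics above can be computed directly from the four confusion matrix counts. The following Python sketch is an illustration only: the function name and the example counts are hypothetical, and the AUC line uses the single-threshold form (1 + TPR − FPR) / 2, which is an assumption about the formula intended here.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Compute evaluation metrics from confusion matrix counts.

    Assumes every denominator is non-zero (each class occurs at least once
    and at least one positive prediction is made).
    """
    recall = tp / (tp + fn)              # true positive rate, sensitivity
    specificity = tn / (tn + fp)         # true negative rate
    precision = tp / (tp + fp)
    fpr = fp / (fp + tn)                 # false positive rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": fpr,
        "accuracy": accuracy,
        "aper": 1 - accuracy,            # apparent error rate
        "g_mean": math.sqrt(recall * specificity),
        "f_measure": 2 * precision * recall / (precision + recall),
        "auc": (1 + recall - fpr) / 2,   # single-threshold approximation
    }

# Hypothetical counts, not the study's results.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Keeping all metrics in one function makes it easy to see that each of them is a different ratio of the same four counts.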

Data and Research Stages
The researchers collected a total of 3064 tweets, comprising tweets and retweets, that were relevant to the discourse surrounding the advantages and disadvantages of the Job Creation Law. These data were obtained using keywords such as "omnibus law" and "cipta kerja". The Twitter data were collected by data scraping and cover the period between May 2022 and May 2023. The labour demonstration on May Day 2023 garnered considerable attention on Twitter, with one of the prominent demands being a protest against the omnibus law [25]. Figure 2 depicts the sequential stages of the data analysis in this study: establishing the case study, gathering the relevant Twitter data, saving the data in CSV format, preprocessing the data, and finally performing sentiment labelling and classification using the SVM method. The model is trained on labelled data and then assessed by computing the accuracy of its predictions on a separate set of test data.

Wordcloud
According to [26], a wordcloud represents the individual words of a text, emphasising how often each word occurs. Wordclouds help readers and researchers by offering a rapid summary of the content of a text. According to [27], wordclouds have gained significant popularity in text mining because they provide a simple and intuitive means of visualising word frequency in a way that is both engaging and informative. The wordcloud visualisation is depicted in Figure 3. In Figure 3, the size of each word depends on its frequency in the data: words that are used more frequently are drawn larger, while words that are used less frequently are drawn smaller.
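The link between word frequency and displayed size can be sketched in a few lines of Python. The sample tweets, the function names, and the linear scaling between a minimum and a maximum font size are assumptions made for illustration; the study does not state which wordcloud tool or scaling rule was used.

```python
from collections import Counter

# Hypothetical sample texts, already lower-cased and cleaned.
tweets = [
    "omnibus law cipta kerja",
    "demo buruh tolak omnibus law",
    "upah buruh murah",
]

def word_frequencies(texts):
    """Count how often each whitespace-separated word occurs."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def font_sizes(counts, min_size=10, max_size=40):
    """Map each word's count linearly onto the range [min_size, max_size]."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in counts.items()
    }

freqs = word_frequencies(tweets)
sizes = font_sizes(freqs)
for word, count in freqs.most_common(3):
    print(word, count, sizes[word])
```

In this toy input the words that occur twice receive the maximum size and the words that occur once receive the minimum size.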

Overview of Scraping Data
Table 2 illustrates the outcome of the scraping procedure. The dataset is stored in CSV format and consists of several variables, including time, user, and tweet. The scraped data require a preprocessing phase to clean the tweet text and make it suitable for the subsequent stages.

Preprocessing Data
Twitter data is unstructured text with noise that hinders classification, so the data must be prepared before it is used. Preprocessing includes case folding, tokenization, cleaning, and filtering of the textual data, which organises the data and makes classification easier. For sentiment labelling, the researcher combines a lexicon-based approach with a machine learning approach: the lexicon assigns the sentiment labels, whereas machine learning is used to build and evaluate the model. Using RStudio, each tweet is classified as positive, negative, or neutral. The sentiment score is calculated by counting the positive and negative words in a tweet and subtracting the number of negative words from the number of positive words. Table 3 shows the tweet sentiment labelling: a total of 1067 tweets were labelled positive, 971 negative, and 1026 neutral. The subsequent analysis considers only the positive and negative tweets about the Omnibus Law, with the intention of using them to improve future legislation. Table 4 displays the scored and sentiment-labelled tweets.
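The preprocessing steps and the lexicon score described above can be sketched as follows. This is an illustration in Python rather than the RStudio workflow used in the study, and the word lists, stopword list, and sample tweet are hypothetical; a real analysis would use a full Indonesian sentiment lexicon.

```python
import re

# Hypothetical mini-lexicons and stopword list, for illustration only.
POSITIVE = {"bagus", "dukung", "untung"}
NEGATIVE = {"tolak", "rugi", "murah"}
STOPWORDS = {"rt", "yang", "dan", "di", "ini"}

def preprocess(tweet):
    """Case folding, cleaning, tokenization, and filtering of one tweet."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # cleaning: remove URLs
    text = re.sub(r"@\w+", " ", text)           # cleaning: remove mentions
    text = re.sub(r"[^a-z\s]", " ", text)       # cleaning: digits, punctuation
    tokens = text.split()                       # tokenization
    return [t for t in tokens if t not in STOPWORDS]  # filtering

def sentiment_label(tokens):
    """Score = positive word count minus negative word count."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return score, "positive"
    if score < 0:
        return score, "negative"
    return score, "neutral"

sample = "RT @serikat: Buruh TOLAK omnibus law yang rugi! https://t.co/abc"
tokens = preprocess(sample)
print(tokens, sentiment_label(tokens))
```

A tweet with no lexicon words, or with equal numbers of positive and negative words, scores zero and is labelled neutral, which is why neutral tweets form a separate group in the labelling results.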

SVM Classification
The classification process requires separate training and test data sets. The training data are used to fit the model, whereas the test data are used to evaluate the accuracy of its predictions. The researchers partitioned the dataset of 2038 instances into 80% training data and 20% test data, giving 1630 training instances and 408 test instances. The confusion matrix is computed from the outcomes on the test data, as illustrated in Figure 4. Table 5 shows the accuracy of the linear kernel. The performance of the four SVM kernels, when applied to the entire dataset, shows that each kernel achieves its highest accuracy at C = 1 compared with the other C values tested. The comparison of the four kernels is presented in Table 5. According to Table 5, the linear kernel achieves the highest accuracy among the kernels considered: 97.06% with the regularization parameter C set to 1.
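The 80/20 partition can be reproduced with a short Python sketch. The shuffling, the fixed seed, and the function name are assumptions made for illustration; the study does not state how the instances were assigned to the two sets.

```python
import random

def train_test_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for a repeatable split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 2038 positive and negative tweets (indices only).
labelled = list(range(2038))
train, test = train_test_split(labelled)
print(len(train), len(test))
```

With 2038 instances, 80% corresponds to 1630.4, which is truncated to 1630 training instances and leaves 408 test instances, matching the counts reported above.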

Wordcloud
A wordcloud provides a visual overview of the words that appear most frequently in a set of tweets. The wordclouds were generated from the outcomes of the classification conducted with the Support Vector Machine (SVM) algorithm. Figure 5 (a) illustrates the positive classification, featuring the frequently used keywords "cipta kerja" and "omnibus law". The wordcloud also shows that the terms "UU (Undang-Undang)" and "implikasi UU" consistently appear in positive tweets discussing the contentious issue of the omnibus law in Indonesia; both terms are drawn in a larger font size, reflecting how frequently they occur. Figure 5 (b) shows that the terms "demo", "buruh", "upah", and "murah" have the highest frequency of all words in the negative classification. This suggests that, in the view expressed in these tweets, labourers' wages remain comparatively low despite the implementation of the omnibus law.

CONCLUSIONS
Based on the preceding discussion, the researchers draw the following conclusions. The sentiment analysis classified 1067 tweets as expressing positive sentiment towards the Omnibus law, 971 tweets as expressing negative sentiment, and 1026 tweets as neutral. The findings indicate that positive sentiment is primarily expressed by business owners who perceive the omnibus law as advantageous and manageable. Conversely, negative sentiment is predominantly voiced by workers who believe that labour wages remain relatively low despite the implementation of the omnibus law. In the Support Vector Machine (SVM) analysis, the linear kernel with C = 1 yielded the highest accuracy, 97.06%. In addition, the area under the curve (AUC) was 0.97, indicating a high level of accuracy for the SVM linear kernel. [17].

Figure 1. SVM Hyperplane with (a) Small Margin, and (b) Large Margin

Figure 2. Research Stages
Figure 2 illustrates the sequential progression of the research process, encompassing multiple distinct stages.

Figure 3. Wordcloud visualisation, in which the size of each word depends on its frequency in the data.

Figure 4. Confusion Matrix
Figure 4 shows that the model correctly classified 194 positive-sentiment tweets in the test data and misclassified four positive-sentiment tweets as negative. It correctly classified 202 negative-sentiment tweets and misclassified 8 negative-sentiment tweets as positive. The researcher measured recall, precision, accuracy, and specificity. Applying the formulas above yields 96% recall, 97% accuracy, and 98% precision, all of which are high. Specificity, false positive rate, and area under the curve also score well; the AUC of 0.97 indicates excellent classification performance. These results indicate that the SVM algorithm classifies the test data with high accuracy. Table 5 shows the accuracy of the linear kernel.
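As a check on the reported accuracy, the total accuracy rate can be worked out from the four counts given for Figure 4 (194 and 202 correct classifications, 4 and 8 misclassifications, 408 test tweets in total). This is only a worked restatement of the formula, not an additional result:

```latex
\[
\text{Total accuracy rate}
  = \frac{194 + 202}{194 + 202 + 4 + 8}
  = \frac{396}{408}
  \approx 0.9706
\]
```

The two misclassification counts enter the denominator symmetrically, so this value does not depend on which of them is read as FP and which as FN. Recall and precision do depend on that reading: the two ratios are 194/198 ≈ 0.98 and 194/202 ≈ 0.96, in one order or the other.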

Figure 5 displays the wordcloud representing the positive and negative classification outcomes in this research work.

Figure 5. Wordcloud for (a) Positive Classification, and (b) Negative Classification

Table 1. Confusion Matrix
TP (True Positive): the number of positive observations correctly predicted as positive.
TN (True Negative): the number of negative observations correctly predicted as negative.
FP (False Positive): the number of negative observations wrongly predicted as positive.
FN (False Negative): the number of positive observations wrongly predicted as negative.