Architecture of Text Mining Application in Analyzing Public Sentiments of West Java Governor Election using Naive Bayes Classification

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The West Java governor election is an event that has captured public attention, and social media users are no exception. Public opinion about a prospective regional leader can help predict a candidate's electability and voters' tendencies. Data for the opinion mining process can be obtained from Twitter. Because this data is highly varied in form and largely unstructured, it must be managed and transformed into semi-structured data using data preprocessing techniques. The semi-structured data then passes through a classification stage that categorizes each opinion as negative or positive. The research methodology is a literature study that examines previous research on similar topics. The purpose of this study is to find the right architecture for developing a Twitter opinion mining application that reveals public sentiment toward the West Java governor election. The study finds that Twitter opinion mining is a form of text mining, and that tweets must pass through a text preprocessing stage before they can be classified. The preprocessing steps required for Twitter data are cleansing, case folding, POS tagging, and stemming. The resulting text mining architecture can also be used for text mining research on other topics.
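The preprocessing steps listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regular expressions used for cleansing, the helper names (`cleanse`, `case_fold`, `stem`, `preprocess`) and the short suffix list are all hypothetical. POS tagging is omitted because it requires a trained Indonesian tagger, and the toy stemmer only strips a few particle and possessive suffixes; a real pipeline would use a full Indonesian stemmer.

```python
import re

# Hypothetical, heavily simplified suffix list (particles and possessives).
# A real system would use a complete Indonesian stemming algorithm instead.
SUFFIXES = ("nya", "lah", "kah", "pun", "ku", "mu")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"\bRT\b")
NON_ALPHA_RE = re.compile(r"[^a-zA-Z\s]")


def cleanse(tweet: str) -> str:
    """Cleansing: drop URLs, @mentions, the 'RT' marker, digits and punctuation."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = RETWEET_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # removes '#' but keeps the hashtag word
    return " ".join(text.split())       # collapse repeated whitespace


def case_fold(text: str) -> str:
    """Case folding: map every character to lowercase."""
    return text.lower()


def stem(token: str) -> str:
    """Toy stemmer: strip one known suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> list[str]:
    """Cleansing -> case folding -> (POS tagging omitted) -> stemming."""
    return [stem(tok) for tok in case_fold(cleanse(tweet)).split()]
```

Applied to the hypothetical tweet `RT @user: Dukung #RidwanKamil jadi gubernur! Programnya bagus https://t.co/abc`, `preprocess` returns `['dukung', 'ridwankamil', 'jadi', 'gubernur', 'program', 'bagus']`. Removing only the `#` character keeps the hashtag text, which often names a candidate and is therefore informative for sentiment.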


💡 Research Summary

The paper presents a complete architecture for mining Twitter data to analyze public sentiment surrounding the 2018 West Java governor election in Indonesia. The authors begin with a literature review to identify suitable techniques for opinion mining, then define the problem of converting unstructured tweets into semi-structured data that can be classified as positive, negative, or neutral. Data collection is performed via the Twitter API using election-related hashtags such as #pilgubJabar, #ridwankamil, and others; only Indonesian-language tweets are retained after language filtering.

The processing pipeline consists of four main stages:
1. Preprocessing: cleansing (removing URLs, mentions, and noise), case folding, part-of-speech (POS) tagging, and stemming to reduce lexical variation.
2. Feature selection: primarily unigram tokenization, with each token stored in a vector.
3. Weighting and model training: the Naïve Bayes classifier is trained on the token frequencies, assuming feature independence.
4. Evaluation: the system's automatic classifications are compared against manually labeled data to compute accuracy.

The authors argue that this modular architecture (preprocessing, feature engineering, classification, evaluation) is reusable for other text-mining tasks and offers a faster, lower-cost alternative to traditional survey-based opinion polling. Limitations noted include the lack of a dedicated sentiment lexicon for Indonesian, ambiguous handling of a neutral class, and insufficient treatment of non-standard language forms common on Twitter. The paper suggests future work involving deep-learning classifiers, real-time streaming pipelines, and multilingual extensions to improve performance and applicability.
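The feature-selection, training and evaluation stages can be illustrated with a small multinomial Naive Bayes classifier over unigram counts. This is a sketch, not the authors' code: the add-one (Laplace) smoothing, the names `NaiveBayes` and `accuracy`, and the four training tweets are assumptions chosen only for illustration. The toy data are invented and say nothing about the paper's results, and the tokens are assumed to be already preprocessed.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)     # class -> unigram counts
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        self.total_docs = len(labels)
        return self

    def log_score(self, tokens: list[str], label: str) -> float:
        """Unnormalized log posterior: log P(c) + sum of log P(w | c)."""
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total_docs)
        for tok in tokens:
            if tok in self.vocab:  # skip words never seen during training
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, tokens: list[str]) -> str:
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Share of tweets whose automatic label matches the manual label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy data: preprocessed tweets with manual labels.
train_docs = [
    ["dukung", "program", "bagus"],
    ["program", "bagus", "jabar"],
    ["kecewa", "janji", "bohong"],
    ["janji", "kosong", "kecewa"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
test_docs = [["program", "bagus"], ["janji", "bohong"]]
gold = ["positive", "negative"]
preds = [model.predict(doc) for doc in test_docs]
print(preds, accuracy(preds, gold))  # ['positive', 'negative'] 1.0
```

Scores are summed in log space so that long tweets do not underflow to zero, and the smoothing keeps a single word that never co-occurred with a class from ruling that class out entirely.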
