Jurnal Ilmu Komputer dan Informasi
Multilabel Hate Speech Classification in Indonesian Political Discourse on X using Combined Deep Learning Models with Considering Sentence Length
Hate speech, the public expression of hatred or offensive discourse targeting race, religion, gender, or sexual orientation, is widespread on social media. This study assesses BERT-based models for multi-label hate speech detection, emphasizing how text length affects model performance. Models tested include BERT, BERT-CNN, BERT-LSTM, BERT-BiLSTM, and BERT with two LSTM layers. Overall, BERT-BiLSTM achieved the highest score (82.00%) and the best performance on longer texts (83.20%), highlighting its ability to capture nuanced context. BERT-CNN excelled on shorter texts, achieving the highest score there (79.80%) together with a second metric of 79.10%, indicating its effectiveness at extracting features from brief content. BERT-LSTM showed balanced results across text lengths, while BERT-BiLSTM, although strong on one metric, scored slightly lower on another for short texts because of its reliance on broader context. These results highlight the importance of selecting a model based on text characteristics: BERT-BiLSTM is ideal for nuanced analysis of longer texts, while BERT-CNN better captures key features in shorter content.
The Optimizing Data Quality in Interagency Data Sharing: A Framework
In the modern landscape of government operations, characterized by a shift towards openness, inclusivity, and interagency collaboration driven by the pursuit of public value and evidence-based policy making, the importance of interagency data sharing (IDS) is unmistakable. Despite the evident benefits of information exchange among government agencies, challenges persist, especially concerning nuanced considerations of data quality. This study aims to bridge this critical gap by proposing a specialized framework for IDS within government agencies. This framework, crafted to proactively address data quality considerations throughout the entire lifecycle, transcends traditional approaches and seeks to offer insights for fostering effective practices in interagency data sharing. Positioned at the nexus of evolving government operations, the research underscores the necessity for strategic frameworks prioritizing data quality to support collaborative and effective evidence-driven decision-making
Preprocessing Impact on SAR Oil Spill Image Segmentation Using YOLOv8
Synthetic Aperture Radar (SAR) is a sensor used in marine remote sensing that emits radio waves to capture a representation of the target scene. SAR images often have poor quality, in part because of speckle noise. This research detects oil spills in SAR images using the YOLOv8 model. The dataset was obtained from MKLab and preprocessed to improve image quality before training. Preprocessing involves annotating the dataset, augmenting it with flip augmentation, and filtering it using threshold and median filters in addition to a sharpen kernel whose midpoint value is tuned to find the optimum. The default YOLOv8 hyperparameter values are used, together with each value increased and decreased by the same delta.
The implementation of preprocessing and combinations of hyperparameters is examined to optimize the YOLOv8 model for detecting oil spills in SAR images. Across 10 experimental scenarios, the baseline with the original MKLab images gives an mAP50 of 49.7%. Applying flip augmentation alone to the dataset increases the mAP50 value by 18.8%. Adding the sharpen kernel filter with midpoint 1.2 raises the mAP50 value to 68.89%, while the median and thresholding filters tend to reduce it. The best-performing combination was preprocessing with flip augmentation and the sharpen 1.2 kernel filter, with hyperparameters: epoch 200, warmup 4.0, momentum 0.9, warmup bias lr 0.01, weight decay 0.005, and learning rate 0.000714, resulting in an mAP50 value of 68.89%. In addition, the sharpening kernel with a real-number midpoint of 1.2, combined with flip augmentation, had the greatest impact on increasing the mAP50 value in SAR oil spill image segmentation with YOLOv8.
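As an illustration of the flip augmentation step, the following is a minimal sketch, not the authors' pipeline. It assumes a horizontal flip, an image stored as a list of pixel rows, and a segmentation label given as a class id plus polygon points normalized to [0, 1]; the function names `hflip_image` and `hflip_polygon` are hypothetical.

```python
# Hypothetical helpers: horizontal flip of an image and its polygon label.
# Assumption: polygon points are (x, y) pairs normalized to the range [0, 1].

def hflip_image(image):
    """Mirror every pixel row left-to-right."""
    return [row[::-1] for row in image]


def hflip_polygon(label):
    """Mirror a (class_id, points) label: x becomes 1 - x, y is unchanged."""
    class_id, points = label
    return (class_id, [(1.0 - x, y) for (x, y) in points])


image = [[0, 1, 2],
         [3, 4, 5]]
label = (0, [(0.25, 0.5), (0.5, 0.25)])

flipped_image = hflip_image(image)
flipped_label = hflip_polygon(label)
```

The label has to be flipped together with the image, otherwise the annotated oil spill outline no longer matches the pixels it describes.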
Context-Aware Detection of Deceptive Design Patterns in E-Commerce Websites Using Word Embedding Based Deep Learning Paradigms
Deceptive designs (DDs) are hidden technological tactics that manipulate users' consumer behavior, without their knowledge, in ways that benefit website vendors. Proper identification of deceptive designs is essential to prevent users from being misled. To meet this need, this study assesses deep learning models based on Word2Vec word embeddings for text-based deceptive design detection. The models trained are Convolutional Neural Networks (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and a hybrid model (CNN + BiLSTM) that combines the two. Four metrics (accuracy, precision, sensitivity, and F1-score) are computed to compare the performance of each model. Compared with existing DD detection techniques, all three approaches attain state-of-the-art performance. The evaluation shows that the hybrid model achieves the highest accuracy, 95%, in capturing the nuanced textual context of deceptive designs, and that it also performs best on the other metrics. To protect the independence and security of user activities, intelligent deep learning paradigms are integrated to identify hidden deceptive activities automatically. This allows for the accurate detection and classification of deceptive designs in intricate e-commerce environments.
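The four metrics named in this abstract can be computed from confusion-matrix counts. The sketch below uses the standard binary definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
# Standard binary-classification metrics from confusion-matrix counts.
# The counts used below are made-up illustration values.

def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, sensitivity (recall), and F1-score."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```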
Efficient Design and Compression of CNN Models for Rapid Character Recognition
Convolutional Neural Networks (CNNs) are extensively utilized for image processing and recognition tasks; however, they often encounter challenges related to large model sizes and prolonged training times. These limitations present difficulties in resource-constrained environments that require rapid model deployment and efficient computation. This study introduces a systematic approach to designing lightweight CNN models specifically for character recognition, emphasizing the reduction of model complexity, training duration, and computational costs without sacrificing performance. Techniques such as hyperparameter tuning, model pruning, and post-training quantization (PTQ) are employed to decrease model size and enhance training speed. The proposed methods are particularly well-suited for deployment on edge computing platforms, such as Raspberry Pi, or embedded systems with limited resources. Our results demonstrate a reduction of over 80% in model size, decreasing from 43.73 KB to 6.25 KB, and a reduction of more than 45% in training time, decreasing from over 150 seconds to less than 80 seconds. This research highlights the potential for achieving a balance between efficiency and accuracy in CNN design for real-world deployment, addressing the increasing demand for streamlined deep learning models in resource-constrained environments
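To make post-training quantization concrete, here is a minimal sketch of affine 8-bit quantization of a weight list. It is an assumption about the general technique, not the scheme used in the paper; `quantize_int8` and `dequantize` are hypothetical names, and real toolchains apply this per tensor or per channel.

```python
# Sketch of affine int8 post-training quantization (illustrative only).

def quantize_int8(weights):
    """Map float weights to integers in [-128, 127] with a scale and zero point."""
    # Extend the range to include 0.0 so that zero stays representable.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - w_min / scale)
    quantized = [
        max(-128, min(127, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
quantized, scale, zero_point = quantize_int8(weights)
restored = dequantize(quantized, scale, zero_point)
```

Each weight is stored in one byte instead of the four bytes of a 32-bit float, at the cost of a small rounding error per weight.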
Application of Machine Learning Methods for Classification of Gamma and Hadron Signals in High Energy Particle Detection
A major challenge in particle physics is the binary classification of high-energy gamma signals against a complex hadron background. Accurate identification of these gamma signals is critical for particle detection, especially as the volume and complexity of data increase with advancing technology. This research developed a machine learning-based classification model to efficiently and accurately distinguish gamma signals from hadrons. Logistic Regression, Decision Trees, Random Forests, and Artificial Neural Networks are used for classification. Principal Component Analysis (PCA) and correlation analysis identified dominant features, while Monte Carlo simulations validated the distribution of gamma and hadron spectra. The study focuses on geometric parameters such as fLength, fWidth, and fAlpha, as well as photon distribution and distance effects (fDist), in gamma signal identification using K-Means clustering. The Random Forest algorithm achieved the highest accuracy, 87.96%, with an F1-score of 0.91, indicating its robustness in the classification task. PCA and correlation analysis showed fSize, fLength, and fWidth to be the most influential factors in classification. Monte Carlo simulations replicated the spectral distribution pattern in close agreement with the experimental data. The research presents a novel integration of geometric analysis, clustering techniques, and simulation validation in the classification of high-energy particles. Machine learning methods, in particular Random Forest, effectively distinguish the gamma signal from the hadron background. The combination of PCA and Monte Carlo simulations improves the understanding of data distribution patterns and key classification factors. This research contributes to the development of a more reliable astrophysical signal classification system with potential applications in large-scale astronomical data management.
E-Government Between Developed and Developing Countries: Key Perspectives from Denmark and Iraq
E-government involves using technology to provide public information and services digitally. This study examines key factors addressing infrastructure, cultural, political, technical, and social challenges in e-government implementation. By exploring diverse contexts, from citizen engagement to data frameworks, it will elucidate best practices and lessons for overcoming hurdles on the bureaucratic and user sides. The research aims to uncover how states can successfully transition services online. Insights can inform policymakers seeking to digitize governance and leverage information and communication technologies to improve state-citizen relations. Additionally, it aims to compare and analyze e-government systems in a developing country (Iraq) and a developed country (Denmark) to highlight key differences that could inform e-government development efforts in developing nations. Iraq and Denmark were chosen due to the disparity between their e-government systems, enabling the identification of weaknesses in Iraq's e-government initiatives and providing insights from Denmark's more advanced experience. Examining this e-government gap between a developing and developed country will allow developing nations like Iraq to pinpoint areas for improvement and potentially benefit from Denmark's success in this area
Forest and Land Fire Vulnerability Assessment and Mapping using Machine Learning Method in East Nusa Tenggara Province, Indonesia
Forest and land fires are severe disasters for forest ecosystems, diminishing their functionality. Accurate prediction of fire-prone areas aids effective management and prevention, and machine learning methods have shown promise in this regard. In 2022, East Nusa Tenggara (NTT) had the highest incidence of such fires. This study assesses NTT's forest and land fire vulnerability using seven machine learning methods: Gaussian Naive Bayes, Support Vector Machine, Logistic Regression, Artificial Neural Network, Random Forest, Gradient Boosting Machine, and Extreme Gradient Boosting. A geospatial dataset integrating NTT's 2022 fire data and fourteen fire-related factors was created using ArcGIS. Feature selection with the Information Gain Ratio identified nine key features: Degree of Slope, Land Cover, NDVI, Annual Rainfall, Distance to Road, Distance to River, Distance to Buildings, Wind Speed, and Solar Radiation. The Random Forest model emerged as optimal, with AUC values of 0.864 and 0.742 for training and testing, respectively. The resulting vulnerability map highlighted factors contributing to NTT's forest fires, including gentle slopes, forest cover, unhealthy vegetation, low rainfall, human activities, remote water access, soil moisture, distant firefighting facilities, low wind speeds, and high solar radiation. Recommendations include land management, fire-resistant vegetation, policy enforcement, community education, and infrastructure enhancement.
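The Information Gain Ratio used for feature selection can be sketched as follows. This is the textbook definition (information gain divided by split information) for a categorical feature; it assumes continuous factors such as slope or rainfall have already been binned, and the feature values and labels below are invented for illustration.

```python
from collections import Counter
from math import log2

# Textbook gain ratio for one categorical feature (illustrative data only).

def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


fire = [1, 1, 0, 0]                               # hypothetical fire / no-fire labels
rainfall_bin = ["low", "low", "high", "high"]     # separates the classes perfectly
random_bin = ["a", "b", "a", "b"]                 # carries no information

informative = gain_ratio(rainfall_bin, fire)
uninformative = gain_ratio(random_bin, fire)
```

Features would then be ranked by this score and the highest-ranked ones kept; the cutoff that yields nine features is specific to the study and is not reproduced here.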
Classification of Economic Activities in Indonesia Using IndoBERT Language Model
Classification of economic activities plays a vital role in understanding, analyzing, and managing complex economic processes in a society or country. It facilitates economic analysis, data collection, policy formulation, and informed decision-making. In Indonesia, economic activities are classified according to the Indonesian Standard Industrial Classification (KBLI). This classification requires in-depth knowledge of KBLI and is still performed manually, which makes it time-consuming. To address this challenge, this paper proposes using IndoBERT, a transformer-based language model pretrained on a large Indonesian corpus, to better capture the contextual meaning of text and thereby improve the accuracy of automatic economic activity classification. Our results show that the fine-tuned IndoBERT-LARGE model achieves superior results, with an F1 score of 96.82% and a balanced accuracy of 96.10%, outperforming other recent methods used for similar tasks, namely CatBoost and DistilBERT models.
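Balanced accuracy, reported above alongside the F1 score, is commonly defined as the mean of per-class recall, which keeps rare classes from being hidden by frequent ones. The sketch below follows that common definition; the class labels "A" and "B" are placeholders, not KBLI codes.

```python
# Balanced accuracy as the unweighted mean of per-class recall.
# Labels "A" and "B" are placeholder classes for illustration.

def balanced_accuracy(y_true, y_pred):
    """Average, over the classes present in y_true, of each class's recall."""
    per_class = {}
    for truth, pred in zip(y_true, y_pred):
        hits, total = per_class.get(truth, (0, 0))
        per_class[truth] = (hits + (truth == pred), total + 1)
    recalls = [hits / total for hits, total in per_class.values()]
    return sum(recalls) / len(recalls)


y_true = ["A", "A", "A", "A", "B"]
y_pred = ["A", "A", "A", "A", "A"]  # the rare class "B" is never predicted

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy is 4 out of 5, while balanced accuracy averages a recall of 1 for "A" with a recall of 0 for "B".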
Estimating Passenger Density in Trains through Crowd Counting Modeling
The Greater Jakarta Commuter Rail, also known as the KRL Commuter Line, is one of the primary transportation choices for many people due to its comfort and efficiency. However, user dissatisfaction remains relatively high, particularly regarding the frequent and unpredictable overcrowding of trains. To address this issue, our research develops an Artificial Intelligence-based model to predict train passenger density through crowd counting. Using the proposed k-F1 metric and a purpose-built dataset of train density, we compare three object detection approaches: bounding box prediction (YOLOv5), density map (CSRNet), and proposal point (P2PNet). Our results show that P2PNet excels at estimating the number of people and predicting their locations in crowded situations. However, in situations with fewer people and larger object sizes, YOLOv5 demonstrates the best performance. To estimate the density of the space, we propose a method that takes into account the region of interest, image perspective transformation, and masking. The proportion of the masked area to the total area provides an estimate of the density level within the train. This method can be applied to real-time image-based CCTV systems to predict train congestion and to support transportation management decisions aligned with Indonesia's sustainable development goals.
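The final step described above, the proportion of masked area to total area, can be sketched on small binary grids. This is a simplified illustration that assumes the perspective transformation has already been applied and that both the occupancy mask and the region of interest are given as equally sized 0/1 grids; `density_level` is a hypothetical name.

```python
# Ratio of occupied (masked) cells to all cells inside the region of interest.
# Assumption: both grids are already perspective-corrected and the same size.

def density_level(mask, roi):
    """Return occupied ROI cells divided by total ROI cells (0.0 if ROI is empty)."""
    occupied = 0
    total = 0
    for mask_row, roi_row in zip(mask, roi):
        for masked, inside in zip(mask_row, roi_row):
            if inside:
                total += 1
                if masked:
                    occupied += 1
    return occupied / total if total else 0.0


roi = [[0, 1, 1, 1],
       [0, 1, 1, 1]]   # first column lies outside the region of interest
mask = [[1, 1, 0, 0],
        [1, 1, 1, 0]]  # cells covered by detected passengers

density = density_level(mask, roi)
```

Cells outside the region of interest are ignored, so passengers detected there do not inflate the estimate.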