Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Hybrid Video Transcription Summarization with a BERT-Based Clustering and BART
The use of video as a medium for information and education is rapidly increasing across online platforms. However, long durations and unstructured delivery often hinder audiences from grasping the core message, presenting challenges for the development of automatic summarization methods for monologues, interviews, and podcasts. Extractive methods often yield less coherent summaries, while abstractive methods may overlook important details. To address this issue, this study proposes a hybrid approach combining extractive and abstractive techniques. In the extractive stage, sentences are represented using BERT embeddings and clustered using two methods, namely K-Means Clustering and Hierarchical Clustering (agglomerative). The abstractive stage then employs the BART model to generate summaries that are more coherent and informative. Experimental evaluations on 20 Human Metapneumovirus (HMPV) videos indicate the strongest performance on monologues, with ROUGE-1 of 57%, ROUGE-2 of 30%, and ROUGE-L of 32%. Although lower performance was observed for interviews and podcasts due to dynamic interactions and frequent speaker shifts, the hybrid approach consistently surpassed extractive-only and abstractive-only baselines. These results highlight the effectiveness of the hybrid approach and its potential for developing more adaptive video summarization in the future
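The ROUGE-1 and ROUGE-2 figures quoted above are n-gram overlap scores between a generated summary and a reference summary. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes lowercase whitespace tokenization, and the function names `ngrams` and `rouge_n` are hypothetical. The abstract does not state whether recall, precision, or F1 is reported, so the sketch returns all three.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Clipped n-gram overlap between a candidate and a reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from the study's data.
    generated = "the virus spreads through droplets"
    reference = "the virus spreads through respiratory droplets"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

ROUGE-L, also reported above, relies on the longest common subsequence rather than fixed-length n-grams and is not covered by this sketch.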
Improving Vehicle Payment Method Classification Using XGBoost with SMOTE and SHAP Interpretation
Class imbalance in vehicle payment method classification can bias predictive models toward the majority class. This study builds a classification model for automotive consumer payment methods using Extreme Gradient Boosting (XGBoost), with class balancing handled through the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), and model interpretation performed using SHAP (SHapley Additive Explanations). The dataset consisted of 11,011 records and 13 attributes derived from Toyota vehicle delivery order transactions. Results show that the XGBoost model without balancing achieved 67.37% accuracy but only 0.24 recall for the Cash class. After applying SMOTE, the recall for the Cash class improved to 0.58, and ADASYN produced a similar improvement at 0.59, with overall accuracy at around 61–62% and a stable ROC-AUC of 0.65. Feature importance and SHAP analysis identified c_vehicle_model and c_city as the most influential factors in predicting the payment method. From a business perspective, the improved ability to detect cash customers reduces the risk of misclassification and enables dealers to better segment customers by payment preference. This supports more effective marketing campaigns, sales strategies, and financing risk management. The combination of XGBoost, SMOTE, ADASYN, and SHAP proved effective in handling imbalanced data while offering transparent interpretability of predictions, making it a practical foundation for data-driven decision-making in the automotive industry
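The core idea of SMOTE mentioned above is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step in plain Python. It is not the study's pipeline; the function name `smote_sketch`, the parameters `k` and `seed`, and the sample points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Assumes at least two minority points, each given as a tuple of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-feature "Cash" samples, for illustration only.
    cash = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
    for point in smote_sketch(cash, n_new=3):
        print(point)
```

ADASYN follows the same interpolation idea but generates more synthetic points around minority samples that are harder to classify; that weighting is not shown here.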
Optimization of a New Adaptive Stacking Ensemble Model Integrated with IoT for Stress Level Detection Based on Physiological Signals
Mental health issues among college students are receiving increasing attention, particularly because of academic and social pressures and the impact of technology use. This study aims to develop a real-time stress level prediction model using a New Adaptive Stacking Ensemble approach based on physiological data and IoT devices. The data included heart rate, SpO₂, body temperature, and systolic and diastolic blood pressure. Five machine learning algorithms are used as base models: SVM, C4.5, Decision Tree, KNN, and Random Forest. The MLP serves as the meta-model, which is then optimized using Optuna. The model training process begins with pre-processing, feature standardization using StandardScaler, and data balancing using SMOTE. The results showed that the stacking model with the MLP meta-model achieved an accuracy of 90.00% under the individual Random Forest and KNN models, and increased to 97.00% after hyperparameter optimization. This model was then integrated with IoT devices using MAX30102, MLX90614, and digital tensiometer sensors, as well as a Streamlit interface to display real-time stress classification results. The system built not only excels in accuracy but can also be implemented to directly detect stress levels, thereby potentially supporting early intervention and mental health promotion in campus environments
Plant Disease Identification Using Image Processing: A Systematic Literature Review
This article reviews the literature on plant disease identification using image processing techniques. The review aims to provide a comprehensive analysis of dataset sources, preprocessing methodologies, segmentation techniques, feature extraction processes, and various classification methods, along with their associated accuracies. It also discusses challenges encountered and potential future research directions. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol, a literature search was conducted in the Scopus database to obtain primary studies. The search covered Scopus-indexed journals and proceedings published by IEEE, Elsevier, Springer, MDPI, and ACM between 2019 and 2025. The initial identification phase yielded 9,286 studies. Further screening was performed based on specific eligibility criteria, including relevance to the topic, year of publication, subject area, document type, and language (English), resulting in the selection of 82 studies for the review. The findings indicate that the most commonly used dataset is PlantVillage, followed by field data. The dominant preprocessing techniques include image enhancement and augmentation. For segmentation and feature extraction, the most frequently used methods were k-means and CNN, respectively. Sixty-one studies achieved an accuracy exceeding 90%. However, several key challenges remain: data limitations, methodological issues, and practical constraints. Future research should focus on developing more representative datasets, hybrid approaches that integrate classical and deep learning methods, and lightweight, adaptive decision support systems suitable for real-world agricultural applications. This review supports continued progress in this field by providing valuable insights for researchers developing image-based methods for identifying plant diseases
Ant Colony Optimization for Jakarta Historical Tours: A Comparative Analysis of GPS and Map Image Approaches
The Traveling Salesman Problem (TSP) is a difficult combinatorial optimization problem that arises from practical routing tasks. The ant colony optimization (ACO) algorithm has been applied in several areas, particularly to combinatorial optimization problems. ACO is inspired by the behavior of ants searching for the shortest path between a food source and their nest. In this research, ACO is used to solve the TSP of finding the best route through museums and historical sites in Jakarta, the capital city of Indonesia. One approach uses each location's actual coordinates (latitude and longitude), while the other uses coordinate data obtained from a supplied map image. After implementing both models, it can be concluded that the ACO model performs poorly when solving the TSP with actual coordinates, whereas it quickly finds near-optimal paths when using coordinates from a map image. The algorithm generates the optimal path in 11 seconds, reducing the initial distance from 17.938 to 4.430, using 4.731 ants and 75 trips with a distance power of 1. Tests with statistical random variation were also performed, which indicated that the algorithm is flexible and reliable under various conditions
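The ACO procedure described above can be sketched in a few dozen lines. This is a generic textbook-style variant under stated assumptions, not the authors' implementation: the parameter names `n_ants`, `n_trips`, and `distance_power` are chosen to echo the abstract's wording, while `pheromone_power`, `evaporation`, the update rule, and the sample coordinates are hypothetical. The sketch assumes all points are distinct, since a zero distance would cause a division by zero.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting point."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def aco_tsp(points, n_ants=20, n_trips=50, distance_power=1.0,
            pheromone_power=1.0, evaporation=0.5, seed=0):
    """Return (best_tour, best_length) found by a basic ant colony search."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    pheromone = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, float("inf")

    for _ in range(n_trips):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                here = tour[-1]
                candidates = sorted(unvisited)
                # Attractiveness: pheromone level times inverse distance.
                weights = [
                    pheromone[here][j] ** pheromone_power
                    * (1.0 / dist[here][j]) ** distance_power
                    for j in candidates
                ]
                nxt = rng.choices(candidates, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length

        # Evaporate old pheromone, then deposit along each ant's tour.
        for i in range(n):
            for j in range(n):
                pheromone[i][j] *= 1.0 - evaporation
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pheromone[a][b] += 1.0 / length
                pheromone[b][a] += 1.0 / length

    return best_tour, best_len


if __name__ == "__main__":
    # Hypothetical pixel coordinates, as if read from a map image.
    sites = [(10, 10), (60, 15), (55, 70), (12, 65), (35, 40)]
    print(aco_tsp(sites))
```

Shorter tours deposit more pheromone per edge, so later ants are biased toward edges that appeared in good tours, while evaporation keeps early choices from dominating.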
Real-Time Location Monitoring and Routine Reminders Based on Internet of Things Integrated with Mobile for Dementia Disorder
The increasing number of dementia sufferers worldwide demands a new approach to monitoring daily activities and locations to reduce the risk of getting lost. This study develops a real-time location monitoring and routine reminder system based on the Internet of Things (IoT), integrated with a mobile application. The system is designed to assist individuals with dementia, particularly elderly and younger adults with cognitive impairments, in performing daily routines independently, while providing a sense of security for families and caregivers through real-time location tracking features. This technology utilizes GPS for accurate location monitoring, daily activity reminders, and automatic notifications for caregivers in case of deviations from usual routes. The system development includes the creation of a prototype consisting of a mobile application and IoT components such as the ESP32 WROOM microcontroller, Ublox Neo6M V2 GPS module, and SIM800L V2 GSM module. Functionality testing and impact evaluation were conducted to assess its effectiveness in improving the quality of life for dementia sufferers and facilitating monitoring for caregivers. With features such as daily reminders, emergency contacts, and real-time data integration, this system is intended not only for dementia patients but also for families and caregivers seeking tools to ensure the safety and comfort of the sufferers. It is expected that this research will enhance the independence of dementia patients in performing daily activities and provide innovative solutions through IoT technology to improve well-being across different age groups
Advancing Hate Speech Detection in Indonesian Language Using Graph Neural Networks and TF-IDF
Hate speech and abusive content on social media, particularly in the Indonesian language, present significant challenges for content moderation systems. Previous research has applied machine learning models such as Recurrent Neural Networks (RNN), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN) to address this issue. However, these approaches are limited in their ability to capture the relational and contextual nuances inherent in the data, resulting in suboptimal performance. This study introduces an approach that combines Graph Neural Networks (GNN) with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction to improve hate speech detection on Twitter (platform X). The dataset consists of 13,169 Indonesian tweets, manually labeled for hate speech and abusive categories. Preprocessing steps include text cleaning, stemming, stop-word removal, and normalization. The GNN model achieved superior results, with accuracy scores of 92.90% for Abusive and 89.78% for Hate Speech, outperforming the RNN model, which achieved accuracy of 86.09% and 86.15%, respectively. This study highlights the advantage of graph-based approaches in capturing complex relationships within text data. Future research can explore expanding datasets to include regional dialects and integrating advanced feature extraction techniques like Word2Vec or BERT. This study establishes a robust framework for improving hate speech detection, offering a valuable contribution to safer digital environments
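The TF-IDF weighting named above can be illustrated with a small plain-Python sketch. It uses one common definition, term frequency times log(N / document frequency); libraries such as scikit-learn apply smoothed variants, and the abstract does not say which variant the study used. The function name `tfidf` and the sample texts are hypothetical, and the graph construction for the GNN is not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dictionary per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Neutral placeholder texts, for illustration only.
    tweets = ["kamu baik sekali", "kamu jahat sekali", "kamu baik"]
    for vector in tfidf(tweets):
        print(vector)
```

Under this definition a term that occurs in every document receives a weight of zero, while terms concentrated in few documents receive higher weights.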
Multi-Process Data Mining with Clustering and Support Vector Machine for Corporate Recruitment
An efficient and accurate recruitment process is essential for a company to attract professional, loyal, and motivated candidates. However, current selection methods often suffer from subjective assessment of prospective employees and a lengthy process for deciding on the best candidate. Therefore, this research aims to optimize the recruitment process by applying data mining techniques to improve efficiency and accuracy in candidate selection. The method is a multi-process data mining approach that applies clustering and classification algorithms in sequence. In the initial stage, the K-Means algorithm clusters candidates based on administrative selection data, such as document completeness and reference support. Next, a classification model is built using a Support Vector Machine (SVM) to categorize the best candidates based on the results of psychological tests, medical tests, and interviews. The experimental results show that the SVM model produces high evaluation scores, with an AUC of 87%, Classification Accuracy (CA) of 90%, F1-score of 89%, Precision of 91%, and Recall of 90%. These results indicate that the model can improve accuracy in the employee selection process and help companies make more measurable, data-driven recruitment decisions
UDAWA Gadadar: Agent-based Cyber-physical System for Universal Small-scale Horticulture Greenhouse Management System
Digitalization in agriculture is becoming increasingly important for improving efficiency and sustainability, but small-scale farmers often face difficulties in adopting digital technologies because of various constraints. This study proposes an open-source intelligent system platform called UDAWA (Universal Digital Agriculture Workflow Assistant) to assist small-scale farmers in digitizing greenhouse management processes. The first variant of this platform, UDAWA Gadadar, was designed as a cyber-physical agent to control and monitor greenhouse instruments. UDAWA Gadadar was built using a 5C architecture approach and farmer-centric design thinking, utilizing an ESP32 microcontroller and a power sensor module to ensure performance and energy efficiency. The UDAWA Gadadar prototype was tested in a small-scale greenhouse with promising results, with an average remaining memory of 175 KB in the non-SSL mode and 122 KB in the SSL mode. Cost analysis indicates that this platform is relatively affordable for small-scale farmers, with a total component cost of USD 33.7 per unit. A decision matrix analysis involving five different greenhouse models in Pancasari Village, Buleleng Regency, Bali, showed that UDAWA Gadadar has high relevance and potential for adoption, particularly in models GH3 and GH5, with compatibility scores of 0.27. This study contributes to the development of appropriate and accessible digitalization solutions for small-scale agriculture, with future work focusing on developing other physical agent variants and a digital twin for enhanced cultivation simulations
XGBoost Algorithm for Cervical Cancer Risk Prediction: Multi-dimensional Feature Analysis
Cervical cancer continues to pose a significant global health challenge, with early detection remaining the cornerstone for effective intervention. This study is situated at the intersection of clinical oncology and computational intelligence, exploring the potential of gradient-boosting algorithms to overcome the limitations of conventional screening methodologies. An XGBoost model was developed to predict cervical cancer risk. This model incorporates demographic, behavioral, and clinical parameters. The model was developed using data from 858 patients at the Hospital Universitario de Caracas. The preprocessing pipeline was designed to address the complexities inherent in medical data, including strategic management of missing values and standardizing heterogeneous features. The model demonstrated an overall accuracy of 96.3%, with a sensitivity of 66.7% and a specificity of 97.6%. This performance profile indicates adept navigation of the delicate balance between missed diagnoses and unnecessary interventions. Feature importance analysis revealed a multifaceted risk landscape, where screening test results contributed substantial predictive power (approximately 60%), complemented by demographic and behavioral factors, including age, reproductive history, and contraceptive usage patterns. The confusion matrix analysis revealed the clinical implications of the model predictions, demonstrating a promising positive predictive value of 55.0% despite the pronounced class imbalance. These findings suggest that ensemble learning approaches can effectively synthesize diverse patient data into meaningful risk assessments, potentially enhancing screening efficiency through personalized stratification. Future research directions include prospective validation across diverse populations, integration of longitudinal data, and further exploration of explainable AI techniques to bridge the gap between algorithmic predictions and clinical implementation
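The four headline metrics above (accuracy, sensitivity, specificity, positive predictive value) are linked by standard identities, which the sketch below uses as a back-of-envelope consistency check. The function names are hypothetical, and the implied share of positive cases is a quantity derived here from the rounded figures in the abstract, not a number reported by the study.

```python
def implied_prevalence(accuracy, sensitivity, specificity):
    """Share of positive cases implied by
    accuracy = sensitivity * p + specificity * (1 - p)."""
    return (specificity - accuracy) / (specificity - sensitivity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / all predicted positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Rounded figures quoted in the abstract.
    acc, sens, spec = 0.963, 0.667, 0.976
    p = implied_prevalence(acc, sens, spec)
    ppv = positive_predictive_value(sens, spec, p)
    print(f"implied share of positive cases: {p:.3f}")
    print(f"implied PPV: {ppv:.3f}")
```

With the rounded inputs, the implied PPV comes out close to the 55.0% quoted above, which suggests the reported figures are mutually consistent. It also shows why high accuracy can coexist with modest sensitivity when positive cases are rare.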