Jurnal Politeknik Negeri Batam (PoliBatam)
Comparative Study of LSTM and GRU Accuracy in Predicting BBRI Stock Closing Price
Stock price forecasting plays an important role in supporting investment decision-making in volatile financial markets. This study compares the performance of Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models in predicting the closing price of PT Bank Rakyat Indonesia (BBRI.JK) stock using daily closing price data from Yahoo Finance for the period November 2, 2020, to October 30, 2025. The research methodology includes data collection, preprocessing, model development, and evaluation. The results show that the GRU model outperforms LSTM in prediction accuracy, achieving an RMSE of 90.14, MAPE of 1.86%, and MAE of 68.89, while LSTM records an RMSE of 111.00, MAPE of 2.37%, and MAE of 87.55. In terms of computational efficiency, LSTM requires less training time (343.57 seconds) than GRU (471.98 seconds). The Diebold–Mariano test yields a DM statistic of 1.9949 with a p-value of 0.0461, indicating a statistically significant difference in predictive accuracy, with GRU producing lower prediction errors. This study provides empirical insights into the trade-off between accuracy and computational efficiency of deep learning models for stock price forecasting.
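The three error metrics reported above can be illustrated with a minimal sketch. The price values below are hypothetical and are not taken from the study; the function names are chosen here for illustration only.

```python
import math

def rmse(actual, predicted):
    # Root mean squared error: penalizes large misses more heavily.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # Mean absolute error, in the same units as the price.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Mean absolute percentage error; assumes no actual value is zero.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical closing prices and forecasts (illustrative only).
actual = [4000.0, 4100.0, 4050.0, 4200.0]
predicted = [3950.0, 4150.0, 4000.0, 4300.0]

print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

Because RMSE squares each error before averaging, a single large miss raises it more than it raises MAE, which is one reason the two are usually reported together.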
The Impact of the L1/L2 Ratio on Selection Stability and Solution Sparsity along the Elastic Net Regularization Path in High-Dimensional Genomic Data
High-dimensional genomic datasets (p>n) pose persistent challenges for predictive modeling and biomarker-oriented feature selection due to multicollinearity and instability of selected feature sets under resampling. Although Elastic Net is widely used to address correlated predictors via combined L1/L2 regularization, the practical role of the L1/L2 mixing ratio (α) is often treated as a secondary tuning choice driven primarily by predictive accuracy. This study investigates how varying α shapes the trade-off among selection stability, solution sparsity, and predictive performance along the Elastic Net regularization path. Experiments were conducted using the publicly available METABRIC breast cancer cohort (n = 1,964) with 21,113 gene expression features and a binary overall survival status outcome. Logistic regression with Elastic Net penalty was fitted across a grid of α values, with the regularization strength (λ) selected by cross-validation. Feature selection stability was evaluated under repeated resampling using the Jaccard index, Dice coefficient, and Adjusted Rand Index (ARI), while sparsity was summarized by the average number of non-zero coefficients; predictive performance was assessed using AUC, accuracy, and F1-score. Results show a monotonic decline in stability as α increases: α = 0.2 yields the highest stability (Jaccard 0.324, Dice 0.487, ARI 0.434), whereas LASSO (α = 1.0) produces the lowest stability (Jaccard 0.278, Dice 0.431, ARI 0.400). In contrast, predictive performance varies only marginally across α (AUC 0.696–0.704; accuracy 0.666–0.671; F1-score 0.738–0.742), while sparsity changes substantially (average selected features 110–204). Coefficient path analyses further illustrate abrupt shrinkage under LASSO versus smoother, group-preserving shrinkage under Elastic Net, consistent with improved reproducibility under lower-to-moderate α. 
Frequency-of-selection analysis highlights genes repeatedly selected across resampling, supporting interpretability of stable configurations without claiming causal biomarker validity. Overall, the findings demonstrate that α is a substantive modeling choice that materially affects stability and sparsity even when accuracy is similar, motivating stability-aware tuning for high-dimensional genomic prediction and reproducible feature discovery.
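As a rough illustration of how selection stability can be scored, the sketch below computes the Jaccard index and Dice coefficient between feature sets selected on different resamples and averages them over all pairs. Averaging over pairs is an assumption made here, since the abstract does not state how the scores were aggregated, and the gene labels are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|; two empty selections count as identical.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dice(a, b):
    # 2|A ∩ B| / (|A| + |B|); two empty selections count as identical.
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def mean_pairwise(metric, selections):
    # Average the metric over every pair of resampled selections.
    pairs = list(combinations(selections, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)

# Hypothetical features selected on three resamples (illustrative only).
selections = [
    {"G1", "G2", "G3", "G4"},
    {"G1", "G2", "G3", "G5"},
    {"G1", "G2", "G6", "G7"},
]

print(f"mean Jaccard: {mean_pairwise(jaccard, selections):.3f}")
print(f"mean Dice:    {mean_pairwise(dice, selections):.3f}")
```

Dice is never smaller than Jaccard for the same pair of sets, which matches the ordering of the values reported in the abstract.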
Interpretable Ensemble Models for Lifestyle-Based Sleep Disorder Prediction
Sleep disorders are a major global health concern that affects cognitive performance, mental well-being, and long-term physiological health. Conventional diagnostic methods such as polysomnography are time-consuming and resource-intensive, limiting their use for large-scale early detection. Machine learning therefore offers a practical alternative for predictive, data-driven sleep disorder analysis. This study compares the performance of four ensemble learning algorithms (Random Forest, Gradient Boosting, AdaBoost, and XGBoost) in predicting sleep disorders based on lifestyle and physiological factors, using the Sleep Health and Lifestyle dataset consisting of 374 samples and three classes: insomnia, none, and sleep apnea. The research workflow includes data preprocessing, feature encoding, dataset splitting (70:30), and hyperparameter optimization using Grid Search with 5-fold cross-validation to improve model stability and generalization given the limited dataset size. Model evaluation is conducted using accuracy, precision, recall, and F1-score calculated with a macro-average approach to ensure fair multi-class performance assessment. The results show that AdaBoost and XGBoost achieve the highest test accuracy of 90.27%, while Random Forest and Gradient Boosting obtain 89.38%. Performance differences among models are relatively small (±1%) but indicate consistent predictive behavior. Feature importance analysis identifies BMI category and systolic blood pressure as the most influential predictors, followed by occupation and physical activity level, highlighting the relevance of lifestyle and physiological factors in sleep disorder risk. Overall, this study demonstrates that ensemble learning models provide reliable predictive performance and interpretable insights to support early detection of sleep disorders based on lifestyle patterns.
EDCST-Rain: Enhanced Density-Aware Cross-Scale Transformer for Robust Object Classification Under Diverse Rainfall Conditions
Rain degradation significantly impairs object classification systems, causing accuracy drops of 40-60% under severe conditions and limiting autonomous vehicle deployment. While preprocessing approaches attempt deraining before classification, they suffer from error propagation and computational overhead. This paper introduces EDCST-Rain, an Enhanced Density-Aware Cross-Scale Transformer specifically designed for robust classification under diverse rain conditions. The architecture consists of five integrated components: a Rain Density Encoding Module that captures rain streak density, accumulation, and orientation; a Swin-Tiny Backbone for hierarchical feature extraction; and three rain-specific mechanisms: directional attention modules adapting to rain streak orientation, accumulation-aware processing handling lens droplet distortions, and adaptive cross-scale fusion integrating multi-resolution information. We develop a comprehensive physics-based rain simulation framework covering four rain types (drizzle, moderate, heavy, storm) and implement a curriculum learning strategy that progressively introduces rain complexity during training. Extensive experiments on CIFAR-10 demonstrate that EDCST-Rain achieves 83.1% clean accuracy while maintaining 71.8% under severe rain (86.4% retention), representing a 10-percentage-point improvement over state-of-the-art methods. With 15.8 million parameters and a 14.3 ms GPU inference time that enables real-time operation, EDCST-Rain provides a practical, weather-robust perception framework suitable for autonomous systems operating under adverse weather conditions.
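The retention figure quoted above follows directly from the two reported accuracies. The short check below uses only those two numbers; the variable names are chosen here for illustration.

```python
# Accuracies reported in the abstract (percent).
clean_accuracy = 83.1
severe_rain_accuracy = 71.8

# Retention: share of clean accuracy preserved under severe rain.
retention_pct = 100.0 * severe_rain_accuracy / clean_accuracy

# Absolute drop, in percentage points.
absolute_drop = clean_accuracy - severe_rain_accuracy

print(f"retention: {retention_pct:.1f}%")
print(f"absolute drop: {absolute_drop:.1f} percentage points")
```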
Performance of Load Balancing Algorithms on Homogeneous and Heterogeneous Servers in On-Premise Environments
This research evaluates the performance of Round Robin, IP Hash, and Random Allocation algorithms in a homogeneous server environment, as well as Least Response Time, Least Connection, and Weighted Least Connection algorithms in a heterogeneous server environment implemented on on-premise servers. This study was motivated by the need to improve traffic management efficiency in local server infrastructure, where system performance is greatly influenced by resource diversity and distribution strategies. The experimental method was applied using NGINX and NGINX Plus as load balancing platforms, with Apache JMeter as a testing tool with low, medium, and high load test scenarios, while Netdata monitored the load distribution in real time. Performance evaluation was based on six key metrics: throughput, latency, error rate, load distribution, CPU usage, and memory consumption. The results show that in a homogeneous environment, static algorithms such as Round Robin, IP Hash, and Random Allocation maintain stable performance with consistent throughput and minimal latency. Conversely, in a heterogeneous environment, dynamic algorithms, such as Weighted Least Connection, achieve lower latency and more balanced resource utilization. These findings highlight that algorithm selection must match system characteristics: static algorithms are more suitable for small-scale, uniform deployments, while dynamic approaches are recommended for heterogeneous or large-scale systems that require adaptive load management. Overall, weight-based dynamic approaches demonstrate superior scalability and resilience under high workloads.
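The idea behind Weighted Least Connection can be sketched in a few lines: each request goes to the server with the lowest ratio of active connections to weight. This is a simplified illustration, not NGINX's implementation; the `Server` class, the server names, and the weights are hypothetical, and connections are assumed never to close during the run.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int      # relative capacity; higher means more capable
    active: int = 0  # currently open connections

def pick_weighted_least_conn(servers):
    # Choose the server with the fewest active connections per unit of weight.
    # Ties go to the server listed first.
    return min(servers, key=lambda s: s.active / s.weight)

def dispatch(servers, n_requests):
    # Assign requests one at a time; connections stay open (simplification).
    for _ in range(n_requests):
        pick_weighted_least_conn(servers).active += 1
    return {s.name: s.active for s in servers}

# Hypothetical heterogeneous pool: "big" has three times the capacity of "small".
pool = [Server("big", weight=3), Server("small", weight=1)]
result = dispatch(pool, 8)
print(result)
```

With these weights the eight requests end up split roughly 3:1, which is the behavior that lets a weighted scheme keep a slower machine from being overloaded in a heterogeneous pool.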
Optimizing Sentiment Classification Models for TikTok Comments using Emotion-Based Preprocessing and Grid Search
TikTok has become one of the social media platforms with a significant influence on public opinion formation in Indonesia. However, the linguistic characteristics of user comments, which are expressive, concise, and feature emotional forms like emojis, emoticons, and excessive capitalization, pose challenges for sentiment analysis. This research aims to optimize a sentiment classification model for TikTok comments using emotion-based preprocessing and hyperparameter optimization via Grid Search. The dataset comprises 4,500 comments from three different time periods discussing the Minister of Finance, Purbaya Yudhi Sadewa. Three testing scenarios were conducted: common preprocessing, emotion-based preprocessing, and a combination of emotion-based preprocessing with Grid Search. The results indicate that emotion-based preprocessing improved model accuracy by 4–5%, while Grid Search optimization provided an additional increase of up to 3%, achieving a peak F1-score of 0.92 with the LightGBM model. Analysis by time period reveals that sentiment remained predominantly positive across all three periods. The integration of emotion-based processing and parameter tuning proved effective in enhancing the model's ability to understand emotional variations in text and to map periodic changes in public sentiment on Indonesian-language social media.
Analysis of Gradient Boosted Trees Algorithm in Breast Cancer Classification
Early and accurate classification of breast cancer is essential to support clinical diagnostic processes and improve patient outcomes. This study proposes a comprehensive machine learning pipeline based on Gradient Boosted Tree algorithms to classify breast tumors into benign and malignant categories. The proposed framework integrates several preprocessing stages, including outlier handling using the Local Outlier Factor (LOF), feature normalization with StandardScaler, class imbalance handling using SMOTE, and feature selection through ANOVA-based SelectKBest. Five ensemble learning models (XGBoost, LightGBM, CatBoost, HistGradientBoosting, and GradientBoosting) were trained and evaluated using accuracy, precision, recall, F1-score, and ROC-AUC metrics. The experimental results show that all models achieved strong and comparable classification performance. Among them, CatBoost obtained the highest ROC-AUC value of 0.9960, along with an accuracy of 0.9649, precision of 0.9750, recall of 0.9286, and F1-score of 0.9512. Statistical evaluation using the DeLong test indicated that the differences in ROC-AUC among the evaluated models were not statistically significant (p > 0.05), suggesting similar discriminative capabilities across models. To enhance model interpretability, SHAP (SHapley Additive exPlanations) was applied to the CatBoost model as a representative classifier. The results show that features related to nuclear size and shape, such as radius, area, perimeter, and concavity, contributed most significantly to malignant predictions. This study demonstrates that the integration of robust preprocessing techniques, Gradient Boosted Tree models, and explainable machine learning provides an accurate and interpretable approach for breast cancer classification. However, the evaluation was conducted on a single public dataset without external validation, and further studies using independent and real-world datasets are required before clinical deployment.
Network-Informed Optimal Control via Graph Neural Networks: A Framework with Application to Tax Enforcement
This paper introduces a novel framework integrating multiplex network theory, machine learning, and optimal control to optimize tax revenue dynamics in the Democratic Republic of Congo (DRC). We model the Congolese economy as a multiplex network where economic sectors represent interdependent layers. Using machine learning techniques on empirical tax data (2000-2024), we reconstruct network topology and identify systemic sectors. Our network-informed optimal control approach demonstrates potential revenue increases of 25-35% with 30-40% volatility reduction. The framework provides actionable insights for the upcoming transition to Corporate Income Tax (CIT) and offers a replicable methodology for developing economies.
Analyzing Compost Fermentation Accuracy Through Fuzzy Logic and R-Square Techniques
The accumulation of unmanaged organic waste remains a critical environmental issue, highlighting the need for technological support to improve composting efficiency and monitoring. This study proposes an Internet of Things (IoT)-based system for monitoring compost fermentation conditions using temperature and humidity sensors, combined with Fuzzy Logic and R-square (R²) analysis to evaluate fermentation quality. The system employs a DHT11 sensor integrated with an ESP8266 microcontroller to collect temperature and humidity data in real time over a 20-day observation period, resulting in 1,008 data points. Fuzzy Logic is applied through fuzzification, rule-based inference, and defuzzification to classify compost conditions into four categories: poor, good, very good, and cooling needed. The model’s performance is further validated using multiple linear regression, with temperature and humidity as independent variables and average temperature as the dependent variable. The results show that compost temperature ranged between 28–32°C and humidity between 50–87%, indicating that the fermentation process was predominantly in the mesophilic or early composting phase. The fuzzy inference results demonstrate that most conditions fell within the “good” category, while the R² value of 0.87 indicates a strong relationship between the observed variables. These findings confirm that the integration of IoT, Fuzzy Logic, and statistical analysis is effective as a real-time monitoring and decision support system for compost management, while also highlighting the need for additional parameters to achieve a more comprehensive compost quality assessment.
Hybrid Rainfall Analysis in Semarang by Integrating SARIMA Predictions with Meteorological Association Rules
Climate variability necessitates advanced analytical approaches to understand irregular rainfall patterns, particularly in coastal cities like Semarang, Central Java. This research employs a dual-analysis framework combining the Seasonal Autoregressive Integrated Moving Average (SARIMA) model and the Apriori algorithm to forecast rainfall and uncover hidden meteorological associations. Analyzing BMKG monthly climatological data from January 2020 to December 2024, the research addresses both temporal trends and variable dependencies. The SARIMA(1,0,0)(2,1,0)_12 model projected rainfall dynamics for 2025, identifying critical wet periods (January-March, November-December) and dry intervals (July-September), achieving a MAPE of 44.97%. To complement temporal forecasting, the Apriori algorithm was applied with 50% minimum support and 50% confidence, generating association rules from daily discretized meteorological data. Results reveal that the combination of low temperature (Tx_Low, Tn_Low) and moderate wind speed (FFx_Medium) exhibits the strongest association with heavy rainfall events (lift ratio 12.34), indicating a roughly 12-fold increased risk compared to random conditions. By synergizing temporal forecasting with the identification of meteorological triggers, this research offers a robust basis for early warning systems, supporting flood mitigation and water resource management strategies in Semarang.
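For readers unfamiliar with the lift ratio used above, the sketch below shows how support, confidence, and lift are computed for a single association rule. The ten daily records are invented for illustration, and the `RR_Heavy` label for heavy rainfall is a hypothetical name; only `Tx_Low` and `FFx_Medium` appear in the abstract.

```python
def support(transactions, items):
    # Fraction of records that contain every item in `items`.
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # P(consequent | antecedent), estimated from the records.
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    # Confidence relative to the consequent's baseline frequency.
    # Values above 1 mean the consequent is more frequent when the antecedent holds.
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Ten hypothetical discretized daily records (illustrative only).
days = [
    {"Tx_Low", "FFx_Medium", "RR_Heavy"},
    {"Tx_Low", "FFx_Medium", "RR_Heavy"},
    {"Tx_Low", "FFx_Medium"},
    {"Tx_Low"},
    {"FFx_Medium"},
    {"RR_Heavy"},
    set(),
    set(),
    {"FFx_Medium"},
    {"Tx_Low"},
]

rule_lift = lift(days, {"Tx_Low", "FFx_Medium"}, {"RR_Heavy"})
print(f"lift: {rule_lift:.2f}")
```

In this toy data heavy rain occurs on 30% of all days but on two of the three days matching the antecedent, so the rule's lift is a little above 2.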