Jurnal Politeknik Negeri Batam (PoliBatam)
    3001 research outputs found

    Weight Estimation of Broiler Ducks Based on Image Processing and Machine Learning with IoT Integration

    No full text
    The broiler duck farming industry in Indonesia faces challenges in efficiently monitoring body weight, as traditional manual weighing methods are labor-intensive, time-consuming, and stressful for the animals. To address this issue, this study aims to develop a non-invasive and automated weight estimation system that integrates digital image processing, machine learning, and Internet of Things (IoT) technologies. The methodology involves acquiring multi-angle images of ducks, applying preprocessing steps such as resizing, normalization, and contrast enhancement, and extracting hand-crafted features, including Histogram of Oriented Gradients (HOG) and HSV color histograms. These features are then fused, reduced via Principal Component Analysis (PCA), and processed using a Support Vector Regression (SVR) model with optimized hyperparameters for weight prediction. While previous studies have focused on cattle, broilers, or fish, research specifically targeting meat-type ducks remains limited, particularly work that combines image-based regression with IoT-enabled real-time monitoring. Experimental results demonstrate that the proposed system achieves a mean absolute error (MAE) of approximately 110 grams on the validation set, with per-duck averaging improving stability compared to per-image predictions. Visualization through scatter plots, boxplots, and learning curves further confirms that the model effectively captures general weight distribution trends but exhibits higher errors in certain mid-weight ranges. The integration with IoT facilitates continuous, stress-free monitoring of duck growth, underscoring the system’s potential as a practical and sustainable solution for precision livestock farming.
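
The per-duck averaging step mentioned in the abstract can be illustrated with a minimal sketch. The duck IDs, predicted weights, and true weights below are hypothetical values chosen for illustration, not data from the study, and the function names are assumptions; the abstract does not describe the authors' implementation.

```python
from collections import defaultdict
from statistics import mean


def per_duck_average(predictions):
    """Average per-image weight predictions (grams) for each duck ID."""
    grouped = defaultdict(list)
    for duck_id, grams in predictions:
        grouped[duck_id].append(grams)
    return {duck_id: mean(values) for duck_id, values in grouped.items()}


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between true and predicted weights."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical per-image predictions: (duck ID, predicted weight in grams).
image_preds = [
    ("duck_a", 1900.0), ("duck_a", 2100.0), ("duck_a", 2000.0),
    ("duck_b", 1500.0), ("duck_b", 1700.0),
]
# Hypothetical scale readings for the same ducks.
true_weights = {"duck_a": 2050.0, "duck_b": 1550.0}

duck_preds = per_duck_average(image_preds)

# Error when every image is scored on its own.
per_image_mae = mean_absolute_error(
    [true_weights[d] for d, _ in image_preds],
    [g for _, g in image_preds],
)
# Error after averaging all images of the same duck.
per_duck_mae = mean_absolute_error(
    [true_weights[d] for d in duck_preds],
    [duck_preds[d] for d in duck_preds],
)
```

In this toy data the over- and under-estimates for a single duck partly cancel, which is one plausible reason averaging stabilizes the prediction.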

    A Fuzzy C-Means–Based Clustering Model for Analyzing TOEFL Prediction Scores in Higher Education

    No full text
    In the era of digital transformation, the application of data mining in academic data management has become an important requirement for improving the quality of education. One crucial aspect is English proficiency. One of the tools for measuring English proficiency is the Test of English as a Foreign Language (TOEFL) Prediction test, which is administered at every university, including the State Polytechnic of Lhokseumawe. The management of TOEFL Prediction scores can utilize data mining as a basis for more in-depth learning analysis, as well as evaluation material. This study aims to design and develop a model for grouping the TOEFL scores of students at State Polytechnic of Lhokseumawe by applying the Fuzzy C-Means (FCM) algorithm. The research methods included observation and interviews, data collection and pre-processing, cluster model design, web-based system development, and system testing. Evaluation was conducted through Black Box and White Box testing for the system, as well as cluster quality validation using the Xie-Beni Index (XB) and Partition Coefficient. The results showed that the pre-test dataset of first-year students (651 records) produced three clusters with an XB value of 0.623, while the dataset of final-year students (826 records) produced six clusters with an XB value of 0.181. The developed model proved to be able to map students' English language abilities in a more structured manner and could be used as a basis for academic planning and skill improvement.
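
For readers unfamiliar with Fuzzy C-Means and the Xie-Beni index, the following is a minimal one-dimensional sketch using the textbook update rules. The scores, initial centers, and fuzzifier `m=2.0` are assumptions for illustration; the abstract does not state the settings used in the study.

```python
def memberships(scores, centers, m):
    """Fuzzy membership of every score in every cluster (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in scores:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A score that sits exactly on a center belongs fully to it.
            zeros = dists.count(0.0)
            rows.append([1.0 / zeros if d == 0.0 else 0.0 for d in dists])
        else:
            rows.append([
                1.0 / sum((d_i / d_j) ** power for d_j in dists)
                for d_i in dists
            ])
    return rows


def fuzzy_c_means(scores, centers, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = list(centers)
    for _ in range(iters):
        u = memberships(scores, centers, m)
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            new_centers.append(
                sum(w * x for w, x in zip(weights, scores)) / sum(weights)
            )
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(scores, centers, m)


def xie_beni(scores, centers, u, m=2.0):
    """Compactness divided by separation; lower values mean better clusters."""
    compactness = sum(
        u[k][i] ** m * (x - c) ** 2
        for k, x in enumerate(scores)
        for i, c in enumerate(centers)
    )
    separation = min(
        (a - b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    return compactness / (len(scores) * separation)


# Hypothetical TOEFL Prediction scores forming three loose groups.
scores = [310, 320, 330, 440, 450, 460, 540, 550, 560]
centers, u = fuzzy_c_means(scores, centers=[350.0, 450.0, 550.0])
xb = xie_beni(scores, centers, u)
```

Because a lower index indicates more compact and better separated clusters, the reported value of 0.181 for six clusters indicates a better partition than 0.623 for three clusters.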

    Comprehensive Comparison of TF-IDF and Word2Vec in Product Sentiment Classification Using Machine Learning Models

    No full text
    Sentiment analysis supports data-driven decisions by turning product reviews into reliable polarity labels. We compare four text representations (TF-IDF, TF-IDF reduced via SVD, Word2Vec trained from scratch, and a hybrid of TF-IDF(SVD-300) and Word2Vec) for sentiment classification of Indonesian Shopee product reviews from Kaggle (~2.5k texts). After normalization (with optional emoji handling and Indonesian stemming), ratings are mapped to binary sentiment (≤2 negative, ≥4 positive; 3 discarded). Each representation is evaluated with Logistic Regression, Support Vector Machines (linear/RBF), Naive Bayes, and Random Forest under stratified 5-fold cross-validation. TF-IDF with Logistic Regression (C=1.0) yields the best results (F1-macro = 0.816 ± 0.026; Accuracy = 0.816 ± 0.026), with LinearSVC as a strong runner-up. Word2Vec (scratch) performs lower, consistent with limited data being insufficient to learn stable embeddings, while the hybrid representation offers only modest gains over Word2Vec and does not surpass TF-IDF. These findings indicate that TF-IDF is the most reliable and consistent representation for small, short-text review datasets, and they underscore the impact of feature design on downstream classification performance.
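
The rating-to-label rule stated in the abstract (≤2 negative, ≥4 positive, 3 discarded) can be sketched as follows. The review texts and the function name are hypothetical and only illustrate the mapping.

```python
def rating_to_sentiment(rating):
    """Map a 1-5 star rating to a binary label; 3-star reviews return None."""
    if rating <= 2:
        return "negative"
    if rating >= 4:
        return "positive"
    return None


# Hypothetical (review text, star rating) pairs.
reviews = [
    ("barang bagus, pengiriman cepat", 5),
    ("kualitas buruk", 1),
    ("lumayan", 3),
    ("sesuai deskripsi", 4),
    ("mengecewakan", 2),
]

# Keep only reviews that receive a label; neutral 3-star reviews are dropped.
labeled = []
for text, rating in reviews:
    label = rating_to_sentiment(rating)
    if label is not None:
        labeled.append((text, label))
```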

    Addressing Extreme Class Imbalance in Multilingual Complaint Classification Using XLM-RoBERTa

    No full text
    Government complaint management systems often suffer from extreme class imbalance, where a few public service categories accumulate most reports while many others remain under-represented. This research examines whether simple class weighting can improve fairness in multilingual transformer models for automatic routing of Indonesian citizen complaints on the LaporGub Central Java e-governance platform. The dataset comprises 53,877 Indonesian-language complaints spanning 18 service categories with an imbalance ratio of about 227:1 between the largest and smallest classes. After cleaning and deduplication, we stratify the data into training, validation, and test sets. We compare three approaches: (i) a linear support vector machine (SVM) with term frequency-inverse document frequency (TF-IDF) unigram and bigram features and class-balanced weights, (ii) a cross-lingual RoBERTa (XLM-RoBERTa-base) model without class weighting, and (iii) an XLM-RoBERTa-base model with a class-weighted cross-entropy loss. Fairness is operationalised as equal importance for categories and quantified primarily using the macro-averaged F1-score (Macro-F1), complemented by per-class F1, weighted F1, and accuracy. The unweighted XLM-RoBERTa model outperforms the SVM baseline in Macro-F1 (0.610 vs 0.561). The class-weighted variant attains similar Macro-F1 (0.608) while redistributing performance towards minority categories. Analysis shows that class weighting is most beneficial for categories with a few hundred to several thousand samples, whereas extremely rare categories with fewer than 200 complaints remain difficult for all models and require additional data-centric interventions. These findings demonstrate that multilingual transformer architectures combined with simple class weighting can provide a more balanced backbone for automated complaint routing in Indonesian e-government, particularly for low- and medium-frequency service categories.
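
As a rough illustration of why Macro-F1 and class weighting matter under imbalance, the sketch below computes inverse-frequency class weights, a weighted cross-entropy term for one sample, and Macro-F1 on a toy label set. The weighting formula `n / (k * count)` is an assumption (a common "balanced" heuristic); the abstract does not state which scheme the authors used, and the category names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def weighted_cross_entropy(prob_of_true_class, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(prob_of_true_class)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced labels: 8 "roads" complaints, 2 "health" complaints.
y_true = ["roads"] * 8 + ["health"] * 2
weights = balanced_class_weights(y_true)

# A classifier that always predicts the majority class.
y_pred = ["roads"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Here the majority-class predictor reaches 0.8 accuracy but a much lower Macro-F1, because the minority class contributes an F1 of zero.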

    Implementation of Real-Time Swarm Drone Formation Using Firebase and MIT App Inventor with Interpolation-Based Control in Gazebo

    No full text
    This paper presents the implementation of a real-time swarm drone formation control system that leverages Firebase as the communication bridge and MIT App Inventor as the user interface. The simulation is conducted in the Gazebo environment with five quadcopter drones. Formation commands are sent from an Android application to Firebase, then processed by a Python-based ROS node to adjust drone positions. Four primary formations - line, triangle, circle, and star - are implemented, along with a dynamic mode enabling sequential transitions among multiple patterns. The integration of linear interpolation ensures smooth transitions, consistent timing, and stable hovering. Experimental results show an average response delay of 0.4–0.6 seconds and stable altitude at 3.5 meters. This approach demonstrates an intuitive and scalable swarm control method. Future enhancements may include telemetry feedback, Firebase authentication, and PID tuning to optimize control accuracy.
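
The interpolation-based transition between formations can be sketched as below. This is a minimal illustration only: the formation coordinates, step count, and function name are assumptions, and the ROS, Gazebo, and Firebase parts of the system are not modeled.

```python
import math


def lerp_formation(start, goal, steps):
    """Yield intermediate (x, y, z) waypoints for every drone, ending at goal."""
    for s in range(1, steps + 1):
        t = s / steps  # interpolation factor, from 1/steps up to 1.0
        yield [
            tuple(a + (b - a) * t for a, b in zip(p, q))
            for p, q in zip(start, goal)
        ]


ALTITUDE = 3.5  # meters, matching the hover altitude reported in the abstract

# Hypothetical line formation: five drones spaced 2 m apart along x.
line = [(x, 0.0, ALTITUDE) for x in (-4.0, -2.0, 0.0, 2.0, 4.0)]

# Hypothetical circle formation: five drones on a circle of radius 2 m.
circle = [
    (2.0 * math.cos(2 * math.pi * i / 5),
     2.0 * math.sin(2 * math.pi * i / 5),
     ALTITUDE)
    for i in range(5)
]

frames = list(lerp_formation(line, circle, steps=10))
```

Each frame holds one target position per drone; sending the frames at a fixed rate gives every drone the same transition duration regardless of how far it has to travel.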

    Performance Comparison of Naive Bayes and Support Vector Machine Methods in Music Genre Classification Based on Audio Signal Feature Extraction Using Mel-Frequency Cepstral Coefficients (MFCC)

    No full text
    Music genre classification has gained increasing attention with the emergence of digital music platforms. One of the relevant features extracted from audio signals is Mel-Frequency Cepstral Coefficients (MFCC), which is widely recognized as an effective technique. MFCC features are extracted at the frame level and aggregated at the clip level to represent each music track, making them suitable for audio-based classification tasks. This study applies Naïve Bayes and Support Vector Machine (SVM) algorithms for classification using the GTZAN dataset consisting of 1,000 audio files from 10 music genres, each with a duration of 30 seconds. The performance of these methods is evaluated using accuracy, precision, recall, and F1-score. The results show that SVM demonstrates superior performance, achieving an accuracy of 95.25% compared to 50.37% for Naïve Bayes. This performance gap can be attributed to SVM’s ability to model non-linear decision boundaries and effectively handle high-dimensional MFCC feature spaces. The main contribution of this study lies in the systematic evaluation of multiple SVM kernel configurations and parameter settings, providing empirical insights into the robustness of classical machine learning methods for MFCC-based music genre classification. This study concludes that SVM outperforms Naïve Bayes in music genre classification with MFCC feature extraction.

    Classification Analysis of Single Tuition Fees Using the Random Forest Method with K-Fold Cross Validation

    No full text
    Classification is the process of grouping data into specific categories based on their characteristics or features, which plays a crucial role in the analysis, decision-making, and prediction of new data. In academic settings, classification is used to determine the Single Tuition Fee (Uang Kuliah Tunggal, UKT) to place students according to their economic ability. Lhokseumawe State Polytechnic has implemented the UKT system since 2020 with eight categories, but some students are still placed in UKT groups that do not match the results of the manual process, which has limited accuracy. This study uses the Random Forest method as a technology-based solution to improve the accuracy and objectivity of UKT classification. The dataset used consists of 10,000 student records with 10 variables, covering economic and social information. The research process includes data preprocessing, Random Forest model training, performance evaluation using accuracy, precision, recall, and F1-score, and model stability testing through 10-fold cross-validation. The results show that Random Forest is able to classify most UKT classes well, especially classes 0–5 and 7. Class 6 has lower performance with a recall of 0.39 and an F1-score of 0.56 due to the limited number of samples. The overall accuracy of the model reaches 96%, while cross-validation produces an average accuracy of 95.50% with a standard deviation of 0.66%, indicating the model is stable and able to generalize to new data. This study shows that Random Forest is effective in UKT classification, producing an objective, fair, and efficient system. This implementation model supports data-driven decision-making in higher education and increases transparency in UKT determination.

    Implementing Defense-in-Depth Framework on Orange Pi NAS Using Host-Based Security and ZFS

    No full text
    Network-Attached Storage (NAS) based on low-cost Single Board Computers (SBC) offers an affordable alternative to commercial storage systems, yet its exposure to network-based threats requires a robust and layered security approach. This research implements the Defense-in-Depth (DiD) framework on an Orange Pi-based NAS running Debian 12, integrating host-based security mechanisms and the ZFS file system to enhance data integrity, availability, and system resilience. The security layers include firewall restrictions, intrusion prevention with Fail2Ban, integrity monitoring using AIDE and rkhunter, system auditing with Lynis, and log analysis with Logwatch. Additionally, ZFS snapshots and the Sanoid retention policy are applied to provide rapid data recovery with minimal storage overhead. Experimental results show that all defense layers function effectively under testing scenarios such as brute-force attempts, unauthorized port access, file modification, and data deletion. ZFS snapshots successfully restore deleted or altered files, ensuring a Recovery Point Objective (RPO) of one hour. System performance remained stable, with CPU usage averaging only 7.9% and memory usage at 33%, indicating that the DiD model is feasible even on low-resource SBC hardware. These findings demonstrate that a cost-efficient SBC-based NAS can achieve strong resilience against common cyber threats through layered security design and modern file system capabilities.

    Analysis of Gradient Boosting Algorithms with Optuna Optimization and SHAP Interpretation for Phishing Website Detection

    No full text
    Phishing remains a persistent cybersecurity threat, evolving rapidly to bypass traditional blacklist-based detection systems. Machine Learning (ML) approaches offer a promising solution, yet finding the optimal balance between detection accuracy and model interpretability remains a challenge. This study aims to evaluate and optimize the performance of three state-of-the-art Gradient Boosting algorithms (XGBoost, LightGBM, and CatBoost) for phishing website detection. The research utilizes the UCI Phishing Websites dataset consisting of 11,055 instances. The novelty of this study lies in the implementation of the Optuna framework with the Tree-structured Parzen Estimator (TPE) for automated hyperparameter optimization and the application of SHAP (Shapley Additive Explanations) interaction values to interpret the "black-box" models. The experimental results demonstrate that the LightGBM model, optimized via Optuna, achieved the highest performance with an F1-Score of 0.9798, outperforming the baseline model (0.9713) by 0.87%. Furthermore, SHAP analysis identified 'SSLfinal_State' as the most critical determinant for distinguishing phishing sites. This study confirms that optimizing modern boosting algorithms significantly enhances phishing detection capabilities while providing necessary explainability for cybersecurity analysts.

    Optimizing Email Spam Detection through Handling Class Imbalance with Class Weights and Hyperparameter Using GridSearchCV

    No full text
    Email spam is a major problem in digital communication that can disrupt productivity, burden network resources, and pose a security threat. This research focuses on optimizing spam email detection using a machine learning approach by addressing class imbalance through class weighting and hyperparameter tuning using GridSearchCV. To improve model accuracy and sensitivity, a combination of diverse datasets is applied to provide a wider scope of training data. The models used in this study include Support Vector Machine (SVM), Random Forest, Multinomial Naive Bayes (MNB), and XGBoost. Evaluation is carried out based on metrics such as accuracy, precision, recall, and F1-score, before and after hyperparameter tuning. The experimental results show that SVM produces the highest accuracy after tuning, reaching 97.10%, compared to 96.73% before hyperparameter tuning. In addition, Random Forest, MNB, and XGBoost also show significant improvements, with each model achieving better performance after tuning. Overall, this study shows that dataset merging and class weight adjustment can significantly improve the model's ability to detect spam, as well as provide a basis for implementing the model in a more effective email spam detection system.

    2,280 full texts
    3,001 metadata records
    Updated in last 30 days.