International Journal of Advances in Intelligent Informatics
Constructing decision rules from naive bayes model for robust and low complexity classification
A large spectrum of classifiers has been described in the literature. One attractive classification technique is Naïve Bayes (NB), which relies on probability theory. NB has two major limitations. First, it requires rescanning the dataset and applying a set of equations each time an instance is classified, which is expensive when the dataset is relatively large. Second, the inner workings of an NB model may remain difficult for non-statisticians to understand. Rule-based classifiers (RBCs), on the other hand, use IF-THEN rules (henceforth, a rule-set), which are more comprehensible and less complex for classification tasks. To alleviate these limitations of NB, this paper presents a method for constructing a rule-set from an NB model that serves as an RBC. Experiments on constructing the rule-set were conducted on the Iris, WBC, and Vote datasets. Coverage, Accuracy, M-Estimate, and Laplace are the key evaluation metrics applied to the rule-set. On some datasets the rule-set achieves high accuracy, reaching 95.33% and 95.17% on the Iris and Vote datasets, respectively. The constructed rule-set can mimic the classification capability of NB, provide a visual representation of the model, and express the rules with fidelity and acceptable accuracy, offering an easier way to interpret and adjust the original model. Hence, the rule-set provides a more comprehensible and lightweight model than NB itself.
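The following is a minimal sketch of the general idea, assuming categorical (already discretized) attributes. It fits a categorical NB with Laplace smoothing and enumerates every combination of attribute values. Each combination is labeled with the NB prediction to form one IF-THEN rule, and each rule is scored with coverage, accuracy, Laplace, and m-estimate. The function names, the smoothing constant `alpha`, and `m=2` are illustrative assumptions, not the paper's construction, which may prune or merge rules.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def fit_categorical_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing `alpha`."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    domains = [sorted({r[j] for r in rows}) for j in range(n_features)]
    # counts[j][c][v] = how often feature j takes value v in class c
    counts = [defaultdict(Counter) for _ in range(n_features)]
    for r, y in zip(rows, labels):
        for j, v in enumerate(r):
            counts[j][y][v] += 1

    def log_posterior(x, c):
        lp = math.log(class_counts[c] / len(labels))
        for j, v in enumerate(x):
            lp += math.log((counts[j][c][v] + alpha)
                           / (class_counts[c] + alpha * len(domains[j])))
        return lp

    def predict(x):
        return max(class_counts, key=lambda c: log_posterior(x, c))

    return predict, domains


def extract_rules(predict, domains):
    """One IF-THEN rule (antecedent tuple, predicted class) per value combination."""
    return [(combo, predict(combo)) for combo in product(*domains)]


def rule_metrics(rule, rows, labels, m=2.0):
    """Coverage, accuracy, Laplace and m-estimate of one rule on labeled data."""
    antecedent, consequent = rule
    covered = [y for r, y in zip(rows, labels) if tuple(r) == antecedent]
    n_cov, n_c = len(covered), covered.count(consequent)
    k = len(set(labels))                          # number of classes
    prior = labels.count(consequent) / len(labels)
    return {
        "coverage": n_cov / len(rows),
        "accuracy": n_c / n_cov if n_cov else 0.0,
        "laplace": (n_c + 1) / (n_cov + k),
        "m_estimate": (n_c + m * prior) / (n_cov + m),
    }
```

Exhaustive enumeration grows as the product of the attribute domain sizes. Continuous attributes, such as those in Iris, would therefore need discretization and some rule pruning before this becomes practical.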
A data mining approach for classification of traffic violations types
A traffic summons, also known as a traffic ticket, is a notice issued by a law enforcement official to a motorist, that is, a person who drives a car, lorry, or bus, or who rides a motorcycle. This study performs a comparative experiment on the performance of three classification algorithms (Naive Bayes, Gradient Boosted Trees, and Deep Learning) in classifying traffic violation types. The performance of the three classification models developed in this work is measured and compared. The results show that Gradient Boosted Trees and Deep Learning achieve the best accuracy and recall but low precision. Naïve Bayes, on the other hand, has high recall; it is a picky classifier that performs well only on datasets that are high in precision. These results could serve as a baseline for investigations into the classification of traffic violation types. They may also help authorities devise strategies to reduce traffic violations among road users by studying the most common violation types in an area, whether a citation, a warning, or an ESERO (Electronic Safety Equipment Repair Order).
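Because the comparison hinges on accuracy, precision, and recall, the following sketch shows how these are commonly computed one-vs-rest for a multi-class problem such as citation / warning / ESERO. It also computes macro averages. The function name and the sample labels are illustrative; the paper's exact averaging scheme is not stated in the abstract.

```python
def evaluate(y_true, y_pred):
    """Accuracy, per-class (one-vs-rest) precision/recall, and macro averages."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = {"precision": precision, "recall": recall}
    macro_p = sum(v["precision"] for v in per_class.values()) / len(classes)
    macro_r = sum(v["recall"] for v in per_class.values()) / len(classes)
    return accuracy, per_class, macro_p, macro_r


y_true = ["Citation", "Warning", "Warning", "ESERO", "Warning"]
y_pred = ["Citation", "Warning", "Citation", "ESERO", "Warning"]
acc, per_class, macro_p, macro_r = evaluate(y_true, y_pred)
print(acc, per_class["Citation"], per_class["Warning"])
```

In this toy run, predicting a warning as a citation lowers precision for "Citation" and recall for "Warning". This illustrates how a model can score well on one metric and poorly on the other.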
Intelligent feature selection using particle swarm optimization algorithm with a decision tree for DDoS attack detection
The explosive development of information technology has been accompanied by a steady rise in cyber-attacks. A distributed denial of service (DDoS) attack is a malicious threat in the modern cyber-security landscape that disrupts the performance of network servers. It is a pernicious type of attack that can direct a large volume of traffic at one or all of a target's resources simultaneously, preventing authenticated users from accessing network services. The paper aims to select the smallest set of relevant features for DDoS attack detection by designing an intelligent wrapper feature selection model that combines a binary particle swarm optimization (BPSO) algorithm with a decision tree classifier. BPSO is used to solve the discrete optimization problem of feature selection, while the decision tree classifier serves as the performance evaluator, measuring the wrapper model's accuracy using the features selected from the network traffic flows. The model's intelligence is indicated by its selection of 19 suitable features out of the dataset's 76. The experiments were conducted on a large DDoS dataset. The selected features were evaluated with different machine learning algorithms using accuracy, recall, precision, and F1-score for DDoS attack detection. The proposed model achieved high accuracy with a decision tree classifier (99.52%), random forest (96.94%), and multi-layer perceptron (90.06%). The paper also compares the outcome of the proposed model with previous feature selection models in terms of these metrics. This outcome will be useful for improving machine-learning-based DDoS attack detection systems. It could also be applied to related topics such as DDoS attack detection in cloud environments and DDoS attack mitigation systems.
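A minimal sketch of binary PSO for wrapper feature selection follows. It uses the common sigmoid transfer function, which turns each velocity component into the probability that the corresponding feature bit is 1. In the paper's wrapper model the fitness would be decision tree accuracy on the selected columns. Here a hypothetical `toy_fitness`, which rewards three assumed "relevant" features and lightly penalizes subset size, stands in so the example is self-contained. The parameter values (`w`, `c1`, `c2`, `v_max`, swarm size) are illustrative assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(n_features, fitness, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))   # clamp the velocity
                # Sigmoid transfer: bit is 1 with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


RELEVANT = {0, 3, 5}  # hypothetical "truly relevant" feature indices


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward relevant features, penalize size."""
    selected = {i for i, b in enumerate(mask) if b}
    return len(RELEVANT & selected) / len(RELEVANT) - 0.01 * len(selected)


best_mask, best_fit = binary_pso(12, toy_fitness, seed=1)
print(best_mask, round(best_fit, 2))
```

The size penalty plays the role of preferring the "least number" of features. A real wrapper might combine cross-validated accuracy with a similar term.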
An improved K-Nearest neighbour with grasshopper optimization algorithm for imputation of missing data
K-nearest neighbors (KNN) has been used extensively as an imputation algorithm to substitute missing data with plausible values. One reason for the success of KNN imputation is its ability to robustly estimate missing values from the nearest neighbors. Despite these favorable points, KNN still has drawbacks: high time complexity and the difficulty of choosing the right k and an appropriate distance function. This paper therefore proposes a novel method for imputing missing data, named KNNGOA, which optimizes the KNN imputation technique using the grasshopper optimization algorithm (GOA). The GOA is designed to find the best value of k and to optimize the value imputed by KNN so as to maximize imputation accuracy. Experiments were conducted on different types of datasets collected from the UCI repository, with missing-value rates of 10%, 30%, and 50%. The proposed algorithm achieved promising results, outperforming the other methods, especially in terms of accuracy.
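The following sketch shows plain KNN imputation together with one way to tune k. Each missing cell is estimated as the mean of that column over the k nearest rows, with distance measured on the features both rows observe. k is chosen by leave-one-out RMSE on observed cells. Note that the tuning step is a simple grid search standing in for the paper's grasshopper optimization algorithm, and that the GOA's refinement of the imputed value itself is not reproduced. Function names and the toy data are hypothetical.

```python
import math


def knn_estimate(data, row, col, k):
    """Mean of column `col` over the k rows nearest to data[row] (None if no neighbor).

    data[row][col] itself is never read, so this also serves for leave-one-out checks.
    """
    target = data[row]
    candidates = []
    for i, other in enumerate(data):
        if i == row or other[col] is None:
            continue
        shared = [j for j in range(len(target))
                  if j != col and target[j] is not None and other[j] is not None]
        if not shared:
            continue
        d = math.sqrt(sum((target[j] - other[j]) ** 2 for j in shared) / len(shared))
        candidates.append((d, other[col]))
    if not candidates:
        return None
    candidates.sort(key=lambda t: t[0])
    nearest = candidates[:k]
    return sum(v for _, v in nearest) / len(nearest)


def choose_k(data, ks):
    """Grid search over ks (stand-in for GOA) minimizing leave-one-out RMSE."""
    best_k, best_rmse = None, math.inf
    for k in ks:
        errors = []
        for r, values in enumerate(data):
            for c, v in enumerate(values):
                if v is None:
                    continue
                est = knn_estimate(data, r, c, k)
                if est is not None:
                    errors.append((est - v) ** 2)
        if errors:
            rmse = math.sqrt(sum(errors) / len(errors))
            if rmse < best_rmse:
                best_k, best_rmse = k, rmse
    return best_k, best_rmse


def impute(data, k):
    """Copy of data with each None replaced by its KNN estimate."""
    filled = [list(r) for r in data]
    for r, values in enumerate(data):
        for c, v in enumerate(values):
            if v is None:
                filled[r][c] = knn_estimate(data, r, c, k)
    return filled


data = [[1.0, 2.0], [1.1, 2.1], [5.0, 10.0], [1.05, None]]
k, rmse = choose_k(data, [1, 2, 3])
print(k, impute(data, k))
```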
Coloring picture fuzzy graphs through their cuts and its computation
In a fuzzy set (FS), there is the concept of the α-cut of the FS for α ∈ [0,1]. This concept was extended to (α,δ)-cuts in an intuitionistic fuzzy set (IFS) for δ ∈ [0,1]. One extension of FS and IFS is the picture fuzzy set (PFS), and accordingly the (α,δ)-cut was developed into the (α,δ,β)-cut of a PFS, where β ∈ [0,1]. Since a picture fuzzy graph (PFG) consists of a picture fuzzy vertex set, a picture fuzzy edge set, or both, we construct the notion of (α,δ,β)-cuts in a PFG. The research method consists of developing theory and algorithms. The objectives of this research are to construct the concept of (α,δ,β)-cuts in PFGs, to construct the (α,δ,β)-cut coloring of PFGs, and to design an algorithm for finding the cut chromatic numbers of PFGs. The first result is a definition of the (α,δ,β)-cut of a PFG, where (α,δ,β) is an element of a level set of the PFG; some properties of the cuts are also proved. The second result is a concept of PFG coloring and of the chromatic number of a PFG based on the cuts. The third result is an algorithm to find the cuts and the chromatic numbers of PFGs. Finally, the algorithm is evaluated through MATLAB programming. This research could be used to solve some problems related to the theory and applications of PFGs.
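The following sketch illustrates cut coloring for small graphs. It assumes, for illustration only, that vertices are crisp and that each edge carries picture fuzzy grades (μ, η, ν) for membership, neutral membership, and non-membership. An edge is taken to belong to the (α,δ,β)-cut when μ ≥ α, η ≥ δ, and ν ≤ β; the paper's exact definition may differ. The cut chromatic number is then the ordinary chromatic number of the resulting crisp graph, computed here by exact backtracking. This is only feasible for small graphs. Function names and the example grades are hypothetical.

```python
def edge_cut(edges, alpha, delta, beta):
    """Crisp edge list of the cut (assumed rule: mu >= alpha, eta >= delta, nu <= beta)."""
    return [e for e, (mu, eta, nu) in edges.items()
            if mu >= alpha and eta >= delta and nu <= beta]


def chromatic_number(n, edges):
    """Exact chromatic number of a crisp graph on vertices 0..n-1 (backtracking)."""
    if n == 0:
        return 0
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = sorted(range(n), key=lambda v: -len(adj[v]))  # high degree first
    colors = [-1] * n

    def colorable(idx, k):
        if idx == n:
            return True
        v = order[idx]
        for c in range(k):
            if all(colors[u] != c for u in adj[v]):
                colors[v] = c
                if colorable(idx + 1, k):
                    return True
                colors[v] = -1
        return False

    for k in range(1, n + 1):
        colors[:] = [-1] * n
        if colorable(0, k):
            return k
    return n


# Hypothetical PFG: edge -> (mu, eta, nu) with mu + eta + nu <= 1.
pfg_edges = {(0, 1): (0.6, 0.1, 0.2), (1, 2): (0.7, 0.1, 0.1),
             (0, 2): (0.5, 0.2, 0.3), (2, 3): (0.3, 0.1, 0.5)}
for level in [(0.5, 0.1, 0.3), (0.6, 0.1, 0.3)]:
    print(level, chromatic_number(4, edge_cut(pfg_edges, *level)))
```

Raising α drops the weaker triangle edge (0, 2). This changes the cut from a triangle to a path and lowers the cut chromatic number from 3 to 2.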
Evaluation of texture feature based on basic local binary pattern for wood defect classification
Wood defect detection has been widely studied in recent years. Its aims are to detect defects on the wood surface and to help manufacturers obtain clear wood for producing high-quality products, since defects reduce the quality of wood. This research proposes an effective feature extraction technique, the local binary pattern (LBP), combined with a common classifier, the Support Vector Machine (SVM). Our goal is to classify natural defects on the wood surface. First, preprocessing converts the RGB images into grayscale images. Then, LBP feature extraction is applied with eight neighbors (P=8) and several radius values (R). Finally, the SVM classifier performs the classification, and the performance of the proposed technique is measured. The experimental results show an average accuracy of 65% on the balanced dataset with P=8 and R=1, indicating that the proposed technique works moderately well for classifying wood defects. This study contributes to the overall wood defect detection framework, which benefits the automated inspection of wood defects in general.
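The following sketch covers the preprocessing and feature extraction steps for the best-performing setting, P=8 and R=1. It converts RGB to grayscale using the ITU-R BT.601 luma weights, computes the basic 3×3 LBP code of every interior pixel, and normalizes the 256-bin histogram that would be fed to the SVM. It uses the square 3×3 neighborhood of the basic LBP; circular LBP with larger R interpolates sample points and is not shown. The grayscale weights and the function names are assumptions, not taken from the paper.

```python
def to_grayscale(rgb):
    """RGB image (rows of (r, g, b) tuples) to grayscale using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]


# The 8 neighbors at radius 1 (P=8, R=1), clockwise from the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Basic LBP: bit p is set when neighbor p is >= the center pixel."""
    center = img[r][c]
    code = 0
    for p, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << p
    return code


def lbp_histogram(img):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256
```

Each image then yields one 256-dimensional feature vector, and these vectors form the SVM training set.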
Reversible difference expansion multi-layer data hiding technique for medical images
Maintaining the privacy and security of confidential information in data communication has always been a major concern, because advances in information technology are likely to be followed by an increase in cybercrime, such as illegal access to sensitive data. Several techniques have been proposed to overcome this issue, for example, hiding data in digital images. Reversible data hiding is an excellent approach for concealing private data because it can be applied in various fields. However, it yields a limited payload, and the payload and the quality of the image holding the data (the stego image) often cannot be addressed simultaneously. This paper addresses the problem by introducing a new low-complexity, block-based, reversible multi-layer data hiding technique built on difference expansion (DE). Sensitive data are embedded into the difference values calculated between the original pixels in each block, with relatively low complexity. To improve payload capacity, confidential data are embedded in multiple layers of grayscale medical images while preserving their quality. The experimental results show that the proposed technique increases the payload, with an average of 369,999 bits. It also keeps the peak signal-to-noise ratio (PSNR) at an average of 36.506 dB on medical images while providing adequate security for the embedded private data. The proposed method improves performance, especially the secret size, without greatly reducing quality. Therefore, it is suitable for relatively large payloads.
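The following sketch shows the classic (Tian-style) difference expansion on a single pixel pair, the building block that DE-based schemes extend. The integer average l = ⌊(x+y)/2⌋ is preserved. The difference h = x − y is expanded to 2h + b to carry one bit b, and both steps invert exactly. Multi-layer embedding simply applies the same transform again to the stego pair. The paper's block-based scheme and any location map for non-expandable pairs are not reproduced, and the function names are hypothetical.

```python
def de_embed(x, y, bit):
    """Embed one bit into an 8-bit pixel pair by difference expansion."""
    l = (x + y) // 2          # integer average, unchanged by embedding
    h = x - y                 # difference to be expanded
    h2 = 2 * h + bit          # expanded difference carries the bit in its LSB
    x2 = l + (h2 + 1) // 2
    y2 = l - h2 // 2
    if not (0 <= x2 <= 255 and 0 <= y2 <= 255):
        raise ValueError("pair is not expandable without overflow")
    return x2, y2


def de_extract(x2, y2):
    """Recover the original pair and the embedded bit."""
    l = (x2 + y2) // 2
    h2 = x2 - y2
    bit = h2 & 1              # LSB of the expanded difference
    h = h2 >> 1               # floor(h2 / 2), also correct for negative values
    return l + (h + 1) // 2, l - h // 2, bit


# Two layers: embed into the stego pair again, then extract in reverse order.
layer1 = de_embed(100, 105, 0)
layer2 = de_embed(*layer1, 1)
x1, y1, b2 = de_extract(*layer2)
print(layer1, layer2, de_extract(x1, y1), b2)
```

Each layer doubles the pixel difference. This is why payload gains from extra layers come at the cost of PSNR, and why overflow checks become more restrictive with each layer.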
Adjusting cyber insurance premiums based on frequency in a communication network
This study compares cyber insurance premiums with and without the effect of communication frequency in a network. As a cybersecurity factor, the frequency of communication in a network influences how quickly a cyberattack spreads: a high-activity node or network is more vulnerable than a low-activity one. Traditionally, cyber insurance pricing relies on historical data to set premiums or rates. Alternatively, the network security level can be evaluated using Monte Carlo simulation based on an epidemic model. This simulation requires spreading parameters such as the infection rate, recovery rate, and self-infection rate. Our idea is to modify the infection rate as a function of the communication frequency in the network. A node-based model uses probability distributions for the communication mechanism to generate the data. It adopts the co-purchase network formation from market basket analysis to build weighted edges and nodes. Simulations are used to compare the initial and modified infection rates, with the prism and Petersen graph topologies as case studies. The relative difference is used as the metric to assess the significance of the premium adjustment. The results show that the premium for a node with a low activity level in the communication network can be up to 28.28% lower than the initial premium. For a network, the premium can be up to 20.99% lower than the initial network premium. Based on these results, insurance companies can adjust cyber insurance premiums according to computer usage to offer a more appropriate price.
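The following sketch shows the simulation pipeline on the Petersen graph. It uses a discrete-time SIS epidemic with a per-edge infection probability, a recovery probability, and a self-infection probability. It reports each node's long-run fraction of time infected and compares premiums with the relative difference. Several pieces are assumptions for illustration: the premium is taken as a fixed loss times the infected fraction, and the modified rate is the base rate scaled by a hypothetical normalized communication frequency per edge. All parameter values are illustrative. The paper's co-purchase weighting and premium formula are not reproduced.

```python
import random


def petersen_edges():
    outer = [(i, (i + 1) % 5) for i in range(5)]           # outer 5-cycle
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # inner pentagram
    spokes = [(i, i + 5) for i in range(5)]
    return outer + inner + spokes


def simulate_sis(n, edge_rates, recovery, self_infection, steps=100, runs=100, seed=0):
    """Monte Carlo discrete-time SIS; returns each node's fraction of time infected.

    edge_rates maps an edge (u, v) to the per-step probability that an
    infected endpoint infects the other endpoint.
    """
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for (u, v), b in edge_rates.items():
        nbrs[u].append((v, b))
        nbrs[v].append((u, b))
    infected_steps = [0] * n
    for _ in range(runs):
        state = [False] * n
        state[rng.randrange(n)] = True           # one random initial infection
        for _ in range(steps):
            nxt = state[:]
            for i in range(n):
                if state[i]:
                    nxt[i] = rng.random() >= recovery   # stays infected w.p. 1 - recovery
                else:
                    escape = 1.0 - self_infection
                    for j, b in nbrs[i]:
                        if state[j]:
                            escape *= 1.0 - b
                    nxt[i] = rng.random() < 1.0 - escape
            state = nxt
            for i in range(n):
                infected_steps[i] += state[i]
    return [s / (steps * runs) for s in infected_steps]


def relative_difference(modified, initial):
    """Percentage change of the modified premium relative to the initial one."""
    return 100.0 * (modified - initial) / initial


if __name__ == "__main__":
    edges = petersen_edges()
    base_rate, loss = 0.2, 1000.0                 # hypothetical values
    freq = {e: 0.5 for e in edges}                # hypothetical normalized frequencies
    initial = simulate_sis(10, {e: base_rate for e in edges}, 0.3, 0.01)
    modified = simulate_sis(10, {e: base_rate * freq[e] for e in edges}, 0.3, 0.01)
    for node in range(10):
        print(node, round(relative_difference(loss * modified[node], loss * initial[node]), 2))
```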
Hybrid approach redefinition with cluster-based instance selection in handling class imbalance problem
Class imbalance problems often occur in classification. They are characterized by one class having far more instances than the others. This tends to cause low accuracy on the minority classes, which have fewer instances, and causes important information about the minority classes to be missed. Various methods have been applied to overcome the class imbalance problem. One of them is the Hybrid Approach Redefinition method, which is one of the hybrid ensemble methods. Attention to classifier performance has led to an understanding of the importance of selecting the instances used to build a classifier. In the classic Hybrid Approach Redefinition method, this selection is done randomly using Random Under Sampling. It is therefore worth studying what performance is obtained if the sampling process instead selects existing instances on a cluster basis. The purpose of this study is to apply the Hybrid Approach Redefinition method with a Cluster-Based Instance Selection (CBIS) approach to obtain better classifier performance. The results show that Hybrid Approach Redefinition with cluster-based instance selection gives better results in the number of classifiers, data diversity, and classifier performance than the classic Hybrid Approach Redefinition.
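The following sketch contrasts cluster-based instance selection with random undersampling. The majority class is clustered with k-means, and the majority instance nearest each centroid is kept, so the retained instances cover the majority class's structure rather than a random subset. This is one common CBIS variant, assumed here for illustration. The paper's exact selection rule and its integration into the Hybrid Approach Redefinition ensemble are not shown, and the function names are hypothetical.

```python
import math
import random


def _assign(points, centroids):
    """Group points by their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = _assign(points, centroids)
        new = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, _assign(points, centroids)


def cluster_based_undersample(majority, n_keep, seed=0):
    """Keep the majority instance closest to each of n_keep k-means centroids."""
    centroids, clusters = kmeans(majority, n_keep, seed=seed)
    return [min(cl, key=lambda p: math.dist(p, cen))
            for cen, cl in zip(centroids, clusters) if cl]


majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
minority = [(5, 5), (6, 5)]
balanced = cluster_based_undersample(majority, len(minority)) + minority
print(balanced)
```

An empty cluster can leave fewer than `n_keep` instances. A fuller implementation would re-seed such clusters.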
Similarity measure fuzzy soft set for phishing detection
Phishing is a serious web security problem. It is an internet fraud technique that mirrors genuine websites to trick online users into revealing sensitive personal information such as bank account details, usernames, credit card numbers, and passwords. Early detection can prevent phishing and enables prompt protection of personal information. Classification methods can be used to predict phishing behavior. This paper presents an intelligent classification model for detecting phishing by redefining fuzzy soft set (FSS) theory for better computational performance. Four types of similarity measures are compared: (1) comparison table, (2) matching function, (3) similarity measure, and (4) distance measure. The experiment showed that the similarity measure performs better than the others in accuracy and recall, reaching 95.45% and 99.77%, respectively. The results suggest that the FSS similarity measure is more precise than the others and that FSS could be a promising approach to preventing phishing. The method could be implemented in social media software as an early warning system for users. The model can be used for personal or commercial purposes in social media applications to protect sensitive data.
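The following sketch shows how a similarity-based FSS classifier can work. Each class is summarized as a fuzzy soft set whose parameters are website features, with the mean membership grade per feature. A new instance is assigned to the class with the highest similarity S = 1 − Σ|f − g| / Σ(f + g), a measure used in the fuzzy soft set literature and assumed here. The abstract does not say which formula the paper uses, and the feature names, grades, and function names are hypothetical.

```python
def fss_similarity(f, g):
    """Similarity of two fuzzy soft sets over the same parameters and objects.

    f and g map each parameter to a list of membership grades (one per object).
    S = 1 - sum|f - g| / sum(f + g); identical sets give 1.0.
    """
    num = den = 0.0
    for param in f:
        for a, b in zip(f[param], g[param]):
            num += abs(a - b)
            den += a + b
    return 1.0 - num / den if den else 1.0


def class_prototypes(samples, labels):
    """Per class, the mean grade of each parameter as a one-object fuzzy soft set."""
    protos = {}
    for c in set(labels):
        members = [s for s, y in zip(samples, labels) if y == c]
        protos[c] = {p: [sum(m[p] for m in members) / len(members)] for p in members[0]}
    return protos


def classify(sample, protos):
    """Assign the class whose prototype is most similar to the sample."""
    as_fss = {p: [v] for p, v in sample.items()}
    return max(protos, key=lambda c: fss_similarity(as_fss, protos[c]))


samples = [{"url_length": 0.9, "has_ip": 0.8, "https": 0.1},
           {"url_length": 0.9, "has_ip": 0.8, "https": 0.1},
           {"url_length": 0.2, "has_ip": 0.0, "https": 0.9}]
labels = ["phishing", "phishing", "legitimate"]
protos = class_prototypes(samples, labels)
print(classify({"url_length": 0.8, "has_ip": 1.0, "https": 0.0}, protos))
```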