Computer Science Journal (AGH University of Science and Technology, Krakow)
476 research outputs found
Named-entity recognition for Hindi language using context pattern-based maximum entropy
This paper describes a Named Entity Recognition (NER) system for the Hindi language using two methodologies: an existing BaseLine Maximum Entropy-based Named Entity (BL-MENE) model and the Context Pattern-based MENE (CP-MENE) framework proposed in this work. BL-MENE utilizes several features for the NER task but suffers from inaccurate Named Entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essentials are missing. CP-MENE incorporates an extensive set of features and patterns to overcome these problems. The CP-MENE features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship to extract high-ranked NE patterns, which are generated through regular expressions in Python code. Since Web content in the Hindi language is growing, especially in health-care applications, this work is conducted on the Hindi Health Data (HHD) corpus, a Kaggle dataset. We conducted experiments on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP). Researchers usually work on PER NEs within news articles, while other NEs, especially those related to the health-care domain such as the DIS, CNS, and SMP types, are left out; these are incorporated in this research. CP-MENE improved the classification performance for NEs, and the F-measures achieved are 79.68% for PER, 72.50% for DIS, 68.78% for CNS, and 67.23% for SMP, which are comparable to those of other NER approaches.
Sparse data classifier based on the first-past-the-post voting system
Point of Interest (POI) is a general term for objects describing places in the real world. POI matching, i.e. determining whether two sets of attributes represent the same location, is not a trivial challenge due to the large variety of data sources. The representation of POIs may vary depending on the database in which they are stored. Manual comparison of objects is not achievable in real time, so multiple solutions for automatic merging exist. However, no efficient solution that accounts for missing attributes has been proposed so far. In this paper, we propose the Multilayered Hybrid Classifier, which is composed of machine learning and deep learning techniques supported by the first-past-the-post voting system. We examined different weights for constituencies, which were taken into consideration during the majority (or supermajority) decision. As a result, we achieved slightly higher accuracy than the current best model, Random Forest, which is itself based on voting.
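A weighted vote with an optional supermajority threshold can be sketched as follows. This is a minimal illustration, not the paper's Multilayered Hybrid Classifier: the function name `weighted_vote`, the voter names, and the weights are all hypothetical, and each "constituency" is assumed to cast a single label with a positive weight.

```python
from collections import defaultdict


def weighted_vote(votes, weights, supermajority=None):
    """Pick the label with the largest weighted vote total.

    votes: mapping of voter name -> predicted label (hypothetical names).
    weights: mapping of voter name -> positive weight; missing voters count as 1.0.
    supermajority: optional minimum share (0..1) the winner must reach.
    Returns (winner, share); winner is None when the threshold is not met.
    """
    totals = defaultdict(float)
    for voter, label in votes.items():
        totals[label] += weights.get(voter, 1.0)
    # Plurality rule: the label with the highest total wins
    # (on a tie, the label that was seen first is returned).
    winner = max(totals, key=totals.get)
    share = totals[winner] / sum(totals.values())
    if supermajority is not None and share < supermajority:
        return None, share
    return winner, share


# Illustrative voters and weights, not taken from the paper.
votes = {"random_forest": "match", "mlp": "match", "name_similarity": "no_match"}
weights = {"random_forest": 2.0, "mlp": 1.0, "name_similarity": 1.5}
print(weighted_vote(votes, weights))
print(weighted_vote(votes, weights, supermajority=0.75))
```

With these weights the "match" label collects two thirds of the total, so it wins a plain plurality vote but fails a 0.75 supermajority requirement.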
Gramian Angular Field Transformation-Based Intrusion Detection
Cyber threats are increasing progressively in their frequency, scale, sophistication, and cost. The advancement of such threats has raised the need for more intelligent intrusion-detection systems. In this study, a different perspective on intrusion detection has been developed. Gramian angular fields were adapted to encode network traffic data as images. This opens a way to reveal bilateral feature relationships and to benefit from the visual interpretation capability of deep-learning methods. The image-encoded intrusions were then classified in binary and multi-class settings using convolutional neural networks. The obtained results were compared to both conventional machine-learning methods and related studies. According to the results, the proposed approach surpassed traditional methods and produced success rates close to those of the related studies. Although the related studies use complex mechanisms such as feature extraction, feature selection, class balancing, virtual data generation, or ensemble classifiers, the proposed approach is fairly plain, involving only data-image conversion and classification. This shows the power of simply changing the problem space.
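The Gramian angular field encoding can be sketched in a few lines. This is a minimal sketch of the standard definition (min-max rescaling to [-1, 1], polar angle via arccos, then pairwise cosine of sums or sine of differences); whether the paper uses the summation or the difference variant, and how it orders the traffic features, is not stated in the abstract, so the function name and the `kind` parameter are assumptions.

```python
import math


def gramian_angular_field(values, kind="summation"):
    """Encode a 1-D feature vector as a square Gramian angular field matrix."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Rescale to [-1, 1]; a constant vector maps to all zeros.
    scaled = [0.0 if span == 0 else 2.0 * (v - lo) / span - 1.0 for v in values]
    # Clamp against rounding error before taking the polar angle.
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    if kind == "summation":
        return [[math.cos(a + b) for b in phi] for a in phi]
    if kind == "difference":
        return [[math.sin(a - b) for b in phi] for a in phi]
    raise ValueError("kind must be 'summation' or 'difference'")


# Illustrative three-feature record, not real traffic data.
image = gramian_angular_field([0.0, 0.5, 1.0])
for row in image:
    print([round(cell, 3) for cell in row])
```

Each cell relates one pair of features, which is what lets a convolutional network look at pairwise relationships as image texture.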
Automatic bridge between BPMN models and UML activity diagrams based on graph transformation
Model Driven Engineering (MDE) provides tools, concepts, and languages to create and transform models. One of the most important successes of MDE is model transformation, which permits transforming models used by one community into equivalent models used by another. Moreover, each community of developers has its own tools for verification, testing, and test case generation. Hence, a developer of one community who moves to work with another community needs a transformation process from the second community to his or her own community and vice versa. The target community can therefore benefit from the expertise of the source community, and developers do not have to start from zero. In this context, we propose an automatic transformation to create a bridge between the BPMN and UML communities. We propose an approach and a visual tool for the automatic transformation of BPMN models to UML Activity Diagrams (UML-AD). The proposed approach is based on Meta-Modeling and Graph Transformation, and uses the AToM3 tool. We were inspired by the OMG meta-models of BPMN and UML-AD and implemented versions of both meta-models using AToM3. The latter allows a visual modeling tool to be generated automatically for each proposed meta-model. Based on these two meta-models, we propose a graph grammar composed of sixty rules that perform the transformation process. The proposed approach is illustrated through three case studies.
A Reversible Data Hiding Scheme through Encryption using Rotated Stream Cipher
Research in the domain of reversible data hiding has received much attention in recent years due to its wide applications in medical image transmission and cloud computing. Reversible data hiding during image encryption is a recently emerged framework for hiding secret data in an image during the image encryption process. In this manuscript, we propose a new reversible data hiding through encryption scheme that ensures a high embedding rate without any additional key-handling overhead. The proposed algorithm can use any secure symmetric encryption scheme, and the encryption and/or decryption key should be shared with the receiver for data extraction and image recovery. In the proposed scheme, the data hider can hide three bits of the secret message in an image block of size pixels. Data extraction and image recovery are carried out by analyzing the closeness between adjacent pixels. Simulation of the new scheme on the USC-SIPI dataset shows that it outperforms well-known existing schemes in embedding rate and bit error rate.
Formal verification of the extension of iStar to support Big data projects
Identifying all the right requirements is indispensable for the success of any system. These requirements need to be engineered with precision in the early phases. Principally, the cost of late corrections is estimated to be more than 200 times that of corrections made during requirements engineering (RE). In the Big data area especially, this becomes more and more crucial due to its importance and characteristics. After analyzing the literature, we note that current RE methods do not support the elicitation of requirements for Big data projects. In this study, we propose the novel BiStar method as an extension of iStar to address Big data characteristics such as volume, variety, etc. As a first step, we identify some missing concepts that current requirements engineering methods do not support. Next, BiStar, an extension of iStar, is developed to take Big data-specific characteristics into account while dealing with requirements. To ensure the integrity property of BiStar, formal proofs were made: we perform a bigraph-based description of iStar and BiStar. Finally, an application is conducted on iStar and BiStar for the same illustrative scenario. The results show BiStar to be more suitable for eliciting the requirements of Big data projects.
A Character Frequency based Approach to Search for Substrings of a Circular Pattern and its Conjugates in an Online Text
A fundamental problem in computational biology is dealing with circular patterns. The problem consists of finding substrings of at least a certain length of a pattern and its rotations in a database. In this paper, a novel method is presented to deal with circular patterns. The problem is solved in two incremental steps. First, an algorithm is provided that reports all substrings of a given linear pattern in an online text. Next, without losing efficiency, the algorithm is extended to process all circular rotations of the pattern. For a given pattern P of size M and a text T of size N, the algorithm reports all locations in the text where a substring of Pc is found, where Pc is one of the rotations of P. For an alphabet of size σ, using O(M) space, the desired goals are achieved in O(MN/σ) time on average, which is O(N) for all patterns of length M ≤ σ. Traditional string processing algorithms make use of advanced data structures such as suffix trees and automata. We show that basic data structures such as arrays can be used in text processing algorithms without compromising efficiency.
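To make the problem statement concrete, a simple baseline is sketched here. It is not the character-frequency algorithm of the paper and does not achieve its O(M) space or O(MN/σ) average time; it relies only on the fact that every length-k substring of any rotation of P appears as a window of P followed by its first k-1 characters. The function name `circular_kmer_hits` and the fixed window length `k` are assumptions made for illustration.

```python
def circular_kmer_hits(pattern, text, k):
    """Return start positions in text of length-k substrings of any rotation of pattern."""
    m = len(pattern)
    if not 1 <= k <= m:
        raise ValueError("k must satisfy 1 <= k <= len(pattern)")
    # Appending the first k-1 characters exposes windows that wrap around the end.
    doubled = pattern + pattern[:k - 1]
    kmers = {doubled[i:i + k] for i in range(m)}
    # Slide a window of length k over the text and test set membership.
    return [i for i in range(len(text) - k + 1) if text[i:i + k] in kmers]


# "dab" wraps around the end of "abcd"; "bcd" is an ordinary substring.
print(circular_kmer_hits("abcd", "xxdabyybcdz", 3))  # [2, 7]
```

The set-based lookup trades memory for simplicity; the array-based approach described in the abstract is aimed at avoiding exactly this kind of auxiliary structure.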
A Novel Framework for Aspect Knowledgebase Generated Automatically from Social Media Using Pattern Rules
One of the factors that improves businesses in business intelligence is summarization systems that generate sentiment-based summaries from social media. However, these systems cannot produce summaries fully automatically, because they rely on annotated datasets. To produce sentiment summaries automatically without annotated datasets, we propose a novel framework using pattern rules. The framework has two procedures: 1) pre-processing and 2) aspect knowledgebase generation. The first procedure checks and corrects misspelt words (bigrams and unigrams) using a proposed method and tags all words with their part of speech. The second procedure automatically generates the aspect knowledgebase that sentiment summarization systems use to produce sentiment summaries. Pattern rules and semantic similarity-based pruning are used to generate the aspect knowledgebase automatically from social media. In the experiments, eight domains from benchmark review datasets are used. The performance evaluation shows that the proposed approach performs well compared to other approaches.
A Meta-Heuristic Approach Based on the Genetic and Greedy Algorithms to Solving Flexible Job-Shop Scheduling Problem
In today’s competitive business world, manufacturers need to accommodate customer demands with appropriate scheduling. This requires efficient manufacturing chain scheduling. One of the most important problems that has always been considered in the manufacturing and job-shop industries is offering various products according to the needs of customers in different periods of time, within the shortest possible time and at the lowest cost. Job-Shop Scheduling systems are one of the applications of group technology in industry, the purpose of which is to take advantage of the physical or operational similarities of products in various aspects of construction and design. These systems are also identified as Cellular Manufacturing Systems (CMS). Today, applying CMS and using its benefits has become very important as a possible way to increase the speed of the organization’s response to rapid market changes. In this paper, a meta-heuristic method that combines genetic and greedy algorithms is used to optimize and evaluate the performance criteria of the flexible job-shop scheduling problem. To improve the efficiency of the genetic algorithm, the initial population is generated by a greedy algorithm and several elitist operators are used to improve the solutions. The greedy algorithm used to generate the initial population prioritizes the cells and the jobs in each cell, and thus offers quality solutions. The proposed algorithm is tested on the P-FJSP dataset and compared with the state-of-the-art techniques in the literature. To evaluate performance, the diversity, spacing, quality, and run-time criteria were used in a multi-objective function. The simulation results indicate better performance of the proposed method compared to the NRGA and NSGA-II methods.
Classification of traffic over collaborative IoT and Cloud platforms using deep learning recurrent LSTM
Internet of Things (IoT) and cloud-based collaborative platforms have emerged as new infrastructures in recent decades. Classifying network traffic as benign or malevolent is indispensable for IoT-cloud-based collaborative platforms, so that channel capacity can be used optimally for transmitting benign traffic and malicious traffic can be blocked. The traffic classification mechanism should be dynamic and capable of classifying network traffic quickly, so that malevolent traffic can be identified at an early stage and benign traffic can be channelized to the destined nodes speedily. In this paper, we present a deep learning recurrent LSTM-based technique to classify traffic over IoT-cloud platforms. Machine learning techniques (MLTs) have also been employed to compare their performance with that of the proposed LSTM RNet classification method. In the proposed research work, network traffic is classified into three classes, namely Tor-Normal, NonTor-Normal, and NonTor-Malicious traffic. The research outcome shows that the proposed LSTM RNet classifies the traffic accurately and also helps in reducing network latency and in enhancing the data transmission rate as well as network throughput.