1,721,120 research outputs found

    Data-Centric AI

    No full text
    The evolution of Artificial Intelligence (AI) has been driven by two core components: data and algorithms. Historically, AI research has predominantly followed the Model-Centric paradigm, which focuses on developing and refining models, while often treating data as static. This approach has led to the creation of increasingly sophisticated algorithms, which demand vast amounts of manually labeled and meticulously curated data. However, as data becomes central to AI development, it is also emerging as a significant bottleneck. The Data-Centric AI (DCAI) paradigm shifts the focus towards improving data quality, enabling the achievement of accuracy levels that are unattainable with Model-Centric approaches alone. This special issue presents recent advancements in DCAI, offering insights into the paradigm and exploring future research directions, aiming to contextualize the contributions included in this issue

    Mining official data

    No full text
    In statistics, the term "official data" denotes data collected in censuses and statistical surveys by National Statistics Institutes (NSIs), as well as administrative and registration records collected by government departments and local authorities. They are used to produce "official statistics" for the purpose of making policy decisions, and to facilitate the appreciation of economic, social, demographic, and other matters of interest to governments, government departments, local authorities, businesses and to the general public. For instance, population and economic census information is of great value in planning public services (education, fund allocation, public transport), as well as in private businesses (placing new factories, shopping malls, or banks, as well as marketing particular products). Moreover, survey data on specific topics, such as labour force, time use, household budget, are regularly collected by NSIs to keep updated information on some economic and social phenomena. The application of data mining techniques to official data has great potential in supporting good public policy and in underpinning the effective functioning of a democratic society. Nevertheless, it is not straightforward and requires challenging methodological research, which is still in the initial stages. This special issue includes six papers which constitute updated and extended versions of papers selected from those presented at the Workshop on Mining Official Data, chaired by the guest editors of this issue in Helsinki in August 2002. The workshop was organized under the auspices of the European project KDNet (The Knowledge Discovery Network of Excellence) and within the framework of the 13th European Conference on Machine Learning (ECML'02) and the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02)

    Nearest cluster-based intrusion detection through convolutional neural networks

    Full text link
    The recent boom in deep learning has revealed that the application of deep neural networks is a valuable way to address network intrusion detection problems. This paper presents a novel deep learning methodology that uses convolutional neural networks (CNNs) to equip a computer network with an effective means to analyse traffic on the network for signs of malicious activity. The basic idea is to represent network flows as 2D images and use this imagery representation of the flows to train a 2D CNN architecture. The novelty consists in deriving an imagery representation of the network flows through performing a combination of the nearest neighbour search and the clustering process. The advantage is that the proposed data mapping method allows us to build imagery data that express potential data patterns arising at neighbouring flows. The proposed methodology leads to better predictive accuracy when compared to competitive intrusion detection architectures on three benchmark datasets

    Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection

    No full text
    Recognizing malware before its installation plays a crucial role in keeping an android device safe. In this paper we describe a supervised method that is able to analyse multiple information (e.g. permissions, api calls and network addresses) that can be retrieved through a broad static analysis of android applications. In particular, we propose a novel multi-view machine learning approach to malware detection, which couples knowledge extracted via both clustering and classification. In an assessment, we evaluate the effectiveness of the proposed method using benchmark Android applications and established machine learning metrics

    SILVIA: An eXplainable Framework to Map Bark Beetle Infestation in Sentinel-2 Images

    Full text link
    Recent long spells of high temperatures and drought-hit summers have fostered the conditions for an unprecedented outbreak of bark beetles in Europe. This phenomenon has ruined vast swathes of European conifer forests creating a need among forest managers to find effective methods to gather information about the mapping of bark beetle infestation hotspots. Sentinel-2 data have been recently established as an alternative to field surveys for certain inventory tasks. Hence, this study explores the achievements of machine learning to perform the inventory mapping of bark beetle infestation hotspots in Sentinel-2 images. In particular, we investigate the accuracy performance of a spectral classifier that is learned for the study task by leveraging spectral vegetation indices and performing self-training. We use a dataset of Sentinel-2 images acquired in nonoverlapping forest scenes from the North-east of France acquired in October 2018. The selected scenes host bark beetle infestation hotspots of different sizes, which originate from the mass reproduction of the bark beetle in the 2018 infestation. We perform a learning stage by accounting for the ground-truth bark beetle infestation masks of a subset of images in the study imagery dataset (training imagery set). The goal is to produce a prediction of the bark beetle infestation masks for the remaining images in the study imagery dataset (working imagery set). Moreover, we use an explainable artificial intelligence technique to study the relevance of spectral information and explain the effect of both self-training and spectral vegetation indices on the mapping decisions

    Autoencoder-based deep metric learning for network intrusion detection

    Full text link
    Nowadays intrusion detection systems are a mandatory weapon in the war against the ever-increasing amount of network cyber attacks. In this study we illustrate a new intrusion detection method that analyses the flow-based characteristics of the network traffic data. It learns an intrusion detection model by leveraging a deep metric learning methodology that originally combines autoencoders and Triplet networks. In the training stage, two separate autoencoders are trained on historical normal network flows and attacks, respectively. Then a Triplet network is trained to learn the embedding of the feature vector representation of network flows. This embedding moves each flow close to its reconstruction, restored with the autoencoder associated with the same class as the flow, and away from its reconstruction, restored with the autoencoder of the opposite class. The predictive stage assigns each new flow to the class associated with the autoencoder that restores the closest reconstruction of the flow in the embedding space. In this way, the predictive stage takes advantage of the embedding learned in the training stage, achieving a good prediction performance in the detection of new signs of malicious activities in the network traffic. In fact, the proposed methodology leads to better predictive accuracy when compared to competitive intrusion detection architectures on benchmark datasets
    corecore