Data-Centric AI

Authors: Malerba D., Pasquadibisceglie V.
Publication date: 01/01/2024

The evolution of Artificial Intelligence (AI) has been driven by two core components: data and algorithms. Historically, AI research has predominantly followed the Model-Centric paradigm, which focuses on developing and refining models while often treating data as static. This approach has produced increasingly sophisticated algorithms that demand vast amounts of manually labeled and meticulously curated data. However, as data becomes central to AI development, it is also emerging as a significant bottleneck. The Data-Centric AI (DCAI) paradigm shifts the focus towards improving data quality, making it possible to reach accuracy levels that Model-Centric approaches alone cannot attain. This special issue presents recent advances in DCAI, offering insights into the paradigm and exploring future research directions in order to contextualize the contributions included in the issue.

Archivio istituzionale della ricerca - Università di Bari

Learning to order basic components of structured complex objects

Authors: Malerba D., Ceci Michelangelo
Publication date: 01/01/2007

Archivio istituzionale della ricerca - Università di Bari

Mining official data

Authors: Malerba D., Brito P.
Publication date: 01/01/2003

In statistics, the term "official data" denotes data collected in censuses and statistical surveys by National Statistics Institutes (NSIs), as well as administrative and registration records collected by government departments and local authorities. These data are used to produce "official statistics" that inform policy decisions and help governments, government departments, local authorities, businesses, and the general public understand economic, social, demographic, and other matters of interest. For instance, population and economic census information is of great value in planning public services (education, fund allocation, public transport) as well as for private businesses (siting new factories, shopping malls, or banks, and marketing particular products). Moreover, NSIs regularly collect survey data on specific topics, such as the labour force, time use, and household budgets, to keep information on certain economic and social phenomena up to date. Applying data mining techniques to official data has great potential for supporting good public policy and underpinning the effective functioning of a democratic society. Nevertheless, doing so is not straightforward and requires challenging methodological research, which is still in its initial stages. This special issue includes six papers that are updated and extended versions of papers selected from those presented at the Workshop on Mining Official Data, chaired by the guest editors of this issue in Helsinki in August 2002. The workshop was organized under the auspices of the European project KDNet (The Knowledge Discovery Network of Excellence) and within the framework of the 13th European Conference on Machine Learning (ECML'02) and the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'02).

Archivio istituzionale della ricerca - Università di Bari

Flexible Matching of Boolean Symbolic Objects

Authors: Lisi Francesca Alessandra, Esposito Floriana, Malerba D.
Publication date: 01/01/1998

Archivio istituzionale della ricerca - Università di Bari

Nearest cluster-based intrusion detection through convolutional neural networks

Authors: Malerba D., Andresini G., Appice A.
Publication date: 01/01/2021

The recent boom in deep learning has revealed that deep neural networks are a valuable way to address network intrusion detection problems. This paper presents a novel deep learning methodology that uses convolutional neural networks (CNNs) to equip a computer network with an effective means of analysing its traffic for signs of malicious activity. The basic idea is to represent network flows as 2D images and use this imagery representation to train a 2D CNN architecture. The novelty lies in deriving the imagery representation of the network flows by combining nearest-neighbour search with clustering. The advantage is that the proposed data mapping method builds imagery data that express potential patterns arising among neighbouring flows. On three benchmark datasets, the proposed methodology achieves better predictive accuracy than competing intrusion detection architectures.

Archivio istituzionale della ricerca - Università di Bari
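The abstract above describes the flow-to-image mapping only at a high level. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: it clusters training flows with k-means (Euclidean distance, deterministic farthest-first initialisation), selects the cluster whose centroid is nearest to a new flow, and stacks the flow on top of its `n_neighbours` nearest members of that cluster, giving a 2D array that a 2D CNN could consume. The functions `kmeans` and `flow_to_image` and the zero-padding rule are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    X = np.asarray(X, dtype=float)
    seeds = [X[0]]
    for _ in range(k - 1):
        # Next seed: the point farthest from every seed chosen so far.
        d = np.min([np.linalg.norm(X - s, axis=1) for s in seeds], axis=0)
        seeds.append(X[d.argmax()])
    centroids = np.array(seeds)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return centroids, labels


def flow_to_image(x, X_train, centroids, labels, n_neighbours):
    """Map one flow to a (n_neighbours + 1) x n_features array (hypothetical mapping)."""
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    # 1. Nearest cluster: the centroid closest to the new flow.
    nearest = np.linalg.norm(centroids - x, axis=1).argmin()
    members = X_train[labels == nearest]
    # 2. Nearest neighbours of the flow inside that cluster.
    order = np.argsort(np.linalg.norm(members - x, axis=1))
    rows = members[order[:n_neighbours]]
    # 3. Zero-pad small clusters so every image has the same height.
    if len(rows) < n_neighbours:
        pad = np.zeros((n_neighbours - len(rows), X_train.shape[1]))
        rows = np.vstack([rows, pad])
    # The flow itself is the first row; its neighbours fill the following rows.
    return np.vstack([x[None, :], rows])


# Usage: two well-separated groups of toy training flows.
group_a = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]])
X_train = np.vstack([group_a, group_a + 10])
centroids, labels = kmeans(X_train, k=2)
image = flow_to_image([10.2, 10.3], X_train, centroids, labels, n_neighbours=3)
print(image.shape)  # (4, 2)
```

Putting the flow and its neighbours in adjacent rows is what lets a convolution see local patterns shared by similar flows; the paper's actual choice of neighbours, clusters, and image size may differ from this sketch.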

Machine Learning for Topographic Map Interpretation

Authors: Lanza Antonietta, Esposito Floriana, Malerba D.
Publication date: 01/01/2000

Archivio istituzionale della ricerca - Università di Bari

Clustering-Aided Multi-View Classification: A Case Study on Android Malware Detection

Authors: Malerba D., Andresini G., Appice A.
Publication date: 01/01/2020

Recognizing malware before it is installed plays a crucial role in keeping an Android device safe. In this paper, we describe a supervised method that analyses multiple types of information (e.g., permissions, API calls, and network addresses) retrieved through a broad static analysis of Android applications. In particular, we propose a novel multi-view machine learning approach to malware detection that couples knowledge extracted via both clustering and classification. We evaluate the effectiveness of the proposed method on benchmark Android applications using established machine learning metrics.

Archivio istituzionale della ricerca - Università di Bari
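The abstract does not detail how clustering and classification are coupled across views. One plausible sketch, assuming each view (for example permission flags and API-call counts) is a numeric matrix, clusters every view separately, appends each app's distances to that view's centroids to the raw view features, and trains a simple nearest-centroid classifier on the concatenation. All function and class names are hypothetical, and the nearest-centroid classifier stands in for whatever learner the paper actually uses.

```python
import numpy as np


def kmeans_centroids(X, k, n_iter=50):
    """Lloyd's k-means (farthest-first initialisation); returns only the centroids."""
    X = np.asarray(X, dtype=float)
    seeds = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - s, axis=1) for s in seeds], axis=0)
        seeds.append(X[d.argmax()])
    C = np.array(seeds)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C


def multi_view_features(views, centroids_per_view):
    """Raw features of every view plus, per view, distances to that view's centroids."""
    views = [np.asarray(V, dtype=float) for V in views]
    dists = [np.linalg.norm(V[:, None] - C[None], axis=2)
             for V, C in zip(views, centroids_per_view)]
    return np.hstack(views + dists)


class NearestCentroidClassifier:
    """Assigns each sample to the class whose mean feature vector is closest."""

    def fit(self, F, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([F[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, F):
        d = np.linalg.norm(F[:, None] - self.means_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]


# Usage with two toy views (label 0 = benign app, 1 = malware).
permissions = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]])
api_calls = np.array([[5, 0], [4, 1], [5, 1], [0, 5], [1, 4], [0, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
views = [permissions, api_calls]
centroids = [kmeans_centroids(V, k=2) for V in views]
F = multi_view_features(views, centroids)
clf = NearestCentroidClassifier().fit(F, y)
print(clf.predict(F).tolist())  # [0, 0, 0, 1, 1, 1]
```

Clustering each view separately keeps view-specific structure (such as groups of apps requesting similar permissions) visible to the classifier even when the views have very different scales.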

SILVIA: An eXplainable Framework to Map Bark Beetle Infestation in Sentinel-2 Images

Authors: Malerba D., Andresini G., Appice A.
Publication date: 01/01/2023

Recent long spells of high temperatures and drought-hit summers have fostered the conditions for an unprecedented outbreak of bark beetles in Europe. This phenomenon has ruined vast swathes of European conifer forests, creating a need among forest managers for effective methods of mapping bark beetle infestation hotspots. Sentinel-2 data have recently been established as an alternative to field surveys for certain inventory tasks. Hence, this study explores the use of machine learning to perform inventory mapping of bark beetle infestation hotspots in Sentinel-2 images. In particular, we investigate the accuracy of a spectral classifier learned for this task by leveraging spectral vegetation indices and self-training. We use a dataset of Sentinel-2 images of non-overlapping forest scenes in north-eastern France, acquired in October 2018. The selected scenes host bark beetle infestation hotspots of different sizes, which originate from the mass reproduction of the bark beetle during the 2018 infestation. The learning stage uses the ground-truth bark beetle infestation masks of a subset of images in the study imagery dataset (training imagery set). The goal is to predict the bark beetle infestation masks for the remaining images in the dataset (working imagery set). Moreover, we use an explainable artificial intelligence technique to study the relevance of spectral information and explain the effect of both self-training and spectral vegetation indices on the mapping decisions.

Archivio istituzionale della ricerca - Università di Bari
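Two ingredients named in the abstract, spectral vegetation indices and self-training, can be sketched as follows. The abstract does not say which indices or which classifier are used, so this example assumes NDVI (near-infrared band B8 versus red band B4) and NDMI (B8 versus short-wave infrared band B11), a logistic-regression classifier trained by gradient descent, and a hypothetical confidence threshold of 0.9 for accepting pseudo-labels from working-set pixels. The toy reflectance values are invented for illustration only.

```python
import numpy as np


def normalized_difference(a, b, eps=1e-9):
    """Generic normalized-difference index (a - b) / (a + b); eps avoids division by zero."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)


def pixel_features(bands):
    """Stack reflectances with two example indices; `bands` maps Sentinel-2 band names to values."""
    ndvi = normalized_difference(bands["B8"], bands["B4"])   # NIR vs red
    ndmi = normalized_difference(bands["B8"], bands["B11"])  # NIR vs short-wave infrared
    return np.column_stack([bands["B4"], bands["B8"], bands["B11"], ndvi, ndmi])


def _with_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])


def fit_logreg(X, y, lr=0.5, n_iter=2000):
    """Binary logistic regression trained with batch gradient descent."""
    Xb = _with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-_with_bias(X) @ w))


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Repeatedly move confidently pseudo-labelled working-set pixels into the training set."""
    X_lab, y_lab = np.asarray(X_lab, float), np.asarray(y_lab, float)
    remaining = np.asarray(X_unlab, float)
    for _ in range(max_rounds):
        w = fit_logreg(X_lab, y_lab)
        if len(remaining) == 0:
            return w
        p = predict_proba(w, remaining)
        confident = (p >= threshold) | (p <= 1 - threshold)
        if not confident.any():
            return w
        X_lab = np.vstack([X_lab, remaining[confident]])
        y_lab = np.concatenate([y_lab, (p[confident] >= 0.5).astype(float)])
        remaining = remaining[~confident]
    return fit_logreg(X_lab, y_lab)


# Toy reflectances: healthy pixels (label 0) vs. infested pixels (label 1).
healthy = {"B4": [0.04, 0.05], "B8": [0.40, 0.38], "B11": [0.15, 0.16]}
infested = {"B4": [0.10, 0.11], "B8": [0.20, 0.22], "B11": [0.25, 0.24]}
working = {"B4": [0.045, 0.105], "B8": [0.39, 0.21], "B11": [0.155, 0.245]}
X_lab = np.vstack([pixel_features(healthy), pixel_features(infested)])
y_lab = np.array([0, 0, 1, 1])
w = self_train(X_lab, y_lab, pixel_features(working))
print((predict_proba(w, pixel_features(working)) > 0.5).tolist())  # [False, True]
```

Using the working imagery set as the unlabelled pool mirrors the transductive setting in the abstract, where the classifier must label the remaining images of the same dataset.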

ECML/PKDD 2007 Workshop on “Multi Relational Data Mining”

Authors: Ceci Michelangelo, Appice Annalisa, Malerba D.
Publication date: 01/01/2007

Archivio istituzionale della ricerca - Università di Bari

Autoencoder-based deep metric learning for network intrusion detection

Authors: Malerba D., Andresini G., Appice A.
Publication date: 01/01/2021

Intrusion detection systems are now an indispensable weapon in the fight against the ever-increasing number of network cyber attacks. In this study, we illustrate a new intrusion detection method that analyses the flow-based characteristics of network traffic data. It learns an intrusion detection model through a deep metric learning methodology that combines autoencoders and Triplet networks in an original way. In the training stage, two separate autoencoders are trained on historical normal network flows and attacks, respectively. A Triplet network is then trained to learn an embedding of the feature vector representation of network flows. This embedding moves each flow close to its reconstruction by the autoencoder associated with the same class as the flow, and away from its reconstruction by the autoencoder of the opposite class. The predictive stage assigns each new flow to the class associated with the autoencoder whose reconstruction of the flow is closest in the embedding space. In this way, the predictive stage takes advantage of the embedding learned in the training stage and performs well at detecting new signs of malicious activity in network traffic. Indeed, on benchmark datasets, the proposed methodology achieves better predictive accuracy than competing intrusion detection architectures.

Archivio istituzionale della ricerca - Università di Bari
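The prediction rule in the abstract, and the triplet objective behind the embedding, can be sketched under strong simplifying assumptions. Here each class-specific autoencoder is replaced by a linear (PCA-based) autoencoder, and the learned Triplet-network embedding is replaced by a placeholder function `embed` that defaults to the identity; `LinearAutoencoder`, `triplet_loss`, and `predict` are hypothetical names, and the margin value is illustrative, not taken from the paper.

```python
import numpy as np


class LinearAutoencoder:
    """PCA-based stand-in for a trained autoencoder: encode = project, decode = back-project."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal directions of the centred data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruct(self, X):
        codes = (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T
        return codes @ self.components_ + self.mean_


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, one value per row."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_pos - d_neg + margin)


def predict(X, ae_normal, ae_attack, embed=lambda Z: Z):
    """Label 1 (attack) if the attack autoencoder's reconstruction is closer in embedding space."""
    X = np.asarray(X, dtype=float)
    e = embed(X)
    d_normal = np.linalg.norm(e - embed(ae_normal.reconstruct(X)), axis=1)
    d_attack = np.linalg.norm(e - embed(ae_attack.reconstruct(X)), axis=1)
    return (d_attack < d_normal).astype(int)


# Toy flows: normal traffic varies along the first feature, attacks along the second.
t = np.linspace(-5, 5, 20)
normal_flows = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])
attack_flows = np.column_stack([np.zeros_like(t), t, np.full_like(t, 3.0)])
ae_normal = LinearAutoencoder(1).fit(normal_flows)
ae_attack = LinearAutoencoder(1).fit(attack_flows)

# Triplets for normal flows: anchor = flow, positive = same-class reconstruction,
# negative = opposite-class reconstruction (identity embedding here).
losses = triplet_loss(normal_flows, ae_normal.reconstruct(normal_flows),
                      ae_attack.reconstruct(normal_flows))
print(predict([[2.0, 0.0, 0.0], [0.0, 1.0, 3.0]], ae_normal, ae_attack).tolist())  # [0, 1]
```

A linear autoencoder trained with squared error spans the same subspace as PCA, which is why PCA serves as a lightweight stand-in here; in the described method both the autoencoders and the embedding are learned networks, and the triplet loss would be minimised with respect to the embedding's parameters.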