Stochastic training of neural networks via successive convex approximations
This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of nonconvex optimization, going under the general name of successive convex approximation techniques. The basic idea is to iteratively replace the original (nonconvex, high-dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Unlike similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the NN function, in a stochastic fashion, while exploiting the overall structure of the learning problem for faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN’s weights. We experiment on several medium-sized benchmark problems and on a large-scale data set involving simulated physical data. The results show that the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power
Continual self-supervised learning in Earth observation with embedding regularization
Continual Self-Supervised Learning (CSSL) is a promising approach for intelligent systems that address the challenge of learning in scenarios with limited data, mirroring real-world conditions. However, CSSL remains relatively unexplored, especially in the context of Earth Observation (EO). In this paper, we investigate the problem of CSSL in remote sensing (RS), focusing on leveraging satellite and aerial imagery to develop systems that can continuously adapt and learn with minimal human intervention in data preparation. Specifically, we tackle the task of semantic segmentation, which has diverse applications in RS. Building upon existing work in the domain, we propose a novel algorithm called Continual Barlow Twins with Embedding Regularizer (CBT-ER). To evaluate the effectiveness of our approach, we conduct experiments on three heterogeneous datasets (i.e. Potsdam, DFC2022, SEN12MS). To ensure robust experimentation, we vary the availability of data labels (10%, 100%) and compare our approach against different baselines, showing encouraging performance
Bayesian Neural Networks With Maximum Mean Discrepancy Regularization
Bayesian Neural Networks (BNNs) are trained to optimize an entire
distribution over their weights instead of a single set, having significant
advantages in terms of, e.g., interpretability, multi-task learning, and
calibration. Because of the intractability of the resulting optimization
problem, most BNNs are either sampled through Monte Carlo methods, or trained
by minimizing a suitable Evidence Lower BOund (ELBO) on a variational
approximation. In this paper, we propose a variant of the latter, wherein we
replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean
Discrepancy (MMD) estimator, inspired by recent work in variational inference.
After motivating our proposal based on the properties of the MMD term, we
proceed to show a number of empirical advantages of the proposed formulation
over the state-of-the-art. In particular, our BNNs achieve higher accuracy on
multiple benchmarks, including several image classification tasks. In addition,
they are more robust to the selection of a prior over the weights, and they are
better calibrated. As a second contribution, we provide a new formulation for
estimating the uncertainty on a given prediction, showing it performs in a more
robust fashion against adversarial attacks and the injection of noise over
their inputs, compared to more classical criteria such as the differential
entropy
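The abstract above does not state which MMD estimator or kernel the paper uses. As a rough illustration only, the following sketch computes the standard biased estimate of the squared MMD between two sample sets, assuming an RBF kernel; the function names and the `bandwidth` parameter are hypothetical choices made for this example.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """RBF kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from a and b.
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    same = [(0.0,), (1.0,)]
    far = [(5.0,), (6.0,)]
    print(mmd_squared(same, same))  # identical samples give zero
    print(mmd_squared(same, far))   # well-separated samples give a positive value
```

In a variational setting, `xs` would typically be weights drawn from the approximate posterior and `ys` weights drawn from the prior, though how the paper plugs the term into its objective is not detailed here.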
Continual Barlow Twins: continual self-supervised learning for remote sensing semantic segmentation
In the field of Earth Observation (EO), Continual Learning (CL) algorithms
have been proposed to deal with large datasets by decomposing them into several
subsets and processing them incrementally. The majority of these algorithms
assume that data is (a) coming from a single source, and (b) fully labeled.
Real-world EO datasets are instead characterized by a large heterogeneity
(e.g., coming from aerial, satellite, or drone scenarios), and for the most
part they are unlabeled, meaning they can be fully exploited only through the
emerging Self-Supervised Learning (SSL) paradigm. For these reasons, in this
paper we propose a new algorithm for merging SSL and CL for remote sensing
applications, that we call Continual Barlow Twins (CBT). It combines the
advantages of one of the simplest self-supervision techniques, i.e., Barlow
Twins, with the Elastic Weight Consolidation method to avoid catastrophic
forgetting. In addition, for the first time we evaluate SSL methods on a highly
heterogeneous EO dataset, showing the effectiveness of these strategies on a
novel combination of three datasets from almost non-overlapping domains (airborne
Potsdam dataset, satellite US3D dataset, and drone UAVid dataset), on a crucial
downstream task in EO, i.e., semantic segmentation. Encouraging results show
the superiority of SSL in this setting, and the effectiveness of incrementally
building a pretrained feature extractor, based on ResNet50, without
relying on the complete availability of all the data, with a
valuable saving of time and resources
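The abstract names Elastic Weight Consolidation (EWC) as the mechanism against catastrophic forgetting but does not spell out how it is combined with the Barlow Twins loss. A minimal sketch of the usual EWC quadratic penalty follows, assuming flat parameter lists and a diagonal Fisher information estimate; the names `ewc_penalty`, `fisher_diag`, and `strength` are illustrative, not taken from the paper.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength=1.0):
    """Quadratic EWC penalty: 0.5 * strength * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values
    anchor_params -- parameter values saved after the previous task
    fisher_diag   -- diagonal Fisher information estimated on the previous task
    strength      -- trade-off weight (hypothetical name for the usual lambda)
    """
    weighted_sq_diff = sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher_diag)
    )
    return 0.5 * strength * weighted_sq_diff


if __name__ == "__main__":
    # Moving an "important" parameter (large Fisher value) is penalized more.
    print(ewc_penalty([2.0, 0.0], [1.0, 0.0], [4.0, 1.0], strength=2.0))
```

Under this assumption, the objective on a new domain would be the self-supervised loss plus `ewc_penalty(...)`, so parameters that mattered for earlier domains stay close to their previous values.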
Pixle: a fast and effective black-box attack based on rearranging pixels
Recent research has found that neural networks are vulnerable to several
types of adversarial attacks, where the input samples are modified in such a
way that the model misclassifies the adversarial sample. In this paper we
focus on black-box adversarial attacks, which can be performed without
knowing the inner structure of the attacked model or its training
procedure, and we propose a novel attack that
successfully attacks a high percentage of samples by rearranging a
small number of pixels within the attacked image. We demonstrate that our
attack works on a large number of datasets and models, that it requires a small
number of iterations, and that the distance between the original sample and the
adversarial one is negligible to the human eye
A decentralized training algorithm for Echo State Networks in distributed big data applications
The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feed-forward neural networks in such a distributed setting, less attention has been devoted to the case of decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial component in realistic big data implementations. Experimental results on large scale artificial datasets show that it compares favorably with a fully centralized implementation, in terms of speed, efficiency and generalization accuracy
Semi-supervised echo state networks for audio classification
Echo state networks (ESNs), belonging to the wider family of reservoir computing methods, are a powerful tool for the analysis of dynamic data. In an ESN, the input signal is fed to a fixed (possibly large) pool of interconnected neurons, whose state is then read by an adaptable layer to provide the output. This last layer is generally trained via a regularized linear least-squares procedure. In this paper, we consider the more complex problem of training an ESN for classification problems in a semi-supervised setting, wherein only a part of the input sequences are effectively labeled with the desired response. To solve the problem, we combine the standard ESN with a semi-supervised support vector machine (S3VM) for training its adaptable connections. Additionally, we propose a novel algorithm for solving the resulting non-convex optimization problem, hinging on a series of successive approximations of the original problem. The resulting procedure is highly customizable and also admits a principled way of parallelizing training over multiple processors/computers. An extensive set of experimental evaluations on audio classification tasks supports the presented semi-supervised ESN as a practical tool for dynamic problems requiring the analysis of partially labeled data
Fully decentralized semi-supervised learning via privacy-preserving matrix completion
Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal
FairDrop: Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning
Graph representation learning has become a ubiquitous component in many
scenarios, ranging from social network analysis to energy forecasting in smart
grids. In several applications, ensuring the fairness of the node (or graph)
representations with respect to some protected attributes is crucial for their
correct deployment. Yet, fairness in graph deep learning remains
under-explored, with few solutions available. In particular, the tendency of
similar nodes to cluster on several real-world graphs (i.e., homophily) can
dramatically worsen the fairness of these procedures. In this paper, we propose
a novel biased edge dropout algorithm (FairDrop) to counter-act homophily and
improve fairness in graph representation learning. FairDrop can be plugged in
easily on many existing algorithms, is efficient, adaptable, and can be
combined with other fairness-inducing solutions. After describing the general
algorithm, we demonstrate its application on two benchmark tasks, specifically,
as a random walk model for producing node embeddings, and to a graph
convolutional network for link prediction. We prove that the proposed algorithm
can successfully improve the fairness of all models up to a small or negligible
drop in accuracy, and compares favourably with existing state-of-the-art
solutions. In an ablation study, we demonstrate that our algorithm can flexibly
interpolate between biasing towards fairness and an unbiased edge dropout.
Furthermore, to better evaluate the gains, we propose a new dyadic group
definition to measure the bias of a link prediction task when paired with
group-based fairness metrics. In particular, we extend the metric used to
measure the bias in the node embeddings to take into account the graph
structure. Comment: Submitted to a journal for the peer-review process
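The abstract describes FairDrop as a biased edge dropout that counteracts homophily and can interpolate between a fairness-biased and an unbiased dropout, but it does not give the sampling rule. The sketch below is one possible reading under stated assumptions: edges joining nodes with different protected attributes are kept with probability 1/2 + delta, and edges joining nodes with the same attribute with probability 1/2 - delta. The function name, the `delta` parameter, and this parametrization are hypothetical.

```python
import random


def biased_edge_dropout(edges, sensitive, delta=0.25, rng=None):
    """Drop edges at random, favoring edges that cross protected groups.

    edges     -- list of (u, v) node pairs
    sensitive -- mapping from node to its protected attribute value
    delta     -- bias in [0, 0.5]; 0 gives an unbiased dropout with rate 0.5
    rng       -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    kept = []
    for u, v in edges:
        crosses_groups = sensitive[u] != sensitive[v]
        keep_prob = 0.5 + delta if crosses_groups else 0.5 - delta
        if rng.random() < keep_prob:
            kept.append((u, v))
    return kept


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3)]
    groups = {0: "a", 1: "a", 2: "b", 3: "b"}
    # With the maximum bias, only the edge that crosses groups survives.
    print(biased_edge_dropout(edges, groups, delta=0.5))
```

A fresh mask would be sampled at each training epoch, so the downstream model sees a graph whose homophily with respect to the protected attribute is reduced on average.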
Distributed learning for random vector functional-link networks
This paper aims to develop distributed learning algorithms for Random Vector Functional-Link (RVFL) networks, where training data is distributed under a decentralized information structure. Two algorithms are proposed by using Decentralized Average Consensus (DAC) and Alternating Direction Method of Multipliers (ADMM) strategies, respectively. These algorithms work in a fully distributed fashion and have no requirement on coordination from a central agent during the learning process. For distributed learning, the goal is to build a common learner model which optimizes the system performance over the whole set of local data. In this work, it is assumed that all stations know the initial weights of the input layer, the output weights of local RVFL networks can be shared through communication channels among neighboring nodes only, and local datasets are blocked strictly. The proposed learning algorithms are evaluated over five benchmark datasets. Experimental results with comparisons show that the DAC-based learning algorithm performs favorably in terms of effectiveness, efficiency and computational complexity, followed by the ADMM-based learning algorithm with promising accuracy but higher computational burden
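To make the consensus idea concrete, the sketch below runs a plain average-consensus iteration in which every node repeatedly moves its value toward those of its neighbors, with no central coordinator. It is a generic illustration, not the paper's algorithm: scalar values stand in for the locally trained output weights, and the max-degree step size and the names used are assumptions.

```python
def average_consensus(values, neighbors, step=None, iterations=200):
    """Iterate x_i <- x_i + step * sum_j (x_j - x_i) over each node's neighbors.

    values     -- list with one initial scalar per node
    neighbors  -- dict mapping node index to a list of neighbor indices
                  (undirected, connected graph assumed)
    step       -- step size; defaults to 1 / (max degree + 1)
    iterations -- number of synchronous update rounds
    """
    if step is None:
        max_degree = max(len(adjacent) for adjacent in neighbors.values())
        step = 1.0 / (max_degree + 1)
    x = list(values)
    for _ in range(iterations):
        # Every node updates using only the values held by its neighbors.
        x = [
            x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
            for i in range(len(x))
        ]
    return x


if __name__ == "__main__":
    # Four nodes on a line: 0 - 1 - 2 - 3.
    line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(average_consensus([1.0, 2.0, 3.0, 10.0], line_graph))
```

On a connected graph every node ends up close to the global mean of the initial values, which is why averaging local estimates this way can yield a common model without any node sharing its training data.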
