1,721,069 research outputs found

    Stochastic training of neural networks via successive convex approximations

    No full text
    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of nonconvex optimization, going under the general name of successive convex approximation techniques. The basic idea is to iteratively replace the original (nonconvex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Different from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the NN function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN’s weights. We experiment on several medium-sized benchmark problems and on a large-scale data set involving simulated physical data. The results show how the algorithm outperforms the state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power

    Continual self-supervised learning in Earth observation with embedding regularization

    No full text
    Continual Self-Supervised Learning (CSSL) is a promising approach for intelligent systems that address the challenge of learning in scenarios with limited data, mirroring real-world conditions. However, CSSL remains relatively unexplored, especially in the context of Earth Observation (EO). In this paper, we investigate the problem of CSSL in remote sensing (RS), focusing on leveraging satellite and aerial imagery to develop systems that can continuously adapt and learn with minimal human intervention in data preparation. Specifically, we tackle the task of semantic segmentation, which has diverse applications in RS. Building upon existing work in the domain, we propose a novel algorithm called Continual Barlow Twins with Embedding Regularizer (CBT-ER). To evaluate the effectiveness of our approach, we conduct experiments on three heterogeneous datasets (i.e. Potsdam, DFC2022, SEN12MS). To ensure robust experimentation, we vary the availability of data labels (10%, 100%) and compare our approach against different baselines, showing encouraging performance

    Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

    Full text link
    Bayesian Neural Networks (BNNs) are trained to optimize an entire distribution over their weights instead of a single set, having significant advantages in terms of, e.g., interpretability, multi-task learning, and calibration. Because of the intractability of the resulting optimization problem, most BNNs are either sampled through Monte Carlo methods, or trained by minimizing a suitable Evidence Lower BOund (ELBO) on a variational approximation. In this paper, we propose a variant of the latter, wherein we replace the Kullback-Leibler divergence in the ELBO term with a Maximum Mean Discrepancy (MMD) estimator, inspired by recent work in variational inference. After motivating our proposal based on the properties of the MMD term, we proceed to show a number of empirical advantages of the proposed formulation over the state-of-the-art. In particular, our BNNs achieve higher accuracy on multiple benchmarks, including several image classification tasks. In addition, they are more robust to the selection of a prior over the weights, and they are better calibrated. As a second contribution, we provide a new formulation for estimating the uncertainty on a given prediction, showing it performs in a more robust fashion against adversarial attacks and the injection of noise over their inputs, compared to more classical criteria such as the differential entropy

    Continual Barlow Twins: continual self-supervised learning for remote sensing semantic segmentation

    Full text link
    In the field of Earth Observation (EO), Continual Learning (CL) algorithms have been proposed to deal with large datasets by decomposing them into several subsets and processing them incrementally. The majority of these algorithms assume that data is (a) coming from a single source, and (b) fully labeled. Real-world EO datasets are instead characterized by a large heterogeneity (e.g., coming from aerial, satellite, or drone scenarios), and for the most part they are unlabeled, meaning they can be fully exploited only through the emerging Self-Supervised Learning (SSL) paradigm. For these reasons, in this paper we propose a new algorithm for merging SSL and CL for remote sensing applications, that we call Continual Barlow Twins (CBT). It combines the advantages of one of the simplest self-supervision techniques, i.e., Barlow Twins, with the Elastic Weight Consolidation method to avoid catastrophic forgetting. In addition, for the first time we evaluate SSL methods on a highly heterogeneous EO dataset, showing the effectiveness of these strategies on a novel combination of three almost non-overlapping domains datasets (airborne Potsdam dataset, satellite US3D dataset, and drone UAVid dataset), on a crucial downstream task in EO, i.e., semantic segmentation. Encouraging results show the superiority of SSL in this setting, and the effectiveness of creating an incremental effective pretrained feature extractor, based on ResNet50, without the need of relying on the complete availability of all the data, with a valuable saving of time and resources

    Pixle: a fast and effective black-box attack based on rearranging pixels

    Full text link
    Recent research has found that neural networks are vulnerable to several types of adversarial attacks, where the input samples are modified in such a way that the model produces a wrong prediction that misclassifies the adversarial sample. In this paper we focus on black-box adversarial attacks, that can be performed without knowing the inner structure of the attacked model, nor the training procedure, and we propose a novel attack that is capable of correctly attacking a high percentage of samples by rearranging a small number of pixels within the attacked image. We demonstrate that our attack works on a large number of datasets and models, that it requires a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye

    A decentralized training algorithm for Echo State Networks in distributed big data applications

    No full text
    The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feed-forward neural networks in such a distributed setting, less attention has been devoted to the case of decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial component in realistic big data implementations. Experimental results on large scale artificial datasets show that it compares favorably with a fully centralized implementation, in terms of speed, efficiency and generalization accuracy.The current big data deluge requires innovative solutions for performing efficient inference on large, heterogeneous amounts of information. Apart from the known challenges deriving from high volume and velocity, real-world big data applications may impose additional technological constraints, including the need for a fully decentralized training architecture. While several alternatives exist for training feed-forward neural networks in such a distributed setting, less attention has been devoted to the case of decentralized training of recurrent neural networks (RNNs). In this paper, we propose such an algorithm for a class of RNNs known as Echo State Networks. The algorithm is based on the well-known Alternating Direction Method of Multipliers optimization procedure. It is formulated only in terms of local exchanges between neighboring agents, without reliance on a coordinating node. Additionally, it does not require the communication of training patterns, which is a crucial component in realistic big data implementations. Experimental results on large scale artificial datasets show that it compares favorably with a fully centralized implementation, in terms of speed, efficiency and generalization accuracy

    Semi-supervised echo state networks for audio classification

    No full text
    Echo state networks (ESNs), belonging to the wider family of reservoir computing methods, are a powerful tool for the analysis of dynamic data. In an ESN, the input signal is fed to a fixed (possibly large) pool of interconnected neurons, whose state is then read by an adaptable layer to provide the output. This last layer is generally trained via a regularized linear least-squares procedure. In this paper, we consider the more complex problem of training an ESN for classification problems in a semi-supervised setting, wherein only a part of the input sequences are effectively labeled with the desired response. To solve the problem, we combine the standard ESN with a semi-supervised support vector machine (S3VM) for training its adaptable connections. Additionally, we propose a novel algorithm for solving the resulting non-convex optimization problem, hinging on a series of successive approximations of the original problem. The resulting procedure is highly customizable and also admits a principled way of parallelizing training over multiple processors/computers. An extensive set of experimental evaluations on audio classification tasks supports the presented semi-supervised ESN as a practical tool for dynamic problems requiring the analysis of partially labeled data

    Fully decentralized semi-supervised learning via privacy-preserving matrix completion

    No full text
    Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal.Distributed learning refers to the problem of inferring a function when the training data are distributed among different nodes. While significant work has been done in the contexts of supervised and unsupervised learning, the intermediate case of Semi-supervised learning in the distributed setting has received less attention. In this paper, we propose an algorithm for this class of problems, by extending the framework of manifold regularization. The main component of the proposed algorithm consists of a fully distributed computation of the adjacency matrix of the training patterns. To this end, we propose a novel algorithm for low-rank distributed matrix completion, based on the framework of diffusion adaptation. Overall, the distributed Semi-supervised algorithm is efficient and scalable, and it can preserve privacy by the inclusion of flexible privacy-preserving mechanisms for similarity computation. The experimental results and comparison on a wide range of standard Semi-supervised benchmarks validate our proposal

    FairDrop: Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning

    Full text link
    Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of the node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet, fairness in graph deep learning remains under-explored, with few solutions available. In particular, the tendency of similar nodes to cluster on several real-world graphs (i.e., homophily) can dramatically worsen the fairness of these procedures. In this paper, we propose a novel biased edge dropout algorithm (FairDrop) to counter-act homophily and improve fairness in graph representation learning. FairDrop can be plugged in easily on many existing algorithms, is efficient, adaptable, and can be combined with other fairness-inducing solutions. After describing the general algorithm, we demonstrate its application on two benchmark tasks, specifically, as a random walk model for producing node embeddings, and to a graph convolutional network for link prediction. We prove that the proposed algorithm can successfully improve the fairness of all models up to a small or negligible drop in accuracy, and compares favourably with existing state-of-the-art solutions. In an ablation study, we demonstrate that our algorithm can flexibly interpolate between biasing towards fairness and an unbiased edge dropout. Furthermore, to better evaluate the gains, we propose a new dyadic group definition to measure the bias of a link prediction task when paired with group-based fairness metrics. In particular, we extend the metric used to measure the bias in the node embeddings to take into account the graph structure.Comment: Submitted to a journal for the peer-review proces

    Distributed learning for random vector functional-link networks

    No full text
    This paper aims to develop distributed learning algorithms for Random Vector Functional-Link (RVFL) networks, where training data is distributed under a decentralized information structure. Two algorithms are proposed by using Decentralized Average Consensus (DAC) and Alternating Direction Method of Multipliers (ADMM) strategies, respectively. These algorithms work in a fully distributed fashion and have no requirement on coordination from a central agent during the learning process. For distributed learning, the goal is to build a common learner model which optimizes the system performance over the whole set of local data. In this work, it is assumed that all stations know the initial weights of the input layer, the output weights of local RVFL networks can be shared through communication channels among neighboring nodes only, and local datasets are blocked strictly. The proposed learning algorithms are evaluated over five benchmark datasets. Experimental results with comparisons show that the DAC-based learning algorithm performs favorably in terms of effectiveness, efficiency and computational complexity, followed by the ADMM-based learning algorithm with promising accuracy but higher computational burden.This paper aims to develop distributed learning algorithms for Random Vector Functional-Link (RVFL) networks, where training data is distributed under a decentralized information structure. Two algorithms are proposed by using Decentralized Average Consensus (DAC) and Alternating Direction Method of Multipliers (ADMM) strategies, respectively. These algorithms work in a fully distributed fashion and have no requirement on coordination from a central agent during the learning process. For distributed learning, the goal is to build a common learner model which optimizes the system performance over the whole set of local data. In this work, it is assumed that all stations know the initial weights of the input layer, the output weights of local RVFL networks can be shared through communication channels among neighboring nodes only, and local datasets are blocked strictly. The proposed learning algorithms are evaluated over five benchmark datasets. Experimental results with comparisons show that the DAC-based learning algorithm performs favorably in terms of effectiveness, efficiency and computational complexity, followed by the ADMM-based learning algorithm with promising accuracy but higher computational burden
    corecore