EGG-GAE: scalable graph neural networks for tabular data imputation
Missing data imputation (MDI) is crucial when dealing with tabular datasets across various domains. Autoencoders can be trained to reconstruct missing values, and graph autoencoders (GAE) can additionally consider similar patterns in the dataset when imputing new values for a given instance. However, previously proposed GAEs suffer from scalability issues, requiring the user to define a similarity metric among patterns to build the graph connectivity beforehand. In this paper, we leverage recent progress in latent graph learning to propose a novel EdGe Generation Graph AutoEncoder (EGG-GAE) for missing data imputation that overcomes these two drawbacks. EGG-GAE works on randomly sampled mini-batches of the input data (hence scaling to larger datasets), and it automatically infers the best connectivity across the mini-batch for each architecture layer. We also experiment with several extensions, including an ensemble strategy for inference and the inclusion of what we call prototype nodes, obtaining significant improvements, both in terms of imputation error and final downstream accuracy, across multiple benchmarks and baselines
Distributed stochastic nonconvex optimization and learning based on successive convex approximation
We study distributed stochastic nonconvex optimization in multi-agent networks. We introduce a novel algorithmic framework for the distributed minimization of the sum of the expected value of a smooth (possibly nonconvex) function (the agents' sum-utility) plus a convex (possibly nonsmooth) regularizer. The proposed method hinges on successive convex approximation (SCA) techniques, leveraging dynamic consensus as a mechanism to track the average gradient among the agents, and recursive averaging to recover the expected gradient of the sum-utility function. Almost sure convergence to (stationary) solutions of the nonconvex problem is established. Finally, the method is applied to distributed stochastic training of neural networks. Numerical results confirm the theoretical claims, and illustrate the advantages of the proposed method with respect to other methods available in the literature
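The dynamic-consensus ingredient mentioned in the abstract above can be illustrated with a generic gradient-tracking recursion. This is only a minimal sketch under strong assumptions, not the paper's SCA method: it is deterministic (no recursive averaging of noisy gradients), has no regularizer, and uses hypothetical scalar quadratic losses f_i(x) = 0.5 (x - a_i)^2. The names `gradient_tracking`, `targets`, and `mixing` are illustrative.

```python
def gradient_tracking(targets, mixing, step=0.1, iterations=200):
    """Generic gradient tracking on scalar quadratics (illustrative assumption).

    Agent i holds f_i(x) = 0.5 * (x - targets[i])**2, so grad f_i(x) = x - targets[i].
    `mixing` is a doubly stochastic weight matrix describing the network.
    """
    n = len(targets)

    def grad(i, x):
        return x - targets[i]

    x = [0.0] * n                            # local estimates
    y = [grad(i, x[i]) for i in range(n)]    # local trackers of the average gradient

    for _ in range(iterations):
        # Consensus step on the estimates, then descent along the tracked gradient.
        x_new = [
            sum(mixing[i][j] * x[j] for j in range(n)) - step * y[i]
            for i in range(n)
        ]
        # Dynamic consensus: mix the trackers and add the local gradient change,
        # which keeps the mean of y equal to the mean of the local gradients.
        y = [
            sum(mixing[i][j] * y[j] for j in range(n))
            + grad(i, x_new[i]) - grad(i, x[i])
            for i in range(n)
        ]
        x = x_new
    return x


# Three fully connected agents; the minimizer of the summed loss is the mean of the targets.
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
estimates = gradient_tracking([1.0, 2.0, 6.0], W)
```

In this toy setting every agent's estimate approaches the mean of the targets, even though each agent only evaluates its own gradient and exchanges values with its neighbors.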
Multi-site Forecasting of Energy Time Series with Spatio-Temporal Graph Neural Networks
Climate change has prompted the energy sector to shift its focus to renewable energy sources, which are environmentally friendly but less favorable in terms of cost, complexity, and plant management. It becomes critical to have a reliable method for estimating the output power of these systems, which are dispersed across the country and vary in kind and technology, and whose output power is mostly determined by meteorological factors. In this paper, we exploit the ability of a specific type of graph neural network, the spatio-temporal graph neural network, to model dynamic graph-like data: it can process spatial information about the distribution of plants in a particular region as well as temporal data on individual plant power production. Plants in the same region can share information and make more accurate forecasts in this way. The suggested model was evaluated on two types of datasets: one with data gathered from real photovoltaic systems and the other with synthesized power time series reconstructed from data acquired by satellite detection. Our experiments show that these systems can estimate the production outputs of photovoltaic stations simultaneously and with higher accuracy than previous state-of-the-art models, performing effectively even in the absence of meteorological data
A probabilistic re-intepretation of confidence scores in multi-exit models
In this paper, we propose a new approach to train a deep neural network with multiple intermediate auxiliary classifiers branching from it. These ‘multi-exit’ models can be used to reduce the inference time by performing early exit on the intermediate branches, if the confidence of the prediction is higher than a threshold. They rely on the assumption that not all the samples require the same amount of processing to yield a good prediction. Specifically, we propose a way to jointly train all the branches of a multi-exit model without hyper-parameters, by weighting the predictions from each branch with a trained confidence score. Each confidence score is an approximation of the real one produced by the branch, and it is calculated and regularized while training the rest of the model. We evaluate our proposal on a set of image classification benchmarks, using different neural models and early-exit stopping criteria
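The early-exit inference rule described in the abstract above can be sketched as follows. This is a minimal illustration, assuming the confidence of a branch is its maximum softmax probability (one common stopping criterion, and not the trained confidence score the paper proposes). The function name `early_exit_predict` and the default threshold are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(branch_logits, threshold=0.9):
    """Return (predicted_class, exit_index) for one sample.

    `branch_logits` lists the logits of each exit, ordered from the earliest
    branch to the final classifier. Inference stops at the first branch whose
    confidence reaches `threshold`; the last exit always answers.
    """
    if not branch_logits:
        raise ValueError("at least one exit is required")
    last = len(branch_logits) - 1
    for exit_index, logits in enumerate(branch_logits):
        probs = softmax(logits)
        confidence = max(probs)  # assumption: max softmax probability
        if confidence >= threshold or exit_index == last:
            return probs.index(confidence), exit_index


# An "easy" sample exits at the first branch, a "hard" one reaches the last exit.
easy = [[4.0, 0.0, 0.0], [5.0, 0.0, 0.0]]
hard = [[0.2, 0.1, 0.0], [0.0, 3.5, 0.0]]
print(early_exit_predict(easy), early_exit_predict(hard))
```

In a real model the later branches would only be computed when needed, which is where the saving in inference time comes from; here all logits are precomputed for brevity.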
Structured ensembles. An approach to reduce the memory footprint of ensemble methods
In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling over the original architecture, with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. On a large experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing they compare favourably with the state-of-the-art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare with a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory
Drop edges and adapt. A fairness enforcing fine-tuning for graph neural networks
The rise of graph representation learning as the primary solution for many different network science tasks led to a surge of interest in the fairness of this family of methods. Link prediction, in particular, has a substantial social impact. However, link prediction algorithms tend to increase the segregation in social networks by disfavouring the links between individuals in specific demographic groups. This paper proposes a novel way to enforce fairness on graph neural networks with a fine-tuning strategy. We Drop the unfair Edges and, simultaneously, we Adapt the model's parameters to those modifications, DEA in short. We introduce two covariance-based constraints designed explicitly for the link prediction task. We use these constraints to guide the optimization process responsible for learning the new 'fair' adjacency matrix. One novelty of DEA is that we can use a discrete yet learnable adjacency matrix in our fine-tuning. We demonstrate the effectiveness of our approach on five real-world datasets and show that we can improve both the accuracy and the fairness of the link prediction tasks. In addition, we present an in-depth ablation study demonstrating that our training algorithm for the adjacency matrix can be used to improve link prediction performances during training. Finally, we compute the relevance of each component of our framework to show that the combination of both the constraints and the training of the adjacency matrix leads to optimal performances
MARE: Self-supervised multi-attention REsu-net for semantic segmentation in remote sensing
Scene understanding of satellite and aerial images is a pivotal task in various remote sensing (RS) practices, such as land cover and urban development monitoring. In recent years, neural networks have become a de-facto standard in many of these applications. However, semantic segmentation remains a challenging task. Compared with other computer vision (CV) areas, large labeled datasets are rarely available in RS, owing to their high cost and the manpower required. On the other hand, self-supervised learning (SSL) is attracting growing interest in CV, reaching state-of-the-art results in several tasks. In spite of this, most SSL models, pretrained on huge datasets like ImageNet, do not perform particularly well on RS data. For this reason, we propose a combination of an SSL algorithm (particularly, Online Bag of Words) and a semantic segmentation algorithm shaped for aerial images (namely, Multistage Attention ResU-Net), to show new encouraging results (i.e., 81.76% mIoU with ResNet-18 backbone) on the ISPRS Vaihingen dataset
Explainable spatio-temporal Graph Neural Networks for multi-site photovoltaic energy production
In recent years, there has been a growing demand for renewable energy sources, which are inherently associated with a decentralized distribution and dependent on weather conditions. Their management and the associated forecasting of produced energy are tasks of increasing complexity. Spatio-Temporal Graph Neural Networks have been applied in this context with excellent results, taking advantage of the correct integration of both topological data, defined by the distribution of the plants in the territory, and temporal data of the time series. A drawback of these networks is the recurrent mechanism adopted to process the temporal part, which greatly increases the computational load of these models. Moreover, these models are formulated for real and sensitive contexts where, in addition to being accurate, the predictions must also be understandable by the human operator. For these reasons, in this paper we propose a novel explainable energy forecasting framework based on Spatio-Temporal Graph Neural Networks: the forecasting model generates predictions by processing spatial and temporal information using a spectral graph convolution and a 1D convolutional neural network, respectively; we then apply a state-of-the-art explainer to the predictions in order to produce explanations about the generation process. Our proposed method obtains predictions with better performance than previous approaches, both in terms of computational efficiency and prediction accuracy, with the possibility of interpreting them in order to understand the generation process. The novel approach based on the fusion of forecasting and explainability in a single framework enables the creation of powerful and reliable systems suitable for real-world issues and challenges
A calibrated multiexit neural network for detecting urothelial cancer cells
Deep convolutional networks have become a powerful tool for medical imaging diagnostics. In pathology, most efforts have been focused on the subfield of histology, while cytopathology (which studies diagnostic tools at the cellular level) remains underexplored. In this paper, we propose a novel deep learning model for cancer detection from urinary cytopathology screening images. We leverage recent ideas from the field of multioutput neural networks to provide a model that can efficiently train even on small-scale datasets, such as those typically found in real-world scenarios. Additionally, we argue that calibration (i.e., providing confidence levels that are aligned with the ground truth probability of an event) has been a major shortcoming of prior works, and we experiment with a number of techniques to provide a well-calibrated model. We evaluate the proposed algorithm on a novel dataset, and we show that the combination of focal loss, multiple outputs, and temperature scaling provides a model that is significantly more accurate and calibrated than a baseline deep convolutional network
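Two of the ingredients named in the abstract above, focal loss and temperature scaling, have standard textbook forms that can be sketched in a few lines. This is a generic illustration under assumed defaults (the focusing parameter `gamma=2.0` and the example temperature are hypothetical choices), not the paper's configuration or its multi-output architecture.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, target, gamma=2.0, eps=1e-12):
    """Focal loss -(1 - p_t)^gamma * log(p_t) for a single sample.

    With gamma = 0 this reduces to the usual cross-entropy; larger gamma
    down-weights samples that are already classified with high confidence.
    """
    p_t = max(probs[target], eps)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


logits = [3.0, 1.0, 0.2]
raw = softmax(logits)                        # uncalibrated probabilities
calibrated = softmax(logits, temperature=2)  # softened; the argmax is unchanged
print(max(raw), max(calibrated), focal_loss(raw, target=0))
```

In practice the temperature is a single scalar fitted on held-out data after training, so it changes the reported confidence without changing which class is predicted.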
Adaptive propagation graph convolutional network
Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertexwise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: 1) how to design a differentiable exchange protocol (e.g., a one-hop Laplacian smoothing in the original GCN) and 2) how to characterize the tradeoff in complexity with respect to the local updates. In this brief, we show that the state-of-the-art results can be achieved by adapting the number of communication steps independently at every node. In particular, we endow each node with a halting unit (inspired by Graves’ adaptive computation time [1]) that after every exchange decides whether to continue communicating or not. We show that the proposed adaptive propagation GCN (AP-GCN) achieves superior or similar results to the best proposed models so far on a number of benchmarks while requiring a small overhead in terms of additional parameters. We also investigate a regularization term to enforce an explicit tradeoff between communication and accuracy. The code for the AP-GCN experiments is released as an open-source library
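The per-node halting idea in the abstract above can be sketched in the style of adaptive computation time. This is a toy illustration under several assumptions that are not taken from the paper: node features are scalars, one exchange is a plain mean over a node and its neighbors, the halting unit is a sigmoid with fixed hypothetical parameters `weight` and `bias` (in a trained model these would be learned), and halted nodes keep relaying messages.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_propagation(neighbors, features, weight, bias, max_steps=10, eps=0.01):
    """Return (outputs, steps_used) with a per-node number of propagation steps.

    `neighbors[i]` lists the neighbors of node i. After every exchange, an
    active node computes a halting probability h; it stops once its cumulative
    halting mass reaches 1 - eps (or at `max_steps`). Its output is the
    halting-weighted average of the states it has seen.
    """
    n = len(features)
    state = list(features)
    output = [0.0] * n
    cumulative = [0.0] * n
    steps = [0] * n
    active = [True] * n

    for step in range(1, max_steps + 1):
        if not any(active):
            break
        # One message-passing exchange: mean over the node and its neighbors.
        state = [
            (state[i] + sum(state[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
            for i in range(n)
        ]
        for i in range(n):
            if not active[i]:
                continue
            steps[i] = step
            h = sigmoid(weight * state[i] + bias)  # halting unit (fixed parameters here)
            if cumulative[i] + h >= 1.0 - eps or step == max_steps:
                output[i] += (1.0 - cumulative[i]) * state[i]  # remainder mass
                active[i] = False
            else:
                output[i] += h * state[i]
                cumulative[i] += h
    return output, steps


# Path graph 0 - 1 - 2 with a feature-dependent halting unit.
path = [[1], [0, 2], [1]]
outputs, steps = adaptive_propagation(path, [0.0, 0.0, 4.0], weight=1.0, bias=-1.0)
print(outputs, steps)
```

Because the halting weights of each node sum to one, its output is a convex combination of the states it visited, and the number of steps actually used varies from node to node.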
