Training multi-layer binary neural networks with random local binary error signals
Binary neural networks (BNNs) significantly reduce computational complexity and memory usage in machine and deep learning by representing weights and activations with just one bit. However, most existing training algorithms for BNNs rely on quantization-aware floating-point stochastic gradient descent (SGD), limiting the full exploitation of binary operations to the inference phase only. In this work, we propose, for the first time, a fully binary and gradient-free training algorithm for multi-layer BNNs, eliminating the need for back-propagated floating-point gradients. Specifically, the proposed algorithm relies on local binary error signals and binary weight updates, employing integer-valued hidden weights that serve as a synaptic metaplasticity mechanism, thereby enhancing its neurobiological plausibility. Our proposed solution enables the training of binary multi-layer perceptrons using exclusively XNOR, Popcount, and increment/decrement operations. Experimental results on multi-class classification benchmarks show test accuracy improvements of up to +35.47% over the only existing fully binary single-layer state-of-the-art solution. Compared to full-precision SGD, our solution improves test accuracy by up to +35.30% under the same total memory demand, while also reducing computational cost by two to three orders of magnitude in terms of the total number of Boolean gates. The proposed algorithm is made available to the scientific community as a public repository.
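As a rough illustration of why XNOR and Popcount are enough for the forward pass of a binary layer, the sketch below computes the dot product of two ±1 vectors using only those two operations. It is not the training algorithm from the paper; the bit encoding (bit set means +1) and the helper names `pack_bits` and `binary_dot` are assumptions made for this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (assumed encoding: bit i set iff values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n ±1 vectors given in packed form.

    XNOR marks the positions where the two vectors agree; Popcount counts them.
    Each agreement contributes +1 and each disagreement -1, so the
    dot product equals 2 * agreements - n.
    """
    mask = (1 << n) - 1                   # keep only the n valid bits
    agree = ~(a_bits ^ b_bits) & mask     # XNOR restricted to n bits
    agreements = bin(agree).count("1")    # Popcount
    return 2 * agreements - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `result` and `reference` agree, which is the identity a binary layer relies on when it replaces multiply-accumulate with bitwise operations.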
Quantifying Cryptocurrency Unpredictability: A Comprehensive Study of Complexity and Forecasting
This paper offers a thorough examination of univariate predictability in cryptocurrency time series. By combining complexity measures with model predictions, we explore the cryptocurrency time-series forecasting task, focusing on the USD exchange rates of Litecoin, Binance Coin, Bitcoin, Ethereum, and XRP. On the one hand, to assess the complexity and randomness of these time series, we perform a comparative analysis using Brownian and colored noise as benchmarks. The results obtained from the Complexity-Entropy causality plane and power density spectrum analysis reveal that cryptocurrency time series exhibit characteristics closely resembling those of Brownian noise when analyzed in a univariate context. On the other hand, the application of a wide range of statistical, machine learning, and deep learning models for time-series forecasting demonstrates the low predictability of cryptocurrencies. Notably, our analysis reveals that simpler models, such as naive models, consistently outperform the more complex machine and deep learning ones in terms of forecasting accuracy across different forecast horizons and time windows. The combined study of complexity and forecasting accuracy highlights the difficulty of predicting the cryptocurrency market. These findings provide valuable insights into the inherent characteristics of cryptocurrency data and highlight the need to reassess the challenges associated with predicting cryptocurrency price movements.
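For readers unfamiliar with the naive baseline mentioned above, a minimal sketch follows, assuming the common "last observed value" definition; the abstract does not specify which naive variant or which error metric the authors used. The function names and the numbers are illustrative only and are not real exchange rates.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every step of the forecast horizon."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up series used purely to exercise the functions.
history = [1.0, 2.0, 3.0]
future = [4.0, 2.0]
forecast = naive_forecast(history, horizon=len(future))
error = mean_absolute_error(future, forecast)
```

For a series that behaves like Brownian noise, the last value is the best available point estimate of the next one, which is consistent with the reported difficulty of beating this baseline.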
Deep learning via message passing algorithms based on belief propagation
Message-passing algorithms based on the Belief Propagation (BP) equations
constitute a well-known distributed computational scheme. It is exact on
tree-like graphical models and has also proven to be effective in many problems
defined on graphs with loops (from inference to optimization, from signal
processing to clustering). The BP-based scheme is fundamentally different from
stochastic gradient descent (SGD), on which the current success of deep
networks is based. In this paper, we present and adapt to mini-batch training
on GPUs a family of BP-based message-passing algorithms with a reinforcement
field that biases distributions towards locally entropic solutions. These
algorithms are capable of training multi-layer neural networks with discrete
weights and activations with performance comparable to SGD-inspired heuristics
(BinaryNet) and are naturally well-adapted to continual learning. Furthermore,
using these algorithms to estimate the marginals of the weights allows us to
make approximate Bayesian predictions that have higher accuracy than point-wise
solutions.
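The statement that BP is exact on tree-like models can be checked on a toy case. The sketch below runs sum-product message passing on a short chain (the simplest tree) and compares the resulting marginals with brute-force enumeration. It is unrelated to the mini-batch GPU algorithms of the paper; the potentials and function names are assumptions chosen for illustration.

```python
from itertools import product


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def chain_marginals_bp(unary, pair):
    """Sum-product on a chain. unary[i][a] is the local potential of x_i = a;
    pair[i][a][b] couples x_i = a with x_{i+1} = b."""
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # message reaching node i from the left
    bwd = [[1.0] * k for _ in range(n)]  # message reaching node i from the right
    for i in range(1, n):
        fwd[i] = normalize([
            sum(unary[i - 1][a] * fwd[i - 1][a] * pair[i - 1][a][b] for a in range(k))
            for b in range(k)
        ])
    for i in range(n - 2, -1, -1):
        bwd[i] = normalize([
            sum(unary[i + 1][b] * bwd[i + 1][b] * pair[i][a][b] for b in range(k))
            for a in range(k)
        ])
    # Belief at each node: local potential times both incoming messages.
    return [normalize([unary[i][a] * fwd[i][a] * bwd[i][a] for a in range(k)])
            for i in range(n)]


def chain_marginals_brute(unary, pair):
    """Exact marginals by summing over every joint configuration."""
    n, k = len(unary), len(unary[0])
    marg = [[0.0] * k for _ in range(n)]
    for xs in product(range(k), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][xs[i]]
        for i in range(n - 1):
            w *= pair[i][xs[i]][xs[i + 1]]
        for i in range(n):
            marg[i][xs[i]] += w
    return [normalize(m) for m in marg]


# Illustrative potentials for three binary variables.
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
pair = [[[2.0, 1.0], [1.0, 2.0]], [[2.0, 1.0], [1.0, 2.0]]]
bp = chain_marginals_bp(unary, pair)
exact = chain_marginals_brute(unary, pair)
```

On this chain the two results coincide up to floating-point rounding. On graphs with loops the same message updates give only an approximation.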
Impact of dendritic non-linearities on the computational capabilities of neurons
Dendritic nonlinearities have motivated mathematical descriptions of single neurons as two-layer computational units, which have been shown to substantially increase the computational abilities of neurons compared to linear dendritic integration. However, current analytical studies are restricted to neurons with unconstrained synaptic weights and implausible dendritic nonlinearities. Here we introduce a two-layer model with sign-constrained synaptic weights and a biologically plausible form of dendritic nonlinearity and investigate its properties using both statistical physics methods and numerical simulations. We find that the dendritic nonlinearity enhances both the number of input-output associations that can be learned and the learning speed. We characterize how capacity and learning speed depend on the implemented nonlinearity and the levels of dendritic and somatic inhibition. We calculate analytically the distribution of synaptic weights in networks close to maximal capacity and find that the dendritic nonlinearity increases the fraction of zero-weight (“silent” or “potential”) synapses, compared with the standard perceptron model, when no or weak robustness constraints are present, while the opposite occurs with strong robustness constraints. We test our model on standard real-world benchmark datasets and observe empirically that the nonlinearity improves generalization performance and enables the model to capture more complex input-output relations than the perceptron model.
Shaping the learning landscape in neural networks around wide flat minima
Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex 1- and 2-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents a few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian, and their generalization performance on real data.
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry
We systematize the approach to the investigation of deep neural network
landscapes by basing it on the geometry of the space of implemented functions
rather than the space of parameters. Grouping classifiers into equivalence
classes, we develop a standardized parameterization in which all symmetries are
removed, resulting in a toroidal topology. On this space, we explore the error
landscape rather than the loss. This lets us derive a meaningful notion of the
flatness of minimizers and of the geodesic paths connecting them. Using
different optimization algorithms that sample minimizers with different
flatness we study the mode connectivity and relative distances. Testing a
variety of state-of-the-art architectures and benchmark datasets, we confirm
the correlation between flatness and generalization performance; we further
show that in function space flatter minima are closer to each other and that
the barriers along the geodesics connecting them are small. We also find that
minimizers found by variants of gradient descent can be connected by zero-error
paths composed of two straight lines in parameter space, i.e. polygonal chains
with a single bend. We observe similar qualitative results in neural networks
with binary weights and activations, providing one of the first results
concerning the connectivity in this setting. Our results hinge on symmetry
removal, and are in remarkable agreement with the rich phenomenology described
by some recent analytical studies performed on simple shallow models.
Chaos and Correlated Avalanches in Excitatory Neural Networks with Synaptic Plasticity
A collective chaotic phase with power-law scaling of activity events is observed in a disordered mean-field network of purely excitatory leaky integrate-and-fire neurons with short-term synaptic plasticity. The dynamical phase diagram exhibits two transitions from quasisynchronous and asynchronous regimes to the nontrivial, collective, bursty regime with avalanches. In the homogeneous case without disorder, the system synchronizes and the bursty behavior is reflected in a period-doubling transition to chaos for a two-dimensional discrete map. Numerical simulations show that the bursty chaotic phase with avalanches exhibits a spontaneous emergence of persistent time correlations and enhanced Kolmogorov complexity. Our analysis reveals a mechanism for the generation of irregular avalanches that emerges from the combination of disorder and deterministic underlying chaotic dynamics.
Dinamica complessa emergente in reti neurali con plasticità sinaptica (Emergent complex dynamics in neural networks with synaptic plasticity)
This thesis studies the dynamical regimes that emerge in a neural network in the presence of short-term synaptic plasticity. In particular, the aim is to characterize and study the collective regimes of synchronization, chaos, and criticality.
Thanks to the measures developed in the thesis, it was possible to draw with great precision the previously unknown phase diagram of the leaky integrate-and-fire single-neuron model coupled to a Tsodyks-Uziel-Markram model of short-term synaptic plasticity, in mean field and on a disordered topology, and to elucidate (also analytically, by reducing the dynamics to a few simple coupled equations) the mechanism by which the model becomes chaotic in the mean-field phase, and preserves chaos and generates avalanches with power-law distributed sizes in the disordered topology.
