Exploiting and Transferring Prior Knowledge in Deep Learning Architectures
In the last decade, Deep Learning has emerged as a hot topic and a disruptive tool in the fields of Machine Learning and Computer Vision. It builds upon a learning paradigm in which data (e.g., videos acquired by surveillance cameras placed on a public road) play a crucial role. By leveraging a large number of examples, it is possible to learn complex, human-like tasks (e.g., recognizing abnormal actions in a video stream) with impressive results. However, while data availability is the source of the greatest strength of Deep Learning techniques, it also hides their greatest weakness: the development of applications and services is often restrained by this requirement, as acquiring and maintaining a huge amount of data are expensive activities that require expert staff and suitable equipment.
However, the design of modern Deep Learning architectures offers several degrees of freedom that can be exploited to mitigate the lack of training data, whether partial or complete. The underlying idea is to compensate for it by incorporating the prior knowledge that humans (specifically, those who control and guide the learning process) hold about the domain at hand. Indeed, intrinsic rules and properties extend far beyond training data and can often be identified and imposed on the learner. If we take image classification into account, the success of Convolutional Neural Networks (CNNs) over past solutions (such as Multi-Layered Neural Networks) can be mainly ascribed to such a practice. Indeed, the design principle of their fundamental building block (i.e., the convolution between two 2D signals) naturally reflects what we knew about natural images: the correlations that exist between neighboring regions of an image provided a powerful insight for the development of efficient and effective models, as CNNs still prove to be.
The ultimate aim of this thesis is the investigation and proposal of novel ways of modeling and injecting prior knowledge in Deep Learning architectures. Importantly, we conduct such a discussion across the board: it focuses on several data domains (e.g., images, videos, graph-structured data, etc.) and involves different levels of the overall training pipeline. On this latter point, we guide the reader through this research by means of the following threefold categorization: i) parameter-based approaches, which limit the space of feasible solutions to those regions reflecting geometrical properties of the data; ii) goal-driven approaches, which guide the learning process towards solutions that embody some advantageous properties; iii) data-driven approaches, which exploit data to extract the knowledge to be used to condition the training algorithm. Along with a comprehensive description of the settings and tools involved, we present extensive experimental results and ablation studies that demonstrate the value of the techniques proposed in this research.
ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision
The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches exploit prior knowledge to avoid unintended solutions; we acknowledge that, in real-world settings, it could be infeasible to gather enough prior information to characterize the bias, or doing so could even raise ethical concerns. We hence propose a novel debiasing approach, termed ClusterFix, which does not require any external hint about the nature of biases. Such an approach alters the standard empirical risk minimization and introduces a per-example weight, encoding how critical and far from the majority an example is. Notably, the weights consider how difficult it is for the model to infer the correct pseudo-label, which is obtained in a self-supervised manner by dividing examples into multiple clusters. Extensive experiments show that the misclassification error incurred in identifying the correct cluster makes it possible to identify examples prone to bias-related issues. As a result, our approach outperforms existing methods on standard benchmarks for bias removal and fairness.
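The per-example weighting described above can be illustrated with a minimal sketch. It is not the ClusterFix implementation: the function names, the softmax-style weighting rule, the `temperature` parameter, and the toy numbers are all assumptions made for illustration. It only shows the general idea of reweighting the empirical risk by how hard each example's cluster pseudo-label is to predict.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Per-example cross-entropy computed from raw logits (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def example_weights(cluster_logits, pseudo_labels, temperature=1.0):
    """Hypothetical rule: examples whose cluster is hard to infer get more weight."""
    difficulty = cross_entropy(cluster_logits, pseudo_labels)
    scores = np.exp((difficulty - difficulty.max()) / temperature)
    return scores / scores.sum()  # normalized so the weights sum to 1

def weighted_risk(class_logits, labels, weights):
    """Empirical risk with a per-example weight instead of a uniform average."""
    return float(np.sum(weights * cross_entropy(class_logits, labels)))

# Toy data (assumed): the third example sits far from its assigned cluster.
cluster_logits = np.array([[4.0, 0.0], [3.0, 0.0], [0.0, 1.0]])
pseudo_labels = np.array([0, 0, 0])
class_logits = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 0])

weights = example_weights(cluster_logits, pseudo_labels)
risk = weighted_risk(class_logits, labels, weights)
uniform_risk = float(cross_entropy(class_logits, labels).mean())
```

In this toy setup the hard-to-cluster example dominates the weighted risk, so a model trained on it would be pushed to fit that example rather than the majority.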
Continual Semi-Supervised Learning through Contrastive Interpolation Consistency
Continual Learning (CL) investigates how to train Deep Networks on a stream
of tasks without incurring forgetting. CL settings proposed in the literature
assume that every incoming example is paired with ground-truth annotations.
However, this clashes with many real-world applications: gathering labeled
data, which is in itself tedious and expensive, becomes infeasible when data
flow as a stream. This work explores Continual Semi-Supervised Learning (CSSL):
here, only a small fraction of labeled input examples are shown to the learner.
We assess how current CL methods (e.g., EWC, LwF, iCaRL, ER, GDumb, DER)
perform in this novel and challenging scenario, where overfitting entangles
forgetting. Subsequently, we design a novel CSSL method that exploits metric
learning and consistency regularization to leverage unlabeled examples while
learning. We show that our proposal exhibits higher resilience to diminishing
supervision and, even more surprisingly, relying only on 25% supervision
suffices to outperform SOTA methods trained under full supervision.
Comment: 7 pages, 2 figures, to appear in Pattern Recognition Letters, Volume
162, October 2022, Pages 9-1
Class-Incremental Continual Learning into the eXtended DER-verse
The staple of human intelligence is the capability of acquiring knowledge in
a continuous fashion. In stark contrast, Deep Networks forget catastrophically
and, for this reason, the sub-field of Class-Incremental Continual Learning
fosters methods that learn a sequence of tasks incrementally, blending
sequentially-gained knowledge into a comprehensive prediction.
This work aims at assessing and overcoming the pitfalls of our previous
proposal Dark Experience Replay (DER), a simple and effective approach that
combines rehearsal and Knowledge Distillation. Inspired by the way our minds
constantly rewrite past recollections and set expectations for the future, we
endow our model with the abilities to i) revise its replay memory to welcome
novel information regarding past data, and ii) pave the way for learning yet
unseen classes.
We show that the application of these strategies leads to remarkable
improvements; indeed, the resulting method - termed eXtended-DER (X-DER) -
outperforms the state of the art on both standard benchmarks (such as CIFAR-100
and miniImagenet) and a novel one introduced here. To gain a better
understanding, we further provide extensive ablation studies that corroborate
and extend the findings of our previous research (e.g. the value of Knowledge
Distillation and flatter minima in continual learning setups).
Comment: 23 pages, 22 figures. To appear in IEEE TPAM
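The combination of rehearsal and Knowledge Distillation mentioned above can be sketched as a single loss. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the squared-error distillation term on stored logits, and the weight `alpha` are introduced here for illustration only.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy computed from raw logits (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def replay_distillation_loss(stream_logits, stream_labels,
                             buffer_logits_now, buffer_logits_stored, alpha=0.5):
    """Cross-entropy on the current task plus a distillation penalty on replayed data.

    buffer_logits_stored: outputs recorded when the buffered examples were first seen.
    buffer_logits_now: outputs of the current model on those same examples.
    """
    ce = softmax_cross_entropy(stream_logits, stream_labels)
    drift = np.sum((buffer_logits_now - buffer_logits_stored) ** 2, axis=1)
    return ce + alpha * float(drift.mean())

# Toy values (assumed): one buffered example, two stream examples.
stream_logits = np.array([[2.0, 0.0], [0.0, 2.0]])
stream_labels = np.array([0, 1])
stored = np.array([[1.0, -1.0]])

loss_no_drift = replay_distillation_loss(stream_logits, stream_labels, stored.copy(), stored)
loss_drift = replay_distillation_loss(stream_logits, stream_labels, np.zeros((1, 2)), stored)
```

The penalty vanishes when the model still reproduces its past outputs on the buffer and grows as those outputs drift, which is what discourages forgetting in this sketch.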
Latent Space Autoregression for Novelty Detection
Novelty detection is commonly referred to as the discrimination of observations that do not conform to a learned model of regularity. Despite its importance in different application settings, designing a novelty detector is highly complex due to the unpredictable nature of novelties and their inaccessibility during training, factors that expose the unsupervised nature of the problem. In our proposal, we design a general framework where we equip a deep autoencoder with a parametric density estimator that learns the probability distribution underlying its latent representations through an autoregressive procedure.
We show that a maximum likelihood objective, optimized in conjunction with the reconstruction of normal samples, effectively acts as a regularizer for the task at hand, by minimizing the differential entropy of the distribution spanned by latent vectors. In addition to providing a very general formulation, our model delivers on-par or superior performance compared to state-of-the-art methods in one-class and video anomaly detection settings, as shown by extensive experiments on publicly available datasets. Unlike prior works, our proposal does not make any assumption about the nature of the novelties, making our work readily applicable to diverse contexts.
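The objective described above can be written compactly. The notation is an assumption, since the abstract introduces no symbols: $f$ and $g$ denote the encoder and decoder, $z = f(x)$ is the latent code of dimension $d$, $q(\cdot\,;\theta)$ is the autoregressive density estimator, and $\lambda$ is a trade-off weight. The squared-error form of the reconstruction term is likewise assumed.

```latex
\mathcal{L}(x) \;=\;
\underbrace{\lVert x - g(f(x)) \rVert_2^2}_{\text{reconstruction}}
\;-\; \lambda\, \underbrace{\log q(z;\theta)}_{\text{log-likelihood}},
\qquad
q(z;\theta) \;=\; \prod_{i=1}^{d} q\!\left(z_i \mid z_1, \dots, z_{i-1};\theta\right)
```

Averaged over the data, the negative log-likelihood term is the cross-entropy between the latent distribution $p$ and $q$, which decomposes as $H(p) + \mathrm{KL}(p \,\Vert\, q)$; this is consistent with reading the term as a penalty on the differential entropy of the latent vectors.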
Towards Unbiased Continual Learning: Avoiding Forgetting in the Presence of Spurious Correlations
Multi-views Embedding for Cattle Re-identification
The people re-identification task has seen enormous improvements in recent years, mainly due to better image feature extraction from deep Convolutional Neural Networks (CNNs) and the availability of large datasets. However, little research has been conducted on animal identification and re-identification, even though this knowledge may be useful in a wide variety of scenarios. Here, we tackle cattle re-identification by exploiting deep CNNs and show how this task is poorly related to the human one, presenting unique challenges that make it far from being solved. We present various baselines, based either on deep architectures or on standard machine learning algorithms, and compare them with our solution. Finally, we conduct a rich ablation study to further investigate the peculiarities of this task.
Context-guided Prompt Learning for Continual WSI Classification
Whole Slide Images (WSIs) are crucial in histological diagnostics, providing high-resolution insights into cellular structures. In addition to challenges like the gigapixel scale of WSIs and the lack of pixel-level annotations, privacy restrictions further complicate their analysis. For instance, in a hospital network, different facilities need to collaborate on WSI analysis without the possibility of sharing sensitive patient data. A more practical and secure approach involves sharing models capable of continual adaptation to new data. However, without proper measures, catastrophic forgetting can occur. Traditional continual learning techniques rely on storing previous data, which violates privacy restrictions. To address this issue, this paper introduces Context Optimization Multiple Instance Learning (CooMIL), a rehearsal-free continual learning framework explicitly designed for WSI analysis. It employs a WSI-specific prompt learning procedure to adapt classification models across tasks, efficiently preventing catastrophic forgetting. Evaluated on four public WSI datasets from TCGA projects, our model significantly outperforms state-of-the-art methods within the WSI-based continual learning framework. The source code is available at https://github.com/FrancescaMiccolis/CooMIL
U-Net Transplant: The Role of Pre-training for Model Merging in 3D Medical Segmentation
Despite their remarkable success in medical image segmentation, the life cycle of deep neural networks remains a challenge in clinical applications. These models must be regularly updated to integrate new medical data and customized to meet evolving diagnostic standards, regulatory requirements, commercial needs, and privacy constraints. Model merging offers a promising solution, as it allows working with multiple specialized networks that can be created and combined dynamically instead of relying on monolithic models. While extensively studied in standard 2D classification, the potential of model merging for 3D segmentation remains unexplored. This paper presents an efficient framework that allows effective model merging in the domain of 3D image segmentation. Our approach builds upon theoretical analysis and encourages wide minima during pre-training, which we demonstrate to facilitate subsequent model merging. Using U-Net 3D, we evaluate the method on distinct anatomical structures with the ToothFairy2 and BTCV Abdomen datasets. To support further research, we release the source code and all the model weights in a dedicated repository: https://github.com/LucaLumetti/UNetTransplan
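To make the notion of model merging concrete, here is a minimal sketch of one common form of it: adding to a shared pre-trained checkpoint the scaled weight differences of several fine-tuned models. The abstract does not confirm that this is the paper's exact procedure; the function name, the `scale` parameter, and the toy weights are assumptions for illustration.

```python
import numpy as np

def merge_task_vectors(pretrained, finetuned_models, scale=1.0):
    """Merge models fine-tuned from the same pre-trained weights.

    pretrained: dict mapping parameter name -> array.
    finetuned_models: list of dicts with the same keys and shapes.
    """
    merged = {}
    for name, base in pretrained.items():
        # Sum of per-model differences from the shared starting point.
        delta = sum(model[name] - base for model in finetuned_models)
        merged[name] = base + scale * delta
    return merged

# Toy weights (assumed): two specialists that changed different coordinates.
pretrained = {"w": np.zeros(2)}
specialist_a = {"w": np.array([1.0, 0.0])}
specialist_b = {"w": np.array([0.0, 2.0])}

merged = merge_task_vectors(pretrained, [specialist_a, specialist_b])
unchanged = merge_task_vectors(pretrained, [specialist_a, specialist_b], scale=0.0)
```

Because every specialist is expressed relative to the same starting point, specialists can be added or dropped without retraining the others in this scheme.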
