Characterizing Vision Backbones for Dense Prediction with Dense Attentive Probing
The paradigm of pretraining a backbone on a large set of (often unlabeled) images has gained popularity. The quality of the resulting features is commonly measured by freezing the backbone and training different task heads on top of it. However, current evaluations cover only classification of whole images or require complex dense task heads, which introduce a large number of parameters and add their own inductive biases. In this work, we propose dense attentive probing, a parameter-efficient readout method for dense prediction on arbitrary backbones – independent of the size and resolution of their feature volume. To this end, we extend cross-attention with distance-based masks of learnable sizes. We employ this method to evaluate 18 common backbones on dense prediction tasks in three dimensions: instance awareness, local semantics and spatial understanding. We find that DINOv2 outperforms all other backbones tested – including those supervised with masks and language – across all three task categories. Furthermore, our analysis suggests that self-supervised pretraining tends to yield features that separate object instances better than those of vision-language models.
Code is available at http://eckerlab.org/code/deap
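As a rough illustration of the idea described in the abstract above (cross-attention whose weights are restricted by a distance-based mask), here is a minimal NumPy sketch. It is not the authors' implementation: the Gaussian form of the mask, the function names `grid_positions` and `masked_cross_attention`, and the use of a plain float `sigma` in place of a learnable mask size are all assumptions made for illustration.

```python
import numpy as np

def grid_positions(h, w):
    """Centers of an h x w grid in normalized [0, 1] coordinates, shape (h*w, 2)."""
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)

def masked_cross_attention(queries, keys, values, q_pos, k_pos, sigma):
    """Cross-attention with an additive distance penalty (assumed Gaussian mask).

    sigma sets the mask size; in a trained probe it would be a learnable
    parameter, here it is a fixed float (assumption).
    """
    d = queries.shape[-1]
    logits = queries @ keys.T / np.sqrt(d)                      # content term
    sq_dist = ((q_pos[:, None, :] - k_pos[None, :, :]) ** 2).sum(-1)
    logits = logits - sq_dist / (2.0 * sigma ** 2)              # distance mask
    logits = logits - logits.max(axis=1, keepdims=True)         # stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights

# A 4x4 feature grid read out onto an 8x8 output grid: because positions are
# normalized, the output resolution need not match the feature resolution.
rng = np.random.default_rng(0)
k_pos = grid_positions(4, 4)
q_pos = grid_positions(8, 8)
features = rng.standard_normal((16, 8))   # stand-in for frozen backbone features
queries = rng.standard_normal((64, 8))    # stand-in for learned output queries
out, weights = masked_cross_attention(queries, features, features, q_pos, k_pos, sigma=0.1)
```

With a small `sigma`, each output location attends almost exclusively to nearby feature locations; a large `sigma` recovers ordinary cross-attention over the whole feature map.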
One-Shot Segmentation in Clutter
We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call . Using a baseline architecture combining a Siamese embedding for detection with a U-net for segmentation, we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solve the other. We therefore introduce , an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter and selects among the segmented objects. Our findings suggest that such image recognition models based on an iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.
Texture Modelling Using Convolutional Neural Networks
We introduce a new model of natural textures based on the feature spaces of convolutional neural networks optimised for object recognition. Samples from the model are of high perceptual quality, demonstrating the generative power of neural networks trained in a purely discriminative fashion. Within the model, textures are represented by the correlations between feature maps in several layers of the network. We show that across layers the texture representations increasingly capture the statistical properties of natural images while making object information more and more explicit. Extending this framework to texture transfer, we introduce A Neural Algorithm of Artistic Style that can separate and recombine the image content and style of natural images. The algorithm allows us to produce new artistic imagery that combines the content of an arbitrary photograph with the appearance of numerous well-known artworks, thus offering a path towards an algorithmic understanding of how humans create and perceive artistic imagery.
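The statement above that textures are represented by correlations between feature maps can be made concrete with a small sketch. The code below computes such a correlation (Gram) matrix for one layer's feature maps with NumPy. The random array stands in for real network activations, and the normalization by the number of spatial positions is an assumption; the function name `gram_matrix` is illustrative only.

```python
import numpy as np

def gram_matrix(features):
    """Correlations between feature maps of one layer.

    features: array of shape (channels, height, width).
    Returns a (channels, channels) matrix; dividing by the number of
    spatial positions is an assumed normalization.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)      # one row per feature map
    return flat @ flat.T / (h * w)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))  # stand-in for CNN activations
G = gram_matrix(features)

# Shuffling spatial positions identically in all maps leaves G unchanged,
# which is why this statistic describes texture rather than layout.
perm = rng.permutation(8 * 8)
shuffled = features.reshape(4, -1)[:, perm].reshape(4, 8, 8)
G_shuffled = gram_matrix(shuffled)
```

Because the sum runs over all spatial positions, the matrix discards where features occur and keeps only which features co-occur, which is the property a texture description needs.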
Reproducibility of predictive networks for mouse visual cortex
Deep predictive models of neuronal activity have recently enabled several new
discoveries about the selectivity and invariance of neurons in the visual
cortex. These models learn a shared set of nonlinear basis functions, which are
linearly combined via a learned weight vector to represent a neuron's function.
Such weight vectors, which can be thought of as embeddings of neuronal function,
have been proposed to define functional cell types via unsupervised clustering.
However, as deep models are usually highly overparameterized, the learning
problem is unlikely to have a unique solution, which raises the question of whether
such embeddings can be used in a meaningful way for downstream analysis. In
this paper, we investigate how stable neuronal embeddings are with respect to
changes in model architecture and initialization. We find
regularization to be an important ingredient for structured embeddings and
develop an adaptive regularization that adjusts the strength of regularization
per neuron. This regularization improves both predictive performance and how
consistently neuronal embeddings cluster across model fits compared to uniform
regularization. To overcome overparameterization, we propose an iterative
feature pruning strategy which reduces the dimensionality of
performance-optimized models by half without loss of performance and improves
the consistency of neuronal embeddings with respect to clustering neurons. This
result suggests that to achieve an objective taxonomy of cell types or a
compact representation of the functional landscape, we need novel architectures
or learning techniques that improve identifiability. We will make our code
available at publication time.