1,720,985 research outputs found

    Local depth edge detection in humans and deep neural networks

    No full text
    Distinguishing edges caused by a change in depth from other types of edges is an important problem in early vision. We investigate the performance of humans and computer vision models on this task. We use spherical imagery with ground-truth LiDAR range data to build an objective ground-truth dataset for edge classification. We compare various computational models for classifying depth from non-depth edges in small images patches and achieve the best performance (86%) with a convolutional neural network. We investigate human performance on this task in a behavioral experiment and find that human performance is lower than the CNN. Although human and CNN depth responses are correlated, observers’ responses are better predicted by other observers than by the CNN. The responses of CNNs and human observers also show a slightly different pattern of correlation with low-level edge cues, which suggests that CNNs and human observers may weight these features differently for classifying edges

    A general account of peripheral encoding also predicts scene perception performance

    No full text
    People are good at rapidly extracting the "gist" of a scene at a glance, meaning with a single fixation. It is generally presumed that this performance cannot be mediated by the same encoding that underlies tasks such as visual search, for which researchers have suggested that selective attention may be necessary to bind features from multiple preattentively computed feature maps. This has led to the suggestion that scenes might be special, perhaps utilizing an unlimited capacity channel, perhaps due to brain regions dedicated to this processing. Here we test whether a single encoding might instead underlie all of these tasks. In our study, participants performed various navigation-relevant scene perception tasks while fixating photographs of outdoor scenes. Participants answered questions about scene category, spatial layout, geographic location, or the presence of objects. We then asked whether an encoding model previously shown to predict performance in crowded object recognition and visual search might also underlie the performance on those tasks. We show that this model does a reasonably good job of predicting performance on these scene tasks, suggesting that scene tasks may not be so special; they may rely on the same underlying encoding as search and crowded object recognition. We also demonstrate that a number of alternative "models" of the information available in the periphery also do a reasonable job of predicting performance at the scene tasks, suggesting that scene tasks alone may not be ideal for distinguishing between models. Keywords: scene perception; peripheral vision; crowding; parafoveal vision; navigationNational Science Foundation (U.S.) (Award IIS-1607486

    Dataset for: Category Systems for Real-World Scenes

    No full text
    Data from Experiments 1, 2, 3, and 4, including a MATLAB function that executes the clustering algorithm described in the paper: Category Systems for Real-World Scenes in Journal of vision by Anderson, M. D. et al.</span

    Canonical views of scenes depend on the shape of the space

    Full text link
    When recognizing or depicting objects, people show a preference for particular “canonical” views. Are there similar preferences for particular views of scenes? We investigated this question using panoramic images, which show a 360-degree view of a location. Observers used an interactive viewer to explore the scene and select the best view. We found that agreement between observers on the “best” view of each scene was generally high. We attempted to predict the selected views using a model based on the shape of the space around the camera location and on the navigational constraints of the scene. The model performance suggests that observers select views which capture as much of the surrounding space as possible, but do not consider navigational constraints when selecting views. These results seem analogous to findings with objects, which suggest that canonical views maximize the visible surfaces of an object, but are not necessarily functional views.National Science Foundation (U.S.) (NSF Career award (0546262))National Science Foundation (U.S.) (Grant 0705677)National Institutes of Health (U.S.) (Grant 1016862)National Eye Institute (grant EY02484)National Science Foundation (U.S.) (NSF Graduate Research Fellowship

    Category systems for real-world scenes

    Full text link
    Categorization performance is a popular metric of scene recognition and understanding in behavioral and computational research. However, categorical constructs and their labels can be somewhat arbitrary. Derived from exhaustive vocabularies of place names (e.g., Deng et al., 2009), or the judgements of small groups of researchers (e.g., Fei-Fei, Iyer, Koch,&amp;Perona, 2007), these categories may not correspond with human-preferred taxonomies. Here, we propose clustering by increasing the rand index via coordinate ascent (CIRCA): an unsupervised, data-driven clustering method for deriving ground-truth scene categories. In Experiment 1, human participants organized 80 stereoscopic images of outdoor scenes from the Southampton-York Natural Scenes (SYNS) dataset (Adams et al., 2016) into discrete categories. In separate tasks, images were grouped according to i) semantic content, ii) three-dimensional spatial structure, or iii) two-dimensional image appearance. Participants provided text labels for each group. Using the CIRCA method, we determined the most representative category structure and then derived category labels for each task/dimension. In Experiment 2, we found that these categories generalized well to a larger set of SYNS images, and new observers. In Experiment 3, we tested the relationship between our category systems and the spatial envelope model (Oliva&amp;Torralba, 2001). Finally, in Experiment 4, we validated CIRCA on a larger, independent dataset of same-different category judgements. The derived category systems outperformed the SUN taxonomy (Xiao, Hays, Ehinger, Oliva,&amp;Torralba, 2010) and an alternative clustering method (Greene, 2019). In summary, we believe this novel categorization method can be applied to a wide range of datasets to derive optimal categorical groupings and labels from psychophysical judgements of stimulus similarity.</p

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis

    Rethinking the role of top-down attention in vision: effects attributable to a lossy representation in peripheral vision

    Full text link
    According to common wisdom in the field of visual perception, top-down selective attention is required in order to bind features into objects. In this view, even simple tasks, such as distinguishing a rotated T from a rotated L, require selective attention since they require feature binding. Selective attention, in turn, is commonly conceived as involving volition, intention, and at least implicitly, awareness. There is something non-intuitive about the notion that we might need so expensive (and possibly human) a resource as conscious awareness in order to perform so basic a function as perception. In fact, we can carry out complex sensorimotor tasks, seemingly in the near absence of awareness or volitional shifts of attention (“zombie behaviors”). More generally, the tight association between attention and awareness, and the presumed role of attention on perception, is problematic. We propose that under normal viewing conditions, the main processes of feature binding and perception proceed largely independently of top-down selective attention. Recent work suggests that there is a significant loss of information in early stages of visual processing, especially in the periphery. In particular, our texture tiling model (TTM) represents images in terms of a fixed set of “texture” statistics computed over local pooling regions that tile the visual input. We argue that this lossy representation produces the perceptual ambiguities that have previously been as ascribed to a lack of feature binding in the absence of selective attention. At the same time, the TTM representation is sufficiently rich to explain performance in such complex tasks as scene gist recognition, pop-out target search, and navigation. A number of phenomena that have previously been explained in terms of voluntary attention can be explained more parsimoniously with the TTM. In this model, peripheral vision introduces a specific kind of information loss, and the information available to an observer varies greatly depending upon shifts of the point of gaze (which usually occur without awareness). The available information, in turn, provides a key determinant of the visual system’s capabilities and deficiencies. This scheme dissociates basic perceptual operations, such as feature binding, from both top-down attention and conscious awareness.National Institutes of Health (U.S.) (Grant 1-R21-EY019366-01A1)National Science Foundation (U.S.). Graduate Research Fellowship Progra
    corecore