Caltech Submillimeter Observatory

CaltechCONF
    239 research outputs found

    Effects of Optic-Flow Density on the Metric Estimation of Rotation and Expansion

    Optic flow generated by rigid surface patches can be decomposed into four elementary motion types. We have shown that the human visual system can metrically estimate two of these motion types, rotation and expansion, by angular velocity and rate of expansion respectively. However, this contradicts previous work that reported linear velocity to be the parameter estimated. This discrepancy was caused by a difference in experimental methods. Experimental evidence shows that the visual system uses a different motion parameter depending on the amount of motion information available. We have modeled this systematic switchover in the information utilized in a probabilistic manner. Specifically, low motion information stimuli have a higher probability of being estimated by linear velocity than high motion information stimuli.

    Timecourse and Temporal Dynamics of Attention in Visual/Auditory Central/Peripheral Cuing

    Timecourse is a performance signature of attention systems [1]. In this study, an attention reaction paradigm [2] measures the timecourse of attention in visual central (VC), visual peripheral (VP), auditory central (AC), and auditory peripheral (AP) cuing of visual spatial attention. Observers viewed four synchronized letter streams at the corners of a 28 by 28 deg box, while fixating at the center. In each stream, an independent random permutation of 22 letters appeared at 10/s. Observers were instructed to report the earliest three letters available from the target stream, with payoffs decreasing with cue-report SOA. Four types of cues were used: an arrow at fixation (VC), an arrow adjacent to the target (VP), a tone coming from behind the target location (AP), and tones of four different frequencies at fixation (AC). Experiments were blocked by cue type, in Latin Square order. For VP and VC conditions, the first reported items occurred at about 100 ms and item report peaked around 200 ms post-cue (median 171 ms). In the AC condition, the reported items were from 100 ms to 400 ms (median 226 ms). Most interestingly, in the AP condition, the earliest reported items were simultaneous with the cue and the peak was at 100 ms post-cue (median 96 ms)! The full timecourse functions, or report distributions, were well described by gamma functions: the same shape for VC, VP and AP, a different shape for AC. Moreover, while VC and VP were fit with exactly the same parameters, the best fitting gamma function for AP was shifted backward (started earlier) by 75 ms relative to VC/VP. We conclude that the time courses of VC, VP and AP share the same distribution, but differ in offsets: completely equal for VC and VP and 75 ms faster for AP. AC is qualitatively different.
Interestingly, the differences between cuing conditions were resolved by an analysis that considered report probability as a function of temporal position relative to the first reported item (rather than the cue). A computational model of attention demonstrates that the timecourse function observed in each cuing condition can be accounted for by two temporal functions: (1) a general attention gating function, describing the temporal dynamics of attention across all cuing conditions, and (2) an opening time distribution, specific to each cuing condition, describing the probability of opening the attention gate as a function of time. These results suggest that, as previously shown [3] but contrary to common belief, the same temporal characteristics underlie performance in different cuing conditions.
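
As a rough illustration of the shifted-gamma description above, the sketch below evaluates a gamma-shaped report distribution and the same curve started 75 ms earlier, as reported for AP relative to VC/VP. The function name `gamma_timecourse` and the shape and scale values are hypothetical; the abstract does not give the fitted parameters.

```python
import math
import numpy as np

def gamma_timecourse(t_ms, shape, scale_ms, onset_ms=0.0):
    """Gamma density over time (ms) that starts rising at onset_ms."""
    t = np.asarray(t_ms, dtype=float) - onset_ms
    density = np.zeros_like(t)
    after_onset = t > 0
    density[after_onset] = (
        t[after_onset] ** (shape - 1)
        * np.exp(-t[after_onset] / scale_ms)
        / (math.gamma(shape) * scale_ms ** shape)
    )
    return density

# Illustrative parameters only (not the fitted values from the study).
t = np.arange(-100.0, 601.0, 1.0)
vc_vp = gamma_timecourse(t, shape=3.0, scale_ms=50.0)
# Same shape and scale, but the curve starts 75 ms earlier (the AP offset).
ap = gamma_timecourse(t, shape=3.0, scale_ms=50.0, onset_ms=-75.0)
peak_shift_ms = t[np.argmax(vc_vp)] - t[np.argmax(ap)]
```

Because only the onset differs, the two curves have identical shape and their peaks are separated by exactly the onset difference.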

    Fast and Biologically Plausible Computation with Perturbed Gaussian Markov Processes

    In recent years a wide range of statistical models have been applied to vision related problems and have enjoyed much success. In a generative probabilistic model, the probability distributions of the observed images together with hidden variables describing the images are formulated (in a top-down fashion), and visual perception and learning can be understood as an inference (bottom-up) operation that computes the posterior probabilities over the hidden variables, based on which model selection and parameter tuning, for example, can be carried out. A 'good' model requires a realistic probabilistic formulation that closely matches the statistics of the input data, and requires that the computation resulting from such a formulation is tractable, and hopefully also biologically plausible. Those two requirements are not trivial. In factor analysis, for example, the observed image is expressed as a linear superposition of many basis functions. While the generation or synthesis of the image is immediate, the inference operation would typically require iterations if a non-Gaussian prior is assumed or if direct matrix inversion is not allowed. If, on the other hand, the image is simply projected onto a set of filters, e.g., Gabor functions, then the probabilistic formulation is confounded; that is, it is not immediately clear how confidently a certain filter's response can be interpreted as the detection of a feature, e.g., an edge. In this talk, we present a generative probabilistic model that consists of a mixture of perturbed 2D Gaussian Markov processes. (Because of this mixing, the resulting model is non-Gaussian.) In each Gaussian Markov process, the adjacent hidden nodes on a 2D grid are coupled by some bond energy that resembles the energy prescribed in the "plate" model. This bond energy, however, can be subject to perturbation. Specifically, the 'bond' can be 'broken' or weakened.
This is a manipulation of the inverse of the covariance matrix of the Gaussian process, instead of a constant amount of addition/subtraction to the covariance matrix as in the case of adding/removing a basis in factor analysis. We show that the inference of the posterior probability of such a perturbation amounts to the following computation: the input image is projected onto several receptive fields, and their outputs then go through a quadratic nonlinearity, have a threshold (controlled by the prior) subtracted, and subsequently undergo a sigmoid function. Low-level features such as edges and bars of different scale and orientation can be obtained by suitable perturbations. Therefore the outputs of those feature detectors correspond to the data-likelihood given those components in our mixture model. We demonstrate how different features interact with each other: specifically, lateral inhibition and colinear facilitation. Also, we show that a contour can 'gate', or modify the extent of, other feature detectors in its vicinity. Note that those phenomena fall directly out of our probabilistic formulation; there are no heuristics involved. When we move beyond individual feature detectors and try to infer the posterior probability of contours, we encounter computation involving matrix inversion. We then show that there exists a family of effective preconditioners for different configurations of contours. In fact, those preconditioners are so good that the matrix inversion can be obtained in a single step! The posterior mean and covariance of the hidden nodes can therefore be easily obtained (in negligible time on a PC). In contrast, algorithms such as anisotropic diffusion or Graduated Non-Convexity would typically need many iterations of lateral propagation of information.
In summary, apart from adapting a few parameters (e.g., noise level), the inference of our model can be carried out in a predominantly feedforward, fan-in/fan-out type of computation, and seems biologically plausible.
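
The feedforward computation described above (projection onto receptive fields, quadratic nonlinearity, threshold, sigmoid) can be sketched as follows. This is a minimal sketch under stated assumptions: the name `feature_posterior` is hypothetical, the squared filter outputs are assumed to be summed, and the threshold is passed in as a plain number rather than derived from a prior as in the model.

```python
import numpy as np

def feature_posterior(image, receptive_fields, threshold):
    """Project, square, subtract a threshold, then apply a sigmoid."""
    x = np.asarray(image, dtype=float).ravel()
    # Linear projection of the image onto each receptive field (one per row).
    responses = np.asarray(receptive_fields, dtype=float) @ x
    # Quadratic nonlinearity; summing the squares is an assumption of this sketch.
    energy = np.sum(responses ** 2)
    # Threshold (standing in for the prior term) followed by a logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-(energy - threshold)))

flat = np.ones((2, 2))                        # no edge present
edge = np.array([[0.0, 4.0], [0.0, 4.0]])     # vertical edge
rf = np.array([[-1.0, 1.0, -1.0, 1.0]])       # toy left-to-right difference filter
p_flat = feature_posterior(flat, rf, threshold=2.0)
p_edge = feature_posterior(edge, rf, threshold=2.0)
```

With this toy filter the flat patch yields a low probability and the edge patch a probability close to one; the receptive fields in the actual model follow from the perturbed Gaussian Markov process rather than being chosen by hand.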

    Spatial content of faces may be critical for individualizing faces

    Many of the phenomena associated with face (vs. object) recognition can be understood in terms of a representation for individuating faces that retains aspects of the original spatial filter activations, as posited by Malsburg's Gabor Jet model that mimics the functions of the columns of V1 simple cells. Objects, in contrast, may be represented by a structural description specifying explicit relations among view invariant properties of edges of simple parts. Subjects judged whether a sequentially presented pair of images was the face of the same person, in one condition, or the same chair, in another (Biederman & Kalocsai, 1997). The images were filtered (in the Fourier domain) into 8 scales and 8 orientations. Complementary pairs of each person or chair image were created by assigning the content of every other combination of scale and orientation to a given image. (If the scales are ordered as rows and the orientations as columns to form a checkerboard, then one member of a complementary pair would have the content from the red squares and the other member the content from the black squares.) On half the matching trials (i.e., the same chair or the same person), the images were complements; on the other half they were identical. Consistent with the hypothesis that face representations retained the original spatial content, matching complements of faces resulted in markedly greater error rates and RTs than matching identical images. No such costs were apparent when matching chairs. However, the chairs differed in small parts that could be discerned from their edges. The present study examined the costs of complementizing smooth, non-face, blobby objects (variations in the amplitudes of the harmonics of a sphere) that differed from each other metrically, as did the faces. A cost of complementizing was observed for the blobs but this cost was smaller than that for the faces.
The necessity to make fine metric judgments of smooth surfaces may underlie part of the sensitivity of faces to the spatial content of the image.
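
The checkerboard assignment of scale and orientation combinations described above can be sketched as a pair of boolean masks. This covers only the bookkeeping of which combinations go to which member of a complementary pair; the Fourier-domain filtering itself is not shown, and the function name `complementary_masks` is hypothetical.

```python
import numpy as np

def complementary_masks(n_scales=8, n_orientations=8):
    """Return two checkerboard masks over (scale, orientation) combinations."""
    rows, cols = np.indices((n_scales, n_orientations))
    # "Red" squares: combinations where the scale index plus the orientation index is even.
    first = (rows + cols) % 2 == 0
    # "Black" squares: every remaining combination.
    return first, ~first

first, second = complementary_masks()
```

Each mask selects 32 of the 64 combinations, the two masks share no combination, and together they cover the full 8 by 8 grid.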

    A mathematical framework for the design and analysis of feature biasing strategies

    Please see attached pdf file

    Temporal tuning characteristics of perceptual templates

    External noise presented in temporal contiguity with a target impairs perceptual performance, reflecting the temporal tuning of the perceptual template. Deriving the temporal weights of the perceptual template, however, requires an observer model that segregates the impact of non-linearities and intrinsic inefficiencies of the observer in order to account for the impact of external noise in various temporal configurations. We showed that the perceptual template model (PTM) successfully accounts for temporal masking functions measured with a wide range of temporal configurations of external noise, and estimates the temporal characteristics of the perceptual template. This was first demonstrated in estimating the temporal tuning characteristics of the perceptual template in a foveal Gabor orientation identification task. The same procedure was then used to compare the temporal tuning characteristics of the perceptual template in pre- and simultaneous cuing of spatial attention in a peripheral Gabor orientation identification task. In both experiments, four non-overlapping temporal regions of external noise, each occurring at different temporal intervals from the target display, were combined in 10 different temporal configurations. Psychometric functions were measured in the 10 external noise temporal configurations along with a zero noise condition. The PTM provides a full account of all the psychometric functions. The estimated full width of the perceptual template at half-height is about 80 ms in experiment 1, and in experiment 2 it is 67 ms in the pre-cuing condition and about 90 ms in the simultaneous cuing condition. Manipulations of the temporal configurations of the external noise coupled with the PTM thus provide a method to characterize the temporal tuning properties of perceptual templates with an intrinsically coherent structure.

    Learning Motor Primitives with Reinforcement Learning

    One of the major challenges in action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation," or more precisely, motor primitives. Recently, Ijspeert et al. [1, 2] suggested a novel framework for using nonlinear dynamical systems as motor primitives. While a lot of progress has been made in teaching these motor primitives using supervised or imitation learning, self-improvement through interaction of the system with the environment remains a challenging problem. In this poster, we evaluate how different reinforcement learning approaches can be used to improve the performance of motor primitives. In pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline how these lead to a novel algorithm which is based on natural policy gradients [3]. We compare this algorithm to previous reinforcement learning algorithms in the context of dynamic motor primitive learning, and show that it outperforms these by at least an order of magnitude. We demonstrate the efficiency of the resulting reinforcement learning method for creating complex behaviors for autonomous robotics. The studied behaviors will include both discrete, finite tasks such as baseball swings, as well as complex rhythmic patterns as they occur in biped locomotion.

    Recognizing Persons with One-Shot Learning

    There have been several attempts to solve the problem of Human Recognition, i.e., the ability to identify individual persons in novel situations. Using facial features (e.g. Wiskott et al, Facial Recognition using Elastic Bunch Graph Matching, 1997) for this purpose has proved to be quite successful. However, when a person is at an appreciable distance, the facial resolution is insufficient for reliable recognition. Therefore, some systems use additional information such as walking patterns (Collins et al, Silhouette-based Human Identification from Body Shape and Gait, 2002), or distinguish color and shape features using Support Vector Machine classifiers (Nakajima et al, Full-body Person Recognition System, 2003). We present here a simple system that is able to recognize and track people from video sequences in real time. The implemented system learns the representation of the person using just a single video sequence (one-shot), with enough detail to permit later recognition and enough generality to deal with variation. To achieve this we divide the image of a person into regions (head, torso and legs) using a minimal model of the human body (corresponding to a virtually naive spectator). The system learns the color and texture features for each region and stores them in a database of people. Thereafter, for recognition, it computes a similarity function between the input 'instance' and each person in the database. The person that generates the maximum similarity is chosen as the recognized person (this similarity value often exceeds that of the other people in the database by over three orders of magnitude). Furthermore, since no specific parameter tuning is required for either learning or recognition, the system is well suited to automatic visual surveillance. This system could be used in conjunction with a face recognition system to reduce the search space for faces, by narrowing down the number of possibilities based on person recognition.

    The Tsallis Entropy in the EEGs of Normal and Demented Individuals

    The electroencephalogram (EEG) is a recording of the brain's total electrical activity. Since the brain processes information, the information in the brain's total electrical activity probably corresponds to the information processing in the brain. This assumption was used to study the entropy or 'self-information' in the EEGs of participants who were performing a short-term memory task. There were two groups of participants in this study; one group had a medical diagnosis of "normal aging" (Normal) and the other group had a diagnosis of "very mild dementia" (Dementia). The dementia diagnosis means that they have short-term memory impairment. The EEG of each participant was recorded while they performed a short-term memory task: face recall. Their EEG was also recorded while they performed a second short-term memory task: object recall. These EEG data were used to test a basic hypothesis: the entropy in the EEGs of the Dementia group would be significantly different from the entropy in the EEGs of the Normal group. When choosing a method to test this hypothesis, it is important to account for how the brain processes a recall stimulus. The brain information processing that occurs during a recall task has informational, temporal and spatial properties. Therefore, to accurately analyze the entropy in the EEG data three things must be specified: 1) which entropy measure to use, 2) which time intervals of the EEG data to use, and 3) what locations on the participant's scalp (choice of EEG electrodes) to use. The specifications used are: 1) Entropy measure. The entropy measure used was the Tsallis entropy. Tsallis entropy is a generalization of the Shannon entropy to a non-additive entropy measure. An additive entropy measure assumes that the entropy of a whole system is equal to the sum of the entropies of each part of the system. EEG data may not conform to this assumption, so we used the Tsallis entropy instead of the Shannon entropy. 2) EEG time interval.
The EEG data used were those data that occurred during the first 300 milliseconds (ms) after the appearance of the recall stimulus. A participant's response is purely perceptual/cognitive for about 300 ms after the appearance of the recall stimulus (Sternberg 1966, 1969). Muscle movement responses, which are more variable, begin later in the task. This suggests that the first 300 ms of the EEG data will be more specific to the task. 3) Spatial component - the choice of EEG electrodes. Electrodes chosen for this EEG data analysis should correspond with the way that information moves through the brain during the first 300 ms of the recall task. The task began with a visual stimulus. The information from this stimulus enters the posterior cortex at V1 (Brodmann area 17). After about 150 ms, this (transformed and partially altered) information enters the anterior cortex. Therefore, the EEG data which correspond to the recall task are those data recorded by posterior electrodes during the first 150 ms and those data recorded by anterior electrodes for the next 150 ms. The entropy analysis of these data was accomplished by computing the Tsallis entropy in two posterior electrodes, P3 and P4, for the first 150 ms after the appearance of the stimulus. Then, the Tsallis entropy of the EEG data in the second, contiguous 150 ms time interval was computed. The EEG data for this second entropy measurement were recorded by two anterior electrodes: T7Fp3 and T8Fp4 (electrodes placed slightly behind the temples). Thus, the EEG data analyzed were data which corresponded to the flow of brain information during the first 300 ms of the recall task. It has been assumed that the entropy in an EEG corresponds to brain information processing during the recall task. However, these entropies, alone, do not show how brain information changes when moving from posterior cortex to anterior cortex.
A commonsense solution to this difficulty would be to compute the mutual entropy (mutual information) measure. However, this measure is based on the assumption of a closed information channel. This assumption does not hold true for the brain. The neural information pathway from early visual cortex to anterior cortex is not a closed pathway. For this reason, a relative entropy measure, a measure of the amount of anterior EEG entropy relative to the amount of posterior EEG entropy, is more appropriate. This relative entropy measure is the ratio of the anterior EEG entropy to the posterior EEG entropy. This ratio of entropies or "entropy ratio" was the measure used to compare Normal and Dementia participants. Normal participants were expected to have larger entropy ratios than those with dementia. Thus, the quantitative hypothesis is that the entropy ratios of the Normal participants will be higher than the entropy ratios of the Dementia participants. For the most part, this hypothesis was formulated before the testing of the 47 participants. It was ante hoc. To be more exact, the specifics of the method for testing the hypothesis were formulated during the EEG testing of the first few participants (about eight). A total of 33 normal aging participants and 14 very mildly demented participants were tested. The results are: 31 of the 33 Normals had higher entropy ratios than the 14 entropy ratios of the Dementia group. These 31 entropy ratios were all greater than one. 14 of the 14 entropy ratios of the Dementia group were less than or equal to one (at two decimal places of precision). Participants' entropy ratios can be used to discriminate between the Normal and Dementia groups. Assume that entropy ratios greater than one denote normal aging and entropy ratios less than or equal to one denote dementia (impaired short-term memory).
Then these criteria distinguish between the Normal and Dementia groups with a specificity of 100% (14 of 14) and a sensitivity of 94% (31 of 33). This means that two Normal participants were incorrectly classified as having dementia. However, both of these participants have a family history of Alzheimer's Disease dementia. One participant had a parent, now deceased, who had severe dementia. This participant's other parent has Alzheimer's Disease. The second participant also has a parent with Alzheimer's Disease, indicating a genetic predisposition to Alzheimer's Disease dementia. For this reason, these two participants may have very early Alzheimer's Disease. This remains to be seen, as does further refinement and testing of this hypothesis.
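
The entropy ratio described above can be sketched as follows, using the standard Tsallis form S_q = (1 - sum_i p_i^q) / (q - 1). Several details are assumptions of this sketch and not taken from the abstract: the probabilities are estimated from an amplitude histogram, the entropic index `q` defaults to 2, the bin count is arbitrary, and the function names are hypothetical.

```python
import numpy as np

def tsallis_entropy(signal, q=2.0, bins=16):
    """Tsallis entropy of a signal's amplitude histogram (assumed estimator)."""
    counts, _ = np.histogram(np.asarray(signal, dtype=float), bins=bins)
    p = counts[counts > 0] / counts.sum()
    if np.isclose(q, 1.0):
        # The q -> 1 limit recovers the Shannon entropy.
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def entropy_ratio(anterior, posterior, q=2.0, bins=16):
    """Anterior entropy divided by posterior entropy (posterior must be non-zero)."""
    return tsallis_entropy(anterior, q, bins) / tsallis_entropy(posterior, q, bins)

def classify(ratio):
    """Decision rule stated in the abstract: ratios above one denote normal aging."""
    return "Normal" if ratio > 1.0 else "Dementia"

# Toy signals, not EEG data: four equally likely amplitudes vs. two.
ratio = entropy_ratio([0, 1, 2, 3], [0, 0, 1, 1], bins=4)
```

A constant posterior segment would give zero entropy and an undefined ratio, so real recordings would need a guard for that case.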

    Predicting EMG Activity from Neural Firing in M1 with Bayesian Backfitting

    Much attention has been given to directly interpreting neural firing in the primary motor cortex as a force signal, i.e., a signal that correlates with force production in muscles. How to robustly predict EMG patterns from M1 firing and which M1 neurons contribute to a particular muscle behaviour are interesting questions that arise under this hypothesis. From a statistical point of view, these questions correspond to analyzing datasets with a large number of input dimensions to detect which inputs contribute the most to the outputs. This is, at worst, a computationally exhausting combinatorial task. We present a Bayesian Backfitting algorithm that automatically determines the relevant input dimensions in a regression problem. We compare this algorithm to a brute-force approach that considers combinations of relevant input dimensions. The dataset (Sergio & Kalaska, 1998) consists of neuronal firing of M1 neurons and the corresponding muscle EMG data. Bayesian Backfitting successfully determines the correlations between inputs and outputs and closely matches results from the brute-force analysis, while performing the task orders of magnitude faster. In addition to demonstrating that M1 neurons are good predictors of EMG traces, our work shows that Bayesian Backfitting can be used as a new, statistically sound tool to replace traditional tools in biological data analysis. Such new Bayesian methods enable data analyses that previously could only have been conducted on supercomputing facilities.

    214 full texts; 239 metadata records