
    Bounds of restricted isometry constants in extreme asymptotics: formulae for Gaussian matrices

    1 online resource (PDF, 38 pages, includes illustrations). Bah, Bubacarr; Tanner, Jared. (2011). Bounds of restricted isometry constants in extreme asymptotics: formulae for Gaussian matrices. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/181163

    Restricted isometry constants in compressed sensing

    Compressed Sensing (CS) is a framework where we measure data through a non-adaptive linear mapping with far fewer measurements than the ambient dimension of the data. This is made possible by the exploitation of the inherent structure (simplicity) in the data being measured. The central issues in this framework are the design and analysis of the measurement operator (matrix) and recovery algorithms. Restricted isometry constants (RIC) of the measurement matrix are the most widely used tool for the analysis of CS recovery algorithms. The addition of the subscripts 1 and 2 below reflects the two RIC variants developed in the CS literature; they refer to the ℓ1-norm and ℓ2-norm respectively. The RIC2 of a matrix A measures how close to an isometry the action of A is on vectors with few nonzero entries, measured in the ℓ2-norm. This, and related quantities, provide a mechanism by which standard eigen-analysis can be applied to topics relying on sparsity. Specifically, the upper and lower RIC2 of a matrix A of size n × N are the maximum and the minimum deviation from unity (one) of the largest and smallest, respectively, squared singular values of all (N choose k) matrices formed by taking k columns from A. Calculation of the RIC2 is intractable for most matrices due to its combinatorial nature; however, many random matrices typically have bounded RIC2 in some range of problem sizes (k, n, N). We provide the best known bound on the RIC2 for Gaussian matrices, which is also the smallest known bound on the RIC2 for any large rectangular matrix. Our results are built on the prior bounds of Blanchard, Cartis, and Tanner in Compressed Sensing: How sharp is the Restricted Isometry Property?, with improvements achieved by grouping submatrices that share a substantial number of columns. RIC2 bounds have been presented for a variety of random matrices, matrix dimensions and sparsity ranges. 
We provide explicit formulae for RIC2 bounds of n × N Gaussian matrices with sparsity k, in three settings: a) n/N fixed and k/n approaching zero, b) k/n fixed and n/N approaching zero, and c) n/N approaching zero with k/n decaying inverse logarithmically in N/n; in these three settings the RICs a) decay to zero, b) become unbounded (or approach inherent bounds), and c) approach a non-zero constant. Implications of these results for RIC2-based analysis of CS algorithms are presented. The RIC2 of sparse mean-zero random matrices can be bounded by using concentration bounds of Gaussian matrices. However, this RIC2 approach does not capture the benefits of the sparse matrices, and in so doing gives pessimistic bounds. RIC1 is a variant of RIC2 where the nearness to an isometry is measured in the ℓ1-norm, which both better captures the structure of sparse matrices and allows for the analysis of non-mean-zero matrices. We consider a probabilistic construction of sparse random matrices where each column has a fixed number of non-zeros whose row indices are drawn uniformly at random. These matrices have a one-to-one correspondence with the adjacency matrices of fixed left degree expander graphs. We present formulae for the expected cardinality of the set of neighbours for these graphs, and present a tail bound on the probability that this cardinality will be less than the expected value. Deducible from this bound is a similar bound for the expansion of the graph, which is of interest in many applications. These bounds are derived through a more detailed analysis of collisions in unions of sets using a dyadic splitting technique. This bound allows for quantitative sampling theorems on the existence of expander graphs and the sparse random matrices we consider, and also quantitative CS sampling theorems when using sparse non-mean-zero measurement matrices.
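
The verbal definition of the lower and upper RIC2 in the abstract above can be written out as a formula. The symbols L, U and A_K (the n × k submatrix of A formed by the columns indexed by K) are notation assumed here for illustration and are not taken from the abstract itself; the squared singular values of A_K are the eigenvalues of A_K^* A_K.

```latex
% A_K: n x k submatrix of A with columns indexed by K (notation assumed here)
L(k,n,N;A) = 1 - \min_{K \subset \{1,\dots,N\},\ |K| = k} \lambda^{\min}\!\left(A_K^{*} A_K\right),
\qquad
U(k,n,N;A) = \max_{K \subset \{1,\dots,N\},\ |K| = k} \lambda^{\max}\!\left(A_K^{*} A_K\right) - 1 .
```

Both extrema range over all (N choose k) column subsets, which is the combinatorial cost the abstract refers to when it calls the calculation intractable.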

    Using Neural Networks to identify Individual Animals from Photographs

    Effective management requires knowledge of animal population sizes. This can be accomplished in various ways, but a very popular way is mark-recapture studies. Mark-recapture studies need a way of telling if a captured animal has been previously seen. For traditional mark-recapture, this is achieved by applying a tag to the animal. For non-invasive mark-recapture methods which exploit photographs, there is no tag on the animal’s body. As a result, these methods require animals to be individually identifiable. They assess if an animal has been caught before by examining photographs for animals which have individual-specific marks (Cross et al., 2014; Gomez et al., 2016; Beijbom et al., 2016; Körschens, Barz, and Denzler, 2018). This study develops a model which can reliably match photographs of the same individual based on individual-specific marks. The model consists of two main parts: an object detection model, and a classifier which takes two photos as input and outputs a predicted probability that the pair is from the same individual (a match). The object detection model is a convolutional neural network (CNN) and the matching classifier is a special kind of CNN called a siamese network. The siamese network uses a pair of CNNs that share weights to summarise the images, followed by some dense layers which combine the summaries into measures of similarity which can be used to predict a match. The model is tested on two case studies, humpback whales (HBWs) and western leopard toads (WLTs). The HBW dataset consists of images originally collected by various institutions across the globe and uploaded to the Happywhale platform, which encourages scientists to identify individual mammals. HBWs can be identified by their fins and special markings. A large amount of data is available for this problem. The WLT dataset consists of images collected by citizen scientists in South Africa. 
They were either uploaded to iSpot, a citizen science project which collects images, or sent to the WLT project, a conservation project staffed by volunteers. WLTs can be identified by their unique spots. Only a small amount of data is available for this problem. One part of this dataset consists of labelled individuals and another part is unlabelled. The model was able to give good results for both HBWs and WLTs. In 95% of the cases the model correctly identified whether a pair of images is from the same HBW individual or not. It correctly identified whether a pair of images is drawn from the same WLT individual or not in 87% of the cases. This study also assessed the effectiveness of the semi-supervised approach on the WLT unlabelled dataset. In this study, the semi-supervised approach has been partially successful. The model was able to identify new individuals and matches which were not identified before, but they were relatively few in number. Without an exhaustive check of the data, it is not clear whether this is due to the failure of the semi-supervised approach, or because there are not many matches in the data. After adding the newly identified and labelled individuals to the WLT labelled dataset, the model slightly improved its performance and correctly identified 89% of WLT pairs. A number of computer-aided photo-matching algorithms have been proposed (Matthé et al., 2017). This study also assessed the performance of Wild-ID (Bolger et al., 2012), one of the commonly used photo-matching algorithms, on both the HBW and WLT datasets. The model developed in this thesis achieved very competitive results compared with Wild-ID. Model accuracies for the proposed siamese network were much higher than those returned by Wild-ID on the HBW dataset, and roughly the same on the WLT dataset.

    Improved identification accuracy in equation learning via comprehensive $\boldsymbol{R^2}$-elimination and Bayesian model selection

    In the field of equation learning, exhaustively considering all possible equations derived from a basis function dictionary is infeasible. Sparse regression and greedy algorithms have emerged as popular approaches to tackle this challenge. However, the presence of multicollinearity poses difficulties for sparse regression techniques, and greedy steps may inadvertently exclude terms of the true equation, leading to reduced identification accuracy. In this article, we present an approach that strikes a balance between comprehensiveness and efficiency in equation learning. Inspired by stepwise regression, our approach combines the coefficient of determination, $R^2$, and the Bayesian model evidence, $p(\boldsymbol y|\mathcal M)$, in a novel way. Our procedure is characterized by a comprehensive search with just a minor reduction of the model space at each iteration step. With two flavors of our approach and the adoption of $p(\boldsymbol y|\mathcal M)$ for bi-directional stepwise regression, we present a total of three new avenues for equation learning. Through three extensive numerical experiments involving random polynomials and dynamical systems, we compare our approach against four state-of-the-art methods and two standard approaches. The results demonstrate that our comprehensive search approach surpasses all other methods in terms of identification accuracy. In particular, the second flavor of our approach establishes an efficient overfitting penalty solely based on $R^2$, which achieves the highest rates of exact equation recovery. Comment: 12 pages main text and 11 pages appendix. Published in TMLR (https://openreview.net/forum?id=0ck7hJ8EVC)

    On construction and analysis of sparse matrices and expander graphs with applications to CS

    Publication in the conference proceedings of SampTA, Bremen, Germany, 201

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
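
The two counting rules compared in the abstract above can be sketched in a few lines. This is a minimal illustration, not the study's procedure: the function name `cocitation_counts`, the parameter `max_authors`, and the toy data (authors "A" to "D") are all hypothetical, and the sketch assumes each pair of authors is counted once per citing paper.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations (hypothetical helper for illustration).

    citing_papers: list of reference lists; each reference is a list of
                   author names in byline order.
    max_authors:   number of leading authors of each cited work that are
                   credited (1 = traditional first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited by this citing paper under the chosen rule.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Assumption: each unordered author pair counts once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Made-up toy data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A", "D"], ["B", "C"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(dict(first_author))
print(dict(first_five))
```

With `max_authors=1` the second authors never enter the counts, so pairs such as ("B", "C") are invisible; raising the limit to 5 adds them, which is the kind of difference the study examines.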

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
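
To make the two named measures concrete, the sketch below computes the cosine and the Pearson correlation of two cocitation profiles. It is not taken from the paper: the profiles are made-up numbers, and the property shown (padding both profiles with zeros, i.e. authors with whom neither is cocited, leaves the cosine unchanged but shifts the Pearson correlation) is only one property that distinguishes the two measures, not necessarily the argument the authors make.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two profile vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Made-up cocitation profiles of two authors with four other authors.
x = [3, 1, 0, 2]
y = [1, 3, 2, 0]

# Six further authors with whom neither author is cocited.
padding = [0] * 6
x_padded = x + padding
y_padded = y + padding

print("cosine :", cosine(x, y), "->", cosine(x_padded, y_padded))
print("pearson:", pearson(x, y), "->", pearson(x_padded, y_padded))
```

In this toy case the Pearson correlation even changes sign once the zeros are appended, while the cosine depends only on the non-zero entries.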

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods