Bounds of restricted isometry constants in extreme asymptotics: formulae for Gaussian matrices
1 online resource (PDF, 38 pages, includes illustrations). Bah, Bubacarr; Tanner, Jared. (2011). Bounds of restricted isometry constants in extreme asymptotics: formulae for Gaussian matrices. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/181163
Restricted isometry constants in compressed sensing
Compressed Sensing (CS) is a framework where we measure data through a non-adaptive linear
mapping with far fewer measurements than the ambient dimension of the data. This is made
possible by the exploitation of the inherent structure (simplicity) in the data being measured.
The central issues in this framework are the design and analysis of the measurement operator
(matrix) and recovery algorithms. Restricted isometry constants (RIC) of the measurement
matrix are the most widely used tool for the analysis of CS recovery algorithms. The addition
of the subscripts 1 and 2 below reflects the two RIC variants developed in the CS literature;
they refer to the ℓ1-norm and ℓ2-norm respectively.
The RIC2 of a matrix A measures how close, in the ℓ2-norm, the action of A on vectors with
few nonzero entries is to an isometry. This, and related quantities, provide a mechanism
by which standard eigen-analysis can be applied to topics relying on sparsity. Specifically,
the upper and lower RIC2 of a matrix A of size n × N are the maximum and the minimum
deviation from unity (one) of the largest and smallest, respectively, squared singular values of
all (N choose k) submatrices formed by taking k columns from A. Calculation of the RIC2 is intractable for
most matrices due to its combinatorial nature; however, many random matrices typically have
bounded RIC2 in some range of problem sizes (k, n,N). We provide the best known bound
on the RIC2 for Gaussian matrices, which is also the smallest known bound on the RIC2 for
any large rectangular matrix. Our results are built on the prior bounds of Blanchard, Cartis,
and Tanner in Compressed Sensing: How sharp is the Restricted Isometry Property?, with
improvements achieved by grouping submatrices that share a substantial number of columns.
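
As a small illustration of the definition above (not the bounding technique of the thesis), the RIC2 of a tiny matrix can be computed by exhaustive search over all k-column submatrices. The function name `ric2_brute_force`, the matrix sizes, and the 1/sqrt(n) scaling are assumptions chosen for this sketch. Here the upper constant is taken as the largest value of (largest squared singular value minus one) and the lower constant as the largest value of (one minus smallest squared singular value).

```python
import itertools
import numpy as np

def ric2_brute_force(A, k):
    """Return (lower, upper) RIC2 of A at sparsity k by exhaustive search.

    Cost grows like (N choose k), so this is only feasible for tiny matrices.
    """
    n, N = A.shape
    upper = -np.inf
    lower = -np.inf
    for cols in itertools.combinations(range(N), k):
        # Singular values of the n x k submatrix, sorted in descending order.
        s = np.linalg.svd(A[:, list(cols)], compute_uv=False)
        upper = max(upper, s[0] ** 2 - 1.0)   # deviation of the largest squared singular value
        lower = max(lower, 1.0 - s[-1] ** 2)  # deviation of the smallest squared singular value
    return lower, upper

# Hypothetical sizes; entries are scaled so each column has unit expected squared norm.
rng = np.random.default_rng(0)
n, N, k = 10, 14, 3
A = rng.standard_normal((n, N)) / np.sqrt(n)
lower, upper = ric2_brute_force(A, k)
print(f"lower RIC2 = {lower:.3f}, upper RIC2 = {upper:.3f}")
```

Even for N = 14 and k = 3 the loop visits 364 submatrices, which hints at why exact computation is intractable at realistic sizes.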
RIC2 bounds have been presented for a variety of random matrices, matrix dimensions and
sparsity ranges. We provide explicit formulae for RIC2 bounds, of n × N Gaussian matrices
with sparsity k, in three settings: a) n/N fixed and k/n approaching zero, b) k/n fixed and
n/N approaching zero, and c) n/N approaching zero with k/n decaying inverse logarithmically
in N/n; in these three settings the RICs a) decay to zero, b) become unbounded (or approach
inherent bounds), and c) approach a non-zero constant. Implications of these results for RIC2
based analysis of CS algorithms are presented.
The RIC2 of sparse mean-zero random matrices can be bounded by using concentration
bounds of Gaussian matrices. However, this RIC2 approach does not capture the benefits of
the sparse matrices, and in so doing gives pessimistic bounds. RIC1 is a variant of RIC2 where
the nearness to an isometry is measured in the ℓ1-norm, which both better captures
the structure of sparse matrices and allows for the analysis of non-mean-zero matrices.
We consider a probabilistic construction of sparse random matrices where each column has
a fixed number of non-zeros whose row indices are drawn uniformly at random. These matrices
have a one-to-one correspondence with the adjacency matrices of fixed left degree expander
graphs. We present formulae for the expected cardinality of the set of neighbours for these
graphs, and present a tail bound on the probability that this cardinality will be less than the
expected value. Deducible from this bound is a similar bound for the expansion of the graph
which is of interest in many applications. These bounds are derived through a more detailed
analysis of collisions in unions of sets using a dyadic splitting technique. This bound allows
for quantitative sampling theorems on existence of expander graphs and the sparse random
matrices we consider, and also quantitative CS sampling theorems when using sparse non-mean-zero
measurement matrices.
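
The probabilistic construction described above can be sketched in a few lines. The helper names `sparse_fixed_degree_matrix` and `neighbour_count`, and the sizes n, d, s, are hypothetical choices for this sketch. Columns play the role of left vertices and rows the role of right vertices, so the neighbour set of a group of columns is the set of rows they touch. The comparison value `elementary_mean` is the elementary expectation n(1 - (1 - d/n)^s), which follows from each row being missed by a single column with probability 1 - d/n. It is not the formula or the tail bound derived in the thesis.

```python
import numpy as np

def sparse_fixed_degree_matrix(n, N, d, rng):
    """n x N 0/1 matrix with exactly d ones per column, row indices drawn uniformly."""
    A = np.zeros((n, N), dtype=np.int8)
    for j in range(N):
        A[rng.choice(n, size=d, replace=False), j] = 1
    return A

def neighbour_count(A, cols):
    """Number of rows (right vertices) hit by at least one of the chosen columns."""
    return int(A[:, list(cols)].any(axis=1).sum())

rng = np.random.default_rng(0)
n, d, s = 200, 8, 10   # hypothetical sizes: n rows, d non-zeros per column, s columns
trials = 200
counts = [
    neighbour_count(sparse_fixed_degree_matrix(n, s, d, rng), range(s))
    for _ in range(trials)
]
empirical_mean = float(np.mean(counts))
elementary_mean = n * (1.0 - (1.0 - d / n) ** s)
print(f"empirical mean = {empirical_mean:.2f}, elementary expectation = {elementary_mean:.2f}")
```

The count can never exceed d·s, and the gap between d·s and the observed count is the number of collisions between the columns' supports.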
Using Neural Networks to identify Individual Animals from Photographs
Effective management requires knowing the sizes of animal populations. This can be accomplished in various ways, but a very popular way is mark-recapture studies. Mark-recapture studies need a way of telling if a captured animal has been previously seen. For traditional mark-recapture, this is achieved by applying a tag to the animal. For non-invasive mark-recapture methods which exploit photographs, there is no tag on the animal’s body. As a result, these methods require animals to be individually identifiable. They assess if an animal has been caught before by examining photographs for animals which have individual-specific marks (Cross et al., 2014; Gomez et al., 2016; Beijbom et al., 2016; Körschens, Barz, and Denzler, 2018). This study develops a model which can reliably match photographs of the same individual based on individual-specific marks. The model consists of two main parts: an object detection model, and a classifier which takes two photos as input and outputs a predicted probability that the pair is from the same individual (a match). The object detection model is a convolutional neural network (CNN) and the matching classifier is a special kind of CNN called a siamese network. The siamese network uses a pair of CNNs that share weights to summarise the images, followed by some dense layers which combine the summaries into measures of similarity which can be used to predict a match. The model is tested on two case studies, humpback whales (HBWs) and western leopard toads (WLTs). The HBW dataset consists of images originally collected by various institutions across the globe and uploaded to the Happywhale platform, which encourages scientists to identify individual mammals. HBWs can be identified by their fins and special markings. There is a lot of data for this problem. The WLT dataset consists of images collected by citizen scientists in South Africa.
They were either uploaded to iSpot, a citizen science project which collects images, or sent to the WLT project, a conservation project staffed by volunteers. WLTs can be identified by their unique spots. There is little data for this problem. One part of this dataset consists of labelled individuals and another part is unlabelled. The model was able to give good results for both HBWs and WLTs. In 95% of the cases the model managed to correctly identify if a pair of images is from the same HBW individual or not. It accurately identified if a pair of images is drawn from the same WLT individual or not in 87% of the cases. This study also assessed the effectiveness of the semi-supervised approach on the WLT unlabelled dataset. In this study, the semi-supervised approach has been partially successful. The model was able to identify new individuals and matches which were not identified before, but they were relatively few in number. Without an exhaustive check of the data, it is not clear whether this is due to the failure of the semi-supervised approach, or because there are not many matches in the data. After adding the newly identified and labelled individuals to the WLT labelled dataset, the model slightly improved its performance and correctly identified 89% of WLT pairs. A number of computer-aided photo-matching algorithms have been proposed (Matthé et al., 2017). This study also assessed the performance of Wild-ID (Bolger et al., 2012), one of the commonly used photo-matching algorithms, on both HBW and WLT datasets. The model developed in this thesis achieved very competitive results compared with Wild-ID. Model accuracies for the proposed siamese network were much higher than those returned by Wild-ID on the HBW dataset, and roughly the same on the WLT dataset.
Improved identification accuracy in equation learning via comprehensive R²-elimination and Bayesian model selection
In the field of equation learning, exhaustively considering all possible
equations derived from a basis function dictionary is infeasible. Sparse
regression and greedy algorithms have emerged as popular approaches to tackle
this challenge. However, the presence of multicollinearity poses difficulties
for sparse regression techniques, and greedy steps may inadvertently exclude
terms of the true equation, leading to reduced identification accuracy. In this
article, we present an approach that strikes a balance between
comprehensiveness and efficiency in equation learning. Inspired by stepwise
regression, our approach combines the coefficient of determination, R², and
the Bayesian model evidence, p(y|M), in a novel way. Our
procedure is characterized by a comprehensive search with just a minor
reduction of the model space at each iteration step. With two flavors of our
approach and the adoption of the model evidence p(y|M) for bi-directional
stepwise regression, we present a total of three new avenues for equation
learning. Through three extensive numerical experiments involving random
polynomials and dynamical systems, we compare our approach against four
state-of-the-art methods and two standard approaches. The results demonstrate
that our comprehensive search approach surpasses all other methods in terms of
identification accuracy. In particular, the second flavor of our approach
establishes an efficient overfitting penalty solely based on R², which
achieves the highest rates of exact equation recovery. Comment: 12 pages main text and 11 pages appendix. Published in TMLR
(https://openreview.net/forum?id=0ck7hJ8EVC)
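
For orientation, the stepwise-regression idea that the abstract cites as its inspiration can be sketched as plain backward elimination scored by the coefficient of determination. This is a generic textbook-style sketch, not the authors' procedure: the function names, the two-variable dictionary, the synthetic data, and the use of a known target size `n_terms` as a stopping rule are all assumptions made here, and no Bayesian model evidence is computed.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centred = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centred @ centred)

def backward_elimination(X, y, names, n_terms):
    """Repeatedly drop the term whose removal leaves the highest R^2."""
    active = list(range(X.shape[1]))
    while len(active) > n_terms:
        scores = []
        for i in active:
            keep = [j for j in active if j != i]
            scores.append((r_squared(X[:, keep], y), i))
        _, drop = max(scores)          # removal that costs the least R^2
        active.remove(drop)
    return [names[j] for j in active]

# Hypothetical dictionary of basis functions in two variables.
rng = np.random.default_rng(1)
x1 = rng.uniform(-2, 2, size=300)
x2 = rng.uniform(-2, 2, size=300)
names = ["1", "x1", "x2", "x1*x2", "x1^2", "x2^2"]
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

# Synthetic "true" equation with small noise: y = 2*x1*x2 - x2^2.
y = 2.0 * x1 * x2 - x2**2 + 0.01 * rng.standard_normal(300)

selected = backward_elimination(X, y, names, n_terms=2)
print(selected)
```

The dictionary here is deliberately close to orthogonal. With strongly collinear terms, a greedy removal of this kind can discard a term of the true equation, which is the failure mode the abstract mentions.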
On construction and analysis of sparse matrices and expander graphs with applications to CS
Publication in the conference proceedings of SampTA, Bremen, Germany, 201
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
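
The two counting schemes compared above can be illustrated with a toy example. The data, the author names, and the function `cocitation_counts` are invented for this sketch. One simplifying assumption is made loudly here: two authors count as co-cited whenever both appear among the represented authors of one citing paper's references, even if they are co-authors of the same cited work. The study itself may define this differently.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors):
    """Count, for each author pair, the citing papers whose references include both.

    Each cited work is represented by its first `max_authors` authors:
    max_authors=1 gives first-author counting, max_authors=5 the first-5 variant.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Toy data: each citing paper is a list of cited works,
# each cited work an ordered list of (made-up) author names.
citing_papers = [
    [["Ada", "Ben"], ["Cy", "Dee", "Eve"]],
    [["Ada"], ["Dee", "Cy"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(first_author[("Ada", "Cy")], first_five[("Ada", "Cy")])
```

In this toy data, first-author counting links Ada to Cy once and to Dee once, while the first-5 variant links Ada to each of them in both citing papers and also connects Cy and Dee.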
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
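
To make the two named measures concrete, the sketch below computes the Pearson correlation and the cosine for a pair of invented cocitation profiles. The profiles and helper names are assumptions for this sketch, and the example is not taken from the paper. It shows one well-known mathematical difference: appending entries where both profiles are zero (authors cocited with neither) leaves the cosine unchanged but changes the Pearson correlation.

```python
import numpy as np

def cosine(x, y):
    """Cosine of the angle between two profile vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson(x, y):
    """Pearson correlation between two profile vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Invented cocitation profiles of two authors over four other authors.
a = np.array([3.0, 0.0, 2.0, 1.0])
b = np.array([1.0, 2.0, 0.0, 3.0])

# The same profiles after adding six authors who are cocited with neither.
a_pad = np.concatenate([a, np.zeros(6)])
b_pad = np.concatenate([b, np.zeros(6)])

print(f"cosine:  {cosine(a, b):.4f} -> {cosine(a_pad, b_pad):.4f}")
print(f"pearson: {pearson(a, b):.4f} -> {pearson(a_pad, b_pad):.4f}")
```

For these profiles the Pearson correlation even changes sign after padding, while the cosine stays at 6/14.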
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods.
