Operative Assessment of Predicted Generalization Errors on Non-Stationary Distributions in Data-Intensive Applications
Data-intensive applications use empirical methods to extract consistent information from huge samples. When applied to classification tasks, their aim is to optimize accuracy on unseen data; hence, a reliable prediction of the generalization error is of paramount importance. Theoretical models, such as Statistical Learning Theory, and empirical estimations, such as cross-validation, can both fit data-mining classification domains very well, provided that some crucial assumptions are verified in advance. In particular, the assumption that the observed data follow a stationary distribution is critical, although it is sometimes overlooked in practice. The paper formulates an operative criterion to verify the stationarity assumption; the method applies to both theoretical and practical predictions of generalization errors. The analysis addresses the specific case of clustering-based classifiers: the K-Winner Machine (KWM) model is used as a reference for its known theoretical bounds, and cross-validation provides an empirical counterpart for practical comparison. The criterion, based on efficient unsupervised clustering-based estimation of probability distributions, is tested experimentally on a set of diverse data-intensive applications, including intrusion detection for computer-network security, optical character recognition, text mining, and pedestrian detection. Experimental results confirm the effectiveness of the proposed approach in efficiently detecting non-stationarity.
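The general idea of a clustering-based stationarity check can be illustrated with a minimal sketch. This is not the criterion formulated in the paper: it assumes one-dimensional data, plain Lloyd's k-means as the clustering step, and the total-variation distance between cluster-occupancy histograms as the comparison. The helper names (`kmeans_1d`, `occupancy`, `tv_distance`) and the toy data are hypothetical.

```python
from collections import Counter


def kmeans_1d(xs, k, iters=20):
    """Plain Lloyd's k-means on scalars (illustrative, not the paper's clustering method)."""
    assert k >= 2 and len(xs) >= k
    s = sorted(xs)
    # Deterministic initialisation: k evenly spaced order statistics.
    cents = [s[(i * (len(s) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda c: abs(x - cents[c]))
            groups[j].append(x)
        # Keep the old centroid if a cluster becomes empty.
        cents = [sum(g) / len(g) if g else cents[j] for j, g in enumerate(groups)]
    return cents


def occupancy(xs, cents):
    """Fraction of samples assigned to each centroid: a coarse density estimate."""
    counts = Counter(min(range(len(cents)), key=lambda c: abs(x - cents[c])) for x in xs)
    return [counts[j] / len(xs) for j in range(len(cents))]


def tv_distance(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical data: the training sample is uniform on [0, 1).
train = [i / 100 for i in range(100)]
same = [i / 100 + 0.005 for i in range(100)]   # same distribution
shifted = [0.5 + i / 200 for i in range(100)]  # mass moved to [0.5, 1)

cents = kmeans_1d(train, k=4)
p_train = occupancy(train, cents)
print(tv_distance(p_train, occupancy(same, cents)))     # close to 0
print(tv_distance(p_train, occupancy(shifted, cents)))  # roughly 0.5
```

A large distance between the occupancy histograms of two samples suggests they were not drawn from the same distribution; where to set the decision threshold is left open here.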
Efficient Digital Implementation of Extreme Learning Machines for Classification
The availability of compact, fast circuitry to support artificial neural systems is a long-standing and critical requirement for many important applications. This brief addresses the implementation of the powerful extreme learning machine (ELM) model on reconfigurable digital hardware (HW). The design strategy first provides a training procedure for ELMs that effectively trades off prediction accuracy against network complexity. This, in turn, facilitates the optimization of HW resources. Finally, this brief describes and analyzes two implementation approaches: one involving field-programmable gate array devices and one embedding low-cost, low-performance devices such as complex programmable logic devices. Experimental results show that, in both cases, the design approach yields efficient digital architectures with satisfactory performance and limited costs.
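For readers unfamiliar with the model, a minimal sketch of standard ELM training follows: the hidden layer's weights and biases are drawn at random and never trained, and only the output weights are fitted, here by ridge-regularized least squares. It does not reproduce the brief's accuracy/complexity trade-off procedure or its hardware mapping; the sigmoid activation, the uniform weight range, the regularization constant `lam`, and all helper names are assumptions for illustration.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            fac = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= fac * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hidden(x, W, b):
    """Random-feature layer: sigmoid(w . x + b) for each hidden neuron."""
    return [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bj)))
            for w, bj in zip(W, b)]


def elm_train(X, y, n_hidden, lam=1e-3, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    # Input weights and biases are random and stay fixed.
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = [hidden(x, W, b) for x in X]
    # Output weights from the regularized normal equations (H^T H + lam I) beta = H^T y.
    A = [[sum(h[i] * h[j] for h in H) + (lam if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(n_hidden)]
    return W, b, solve(A, rhs)


def elm_predict(model, x):
    W, b, beta = model
    return sum(bk * hk for bk, hk in zip(beta, hidden(x, W, b)))


# Usage: a toy 1-D two-class problem with labels -1 / +1.
X = [[i / 10] for i in range(-10, 11)]
y = [1.0 if x[0] >= 0 else -1.0 for x in X]
model = elm_train(X, y, n_hidden=20)
acc = sum((elm_predict(model, x) >= 0) == (t > 0) for x, t in zip(X, y)) / len(X)
print(acc)
```

In this sketch `n_hidden` is the network-complexity knob: fewer hidden neurons mean fewer weights and activations to implement, possibly at some cost in accuracy.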
Efficient approximate Regularized Least Squares by Toeplitz matrix
Machine learning based on the Regularized Least Squares (RLS) model requires one to solve a system of linear equations. Direct-solution methods exhibit predictable complexity and storage but often prove impractical for large-scale problems; iterative methods attain approximate solutions at lower complexity but depend heavily on learning parameters. The paper shows that applying the properties of Toeplitz matrices to RLS yields two benefits: first, both the computational cost and the memory space required to train an RLS-based machine are reduced dramatically; second, timing and storage requirements are defined analytically. The paper proves this result formally for the one-dimensional case and gives an analytical criterion for an effective approximation in multidimensional domains. The validity of the approach is demonstrated on several real-world problems involving huge data sets of high-dimensional data.
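The one-dimensional case can be illustrated under explicit assumptions that may differ from the paper's: inputs lie on a uniform grid x_i = i·h and the kernel is Gaussian, so each kernel-matrix entry K_ij depends only on |i - j| and K + λI is a symmetric Toeplitz matrix. Such a system can be solved with the Levinson recursion in O(n²) time and O(n) memory, instead of O(n³) time and O(n²) memory for dense elimination. The helper names `levinson_solve`, `rls_fit_1d_grid` and `rls_predict`, the kernel width `sigma`, and the regularization convention (λ added to the diagonal) are hypothetical.

```python
import math


def levinson_solve(t, rhs):
    """Solve T x = rhs for symmetric Toeplitz T with first column t (T[i][j] = t[|i - j|]).

    Levinson recursion: O(n^2) time, O(n) extra memory. Assumes all leading
    principal submatrices are nonsingular (guaranteed if T is positive definite).
    """
    n = len(rhs)
    f = [1.0 / t[0]]         # forward vector: T_k f = e_1
    x = [rhs[0] / t[0]]      # solution of the leading k-by-k system
    for k in range(1, n):
        back = f[::-1]       # backward vector: T_k back = e_k (by symmetry)
        eps = sum(t[k - i] * f[i] for i in range(k))
        denom = 1.0 - eps * eps
        f = [(a - eps * b) / denom for a, b in zip(f + [0.0], [0.0] + back)]
        back = f[::-1]
        delta = sum(t[k - i] * x[i] for i in range(k))
        x = [xi + (rhs[k] - delta) * bi for xi, bi in zip(x + [0.0], back)]
    return x


def rls_fit_1d_grid(y, spacing, sigma, lam):
    """Hypothetical helper: RLS coefficients c solving (K + lam*I) c = y.

    Inputs are assumed to sit on the grid x_i = i * spacing with a Gaussian kernel,
    so K[i][j] depends only on |i - j| and only its first column is stored.
    """
    t = [math.exp(-((k * spacing) ** 2) / (2 * sigma ** 2)) for k in range(len(y))]
    t[0] += lam
    return levinson_solve(t, y)


def rls_predict(c, x, spacing, sigma):
    """Evaluate f(x) = sum_j c_j * k(x, x_j) with the same Gaussian kernel."""
    return sum(cj * math.exp(-((x - j * spacing) ** 2) / (2 * sigma ** 2))
               for j, cj in enumerate(c))


# Usage: fit noise-free samples of a sine on a uniform grid.
ys = [math.sin(0.3 * i) for i in range(50)]
coef = rls_fit_1d_grid(ys, spacing=0.5, sigma=1.0, lam=0.1)
print(rls_predict(coef, 0.25, spacing=0.5, sigma=1.0))
```

The full n-by-n kernel matrix is never formed; only its first column is kept, which is where the memory saving in this sketch comes from.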
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
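The difference between the two counting schemes can be made concrete with a small sketch. It assumes that each citing paper is represented by its reference list, each reference by its author names in byline order, and that two authors are co-cited once per citing paper in which both appear among the counted authors (so co-authors of the same cited work also count as co-cited). These representation choices and the function name `cocitation_counts` are illustrative assumptions, not necessarily the exact procedure used in the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is a list of author
    names in byline order. max_authors=1 gives traditional first-author counting;
    max_authors=5 counts the first five authors of each cited work.
    """
    counts = Counter()
    for refs in citing_papers:
        # Authors credited by this citing paper (each counted once per paper).
        authors = set()
        for ref in refs:
            authors.update(ref[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, authors A-D.
papers = [
    [["A", "B"], ["C"]],
    [["B", "A"], ["C", "D"]],
]
print(cocitation_counts(papers, max_authors=1))
print(cocitation_counts(papers, max_authors=5))
```

In the toy data, first-author counting never links A and B, whereas five-author counting records that pair in both citing papers.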
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kinds of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings are of high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
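The two best-known measures mentioned above can be compared with a short sketch (the two new measures proposed in the paper are not reproduced here). Pearson correlation is the cosine of the mean-centred profiles, so, unlike the cosine, it changes when authors with zero cocitations are appended to both profiles. The profiles below are made-up numbers, and the zero-padding demonstration illustrates this mathematical property; it is not necessarily the argument the paper makes.

```python
import math


def cosine(x, y):
    """Cosine similarity between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centred profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Hypothetical cocitation profiles of two authors over three other authors.
x, y = [1, 2, 3], [1, 2, 4]
# Same profiles after adding two authors that neither is cocited with.
x0, y0 = x + [0, 0], y + [0, 0]
print(cosine(x, y), cosine(x0, y0))    # identical
print(pearson(x, y), pearson(x0, y0))  # different
```

Because appending zeros changes the means but not the dot product or the norms, only the Pearson value depends on how many uncited authors are included in the profiles.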
Tactile Sensing Data Classification by Computational Intelligence
The two major components of a robotic tactile sensing system are the tactile sensing hardware at the lower level and the computational/software tools at the higher level. Focusing on the latter, this research assesses the suitability of Computational Intelligence tools for tactile data processing. In this context, this paper addresses the classification of sensed object material from the recorded tactile data. For this purpose, three computational intelligence paradigms, namely Support Vector Machine (SVM), Regularized Least Squares (RLS) and Regularized Extreme Learning Machine (RELM), have been employed and their performance compared on this task. The comparative analysis shows that SVM provides the best trade-off between classification accuracy and computational complexity of the classification algorithm. Experimental results indicate that Computational Intelligence tools are effective in dealing with the challenging problem of material classification.
Between algorithm and model: different Molecular Surface definitions for the Poisson-Boltzmann based electrostatic characterization of biomolecules in solution.
- …
