    Neural network embeddings on corporate annual filings for portfolio selection

    In recent years, there has been increased interest from both academics and practitioners in automatically analyzing the textual part of companies’ financial reports to extract information relevant to future outcomes. In particular, tracking textual changes among companies’ reports can have a large and significant impact on stock prices. This impact happens with a lag, implying that investors only gradually realize the implications of the news hinted at by document changes. However, the length of these documents, as well as their complexity in terms of structure and language, has been increasing dramatically, making this process more and more difficult to perform. In this paper, we analyzed how to face this complexity by learning arbitrary-dimensional vector representations for US corporate filings (10-Ks) from 1998 to 2018, exploiting and comparing different neural network embedding techniques which take into account words’ semantics through vector proximity. We also compared their ability to capture changes associated with future risk-adjusted abnormal returns with other approaches more commonly used in the literature. Finally, we propose a novel investment strategy named Semantic Similarity Portfolio (SSP) that exploits these neural network embeddings. We show that firms that do not change their 10-Ks in a semantically important way from the previous year tend to have large and statistically significant future risk-adjusted abnormal returns. We also document an amplifying effect when we incorporate a momentum-related criterion, where the companies selected must also have had positive previous-year returns. Specifically, a portfolio that buys “non-changers” based on this strategy earns up to 10% in yearly risk-adjusted abnormal returns (alpha).
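
The selection rule described in this abstract can be illustrated with a minimal sketch. It assumes each filing has already been turned into an embedding vector, and it uses cosine similarity between consecutive years as the "semantic change" measure. The function names, the `top_fraction` parameter, and the order in which the momentum filter is applied are assumptions for illustration, not details taken from the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_non_changers(firms, top_fraction=0.2, require_positive_momentum=True):
    """Pick the firms whose filings changed least from one year to the next.

    firms: dict mapping ticker -> (prev_year_vector, curr_year_vector, prev_year_return).
    Hypothetical rule: optionally drop firms with non-positive previous-year return,
    then keep the top `top_fraction` of the remainder by year-over-year similarity.
    """
    scored = []
    for ticker, (prev_vec, curr_vec, prev_return) in firms.items():
        if require_positive_momentum and prev_return <= 0:
            continue
        scored.append((cosine_similarity(prev_vec, curr_vec), ticker))
    scored.sort(reverse=True)  # most similar (least changed) first
    keep = max(1, int(len(scored) * top_fraction)) if scored else 0
    return [ticker for _, ticker in scored[:keep]]
```

With toy two-dimensional vectors, a firm whose vector is unchanged between years ranks ahead of one whose vector rotated by 90 degrees, and a firm with a negative previous-year return is excluded when the momentum filter is on.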

    GRASP with path relinking for the three-index assignment problem

    This paper proposes and tests variants of GRASP (greedy randomized adaptive search procedure) with path relinking for the three-index assignment problem (AP3). GRASP is a multistart metaheuristic for combinatorial optimization. It usually consists of a construction procedure based on a greedy randomized algorithm and of a local search. Path relinking is an intensification strategy that explores trajectories that connect high-quality solutions. Several variants of the heuristic are proposed and tested. Computational results show clearly that this GRASP for AP3 benefits from path relinking and that the variants considered in this paper compare well with previously proposed heuristics for this problem. GRASP with path relinking was able to improve the solution quality of heuristics proposed by Balas and Saltzman (1991), Burkard et al. (1996), and Crama and Spieksma (1992) on all instances proposed in those papers. We show that the random variable “time to target solution,” for all proposed GRASP with path-relinking variants, fits a two-parameter exponential distribution. To illustrate the consequence of this, one of the variants of GRASP with path relinking is shown to benefit from parallelization.
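
The two GRASP phases named in this abstract (greedy randomized construction, then local search) can be sketched for AP3 as follows. This is a simplified illustration, not the paper's heuristic: a solution is encoded as two permutations `p` and `q` so that index `i` is matched with `p[i]` and `q[i]`, the construction fills one `i` at a time from a restricted candidate list controlled by a hypothetical `alpha` parameter, the local search is a plain pairwise swap, and path relinking is omitted entirely.

```python
import random

def cost(c, p, q):
    """Total cost of assigning triple (i, p[i], q[i]) for every i."""
    return sum(c[i][p[i]][q[i]] for i in range(len(p)))

def construct(c, alpha, rng):
    """Greedy randomized construction: for each i, pick (j, k) at random
    from the restricted candidate list of near-cheapest free pairs."""
    n = len(c)
    free_j, free_k = set(range(n)), set(range(n))
    p, q = [None] * n, [None] * n
    for i in range(n):
        cands = sorted((c[i][j][k], j, k) for j in free_j for k in free_k)
        lo, hi = cands[0][0], cands[-1][0]
        rcl = [t for t in cands if t[0] <= lo + alpha * (hi - lo)]
        _, j, k = rng.choice(rcl)
        p[i], q[i] = j, k
        free_j.remove(j)
        free_k.remove(k)
    return p, q

def local_search(c, p, q):
    """Swap two entries of p or of q whenever that strictly lowers the cost."""
    n = len(p)
    improved = True
    while improved:
        improved = False
        for perm in (p, q):
            for a in range(n):
                for b in range(a + 1, n):
                    before = cost(c, p, q)
                    perm[a], perm[b] = perm[b], perm[a]
                    if cost(c, p, q) < before:
                        improved = True
                    else:
                        perm[a], perm[b] = perm[b], perm[a]  # undo the swap
    return p, q

def grasp(c, iterations=50, alpha=0.3, seed=0):
    """Multistart loop: construct, improve, keep the best solution seen."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        p, q = local_search(c, *construct(c, alpha, rng))
        value = cost(c, p, q)
        if value < best_cost:
            best, best_cost = (p[:], q[:]), value
    return best, best_cost
```

Setting `alpha=0` makes the construction purely greedy and `alpha=1` makes it purely random; intermediate values trade solution quality against diversity across restarts.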

    CliSAT: A new exact algorithm for hard maximum clique problems

    Given a graph, the maximum clique problem (MCP) asks for determining a complete subgraph with the largest possible number of vertices. We propose a new exact algorithm, called CliSAT, to solve the MCP to proven optimality. This problem is of fundamental importance in graph theory and combinatorial optimization due to its practical relevance for a wide range of applications. The newly developed exact approach is a combinatorial branch-and-bound algorithm that exploits the state-of-the-art branching scheme enhanced by two new bounding techniques with the goal of reducing the branching tree. The first one is based on graph colouring procedures and partial maximum satisfiability problems arising in the branching scheme. The second one is a filtering phase based on constraint programming and domain propagation techniques. CliSAT is designed for structured MCP instances which are computationally difficult to solve since they are dense and contain many interconnected large cliques. Extensive experiments on hard benchmark instances, as well as new hard instances arising from different applications, show that CliSAT outperforms the state-of-the-art MCP algorithms, in some cases by several orders of magnitude.
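
For readers unfamiliar with colouring-based bounds in clique search, the basic idea can be sketched as below. This is a textbook-style branch-and-bound with a greedy colouring bound, not CliSAT: the partial MaxSAT bounding and the constraint-programming filtering described in the abstract are not reproduced. The bound relies on the fact that a clique can contain at most one vertex from each colour class, so the number of colours used on the candidate set limits how far the current clique can grow. Function names and the adjacency format are assumptions.

```python
def greedy_colouring_bound(vertices, adj):
    """Number of colour classes used by a greedy colouring of `vertices`.
    Each class is an independent set, so this is an upper bound on the
    size of any clique contained in `vertices`."""
    colour_classes = []
    for v in vertices:
        for cls in colour_classes:
            if not (adj[v] & cls):  # v has no neighbour in this class
                cls.add(v)
                break
        else:
            colour_classes.append({v})
    return len(colour_classes)

def max_clique(adj):
    """Return one maximum clique of the graph given as {vertex: set_of_neighbours}."""
    best = []

    def expand(clique, candidates):
        nonlocal best
        if not candidates:
            if len(clique) > len(best):
                best = list(clique)
            return
        # Prune: even the optimistic colouring bound cannot beat the incumbent.
        if len(clique) + greedy_colouring_bound(candidates, adj) <= len(best):
            return
        for v in list(candidates):
            if len(clique) + len(candidates) <= len(best):
                return
            clique.append(v)
            expand(clique, candidates & adj[v])
            clique.pop()
            candidates = candidates - {v}  # v is fully explored

    expand([], set(adj))
    return best
```

On a graph made of a triangle with a two-edge tail attached, the search returns the three triangle vertices.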

    A Multi-Head LSTM Architecture for Bankruptcy Prediction with Time Series Accounting Data

    With the recent advances in machine learning (ML), several models have been successfully applied to financial and accounting data to predict the likelihood of companies’ bankruptcy. However, time series have received little attention in the literature, with a lack of studies on the application of deep learning sequence models such as Recurrent Neural Networks (RNNs) and the recent Attention-based models in general. In this research work, we investigated the application of Long Short-Term Memory (LSTM) networks to exploit time series of accounting data for bankruptcy prediction. The main contributions of our work are the following: (a) We proposed a multi-head LSTM that models each financial variable in a time window independently and compared it with a single-input LSTM and other traditional ML models. The multi-head LSTM outperformed all the other models. (b) We identified the optimal time series length for bankruptcy prediction to be equal to 4 years of accounting data. (c) We made public the dataset we used for the experiments which includes data from 8262 different public companies in the American stock market generated in the period between 1999 and 2018. Furthermore, we proved the efficacy of the multi-head LSTM model in terms of fewer false positives and the better division of the two classes

    Supervised classification methods for mining cell differences as depicted by Raman spectroscopy

    Discrimination of different cell types is very important in many medical and biological applications. Existing methodologies are based on cost-inefficient technologies or tedious one-by-one empirical examination of the cells. Recently, Raman spectroscopy, an inexpensive and efficient method, has been employed for cell discrimination. Nevertheless, the traditional protocols for analyzing Raman spectra require preprocessing and peak-fitting analysis, which does not allow simultaneous examination of many spectra. In this paper we examine the applicability of supervised learning algorithms to the cell differentiation problem. Five different methods are presented and tested on two different datasets. Computational results show that machine learning algorithms can be employed to automate cell discrimination tasks. © 2011 Springer-Verlag Berlin Heidelberg

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
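
The two counting schemes compared in this abstract can be made concrete with a small sketch. It assumes each citing paper is represented as a list of cited works, and each cited work as an ordered list of author names. A pair of authors is counted once per citing paper in which both appear among the counted authors. How the study treats co-authors of the same cited work is not stated in the abstract, so counting them as a co-cited pair here is an assumption, as are the function and parameter names.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations.

    citing_papers: list of reference lists; each reference is an ordered
                   list of author names for one cited work.
    max_authors:   1 reproduces first-author-only counting; 5 counts the
                   first five authors of every cited work.
    Returns a Counter keyed by alphabetically ordered author pairs.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for author_list in references:
            cited_authors.update(author_list[:max_authors])
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

With first-author counting, second and later authors never enter the co-citation matrix at all, which is why the two schemes can produce different author groupings from the same citation data.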

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
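
A small numerical sketch shows one way the Pearson correlation and the cosine can disagree on cocitation profiles. The abstract does not say which argument the authors make, so the property illustrated here is only one commonly discussed difference: appending authors with whom neither profile has any cocitations leaves the cosine unchanged but shifts the Pearson correlation, because Pearson is the cosine of the mean-centred vectors. The profiles are made-up numbers and both are assumed to be non-constant and non-zero.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Hypothetical cocitation profiles of two authors over four other authors.
profile_a = [3, 1, 0, 2]
profile_b = [1, 3, 2, 0]

# The same profiles after four more authors join the data set, none of whom
# is ever cocited with either author.
padded_a = profile_a + [0, 0, 0, 0]
padded_b = profile_b + [0, 0, 0, 0]

print(cosine(profile_a, profile_b), cosine(padded_a, padded_b))
print(pearson(profile_a, profile_b), pearson(padded_a, padded_b))
```

In this example the cosine is identical before and after padding, while the Pearson correlation moves from negative to positive even though no cocitation count between the two authors and the original four has changed.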