1,721,523 research outputs found

    Inductive queries for a drug designing robot scientist

    It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data are queried repeatedly by means of inductive queries and the computer provides guidance to the experiments being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it; and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.
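    The iterative loop described above, in which the computer chooses the next experiment from the data gathered so far, can be sketched in Python. This is a minimal illustration under stated assumptions: compounds are hypothetical numeric descriptor vectors, `run_experiment` is a hypothetical stand-in for a wet-lab assay, and the selection rule (test the untested candidate farthest from every compound tested so far) is a simple max-min-distance exploration heuristic swapped in for illustration, not the chapter's method.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]  # hypothetical molecular descriptor vector


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discovery_loop(
    candidates: Sequence[Vector],
    run_experiment: Callable[[Vector], float],  # hypothetical assay stand-in
    budget: int,
) -> List[Tuple[Vector, float]]:
    """Repeatedly pick a candidate, 'run' the experiment on it and record
    the measured activity; the data gathered so far guide the next pick."""
    tested: List[Tuple[Vector, float]] = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        if not tested:
            choice = remaining[0]  # no data yet: start with the first candidate
        else:
            # Max-min distance: the candidate the current data say least about.
            choice = max(
                remaining,
                key=lambda c: min(distance(c, t) for t, _ in tested),
            )
        remaining.remove(choice)
        tested.append((choice, run_experiment(choice)))
    return tested
```

    For example, `discovery_loop([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)], lambda v: v[0] + v[1], 3)` tests `(0.0, 0.0)`, then `(10.0, 0.0)`, then `(5.0, 5.0)`, skipping the near-duplicate `(0.1, 0.0)`. A real QSAR loop would instead rank candidates using a learnt model or an inductive query over the accumulated results.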

    Constraint Based Mining of First Order Sequences in SeqLog (Extended Abstract)

    Sau Dan Lee and Luc De Raedt, Institut für Informatik, Albert-Ludwigs-Universität Freiburg, Germany ({danlee,deraedt}@informatik.uni-freiburg.de)
    Abstract. A logical language, SeqLog, for mining and querying sequential data and databases is presented. In SeqLog, data take the form of a sequence of logical atoms, background knowledge can be specified using Datalog-style clauses, and sequential queries or patterns correspond to subsequences of logical atoms.
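    The idea that a pattern is a subsequence of logical atoms can be sketched as a small backtracking matcher. This is an illustration under assumptions, not SeqLog itself: atoms are hypothetical tuples `("predicate", arg, ...)`, variables follow the Prolog convention of starting with an uppercase letter, constants start lowercase, the subsequence is assumed to be non-contiguous, and Datalog background knowledge is not modelled.

```python
from typing import Dict, Optional, Sequence, Tuple

Atom = Tuple[str, ...]   # ("predicate", arg1, arg2, ...)
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumed Prolog/Datalog convention: variables start with an uppercase letter.
    return term[:1].isupper()


def unify_atom(pattern: Atom, fact: Atom, b: Bindings) -> Optional[Bindings]:
    """Match one pattern atom against one ground data atom, extending b."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(b)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            # Bind an unbound variable, or check consistency with its binding.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match(
    pattern: Sequence[Atom],
    data: Sequence[Atom],
    b: Optional[Bindings] = None,
    start: int = 0,
) -> Optional[Bindings]:
    """Return variable bindings if pattern occurs as a (not necessarily
    contiguous) subsequence of data at or after position start, else None."""
    if b is None:
        b = {}
    if not pattern:
        return b
    for i in range(start, len(data)):
        nb = unify_atom(pattern[0], data[i], b)
        if nb is not None:
            # Match the rest strictly after position i; backtrack on failure.
            result = match(pattern[1:], data, nb, i + 1)
            if result is not None:
                return result
    return None
```

    On the sequence `[("login", "alice"), ("view", "alice", "page1"), ("login", "bob"), ("buy", "alice", "book")]`, the pattern `[("login", "U"), ("buy", "U", "I")]` matches with `{"U": "alice", "I": "book"}`, while the reversed pattern does not match because no `login` of `alice` follows her purchase.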

    Relational random forests based on random relational rules

    Random Forests have been shown to perform very well in propositional learning. FORF is an upgrade of Random Forests for relational data. In this paper we investigate shortcomings of FORF and propose an alternative algorithm, R⁴F, for generating Random Forests over relational data. R⁴F employs randomly generated relational rules as fully self-contained Boolean tests inside each node in a tree and thus can be viewed as an instance of dynamic propositionalization. The implementation of R⁴F allows for the simultaneous or parallel growth of all the branches of all the trees in the ensemble in an efficient, shared, but still single-threaded way. Experiments compare R⁴F favorably with both FORF and the combination of static propositionalization with standard Random Forests. Various strategies for tree initialization and node splitting, as well as the resulting ensemble size, diversity, and computational complexity of R⁴F, are also investigated.
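    The notion of a randomly generated relational rule serving as a self-contained Boolean test at a tree node can be sketched as follows. This is a hypothetical illustration, not R⁴F: molecules are toy atom/bond structures, every rule has the single assumed form `exists X, Y: atom(X, e1), bond(X, Y), atom(Y, e2)`, and the node keeps whichever of several random rules gives the lowest weighted Gini impurity. Parallel branch growth and the paper's initialization strategies are not shown.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Molecule:
    elements: Dict[str, str]                                  # atom id -> element
    bonds: Set[Tuple[str, str]] = field(default_factory=set)  # undirected bonds
    label: int = 0                                            # 1 = active, 0 = inactive


@dataclass(frozen=True)
class BondRule:
    """Hypothetical rule form: exists X, Y. atom(X, e1), bond(X, Y), atom(Y, e2)."""
    e1: str
    e2: str

    def holds(self, m: Molecule) -> bool:
        for a, b in m.bonds:
            for x, y in ((a, b), (b, a)):  # bonds are treated as undirected
                if m.elements[x] == self.e1 and m.elements[y] == self.e2:
                    return True
        return False


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_random_test(data: List[Molecule], n_candidates: int, seed: int = 0) -> BondRule:
    """Draw n_candidates random rules (evaluated on the fly, i.e. dynamic
    propositionalization) and keep the one whose True/False split has the
    lowest weighted Gini impurity."""
    if n_candidates < 1:
        raise ValueError("need at least one candidate rule")
    rng = random.Random(seed)
    elements = sorted({e for m in data for e in m.elements.values()})
    best: Optional[BondRule] = None
    best_score = float("inf")
    for _ in range(n_candidates):
        rule = BondRule(rng.choice(elements), rng.choice(elements))
        yes = [m.label for m in data if rule.holds(m)]
        no = [m.label for m in data if not rule.holds(m)]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(data)
        if score < best_score:
            best, best_score = rule, score
    assert best is not None
    return best
```

    Because each rule is a complete logical query, a node test needs no variable bindings from its ancestors; that is what makes it "self-contained" in this sketch.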

    Abstraction Refinement Guided by a Learnt Probabilistic Model

    The core challenge in designing an effective static program analysis is to find a good program abstraction: one that retains only the details relevant to a given query. In this paper, we present a new approach for automatically finding such an abstraction. Our approach uses a pessimistic strategy, which can optionally use guidance from a probabilistic model. It applies to parametric static analyses implemented in Datalog and is based on counterexample-guided abstraction refinement. For each untried abstraction, our probabilistic model provides a probability of success, while the size of the abstraction provides an estimate of its cost in terms of analysis time. Combining these two metrics, probability and cost, our refinement algorithm picks an optimal abstraction. Our probabilistic model is a variant of the Erdős-Rényi random graph model, and it is tunable by what we call hyperparameters. We present a method to learn good values for these hyperparameters by observing past runs of the analysis on an existing codebase. We evaluate our approach on an object-sensitive pointer analysis for Java programs, with two client analyses (PolySite and Downcast).
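    One way to combine a success probability with a size-based cost is sketched below. The criterion is an assumption, not necessarily the paper's: if untried abstractions are tried in turn and succeed independently, the expected total cost until the first success is minimized by trying them in increasing order of cost / probability, so the sketch picks the abstraction with the smallest ratio. Abstractions are modelled as hypothetical sets of refined program sites, and the probabilities stand in for the learnt model's output.

```python
from typing import Dict, FrozenSet, Iterable, Sequence

Abstraction = FrozenSet[str]  # hypothetical: set of program sites that are refined


def pick_abstraction(
    candidates: Iterable[Abstraction],
    prob_success: Dict[Abstraction, float],
) -> Abstraction:
    """Pick the untried abstraction with the smallest cost / probability
    ratio, using abstraction size as the cost (analysis-time) estimate."""
    def ratio(a: Abstraction) -> float:
        p = prob_success[a]
        return float("inf") if p <= 0 else len(a) / p
    return min(candidates, key=ratio)


def expected_cost(order: Sequence[Abstraction], prob_success: Dict[Abstraction, float]) -> float:
    """Expected total cost of trying abstractions in the given order until one
    succeeds, assuming independent outcomes."""
    total, p_all_failed = 0.0, 1.0
    for a in order:
        total += p_all_failed * len(a)   # pay for a only if all earlier ones failed
        p_all_failed *= 1 - prob_success[a]
    return total
```

    With sizes 1, 3 and 2 and success probabilities 0.2, 0.9 and 0.5, the ratios are 5.0, about 3.33 and 4.0, so the size-3 abstraction is picked first. Trying them in ratio order gives an expected cost of 3.25, compared with 3.8 when going from smallest to largest.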

    Kernels on Prolog Proof Trees: Statistical Learning in the ILP Setting

    We develop kernels for measuring the similarity between relational instances using background knowledge expressed in first-order logic. The method allows us to bridge the gap between traditional inductive logic programming (ILP) representations and statistical approaches to supervised learning. Logic programs are first used to generate proofs of given visitor programs that use predicates declared in the available background knowledge. A kernel is then defined over pairs of proof trees. The method can be used for supervised learning tasks and is suitable for classification as well as regression. We report positive empirical results on Bongard-like and M-of-N problems that are difficult or impossible to solve with traditional ILP techniques, as well as on real bioinformatics and chemoinformatics data sets.
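    A kernel over pairs of proof trees can be illustrated with a generic all-fragments tree kernel in the style of Collins and Duffy, swapped in here for illustration; the paper's own kernel may differ. Proof trees are modelled as hypothetical ordered trees whose node labels are predicate names. The kernel counts shared connected fragments that preserve node labels, arity and child order, so it is an inner product of fragment-count vectors and hence a valid kernel.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                   # e.g. a predicate in a proof step
    children: List["Node"] = field(default_factory=list)


def nodes(t: Node) -> List[Node]:
    """All nodes of a tree in pre-order."""
    out = [t]
    for c in t.children:
        out.extend(nodes(c))
    return out


def common_rooted(a: Node, b: Node) -> int:
    """Number of common fragments rooted at a and at b: labels and arity must
    agree, and each child slot is either left out or matched recursively."""
    if a.label != b.label or len(a.children) != len(b.children):
        return 0
    prod = 1
    for ca, cb in zip(a.children, b.children):
        prod *= 1 + common_rooted(ca, cb)
    return prod


def tree_kernel(t1: Node, t2: Node) -> int:
    """Sum of shared rooted fragments over all node pairs."""
    return sum(common_rooted(a, b) for a in nodes(t1) for b in nodes(t2))
```

    For two hypothetical proofs `active(ring, atom)` and `active(ring, bond)`, the self-similarity of the first is 6 while their cross-similarity is 3, reflecting the shared `active` root and `ring` subproof. Such a kernel can be plugged into any kernel method, such as an SVM for classification or kernel ridge regression.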

    Semiring programming: A semantic framework for generalized sum product problems

    Sponsorship: Vaishak Belle was supported by a Royal Society University Research Fellowship. Luc De Raedt was supported by the European Research Council (ERC) Advanced Grant 694980 "SYNTH: Synthesising Inductive Data Models" and the Research Foundation Flanders. Status: Published.

    The twokey plot for multiple association rules control

    The twokey plot for multiple association rules control / A. R. Unwin, H. Hofmann, K. Bernt. In: Principles of data mining and knowledge discovery / Luc de Raedt ... (ed.). Berlin and elsewhere: Springer, 2001, pp. 472-483. (Lecture notes in computer science; 2168: Lecture notes in artificial intelligence)

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced by this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than the picture produced by traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
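    The two counting schemes compared above can be sketched as one function parameterized by how many authors of each cited work are considered (`max_authors=1` for first-author counting, `max_authors=5` for the non-traditional variant). This is an illustration under assumptions: each citing paper is a hypothetical list of cited works given as ordered author lists, and two authors are counted as co-cited once per citing paper in which both appear, even when they co-authored the same cited work. The study's exact handling of such cases may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Sequence, Tuple


def cocitation_counts(
    citing_papers: Iterable[Sequence[Sequence[str]]],
    max_authors: int,
) -> "Counter[Tuple[str, str]]":
    """Count, for each author pair, the citing papers that cite both authors
    among the first max_authors authors of their cited works."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted at most once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

    With citing papers `[[["A", "B"], ["C"]], [["A"], ["C", "B"]], [["B", "D"]]]`, first-author counting yields only the pair `("A", "C")` with count 2, whereas five-author counting also links `B` to `A`, `C` and `D`. This shows how counting more authors per cited work adds links to the co-citation data.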

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, released posthumously in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, by contrast, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines how these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.