Soft Constraint Based Pattern Mining
The paradigm of constraint-based pattern discovery was introduced to give the user a tool for driving the discovery process towards potentially interesting patterns, with the positive side effect of more efficient computation. So far, research on this paradigm has focused mainly on the latter aspect: the development of efficient algorithms for evaluating constraint-based mining queries. Due to the lack of research on methodological issues, the constraint-based pattern mining framework still suffers from many problems that limit its practical relevance. In this paper, we analyze these limitations and show that they stem from the same source: in classical constraint-based mining, a constraint is a rigid boolean function that returns either true or false. Interestingness, however, is not a dichotomy. Following this consideration, we introduce the new paradigm of pattern discovery based on Soft Constraints, where constraints are no longer rigid boolean functions.
Albeit based on a simple idea, our proposal has many merits: it provides a rigorous theoretical framework that is very general (having the classical paradigm as a particular instance) and that overcomes all the major methodological drawbacks of the classical constraint-based paradigm, representing an important step towards practical pattern discovery.
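The contrast between a rigid boolean constraint and a soft one can be illustrated with a minimal sketch. The abstract does not spell out the formal framework, so everything here is an assumption made for illustration: the pattern summaries, the feature names `support` and `price`, the linear ramp used as a degree of satisfaction, and the choice of `min` to combine constraints (a fuzzy-style combination) are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical pattern summary: feature name -> numeric value.
Pattern = Dict[str, float]


def hard_at_least(feature: str, threshold: float) -> Callable[[Pattern], bool]:
    """Classical constraint: a pattern either satisfies it or it does not."""
    return lambda p: p[feature] >= threshold


def soft_at_least(feature: str, low: float, high: float) -> Callable[[Pattern], float]:
    """Soft constraint (assumed form): degree of satisfaction in [0, 1].

    0 at or below `low`, 1 at or above `high`, linear in between.
    """
    def degree(p: Pattern) -> float:
        v = p[feature]
        if v <= low:
            return 0.0
        if v >= high:
            return 1.0
        return (v - low) / (high - low)
    return degree


def combine_min(constraints: List[Callable[[Pattern], float]]) -> Callable[[Pattern], float]:
    """Combine soft constraints by taking the weakest degree (assumed fuzzy-style rule)."""
    return lambda p: min(c(p) for c in constraints)


def rank(patterns: Dict[str, Pattern], score: Callable[[Pattern], float]) -> List[Tuple[str, float]]:
    """Order patterns from most to least interesting according to `score`."""
    scored = [(name, score(p)) for name, p in patterns.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up example data.
patterns = {
    "A": {"support": 100, "price": 50},
    "B": {"support": 90, "price": 90},
    "C": {"support": 99, "price": 100},
}

hard = hard_at_least("support", 100)
accepted = [name for name, p in patterns.items() if hard(p)]

soft = combine_min([
    soft_at_least("support", 50, 150),
    soft_at_least("price", 0, 100),
])
ranking = rank(patterns, soft)
```

In this sketch the boolean constraint discards pattern C for missing the support threshold by a single unit, while the soft version keeps every pattern and orders them by degree of satisfaction. A boolean constraint is the special case in which the degree is always 0 or 1.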
Using metadata for locating genomic datasets on a global scale
Genomic research has benefited from recent extraordinary improvements in DNA sequencing techniques, leading to the production of an enormous number of datasets that store information such as nucleotide sequences, gene locations and expression levels, and protein-DNA interactions. As this has now become a big data matter, characterized by an underlying disorganization, there is a strong need for integrative solutions. In this paper, we devote our efforts to the management of genomic data, which we organize and locate using descriptions of experimental studies. Such documentation, also referred to as metadata, contains fundamental information for understanding the content of experimental samples (namely, how the biological material was extracted and processed, under which clinical conditions, and with which techniques). We propose a novel framework to manage the metadata of genomic datasets, offering a unified view over a number of heterogeneous data sources (usually big international consortia, but also small research centers) that currently display their metadata in disorganized and very cumbersome formats. The final outcome of this work is a search platform that allows easy location of relevant sources for specific genomic data analysis problems.
