Mining string data under similarity and soft-frequency constraints: application to promoter sequence analysis
We study constraint-based pattern extraction in collections of strings and the development of complete, generic solvers that extract all patterns satisfying a combination of primitive constraints. A solver such as FAVST can optimize conjunctions of so-called monotonic and/or anti-monotonic constraints (e.g., maximum and minimum frequency constraints). We sought to complement this kind of tool by handling constraints for discovering exception-tolerant patterns. We propose several definitions of approximate occurrences and the use of approximate-frequency constraints. This leads us to specify difficult constraints (e.g., for expressing similarity) as conjunctions of monotonic and anti-monotonic primitives that are optimized by our solver, MARGUERITE. With a view to deploying it in knowledge discovery processes, we analyzed the tuning of extraction parameters (e.g., which frequency thresholds to choose). We propose an original method for estimating the number of patterns that satisfy a constraint by sampling the pattern space. We also studied how to identify the most stringent parameters, so as to return patterns that are probably not false positives. These contributions were applied to the analysis of gene promoter sequences. In close collaboration with a team of biologists at CGMC, we were able to identify putative binding sites of transcription factors involved in the cellular differentiation process.
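The idea of estimating how many patterns satisfy a constraint by sampling the pattern space, rather than enumerating it, can be illustrated with a generic Monte Carlo sketch. This is not the authors' method: it assumes exact substring occurrences, a DNA alphabet, fixed-length patterns drawn uniformly at random, and a single minimum-frequency constraint. The names `frequency`, `exact_count` and `estimate_count` are hypothetical.

```python
import itertools
import random

ALPHABET = "ACGT"  # assumption: DNA alphabet


def frequency(pattern, sequences):
    """Number of sequences that contain `pattern` as an exact substring."""
    return sum(1 for s in sequences if pattern in s)


def exact_count(sequences, length, min_freq):
    """Enumerate all len(ALPHABET)**length patterns and count those with
    frequency >= min_freq (only feasible for short patterns)."""
    return sum(
        1
        for letters in itertools.product(ALPHABET, repeat=length)
        if frequency("".join(letters), sequences) >= min_freq
    )


def estimate_count(sequences, length, min_freq, n_samples=2000, seed=0):
    """Monte Carlo estimate of exact_count: draw patterns uniformly from the
    pattern space, measure the fraction that satisfies the constraint, and
    scale that fraction by the size of the space."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        pattern = "".join(rng.choice(ALPHABET) for _ in range(length))
        if frequency(pattern, sequences) >= min_freq:
            hits += 1
    return hits / n_samples * len(ALPHABET) ** length


if __name__ == "__main__":
    # Hypothetical random data set: 10 sequences of length 30.
    rng = random.Random(42)
    seqs = ["".join(rng.choice(ALPHABET) for _ in range(30)) for _ in range(10)]
    print(exact_count(seqs, 3, 5), estimate_count(seqs, 3, 5))
```

The point of sampling is that its cost depends on the number of samples, not on the size of the pattern space, which grows exponentially with pattern length.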
Mining strings under similarity and approximate-frequency constraints: application to promoter sequence analysis
An inductive database is a database that contains not only data but also patterns. Inductive databases are designed to support the KDD process. Recent advances in inductive database research have given rise to generic solvers capable of solving inductive queries that are arbitrary Boolean combinations of anti-monotonic and monotonic constraints. They are designed to mine different types of patterns (i.e., patterns from different pattern languages). An instance of such a generic solver exists that is capable of mining string patterns from string data sets. In our main application, promoter sequence analysis, fault tolerance is required, as the data intrinsically contain errors and the phenomenon we are trying to capture is fundamentally degenerate. Our research contribution to fault-tolerant pattern extraction in string data sets is the use of a generic solver, based on a non-trivial formalisation of fault-tolerant pattern extraction as a constraint-based mining task. We identified the stages in the extraction of such patterns where state-of-the-art strategies can be applied to prune the search space. We then developed a fault-tolerant pattern match function, InsDels, that generic constraint-solving strategies can soundly tackle. We also focused on making local patterns actionable. The bottleneck of most local pattern extraction methods is the burden of spurious patterns. As the analysis of patterns by application domain experts is time-consuming, we cannot afford to present patterns without any objective clue about their relevance. We have therefore developed two methods for computing the expected number of patterns extracted from random data sets. If the number of extracted patterns differs strongly from the number expected in random data sets, one can state that the results exhibit local associations that are a priori relevant because they are unexpected.
Among other applications, we have applied our approach to support the discovery of new motifs in gene promoter sequences, with promising results.
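Fault-tolerant occurrences of string patterns can be illustrated with a minimal sketch. It is not the InsDels function named above, whose exact definition is not given here. It assumes that a pattern occurs approximately in a sequence when some substring of that sequence lies within edit distance k (substitutions, insertions, deletions) of the pattern, computed with the classic Sellers dynamic programme, and that the soft frequency of a pattern is the number of sequences containing such an occurrence. The function names are hypothetical.

```python
def min_edit_distance_in(pattern, text):
    """Smallest edit distance (substitutions, insertions, deletions) between
    `pattern` and any substring of `text` (Sellers' dynamic programme)."""
    m = len(pattern)
    # prev[i]: cheapest alignment of pattern[:i] ending at the previous text
    # position; the initial column corresponds to an empty text prefix.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in text:
        cur = [0] * (m + 1)  # cur[0] = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (pattern[i - 1] != ch),  # match or substitution
                prev[i] + 1,     # extra text character (insertion)
                cur[i - 1] + 1,  # missing pattern character (deletion)
            )
        best = min(best, cur[m])  # an occurrence may end at this position
        prev = cur
    return best


def soft_frequency(pattern, sequences, k):
    """Number of sequences with an occurrence of `pattern` within k edits."""
    return sum(1 for s in sequences if min_edit_distance_in(pattern, s) <= k)


if __name__ == "__main__":
    seqs = ["TTACGTTT", "TTAGTTT", "AAAA"]  # hypothetical toy sequences
    for k in range(4):
        print(k, soft_frequency("ACGT", seqs, k))
```

With k = 0 this reduces to exact-match frequency; raising k lets degenerate variants such as "AGT" (one deletion) count as occurrences of "ACGT".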
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
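The difference between the two counting schemes can be sketched as follows. Assumptions: each citing paper is a list of cited works, each cited work is an ordered tuple of author names, and an author pair gains one count per citing paper in which both authors appear among the first k authors of the works it cites (k = 1 gives traditional first-author counting, k = 5 the non-traditional scheme). Whether co-authors of the same cited work are counted as co-cited is an assumption of this sketch; the study's exact rules may differ. The function name and the author labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, k=1):
    """Count co-cited author pairs, considering the first k authors of each
    cited work; each pair is counted at most once per citing paper."""
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work in cited_works:
            authors.update(work[:k])  # first k authors of this cited work
        for a, b in combinations(sorted(authors), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Two hypothetical citing papers citing the same author teams with the
    # author order reversed.
    papers = [
        [("A", "B"), ("C", "D")],
        [("B", "A"), ("D", "C")],
    ]
    print(cocitation_counts(papers, k=1))
    print(cocitation_counts(papers, k=2))
```

In this toy case, first-author counting splits one intellectual link into two unrelated pairs, (A, C) and (B, D), each counted once, whereas counting the first two authors links all four authors consistently across both citing papers.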
Mining string data under similarity and soft-frequency constraints (application to promoter sequence analysis)
VILLEURBANNE-DOC'INSA LYON (692662301) / Sudoc, France
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, released posthumously in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines how these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
About softness for inductive querying on sequence databases
In many application domains (e.g., WWW usage mining, telecommunication data analysis, molecular biology), large sequence databases are available and yet under-exploited. The inductive database framework assumes that both such databases and the various patterns holding within them might be queryable. In this setting, queries that return patterns are called inductive queries, and solving them is one of the main topics in database mining research. Indeed, constraint-based mining techniques on sequence databases have been studied extensively over the last few years, and efficient algorithms make it possible to compute complete collections of patterns (e.g., sequences) that satisfy conjunctions of monotonic and/or anti-monotonic constraints (e.g., minimal and maximal frequency constraints) in potentially large sequence databases. Studying new applications of these techniques, we consider fault tolerance and softness to be extremely important issues for real-life data analysis. In this paper, we address some of the open problems that arise when computing soft occurrences of patterns within database sequences instead of the classical exact-matching ones. Such an extension is not trivial, since it prevents the clever use of monotonicity for pruning the search space. We describe our proposal and provide an experimental validation on real-life clickstream data, which confirms the added value of this approach.
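Why soft occurrences can defeat monotonicity-based pruning can be shown with a small counterexample. It assumes a soft-occurrence definition chosen purely for illustration, not the paper's: a pattern occurs in a sequence if some window of the same length differs from it in at most floor(error_rate × |pattern|) positions (Hamming mismatches), so longer patterns tolerate more errors. Under this assumption, a super-pattern can have higher soft support than one of its sub-patterns, so a minimal-frequency constraint is no longer anti-monotonic. The function names are hypothetical.

```python
def hamming_occurs(pattern, seq, max_mismatches):
    """True if some window of `seq` with len(pattern) characters differs
    from `pattern` in at most `max_mismatches` positions."""
    m = len(pattern)
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        if sum(a != b for a, b in zip(pattern, window)) <= max_mismatches:
            return True
    return False


def soft_support(pattern, sequences, error_rate):
    """Soft support with a length-relative tolerance (illustrative
    assumption): allowed mismatches = floor(error_rate * len(pattern))."""
    tolerance = int(error_rate * len(pattern))
    return sum(1 for s in sequences if hamming_occurs(pattern, s, tolerance))


if __name__ == "__main__":
    db = ["AGGT"]  # hypothetical one-sequence database
    # "AC" is a substring of "ACGT", yet it has lower soft support:
    # with error_rate 0.25, "AC" allows 0 mismatches and "ACGT" allows 1.
    print(soft_support("AC", db, 0.25), soft_support("ACGT", db, 0.25))
```

An Apriori-style miner that discards "AC" because its support is below the threshold would therefore wrongly discard "ACGT" as well, which is the kind of pruning problem the abstract refers to.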
