Recent advances in mining patterns from complex data
Data mining and knowledge discovery are advanced research fields with numerous algorithms and studies to extract patterns and models from complex data sources like blogs, event or log data, biological data, spatio-temporal data, social networks, mobility data, and sensor data and streams. The works presented in this special issue of the Journal of Intelligent Information Systems should hold the attention of both researchers and practitioners of data mining who are interested in the advances and latest developments in the area of extracting patterns. Behavioral Process Mining for Unstructured Processes by Claudia Diamantini, Laura Genga and Domenico Potena addresses the challenging problem of extracting useful information from the huge volume of events recorded by several of today's enterprise systems
Periodicity Detection of Emotional Communities in Microblogging
Social media allow users to convey emotions, which are often related to real-world events, social relationships or personal experiences. Indeed, emotions can determine the propensity of users to socialize or attend events. Similarly, interactions with people can influence the personality and feelings of individuals. Therefore, studying the emotional content generated by users can reveal information on the behavior of users or collectives of users. However, such information relates only to a specific moment when the emotions are sporadic or episodic, and may therefore be of little use. By contrast, it can be more significant to trace emotions over time and to understand whether they appear with regularity or whether they are associated with behaviors already observed in the past that could recur. In this paper, we focus on the periodicity with which emotional words appear in micro-blogs as an indication of a collective emotional behavior expressed with regularity. We propose a computational solution that builds a cyberspace based on the emotional content produced by the users and determines communities of users who periodically express similar emotional behaviors. We show the viability of the method on data from the social media platform Twitter and provide a quantitative evaluation and qualitative considerations
Preface of selection for 8th International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2019
This book constitutes the refereed post-conference proceedings of the 8th International Workshop on New Frontiers in Mining Complex Patterns, NFMCP 2019, held in conjunction with ECML-PKDD 2019 in Würzburg, Germany, in September 2019.
The workshop focused on the latest developments in the analysis of complex and massive data sources, such as blogs, event or log data, medical data, spatio-temporal data, social networks, mobility data, sensor data and streams
Condensed representations of changes in dynamic graphs through emerging subgraph mining
Change mining is one of the main subjects of analysis on time-evolving data. Regardless of how the changes are distributed over the data, the algorithms often return very large sets of results. In fact, one class of algorithms designed for change mining is based on pattern mining, which notoriously suffers from the problem of a huge number of returned patterns. Moreover, the complexity of some types of data, like dynamic graphs, could make the set of final changes even larger, which makes interpretation difficult or even impossible. This paper represents the first attempt, to our knowledge, to build condensed representations of changes from dynamic graphs. We study changes captured with the pattern (subgraph) mining framework and focus on the discovery of subgraphs able to (i) represent evident changes and (ii) convey graph-based information that is not already expressed by other subgraphs. To do this, we revise an existing approach by introducing the notion of emerging subgraphs, used to remove uninteresting changes, and the notions of closed and maximal subgraphs, used to remove redundant changes. Experiments performed on real-world dynamic graphs show that the condensed representations maintain the accuracy levels of the original approach and often offer a lossless representation of the detected changes
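As a rough illustration of the "emerging" notion mentioned in this abstract, the sketch below computes a growth rate between two time windows of a dynamic graph. It is not the authors' algorithm: it assumes each snapshot is a set of labelled edges, treats a subgraph pattern as a set of edges, and uses the common support-ratio definition of growth rate. The names `support` and `growth_rate` and the toy data are hypothetical.

```python
def support(pattern, window):
    """Fraction of snapshots in the window that contain every edge of the pattern."""
    return sum(1 for snapshot in window if pattern <= snapshot) / len(window)


def growth_rate(pattern, old_window, new_window):
    """Ratio of the pattern's support in the new window to its support in the old one."""
    s_old = support(pattern, old_window)
    s_new = support(pattern, new_window)
    if s_old == 0:
        # Absent before: infinite growth if it appears now, no growth otherwise.
        return float("inf") if s_new > 0 else 0.0
    return s_new / s_old


# Toy data (assumption): each snapshot is a set of (source, target) edges.
old_window = [{("a", "b"), ("b", "c")}, {("a", "b")}]
new_window = [{("a", "b"), ("c", "d")}, {("c", "d")}]

new_edge = frozenset({("c", "d")})
fading_edge = frozenset({("a", "b")})
```

A pattern would then be kept as emerging when its growth rate exceeds a user-chosen threshold; here `new_edge` appears only in the new window, while `fading_edge` loses half of its support.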
Simultaneous Process Drift Detection and Characterization with Pattern-Based Change Detectors
Traditional process mining approaches learn process models assuming that processes are in steady state. This does not comply with the flexibility and adaptation often required of information systems and business models. In fact, these approaches should discover variations in order to adapt to new circumstances, a capability that conventional change analysis based on time series cannot provide, because processes are complex artifacts. This problem can be handled with change-aware structured representations, such as those typically used for network data. In this paper, we propose a novel pattern-based change detection (PBCD) algorithm for discovering and characterizing changes in event logs encoded as dynamic networks. In particular, PBCDs are unsupervised change detection methods, based on changes in sets of patterns observed over time, which are able to simultaneously detect and characterize changes in evolving data. Experimental results, on both real and synthetic data, show the usefulness of the approach and its increased accuracy with respect to state-of-the-art solutions
Mining microscopic and macroscopic changes in network data streams (discussion paper)
Network data streams offer an abstraction of complex systems from the real world, which can be seen as producers of unbounded sequences of complex data generated at high speed. Many complex systems evolve according to stochastic processes which remain unknown to the interested users. As a consequence, changes happen in an unpredictable manner and may involve various portions of the observed complex systems. In this scenario, an interesting problem concerns the identification and characterization of the changes that may concern both the whole structure of a complex system and small parts of it. We conjecture that the former can be explained by the latter and, conversely, the latter can trigger the former. This type of problem requires a holistic strategy that traditional approaches often do not carry out because they focus either on the whole network or on some portions only. In this discussion paper, we describe a descriptive data mining approach based on frequent pattern discovery that we designed in recent research work. It combines frequent patterns with automatic time-window setting, in order to identify and characterize macroscopic changes and microscopic changes as changes that have an impact on a substantial part of the network or on specific portions, respectively. We provide arguments for its viability in real-world applications through two case studies, more precisely, telecommunication networks and geo-sensor networks
Finding generalized closed frequent itemsets for mining non redundant association rules
Generalized association rules are a very important extension of traditional association rules which allows exploiting taxonomical knowledge defined over the items to be mined. However, by using a taxonomy several thousands of rules are discovered and most of them can be redundant. In this paper, we propose a solution to the problem of mining non redundant generalized association rules by resorting to the closed itemset framework and to the concept of minimal non-redundant rules. We define a formal framework and design an algorithm which solves the problem of mining generalized closed frequent itemsets. Generation of non-redundant generalized rules from the set of generalized closed frequent itemsets is considered as well. The proposed framework has been applied to biomedical textual data analysis. Experimental results are reported and conclusions are drawn
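To make the closed itemset idea concrete, here is a minimal brute-force sketch. It is an illustration under simplifying assumptions, not the algorithm proposed in the paper: it ignores the taxonomy entirely (plain rather than generalized itemsets), enumerates candidates exhaustively, and uses an absolute support count. The function names and the toy transactions are hypothetical. An itemset is treated as closed when no proper superset has the same support.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                frequent[frozenset(candidate)] = count
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size exists.
            break
    return frequent


def closed_itemsets(frequent):
    """Keep only itemsets with no proper superset of equal support."""
    return {
        itemset: count
        for itemset, count in frequent.items()
        if not any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
    }


# Toy data (assumption): four transactions over items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

On this toy data `{b}` and `{c}` are frequent but not closed, because `{a, b}` and `{a, c}` have the same support; rules generated from the closed sets alone therefore avoid that redundancy.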
jKarma: A highly-modular framework for pattern-based change detection on evolving data
Pattern-based change detection (PBCD) describes a class of change detection algorithms for evolving data. Contrary to conventional solutions, PBCD seeks changes exhibited by the patterns over time and therefore works on an abstract form of the data, which avoids searching for changes on the raw data. Moreover, PBCD provides arguments on the validity of the results because patterns mirror changes that occurred with some form of evidence. However, the existing solutions differ in data representation, pattern mining algorithm and change identification strategy, which we can regard as the main modules of a general architecture, so that any PBCD task could be designed by accommodating custom implementations of those modules. This is what we propose in this paper through jKarma, a highly-modular framework written in Java for defining and performing PBCD
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
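The difference between the two counting schemes can be sketched in a few lines. This is an illustrative sketch, not the study's procedure: it assumes each citing paper is given as a list of cited works, each cited work as a list of author names in byline order, and that two authors are co-cited once per citing paper whenever both are credited by it (including co-authors of the same cited work, which is one of several possible conventions). The function name `cocitation_counts` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count co-cited author pairs.

    max_authors=1 mimics traditional first-author counting;
    max_authors=5 credits the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited by this citing paper, each counted at most once.
        credited = set()
        for authors in references:
            credited.update(authors[:max_authors])
        for pair in combinations(sorted(credited), 2):
            counts[pair] += 1
    return counts


# Toy data (assumption): two citing papers, each citing two works.
citing_papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Lee", "Chen"], ["Garcia", "Smith"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

With first-author counting only two author pairs are ever co-cited in this toy data, whereas crediting up to five authors yields six pairs and strengthens the Garcia-Smith link, which hints at why the resulting maps can look different.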
