1,721,105 research outputs found
Data analytics on the board game Go for the discovery of interesting sequences of moves in joseki
Data analytics on the board game Go for the discovery of interesting sequences of moves in josek
Knowledge Discovery from Social Graph Data
AbstractHigh volumes of a wide variety of valuable data can be easily collected and generated from a broad range of data sources of different veracities at a high velocity. In the current era of big data, many traditional data management and analytic approaches may not be suitable for handling the big data due to their well-known 5V's characteristics. Over the past few years, several systems and applications have developed to use cluster, cloud or grid computing to manage and analyze big data so as to support data science (e.g., knowledge discovery and data mining). In this paper, we present a knowledge-based system for social network analysis so as to support big data mining of interesting patterns from big social networks that are represented as graphs
Computing theoretically-sound upper bounds to expected support for frequent pattern mining problems over uncertain big data
Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. To mine frequent patterns from probabilistic datasets of uncertain data, where each item in a transaction is usually associated with an existential probability expressing the likelihood of its presence in that transaction, the UF-growth algorithm captures important information about uncertain data in a UF-tree structure so that expected support can be computed for each pattern. A pattern is considered frequent if its expected support meets or exceeds the user-specified threshold. However, a challenge is that the UF-tree can be large. To handle this challenge, several algorithms use smaller trees such that upper bounds to expected support can be computed. In this paper, we examine these upper bounds, and determine which ones provide tighter upper bounds to expected support for frequent pattern mining of uncertain big data
Item-centric mining of frequent patterns from big uncertain data
Item-centric mining of frequent patterns from big uncertain dat
Frequent subgraph mining from streams of uncertain data
In the current era of Big data, high volumes of high-value data---such as social network data---can be generated at a high velocity. The quality and accuracy of these data depend on their veracity: uncertainty of the data. A collection of these uncertain data can be viewed as a big, interlinked, dynamic graph structure. Embedded in these big data are implicit, previously unknown, and potentially useful knowledge. Hence, efficient and effective knowledge discovery algorithms for mining frequent subgraphs from these dynamic streaming graph structured data are in demand. Most of the existing algorithms mine frequent subgraph from streams of precise data. However, there are many real-life scientific and engineering applications, in which data are uncertain. Hence, in this paper, we propose algorithms that use limited memory space for mining frequent subgraphs from streams of uncertain data. Evaluation results show the effectiveness of our algorithms in mining frequent subgraphs from streams of uncertain data
Approximation to expected support of frequent itemsets in mining probabilistic sets of uncertain data
Knowledge discovery and data mining generally discovers implicit, previously unknown, and useful knowledge from data. As one of the popular knowledge discovery and data mining tasks, frequent itemset mining, in particular, discovers knowledge in the form of sets of frequently co-occurring items, events, or objects. On the one hand, in many real-life applications, users mine frequent patterns from traditional databases of precise data, in which users know certainly the presence of items in transactions. On the other hand, in many other real-life applications, users mine frequent itemsets from probabilistic sets of uncertain data, in which users are uncertain about the likelihood of the presence of items in transactions. Each item in these probabilistic sets of uncertain data is often associated with an existential probability expressing the likelihood of its presence in that transaction. To mine frequent itemsets from these probabilistic datasets, many existing algorithms capture lots of information to compute expected support. To reduce the amount of space required, algorithms capture some but not all information in computing or approximating expected support. The tradeoff is that the upper bounds to expected support may not be tight. In this paper, we examine several upper bounds and recommend to the user which ones consume less space while providing good approximation to expected support of frequent itemsets in mining probabilistic sets of uncertain data
Frequent subgraph mining from streams of linked graph structured data
Nowadays, high volumes of high-value data (e.g., semantic web data) can be generated and published at a high velocity. A collection of these data can be viewed as a big, interlinked, dynamic graph structure of linked resources. Embedded in them are implicit, previously unknown, and potentially useful knowledge. Hence, ecient knowledge discovery algorithms for mining frequent subgraphs from these dynamic, streaming graph structured data are in demand. Some existing algorithms require very large memory space to discover frequent subgraphs; some others discover collections of frequently co-occurring edges (which may be disjoint). In contrast, we propose|in this paper|algorithms that use limited memory space for discovering collections of frequently co-occurring connected edges. Evaluation results show the effectiveness of our algorithms in frequent subgraph mining from streams of linked graph structured data
Edge-based mining of frequent subgraphs from graph streams
In the current era of Big data, high volumes of valuable data can be generated at a high velocity from high-varieties of data sources in various real-life applications ranging from sensor networks to social networks, from bio-informatics to chemical informatics. In addition, Big data are also available in business, education, engineering, finance, healthcare, scientific, telecommunication, and transportation domains. A collection of these data can be viewed as a big dynamic graph structure. Embedded in them are implicit, previously unknown, and potentially useful knowledge. Consequently, efficient knowledge discovery algorithms for mining frequent subgraphs from these dynamic streaming graph structured data are in demand. On the one hand, some existing algorithms discover collections of frequently co-occurring edges, which may be disjoint. On the other hand, some other existing algorithms discover frequent subgraphs by requiring very large memory space. With high volumes of Big data, available memory space may be limited. To discover collections of frequently co-occurring connected edges, we present in this paper two efficient algorithms that require small memory space. Evaluation results show the efficiency of our edge-based algorithms in mining frequent subgraphs from graph streams
Frequent pattern mining from dense graph streams
As technology advances, streams of data can be produced in many applications such as social networks, sensor networks, bioinformatics, and chemical informatics. These kinds of streaming data share a property in common.namely, they can be modeled in terms of graph-structured data. Here, the data streams generated by graph data sources in these applications are graph streams. To extract implicit, previously unknown, and potentially useful frequent patterns from these streams, efficient data mining algorithms are in demand. Many existing algorithms capture important streaming data and assume that the captured data can fit into main memory. However, problems arise when such an assumption does not hold (e.g., when the available memory is limited). In this paper, we propose a data structure called DSMatrix for capturing important data from the streams.especially, dense graph streams.onto the disk when the memory space is limited. In addition, we also propose two stream mining algorithms that use DSMatrix to mine frequent patterns. The tree-based horizontal mining algorithm applies an e.ective frequency counting approach to avoid recursive construction of sub-trees as in many tree-based mining. The vertical mining algorithm makes good use of the information captured in the DSMatrix for mining
- …
