1,720,969 research outputs found
A case study in Sequential Pattern Mining for IT-Operational Risk
IT-operational risk management consists of identifying, assessing, monitoring and mitigating the adverse risks of loss resulting from hardware and software system failures. We present a case study in IT-operational risk measurement in the context of a network of Private Branch eXchanges (PBXs). The approach relies on preprocessing and data mining tasks for the extraction of sequential patterns and their exploitation in the definition of a measure called expected risk
A kernel-based ensemble classifier for evolving stream of trees with double concept drifting reaction
Modern mining approaches should be able to properly deal with the increased availability of structured data. Here we focus on the problem of processing streams of trees. Specifically, we cope with classification tasks. We show that by adopting a double concept drifting reaction mechanism in the context of a kernel-based ensemble of classifiers, it is actually possible to have an effective and efficient system to process streams of trees. The original contribution consists into the introduction of a local concept drifting mechanism, specifically designed for structured data, and used to compute the ensemble score function in such a way to focus only on reliable (sub)trees belonging to the classification models which constitute the ensemble. Experimental results seem to support the relevance and usefulness of this local component for concept drifting management
Driving profiles computation and monitoring for car insurance CRM
Customer segmentation is one of the most traditional and valued tasks in customer relationship management (CRM). In this article, we explore the problem in the context of the car insurance industry, where the mobility behavior of customers plays a key role: Different mobility needs, driving habits, and skills imply also different requirements (level of coverage provided by the insurance) and risks (of accidents). In the present work, we describe a methodology to extract several indicators describing the driving profile of customers, and we provide a clustering-oriented instantiation of the segmentation problem based on such indicators. Then, we consider the availability of a continuous flow of fresh mobility data sent by the circulating vehicles, aiming at keeping our segments constantly up to date. We tackle a major scalability issue that emerges in this context when the number of customers is large-namely, the communication bottleneck-by proposing and implementing a sophisticated distributed monitoring solution that reduces communications between vehicles and company servers to the essential. We validate the framework on a large database of real mobility data coming from GPS devices on private cars. Finally, we analyze the privacy risks that the proposed approach might involve for the users, providing and evaluating a countermeasure based on data perturbation
Survey on using constraints in data mining
This paper provides an overview of the current state-of-the-art on using constraints in knowledge discovery and data mining. The use of constraints in a data mining task requires specific definition and satisfaction tools during knowledge extraction. This survey proposes three groups of studies based on classification, clustering and pattern mining, whether the constraints are on the data, the models or the measures, respectively. We consider the distinctions between hard and soft constraint satisfaction, and between the knowledge extraction phases where constraints are considered. In addition to discussing how constraints can be used in data mining, we show how constraint-based languages can be used throughout the data mining process
Data mining and constraints: An overview
This paper provides an overview of the current state-of-theart on using constraints in knowledge discovery and data mining. The use of constraints requires mechanisms for defining and evaluating them during the knowledge extraction process.We give a structured account of three main groups of constraints based on the specific context in which they are defined and used. The aim is to provide a complete view on constraints as a building block of data mining methods
Data science at SoBigData: the European research infrastructure for social mining and big data analytics
Most people have become “big data” producers in their daily life. Our desires, opinions, sentiments, social links as well as our mobile phone calls and GPS track leave traces of our behaviours. To transform these data into knowledge, value is a complex task of data science. This paper shows how the SoBigData Research Infrastructure supports data science towards the new frontiers of big data exploitation. Our research infrastructure serves a large community of social sensing and social mining researchers and it reduces the gap between existing research centres present at European level. SoBigData integrates resources and creates an infrastructure where sharing data and methods among text miners, visual analytics researchers, socio-economic scientists, network scientists, political scientists, humanities researchers can indeed occur. The main concepts related to SoBigData Research Infrastructure are presented. These concepts support virtual and transnational (on-site) access to the resources. Creating and supporting research communities are considered to be of vital importance for the success of our research infrastructure, as well as contributing to train the new generation of data scientists. Furthermore, this paper introduces the concept of exploratory and shows their role in the promotion of the use of our research infrastructure. The exploratories presented in this paper represent also a set of real applications in the context of social mining. Finally, a special attention is given to the legal and ethical aspects. Everything in SoBigData is supervised by an ethical and legal framework
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
- …
