1,721,202 research outputs found

    Towards mining the organizational structure of a dynamic event scenario

    No full text
    The increasing volume and value of data is an important enabler for data science. In this study, we consider the event data, i.e. information on things that happen in organizations, machines, systems and people’s lives. Each event refers to a well-defined activity in a certain business process execution, the resource (i.e. person or device) executing or initiating the activity, the timestamp of the event, as well as to various data elements recorded with the event (e.g. the geo-location of an activity). Process mining aims to analyze event data, in order to mine knowledge that can contribute to improving a business process behavior. In particular, the focus of this study is on organizational mining, that is a sub-field of process mining that aims at understanding the life cycle of a dynamic organizational structure (i.e. a configuration of organization units) and the interactions among co-workers (resources) arising from the analysis of real-world event logs. The innovative contribution of this study is that the organizational mining goal is here achieved by combining concepts from process mining, stream mining and social network analysis. This combination is an original contribution of this study, not still explored in organizational mining field. In an assessment, benchmark event data are explored, in order to understand how the presented solution allows us to identify the life cycle a dynamic organizational structure

    Segmentation-aided classification of hyperspectral data using spatial dependency of spectral bands

    Full text link
    Classifying every pixel of a hyperspectral image with a certain land-cover type is the cornerstone of hyperspectral image analysis. In the present study a segmentation-aided methodology for the spectral-spatial classification of hyperspectral data is proposed. It considers the spatial dependence of the spectral bands, deals with the curse of dimensionality and handles the spectral variability. A local spatial regularization of spectral information is used, in order to derive an informative joint spectral-spatial representation of the data. A contiguity-based segmentation algorithm is formulated, in order to build the object-wise texture that can aid classifier learning. The hybrid use of the segmentation texture is evaluated in both pre-processing (i.e. selecting representative pixels to learn the classifier) and post-processing (i.e. refining predicted labels and removing possible outlier classifications). The experiments performed with the proposed methodology provide encouraging results, also compared to several recent state-of-the-art approaches

    Spatial Associative Classification: Propositional vs. Structural approach

    No full text
    In Spatial Data Mining, spatial dimension adds a substantial complexity to the data mining task. First, spatial objects are characterized by a geometrical representation and relative positioning with respect to a reference system, which implicitly define both spatial relationships and properties. Second, spatial phenomena are characterized by autocorrelation, i.e., observations of spatially distributed random variables are not location-independent. Third, spatial objects can be considered at different levels of abstraction (or granularity). The recently proposed SPADA algorithm deals with all these sources of complexity, but it offers a solution for the task of spatial association rules discovery. In this paper the problem of mining spatial classifiers is faced by building an associative classification framework on SPADA. We consider two alternative solutions for associative classification: a propositional and a structural method. In the former, SPADA obtains a propositional representation of training data even in spatial domains which are inherently non-propositional, thus allowing the application of traditional data mining algorithms. In the latter, the Bayesian framework is extended following a multi-relational data mining approach in order to cope with spatial classification tasks. Both methods are evaluated and compared on two real-world spatial datasets and results provide several empirical insights on them

    Leveraging the power of local spatial autocorrelation in geophysical interpolative clustering

    No full text
    Nowadays ubiquitous sensor stations are deployed worldwide, in order to measure several geophysical variables (e.g. temperature, humidity, light) for a growing number of ecological and industrial processes. Although these variables are, in general, measured over large zones and long (potentially unbounded) periods of time, stations cannot cover any space location. On the other hand, due to their huge volume, data produced cannot be entirely recorded for future analysis. In this scenario, summarization, i.e. the computation of aggregates of data, can be used to reduce the amount of produced data stored on the disk, while interpolation, i.e. the estimation of unknown data in each location of interest, can be used to supplement station records. We illustrate a novel data mining solution, named interpolative clustering, that has the merit of addressing both these tasks in time-evolving, multivariate geophysical applications. It yields a time-evolving clustering model, in order to summarize geophysical data and computes a weighted linear combination of cluster prototypes, in order to predict data. Clustering is done by accounting for the local presence of the spatial autocorrelation property in the geophysical data. Weights of the linear combination are defined, in order to reflect the inverse distance of the unseen data to each cluster geometry. The cluster geometry is represented through shape-dependent sampling of geographic coordinates of clustered stations. Experiments performed with several data collections investigate the trade-off between the summarization capability and predictive accuracy of the presented interpolative clustering algorith

    A Co-training Strategy for Multiple View Clustering in Process Mining

    No full text
    Process mining refers to the discovery, conformance and enhancement of process models from event logs currently produced by several information systems (e.g. workflow management systems). By tightly coupling event logs and process models, process mining makes possible to detect deviations, predict delays, support decision making and recommend process redesigns. Event logs are data sets containing the executions (called traces) of a business process. Several process mining algorithms have been defined to mine event logs and deliver valuable models (e.g. Petri nets) of how logged processes are being executed. However, they often generate spaghetti-like process models, which can be hard to understand. This is caused by the inherent complexity of real-life processes, which tend to be less structured and more flexible than what the stakeholders typically expect. In particular, spaghetti-like process models are discovered when all possible behaviors are shown in a single model as a result of considering the set of traces in the event log all at once. To minimize this problem, trace clustering can be used as a preprocessing step. It splits up an event log into clusters of similar traces, so as to handle variability in the recorded behavior and facilitate process model discovery. In this paper, we investigate a multiple view aware approach to trace clustering, based on a co-training strategy. In an assessment, using benchmark event logs, we show that the presented algorithm is able to discover a clustering pattern of the log, such that related traces result appropriately clustered. We evaluate the significance of the formed clusters using established machine learning and process mining metrics

    Collective regression for handling autocorrelation of network data in a transductive setting

    Full text link
    Sensor networks, communication and financial networks, web and social networks are becoming increasingly important in our day-to-day life. They contain entities which may interact with one another. These interactions are often characterized by a form of autocorrelation, where the value of an attribute at a given entity depends on the values at the entities it is interacting with. In this situation, the collective inference paradigm offers a unique opportunity to improve the performance of predictive models on network data, as interacting instances are labeled simultaneously by dealing with autocorrelation. Several recent works have shown that collective inference is a powerful paradigm, but it is mainly developed with a fully-labeled training network. In contrast, while it may be cheap to acquire the network topology, it may be costly to acquire node labels for training. In this paper, we examine how to explicitly consider autocorrelation when performing regression inference within network data. In particular, we study the transduction of collective regression when a sparsely labeled network is a common situation. We present an algorithm, called CORENA (COllective REgression in Network dAta), to assign a numeric label to each instance in the network. In particular, we iteratively augment the representation of each instance with instances sharing correlated representations across the network. In this way, the proposed learning model is able to capture autocorrelations of labels over a group of related instances and feed-back the more reliable labels predicted by the transduction in the labeled network. Empirical studies demonstrate that the proposed approach can boost regression performances in several spatial and social tasks

    A novel spectral-spatial co-training algorithm for the transductive classification of hyperspectral imagery data

    Full text link
    The automatic classification of hyperspectral data is made complex by several factors, such as the high cost of true sample labeling coupled with the high number of spectral bands, as well as the spatial correlation of the spectral signature. In this paper, a transductive collective classifier is proposed for dealing with all these factors in hyperspectral image classification. The transductive inference paradigm allows us to reduce the inference error for the given set of unlabeled data, as sparsely labeled pixels are learned by accounting for both labeled and unlabeled information. The collective inference paradigm allows us to manage the spatial correlation between spectral responses of neighboring pixels, as interacting pixels are labeled simultaneously. In particular, the innovative contribution of this study includes: (1) the design of an application-specific co-training schema to use both spectral information and spatial information, iteratively extracted at the object (set of pixels) level via collective inference; (2) the formulation of a spatial-aware example selection schema that accounts for the spatial correlation of predicted labels to augment training sets during iterative learning and (3) the investigation of a diversity class criterion that allows us to speed-up co-training classification. Experimental results validate the accuracy and efficiency of the proposed spectral-spatial, collective, co-training strategy

    A Deep Semantic Segmentation Approach to Map Forest Tree Dieback in Sentinel-2 Data

    Full text link
    Massive tree dieback events triggered by various disturbance agents, such as insect outbreaks, pests, fires and windstorms, have recently compromised the health of forests in numerous countries with a significant impact on ecosystems. The inventory of forest tree dieback plays a key role in understanding the effects of forest disturbance agents and improving forest management strategies. In this article, we illustrate a deep learning approach that trains a U-Net model for the semantic segmentation of Sentinel-2 images of forest areas. The proposed U-Net architecture integrates an attention mechanism to amplify the crucial information and a self-distillation approach to transfer the knowledge within the U-Net architecture. Experimental results demonstrate the significant contribution of both attention and self-distillation to gaining accuracy in two case studies in which we perform the inventory mapping of forest tree dieback caused by insect outbreaks and wildfires, respectively
    corecore