    Parallel approaches for a decision tree-based explainability algorithm

    While Machine Learning (ML) algorithms have nowadays achieved impressive prediction accuracy in various fields, their ability to provide an explanation for their output remains an issue. The explainability research field is devoted precisely to investigating techniques that can interpret the predictions of ML algorithms. Among the various approaches to explainability, we focus on GLEAMS: a decision tree-based solution that has proven rather promising from several perspectives, but whose execution time increases significantly as the problem size grows. In this work, we analyse the state-of-the-art parallel approaches to decision tree-building algorithms and adapt them to the peculiar characteristics of GLEAMS. Relying on Ray, an increasingly popular distributed computing engine, we propose and implement different parallelization strategies for GLEAMS. An extensive evaluation highlights the benefits and limitations of each strategy and compares the resulting performance with that of other existing explainability algorithms.
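
    To make the idea of parallel decision tree building concrete, the following is a minimal sketch of one classic strategy, feature-parallel split search: each worker scans the candidate thresholds of one feature, and a reduce step keeps the best split. It is not GLEAMS code and does not use Ray; Python's standard `concurrent.futures` thread pool stands in for the distributed engine, and the sum-of-squared-errors split criterion is an assumption chosen for illustration. The names `sse`, `best_split_for_feature` and `parallel_best_split` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Split = Tuple[float, int, float]  # (cost, feature index, threshold)


def sse(values: Sequence[float]) -> float:
    """Sum of squared errors around the mean; 0.0 for an empty sequence."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_for_feature(X: List[List[float]], y: List[float], feature: int) -> Split:
    """Try every observed value of one feature as a threshold (x <= t goes left)."""
    best: Split = (float("inf"), feature, float("nan"))
    for threshold in sorted({row[feature] for row in X}):
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue  # a split that leaves one side empty is useless
        cost = sse(left) + sse(right)
        if cost < best[0]:
            best = (cost, feature, threshold)
    return best


def parallel_best_split(X: List[List[float]], y: List[float], workers: int = 4) -> Split:
    """Feature-parallel search: each task scans the thresholds of one feature."""
    n_features = len(X[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = list(pool.map(lambda f: best_split_for_feature(X, y, f), range(n_features)))
    # Reduce step: keep the lowest-cost split found by any worker.
    return min(candidates, key=lambda split: split[0])
```

    For example, `parallel_best_split([[1, 10], [2, 20], [3, 10], [4, 20]], [1.0, 1.0, 5.0, 5.0])` returns `(0.0, 0, 2)`: splitting feature 0 at 2 separates the two target groups perfectly. Threads keep the sketch self-contained, but because of CPython's global interpreter lock they give little speedup for this pure-Python loop; a real speedup would require processes or a distributed engine such as Ray.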

    Precise Worst-case Blocking Time of Tasks under Priority Inheritance Protocol

    The problem of precisely computing the worst-case blocking time that tasks may experience is one of the fundamental issues in the schedulability analysis of real-time applications. While exact methods have been proposed for more sophisticated protocols, the problem is complex in the case of the Priority Inheritance Protocol, even when restricting attention to uniprocessor systems, non-nested resource accesses, and non-self-suspending tasks. Besides a very simple method that in general leads to loose upper bounds, only one algorithm, of exponential complexity, has so far been reported in the literature to tighten such bounds. In this work, we describe a novel approach which, by leveraging an operations research technique to model the problem, computes the same tight bounds in polynomial time. We then discuss the scenarios in which, assuming no conditional statements in the tasks’ code, the computed bounds derive from a blocking chain that cannot actually occur, and we refine the initial model to compute the worst-case blocking times more precisely for any task set in any possible operating condition.
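
    For context on the "very simple method" mentioned above, the sketch below computes a commonly taught upper bound for blocking under the Priority Inheritance Protocol in the restricted setting the abstract describes (uniprocessor, non-nested critical sections, no self-suspension): a task is blocked at most once by each lower-priority task and at most once on each resource, so the smaller of the two resulting sums is a valid bound. We assume this is the kind of loose bound the abstract refers to; it is not the polynomial-time method proposed in the paper. The input layout (`cs[j][k]`, tasks sorted by decreasing priority) and the function name are hypothetical.

```python
from typing import List


def pip_blocking_bound(cs: List[List[int]], i: int) -> int:
    """Simple PIP blocking bound for task i.

    cs[j][k] is the longest critical section of task j on resource k (0 = unused);
    task index 0 has the highest priority, larger indices have lower priority.
    """
    n_tasks, n_res = len(cs), len(cs[0])
    # Priority ceiling of each resource: the highest-priority task that uses it.
    ceilings = []
    for k in range(n_res):
        users = [j for j in range(n_tasks) if cs[j][k] > 0]
        ceilings.append(min(users) if users else None)
    # A resource can block task i (directly or by push-through) if its ceiling
    # is at least as high as the priority of task i.
    blocking = [k for k in range(n_res) if ceilings[k] is not None and ceilings[k] <= i]
    lower = range(i + 1, n_tasks)
    # Bound 1: each lower-priority task blocks task i at most once.
    by_task = sum(max((cs[j][k] for k in blocking), default=0) for j in lower)
    # Bound 2: each blocking resource blocks task i at most once.
    by_resource = sum(max((cs[j][k] for j in lower), default=0) for k in blocking)
    return min(by_task, by_resource)
```

    With `cs = [[1, 0], [2, 3], [4, 5]]`, the bound is 4 for task 0, 5 for task 1 and 0 for task 2. Because the two sums are computed independently, in general the result may correspond to a blocking chain that cannot actually occur, which is why tighter methods are of interest.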

    Rollback-Free Recovery for a High Performance Dense Linear Solver With Reduced Memory Footprint

    The scale of today's High Performance Computing (HPC) systems is the key element that determines their impressive performance, as well as the reason for their relatively limited reliability. Over the last decade, specific areas of the HPC research field have addressed this issue at different levels, by enriching the infrastructure, the platforms, or the algorithms with fault tolerance features. In this work, we focus on the rather pervasive task of computing the solution of a dense, unstructured linear system, and we propose an algorithm-based technique that tolerates multiple faults, located anywhere, during the parallel computation. In particular, we study ways to boost the performance of the rollback-free recovery, and we provide an extensive evaluation of our technique with respect to other state-of-the-art algorithm-based methods.
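
    The rollback-free idea can be illustrated with the generic checksum encoding used in algorithm-based fault tolerance: redundant sums computed from the data allow lost entries to be rebuilt directly, instead of restarting from a checkpoint. The sketch below keeps a plain and a weighted checksum of a data block and rebuilds up to two lost entries at known positions by solving a 2-by-2 linear system. It illustrates the general principle only, not the technique proposed in the paper, and the names `encode` and `recover` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def encode(x: Sequence[float]) -> Tuple[float, float]:
    """Plain checksum and weighted checksum (weights 1..n) of a data block."""
    c1 = sum(x)
    c2 = sum((i + 1) * v for i, v in enumerate(x))
    return c1, c2


def recover(x: List[Optional[float]], lost: Sequence[int],
            checksums: Tuple[float, float]) -> List[Optional[float]]:
    """Return a copy of x with up to two lost entries (marked None) rebuilt, no rollback."""
    c1, c2 = checksums
    known = [(i, v) for i, v in enumerate(x) if i not in lost]
    r1 = c1 - sum(v for _, v in known)               # sum of the lost values
    r2 = c2 - sum((i + 1) * v for i, v in known)     # weighted sum of the lost values
    y = list(x)
    if len(lost) == 1:
        y[lost[0]] = r1
    elif len(lost) == 2:
        a, b = lost
        wa, wb = a + 1, b + 1
        # Solve x_a + x_b = r1 and wa*x_a + wb*x_b = r2 (wa != wb because a != b).
        y[b] = (r2 - wa * r1) / (wb - wa)
        y[a] = r1 - y[b]
    else:
        raise ValueError("two checksums can rebuild at most two lost entries")
    return y
```

    For example, encoding `[3.0, 1.0, 4.0, 1.0, 5.0]` gives the checksums `(14.0, 46.0)`, from which entries 1 and 3 can be rebuilt after being lost. In an actual solver the checksums must be kept consistent as the data is updated, and floating-point rounding generally makes the rebuilt values approximate.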

    MapReduce over the Hybrid Cloud: a novel Infrastructure Management Policy

    Over the last few years, big data has gained significant traction due to many factors. While the public cloud model has been studied in depth to meet the increasing demand for large-scale data processing capabilities, many organizations are now focusing on the hybrid cloud model, where the classic scenario is enriched with a private (company-owned) cloud, e.g., for the management of sensitive data. In this work, we propose HyMR, a policy that enables autonomic cloud bursting for clusters of virtual machines running MapReduce jobs over a hybrid cloud. This policy, together with an infrastructure-level system for resource provisioning in hybrid clouds, can be used to cope with a temporary (or permanent) lack of computational resources on the private cloud, enabling cloud bursting for big data applications. Through an empirical evaluation of the system's scale-up/scale-down performance, we show that the HyMR policy allows the user to significantly reduce data-processing time, although the gain is inevitably affected by the inter-cloud bandwidth.
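
    The trade-off between extra off-premise capacity and inter-cloud bandwidth can be made concrete with a toy cost model. The sketch below is hypothetical and is not the HyMR policy, whose rules the abstract does not detail: it assumes identical map tasks, requires every task run on the public cloud to first ship its input over the inter-cloud link (ignoring link contention), and brute-forces how many tasks to burst so that the estimated makespan is minimal. The function and parameter names are illustrative.

```python
import math
from typing import Tuple


def plan_burst(n_tasks: int, t_compute: float, input_mb: float, bw_mb_s: float,
               private_slots: int, public_slots: int) -> Tuple[int, float]:
    """Return (tasks to send off-premise, estimated makespan) under a toy cost model.

    Assumptions: identical tasks, private_slots >= 1, each off-premise task first
    transfers its input over the inter-cloud link, and both pools run in full waves.
    """
    remote_t = t_compute + input_mb / bw_mb_s   # compute time plus transfer time
    max_remote = n_tasks if public_slots > 0 else 0
    best = (0, math.inf)
    for k in range(max_remote + 1):
        local = math.ceil((n_tasks - k) / private_slots) * t_compute
        remote = math.ceil(k / public_slots) * remote_t if k > 0 else 0.0
        makespan = max(local, remote)
        if makespan < best[1]:  # strict: on ties, prefer fewer off-premise tasks
            best = (k, makespan)
    return best
```

    With 8 tasks of 10 s each, 100 MB of input per task, and 2 slots on each side, `plan_burst(8, 10, 100, 10, 2, 2)` returns `(2, 30)` at 10 MB/s, while at 1 MB/s it returns `(0, 40)`: when transfers dominate, bursting no longer pays off.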

    A Hybrid Cloud Infrastructure for Big Data Applications

    The trend towards the Internet of Things and the general increase in broadband access are constantly creating large volumes of data that need to be processed to extract further knowledge. Recently, the cloud computing model has evolved from the initial scenario of a public cloud offering its resources to customers through virtualization and the Internet towards the concept of hybrid cloud, where the classic scenario is enriched with a private (company-owned) cloud, e.g., for the management of sensitive data. In this work, we propose a software layer for the deployment and dynamic scaling of virtual clusters on a hybrid cloud. This system can be used for cloud bursting in the context of big data applications. Our work shows that, although execution is significantly influenced by the inter-cloud bandwidth, a dynamic off-premise provisioning mechanism can allow the user to significantly increase application performance.

    SHYAM: A system for autonomic management of virtual clusters in hybrid clouds

    While the public cloud model has been extensively explored over the last few years to meet the demand for large-scale distributed computing capabilities, many organizations are now focusing on the hybrid cloud model, where the classic scenario is enriched with a private (company-owned) cloud, e.g., for the management of sensitive data. In this work, we propose SHYAM, a software layer for the autonomic deployment and configuration of virtual clusters on a hybrid cloud. This system can be used to cope with a temporary (or permanent) lack of computational resources on the private cloud, enabling cloud bursting in the context of big data applications. We first provide an empirical evaluation of the overhead introduced by the SHYAM provisioning mechanism. We then show that, although the execution time is significantly influenced by the inter-cloud bandwidth, an autonomic off-premise provisioning mechanism can significantly improve application performance.
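
    An autonomic provisioning mechanism repeatedly observes the system and decides whether to acquire or release off-premise resources. As a hypothetical illustration (the abstract does not describe SHYAM's actual decision rules), the controller below adds one public node when the observed utilisation exceeds a high threshold and releases one when it drops below a low threshold; the gap between the two thresholds (hysteresis) keeps the cluster from oscillating when utilisation hovers around a single value. The class name and threshold values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BurstController:
    """Toy threshold controller with hysteresis for off-premise nodes (hypothetical)."""
    high: float = 0.8     # add a public node above this utilisation
    low: float = 0.3      # release a public node below this utilisation
    max_public: int = 4   # budget cap on off-premise nodes
    public_nodes: int = 0

    def step(self, utilisation: float) -> int:
        """One observe-decide-act iteration; returns the new public node count."""
        if utilisation > self.high and self.public_nodes < self.max_public:
            self.public_nodes += 1
        elif utilisation < self.low and self.public_nodes > 0:
            self.public_nodes -= 1
        return self.public_nodes
```

    Feeding the utilisation samples 0.9, 0.95, 0.5, 0.2 and 0.1 to a fresh `BurstController()` yields public node counts 1, 2, 2, 1 and 0.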

    Distributed Compliance Monitoring of Business Processes over MapReduce Architectures

    In the era of IoT, large volumes of event data from different sources are collected in the form of streams. As these logs need to be processed online to extract further knowledge about the underlying business process, supporting run-time monitoring is becoming increasingly important. In particular, increasing attention has been paid to conformance checking as a way to identify when a sequence of events deviates from the expected behavior. Although rather straightforward on a small log file, conformance verification techniques may perform poorly on big data, which makes improving scalability through distributed computation increasingly attractive. In this paper, we adopt a previously implemented framework for compliance verification (which provides a high-level, logic-based notation for the monitoring specification) and show how it can be efficiently distributed over a set of computing nodes to support scalable run-time monitoring of large volumes of event logs.
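
    Compliance checking distributes well over MapReduce when the constraints relate only events of the same process instance (case). The sketch below, which uses plain Python and not the SCIFF-based framework, keys events by case id in the map phase, groups them as the shuffle would, and checks a simple response constraint (every `order` is eventually followed by a `payment`) in the reduce phase. The activity names, the constraint, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[str, int, str]  # (case id, timestamp, activity)


def map_phase(events: Iterable[Event]) -> Iterator[Tuple[str, Tuple[int, str]]]:
    """Key every event by its case id, so one case ends up on one reducer."""
    for case_id, ts, activity in events:
        yield case_id, (ts, activity)


def shuffle(pairs) -> Dict[str, List[Tuple[int, str]]]:
    """Group values by key (what the MapReduce runtime does between phases)."""
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(case_id: str, events: List[Tuple[int, str]],
                 trigger: str, target: str) -> Tuple[str, bool]:
    """Response constraint: every `trigger` is eventually followed by a `target`."""
    pending = False
    for _, activity in sorted(events):  # restore temporal order within the case
        if activity == trigger:
            pending = True
        elif activity == target:
            pending = False
    return case_id, not pending  # True means the case is compliant


def check_compliance(events: Iterable[Event], trigger: str = "order",
                     target: str = "payment") -> Dict[str, bool]:
    groups = shuffle(map_phase(events))
    return dict(reduce_phase(c, evs, trigger, target) for c, evs in groups.items())
```

    For the log `[("c1", 1, "order"), ("c2", 1, "order"), ("c1", 2, "payment"), ("c2", 3, "ship")]`, `check_compliance` returns `{"c1": True, "c2": False}`.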

    A distributed approach to compliance monitoring of business process event streams

    In recent years, the significant advantages that process mining brings to business processes have made it a major concern in both industrial and academic research. In particular, increasing attention has been paid to compliance monitoring as a way to identify when a sequence of events deviates from the expected behaviour. As we enter the IoT era, an increasing variety of smart objects can be introduced into business processes (e.g., tags to track products in a plant, or smartphones and badge swiping to trace the activities of customers and employees in a shopping centre). All these objects produce large volumes of log data in the form of streams, which need to be analysed at run time to extract further knowledge about the underlying business process and to identify unexpected, non-conforming events. Although rather straightforward on a small log file, compliance verification techniques may perform poorly on big data and streams, thus calling for scalable approaches. This work investigates the possibility of spreading the compliance monitoring task over a network of computing nodes to achieve the desired scalability. The monitor is realised through the existing SCIFF framework for compliance checking, which provides a high-level, logic-based language for expressing the properties to be monitored and conveniently supports partitioning of the monitoring task. The distributed computation is achieved through a MapReduce approach and the adoption of an existing general-purpose engine for large-scale stream processing. Experimental results show the feasibility of the approach as well as the performance advantages it brings to the compliance monitoring task.
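
    For event streams, partitioning by case can be applied on the fly: each incoming event is routed to the node that owns its case, and that node updates its state incrementally. The sketch below simulates the nodes as objects in a single process and checks a hypothetical response constraint (every `order` must eventually be followed by a `payment`); it is neither SCIFF nor the stream-processing engine used in the work, and all names are illustrative.

```python
import zlib
from typing import Iterable, List, Set, Tuple


class MonitorNode:
    """One hypothetical monitoring node holding per-case state for its partition."""

    def __init__(self, trigger: str, target: str) -> None:
        self.trigger, self.target = trigger, target
        self.open_cases: Set[str] = set()  # cases with an unmatched trigger

    def on_event(self, case_id: str, activity: str) -> None:
        if activity == self.trigger:
            self.open_cases.add(case_id)
        elif activity == self.target:
            self.open_cases.discard(case_id)


def route(case_id: str, n_nodes: int) -> int:
    """Stable hash partitioning: all events of one case reach the same node."""
    return zlib.crc32(case_id.encode("utf-8")) % n_nodes


def monitor_stream(stream: Iterable[Tuple[str, str]], n_nodes: int = 3,
                   trigger: str = "order", target: str = "payment") -> List[str]:
    nodes = [MonitorNode(trigger, target) for _ in range(n_nodes)]
    for case_id, activity in stream:  # events are processed one at a time
        nodes[route(case_id, n_nodes)].on_event(case_id, activity)
    # Cases whose obligation is still open at this point of the stream.
    return sorted(c for node in nodes for c in node.open_cases)
```

    Routing uses `zlib.crc32` rather than Python's built-in `hash()`, because string hashes are randomised per process and would send the same case to different nodes across separate worker processes. For the stream `[("c1", "order"), ("c2", "order"), ("c1", "payment"), ("c3", "order")]`, `monitor_stream` reports `["c2", "c3"]` as cases with an open obligation.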

    Abduction for Generating Synthetic Traces

    In this paper, we report our preliminary experience in designing a generator of synthetic logs. Real logs may sometimes be unavailable, or their quality may not be good enough; synthetic logs, instead, can be generated with all the desired features and characteristics. Our tool takes as input a structured workflow model, encoded in the abductive declarative language SCIFF, and outputs a log containing positive traces, i.e., traces deemed conformant w.r.t. the model. Distinctive features of our approach are the capability of generating trace templates as well as grounded traces, the possibility of taking into account user-specified constraints on data and timestamps, and the capability of generating traces starting from a user-specified partial trace. Although our tool is still in a preliminary version, we have successfully used it to generate synthetic logs of different sizes, demonstrating the viability of our approach.
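
    To show the difference between trace templates and grounded traces, the sketch below enumerates the traces allowed by a tiny structured workflow (a sequence of steps, each a choice among activities), keeps those extending a user-specified partial trace, and grounds a template by attaching timestamps that satisfy a minimum-gap constraint. It uses brute-force enumeration, not abduction, and does not encode the model in SCIFF; the model, the names `templates` and `ground`, and the gap constraint are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical structured model: a sequence of steps, each a choice among activities.
MODEL: List[Tuple[str, ...]] = [
    ("register",),
    ("check_credit", "skip_check"),
    ("approve", "reject"),
    ("notify",),
]


def templates(model: Sequence[Tuple[str, ...]],
              prefix: Sequence[str] = ()) -> List[Tuple[str, ...]]:
    """All activity sequences allowed by the model that extend the given partial trace."""
    return [t for t in product(*model) if t[:len(prefix)] == tuple(prefix)]


def ground(template: Sequence[str], start: int = 0,
           min_gap: int = 1) -> List[Tuple[str, int]]:
    """Attach timestamps that respect a user-specified minimum gap between events."""
    return [(activity, start + i * min_gap) for i, activity in enumerate(template)]
```

    With the partial trace `("register", "skip_check")`, `templates(MODEL, ("register", "skip_check"))` returns the two templates that continue with `approve` or `reject`; grounding the first with `start=100, min_gap=5` assigns the timestamps 100, 105, 110 and 115.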