
    A Recovery-Oriented Approach for Software Fault Diagnosis in Complex Critical Systems

    This paper proposes an approach to software fault diagnosis in complex fault-tolerant systems, encompassing the phases of error detection, fault location, and system recovery. Errors are detected in the first phase, exploiting operating system support. Faults are identified during the location phase by means of a machine learning approach; this phase then triggers the proper recovery action for the fault that occurred, which is carried out in the third phase. Feedback actions are also adopted in the location phase to improve detection quality over time. A real-world application from the Air Traffic Control field has been used as a case study for evaluating the proposed approach. Experimental results, achieved by means of fault injection, show that the diagnosis engine is able to diagnose faults with high accuracy and at a low overhead.

    An effort allocation method to optimal code sanitization for quality-Aware energy efficiency improvement

    Software energy efficiency has been shown to remarkably affect the energy consumption of IT platforms. Besides the "performance" of the code in efficiently accomplishing a task, its "correctness" matters too. Software containing defects is likely to fail, and the computational cost to complete an operation becomes much higher if the user encounters a failure. Both the performance-related energy efficiency of software and its defectiveness are impacted by the quality of the code. Exploiting the relation between code quality and energy/defectiveness attributes is the main idea behind this position paper. Starting from the authors' previous experience in this field, we define a method to first predict the applications of a software system that are more likely to impact energy consumption and to have higher residual defectiveness, and then to exploit the prediction for optimally scheduling the effort for code sanitization, thus supporting the decision-makers of quality assurance teams with quantitative figures.

    The Role of Field Data for Analyzing the Dependability of Short Range Wireless Technologies

    The migration from mobile to ubiquitous Internet is at hand, due to the intense growth of short range wireless technologies. The number of users accessing the Internet through wireless devices is increasing compared to “wired” ones, and they expect the same dependability level they already experience on wired networks, that is, high-quality “always on” wireless networks. But how can we analyze the dependability level of a wireless network? Direct analysis of failures from the field of application is an effective practice to understand the actual dependability behavior of an operational system. However, despite its wide use over the last four decades on a large variety of systems, field data analysis has rarely been applied to wireless networks. Through the experience gained from extensive failure analysis of Bluetooth networks, the article shows how field failure data can play a key role in filling the gap in understanding the dependability behavior of wireless networks.

    Memory Leak Analysis of Mission-Critical Middleware

    Memory leaks are recognized to be one of the major causes of memory exhaustion problems in complex software systems. This paper proposes a practical approach to detect aging phenomena caused by memory leaks in Off-The-Shelf distributed object middleware, which is commonly used to develop critical applications. The approach, which is validated on a real-world case study from the Air Traffic Control domain, defines algorithms and ad hoc support tools to perform data filtering and to find the best trade-off between experimentation time and statistical accuracy of aging trend estimates. Experiments show that fixing memory leaks is not always the key to solving memory exhaustion problems.
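
To make the notion of an aging trend estimate concrete, the following is a minimal sketch, not the paper's algorithm: it assumes memory usage is sampled at known times and fits an ordinary least-squares line, whose slope serves as the trend. The function names `aging_trend` and `time_to_exhaustion` and the sample values are hypothetical.

```python
def aging_trend(times, usage):
    """Return (slope, intercept) of a least-squares line through the samples.

    Assumption: a linear trend is an adequate first approximation of
    memory growth; the abstract does not state which estimator is used.
    """
    n = len(times)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_u = sum(usage) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, usage))
    slope = sxy / sxx
    intercept = mean_u - slope * mean_t
    return slope, intercept


def time_to_exhaustion(slope, intercept, capacity):
    """Time at which the fitted line reaches `capacity`, or None if usage is not growing."""
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


if __name__ == "__main__":
    # Hypothetical samples: hours since start, resident memory in MB.
    hours = [0, 1, 2, 3]
    memory_mb = [100, 110, 120, 130]
    slope, intercept = aging_trend(hours, memory_mb)
    print(slope, intercept, time_to_exhaustion(slope, intercept, capacity=200))
```

Longer observation windows generally tighten the slope estimate, which is one way to read the trade-off between experimentation time and statistical accuracy mentioned in the abstract.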

    An Optimized Workload for Failure Data Analysis of Mobile P2P over Bluetooth Ad-Hoc Networks

    Mobile Peer-to-Peer (P2P) is a base paradigm for many new killer applications for mobile ad-hoc networks and the Mobile Internet. Currently, it is not well understood whether this paradigm is able to meet business and consumer dependability expectations. Dependability assessment of P2P applications can be achieved by field failure data analysis. The collection of failure data from wireless ad-hoc networks is a challenging task because the intermittent usage and the mobility of users do not allow time-based dependability parameters to be measured. For this reason, we propose to deploy automated workloads on the actual peer nodes, which have to operate continuously. Specifically, this paper formalizes the problem and presents the design of a workload for mobile P2P that aims to orchestrate the peers uniformly, letting the failure occurrence be independent of the network load. Simulation results and experimentation over an actual Bluetooth network demonstrate that the proposed workload meets the defined requirements.

    Dependability evaluation and modelling of the Bluetooth data communication channel

    This work presents a measurement-based dependability evaluation of the Bluetooth data communication channel, i.e., the Baseband layer. The main contribution is the definition of the Baseband's error/recovery model according to the Markov chain formalism. The model is derived by analyzing field data, which are collected via a commercial air sniffer deployed over real-world Bluetooth piconets. The model is parametric, and actual values for its parameters are estimated by analyzing the field data. The paper also proposes the evaluation of dependability statistics (e.g., the error and failure time distributions, and the availability estimate), and the study of the failing behavior of the Bluetooth communication channel under Wi-Fi interference.
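
As an illustration of how an availability estimate can follow from a parametric Markov model, here is a minimal sketch under a strong simplifying assumption: a two-state (up/down) continuous-time chain whose failure and recovery rates are estimated from observed durations. The abstract does not describe the states of the actual Baseband model, so the function names and sample durations below are hypothetical.

```python
def estimate_rates(up_durations, down_durations):
    """Estimate failure and recovery rates as the reciprocals of the mean durations.

    Assumption: sojourn times are exponentially distributed, as in a
    continuous-time Markov chain with one 'up' and one 'down' state.
    """
    if not up_durations or not down_durations:
        raise ValueError("need at least one up and one down observation")
    mttf = sum(up_durations) / len(up_durations)      # mean time to failure
    mttr = sum(down_durations) / len(down_durations)  # mean time to recover
    if mttf <= 0 or mttr <= 0:
        raise ValueError("durations must be positive")
    return 1.0 / mttf, 1.0 / mttr


def steady_state_availability(failure_rate, recovery_rate):
    """Steady-state probability of the 'up' state in the two-state chain."""
    return recovery_rate / (failure_rate + recovery_rate)


if __name__ == "__main__":
    # Hypothetical field observations, in seconds.
    up = [90.0, 110.0]
    down = [10.0, 10.0]
    lam, mu = estimate_rates(up, down)
    print(steady_state_availability(lam, mu))
```

With these rates the result equals MTTF / (MTTF + MTTR), the familiar availability formula; a richer model with several error and recovery states would require solving the chain's balance equations instead.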

    Operating System Support to Detect Application Hangs

    On-line failure detection is an essential means to control and assess the dependability of complex and critical software systems. In such a context, effective detection strategies are required in order to minimize the possibility of catastrophic consequences. This objective is, however, difficult to achieve in complex systems, especially due to the several sources of non-determinism (e.g., multi-threading and distributed interaction) which may lead to software hangs, i.e., the system is active but no longer capable of delivering its services. The paper proposes a detection approach to uncover application hangs. It exploits multiple indirect data gathered at the operating system level to monitor the system and to trigger alarms if the observed behavior deviates from the expected one. By means of fault injection experiments conducted on a research prototype, it is shown how the combination of several operating system monitors actually leads to a high quality of detection, at an acceptable overhead.
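
The idea of combining several monitors can be sketched as follows. This is only an illustration under stated assumptions: each monitor is modeled as an expected value range for one operating-system-level metric, and a hang is suspected when at least `min_alarms` monitors deviate at once. The abstract does not specify the combination rule; the metric names, ranges, and the `Monitor` and `hang_suspected` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """Expected range for one OS-level metric (hypothetical model)."""
    name: str
    low: float
    high: float

    def deviates(self, value: float) -> bool:
        # An observation outside [low, high] counts as a deviation.
        return not (self.low <= value <= self.high)


def hang_suspected(monitors, observations, min_alarms=2):
    """Return (suspected, names of deviating monitors).

    Assumption: a simple k-out-of-n vote; requiring several monitors to
    agree is one way to reduce false alarms from any single indicator.
    """
    alarms = [m.name for m in monitors if m.deviates(observations[m.name])]
    return len(alarms) >= min_alarms, alarms


if __name__ == "__main__":
    monitors = [
        Monitor("syscalls_per_sec", low=50, high=5000),
        Monitor("context_switches_per_sec", low=10, high=2000),
        Monitor("seconds_waiting_on_lock", low=0, high=5),
    ]
    # Hypothetical snapshot of a process that has stopped making progress.
    snapshot = {
        "syscalls_per_sec": 0,
        "context_switches_per_sec": 1,
        "seconds_waiting_on_lock": 2,
    }
    print(hang_suspected(monitors, snapshot))
```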