
    Dependability evaluation of middleware technology for large-scale distributed caching

    Distributed caching systems (e.g., Memcached) are widely used by service providers to serve millions of concurrent clients. Given their large scale, modern distributed systems rely on a middleware layer to manage caching nodes, to make applications easier to develop, and to apply load balancing and replication strategies. In this work, we performed a dependability evaluation of three popular middleware platforms, namely Twemproxy by Twitter, Mcrouter by Facebook, and Dynomite by Netflix, to assess availability and performance under faults, including failures of Memcached nodes and congestion due to unbalanced workloads and network link bandwidth bottlenecks. We point out the different availability and performance trade-offs achieved by the three platforms, and scenarios in which a few faulty components cause cascading failures of the whole distributed system.

    Towards Cognitive Security Defense from Data

    IT organizations rely on a variety of independent security monitors and data sources to develop situational awareness for detecting and responding to security incidents. In spite of the advances in Security Information and Event Management (SIEM) for handling monitoring data in production environments, computer defense still depends on many cognitive human processes. In this context, having machines do part of the cognitive work in lieu of humans is by now a real necessity. We present our framework towards the vision of cognitive SIEM, its building components, and ongoing work on the topic.

    DRACO: Distributed Resource-aware Admission Control for large-scale, multi-tier systems

    Modern distributed systems are designed to manage overload conditions by using overload control techniques to throttle the excess traffic that cannot be served. However, the adoption of large-scale NoSQL datastores makes systems vulnerable to unbalanced overloads, where specific datastore nodes are overloaded because of hot-spot resources and hogs. In this paper, we propose DRACO, a novel overload control solution that is aware of data dependencies between the application and the datastore tiers. DRACO performs selective admission control of application requests, by only dropping the ones that map to resources on overloaded datastore nodes, while achieving high resource utilization on non-overloaded datastore nodes. We evaluate DRACO on two case studies with high availability and performance requirements, a virtualized IP Multimedia Subsystem and a distributed fileserver. Results show that the solution can achieve high performance and resource utilization even under extreme overload conditions, up to 100x the engineered capacity.
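
    The idea of selective admission control described in this abstract can be illustrated with a minimal sketch. It is not DRACO's implementation: the class name, the hash-based key-to-node mapping, the utilization reports, and the 0.9 threshold are all assumptions made for illustration. A request is dropped only when the resource it touches maps to a datastore node currently marked as overloaded.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SelectiveAdmissionController:
    """Hypothetical admission controller; not the DRACO implementation."""

    nodes: List[str]
    overload_threshold: float = 0.9  # assumed utilization cutoff
    utilization: Dict[str, float] = field(default_factory=dict)

    def node_for(self, resource_key: str) -> str:
        # Assumed placement rule: hash the key and pick a node by modulo.
        digest = hashlib.sha256(resource_key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def report_utilization(self, node: str, value: float) -> None:
        # Record the latest utilization (0.0 to 1.0) observed for a node.
        self.utilization[node] = value

    def admit(self, resource_key: str) -> bool:
        # Reject only requests whose target node is over the threshold;
        # requests for other nodes are still admitted.
        node = self.node_for(resource_key)
        return self.utilization.get(node, 0.0) < self.overload_threshold


if __name__ == "__main__":
    ctrl = SelectiveAdmissionController(nodes=["node-0", "node-1", "node-2"])
    hot_node = ctrl.node_for("user:1")
    ctrl.report_utilization(hot_node, 0.99)  # simulate a hot spot
    for key in ("user:1", "user:2", "user:3"):
        print(key, ctrl.node_for(key), "admit" if ctrl.admit(key) else "drop")
```

    Keeping the decision per resource rather than per request class is what lets non-overloaded nodes stay fully utilized while a hot node sheds load.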

    ThorFI: a Novel Approach for Network Fault Injection as a Service

    In this work, we present a novel fault injection solution (ThorFI) for virtual networks in cloud computing infrastructures. ThorFI is designed to provide non-intrusive fault injection capabilities for a cloud tenant, and to isolate injections from interfering with other tenants on the infrastructure. We present the solution in the context of the OpenStack cloud management platform, and release this implementation as open-source software. Finally, we present two relevant case studies of ThorFI, on an NFV IMS and on a high-availability cloud application. The case studies show that ThorFI can enhance functional tests with fault injection, as in 4%–34% of the test cases the IMS is unable to handle faults; and that despite redundancy in virtual networks, faults in one virtual network segment can propagate to other segments, and can affect the throughput and response time of the cloud application as a whole by a factor of about 3 in the worst case.

    A Comparative Analysis of Software Aging in Image Classifiers on Cloud and Edge

    Image classifiers for recognizing real-world objects are widely used in the Internet of Things (IoT) and Cyber-Physical Systems (CPSs). A classifier is trained offline by machine learning algorithms with training data sets, and then it is deployed on a cloud or an edge computing system for online label predictions. As the classifier's performance depends on the underlying software infrastructure, it may degrade over time due to software faults causing software aging. In this paper, we address this issue and experimentally investigate software aging observed in an image classification system that continuously runs on cloud and edge computing environments. We apply several statistical techniques to analyze degradation trends in the systems under stress tests. Our statistical trend analysis confirms the degradation trends in the throughput as well as the available memory resources in both the cloud and the edge environments. Contrary to our expectation, the edge computing environment under test had much less impact on the performance degradation than our cloud environment when the workload was high, although the latter has four times more allocated memory. We also show, by performing correlation analysis, that the observed performance degradation trends are associated with the memory usage of specific processes.
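
    The abstract does not name the statistical techniques it applies. As an illustration only, the sketch below uses the Mann-Kendall test and Sen's slope, which are common choices for trend detection in software aging studies; their use here is an assumption. The function name and the sample values are hypothetical, and the variance formula omits the correction for tied values.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S statistic, Z score, Sen's slope) for an evenly sampled series.

    Simplified sketch: the variance formula assumes no tied values.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 samples")

    s = 0
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # sign of the pairwise difference
            slopes.append(diff / (j - i))  # pairwise slope for Sen's estimator

    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z, median(slopes)


if __name__ == "__main__":
    # Hypothetical throughput samples (requests/s) taken at fixed intervals.
    throughput = [120.0, 118.5, 119.0, 116.2, 115.8, 113.9, 114.1, 111.0]
    s, z, slope = mann_kendall(throughput)
    # |Z| > 1.96 indicates a trend at the 5% significance level (two-sided).
    print(f"S={s}, Z={z:.2f}, Sen's slope={slope:.3f} per interval")
```

    A significantly negative Z for throughput, or for available memory, would be consistent with the kind of degradation trend the study reports.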

    Fault Injection Analytics: A Novel Approach to Discover Failure Modes in Cloud-Computing Systems

    Cloud computing systems fail in complex and unexpected ways, due to unforeseen combinations of events and interactions between hardware and software components. Fault injection is an effective means to bring out these failures in a controlled environment. However, fault injection experiments produce massive amounts of data, and manually analyzing these data is inefficient and error-prone, as the analyst can miss severe failure modes that are not yet known. This paper introduces a new paradigm (fault injection analytics) that applies unsupervised machine learning to execution traces of the injected system, to ease the discovery and interpretation of failure modes. We evaluated the proposed approach in the context of fault injection experiments on the OpenStack cloud computing platform, where we show that the approach can accurately identify failure modes with a low computational cost.

    A comprehensive study on software aging across Android versions and vendors

    This paper analyzes the phenomenon of software aging (namely, the gradual performance degradation and resource exhaustion in the long run) in the Android OS. The study intends to highlight whether, and to what extent, devices from different vendors, under various usage conditions and configurations, are affected by software aging, and which parts of the system are the main contributors. The results demonstrate that software aging systematically causes a gradual loss of responsiveness perceived by the user, and an unjustified depletion of physical memory. The analysis reveals differences in the aging trends due to the workload factors and to the type of running applications, as well as differences due to vendors’ customization. Moreover, we analyze several system-level metrics to trace back the software aging effects to their main causes. We show that bloated Java containers are a significant contributor to software aging, and that it is feasible to mitigate aging through a micro-rejuvenation solution at the container level.

    Automating the correctness assessment of AI-generated code for security contexts

    Evaluating the correctness of code generated by AI is a challenging open problem. In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similarly to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with the human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ∼0.17 s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience.
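
    For readers unfamiliar with the agreement metric quoted above, the following sketch shows how Pearson's correlation coefficient between automated and human correctness scores could be computed. The function name and the score lists are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-snippet scores: 1 = judged correct, 0 = judged incorrect.
    automated = [1, 0, 1, 1, 0, 1, 0, 1]
    human = [1, 0, 1, 0, 0, 1, 0, 1]
    print(f"r = {pearson_r(automated, human):.2f}")
```

    Values of r close to 1 mean the automated scores rise and fall together with the human judgments; the coefficient does not by itself show that the two agree on every snippet.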