Training and Serving Machine Learning Models at Scale
In recent years, Web services have become increasingly intelligent (e.g., in understanding user preferences) thanks to the integration of components that rely on Machine Learning (ML). Before users can interact (inference phase) with an ML-based service (ML-Service), the underlying ML model must learn (training phase) from existing data, a process that requires long-lasting batch computations. Managing these two diverse phases is complex, and meeting time and quality requirements can hardly be done with manual approaches. This paper highlights some of the major issues in managing ML-Services in both training and inference modes and presents some initial solutions that are able to meet set requirements with minimum user input. A preliminary evaluation demonstrates that our solutions allow these systems to become more efficient and predictable with respect to their response time and accuracy.
PAPS: A Serverless Platform for Edge Computing Infrastructures
Edge computing infrastructures are often employed to run applications with low latency requirements. Users can exploit nodes that are close to their physical positions so that the delay of sending computations and data to the Cloud is mitigated. Since users frequently change their locations, and the resources available in the Edge are limited, the management of these infrastructures poses new, difficult challenges. This paper presents PAPS (Partitioning, Allocation, Placement, and Scaling), a framework for the efficient, automated and scalable management of large-scale Edge topologies. PAPS acts as a serverless platform for the Edge. Service providers can upload applications as compositions of lightweight and stateless functions along with latency constraints. At runtime, PAPS manages these applications by executing them in containers, changes their placement in the Edge topology according to the geographical distribution of the workload, and efficiently allocates resources according to their needs. This paper also presents the architecture of a PAPS prototype built atop Kubernetes and OpenFaaS. The assessment shows both the feasibility of the approach and its ability to efficiently manage hundreds of concurrent serverless functions and to deal with intense and unpredictable workload variations.
A Simulation-based Comparison between Industrial Autoscaling Solutions and COCOS for Cloud Applications
Dynamic resource allocation is the mechanism that allows one to change the resources associated with applications at runtime and match their actual needs. The autoscaling solutions offered by cloud infrastructures are probably the most widely used incarnation of this concept. Originally conceived to manage virtual machines according to user-defined rules, they are now much more sophisticated and can also allocate containers (lighter than virtual machines). This paper surveys the autoscaling solutions provided by the major cloud vendors and analyzes the services they provide. It also compares them against the solution we developed, called COCOS autoscaling. We simulated the different proposals and fed them with diverse workloads. The obtained results show that COCOS autoscaling outperforms its competitors in most cases: it optimizes resource allocation and keeps applications' response times under set thresholds.
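As an illustration of the rule-based autoscaling that the abstract above contrasts with COCOS, the following minimal sketch applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1). The function name, parameter names, and the replica bounds are illustrative assumptions; this is not the COCOS algorithm, which the abstract does not detail.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """HPA-style proportional scaling rule (illustrative sketch)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the replica count is left unchanged,
    # which avoids oscillating on small metric fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp to the (assumed) user-defined bounds.
    return max(min_replicas, min(max_replicas, proposed))


# Example: 4 replicas at 90% average CPU with a 60% target.
print(desired_replicas(4, 90.0, 60.0))
```

A rule of this kind reacts only to the most recent metric sample, which is one reason to compare it against controller-based approaches under bursty workloads.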
COCOS: A scalable architecture for containerized heterogeneous systems
Nowadays, software systems are organized around several heterogeneous components. For example, a modern application can be composed of different microservices, along with dedicated components for machine learning analytics and recurring batch processing jobs. While containers offer a means to deploy the system and tackle heterogeneity, these components have different execution models, can exploit different resource types (e.g., CPUs and GPUs) and result in completely different execution times (milliseconds vs. hours). This complexity calls for a new, scalable architecture that allows these systems to operate efficiently. This paper presents COCOS, an architecture based on containers and control theory that is able to manage large and heterogeneous software systems. The architecture is based on a three-level hierarchy of controllers that cooperatively enforce user-defined requirements on execution times and consumed resources. The paper also shows a prototype implementation of COCOS based on Kubernetes, a well-known container orchestrator. The evaluation shows the efficiency of COCOS when dealing with microservices, Spark jobs and machine learning applications.
Predictive maintenance of infrastructure code using “fluid” datasets: An exploratory study on Ansible defect proneness
This work consolidates and compounds previous investigations into recognizing defects in infrastructure-as-code (IaC) scripts using general software development quality metrics, with a focus on defect severity. It adds to previous work an explorative look at creating datasets that may boost the predictive power of the provided models; we call this notion a fluid dataset. More specifically, we experiment with 50 different metrics, harnessing a multiple-dataset creation process whereby different versions of the same datasets are rigged with auto-training facilities for model retraining and redeployment in a DataOps fashion. With a focus on the Ansible infrastructure code language, a de facto standard for industrial-strength infrastructure code, we build defect prediction models and manage to improve on the state of the art, achieving an F1 score of 0.52 and a recall of 0.57 using a Naive Bayes classifier. On the one hand, by improving state-of-the-art defect prediction models using metrics generalizable to different IaC languages, we provide interesting leads for the future of infrastructure-as-code. On the other hand, we have barely scratched the surface of the novel approach of fluid-dataset creation and automated retraining of Machine Learning (ML) defect prediction models, which warrants more research in the same direction in the future.
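The abstract above reports F1 and recall but not precision. Since F1 is the harmonic mean of precision P and recall R (F1 = 2PR / (P + R)), the precision implied by the two reported figures can be recovered by solving for P. The short sketch below does that arithmetic; the function name is hypothetical, and the result is a derived estimate from rounded numbers, not a value stated by the authors.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision exists for this F1/recall pair")
    return f1 * recall / denominator


# Figures reported in the abstract: F1 = 0.52, recall = 0.57.
precision = implied_precision(0.52, 0.57)
print(round(precision, 3))  # 0.478
```

Because the reported scores are rounded to two decimals, the implied precision should be read as roughly 0.48 rather than as an exact figure.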
Intelligent re-deployment feedback loop for hybrid applications
We propose enabling continuous performance optimisation of distributed hybrid applications in heterogeneous Cloud, Edge, and HPC environments by employing an intelligent re-deployment feedback loop.
Big-data applications as self-adaptive systems of systems
Virtualization technologies have enabled a new way of thinking about computing resources, and cloud computing frameworks offer many pay-per-use solutions for renting these resources. Conventional physical servers had to be acquired, provisioned, and configured beforehand; virtual resources can be allocated on demand, and changes can be managed quickly. Deploying systems on virtualized resources allows one to allocate resources given the actual workload and KPIs of interest, but it requires that resource management be part of the system itself. Traditional application components must be augmented with probes and actuators to sense the application behavior and provision resources accordingly. Big data applications are a prominent example of these modern systems, and the paper discusses dynaSpark, that is, the work done by the authors to extend Spark standalone (a well-known framework widely used for parallel processing and big data applications) and augment it with resource management capabilities. It also introduces the key problems that the integration and the particular batch applications bring in, and identifies additional aspects that are still to be taken into account and that would lead to a better solution.
Fine-Grained Dynamic Resource Allocation for Big-Data Applications
Many big-data applications are batch applications that exploit dedicated frameworks to perform massively parallel computations across clusters of machines. The time needed to process the entirety of the inputs represents the application's response time, which can be subject to deadlines. Spark, probably the most famous incarnation of these frameworks today, allocates resources to applications statically at the beginning of the execution and does not manage deviations: to meet the applications' deadlines, resources must be allocated carefully. This paper proposes an extension to Spark, called dynaSpark, that is able to allocate and redistribute resources to applications dynamically to meet deadlines and cope with the execution of unanticipated applications. This work is based on two key enablers: containers, to isolate Spark's parallel executors and allow for the dynamic and fast allocation of resources, and control theory, to govern resource allocation at runtime and obtain the required precision and speed. Our evaluation shows that dynaSpark can (i) allocate resources efficiently to execute single applications with respect to set deadlines and (ii) reduce deadline violations (w.r.t. Spark) when executing multiple concurrent applications.
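To make the idea of control-theoretic resource allocation more concrete, here is a minimal sketch of a generic discrete proportional-integral (PI) controller that maps the gap between a required and a measured progress rate to a CPU-core allocation. The abstract does not specify the controller that dynaSpark uses, so the class name, gains, bounds, and the choice of a PI law are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Generic PI controller with output clamping (illustrative, not dynaSpark's)."""

    kp: float          # proportional gain (assumed value chosen by the operator)
    ki: float          # integral gain (assumed value chosen by the operator)
    min_cores: float   # lower bound on the allocation
    max_cores: float   # upper bound on the allocation
    integral: float = 0.0

    def step(self, target_rate: float, measured_rate: float, dt: float) -> float:
        """Return the core allocation for the next control period."""
        error = target_rate - measured_rate
        candidate_integral = self.integral + error * dt
        raw = self.kp * error + self.ki * candidate_integral
        cores = min(self.max_cores, max(self.min_cores, raw))
        # Anti-windup: keep the accumulated error only when the output
        # was not clamped, so saturation does not inflate the integral.
        if raw == cores:
            self.integral = candidate_integral
        return cores


# Example: the application must process 10 records/s to meet its deadline
# but currently processes 4 records/s.
controller = PIController(kp=0.5, ki=0.2, min_cores=1.0, max_cores=16.0)
cores = controller.step(target_rate=10.0, measured_rate=4.0, dt=1.0)
print(cores)
```

In a deadline-driven setting, the target rate would typically be recomputed at each period as remaining work divided by remaining time, so that the controller reacts to deviations as they accumulate.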
Federated Machine Learning as a Self-Adaptive Problem
Machine Learning (ML) enables the creation of a new generation of applications that 'learn' from collected data, transferred and analyzed on centralized servers. Moving data may imply a significant overhead and may also undermine users' privacy. Federated Machine Learning (FedML) tries to address these issues by means of local training phases on client devices: only lightweight aggregated data are then sent to the centralized server. FedML solutions must offer response times and accuracy similar to traditional ML applications, but their management is distributed on devices that may be heterogeneous, may become unavailable, and are not as powerful as (cloud-based) servers. This paper considers FedML systems a novel example of self-adaptive applications, where clients and servers must cooperate to provide the required results. In particular, this paper proposes: i) the formalization of FedML applications as self-adaptive systems, ii) an initial prototype that shows the feasibility of the approach, and iii) a preliminary evaluation that demonstrates the benefit of the proposed solution.
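The server-side step of federated learning, in which only locally trained parameters reach the server, is often implemented as a weighted average of client parameters (the widely used federated averaging scheme). The abstract does not say which aggregation rule the authors adopt, so the sketch below is an assumption-laden illustration: the function name and the flat list-of-floats parameter representation are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client parameter vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of the same length")
        for i, value in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += value * size / total
    return averaged


# Two clients: the second trained on three times as many samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

A client that becomes unavailable can simply be left out of both input lists for that round, which is one place where the adaptation concerns raised in the abstract show up in practice.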
Resource Management for TensorFlow Inference
TensorFlow, a popular machine learning (ML) platform, allows users to transparently exploit both GPUs and CPUs to run their applications. Since GPUs are optimized for compute-intensive workloads (e.g., matrix calculus), they help boost executions, but introduce resource heterogeneity. TensorFlow neither provides efficient heterogeneous resource management nor allows for the enforcement of user-defined constraints on the execution time. Most existing works address these issues in the context of creating models on existing data sets (training phase), and only focus on scheduling algorithms. This paper focuses on the inference phase, that is, on the application of created models to predict the outcome on new data interactively, and presents a comprehensive resource management solution called ROMA (Resource Constrained ML Applications). ROMA is an extension of TensorFlow that (a) provides means to easily deploy multiple TensorFlow models in containers using Kubernetes, (b) allows users to set constraints on response times, (c) schedules the execution of requests on GPUs and CPUs using heuristics, and (d) dynamically refines the CPU core allocation by exploiting control theory. The assessment conducted on four real-world benchmark applications compares ROMA against four different systems and demonstrates a significant reduction (75%) in constraint violations and 24% saved resources on average.
