Towards operator-less data centers through data-driven, predictive, proactive autonomics
Continued reliance on human operators for managing data centers is a major impediment to their ever reaching extreme dimensions. Large computer systems in general, and data centers in particular, will ultimately be managed using predictive computational and executable models obtained through data-science tools, and at that point, the intervention of humans will be limited to setting high-level goals and policies rather than performing low-level operations. Data-driven autonomics, where management and control are based on holistic predictive models that are built and updated using live data, opens one possible path towards limiting the role of operators in data centers. In this paper, we present a data-science study of a public Google dataset collected in a 12K-node cluster with the goal of building and evaluating predictive models for node failures. Our results support the practicality of a data-driven approach by showing the effectiveness of predictive models based on data found in typical data center logs. We use BigQuery, the big data SQL platform from the Google Cloud suite, to process massive amounts of data and generate a rich feature set characterizing node state over time. We describe how an ensemble classifier can be built out of many Random Forest classifiers, each trained on these features, to predict if nodes will fail in a future 24-hour window. Our evaluation reveals that if we limit false positive rates to 5%, we can achieve true positive rates between 27% and 88% with precision varying between 50% and 72%. This level of performance allows us to recover a large fraction of job executions (by redirecting them to other nodes when a failure of the present node is predicted) that would otherwise have been wasted due to failures. We discuss the feasibility of including our predictive model as the central component of a data-driven autonomic manager and operating it on-line with live data streams (rather than off-line on data logs).
All of the scripts used for BigQuery and classification analyses are publicly available on GitHub
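The evaluation criterion quoted in this abstract (true positive rate and precision under a 5% cap on the false positive rate) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `threshold_at_fpr`, the scores, and the labels are all hypothetical, and the scores stand in for whatever a failure classifier would output.

```python
def threshold_at_fpr(scores, labels, max_fpr=0.05):
    """Pick the lowest score threshold whose false positive rate stays <= max_fpr.

    scores: predicted failure scores (higher means more likely to fail).
    labels: 1 if the node did fail in the prediction window, 0 otherwise.
    Returns (threshold, tpr, fpr, precision); a node is flagged when
    its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    best = None
    # Candidate thresholds are the distinct scores, strictest first.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fpr = fp / negatives
        if fpr > max_fpr:
            break  # lowering the threshold further can only raise the FPR
        best = (t, tp / positives, fpr, tp / (tp + fp))
    if best is None:
        raise ValueError("no threshold satisfies the FPR limit")
    return best


# Hypothetical data: 4 nodes that failed and 30 that did not.
scores = [0.9, 0.8, 0.6, 0.3] + [0.7, 0.5] + [0.1] * 28
labels = [1, 1, 1, 1] + [0, 0] + [0] * 28

threshold, tpr, fpr, precision = threshold_at_fpr(scores, labels)
print(threshold, tpr, fpr, precision)
```

Lowering the threshold trades a higher true positive rate for more false alarms, so the cap on the false positive rate fixes the operating point at which the true positive rate and precision are reported.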
Use of non-traditional data sources to nowcast migration trends through Artificial Intelligence technologies
In recent years, the pursuit of original drivers and methods has become an increasing requirement for migration studies, given the new technologies used to characterise and understand the human migration phenomenon. In addition to the traditional data typically used in migration studies (e.g., indicators related to the labour market or economic status, measures obtained from surveys and official statistics, either from national censuses or from the population registries), many researchers, including Bosco et al. (2022), Fiorio et al. (2017), Gendronneau et al. (2019), Jisu, Sîrbu, Rossetti, Giannotti, and Rapoport (2021), Salah (2021), Spyratos et al. (2018), Sîrbu et al. (2021), Zagheni, Garimella, Weber, and State (2014), Zagheni, Polimis, Alexander, Weber, and Billari (2018), and Zagheni, Weber, and Gummadi (2017), have proposed employing non-traditional data sources to study migration. These can consist of news data and satellite data, but also of digital traces of humans generated through the use of internet services, mobile phones, IoT devices, loyalty cards, online social networks and many others. This unconventional approach is intended to find an alternative methodology to answer open questions about the human migration framework (i.e., nowcasting flows and stocks, studying the integration of multiple sources and knowledge, and investigating migration drivers). The new data have the advantage of timeliness and large geographical coverage, but also disadvantages in terms of selection bias and the amount of resources required to process them, as reported by Sîrbu et al. (2021) and Pollacci, Milli, Bircan, and Rossetti (2022). Therefore, models extracted from these data need to be carefully validated, typically with traditional data sources. In this context of meaningful data combination, many types of data exist, still very scattered and heterogeneous, making integration far from straightforward
A SOA-based Solution for Resource Monitoring within a Grid System
The paper presents the architectural details and the practical deployment aspects of a service-oriented application focused on resource monitoring within a Globus-like Grid system. The implementation employs the Java and .NET technologies, and user interaction is facilitated by a usable Web interface
A data-driven approach to modeling power consumption for a hybrid supercomputer
Power consumption of current High Performance Computing systems has to be reduced by at least one order of magnitude before they can be scaled up towards ExaFLOP performance. While we can expect novel hardware technologies and architectures to contribute towards this goal, significant advances have to come also from software technologies such as proactive and power-aware scheduling, resource allocation, and fault-tolerant computing. Development of these software technologies in turn relies heavily on our ability to model and accurately predict power consumption in large computing systems. In this paper, we present a data-driven model of power consumption for a hybrid supercomputer (which held the top spot in the Green500 ranking in June 2013) that combines CPU, GPU, and MIC technologies to achieve high levels of energy efficiency. Our model takes as input workload characteristics (the number and location of resources that are used by each job at a certain time) and calculates a predicted power consumption at the system level. The model is application-code-agnostic and is based solely on a data-driven predictive approach, where log data describing the past jobs in the system are employed to estimate future power consumption. For this, three different model components are developed and integrated. The first employs support vector regression to predict power usage for jobs before these are started. The second uses a simple heuristic to predict the length of jobs, again before they start. The two predictions are then combined to estimate power consumption due to the job at all computational elements in the system. The third component is a linear model that takes as input the power consumption at the computing units and predicts system-wide power consumption. Our method achieves highly accurate predictions starting solely from workload information and user histories.
The model can be applied to power-aware scheduling and power capping: alternative workload dispatching configurations can be evaluated from a power perspective and more efficient ones can be selected. The methodology outlined here can be easily adapted to other HPC systems where the same types of log data are available
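The way the three components described in this abstract fit together can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' model: the per-job power and length values are supplied by hand in place of the support vector regression and the length heuristic, and the names `PredictedJob`, `compute_power_at`, `system_power_at`, as well as the intercept and slope, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PredictedJob:
    start: int         # start time, in arbitrary time steps
    duration: int      # predicted length (stand-in for the length heuristic)
    node_power: float  # predicted watts per node (stand-in for the regression)
    nodes: tuple       # identifiers of the nodes the job occupies


def compute_power_at(jobs, t):
    """Sum the predicted power over all nodes occupied by jobs running at time t."""
    return sum(
        job.node_power * len(job.nodes)
        for job in jobs
        if job.start <= t < job.start + job.duration
    )


def system_power_at(jobs, t, intercept, slope):
    """Linear map from computing-unit power to system-wide power."""
    return intercept + slope * compute_power_at(jobs, t)


# Hypothetical workload: two jobs that overlap between t=5 and t=9.
jobs = [
    PredictedJob(start=0, duration=10, node_power=200.0, nodes=("n1", "n2")),
    PredictedJob(start=5, duration=10, node_power=150.0, nodes=("n3",)),
]

for t in (7, 12, 20):
    print(t, system_power_at(jobs, t, intercept=1000.0, slope=1.5))
```

Evaluating `system_power_at` for alternative job placements or start times is one way a scheduler could compare dispatching configurations from a power perspective.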
Towards Data-Driven Autonomics in Data Centers
Continued reliance on human operators for managing
data centers is a major impediment to their
ever reaching extreme dimensions. Large computer systems
in general, and data centers in particular, will ultimately be
managed using predictive computational and executable models
obtained through data-science tools, and at that point, the
intervention of humans will be limited to setting high-level
goals and policies rather than performing low-level operations.
Data-driven autonomics, where management and control are
based on holistic predictive models that are built and updated
using generated data, opens one possible path towards limiting
the role of operators in data centers. In this paper, we present
a data-science study of a public Google dataset collected in a
12K-node cluster with the goal of building and evaluating a
predictive model for node failures. We use BigQuery, the big
data SQL platform from the Google Cloud suite, to process
massive amounts of data and generate a rich feature set
characterizing machine state over time. We describe how an
ensemble classifier can be built out of many Random Forest
classifiers each trained on these features, to predict if machines
will fail in a future 24-hour window. Our evaluation reveals
that if we limit false positive rates to 5%, we can achieve true
positive rates between 27% and 88% with precision varying
between 50% and 72%. We discuss the practicality of including
our predictive model as the central component of a data-driven
autonomic manager and operating it on-line with live data
streams (rather than off-line on data logs). All of the scripts
used for BigQuery and classification analyses are publicly
available from the authors’ website
Cognified distributed computing
Cognification - the act of transforming ordinary objects or processes into their intelligent counterparts through Data Science and Artificial Intelligence - is a disruptive technology that has been revolutionizing disparate fields ranging from corporate law to medical diagnosis. Easy access to massive data sets, data analytics tools and High-Performance Computing (HPC) has been fueling this revolution. In many ways, cognification is similar to the electrification revolution that took place more than a century ago when electricity became a ubiquitous commodity that could be accessed with ease from anywhere in order to transform mechanical processes into their electrical counterparts. In this paper, we consider two particular forms of distributed computing - Data Centers and HPC systems - and argue that they are ripe for cognification of their entire ecosystem, ranging from top-level applications down to low-level resource and power management services. We present our vision for what 'Cognified Distributed Computing' might look like and outline some of the challenges that need to be addressed and new technologies that need to be developed in order to make it a reality. In particular, we examine the role cognification can play in tackling power consumption, resiliency and management problems in these systems. We describe intelligent software-based solutions to these problems powered by on-line predictive models built from streamed real-time data. While we cast the problem and our solutions in the context of large Data Centers and HPC systems, we believe our approach to be applicable to distributed computing in general. We believe that the traditional systems research agenda has much to gain by crossing discipline boundaries to include ideas and techniques from Data Science, Machine Learning and Artificial Intelligence
Where do migrants and natives belong in a community: a Twitter case study and privacy risk analysis
Today, many users are actively using Twitter to express their opinions and to share information. Thanks to the availability of the data, researchers have studied the behaviours and social networks of these users. International migration studies have also benefited from this social media platform to improve migration statistics. Although diverse types of social networks have been studied so far on Twitter, the social networks of migrants and natives have not been studied before. This paper aims to fill this gap by studying the characteristics and behaviours of migrants and natives on Twitter. To do so, we perform a general assessment of features, including profiles and tweets, and an extensive analysis of the network. We find that migrants have more followers than friends. They have also tweeted more, even though both groups have similar account ages. More interestingly, the assortativity scores showed that users tend to connect based on nationality more than country of residence, and this is more the case for migrants than for natives. Furthermore, both natives and migrants tend to connect mostly with natives. The homophilic behaviours of users are also well reflected in the communities that we detected. Our additional privacy risk analysis showed that Twitter data can be used safely without exposing users' sensitive information, minimising the risk of re-identification while respecting GDPR
A Big Data analyzer for large trace logs
The current generation of Internet-based services is typically hosted in large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors, including computing hardware, multiple layers of intricate software, networking and storage devices, and electrical power and cooling plants. During the course of their operation, many of these components produce large amounts of data in the form of event and error logs that are essential not only for identifying and resolving problems but also for improving data center efficiency and management. Most of these activities would benefit significantly from data analytics techniques to exploit hidden statistical patterns and correlations that may be present in the data. The sheer volume of data to be analyzed makes uncovering these correlations and patterns a challenging task. This paper presents Big Data analyzer (BiDAl), a prototype Java tool for log-data analysis that incorporates several Big Data technologies in order to simplify the task of extracting information from data traces produced by large clusters and server farms. BiDAl provides the user with several analysis languages (SQL, R and Hadoop MapReduce) and storage backends (HDFS and SQLite) that can be freely mixed and matched so that a custom tool for a specific task can be easily constructed. BiDAl has a modular architecture so that it can be extended with other backends and analysis languages in the future. In this paper we present the design of BiDAl and describe our experience using it to analyze publicly available traces from Google data clusters, with the goal of building a realistic model of a complex data center
