Reinforce-lib: A Reinforcement Learning Library for Scientific Research
Reinforcement Learning (RL) has already achieved several breakthroughs on complex, high-dimensional, and even multi-agent tasks, gaining increasing interest from more than just the research community. Although very powerful in principle, its applicability is still limited to solving games and control problems, leaving plenty of opportunities to apply and develop RL algorithms for (but not limited to) scientific domains like physics and biology. Apart from the domain of interest, the applicability of RL is also limited by numerous difficulties encountered while training agents, like training instabilities and sensitivity to hyperparameters. For such reasons, we propose a modern, modular, simple and understandable Python RL library called reinforce-lib. Our main aim is to enable newcomers, practitioners, and researchers to easily employ RL to solve new scientific problems. Our library is available at https://github.com/Luca96/reinforce-lib
MLaaS4HEP: Machine Learning as a Service for HEP
Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by LHC experiments in the next decade, and this effort will require novel approaches to train and use ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) which provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; a data inference layer to serve predictions using pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility of training on data at large scale by reading ROOT files from remote storage facilities, e.g., the World-Wide LHC Computing Grid (WLCG) infrastructure, and feeding the data to the user’s favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the usage of the MLaaS4HEP architecture for a physics use-case, namely, the tt̄ Higgs analysis in CMS originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.
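The three-layer split described in this abstract (streaming, training, inference) can be illustrated with a toy pipeline. The sketch below is not the MLaaS4HEP code or its API: every name in it (`stream_chunks`, `Perceptron`, `serve`) is hypothetical, the events are randomly generated rather than read from ROOT files, and a plain dictionary stands in for the JSON body of an HTTP request. It only shows why chunked reading lets a model train on data that never sits in memory all at once.

```python
import random
from typing import Dict, Iterator, List, Tuple

Chunk = List[Tuple[List[float], int]]  # (features, label) pairs


def stream_chunks(n_events: int, chunk_size: int, seed: int = 0) -> Iterator[Chunk]:
    """Streaming layer stand-in: yield events in fixed-size chunks.

    A real reader would pull branches from ROOT files; toy events are
    generated here so the sketch runs without any HEP software installed.
    """
    rng = random.Random(seed)
    chunk: Chunk = []
    for _ in range(n_events):
        label = rng.randint(0, 1)
        # Signal (label 1) is shifted away from background (label 0).
        features = [rng.gauss(2.0 * label, 1.0), rng.gauss(-1.0 * label, 1.0)]
        chunk.append((features, label))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


class Perceptron:
    """Training layer stand-in: a linear model updated one chunk at a time."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x: List[float]) -> int:
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0

    def partial_fit(self, chunk: Chunk) -> None:
        for x, y in chunk:
            err = y - self.predict_one(x)
            if err:  # update weights only on misclassified events
                self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b += self.lr * err


def serve(model: Perceptron, request: Dict[str, List[List[float]]]) -> Dict[str, List[int]]:
    """Inference layer stand-in: dict in, dict out, as an HTTP handler might do with JSON."""
    return {"predictions": [model.predict_one(x) for x in request["events"]]}


model = Perceptron(n_features=2)
for chunk in stream_chunks(n_events=2000, chunk_size=250):
    model.partial_fit(chunk)  # only one chunk is held in memory at a time

response = serve(model, {"events": [[3.0, -2.0], [-1.0, 1.0]]})
```

Because the three pieces only share the chunk format and the model object, each could in principle be swapped or scaled independently, which is the property the abstract attributes to the modular design.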
Machine Learning inference using PYNQ environment in a AWS EC2 F1 Instance
In the past few years, using Machine and Deep Learning techniques has become more and more viable, thanks to the availability of tools which make the need of specific knowledge in the realm of data science and complex networks less vital to achieve a satisfactory final result in a variety of research fields. This process has caused an explosion in the adoption of such techniques, e.g. in the context of High Energy Physics.
The range of applications for ML becomes even larger if we consider the implementation of these algorithms on low-latency hardware like FPGAs which promise smaller latency with respect to traditional inference algorithms running on general purpose CPUs.
This paper presents and discusses the activity running at the University of Bologna and INFN-Bologna where a new open-source project from Xilinx called PYNQ is being tested. Its purpose is to grant designers the possibility to exploit the benefits of programmable logic and microprocessors using the Python language and libraries. This new software environment can be deployed on a variety of Xilinx platforms, from the simplest ones like ZYNQ boards, to more advanced and high performance ones, like Alveo accelerator cards and AWS EC2 F1 instances.
The use of cloud computing in this work lets us test the capabilities of this new workflow, from the creation and training of a Neural Network and the creation of an HLS project using HLS4ML, to testing the predictions of the NN using PYNQ APIs and functions written in Python.
Enhancing CMS data analyses using a distributed high throughput platform
A flexible and dynamic environment capable of accessing distributed data and resources efficiently is a key aspect of HEP data analysis, especially for the HL-LHC era. A quasi-interactive declarative solution, like ROOT RDataFrame, with scale-up capabilities via open-source standards like Dask, can profit from the "HPC, Big Data and Quantum Computing" Italian Center DataLake model under development. The starting point is a prototype CMS high-throughput analysis platform, offloaded on a local Tier-2.
This contribution evaluates the scalability, identifies bottlenecks and explores the interactivity of such a platform on two use-cases: a CMS physics analysis with high-rate triggered events and a study of the CMS muon detector performance in phase-space regions driven by analysis needs, accessing detector datasets. The metrics used to evaluate the scaling and speed-up performance will be reported and results will be discussed, emphasising the differences with the legacy analysis workflows.
Prototype of a cloud native solution of Machine Learning as Service for HEP
To favor the usage of Machine Learning (ML) techniques in High-Energy Physics (HEP) analyses, it would be useful to have a service that allows users to perform the entire ML pipeline (in terms of reading the data, training an ML model, and serving predictions) directly using ROOT files of arbitrary size from local or remote distributed data sources. The MLaaS4HEP framework aims to provide this kind of solution. It was successfully validated with a CMS physics use case, which gave important feedback about the needs of analysts. For instance, we introduced the possibility for the user to provide pre-processing operations, such as defining new branches and applying cuts. To provide a real service for the user and to integrate it into the INFN Cloud, we started working on MLaaS4HEP cloudification. This would allow the use of cloud resources and work in a distributed environment. In this work, we provide updates on this topic, and in particular, we discuss our first working prototype of the service. It includes an OAuth2 proxy server as authentication/authorization layer, an MLaaS4HEP server, an XRootD proxy server for enabling access to remote ROOT data, and the TensorFlow as a Service (TFaaS) service in charge of the inference phase. With this architecture the user is able to submit ML pipelines, after being authenticated and authorized, using local or remote ROOT files simply using HTTP calls.
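The two pre-processing operations mentioned in this abstract, defining new branches and applying cuts, can be pictured on a small column-oriented event table. This is a minimal sketch under stated assumptions, not the MLaaS4HEP interface: the helpers `define_branch` and `apply_cut`, the branch names (`px`, `py`, `eta`, `pt`) and the cut thresholds are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List

Columns = Dict[str, List[float]]   # branch name -> one value per event
Event = Dict[str, float]           # a single row view across all branches


def _rows(cols: Columns) -> List[Event]:
    """Build per-event dictionaries from column-oriented data."""
    n = len(next(iter(cols.values())))
    return [{name: values[i] for name, values in cols.items()} for i in range(n)]


def define_branch(cols: Columns, name: str, func: Callable[[Event], float]) -> Columns:
    """Return a copy of cols with a new branch computed event by event."""
    out = {k: list(v) for k, v in cols.items()}
    out[name] = [func(event) for event in _rows(cols)]
    return out


def apply_cut(cols: Columns, keep: Callable[[Event], bool]) -> Columns:
    """Return only the events for which keep(event) is true."""
    mask = [keep(event) for event in _rows(cols)]
    return {k: [x for x, m in zip(v, mask) if m] for k, v in cols.items()}


# Three toy events (hypothetical branches and values).
events: Columns = {
    "px": [3.0, 0.3, 30.0],
    "py": [4.0, 0.4, 40.0],
    "eta": [0.5, 1.0, 3.1],
}

# New branch: transverse momentum from the two momentum components.
events = define_branch(events, "pt", lambda e: math.hypot(e["px"], e["py"]))

# Cut: keep events with pt above 1.0 and |eta| below 2.4 (illustrative thresholds).
selected = apply_cut(events, lambda e: e["pt"] > 1.0 and abs(e["eta"]) < 2.4)
```

With these toy values only the first event survives: the second fails the `pt` requirement and the third fails the `eta` requirement. Running such steps before training means the model only ever sees the derived quantities and the selected events.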
Cloud native approach for Machine Learning as a Service for High Energy Physics
Nowadays Machine Learning (ML) techniques are widely adopted in many areas of High Energy Physics (HEP) and certainly will play a significant role also in the upcoming High-Luminosity LHC (HL-LHC) upgrade foreseen at CERN. A huge amount of data will be produced by LHC and collected by the experiments, facing challenges at the exascale.
Here, we present Machine Learning as a Service solution for HEP (MLaaS4HEP) to perform an entire ML pipeline (in terms of reading data, processing data, training ML models, serving predictions) in a completely model-agnostic fashion, directly using ROOT files of arbitrary size from local or distributed data sources.
With the new version of the MLaaS4HEP code based on uproot4, we provide new features to improve users’ experience with the framework and their workflows, e.g. users can provide some preprocessing operations to be applied to ROOT data before starting the ML pipeline. Our approach is then extended to use local and cloud resources via an HTTP proxy, which allows physicists to submit their workflows using the HTTP protocol. We discuss how this pipeline could be enabled in the INFN Cloud Provider and what could be the final architecture.
Prototype of Machine Learning “as a Service” for CMS Physics in Signal vs Background discrimination
Exploring Patterns and Correlations in CMS Computing Operations Data with Big Data Analytics Techniques
Development of Machine Learning based muon trigger algorithms for the Phase2 upgrade of the CMS detector
After the high-luminosity upgrade of the LHC, the muon chambers of the CMS Barrel must cope with an increase in the number of interactions per bunch crossing. Therefore, new algorithmic techniques for data acquisition and processing will be necessary in preparation for such a high pile-up environment. Using Machine Learning as a technique to tackle this problem, this paper focuses on the production of models, trained on data obtained through Monte Carlo simulations, capable of predicting the transverse momentum of muons crossing the CMS Barrel muon chambers, comparing them with the transverse momentum (pT) assigned by the current CMS Level-1 trigger system.
Machine Learning as a Service for High Energy Physics on heterogeneous computing resources
Machine Learning (ML) techniques in the High-Energy Physics (HEP) domain are ubiquitous and will play a significant role also in the upcoming High-Luminosity LHC (HL-LHC) upgrade foreseen at CERN: a huge amount of data will be produced by LHC and collected by the experiments, facing challenges at the exascale. Although ML models are successfully applied in many use-cases (online and offline reconstruction, particle identification, detector simulation, Monte Carlo generation, just to name a few), there is a constant search for scalable, performant, and production-quality operations of ML-enabled workflows. In addition, the scenario is complicated by the gap between HEP physicists and ML experts, caused by the specificity of some parts of typical HEP workflows and solutions, and by the difficulty of formulating HEP problems in a way that matches the skills of the Computer Science (CS) and ML community and hence its potential ability to step in and help. Among other factors, one of the technical obstacles resides in the difference between the data formats used by ML practitioners and physicists: the former mostly use flat-format data representations, while the latter usually store data in tree-based objects via the ROOT data format. Another obstacle to further development of ML techniques in HEP resides in the difficulty of securing adequate computing resources for training and inference of ML models, in a scalable and transparent way in terms of CPU vs GPU vs TPU vs other resources, as well as local vs cloud resources. This yields a technical barrier that prevents a relatively large portion of HEP physicists from fully accessing the potential of ML-enabled systems for scientific research. In order to close this gap, a Machine Learning as a Service for HEP (MLaaS4HEP) solution is presented as a product of R&D activities within the CMS experiment.
It offers a service that is capable of directly reading ROOT-based data, using the ML solution provided by the user, and ultimately serving predictions by pre-trained ML models “as a service” accessible via the HTTP protocol. This solution can be used by physicists or experts outside of the HEP domain, and it provides access to local or remote data storage without requiring any modification or integration with the experiment-specific framework. Moreover, MLaaS4HEP is built with a modular design allowing independent resource allocation, which opens up the possibility of training ML models on PB-size datasets remotely accessible from the WLCG sites without physically downloading data into local storage.
To prove the feasibility and utility of the MLaaS4HEP service with large datasets, and thus be ready for the near future when an increase in the data produced is expected, an exploration of different hardware resources is required. In particular, this work aims to give the MLaaS4HEP service transparent access to heterogeneous resources, which opens up the usage of more powerful resources without requiring any effort from the user side during the access and use phase.
