1,720,968 research outputs found
Dynamic Federated Learning for Heterogeneous Learning Environments
The emergence of the Internet of Things (IoT) has resulted in a massive influx of data generated by various edge devices. Machine learning models trained on this data can provide valuable insights and predictions, leading to better decision-making and intelligent applications. Federated Learning (FL) is a distributed learning paradigm that enables remote devices to collaboratively train models without sharing sensitive data, thus preserving user privacy and reducing communication overhead. However, despite recent breakthroughs in FL, the heterogeneous learning environments significantly limit its performance and hinder its real-world applications. The heterogeneous learning environment is mainly embodied in two aspects. Firstly, the statistically heterogeneous (usually non-independent identically distributed) data from geographically distributed clients can deteriorate the FL training accuracy. Secondly, the heterogeneous computing and communication resources in IoT devices often result in unstable training processes that slow down the training of a global model and affect energy consumption. Most existing studies address only the unilateral side of the heterogeneity issue, either the statistical or the resource heterogeneity. However, the resource heterogeneity among various devices does not necessarily correlate with the distribution of their training data. We propose Dynamic Federated Learning (DFL) to address the joint problem of data and resource heterogeneity in FL. DFL combines resource-aware split computing of deep neural networks and dynamic clustering of training participants based on the similarity of their sub-model layers. Using resource-aware split learning, the allocation of the FL training tasks on resource-constrained participants is adjusted to match their heterogeneous computing capabilities, while resource-capable participants carry out the classic FL training. We employ centered kernel alignment for determining the similarity of neural network layers to address the data heterogeneity and carry out layerwise sub-model aggregation. Preliminary results indicate that the proposed technique can improve training performance (i.e., training time, accuracy, and energy consumption) in heterogeneous learning environments with both data and resource heterogeneity
Distributed and Federated Learning Optimization with Federated Clustering of IID-users
Federated Learning (FL) is one of the leading learning paradigms for enabling a more significant presence of intelligent applications in networked and Internet of Things (IoT) systems. It consists of individual user devices performing machine learning (ML) models training locally, so that only trained models due to privacy concerns, but not raw data, is transferred through the network for aggregation at the edge or cloud data centers [Li et al. 2019]. Due to the pervasive presence of connected devices such as smart phones and IoT devices in peoples lives, there is a growing concern about how we can preserve and secure users’ information. FL reduces the risk of exposing user information to attackers during transmission over networks or information leakages at the central data centers. Another advantage of FL is scalability and maintainability of intelligent applications in networked and IoT systems. Considering highly distributed environments in which such systems are deployed, collecting and transmitting raw user data for training of ML models at central data centers is a challenging task as it imposes huge workload on the networks and consumes high bandwidth. Training of ML models is distributed over locations and transmitting the trained models for aggregation alleviates these challenges.
Among others, distributed and federated learning have applications in smart healthcare systems, where very sensitive user data is involved, and industrial IoT applications, where the amount of data for training may be too large and cumbersome to transport to central data centers. However, FL has the significant shortcoming of requiring user data to be Independent Identically Distributed (IID) (i.e., users which have similar data statistical distributions and are not mutually dependent) and make reliable predictions for a given group of users aggregated into a single model. IID users have similar statistical features, and thus can be aggregated into the same ML models. Since raw data is not available at the model aggregator, it is necessary to find IID users based solely on their trained machine learning models.
We present a Neural Network-based Federated Clustering mechanism capable of clustering IID with no access to their raw data called Neural-network SIMilarity estimator, NSIM. Such mechanism performs significantly better than competing techniques for neural-network clustering [Pacheco et al. 2021]. We also present an alternative to the FedAvg aggregation algorithm used in traditional FL, which significantly increases the aggregated models’ reliability in terms of Mean Square Error by creating several training models over IID users in a real-world mobility prediction dataset. We observe improvements of up to 97.52% in terms of Pearson correlation between the similarity estimation by NSIM and ground truth based on the LCSS (Longest Common Sub-Sequence) similarity metric, in comparison with other state-of-the-art approaches. Federated Clustering of IID data in different geographical locations can improve performance of early warning applications such as flood prediction [Samikwa et al. 2020], where the data for some locations may have more statistical similarities. We further present a technique for accelerating ML inference in resource-constrained devices through distributed computation of ML models over IoT networks, while preserving privacy. This has the potential to improve the performance of time sensitive ML applications
ARES: Adaptive Resource-Aware Split Learning for Internet of Things
Distributed training of Machine Learning models in edge Internet of Things (IoT) environments is challenging because of three main points. First, resource-constrained devices have large training times and limited energy budget. Second, resource heterogeneity of IoT devices slows down the training of the global model due to the presence of slower devices (stragglers). Finally, varying operational conditions, such as network bandwidth, and computing resources, significantly affect training time and energy consumption. Recent studies have proposed Split Learning (SL) for distributed model training with limited resources but its efficient implementation on the resource-constrained and decentralized heterogeneous IoT devices remains minimally explored. We propose Adaptive REsource-aware Splitlearning (ARES), a scheme for efficient model training in IoT systems. ARES accelerates local training in resource-constrained devices and minimizes the effect of stragglers on the training through device-targeted split points while accounting for time-varying network throughput and computing resources. ARES takes into account application constraints to mitigate training optimization tradeoffs in terms of energy consumption and training time. We evaluate ARES prototype on a real testbed comprising heterogeneous IoT devices running a widely-adopted deep neural network and dataset. Results show that ARES accelerates model training on IoT devices by up to 48% and minimizes the energy consumption by up to 61.4% compared to Federated Learning (FL) and classic SL, without sacrificing the model convergence and accurac
DFL: Dynamic Federated Split Learning in Heterogeneous IoT
Federated Learning (FL) in edge Internet of Things (IoT) environments is challenging due to the heterogeneous nature of the learning environment, mainly embodied in two aspects. Firstly, the statistically heterogeneous data, usually non-independent identically distributed (non-IID), from geographically distributed clients can deteriorate the FL training accuracy. Secondly, the heterogeneous computing and communication resources in IoT devices often result in unstable training processes that slow down the training of a global model and affect energy consumption. Most existing solutions address only the unilateral side of the heterogeneity issue but neglect the joint problem of resources and data heterogeneity for the resource-constrained IoT. In this article, we propose Dynamic Federated split Learning (DFL) to address the joint problem of data and resource heterogeneity for distributed training in IoT. DFL enhances training efficiency in heterogeneous dynamic IoT through resource-aware split computing of deep neural networks and dynamic clustering of training participants based on the similarity of their sub-model layers. We evaluate DFL on a real testbed comprising heterogeneous IoT devices using two widely-adopted datasets, in various non-IID settings. Results show that DFL improves training performance in terms of training time by up to 48%, accuracy by up to 32%, and energy consumption by up to 62.8% compared to classic FL and Federated Split Learning in scenarios with both data and resource heterogeneity
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued
use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation
counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more
sophisticated methods
- …
