1,720,996 research outputs found

    Using Triangular Gate Voltage Pulses to Evaluate Hysteresis and Charge Trapping Effects in GaN on Si HEMTs

    Full text link
    Charge carrier traps due to crystal defects in GaN on Si HEMT devices are responsible for dynamic performance degradation, long-term reliability limitations, and peculiar failure modes. The behavior of traps depends on many variables including heterostructure quality, the specific device structure, and operating conditions. To study the short time dynamics of charge trapping and release on the threshold voltage shift and hysteresis of commercial normally off GaN HEMTs we use triangular 0–5 V gate voltage pulses in the μs to ms duration range. Measurements are performed for single pulses by varying pulse duration and for a train of a few pulses by varying their number. The results indicate that hysteresis and related threshold voltage shift occur after repeated pulses, suggesting an accumulation of trapped charges. However, for a triangular wave hysteresis vanishes, meaning that a dynamic balance between charge trapping and release is established in the device. This can be considered as a positive indicator of device robustness and reliability. The same method, used to measure the gate threshold voltage shift and dynamic RON after a 30 min off-state DC stress at VDS = 55 V with a floating gate, highlights an appreciable performance degradation of the device

    A study on the synchronization behaviour of Differential Evolution and a self-adaptive extension

    No full text
    Differential Evolution (DE) is a popular and efficient continuous optimization technique based on the principles of Darwinian evolution. Asynchronous Differential Evolution is a DE generalization that allows to regulate the synchronization mechanism of the algorithm by tuning two additional parameters. This paper, after providing a further experimental analysis of the impact of the DE synchronization scheme on the evolution, introduces three self-adaptive techniques to handle the synchronization parameters. Moreover the integration of these new regulatory synchronization techniques into state-of-the-art (self) adaptive DE schemes are also proposed. Experiments on widely accepted benchmark problems show that the new schemes are able to improve performances of the state-of-theart (self) adaptive DEs by introducing the new synchronization parameters in the online automated tuning process

    Graph analytics on modern massively parallel systems

    Full text link
    Graphs provide a very flexible abstraction for understanding and modeling complex systems in many fields such as physics, biology, neuroscience, engineering, and social science. Only in the last two decades, with the advent of Big Data era, supercomputers equipped by accelerators –i.e., Graphics Processing Unit (GPUs)–, advanced networking, and highly parallel file systems have been used to analyze graph properties such as reachability, diameter, connected components, centrality, and clustering coefficient. Today graphs of interest may be composed by millions, sometimes billions, of nodes and edges and exhibit a highly irregular structure. As a consequence, the design of efficient and scalable graph algorithms is an extraordinary challenge due to irregular communication and memory access patterns, high synchronization costs, and lack of data locality. In the present dissertation, we start off with a brief and gentle introduction for the reader to graph analytics and massively parallel systems. In particular, we present the intersection between graph analytics and parallel architectures in the current state-of-the-art and discuss the challenges encountered when solving such problems on large-scale graphs on these architectures (Chapter 1). In Chapter 2, some preliminary definitions and graph-theoretical notions are provided together with a description of the synthetic graphs used in the literature to model real-world networks. In Chapters 3-5, we present and tackle three different relevant problems in graph analysis: reachability (Chapter 3), Betweenness Centrality (Chapter 4), and clustering coefficient (Chapter 5). In detail, Chapter 3 tackles reachability problems by providing two scalable algorithms and implementations which efficiently solve st-connectivity problems on very large-scale graphs Chapter 4 considers the problem of identifying most relevant nodes in a network which plays a crucial role in several applications, including transportation and communication networks, social network analysis, and biological networks. In particular, we focus on a well-known centrality metrics, namely Betweenness Centrality (BC), and present two different distributed algorithms for the BC computation on unweighted and weighted graphs. For unweighted graphs, we present a new communication-efficient algorithm based on the combination of bi-dimensional (2D) decomposition and multi-level parallelism. Furthermore, new algorithms which exploit the underlying graph topology to reduce the time and space usage of betweenness centrality computations are described as well. Concerning weighted graphs, we provide a scalable algorithm based on an algebraic formulation of the problem. Finally, thorough comprehensive experimental results on synthetic and real- world large-scale graphs, we show that the proposed techniques are effective in practice and achieve significant speedups against state-of-the-art solutions. Chapter 5 considers clustering coefficients problem. Similarly to Betweenness Centrality, it is a fundamental tool in network analysis, as it specifically measures how nodes tend to cluster together in a network. In the chapter, we first extend caching techniques to Remote Memory Access (RMA) operations on distributed-memory system. The caching layer is mainly designed to avoid inter-node communications in order to achieve similar benefits for irregular applications as communication-avoiding algorithms. We also show how cached RMA is able to improve the performance of a new distributed asynchronous algorithm for the computation of local clustering coefficients. Finally, Chapter 6 contains a brief summary of the key contributions described in the dissertation and presents potential future directions of the work

    Solutions to the st-connectivity problem using a gpu-based distributed bfs

    No full text
    The st-connectivity problem (ST-CON) is a decision problem that asks, for vertices ss and tt in a graph, if tt is reachable from ss. Although originally defined for directed graphs, it can also be studied on undirected graphs and used as a building block for solving more complex tasks on large scale graphs. We present solutions to ST-CON based on a high performance Breadth First Search (BFS) executed on clusters of Graphics Processing Units (GPUs) using the Nvidia CUDA platform. To measure performances, we use the number of ST-CONs per second. We present the results for two different implementations that highlight the impact of atomic operations in CUDA

    Dynamic merging of frontiers for accelerating the evaluation of betweenness centrality

    No full text
    Betweenness Centrality (BC) is a widely used metric of the relevance of a node in a network. The fastest-known algorithm for the evaluation of BC on unweighted graphs builds a tree representing information about the shortest paths for each vertex to calculate its contribution to the BC score. Actually, for specific vertices, the shortest-path trees of neighboring nodes could be leveraged to reduce the computational burden, but existing BC algorithms do not exploit that information and carry out redundant computations. We propose a new algorithm, called dynamic merging of frontiers, which makes use of such information to derive the BC score of degree-2 vertices by re-using the results of the sub-trees of the neighbors. We implemented our idea in parallel fashion exploiting Graphics Processing Units. Compared to state-of-the-art implementations, our approach achieves a linear improvement in the number of degree-2 vertices and an average improvement of × over a variety of real-world graphs

    Betweenness centrality on Multi-GPU systems

    No full text
    Betweenness Centrality (BC) is steadily growing in popularity as a metrics of the inuence of a vertex in a graph. The exact BC computation for a large scale graph is an extraordinary challenging and requires high performance computing techniques to provide results in a reasonable amount of time. Here, we present the techniques we developed to speed-up the computation of the BC on Multi-GPU systems. Our approach combines the bi-dimensional (2-D) decomposition of the graph and multi-level parallelism. Experimental results show that the proposed techniques are well suited to compute BC scores in graphs which are too large to fit in single GPU memory. In particular, the computation time of a 234 million edges graph is reduced to less than 2 hour

    Scalable betweenness centrality on multi-GPU systems

    No full text
    Betweenness Centrality (BC) is steadily growing in popularity as a metrics of the influence of a vertex in a graph. The BC score of a vertex is proportional to the number of all-pairs-shortest-paths passing through it. However, complete and exact BC computation for a large-scale graph is an extraordinary challenge that requires high performance computing techniques to provide results in a reasonable amount of time. Our approach combines bi-dimensional (2-D) decomposition of the graph and multi-level parallelism together with a suitable data-thread mapping that overcomes most of the difficulties caused by the irregularity of the computation on GPUs. In order to reduce time and space requirements of BC computation, a heuristics based on 1-degree reduction technique is developed as well. Experimental results on synthetic and real-world graphs show that the proposed techniques are well suited to compute BC scores in graphs which are too large to fit in the memory of a single computational node

    Accelerating energy games solvers on modern architectures

    No full text
    Quantitative games, where quantitative objectives are defined on weighted game arenas, provide natural tools for designing faithful models of embedded controllers. Instances of these games are the so called Energy Games. Starting from a sequential baseline implementation, we investigate the use of massively data computation capabilities supported by modern GPUs to solve the initial credit problem for Energy Games. We present different parallel implementations on multi-core CPU and GPU systems. Our solution outperforms the baseline implementation by up to 36x speedup and obtains a faster convergence time on real-world graphs

    Scaling betweenness centrality using communication-efficient sparse matrix multiplication

    No full text
    Betweenness centrality (BC) is a crucial graph problem that measures the significance of a vertex by the number of shortest paths leading through it. We propose Maximal Frontier Betweenness Centrality (MFBC): a succinct BC algorithm based on novel sparse matrix multiplication routines that performs a factor of p 1/3 less communication on p processors than the best known alternatives, for graphs withn vertices and average degree k = n/p 2/3. We formulate, implement, and prove the correctness of MFBC for weighted graphs by leveraging monoids instead of semirings, which enables a surprisingly succinct formulation. MFBC scales well for both extremely sparse and relatively dense graphs. It automatically searches a space of distributed data decompositions and sparse matrix multiplication algorithms for the most advantageous configuration. The MFBC implementation outperforms the well-known CombBLAS library by up to 8x and shows more robust performance. Our design methodology is readily extensible to other graph problems
    corecore