1,720,959 research outputs found

    A Bayesian network approach for compiler auto-tuning for embedded processors

    No full text
    The complexity and diversity of today's architectures require additional effort from programmers in porting and tuning application code across different platforms. The problem is even more complex because the compiler itself requires tuning, since standard optimization options have been customized for specific architectures or designed for the average case. This paper proposes a machine-learning approach for reducing the cost of the compiler auto-tuning phase and for speeding up application performance on embedded architectures. The proposed framework is based on an application characterization performed dynamically with microarchitecture-independent features and on the use of Bayesian networks. The main characteristic of the Bayesian network approach is that it does not describe the solution as a strict set of compiler transformations to be applied, but as a complex probability distribution function to be sampled. Experimental results, carried out on an ARM platform and the GCC transformation space, proved the effectiveness of the proposed methodology for the selected benchmarks. The selected set of solutions (less than 10% of the search space) proved to be very close to the optimal sequence of transformations, and also showed an application performance speedup of up to 2.8 (1.5 on average) with respect to -O2 and -O3 for the cBench suite. Additionally, the proposed method demonstrated a 3× speedup in search time with respect to an iterative compilation approach, given the same quality of solution.
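
The idea of describing the solution as a distribution to be sampled, rather than as a fixed flag set, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's model: the flags are real GCC options, but the network structure, the probabilities, and the helper names `sample_configuration` and `enabled_flags` are invented for the example.

```python
import random

# Hypothetical network: (flag, parent flag or None, P(flag enabled | parent state)).
# The flags are real GCC options; the structure and the probabilities are
# invented for illustration and are not taken from the paper.
NETWORK = [
    ("-funroll-loops", None, {None: 0.7}),
    ("-ftree-vectorize", "-funroll-loops", {True: 0.8, False: 0.3}),
    ("-finline-functions", None, {None: 0.5}),
]


def sample_configuration(rng):
    """Ancestral sampling: draw each flag in topological order given its parent."""
    state = {}
    for flag, parent, cpt in NETWORK:
        parent_state = state[parent] if parent is not None else None
        state[flag] = rng.random() < cpt[parent_state]
    return state


def enabled_flags(state):
    """Turn a sampled state into the list of flags to pass to the compiler."""
    return [flag for flag, enabled in state.items() if enabled]


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(enabled_flags(sample_configuration(rng)))
```

Each call yields one candidate configuration, so evaluating only a small number of samples corresponds to exploring a small fraction of the search space.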

    Predictive modeling methodology for compiler phase-ordering

    No full text
    Today's compilers offer a huge number of transformation options to choose among, and this choice can significantly impact the performance of the code being optimized. Not only is the selection of compiler options a hard problem, but the ordering of the phases adds further complexity, making it a long-standing problem in compilation research. This paper presents an innovative approach for tackling the compiler phase-ordering problem by using predictive modeling. The proposed methodology makes it possible (i) to efficiently explore the compiler optimization space, including optimization permutations and repetitions, and (ii) to extract the application's dynamic features in order to predict the next-best optimization to apply to maximize performance given the current status. The methodology is assessed with two different search heuristics on the compiler optimization space, and the experimental results demonstrate its effectiveness on the selected set of applications. Using the proposed methodology, we observed, on average, up to 4% execution speedup with respect to the LLVM standard baseline.
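
Predicting the next-best optimization given the current status can be read as a greedy loop over a learned predictor. The Python sketch below shows only that control flow. The pass names are real LLVM passes, but `toy_predict_gain`, its numbers, and the stopping rule are hypothetical stand-ins for a trained model and are not taken from the paper.

```python
# Hypothetical base gains per pass; a real system would query a trained model.
BASE_GAIN = {"instcombine": 0.036, "licm": 0.025, "gvn": 0.012}
APPLY_COST = 0.010  # hypothetical fixed penalty for adding one more pass


def toy_predict_gain(sequence, candidate):
    """Stand-in predictor: the gain of a pass halves each time it is repeated."""
    repeats = sequence.count(candidate)
    return BASE_GAIN[candidate] / (2 ** repeats) - APPLY_COST


def greedy_phase_order(passes, predict_gain, max_len=8):
    """Append the pass with the highest predicted gain until none is positive."""
    sequence = []
    for _ in range(max_len):
        best = max(passes, key=lambda p: predict_gain(sequence, p))
        if predict_gain(sequence, best) <= 0.0:
            break
        sequence.append(best)
    return sequence


if __name__ == "__main__":
    print(greedy_phase_order(list(BASE_GAIN), toy_predict_gain))
```

Because the predictor sees the sequence built so far, the loop can both reorder and repeat passes, which is what distinguishes phase ordering from plain flag selection.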

    COBAYN: Compiler autotuning framework using Bayesian networks

    Full text link
    The variety of today's architectures forces programmers to spend a great deal of time porting and tuning application codes across different platforms. Compilers themselves need additional tuning, which has considerable complexity because the standard optimization levels, usually designed for the average case and the specific target architecture, often fail to bring the best results. This article proposes COBAYN (Compiler autotuning framework using Bayesian Networks), a compiler autotuning methodology that uses machine learning to speed up application performance and to reduce the cost of the compiler optimization phases. The proposed framework is based on an application characterization performed dynamically using microarchitecture-independent features and Bayesian networks. The article also presents an evaluation based on static analysis and hybrid feature collection approaches. In addition, the article compares Bayesian networks with several state-of-the-art machine-learning models. Experiments were carried out on an ARM embedded platform with the GCC compiler, considering two benchmark suites with 39 applications. The set of compiler configurations selected by the model (less than 7% of the search space) demonstrated an application performance speedup of up to 4.6× on Polybench (1.85× on average) and 3.1× on cBench (1.54× on average) with respect to standard optimization levels. Moreover, the comparison of the proposed technique with (i) random iterative compilation, (ii) machine learning-based iterative compilation, and (iii) noniterative predictive modeling techniques shows, on average, 1.2×, 1.37×, and 1.48× speedup, respectively. Finally, the proposed method demonstrates 4× and 3× speedup, respectively, on cBench and Polybench in terms of exploration efficiency, given the same quality of the solutions generated by the random iterative compilation model.

    MiCOMP: Mitigating the Compiler Phase-Ordering Problem Using Optimization Sub-Sequences and Machine Learning

    Full text link
    Recent compilers offer a vast number of multilayered optimizations targeting different code segments of an application. Choosing among these optimizations can significantly impact the performance of the code being optimized. The selection of the right set of compiler optimizations for a particular code segment is a very hard problem, but finding the best ordering of these optimizations adds further complexity. Finding the best ordering is a long-standing problem in compilation research, named the phase-ordering problem. The traditional approach of constructing compiler heuristics to solve this problem simply cannot cope with the enormous complexity of choosing the right ordering of optimizations for every code segment in an application. This article proposes an automatic optimization framework we call MiCOMP, which Mitigates the Compiler Phase-ordering problem. We perform phase ordering of the optimizations in LLVM’s highest optimization level using optimization sub-sequences and machine learning. The idea is to group the optimization passes of LLVM’s O3 setting into clusters and to predict the speedup of a complete sequence of all the optimization clusters, instead of having to deal with the ordering of more than 60 different individual optimizations. The predictive model uses (1) dynamic features, (2) an encoded version of the compiler sequence, and (3) an exploration heuristic to tackle the problem. Experimental results using the LLVM compiler framework and the cBench suite show the effectiveness of the proposed clustering and encoding techniques for application-based reordering of passes, using a number of predictive models. We perform statistical analysis on the results and compare against (1) random iterative compilation, (2) standard optimization levels, and (3) two recent prediction approaches. We show that MiCOMP’s iterative compilation using its sub-sequences can reach an average performance speedup of 1.31 (up to 1.51). Additionally, we demonstrate that MiCOMP’s prediction model outperforms the -O1, -O2, and -O3 optimization levels using just a few predictions and reduces the prediction error rate down to only 5%. Overall, it achieves 90% of the available speedup by exploring less than 0.001% of the optimization space.
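
To see why ordering clusters is so much cheaper than ordering individual passes, the following Python sketch enumerates the orderings of a handful of sub-sequences and compares the count with the number of orderings of 60 distinct passes. The cluster names, their membership, and the choice of five clusters are assumptions made for illustration; the abstract does not state them. The pass names themselves are real LLVM passes.

```python
from itertools import permutations
from math import factorial

# Hypothetical grouping of LLVM passes into sub-sequences; illustrative only.
CLUSTERS = {
    "A": ["simplifycfg", "instcombine"],
    "B": ["licm", "loop-rotate"],
    "C": ["gvn", "sccp"],
    "D": ["loop-unroll", "indvars"],
    "E": ["inline", "globalopt"],
}


def cluster_orderings(clusters):
    """Yield each ordering of the clusters together with its flattened pass list."""
    for order in permutations(clusters):
        yield order, [p for name in order for p in clusters[name]]


orderings = list(cluster_orderings(CLUSTERS))

if __name__ == "__main__":
    print(len(orderings))   # number of orderings of the five clusters (5!)
    print(factorial(60))    # orderings of 60 distinct passes, each used once
```

This sketch ignores repeated clusters and variable-length sequences, so it understates both spaces, but the gap between the two counts is the point.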

    A hybrid autotuning framework for performance optimization of heterogeneous systems

    Full text link
    Master's thesis (Laurea Magistrale). The increasing complexity of modern multi- and many-core hardware design makes performance tuning of applications a difficult task. While the aim of past successful automatic tuning has been execution time minimization, new performance objectives have emerged, comprising energy consumption, computational cost, and area. Automatic tuning approaches range from the relatively non-intrusive (e.g., using compiler options) to extensive code modifications that attempt to exploit specific architectural features. Intrusive techniques often result in code changes that are not easily reversible, which can negatively impact readability, maintainability, and performance on different architectures. Therefore, more sophisticated methods capable of identifying and exploiting the trade-offs among these goals are required. We introduce a hybrid optimization framework to optimize code for two main mutually competing criteria, e.g., execution time and resource usage, across several layers, from the original source code down to the high-level synthesis level. The optimization components of our framework are the OpenTuner framework for building domain-specific multi-objective program autotuners, the annotation-based empirical tuning system Orio, and the high-level synthesis tool LegUp. The framework aims at improving both performance and productivity through a semi-automated procedure. Our chain supports both architecture-independent and architecture-specific code optimization and can be adapted to any hardware platform architecture. After identifying the application’s optimization parameters through OpenTuner, we pass the annotated code as input to Orio, which generates many tuned versions and returns the version with the best performance. LLVM then performs a number of optimization passes according to Orio’s result and, finally, LegUp uses the LLVM output to synthesize for a particular target platform, adding its own optimizations. We show that our automated approach can improve execution time and resource usage on HLS through different optimization levels.
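
When two criteria compete, as execution time and resource usage do here, no single version is best and the useful output is the set of non-dominated trade-offs. The Python sketch below computes such a Pareto front. The variant names and measurements are made up for illustration, and `pareto_front` is a hypothetical helper, not part of OpenTuner, Orio, or LegUp.

```python
def pareto_front(points):
    """Return the points not dominated by any other (lower is better on both axes)."""
    front = []
    for name, time_ms, area in points:
        dominated = any(
            t <= time_ms and a <= area and (t < time_ms or a < area)
            for _, t, a in points
        )
        if not dominated:
            front.append((name, time_ms, area))
    return front


# Hypothetical measurements: (variant, execution time in ms, resource usage).
VARIANTS = [
    ("v0", 120.0, 900),
    ("v1", 95.0, 1100),
    ("v2", 130.0, 950),
    ("v3", 95.0, 1300),
    ("v4", 80.0, 1500),
]

if __name__ == "__main__":
    for variant in pareto_front(VARIANTS):
        print(variant)
```

Here `v2` and `v3` drop out because another variant is at least as good on both axes and strictly better on one.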

    Compiler autotuning using machine learning techniques

    Full text link
    Recent developments in silicon production and fabrication have led to the creation of much faster computational units such as CPUs, GPUs, FPGAs, and similar chips with varying instruction set architectures (ISAs). Software (SW) programming paradigms including OpenMP, MPI, OpenCL, and OpenACC allow software developers to exploit hardware (HW) parallelism and to port legacy serial codes to these emerging platforms to attain application speedups. Compilers struggle to keep up with the increasing development pace of ever-expanding hardware and software programming paradigms. Additionally, the growing complexity of modern compilers and the concern over security are among the more serious problems that compilers should address. Moore’s law states that transistor density should double every two years; however, compilers, which face many open research problems, have not been able to improve by more than a few percentage points each year. The diversity of today’s architectures has forced programmers to spend additional effort to port and tune their application code across different platforms. Compilers within this process need additional tuning, which is a hard task in itself. Recent compilers offer a vast number of multilayered optimizations, capable of targeting different code segments of an application. Choosing among these optimizations can significantly impact the performance of the code being optimized. The selection of the right set of compiler optimizations for a particular code segment is a very hard problem, but finding the best ordering of these optimizations adds further complexity. In fact, finding the best ordering is a long-standing problem in compilation research called the phase-ordering problem. The traditional approach of constructing compiler heuristics to solve this problem simply cannot cope with the enormous complexity of choosing the right ordering of optimizations for every code segment in an application. In this PhD thesis, we provide breakthrough approaches to tackle and mitigate the well-known problems of compiler optimization using design space exploration and machine learning techniques. We show that not all optimization passes are beneficial within an optimization sequence and that, in fact, many of the available passes cancel one another's effect when the ordering of the phases is taken into account. Experimental results show major improvements in performance metrics when our customized prediction models are in place, compared with the standard fixed optimization passes predefined within state-of-the-art compiler frameworks, e.g., GCC and LLVM. We perform application-specific optimization based on the characteristics of the applications under analysis, and we show that this methodology helps mitigate the hard problem of selecting the best compiler optimizations and the phase-ordering problem. Last but not least, we hope that the approaches proposed in this PhD thesis will be useful for a wide range of readers, including computer architects, compiler developers, researchers, and technical professionals.
    Dipartimento di Elettronica, Informazione e Bioingegneria; Computer Science and Engineering; 28; AMIGONI, FRANCESCO; BONARINI, ANDRE

    Design space exploration methodology for compiler parameters in VLIW processors

    Full text link
    Master's thesis (Laurea Magistrale). Embedded systems can be considered specialized computing systems that can be used for multi-purpose applications ranging from mobile phones to military and home-automation devices. Although the functionalities of these devices differ, their computational structure and design are tightly connected with the platform and the programming paradigm on which they rely. Consequently, with the introduction of VLSI technology, the design of complex system-on-chip (SoC) platforms and the related network-on-chip (NoC) has to be finely tuned. The target is a multi-objective optimization problem: to maximize the performance of the platform and to minimize the power consumption or other non-functional metrics. During this design phase, Design Space Exploration (DSE) plays a major role in pruning the large design space and supporting the designer during the analysis phase. This research thesis targets the exploration of compiler option parameters, in order to automatically explore the design space and analyze compiler-architecture co-design in VLIW processors by applying a random design-of-experiments algorithm. The thesis tackles this problem by proposing an automatic methodology based on a tool-chain that includes the MOST tool (Multi-Objective System Tuner), an Ubuntu wrapper, and two open-source compilers, namely LLVM and VEX. The proposed tool-chain enables the designer to automatically explore, optimize, and analyze the options by using several standard benchmarks for both high-end embedded and signal processing applications. The analysis can be used as a tool-chain for benchmarking the compiler options and can be extended to architectural options in the near future. The optimization phase can be carried out as a further step of the research to generalize the trends observed in the analysis of the results. In this dissertation, the thesis is supported by a large set of experimental results relying on solid statistical analysis, which clearly shows the characteristics and the effects of each transformation. We targeted benchmarking with the MOST software, VEX, and the LLVM simulator to provide a solid experimental setup. In addition, the Appendix provides a complete manual for designers to use as a multi-purpose reference.
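
A random design of experiments over compiler and architectural parameters amounts to drawing distinct points from the full parameter grid. The Python sketch below illustrates this. The parameter names and value ranges in `SPACE` and the helper `random_design` are assumptions for the example and do not reflect the MOST tool's interface or the thesis's actual parameter set.

```python
import itertools
import random

# Hypothetical parameter space; names and values are illustrative assumptions.
SPACE = {
    "unroll_factor": [1, 2, 4, 8],
    "inline": [False, True],
    "issue_width": [2, 4, 8],
}


def random_design(space, n_points, seed=0):
    """Draw n_points distinct configurations uniformly from the full grid."""
    names = list(space)
    grid = list(itertools.product(*(space[name] for name in names)))
    rng = random.Random(seed)
    chosen = rng.sample(grid, n_points)  # sampling without replacement
    return [dict(zip(names, values)) for values in chosen]


if __name__ == "__main__":
    for config in random_design(SPACE, n_points=6):
        print(config)
```

Enumerating the whole grid is only feasible for small spaces; for larger ones, each parameter would be drawn independently and duplicates filtered out.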

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
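
The difference between the two counting schemes can be made concrete with a small Python sketch. It assumes that an author pair is counted once per citing paper whenever both authors appear among the counted authors of that paper's cited works, including co-authors of the same cited work. That convention, the function name `cocitation_counts`, and the sample data are illustrative assumptions and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs co-cited by the same paper.

    citing_papers: one entry per citing paper, each a list of cited works,
    each cited work being its ordered author list.
    max_authors: how many leading authors of each cited work are counted
    (1 gives first-author counting, 5 the first-five variant).
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of two citing papers.
PAPERS = [
    [["Smith", "Lee"], ["Garcia", "Chen"]],
    [["Lee", "Smith"], ["Garcia"]],
]

if __name__ == "__main__":
    print(cocitation_counts(PAPERS, max_authors=1))
    print(cocitation_counts(PAPERS, max_authors=5))
```

With first-author counting, Smith and Lee are never co-cited in this data, while the first-five variant links them in both citing papers, which hints at why the resulting author groups come out more coherent.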

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.