
    Enhancing Code Completion for Computer Musicians: A Dataset and Predictive Model for Pure Data

    No full text
    Pure Data (PD), a widely used visual programming language (VPL) in computer music, lacks robust code completion tools that are essential for enhancing usability and improving the user experience. While code completion tools are prevalent in textual programming languages, there is a gap in research on visual code completion for graph-based VPLs like PD, particularly regarding the selection of appropriate objects (nodes) and connections (edges) in specific contexts to enhance the user experience. PD's unique graph-based structure is fundamentally different from the linear nature of textual programming languages, making existing textual code completion tools unsuitable for computer musicians, and highlighting the need for a dedicated visual code completion solution. To address this gap in visual code completion, this thesis introduces TriGraph, a graph-based probabilistic model that predicts nodes and edges in PD graphs, providing an effective support tool for computer musicians. To develop TriGraph, we created a publicly available PD dataset by analyzing 6,534 projects from GitHub, then trained and evaluated 5 TriGraph models using statistical analysis of 1-node, 2-node, and 3-node subgraph frequencies to predict unknown nodes and edges in PD graphs. We also compared the performance of TriGraph with an n-gram-based KenLM model to assess the effectiveness of our graph-based approach. Our evaluations indicate that our TriGraph model achieves an average Mean Reciprocal Rank (MRR) score of 0.39 for node prediction, outperforming the KenLM model, and an average MRR score of 0.57 for edge prediction, placing the correct answer within the top 2-3 suggestions. Additionally, our analysis of the PD dataset revealed that most PD projects are small and simple, with few nodes, connections, and revisions, and are typically developed by a single author, with minimal changes made between successive revisions. 
    This work advances computer music by giving PD users a visual code completion model and a comprehensive dataset, helping both academic researchers and computer musicians work with visual programming languages more efficiently and supporting practical development in computer music.
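
The abstract reports results as Mean Reciprocal Rank (MRR). As a reading aid, here is a minimal sketch of how MRR is commonly computed; the function names and the sample suggestion lists are hypothetical and are not taken from the thesis or from TriGraph.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], truth: Hashable) -> float:
    """Return 1/rank of `truth` in `ranked` (1-indexed), or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == truth:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    queries: Iterable[Tuple[Sequence[Hashable], Hashable]]
) -> float:
    """Average the reciprocal rank over (ranked_suggestions, truth) pairs."""
    queries = list(queries)
    if not queries:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(r, t) for r, t in queries) / len(queries)


# Hypothetical suggestion lists for three completion requests.
example = [
    (["osc~", "dac~", "metro"], "osc~"),   # correct answer at rank 1 -> 1.0
    (["metro", "dac~", "osc~"], "dac~"),   # correct answer at rank 2 -> 0.5
    (["print", "bang", "float"], "dac~"),  # correct answer missing   -> 0.0
]
print(mean_reciprocal_rank(example))  # 0.5
```

An MRR of 0.5 corresponds to the correct answer sitting, in the harmonic-mean sense, at rank 2, which is consistent with the abstract reading an MRR of 0.57 as "within the top 2-3 suggestions".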

    Application of Natural Language Processing and Information Retrieval in Two Software Engineering Tools

    Full text link
    Many software engineering problems have traditionally been approached by applying techniques based on static analysis and fixed sets of rules. I created two novel techniques to tackle three software engineering problems: typo location, fix suggestion, and crash report bucket creation. Unlike previous techniques based on static analysis or a fixed set of rules, these techniques are based on methods commonly used to handle natural language artifacts. Existing tools and previous work typically try to be general and work with any valid program or theoretically possible output. In contrast, this thesis builds on prior work that successfully applied NLP models to code to improve code completion in an IDE (Integrated Development Environment). It continues in that vein and presents tools that focus on the code that programmers actually write and the crashes that actually occur. First, I applied natural-language models to locate errors in source code that cause the code to fail to compile or to raise an error when it runs. Language models can adapt to coding styles and idioms. My co-authors and I showed that a tool using an n-gram model of previously successfully compiled code could supplement the error locations reported by the Java compiler. Using our tool to suggest a location after each error message produced by the Java compiler resulted in an MRR score 11-40% closer to a perfect score than the Java compiler's score. Then, my co-authors and I showed that a similar approach also worked with the Python interpreter, though it faced significantly more challenges. When combined with the Python interpreter's error messages, our approach correctly located an additional 9-23% of tested typos introduced by mutation. Next, my co-authors and I showed that the technique still worked in a more restricted offline setting.
    In addition, we showed that the approach could also accurately suggest changes to repair around a third of typos made by students. I also applied the TF-IDF representation and distance function to the task of bucketing (clustering) software crash reports. In all cases, performance (in terms of F1-score) matched or beat commonly used rule-based techniques. The TF-IDF-driven approach can adapt automatically to patterns in crash reports as they evolve. Additionally, several side benefits arose from using statistical techniques. Some errors in source code can be automatically repaired using a language model. Patterns in crash metadata can be extracted easily using a bag-of-words approach with a suitable tokenizer. This thesis’s results encourage research on applying on-line, off-the-shelf algorithms or models initially developed for natural-language artifacts to programming languages and other software artifacts. These results do not guarantee that such uses will be successful, but they do indicate that they should, at least, be considered.
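
To make the crash-report bucketing idea concrete, the following is a rough sketch of TF-IDF vectors compared by cosine similarity, using only the Python standard library. Everything specific here is an assumption made for illustration: the whitespace tokenizer, the smoothed IDF formula, the greedy first-match bucketing, the 0.5 threshold, and the sample reports. None of it is taken from the thesis, which does not describe its implementation in this abstract.

```python
import math
from collections import Counter


def tfidf_vectors(reports):
    """Map each report to a sparse {token: weight} TF-IDF vector."""
    # Assumption: lowercase + whitespace split stands in for a "suitable tokenizer".
    tokenized = [report.lower().split() for report in reports]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bucket_reports(reports, threshold=0.5):
    """Greedily group report indices; a bucket's first member represents it."""
    vectors = tfidf_vectors(reports)
    buckets = []
    for i, vec in enumerate(vectors):
        for bucket in buckets:
            if cosine(vec, vectors[bucket[0]]) >= threshold:
                bucket.append(i)
                break
        else:
            buckets.append([i])  # no similar bucket found: start a new one
    return buckets


# Hypothetical crash reports, invented for this example.
reports = [
    "NullPointerException at com.app.Parser.parse",
    "NullPointerException at com.app.Parser.parse line 42",
    "OutOfMemoryError at com.app.Cache.load",
]
print(bucket_reports(reports))  # [[0, 1], [2]]
```

Because the IDF weights are recomputed from whatever reports are present, tokens that become common across new crashes lose weight automatically, which is one way to read the abstract's remark that the approach adapts as crash reports evolve.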

    Classification and Analysis of 12-Lead Electrocardiograms

    Full text link
    The electrocardiogram is the standard tool for detecting cardiac abnormalities, such as atrial fibrillation, irregular complexes, and heart blocks. However, the interpretation of this data is an unsolved problem, with discrepancies among panels of cardiologists and automated analysis requiring additional human over-reading. This thesis explores the classification of 12-lead ECGs into a set of 27 diagnoses as defined in the PhysioNet/CinC 2020 Challenge. I propose three approaches, starting with manual feature engineering and classification using shallow gradient boosted tree ensembles. The second approach uses deep learning, combining fixed- and variable-length autoencoders to learn the features, followed by a multilayer perceptron (MLP) classifier. The third approach combines the deep autoencoders and the shallow decision tree ensembles by training the shallow gradient boosted trees on both the manually extracted features and the bottleneck dimension representation of the 12-lead ECG record. I empirically evaluate these approaches with a weighted classification scoring function, using repeated random subsampling of the publicly available challenge dataset. The best model, using the averaged top 1000 manually engineered features with autoencoder embeddings, attains a mean test split challenge metric of 0.4366 with an overall mean classification accuracy of 30.7%. The thesis concludes with future ways to approach the multi-channel signal classification problem that address some of the limitations discovered in these approaches.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
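
The difference between the two counting schemes can be sketched in a few lines. This is an illustrative reading only, not the study's procedure: the function name, the `max_authors` parameter, and the toy data are hypothetical, and the sketch assumes that co-authors of the same cited work also count as co-cited, a choice the abstract does not settle.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs cited together by the same citing paper.

    citing_papers: list of reference lists; each reference is a list of
    author names in byline order. Only the first `max_authors` names of
    each reference are considered (1 = traditional first-author counting).
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["C", "D"], ["A"]],
]
first_author_only = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
print(dict(first_author_only))  # {('A', 'C'): 2}
print(len(first_five))          # 5 distinct author pairs
```

In this toy case, first-author counting sees a single pair, while counting up to five authors brings in B and D, who never appear as first authors.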

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
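
The abstract does not say why the Pearson correlation is unsatisfactory, and the sketch below does not reproduce the paper's argument. It only illustrates one mathematical difference between the two named measures: appending authors who are cocited with neither profile (extra zeros) changes the Pearson correlation but leaves the cosine unchanged. The profiles are hypothetical numbers chosen for the example.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length, non-zero profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation: the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors over four other authors.
x = [3, 0, 2, 1]
y = [0, 4, 1, 2]

# The same profiles after adding six authors cocited with neither of them.
x_padded = x + [0] * 6
y_padded = y + [0] * 6

print(cosine(x, y), cosine(x_padded, y_padded))    # identical values
print(pearson(x, y), pearson(x_padded, y_padded))  # clearly different values
```

The zeros add nothing to the dot product or the norms, so the cosine cannot move; they do shift the means, so the centred profiles, and with them the Pearson correlation, change.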

    Green Mining: a Methodology of Relating Software Change and Configuration to Power Consumption

    Full text link
    Power consumption is becoming more and more important with the increased popularity of smartphones, tablets and laptops. The threat of reducing a customer’s battery life now hangs over the software developer, who asks, “will this next change be the one that causes my software to drain a customer’s battery?” One solution is to detect power consumption regressions by measuring the power usage of tests, but this is time-consuming and often noisy. An alternative is to rely on software metrics that allow us to estimate the impact that a change might have on power consumption, thus relieving the developer from expensive testing. This paper presents a general methodology for investigating the impact of software change on power consumption: we relate power consumption to software changes, and then investigate the impact of OO software metrics and churn metrics on power consumption. We demonstrated that software change can affect power consumption using the Firefox web browser and the Azureus/Vuze BitTorrent client. We found evidence of a potential relationship between some software metrics and power consumption. We also investigate the effect of library versioning on the power consumption of rTorrent. In conclusion, we investigate the effect of software change on power consumption in two projects, and we provide an initial investigation of the impact of software metrics on power consumption.

    Orchestrating Your Cloud Orchestra

    No full text
    Cloud computing potentially ushers in a new era of computer music performance with exceptionally large computer music instruments consisting of tens to hundreds of virtual machines, which we propose to call a ‘cloud-orchestra’. Cloud computing allows for the rapid provisioning of resources, but deploying such a complicated and interconnected network of software synthesizers in the cloud requires a lot of manual work, system administration knowledge, and developer/operator skills. This is a barrier to computer musicians whose goal is to produce and perform music, not to administer hundreds of computers. This work discusses the issues facing cloud-orchestra deployment and offers an abstract solution and a concrete implementation. The abstract solution is to generate cloud-orchestra deployment plans by allowing computer musicians to model their network of synthesizers and to describe their resources. A model optimizer will compute near-optimal deployment plans to synchronize, deploy, and orchestrate the start-up of a complex network of synthesizers deployed to many computers. This model-driven development approach frees computer musicians from much of the hassle of deployment and allocation. Computer musicians can focus on the configuration of musical components and leave the resource allocation up to the modelling software to optimize.