1,721,068 research outputs found

GHALogs: Large-scale dataset of GitHub actions runs

Author: Moriconi Florent; Durieux, Thomas; Falleri, Jean-Rémy; Troncy, Raphaël; Francillon, Aurélien
Publication date: 2025

Coding practices : from documentation to detection

Author: Latappy Corentin
Publication date: 19/06/2024

Coding practices are increasingly used in software development. Applying them helps ensure the maintainability, readability, and consistency of code, which contributes greatly to software quality. Most of these practices are implemented in static analysis tools, or linters, which automatically alert developers when a practice is not followed. However, a growing number of organizations create their own internal practices and run into problems with how well developers understand and adopt them. First, for a practice to be applied, developers must understand it, which requires properly written documentation. Yet the documentation of practices has been little studied in the scientific literature. Second, to promote adoption, existing analysis tools would need to be extended with new practices, which is difficult given the expertise these modifications require. Packmind, a company based in Bordeaux, develops a solution that helps developers bring out these internal practices through workshops. However, it suffers from the same issues. In this thesis, we first focused on providing recommendations to the authors of practice documentation. To do so, we analyzed the documentation of more than 100 rules from 16 different linters to extract a taxonomy of documentation objectives and content types. We then surveyed developers to assess their expectations of documentation. This notably showed that the reasons why a practice should be applied are very rarely documented, even though developers perceive them as essential. Next, we studied the feasibility of automatically identifying practice violations from examples. Because our context requires detecting internal practices for which few training examples are available, we applied transfer learning to the CodeBERT machine learning model. We show that the resulting models perform well in an experimental setting, but that accuracy collapses when they are applied to real code bases.

Theses.fr

Support for the execution of jupyter notebooks in educational environments

Author: Casseau Christophe
Publication date: 20/06/2024

Notebooks have become essential tools in the field of data science. Introduced in the 1980s with software such as Mathematica and inspired by Knuth’s concept of literate programming, they became widely popular with the Jupyter project in 2014. They have transformed how scientists communicate their ideas by combining executable code in a wide variety of programming languages, visualizations, and textual explanations in a single interactive document. They have also gained popularity in education, for example with the CANDYCE program launched by the French government in 2021. This program encourages the use of the Jupyter environment in teaching digital sciences at all levels, from primary to higher education, by offering educational notebooks, which are at the heart of this thesis. In this educational context, despite their undeniable advantages, notebooks also present significant challenges, particularly regarding reproducibility and their execution model. Educational notebooks embed a pedagogical activity whose textual instructions guide students through the tasks to be completed. The teacher then attempts to reproduce the students’ results by following a mostly linear order. Reproducibility of results is a promise of notebooks, but several studies have revealed difficulties in achieving it, which calls for approaches that support users in creating reproducible notebooks. In addition, the flexible execution model of notebooks allows students to execute code cells in a different order than the one intended by the teacher, potentially leading to errors and/or misleading results. In this thesis, we address these two challenges: the reproducibility of results and the execution of educational notebooks. Our goal is to propose two language-agnostic approaches to assist students (i) in reaching reproducible results under a linear, top-down execution model and (ii) in executing a notebook that contains a scenario, i.e., instructions related to its execution. To tackle these challenges, we developed two tools directly integrated into the JupyterLab environment: NORM and MOON. Experiments conducted with C.P.G.E. and first-year university students showed a clear improvement on both challenges without hindering student learning.
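To make the execution-order problem described in this abstract concrete, here is a minimal Python sketch that checks whether the code cells of a notebook were run top-down. It is not NORM or MOON, whose internals the abstract does not describe. It only assumes the standard notebook JSON layout, where each code cell carries an `execution_count` field, and the function name `executed_top_down` is hypothetical.

```python
def executed_top_down(notebook: dict) -> bool:
    """Return True if the executed code cells have strictly increasing
    execution counts when read from top to bottom."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    # Cells that were never run have a null (None) execution count; skip them.
    executed = [c for c in counts if c is not None]
    return all(a < b for a, b in zip(executed, executed[1:]))


# Two tiny hand-written notebooks (illustrative data, not from the thesis).
linear_nb = {"cells": [
    {"cell_type": "markdown", "source": "Instructions"},
    {"cell_type": "code", "execution_count": 1, "source": "x = 1"},
    {"cell_type": "code", "execution_count": 2, "source": "y = x + 1"},
]}
shuffled_nb = {"cells": [
    {"cell_type": "code", "execution_count": 2, "source": "x = 1"},
    {"cell_type": "code", "execution_count": 1, "source": "y = x + 1"},
]}

linear_ok = executed_top_down(linear_nb)
shuffled_ok = executed_top_down(shuffled_nb)
```

A saved `.ipynb` file can be loaded with `json.load` and passed to the same function. A strictly increasing sequence is only a necessary sign of linear execution, since cells may have been edited or deleted after being run.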

Theses.fr

Fine-grained, accurate and scalable source differencing

Author: Falleri Jean-Rémy
Martinez Matias
Publication date: 14/04/2024

Understanding code changes is of crucial importance in a wide range of software evolution activities. The traditional approach is to use textual differencing, as done with success since the 1970s with the ubiquitous diff tool. However, textual differencing has the important limitation of not aligning the changes to the syntax of the source code. To overcome this limitation, structural (i.e. syntactic) differencing has been proposed in the literature, notably GumTree, which was one of the pioneering approaches. The main drawback of GumTree's algorithm is the use of an optimal but expensive tree-edit distance algorithm that makes it difficult to diff large ASTs. In this article, we describe a less expensive heuristic that enables GumTree to scale to large ASTs while yielding results of better quality than the original GumTree. We validate this new heuristic against 4 datasets of changes in two different languages, where we generate edit scripts with a median size 50% smaller and a total speedup of the matching time between 50x and 281x. CCS Concepts: Software and its engineering → Software maintenance tools; Software configuration management and version control systems.
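The limitation of textual differencing mentioned in this abstract can be illustrated with a small Python sketch. It is not GumTree's algorithm or its heuristic. It only contrasts a line-based diff from the standard `difflib` module with a comparison of the parsed syntax trees from the standard `ast` module, on two hypothetical versions of a function that differ in layout only.

```python
import ast
import difflib

# Two versions of the same function: only whitespace and redundant
# parentheses differ (illustrative snippets, not taken from the paper).
before = "def area(w, h):\n    return w * h\n"
after = "def area(w,\n         h):\n    return (w * h)\n"

# Textual differencing: keep the added/removed lines, skip the file headers.
text_changes = [
    line
    for line in difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Syntactic comparison: ast.dump omits line and column positions by default,
# so two sources with the same structure produce identical dumps.
same_syntax = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))

print(len(text_changes), "changed lines in the textual diff")
print("same syntax tree:", same_syntax)
```

The textual diff reports changed lines even though the two syntax trees are identical. A structural differencing tool goes further than this equality check by computing an edit script between two trees that do differ.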

Portail HAL U-Bordeaux

A grounded theory of Community Package Maintenance Organizations-Registered Report

Author: Zimmermann Théo
Falleri Jean-Rémy
Publication date: 27/09/2021

a) Context: In many programming language ecosystems, developers rely more and more on external open source dependencies, made available through package managers. Key ecosystem packages that go unmaintained create a health risk for the projects that depend on them and for the ecosystem as a whole. Therefore, community initiatives can emerge to alleviate the problem by adopting packages in need of maintenance. b) Objective: The goal of our study is to explore such community initiatives, which we will refer to from now on as Community Package Maintenance Organizations (CPMOs), and to build a theory of how and why they emerge, how they function, and their impact on the surrounding ecosystems. c) Method: To achieve this, we plan on using a qualitative methodology called Grounded Theory. We have begun applying this methodology by relying on "extant" documents originating from several CPMOs. We present our preliminary results and the research questions that have emerged. We plan to answer these questions by collecting appropriate data (theoretical sampling), in particular by contacting CPMO participants and questioning them through e-mails, questionnaires, or semi-structured interviews. d) Impact: Our theory should inform developers willing to launch a CPMO in their own ecosystem and help current CPMO participants better understand the state of the practice and what they could do better.

INRIA a CCSD electronic archive server

HAL Descartes

A grounded theory of Community Package Maintenance Organizations-Registered Report

Author: Zimmermann Théo
Falleri Jean-Rémy
Publication date: 27/09/2021

Portail HAL U-Bordeaux

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
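The two counting schemes compared in this abstract can be sketched in Python. The abstract does not give the exact counting rule, so this sketch makes its own simplifying assumptions: each pair of authors is counted at most once per citing paper, and co-authors of the same cited work count as co-cited. The function name `cocitation_counts` and the author labels A to D are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors):
    """Count author pairs cited together by the same paper.

    citing_papers: list of papers, each a list of cited works,
                   each cited work an ordered list of author names.
    max_authors:   how many leading authors of each cited work to consider
                   (1 = first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for cited_works in citing_papers:
        authors = {a for work in cited_works for a in work[:max_authors]}
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Illustrative data: two citing papers, each citing two works.
papers = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["D", "C"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)

print(first_author[("A", "C")], first_five[("A", "C")])
```

With first-author counting, author C is invisible in the second paper because C is the second author of the cited work. Counting the first five authors recovers that link.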

E-LIS

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who propels discourses rather than expressing opinions, an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository

Contributions to the use of code clone detectors in software maintenance tasks

Author: Charpentier Alan
Publication date: 17/10/2016

The existence of several copies of the same code fragment (called code clones in the literature) in a piece of software can complicate its maintenance and evolution. Code duplication can lead to consistency problems, especially when bug fixes are propagated. Code clone detection is therefore a major concern for maintaining and improving software quality, which is essential to the success of a piece of software. The general objective of this thesis is to contribute to the use of code clone detectors in software maintenance tasks. We focused our contributions on two research topics. The first is the methodology used to compare and assess code clone detectors, i.e. clone benchmarks. We performed an empirical assessment of a clone benchmark and found that the results derived from it are not reliable. We also identified recommendations for constructing more reliable clone benchmarks. The second is the adaptation of code clone detectors to software maintenance tasks. We developed an approach specialized for one language and one task (refactoring) that allows developers to identify and remove code duplication in their software. We conducted case studies with domain experts to evaluate our approach.

Theses.fr

Oskar Bordeaux