What if a bug has a Different Origin?: Making Sense of Bugs Without an Explicit Bug Introducing Change

Author: Gonzalez-Barahona Jesus M.
Serebrenik Alexander
Rodriguez Perez G.
Robles Gregorio
Zaidman A.E.
Publication date: 01/01/2018

Background: Many studies in the software research literature on bug fixing are built upon the assumption that "a given bug was introduced by the lines of code that were modified to fix it", or variations of it. Although this assumption seems very reasonable at first glance, there is little empirical evidence supporting it. A careful examination shows that there are other possible sources of bug introduction, such as modifications to those lines that happened before the last change, and changes external to the piece of code being fixed. Goal: We aim to understand the complex phenomenon of bug introduction and bug fixing. Method: We design a preliminary approach that distinguishes between bug-introducing commits (BICs) and first failing moments (FFMs). We apply this approach to Nova and ElasticSearch, two large and well-known open source software projects. Results: Our initial results show that at least 24% of bug fixes in Nova and 10% in ElasticSearch were not caused by a BIC but by co-evolution, compatibility issues, or bugs in external APIs. Only 26-29% of BICs can be found using the algorithm based on the assumption that "a given bug was introduced by the lines of code that were modified to fix it". Conclusions: The approach also allows for a better framing of comparisons between automatic methods for finding bug-inducing changes. Our results indicate that more attention should be paid to whether a bug has been introduced and, if so, when it was introduced.
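The quoted assumption is the one behind SZZ-style algorithms: take the lines that the fix modified or deleted, look up which commits last touched them, and treat those commits as bug-introducing. A minimal sketch of that heuristic (not the authors' BIC/FFM approach), assuming the blame of the fix's parent version is already available as a mapping from line number to commit ID; `candidate_bics` and its parameters are hypothetical names:

```python
from collections import Counter


def candidate_bics(blame_before_fix, lines_changed_by_fix):
    """Return commits that last touched the lines a fix modified or deleted.

    blame_before_fix: dict mapping line number -> commit ID that last changed
        that line in the parent of the fixing commit (e.g. taken from blame).
    lines_changed_by_fix: iterable of line numbers (in the parent version)
        that the fixing commit modified or deleted.
    """
    # Count how many of the fixed lines each commit is blamed for;
    # lines missing from the blame mapping are ignored.
    counts = Counter(
        blame_before_fix[n] for n in lines_changed_by_fix if n in blame_before_fix
    )
    # Most frequently blamed commits first; ties keep first-seen order.
    return [commit for commit, _ in counts.most_common()]


blame = {1: "a1", 2: "b2", 3: "b2", 4: "c3"}
print(candidate_bics(blame, [2, 3, 4]))
```

Because the sketch returns a commit whenever the fix touched previously existing lines, it cannot express the case the abstract highlights, where the bug has no bug-introducing commit at all.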

TU Delft Repository

Dataset of Author Names and Name Frequencies

Author: Karnauch Andrey
Dey Tapajit
Fry Tanner
Mockus Audris
Publication date: 06/02/2020

This file is a gzipped, semicolon-separated text file containing the block ID, the frequency of the first name (the number of times it appears among the 38M author IDs in World of Code (WoC) version Q), the frequency of the last name, the full name, the email, and the Author ID. The largest block contains 993 Author IDs.
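The field order above maps directly onto a small reader. A minimal sketch, assuming UTF-8 text, exactly six fields per line in the stated order, and no semicolons inside names or emails (the description does not say how such cases are escaped); `NameRecord`, `parse_records`, `load`, and `largest_block_size` are hypothetical names:

```python
import csv
import gzip
from collections import Counter
from typing import Iterable, NamedTuple


class NameRecord(NamedTuple):
    block_id: str
    first_name_freq: int
    last_name_freq: int
    full_name: str
    email: str
    author_id: str


def parse_records(lines: Iterable[str]) -> list[NameRecord]:
    """Parse semicolon-separated lines in the field order described above."""
    records = []
    # QUOTE_NONE: treat quote characters in names as literal text.
    for row in csv.reader(lines, delimiter=";", quoting=csv.QUOTE_NONE):
        if len(row) != 6:
            continue  # skip malformed rows (assumption: exactly six fields)
        block, first_f, last_f, name, email, author = row
        records.append(
            NameRecord(block, int(first_f), int(last_f), name, email, author)
        )
    return records


def load(path: str) -> list[NameRecord]:
    # Text mode makes gzip.open yield decoded lines that csv.reader accepts.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return parse_records(fh)


def largest_block_size(records: list[NameRecord]) -> int:
    """Number of Author IDs in the largest block (0 for no records)."""
    sizes = Counter(r.block_id for r in records)
    return max(sizes.values(), default=0)
```

If these assumptions hold for the published file, `largest_block_size(load(path))` should report the 993 Author IDs stated above.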

ZENODO

The Francis Crick Institute

Dataset of Author Names and Name Frequencies

Author: Karnauch Andrey
Dey Tapajit
Fry Tanner
Mockus Audris
Publication date: 09/03/2020

This file is a gzipped, semicolon-separated text file containing the block ID, the frequency of the first name (the number of times it appears among the 38M author IDs in World of Code version Q), the frequency of the last name, the full name, the email, and the Author ID. The largest block contains 993 Author IDs. The email addresses and Author IDs of individual authors have been replaced by their corresponding SHA1 values for privacy reasons.
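Replacing a value by its SHA1 digest keeps records joinable (the same email always maps to the same digest) without exposing the raw string. A minimal sketch with Python's `hashlib`, assuming the values were hashed as UTF-8 without any normalization (the description does not say), so digests computed this way are not guaranteed to match the published file; `pseudonymize` and `pseudonymize_record` are hypothetical helpers:

```python
import hashlib


def pseudonymize(value: str) -> str:
    """Hex SHA1 digest of a UTF-8 string, used in place of the raw value."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def pseudonymize_record(full_name: str, email: str, author_id: str):
    # Hypothetical helper: keeps the name, hashes the email and the Author ID.
    return full_name, pseudonymize(email), pseudonymize(author_id)
```

Because the hash is unsalted, anyone holding a candidate email can hash it and look it up, so this is pseudonymization rather than anonymization.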

ZENODO

A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits

Author: Mockus Audris
Publication date: 07/02/2020

A dataset accompanying the submission to the MSR'20 Data Showcase.
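The relation behind this dataset is that two repositories are related when they share commits. The simplest way to group such repositories is to take connected components of the "shares a commit" graph, which the sketch below does with union-find. This is a stand-in, not the community detection approach named in the title: connected components can chain otherwise unrelated clusters together through a single shared commit, which is one reason a community detection method may be preferred. The input is assumed to be a mapping from repository name to its set of commit IDs; `group_by_shared_commits` is a hypothetical name:

```python
from collections import defaultdict


def group_by_shared_commits(repo_commits: dict[str, set[str]]) -> list[set[str]]:
    """Group repositories into connected components of the 'shares a commit' graph."""
    parent = {repo: repo for repo in repo_commits}

    def find(x: str) -> str:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_repo_with: dict[str, str] = {}  # commit ID -> first repo seen with it
    for repo, commits in repo_commits.items():
        for c in commits:
            if c in first_repo_with:
                union(first_repo_with[c], repo)  # shared commit links the repos
            else:
                first_repo_with[c] = repo

    groups: dict[str, set[str]] = defaultdict(set)
    for repo in repo_commits:
        groups[find(repo)].add(repo)
    return list(groups.values())
```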

ZENODO

data for "The Role of Data Filtering in Open Source Software Ranking and Selection"

Author: Mockus Audris
Publication date: 20/04/2024

ZENODO

Who Will Stay in the FLOSS Community? Modeling Participant's Initial Behavior

Author: Zhou Minghui
Mockus Audris
Publication date: 01/01/2015

Motivation: To survive and succeed, FLOSS projects need contributors able to accomplish critical project tasks. However, such tasks require the extensive project experience of long-term contributors (LTCs). Aim: We measure, understand, and predict how newcomers' involvement and environment in the issue tracking system (ITS) affect their odds of becoming an LTC. Method: ITS data from Mozilla and Gnome, the literature, interviews, and online documents were used to design measures of involvement and environment. A logistic regression model was used to explain and predict contributors' odds of becoming an LTC. We also reproduced the results on new data provided by Mozilla. Results: We constructed nine measures of involvement and environment based on events recorded in an ITS. Macro-climate is the overall project environment, while micro-climate is person-specific and varies among participants. Newcomers for whom at least one issue they reported in the first month was fixed doubled their odds of becoming an LTC. A macro-climate with high project popularity and a micro-climate with low attention from peers reduced the odds. The precision of LTC prediction was 38 times higher than that of a random predictor. We were able to reproduce the results with new Mozilla data without losing the significance or predictive power of the previously published model. We encountered unexpected changes in some attributes and suggest ways to make analysis of ITS data more reproducible. Conclusions: The findings suggest the importance of the initial behaviors and experiences of new participants, and outline empirically based approaches to help communities recruit contributors for long-term participation and to help participants contribute more effectively.
To facilitate the reproduction of the study and of the proposed measures in other contexts, we provide the data we retrieved and the scripts we wrote at https://www.passion-lab.org/projects/developerfluency.html.
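In a logistic regression, a coefficient β multiplies the odds by e^β, so "doubled their odds" corresponds to β = ln 2, and doubling the odds is not the same as doubling the probability. Likewise, a random predictor's expected precision equals the base rate, so "38 times higher" is a ratio (lift) of precision to base rate. A small sketch of both conversions; the function names and the 10% baseline in the usage note are hypothetical, not figures from the study:

```python
import math


def prob_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    return odds / (1.0 + odds)


def apply_odds_ratio(p: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by odds_ratio."""
    return odds_to_prob(prob_to_odds(p) * odds_ratio)


def lift(precision: float, base_rate: float) -> float:
    """Precision relative to a random predictor, whose precision equals the base rate."""
    return precision / base_rate


# A coefficient beta multiplies the odds by exp(beta),
# so "doubling the odds" corresponds to beta = ln 2.
BETA_DOUBLE = math.log(2)
```

For example, `apply_odds_ratio(0.10, 2)` gives 2/11 ≈ 0.182: a newcomer with a hypothetical 10% baseline chance would rise to about 18%, not 20%.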

Crossref

Dataset: Copy-based Reuse in Open Source Software

Author: Jahanshahi Mahmoud
Mockus Audris
Publication date: 2024

In Open Source Software (OSS), the source code and any other resources available in a project can be viewed or reused by anyone, subject to licensing restrictions that are often permissive. In contrast to existing studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS. To accomplish that, we detect copy-based reuse with an efficient algorithm that exploits the World of Code infrastructure: a curated and cross-referenced collection of nearly all open source repositories. We expect this data will enable future research and tool development that support such reuse and minimize the associated risks.
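Whole-file copies can be detected because Git identifies a file's content by the SHA1 of a "blob" header followed by the bytes, so identical files in different repositories share the same ID. A minimal sketch, assuming that an identical blob ID in two projects indicates reuse and that the project with the earliest commit containing the blob is its origin; this is not the authors' World of Code algorithm, and `find_copied_blobs` and its input format are hypothetical:

```python
import hashlib
from collections import defaultdict


def git_blob_sha1(content: bytes) -> str:
    """SHA1 that Git assigns to a file's content (a 'blob' object)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def find_copied_blobs(commits):
    """commits: iterable of (project, commit_time, file_content_bytes).

    Returns {blob_sha1: (presumed_origin_project, sorted other projects)}
    for blobs that appear in more than one project.
    """
    first_seen = {}              # blob -> (time, project) of earliest sighting
    projects = defaultdict(set)  # blob -> every project containing it
    for project, time, content in commits:
        blob = git_blob_sha1(content)
        projects[blob].add(project)
        if blob not in first_seen or time < first_seen[blob][0]:
            first_seen[blob] = (time, project)

    result = {}
    for blob, ps in projects.items():
        if len(ps) > 1:
            origin = first_seen[blob][1]
            result[blob] = (origin, sorted(ps - {origin}))
    return result
```

In practice, trivial blobs (the empty file, common license texts) match across many projects and would need filtering before a match is treated as reuse.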

University of Tennessee, Knoxville: Trace

Special Issue on Maintenance and Metrics

Publication date: 2006

King's Research Portal

Bot-NonBot-Commit-Msg

Author: Dey Tapajit
Mockus Audris
Publication date: 21/09/2020

This dataset contains the commit messages of 13,150 bots and 13,150 human developers, as part of the data used in https://dl.acm.org/doi/10.1145/3379597.3387478. For privacy reasons, the developer identities have been replaced with their corresponding SHA1 values. The format of the data is: ; ;commit message; <whether the developer is a bot or non-bot>

ZENODO

Leveraging Risk Models to Improve Productivity for Effective Code Un-Freeze at Scale

Author: Abreu Rui
Mockus Audris
Publication date: 2025

Changing software is essential to add needed functionality and to fix problems, but changes may introduce defects that lead to outages. This motivates one of the oldest software quality control techniques: code freeze, a temporary prevention of non-critical changes to the codebase. Despite its widespread use in practice, the research literature on it is scant. Historically, code freezes were used to improve software quality by preventing changes during the periods before software releases, but code freezes significantly slow down development. To address this shortcoming, we develop and evaluate a family of code un-freeze (permitting changes) strategies tailored to different occasions and products at Meta. They are designed to un-freeze the maximum amount of code without compromising quality. The three primary dimensions of un-freezing involve a) the exact timing of the code freezes (and the reasoning behind it), b) the parts of the organization or the codebase to which the code freeze is applied, and c) the method of screening code diffs during the code freeze, with the aim of allowing low-risk diffs and preventing only the riskiest ones. To operationalize the drivers of outages, we consider the entire network of interdependencies among different parts of the source code, the engineers who modify the code, code complexity, coordination dependencies, and authors' expertise. Since the code freeze is a balancing act between reducing outages and allowing software development to proceed unimpeded, the performance of the various approaches to code un-freeze is evaluated with two measures: the fraction of flagged/gated changes, which measures overhead, and the fraction of all outage-causing changes contained within the flagged set, which measures the ability of the code un-freeze to delay (or prevent) outages.
We found that by taking into account the risk posed by modifying individual files and the properties of the change, we could un-freeze two and 2.5 times more changes, respectively. The change-level model is used by Meta in production. For example, during the winter 2023 code freeze, only 16% of changes were gated. Although 42% more changes landed (were integrated into the codebase) than in the prior year, there was a 52% decrease in outages. This reduction meant less impact on users and less strain on engineers during the holiday period. The risk model has been enormously effective at allowing low-risk changes to proceed while gating high-risk changes and reducing outages.
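The two evaluation measures described in this abstract can be computed from per-change risk scores and outage labels. A minimal sketch, assuming a simple rule that gates every change whose (hypothetical) risk score is at or above a threshold; Meta's risk models are not reproduced, and `gate_metrics` is a hypothetical name:

```python
from typing import Sequence


def gate_metrics(
    risk_scores: Sequence[float],
    caused_outage: Sequence[bool],
    threshold: float,
) -> tuple[float, float]:
    """Gate changes whose risk score is >= threshold.

    Returns (gated_fraction, outage_capture):
      gated_fraction = share of all changes that are gated (overhead)
      outage_capture = share of outage-causing changes inside the gated set
    """
    if len(risk_scores) != len(caused_outage) or not risk_scores:
        raise ValueError("need equal-length, non-empty inputs")
    gated = [s >= threshold for s in risk_scores]
    n_gated = sum(gated)
    n_outage = sum(caused_outage)
    n_caught = sum(g and o for g, o in zip(gated, caused_outage))
    gated_fraction = n_gated / len(risk_scores)
    # Capture is undefined when no change caused an outage.
    outage_capture = n_caught / n_outage if n_outage else float("nan")
    return gated_fraction, outage_capture
```

Lowering the threshold raises both numbers: more outage-causing changes are contained, at the cost of gating more changes. Sweeping the threshold and reading off the capture at a target gated fraction is one way to compare candidate models.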

University of Tennessee, Knoxville: Trace