    The Impact of Dormant Defects on Defect Prediction: A Study of 19 Apache Projects

    Defect prediction models can help prioritize testing, analysis, or code review activities, and have been the subject of substantial effort in academia and of some applications in industrial contexts. A necessary precondition for creating a defect prediction model is the availability of defect data from the history of projects. If this data is noisy, the resulting defect prediction model could be unreliable. One cause of noise in defect datasets is the presence of "dormant defects," i.e., defects discovered several releases after their introduction. This can cause a class to be labeled as defect-free when it is not; such a class is said to be "snoring." In this article, we investigate the impact of snoring on classifiers' accuracy and the effectiveness of a possible countermeasure, i.e., dropping overly recent data from a training set. We analyze the accuracy of 15 machine learning defect prediction classifiers on data from more than 4,000 defects and 600 releases of 19 open source projects from the Apache ecosystem. Our results show that, on average across projects, (i) the presence of dormant defects decreases the recall of defect prediction classifiers, and (ii) removing from the training set the classes that are labeled as not defective in the last release significantly improves the accuracy of the classifiers. In summary, this article provides insights on how to create defect datasets that mitigate the negative effect of dormant defects on defect prediction.

    Snoring: A noise in defect prediction datasets

    In order to develop and train defect prediction models, researchers rely on datasets in which a defect is often attributed to the release where the defect itself is discovered. However, in many circumstances, a defect is only discovered several releases after its introduction. This might introduce a bias in the dataset, i.e., treating the intermediate releases as defect-free and the latter as defect-prone. We call this phenomenon 'sleeping defects'. We call 'snoring' the phenomenon in which classes affected only by sleeping defects are treated as defect-free until the defect is discovered. In this paper we analyze, on data from 282 releases of six open source projects from the Apache ecosystem, the magnitude of the sleeping defects and of the snoring classes. Our results indicate that 1) on all projects, most of the defects in a project slept for more than 20% of the existing releases, and 2) in the majority of the projects the missing rate is more than 25% even if we remove 50% of releases.
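The distinction between sleeping defects and snoring classes can be illustrated with a minimal sketch. It is not taken from the paper: the `Defect` record, the integer release indices, and the simplification that a defect affects its class in every release from introduction through discovery are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    # All fields are hypothetical names chosen for this sketch.
    cls: str         # identifier of the affected class
    introduced: int  # index of the release that introduced the defect
    discovered: int  # index of the release in which it was discovered


def actually_defective(defects, release):
    """Classes affected by at least one defect in `release`.

    Assumption: a defect affects its class from the release that
    introduced it through the release in which it was discovered.
    """
    return {d.cls for d in defects if d.introduced <= release <= d.discovered}


def observed_defective(defects, release, observed_at):
    """Classes labeled defective in `release` when the dataset is built at
    `observed_at`, i.e. using only the defects discovered by then."""
    return {
        d.cls
        for d in defects
        if d.introduced <= release <= d.discovered and d.discovered <= observed_at
    }


def snoring_classes(defects, release, observed_at):
    """Classes that are defective in `release` but still labeled defect-free,
    because every defect affecting them is sleeping at `observed_at`."""
    return actually_defective(defects, release) - observed_defective(
        defects, release, observed_at
    )
```

Under these assumptions, a class stops snoring as soon as any one of the defects affecting it has been discovered, which matches the "sleeping defects only" wording of the abstract.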

    Relationship between design patterns defects and crosscutting concern scattering degree: an empirical study

    Design patterns are solutions to recurring design problems, aimed at increasing reuse, code quality and, above all, maintainability and resilience to changes. Despite such advantages, the usage of design patterns implies the presence of crosscutting code implementing the pattern usage and access from other system components. When the system evolves, the presence of crosscutting code can cause repeated changes, possibly introducing defects. This paper reports an empirical study showing that, for three open source projects, the number of defects in design-pattern classes is in several cases correlated with the scattering degree of their induced crosscutting concerns, and also varies among different kinds of pattern.

    3rd International Workshop on Designing Empirical Studies: Assessing the Effectiveness of Agile Methods (IWDES 2009)

    Assessing the effectiveness of a development methodology is difficult and requires an extensive empirical investigation. Moreover, the design of such investigations is complex, since they involve several stakeholders and their validity can be questioned if they are not replicated in similar and different contexts. Agilists are aware that data collection is important and that the problem of designing and executing meaningful experiments is common. This workshop aims at creating a critical mass for the development of new and extensive investigations in the Agile world.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
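The two counting schemes compared in this abstract can be sketched as follows. This is a simplified illustration, not the study's procedure: the function name, the input layout (each citing paper as a list of cited works, each cited work as an ordered list of author names), and the choice to treat co-authors of the same cited work as co-cited with each other are assumptions.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count, for every pair of authors, how many citing papers cite both.

    citing_papers: list of reference lists; each reference is an ordered
                   list of author names (hypothetical input layout).
    max_authors:   1 gives first-author counting; 5 considers the first
                   five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            # Keep only the leading authors of each cited work.
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

Calling the function once with `max_authors=1` and once with `max_authors=5` on the same input yields the two co-citation matrices whose mappings would then be compared.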

    An Empirical Study on the Maintenance of Source Code Clones

    Code cloning has very often been indicated as a bad software development practice. However, many studies in the literature indicate that this is not always the case. In fact, either changes occurring in cloned code are consistently propagated, or cloning is used as a sort of templating strategy, where cloned source code fragments evolve independently. This paper (a) proposes an automatic approach to classify the evolution of source code clone fragments, and (b) reports a fine-grained analysis of clone evolution in four different Java and C software systems, aimed at investigating to what extent clones are consistently propagated or evolve independently. The paper also investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size and the kind of change the clones underwent, i.e., corrective maintenance or enhancement.

    Self-Admitted Technical Debt Removal and Refactoring Actions: Co-Occurrence or More?

    Technical Debt (TD) concerns the lack of an adequate solution in a software project, from its design to the source code. Its admittance through comments or commit messages is referred to as Self-Admitted Technical Debt (SATD). Previous research has studied SATD from different perspectives, including its distribution, impact on software quality, and removal. In this paper, we investigate the relationship between refactorings and SATD removal. By leveraging a dataset of SATD and their removals in four open-source projects and by using an automated refactoring detection tool, we study the co-occurrence of refactorings and SATD removals. Results of the study indicate that refactorings are more likely to co-occur with SATD removals than with other commits; however, in most cases, they belong to different quality improvement activities performed at the same time.

    An empirical study on the co-occurrence between refactoring actions and Self-Admitted Technical Debt removal

    Technical Debt (TD) concerns the lack of an adequate solution in a software project, from its design to the source code. Its admittance through source code comments, issues, or commit messages is referred to as Self-Admitted Technical Debt (SATD). Previous research has studied SATD from different perspectives, including its distribution, impact on software quality, and removal. In this paper, we investigate the relationship between refactoring and SATD removal. By leveraging a dataset of SATD and their removals in four open-source projects and by using an automated refactoring detection tool, we study the co-occurrence of refactoring and SATD removals. Results of the study indicate that refactoring is more likely to co-occur with SATD removals than with other commits; however, in most cases, they belong to different quality improvement activities performed at the same time. Moreover, when looking closely at refactoring actions co-occurring with SATD removal in the same code entities, a relationship between these activities can be found. Finally, we found that both source code quality metrics and SATD removals play a statistically significant role in the likelihood that a commit applies a refactoring action.