    On the assessment of software defect prediction models via ROC curves

    Software defect prediction models are classifiers often built by setting a threshold t on a defect proneness model, i.e., a scoring function. For instance, they classify a software module as non-faulty if its defect proneness is below t and as faulty otherwise. Different values of t may lead to different defect prediction models, possibly with very different performance levels. Receiver Operating Characteristic (ROC) curves provide an overall assessment of a defect proneness model, by taking into account all possible values of t and thus all defect prediction models that can be built based on it. However, using a defect proneness model with a value of t is sensible only if the resulting defect prediction model has a performance that is at least as good as some minimal performance level that depends on practitioners’ and researchers’ goals and needs. We introduce a new approach and a new performance metric (the Ratio of Relevant Areas) for assessing a defect proneness model by taking into account only the parts of a ROC curve corresponding to values of t for which the resulting defect prediction models perform better than some reference value. We provide the practical motivations and theoretical underpinnings for our approach, by: 1) showing how it addresses the shortcomings of existing performance metrics like the Area Under the Curve and Gini’s coefficient; 2) deriving reference values based on random defect prediction policies, in addition to deterministic ones; 3) showing how the approach works with several performance metrics (e.g., Precision and Recall) and their combinations; 4) studying misclassification costs and providing a general upper bound for the cost related to the use of any defect proneness model; 5) showing the relationships between misclassification costs and performance metrics. We also carried out a comprehensive empirical study on real-life data from the SEACRAFT repository to show how our metric differs from the existing ones and how much more reliable and less misleading it can be.
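
    The threshold sweep described in this abstract can be illustrated with a minimal Python sketch. It builds a ROC curve by trying every score as threshold t, then computes the Area Under the Curve and Gini's coefficient (2 * AUC - 1). The function names and the toy scores are assumptions made for illustration; the sketch does not implement the Ratio of Relevant Areas and is not the authors' code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Sweep threshold t over all distinct scores.

    A module is classified faulty when its score is >= t, non-faulty otherwise.
    labels: 1 = actually faulty, 0 = actually non-faulty.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # t above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Point]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def gini(points: List[Point]) -> float:
    """Gini's coefficient expressed through the AUC."""
    return 2.0 * auc(points) - 1.0


# Toy data, invented for this example only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_curve = roc_points(toy_scores, toy_labels)
print(auc(toy_curve), gini(toy_curve))
```

    Both summary numbers integrate over every threshold, including those that yield classifiers nobody would deploy, which is the shortcoming the abstract points at.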

    Considerations on the region of interest in the ROC space

    Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve on the area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that φ (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.
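
    As a rough illustration of delimiting a region of interest by a metric other than the true or false positive rate, the Python sketch below converts each ROC point back into confusion-matrix counts, evaluates the Matthews Correlation Coefficient and a misclassification cost there, and keeps only the points that meet a chosen threshold. All names, the ROC points, the class sizes, and the unit costs are assumptions made for illustration; this is not the API of the authors' Python or R packages, and it does not compute their partial area.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def confusion(fpr: float, tpr: float, n_pos: int, n_neg: int):
    """Recover (TP, FP, FN, TN) from a ROC point and the class sizes."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp, fp, n_pos - tp, n_neg - fp


def mcc(tp: float, fp: float, fn: float, tn: float) -> float:
    """Matthews Correlation Coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def total_cost(fp: float, fn: float, c_fp: float, c_fn: float) -> float:
    """Misclassification cost given unit costs of false positives/negatives."""
    return c_fp * fp + c_fn * fn


def points_with_min_mcc(points: List[Point], n_pos: int, n_neg: int,
                        min_mcc: float) -> List[Point]:
    """Keep the ROC points whose classifier reaches at least min_mcc."""
    return [(fpr, tpr) for fpr, tpr in points
            if mcc(*confusion(fpr, tpr, n_pos, n_neg)) >= min_mcc]


# Hypothetical ROC points for a data set with 4 positives and 4 negatives.
toy_points = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
print(points_with_min_mcc(toy_points, n_pos=4, n_neg=4, min_mcc=0.4))
```

    The same filter works with a cost bound instead: replace the `mcc(...) >= min_mcc` test with `total_cost(...) <= max_cost` for whatever unit costs are known.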

    An empirical study on software understandability and its dependence on code characteristics

    Context: Insufficient code understandability makes software difficult to inspect and maintain and is a primary cause of software development cost. Several source code measures may be used to identify difficult-to-understand code, including well-known ones such as Lines of Code and McCabe's Cyclomatic Complexity, and novel ones, such as Cognitive Complexity. Objective: We investigate whether and to what extent source code measures, individually or together, are correlated with code understandability. Method: We carried out an empirical study with students who were asked to carry out realistic maintenance tasks on methods from real-life Open Source Software projects. We collected several data items, including the time needed to correctly complete the maintenance tasks, which we used to quantify method understandability. We investigated the presence of correlations between the collected code measures and code understandability by using several Machine Learning techniques. Results: We obtained models of code understandability using one or two code measures. However, the obtained models are not very accurate, the average prediction error being around 30%. Conclusions: Based on our empirical study, it does not appear possible to build an understandability model based on structural code measures alone. Specifically, even the newly introduced Cognitive Complexity measure does not seem able to fulfill the promise of providing substantial improvements over existing measures, at least as far as code understandability prediction is concerned. It seems that, to obtain models of code understandability of acceptable accuracy, process measures should be used, possibly together with new source code measures that are better related to code understandability.

    A Dual Language Approach to the Development of Time-Critical Systems

    Developing time-critical systems requires expressive, rigorous, easy-to-use notations to describe the time-related features of the systems, in a way that is formal enough to support and automate activities like property verification and test case generation. We propose a dual-language approach that complements the typical UML (and UML-RT) diagrams with a descriptive formalism for specifying the properties of a system and its components. This description consists of a formula of a new logic, called OTL (Object Temporal Logic), which is an extension of OCL. The approach is applied to a case study derived from the authors' industrial experiences. © 2004 Elsevier B.V. All rights reserved.

    Toward Inclusion of Children as Software Engineering Stakeholders

    Background: A growing amount of software is available to children today. Children use both software that has been explicitly developed for them and software for general users. While they obtain clear benefits from software, such as access to creativity tools and learning resources, children are also exposed to several risks and disadvantages, such as privacy violation, inactivity, or safety risks that can even lead to death. The research and development community is addressing and investigating positive and negative impacts of software for children one by one, but no comprehensive model exists that relates software engineering and children as stakeholders in their own right. Aims: The final objective of this line of research is to propose effective ways in which children can be involved in Software Engineering activities as stakeholders. Specifically, in this paper, we investigate the quality aspects that are of interest for children, as quality is a crucial aspect in the development of any kind of software, especially for stakeholders like children. Method: Our contribution is based mainly on an analysis of studies at the intersection between Software Engineering (especially software quality) and Child Computer Interaction. Results: We identify a set of qualities and a preliminary set of guidelines that can be used by researchers and practitioners in understanding the complex interrelations between Software Engineering and children. Based on the qualities and the guidelines, researchers can design empirical investigations to obtain deeper insights into the phenomenon and propose new Software Engineering knowledge specific for this type of stakeholders. Conclusions: This conceptualization is a first step towards a framework to support children as stakeholders in software engineering

    Comparing Static Analysis and Code Smells as Defect Predictors: An Empirical Study

    Background. Industrial software increasingly relies on open source software. Therefore, industrial practitioners need to evaluate the quality of a specific open source product they are considering for adoption. Automated tools greatly help assess open source software quality, by reducing the related costs, but do not provide perfectly reliable indications. Indications from tools can be used to restrict and focus manual code inspections, which are typically expensive and time-consuming, only on the code sections most likely to contain faults. Aim. We investigate the extent of the effectiveness of static analysis bug detectors by themselves and in combination with code smell detectors in guiding inspections. Method. We performed an empirical study, in which we used a bug detector (SpotBugs) and a code smell detector (JDeodorant). Results. Our results show that the selected bug detector is precise enough to justify inspecting the code it flags as possibly buggy. Applying the considered code smell detector makes predictions even more precise, but at the price of a rather low recall. Conclusions. Using the considered tools as inspection drivers proved quite useful. The relatively small size of our study does not allow us to draw universally valid conclusions, but our results should be applicable to source code of any kind, although they were obtained from open source code
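
    The precision/recall trade-off reported in this abstract can be made concrete with a small Python sketch. It treats each detector's output as a set of flagged code units and compares inspecting everything the bug detector flags with inspecting only what both detectors flag. The unit names and flag sets are invented for illustration; they are not SpotBugs or JDeodorant output and not data from the study.

```python
from typing import Set, Tuple


def precision_recall(flagged: Set[str], faulty: Set[str]) -> Tuple[float, float]:
    """Precision and recall of a set of flagged units against the faulty ones."""
    true_positives = len(flagged & faulty)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(faulty) if faulty else 0.0
    return precision, recall


# Hypothetical ground truth and detector outputs (made-up unit names).
faulty_units = {"A", "B", "C", "D"}
bug_detector_flags = {"A", "B", "C", "X"}
smell_detector_flags = {"A", "B", "Y", "Z"}

# Inspect everything the bug detector flags.
bug_only = precision_recall(bug_detector_flags, faulty_units)

# Inspect only units flagged by both detectors: fewer false alarms,
# but faulty units flagged by just one tool are missed.
both = precision_recall(bug_detector_flags & smell_detector_flags, faulty_units)

print("bug detector alone:", bug_only)
print("both detectors:    ", both)
```

    In this toy case, requiring agreement between the two tools raises precision and lowers recall, which is the direction of the effect the abstract describes.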

    Experimenting Traditional and Modern Reliability Models in a 3-Years European Software Project

    Reliability is a very important non-functional aspect for software systems and artefacts. In the literature, several definitions of software reliability exist, along with several methods and approaches to measure the reliability of a software project. However, no works in the literature focus on the applicability of these methods in all the development phases of real software projects. In this paper, we describe the methodology we adopted during the S-CASE FP7 European Project to predict reliability both for the S-CASE platform and for the software artefacts automatically generated by using the S-CASE platform. Two approaches have been adopted to compute reliability: the first one is the ROME Lab Model, a traditional approach widely adopted in industry; the second one is an empirical approach defined by the authors in a previous work. An extensive dataset of results has been collected during all the phases of the project. The two approaches can complement each other to support the prediction of reliability during all the development phases of a software system, in order to facilitate project management from a non-functional point of view.

    Understanding and modeling AI-intensive system development

    Developers of AI-Intensive Systems - i.e., systems that involve both 'traditional' software and Artificial Intelligence - are recognizing the need to organize development systematically and use engineered methods and tools. Since an AI-Intensive System (AIIS) relies heavily on software, it is expected that Software Engineering (SE) methods and tools can help. However, AIIS development differs from the development of 'traditional' software systems in a few substantial aspects. Hence, traditional SE methods and tools are not suitable or sufficient by themselves and need to be adapted and extended. A quest for 'SE for AI' methods and tools has started. We believe that, in this effort, we should learn from experience and avoid repeating some of the mistakes made in the quest for SE in past years. To this end, a fundamental instrument is a set of concepts and a notation to deal with AIIS and the problems that characterize their development processes. In this paper, we propose to describe AIIS via a notation that was proposed for SE and embeds a set of concepts that are suitable to represent AIIS as well. We demonstrate the usage of the notation by modeling some characteristics that are particularly relevant for AIIS.