On the Distribution of Bugs in the Eclipse System
The distribution of bugs in software systems has been shown to satisfy the Pareto principle, and typically shows a power-law tail when analyzed as a rank-frequency plot. In a recent paper, Zhang showed that the Weibull cumulative distribution is a very good fit for the Alberg diagram of bugs built with experimental data. In this paper, we further discuss the subject from a statistical perspective, using five versions of Eclipse as case studies, to show how the log-normal, Double-Pareto, and Yule-Simon distributions may fit the bug distribution at least as well as the Weibull distribution. In particular, we show how some of these alternative distributions provide both a superior fit to empirical data and a theoretical motivation for modeling the bug generation process. While our results have been obtained on Eclipse, we believe that these models, in particular the Yule-Simon one, can generalize to other software systems.
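The Alberg diagram referred to above plots the cumulative fraction of bugs against the cumulative fraction of modules, with modules ranked by decreasing bug count. The sketch below is a minimal illustration, not the authors' tooling: it builds that curve from per-module bug counts and evaluates the standard two-parameter Weibull CDF, assumed here to be the form fitted in Zhang's work. The function names and the `scale`/`shape` parameter names are hypothetical, and fitting the parameters to the curve (for example by least squares) is not shown.

```python
import math
from typing import List, Tuple


def alberg_curve(bug_counts: List[int]) -> List[Tuple[float, float]]:
    """Return (fraction of modules, fraction of bugs) points for an Alberg diagram.

    Modules are ranked by decreasing bug count, so the curve rises steeply
    when a few modules hold most of the bugs (the Pareto pattern).
    """
    counts = sorted(bug_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no bugs to distribute")
    n = len(counts)
    points = []
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        points.append((rank / n, running / total))
    return points


def weibull_cdf(x: float, scale: float, shape: float) -> float:
    """Two-parameter Weibull CDF: F(x) = 1 - exp(-(x / scale) ** shape) for x >= 0."""
    if scale <= 0 or shape <= 0:
        raise ValueError("scale and shape must be positive")
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))
```

For hypothetical counts `[10, 0, 3, 0, 1, 6]`, the top-ranked sixth of the modules already accounts for half of the 20 bugs, which is the kind of concentration the Weibull (or an alternative) curve is meant to capture.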
Assessing traditional and new metrics for object-oriented systems
We present an extensive analysis of software metrics for 111 object-oriented systems written in Java. For each system, we considered 18 traditional metrics, such as LOC and the Chidamber and Kemerer metrics, as well as metrics derived from complex network theory and social network analysis. These metrics were computed at class level. We also considered two metrics at system level, namely the total number of classes and interfaces, and the fractal dimension. We discuss the distribution of these metrics and their correlation, both at class and at system level. We found that most metrics follow a leptokurtic distribution. Only a couple of metrics show clearly normal behavior, while three others are very irregular, and even bimodal. The statistics gathered allow us to study and discuss the variability of metrics across different systems, and to devise a roadmap for further research.
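A leptokurtic distribution is one with positive excess kurtosis: heavier tails and a sharper peak than a normal distribution. As a minimal sketch (not the statistics pipeline used in the study), the population excess kurtosis of a metric sampled over classes can be computed from its second and fourth central moments; the function name is hypothetical.

```python
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis m4 / m2**2 - 3; 0 for a normal distribution."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis undefined for zero variance")
    return m4 / m2 ** 2 - 3.0
```

A positive result indicates leptokurtic behavior and a value near 0 is consistent with normal-like behavior. Bimodal metrics often give negative values, although kurtosis alone cannot establish bimodality.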
Empirical study of software quality evolution in open source projects using agile practices
We analyse the time evolution of two open source Java projects, Eclipse and Netbeans, both developed following agile practices, though to different extents. Our study centers on the quality of the systems, measured as the absence of defects, and its relation to the evolution of software metrics. Each project is described through a software graph in which nodes represent Java files and edges describe the relations between them. We propose a metrics suite for Java files based on the Chidamber and Kemerer suite, and use it to study software evolution and its relationship with bug count.
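The software graph described above can be represented, in a minimal sketch, as a set of directed edges between Java file names, where an edge `(a, b)` is assumed to mean that file `a` depends on file `b`. The hypothetical helper below derives two basic graph properties of each node, in-degree and out-degree; extracting the edges from source code is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def degree_counts(
    edges: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (in_degree, out_degree) for a directed file-dependency graph.

    Each edge (a, b) means file a depends on file b; duplicate edges are
    collapsed so repeated references count once.
    """
    unique = set(edges)
    in_deg: Dict[str, int] = defaultdict(int)
    out_deg: Dict[str, int] = defaultdict(int)
    for src, dst in unique:
        out_deg[src] += 1
        in_deg[dst] += 1
        # Make sure every file appears in both maps, even with degree 0.
        in_deg.setdefault(src, 0)
        out_deg.setdefault(dst, 0)
    return dict(in_deg), dict(out_deg)
```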
An empirical study of refactoring in the context of FanIn and FanOut coupling
The aim of refactoring is to reduce software complexity and hence simplify the maintenance process. In this paper, we explore the impact of refactorings on "FanIn" and "FanOut" coupling metrics through extraction of refactoring data from multiple releases of five open-source Java systems. We first considered how a single refactoring modified these metric values, then what happened when several refactorings were applied to a single class in unison, and finally what influence a set of refactorings had on the shape of the FanIn and FanOut distributions. Results indicated that, on average, refactored classes tended to have larger FanIn and FanOut values than non-refactored classes. Where evidence of multiple (different) refactorings applied to the same class was found, the net effect on FanIn and FanOut coupling values was negligible.
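FanIn of a class is commonly defined as the number of other classes that reference it, and FanOut as the number of classes it references. Assuming those per-class values have already been extracted for a release, together with the set of classes touched by refactorings, comparing refactored and non-refactored classes reduces to group means. The function and argument names are hypothetical, and this sketch does not address how refactorings are detected.

```python
from statistics import fmean
from typing import Dict, Set, Tuple


def mean_coupling_by_group(
    fan_in: Dict[str, int],
    fan_out: Dict[str, int],
    refactored: Set[str],
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (mean FanIn, mean FanOut)} for refactored and other classes.

    Classes missing from fan_out are treated as having FanOut 0.
    Groups with no classes are omitted.
    """
    groups = {
        "refactored": [c for c in fan_in if c in refactored],
        "other": [c for c in fan_in if c not in refactored],
    }
    result: Dict[str, Tuple[float, float]] = {}
    for name, classes in groups.items():
        if classes:
            result[name] = (
                fmean(fan_in[c] for c in classes),
                fmean(fan_out.get(c, 0) for c in classes),
            )
    return result
```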
An Analysis of Bug Distribution in Object Oriented Systems
We introduce a new approach to describing Java software as a graph, where each node represents a Java file, called a compilation unit (CU), and each edge represents a relation between two CUs. The software system is characterized by the degree distributions of the graph, such as those of in- and out-links, as well as by the distribution of the Chidamber and Kemerer metrics computed on its CUs. Every CU can be related to one or more bugs during its life. We find a relationship between the software system and the bugs hitting its nodes. We found that the distributions of some metrics and of the number of bugs per CU exhibit power-law behavior in their tails, as does the number of CUs affected by a specific bug. We examine the evolution of software metrics across different releases to understand how the relationships between CU metrics and CU faultiness change over time.
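Power-law tails such as those reported for bugs per CU are usually inspected on the empirical complementary cumulative distribution, P(X >= x), plotted on log-log axes. The sketch below computes that curve and, as a swapped-in technique not necessarily used by the authors, the continuous maximum-likelihood estimate of the tail exponent described by Clauset, Shalizi and Newman, alpha = 1 + n / sum(ln(x_i / x_min)). Treating integer bug counts as continuous is only an approximation, the choice of `x_min` is left to the caller, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ccdf(values: Sequence[int]) -> List[Tuple[int, float]]:
    """Empirical P(X >= x) for each distinct value x, in ascending order of x."""
    n = len(values)
    xs = sorted(values)
    points = []
    i = 0
    while i < n:
        x = xs[i]
        points.append((x, (n - i) / n))  # n - i values are >= x
        while i < n and xs[i] == x:
            i += 1
    return points


def powerlaw_alpha(values: Sequence[float], x_min: float) -> float:
    """Continuous MLE of the tail exponent over values >= x_min."""
    if x_min <= 0:
        raise ValueError("x_min must be positive")
    tail = [v for v in values if v >= x_min]
    s = sum(math.log(v / x_min) for v in tail)
    if not tail or s == 0:
        raise ValueError("tail too small to estimate alpha")
    return 1.0 + len(tail) / s
```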
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
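The two counting schemes compared above differ only in how many authors of each cited work enter the count. In this minimal sketch, under assumptions of our own, each citing paper is given as a list of cited works, each cited work as its ordered author list, and an author pair is counted at most once per citing paper. Setting `k=1` gives first-author counting and `k=5` the first-five-authors variant. The function name and data layout are hypothetical, and the study's exact counting rules may differ, for example in how co-authors of the same cited work are treated.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def cocitation_counts(citing_papers: Iterable[List[List[str]]], k: int = 1) -> Counter:
    """Count author co-citations using the first k authors of each cited work.

    Returns a Counter keyed by alphabetically ordered author pairs.
    """
    counts: Counter = Counter()
    for references in citing_papers:
        authors = set()
        for cited_work in references:
            authors.update(cited_work[:k])  # keep only the first k authors
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1  # each pair counted once per citing paper
    return counts
```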
An Empirical Study of Social Networks Metrics in Object Oriented Software
We study the application to object-oriented software of new metrics derived from Social Network Analysis. Social network metrics, such as the EGO metrics, make it possible to identify the role of each single node in the information flow through the network, which here is formed by software modules and their dependencies. These metrics are compared with other traditional software metrics, like the Chidamber-Kemerer suite, and with software graph metrics.
We examine the empirical distributions of all the metrics, bugs included, across the software modules of several releases of two large Java systems, Eclipse and Netbeans. We provide analytical distribution functions suitable for describing and studying the observed distributions.
We also study correlations among metrics and bugs.
We found that the empirical distributions systematically show fat tails for all the metrics.
Moreover, the various metric distributions look very similar and consistent across all system releases, and are also very similar in both of the studied systems. These features appear to be typical properties of these software metrics.
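EGO metrics look at the neighborhood of a single node: its ego network consists of the node (the ego), its direct neighbors (the alters), and the links among them. For a module dependency graph treated as undirected, a minimal sketch of the basic size, ties, pairs and density measures follows. The helper name is hypothetical, and definitions vary between tools (directed variants count ordered pairs, for instance), so this is one assumed convention rather than the one used in the study.

```python
from typing import Dict, Set


def ego_metrics(adj: Dict[str, Set[str]], ego: str) -> Dict[str, float]:
    """Size, ties, pairs and density of the ego network around `ego`.

    `adj` must be a symmetric (undirected) adjacency map. Ties and density
    are computed among the alters only, excluding the ego's own links.
    """
    alters = adj.get(ego, set()) - {ego}
    size = len(alters)
    # Count each undirected alter-alter edge once by requiring a < b.
    ties = sum(1 for a in alters for b in adj.get(a, set()) if b in alters and a < b)
    pairs = size * (size - 1) // 2
    density = ties / pairs if pairs else 0.0
    return {"size": size, "ties": ties, "pairs": pairs, "density": density}
```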
An analysis of SNA metrics on the Java Qualitas Corpus
We computed the software graphs of 96 systems of the Java Qualitas Corpus, parsing the source code and identifying the dependencies among classes. We analyzed 12 software metrics on these 96 graphs: nine borrowed from Social Network Analysis (SNA), and three more traditional software metrics, namely LOC, Fan-in and Fan-out. We analyzed their correlations at system level, and studied the correlation statistics at data-set level. Our results show that these correlations are independent of the specific software system and are general properties of Java software systems. We show how the metrics can be partitioned into groups for almost the whole Java Qualitas Corpus, and that such grouping can provide insights into the topology of software networks. For two systems, Eclipse and Netbeans, we also computed the number of bugs, identifying the bugs affecting each class, and found that some SNA metrics are highly correlated with bugs, while others are strongly anticorrelated. This suggests that practitioners and software engineers might take advantage of such metrics to keep software quality under control.
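The abstract does not say which correlation coefficient is used, so the following is an assumption: because software metric distributions are often fat-tailed, a rank-based coefficient such as Spearman's rho is a common choice, being less sensitive to extreme values than Pearson's. The sketch computes Spearman's rho as the Pearson correlation of average ranks (tied values share the mean of their positions); all function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))
```

With a fat-tailed metric such as LOC, a single very large class can dominate Pearson's coefficient, whereas ranking caps its influence.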
A machine learning approach for text categorization of fixing-issue commits on CVS
We studied data mining from the CVS repositories of two large OO projects, Eclipse and Netbeans, focusing on "fixing-issue" commits. We highlight common characteristics of issue reporting and problems related to the identification of these messages, and compare static traditional approaches, like Knowledge Engineering, with dynamic approaches based on Machine Learning techniques. We compare, for the first time, the performance of Machine Learning (ML) techniques in automatically classifying "fixing-issue" commits among commit messages. Our study calculates the precision and recall of different ML classifiers for the correct classification of issue-reporting commits. Our results show that some ML classifiers can correctly classify up to 99.9% of such commits.
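Precision and recall, as used above, compare a classifier's labels with a manually labeled ground truth: precision is the share of commits flagged as fixing-issue that really are, and recall is the share of true fixing-issue commits that get flagged. The sketch pairs these two measures with a deliberately naive keyword rule, a hypothetical stand-in for the static knowledge-engineering approaches rather than any of the ML classifiers evaluated; the regular expression and function names are assumptions.

```python
import re
from typing import Sequence, Tuple

# Hypothetical keyword rule standing in for a hand-written classifier.
FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect)\b", re.IGNORECASE)


def looks_like_fix(message: str) -> bool:
    """Flag a commit message as fixing-issue if it contains a fix-related keyword."""
    return FIX_PATTERN.search(message) is not None


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A keyword rule like this tends to miss fixes described in other words (lowering recall), which is one motivation for learning the classification from labeled commits instead.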
- …
