
    Word Embeddings for Comment Coherence

    Information in source code comments and identifier names is a valuable resource for programmers who maintain and evolve software. As a software system evolves, the information in comments can become misaligned with the corresponding source code, hampering software evolution and maintenance tasks. This kind of misalignment is known as lack of coherence and can arise for several reasons, e.g., programmers modify the intent of source code during a maintenance task without updating its comment accordingly. We study the problem of detecting lack of coherence between comments and source code by exploiting Word Embeddings (WEs), a tool that has been shown to be very effective in natural language processing. We introduce four models based on WEs and test them using six different WE variants. These models and WEs have been empirically assessed through an experiment conducted on a publicly available dataset and compared with a baseline approach. The results indicate that, while maintaining performance very close to the baseline, the considered models and WE variants are more efficient in terms of execution time. The explanation for this improvement is that WEs are able to concentrate the important information in a much more compact representation of the input. This is one of the most important take-away lessons from our experiment.
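    The abstract does not describe the four WE-based models, so the following is only a minimal sketch of one common way embeddings can be used for this kind of check: average the word vectors of the comment and of the code identifiers, then compare the two averages with cosine similarity. The toy embedding table, the function names, and the 0.8 threshold are all assumptions made for illustration, not values from the study.

```python
import math

# Hypothetical 3-dimensional embeddings; a real setup would load pretrained WEs.
EMBEDDINGS = {
    "read": [0.9, 0.1, 0.0],
    "file": [0.8, 0.2, 0.1],
    "open": [0.85, 0.15, 0.05],
    "sort": [0.0, 0.9, 0.2],
    "list": [0.1, 0.8, 0.3],
}


def mean_vector(tokens):
    """Average the embeddings of the known tokens; None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_coherent(comment_tokens, code_tokens, threshold=0.8):
    """Flag a comment/code pair as coherent when their mean vectors are close.

    The threshold is an illustrative assumption, not a tuned value.
    """
    comment_vec = mean_vector(comment_tokens)
    code_vec = mean_vector(code_tokens)
    if comment_vec is None or code_vec is None:
        return False
    return cosine(comment_vec, code_vec) >= threshold
```

    With these toy vectors, a comment tokenized as `["read", "file"]` is judged coherent with code tokens `["open", "file"]` but not with `["sort", "list"]`.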

    Studying abbreviated vs. full-word identifier names when dealing with faults: An external replication

    Context: abbreviated and full-word identifier names in dealing with faults in source code. Goal: investigating whether the use of abbreviated identifier names affects the ability of novice professional software developers to identify and fix faults in Java code. Method: external replication. Results: the results of the original experiment (conducted on C code) were confirmed. Conclusions: the difference between using abbreviated and full-word identifiers is not statistically significant with respect to the time to complete a task and the number of faults identified and fixed.

    Does the Combined use of Class and Sequence Diagrams Improve the Source Code Comprehension? Results from a Controlled Experiment

    We present the results of a controlled experiment aimed at investigating whether source code comprehension increases when participants are provided with UML class and sequence diagrams produced in the software design phase. The experiment has been conducted with Master students in Computer Science at the University of Salerno. The data analysis shows that the participants comprehend source code significantly better when it is accompanied by class and sequence diagrams together.

    Viewing object-oriented software with MetricAttitude: An empirical evaluation

    MetricAttitude is a visualization tool based on static analysis that provides a mental picture of an object-oriented software system by means of polymetric views. In this paper, we present a preliminary empirical investigation, based on a questionnaire survey, to assess MetricAttitude with respect to source code comprehension tasks. Participants involved in this study were Computer Science students and software professionals. The results suggest that MetricAttitude is a viable means to comprehend source code and that both kinds of participants considered it appropriate for source code comprehension.

    A graph-based approach to detect unreachable methods in Java software

    In this paper, we have defined a static approach named DUM (Detecting Unreachable Methods) that works on Java byte-code and detects unreachable methods by traversing a graph-based representation of the software to be analyzed. To assess the validity of our approach, we have implemented it in a prototype software system. Both our approach and prototype have been validated on four open-source software systems. Results have shown the correctness, the completeness, and the accuracy of the methods that our solution detected as unreachable. We have also compared our solution with JTombstone and Google CodePro AnalytiX. This comparison suggested that DUM outperforms these baselines.
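    The abstract does not specify DUM's graph representation or traversal, so the sketch below only illustrates the general idea of reachability over a call graph: start from the entry points, follow call edges breadth-first, and report every method never visited. The dictionary-based graph, the function name, and the method names are hypothetical; a real byte-code analysis would also have to account for dynamic dispatch, reflection, and similar features.

```python
from collections import deque


def unreachable_methods(call_graph, entry_points):
    """Return the methods that cannot be reached from any entry point.

    call_graph maps a method name to the methods it calls (an assumed,
    simplified stand-in for a graph built from byte-code).
    """
    # Collect every method that appears as a caller or as a callee.
    all_methods = set(call_graph)
    for callees in call_graph.values():
        all_methods.update(callees)

    # Breadth-first traversal from the entry points.
    visited = set(entry_points)
    queue = deque(entry_points)
    while queue:
        method = queue.popleft()
        for callee in call_graph.get(method, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append(callee)

    return all_methods - visited
```

    For example, with `{"Main.main": ["A.run"], "A.run": ["B.help"], "C.unused": ["B.help"], "D.dead": []}` and entry point `"Main.main"`, the function reports `C.unused` and `D.dead`; `B.help` is reachable through `A.run` even though an unreachable method also calls it.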

    On the Effectiveness of the UML Object Diagrams: A Replicated experiment

    Background: In the modeling of object-oriented software systems, UML object diagrams are recognized as very useful complements to class diagrams. However, up to now, there exists only one experiment [Torchiano 2004] that investigates this concern. Aim: To confirm or contradict the findings of the original experiment, we have conducted a replication, whose results are presented in this paper. Both the replication and the original experiment have been conducted to investigate whether the use of object diagrams to complement class diagrams affects the comprehension of software systems. Method: The replication has been conducted with a group of 24 subjects who graduated in Computer Science at the University of Basilicata. The experiment adopts a counterbalanced design, thus ensuring that each subject works on two comprehension tasks, using class and object diagrams together in one and class diagrams alone in the other. The comprehension on each task has been assessed using a questionnaire-based approach. In particular, we have measured the comprehension level of each subject using an information retrieval based approach that allowed us to get a balance between correctness and completeness of the answers. Results: The results show that the subjects significantly benefit from the use of object diagrams in the comprehension of software systems, thus confirming and strengthening the findings of the original experiment. Conclusions: It is advisable to complement the usual class diagrams with object diagrams to increase the understandability of software systems. To raise the generalizability of the results, replications of this study are necessary, especially with professional software engineers.
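    The abstract does not name the measure used, so the following is an assumption: a common information retrieval way to balance correctness and completeness is to treat them as precision and recall over the items in an answer and combine them with the F-measure (their harmonic mean). The function name and the set-based answer encoding are hypothetical.

```python
def comprehension_score(given, expected):
    """F-measure of a questionnaire answer against the expected answer.

    Precision stands in for correctness (share of given items that are right),
    recall for completeness (share of expected items that were given).
    This mapping is an illustrative assumption, not the study's stated measure.
    """
    given, expected = set(given), set(expected)
    hits = len(given & expected)
    precision = hits / len(given) if given else 0.0
    recall = hits / len(expected) if expected else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

    An answer listing three items, two of which are among four expected ones, has precision 2/3 and recall 1/2, which gives a score of 4/7.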

    Weighing lexical information for software clustering in the context of architecture recovery

    In the literature, some approaches have been proposed to partition software systems into meaningful subsystems by exploiting the lexical information that programmers provide in the source code. However, these techniques usually do not consider the programming language constructs in which the lexicon appears (e.g., comments, class names, method names), even though it is common experience that programmers take different care in choosing terms for different constructs. In this paper we present a novel lexical-based software clustering technique which exploits the contribution of terms placed in six different parts of the source code (i.e., zones), namely Class Names, Attribute Names, Method Names, Parameter Names, Comments and Source Code Statements. These zones convey information with different levels of relevance, and so their contribution should be weighted differently according to the specificities of the analyzed software system. To this aim we define a probabilistic model of the data whose parameters are estimated automatically by the Expectation-Maximization algorithm. These weights are then exploited to generate the software partitions with two distinct clustering algorithms, properly customized to make them more suitable for the specific domain. The overall technique has been assessed in a case study conducted on a dataset of 16 open source software systems, whose results are presented in the paper. In particular, we experimentally observed that the use of both the defined zones and the Expectation-Maximization algorithm improves the overall quality of results.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
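    The difference between the two counting schemes can be sketched as follows. The abstract does not give the exact counting rules, so this is a simplified illustration: each citing paper is a list of cited works, each cited work is an ordered list of author names, and a pair of authors is counted once per citing paper in which both appear among the first `max_authors` authors of the cited works. The data layout and the function name are hypothetical, and the sketch also counts co-authors of the same cited work as co-cited, which is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across a collection of citing papers.

    max_authors=1 mimics traditional first-author counting; max_authors=5
    mimics counting the first 5 authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Authors credited for this citing paper under the chosen scheme.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

    For example, if one citing paper references works by `["White", "Griffith"]` and `["Small"]`, first-author counting records only the pair (Small, White), whereas counting the first 5 authors also records (Griffith, Small) and (Griffith, White).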