Coherence of comments and method implementations: a dataset and an empirical investigation
In this paper, we present the results of a manual assessment of the coherence between the comments and the implementation of 3636 methods in three open source software applications implemented in Java (for one of these applications, we considered two subsequent versions). The results of this assessment have been collected in a dataset that we made publicly available on the Web. The creation of this dataset is based on a protocol that is detailed in this paper. We present that protocol to let researchers evaluate the quality of our dataset and to ease its possible future extension. Another contribution of this paper is a preliminary investigation of the effectiveness of adopting a Vector Space Model (VSM) with the tf-idf schema to discriminate between coherent and non-coherent methods. We observed that lexical similarity alone is not sufficient for this distinction, while encouraging results have been obtained by applying a Support Vector Machine (SVM) classifier on the whole vector space
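As a rough illustration of the lexical-similarity idea mentioned in this abstract, the sketch below computes tf-idf weighted cosine similarity between a comment and two method bodies. It is not the authors' pipeline: the tokenizer, the smoothed idf formula, and the toy comment and method bodies are all assumptions made for this example, and the SVM step is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase, lowercase, and keep alphabetic tokens only."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1, so that terms
    shared by every document still get a non-zero weight.
    """
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical comment and method bodies, invented for illustration.
comment = "Returns the total price of all items in the cart"
matching_body = (
    "double total = 0; for (Item item : cart.getItems()) "
    "{ total += item.getPrice(); } return total;"
)
unrelated_body = 'socket.close(); logger.info("connection closed");'

comment_vec, matching_vec, unrelated_vec = tfidf_vectors(
    [comment, matching_body, unrelated_body]
)
sim_matching = cosine(comment_vec, matching_vec)
sim_unrelated = cosine(comment_vec, unrelated_vec)
```

In this toy case the comment shares terms only with the first body, so `sim_matching` exceeds `sim_unrelated`. The abstract itself cautions that such a similarity score alone does not separate coherent from non-coherent methods.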
SemFLATNESSES: Social and Proactive Enterprise Knowledge Management with Semantic Web Technologies
On the Coherence Between Comments and Implementations in Source Code
Source code comments provide useful information on the implementation of a software system and on the intent behind design decisions and goals. Writing informative and useful comments is far from a trivial task. Moreover, source code comments tend to remain mostly unchanged during maintenance activities. As a consequence, the information provided in the comment of a method and in its corresponding implementation may not be coherent with each other (i.e., the comment does not properly describe the implementation). In this paper, we present the results of a manual assessment of the coherence between comments and implementations of 3636 methods, gathered from 3 Java open source software systems (for one of these systems, we considered 2 subsequent versions). The resulting evaluations have been collected in a dataset that we made publicly available on the web. The protocol used for the creation of this dataset is also described. This lets researchers evaluate the quality of our dataset and eases its possible future extension. Another contribution of our paper is an investigation of a possible link between coherence and the lexical similarity between source code and comments. Our preliminary outcomes suggest that this similarity is higher when the comments of methods and their implementations are coherent. However, the obtained similarity values are generally low and are not much higher than those for non-coherent method implementations and comments
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
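The difference between the two counting schemes can be sketched as follows. This is an illustrative reading of the abstract, not the study's exact procedure: the function name, the data layout (one list of cited works per citing paper, each with an ordered author list), and the choice to count every pair of authors that appear in the same reference list are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often two authors are cited together by the same paper.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. Only the first `max_authors` authors of each
    cited work are considered (1 = first-author counting).
    Assumption: co-authors of a single cited work also count as co-cited.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers and their reference lists.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["D", "B"]],
]

first_author_only = cocitation_counts(papers, max_authors=1)
first_five_authors = cocitation_counts(papers, max_authors=5)
```

With first-author counting, author B never appears in the counts because B is never listed first. With the first five authors considered, the pair (A, B) is counted in both citing papers.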
TAASRAD19, a high-resolution weather radar reflectivity dataset for precipitation nowcasting
We introduce TAASRAD19, a high-resolution radar reflectivity dataset collected by the Civil Protection weather radar of the Trentino South Tyrol Region, in the Italian Alps. The dataset includes 894,916 timesteps of precipitation from more than 9 years of data, offering a novel resource to develop and benchmark analog ensemble models and machine learning solutions for precipitation nowcasting. Data are expressed as 2D images, considering the maximum reflectivity on the vertical section at a 5 min sampling rate, covering an area 240 km in diameter at 500 m horizontal resolution. The TAASRAD19 distribution also includes a curated set of 1,732 sequences, for a total of 362,233 radar images, labeled with precipitation type tags assigned by expert meteorologists. We validate TAASRAD19 as a benchmark for nowcasting methods by introducing a TrajGRU deep learning model to forecast reflectivity, and a procedure based on the UMAP dimensionality reduction algorithm for interactive exploration. Software methods for data pre-processing, model training and inference, and a pre-trained model are publicly available on GitHub (https://github.com/MPBA/TAASRAD19) for study replication and reproducibility
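The figures quoted in this abstract imply a few derived quantities, worked out below. This is only back-of-the-envelope arithmetic; it assumes that each image is a square grid spanning the full 240 km diameter, which the abstract does not state explicitly.

```python
# Values taken from the abstract.
diameter_km = 240
resolution_km = 0.5          # 500 m horizontal resolution
sampling_minutes = 5
timesteps = 894_916

# Assumption: square image covering the full diameter.
pixels_per_side = int(diameter_km / resolution_km)

# Number of frames a continuously operating radar would produce per day.
frames_per_day = 24 * 60 // sampling_minutes

# Equivalent number of full days covered by the precipitation timesteps.
days_of_frames = timesteps / frames_per_day

print(pixels_per_side, frames_per_day, round(days_of_frames))
```

Under these assumptions each frame would be a 480 x 480 grid, and the timesteps correspond to roughly 3,107 full days of 5-minute frames.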
Weighing lexical information for software clustering in the context of architecture recovery
In the literature, some approaches have been proposed to partition software systems into meaningful subsystems by exploiting the lexical information that programmers place in the source code. However, these techniques usually do not consider the programming language constructs in which the lexicon appears (e.g., comments, class names, method names), even though it is common experience that programmers take different care in choosing terms for different constructs. In this paper we present a novel lexical-based software clustering technique which exploits the contribution of terms placed in six different parts of the source code (i.e., zones), namely Class Names, Attribute Names, Method Names, Parameter Names, Comments and Source Code Statements. These zones convey information with different levels of relevance, and so their contribution should be weighted differently according to the specificities of the analyzed software system. To this aim we define a probabilistic model of the data whose parameters are estimated automatically by the Expectation-Maximization algorithm. These weights are then exploited to generate the software partitions with two distinct clustering algorithms, customized to make them more suitable for the specific domain. The overall technique has been assessed in a case study conducted on a dataset of 16 open source software systems, whose results are presented in the paper. In particular, we experimentally observed that the use of both the defined zones and the Expectation-Maximization algorithm improves the overall quality of results
LINSEN: An Efficient Approach to Split Identifiers and Expand Abbreviations
Information Retrieval (IR) techniques are being exploited by an increasing number of tools suited to support Software Maintenance activities. This is because the lexical information embedded in the source code by programmers can be valuable for tasks such as concept location, clustering or recovery of traceability links.
However, the application of such IR-based techniques relies on the consistency of the lexicon available in the different artifacts, and their effectiveness can worsen if programmers introduce abbreviations (e.g., rect) and/or do not strictly follow naming conventions such as Camel Case (e.g., UTFtoASCII).
In this paper we propose an approach useful for all of these IR-based tools, suited to automatically split identifiers into their component words and expand abbreviations. The solution runs in linear time, taking advantage of an approximate pattern matching technique applied to a graph-based model. Linear complexity allows exploiting a number of different dictionaries, referring to increasingly broader contexts, in order to achieve a disambiguation strategy based on the knowledge gathered from the most appropriate domain.
The proposed approach has been compared to other splitting and expansion techniques, using freely available oracles for the identifiers extracted from a number of C/C++ and Java open source systems. Results show an improvement in both splitting and expanding performance, in addition to a strong enhancement in the computational efficiency
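To see why identifiers such as UTFtoASCII are hard to split, the sketch below contrasts a convention-based camel-case splitter with a simple dictionary-based segmentation. Neither is the LINSEN algorithm: the regular expression, the function names, and the tiny dictionary are assumptions made for illustration, and the dictionary segmentation shown here is quadratic rather than linear and does no approximate matching or abbreviation expansion.

```python
import re

# Convention-based splitting: relies on case transitions only.
CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_camel(identifier):
    """Split an identifier at camel-case boundaries."""
    return CAMEL.findall(identifier)


def split_with_dictionary(identifier, dictionary):
    """Segment an identifier into the fewest dictionary words.

    Returns a list of lowercase words, or None if no segmentation exists.
    Simple dynamic programming over all prefixes (quadratic time).
    """
    text = identifier.lower()
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in dictionary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(text)]


# Hypothetical dictionary, invented for this example.
words = {"utf", "to", "ascii"}

regular = split_camel("getItemPrice")
irregular = split_camel("UTFtoASCII")
segmented = split_with_dictionary("UTFtoASCII", words)
```

The case-based splitter handles `getItemPrice` but breaks `UTFtoASCII` into `UT`, `Fto`, `ASCII`, because the lowercase `to` is not delimited by case changes. The dictionary-based segmentation recovers `utf`, `to`, `ascii`, which is the kind of case the abstract says motivates dictionary-driven splitting.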
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
