    Agile collaboration for distributed teams

    Today software engineering is characterized by two strong trends: agile and distributed. Both together are increasingly demanded and challenge teams and projects due to lack of discipline, insufficient transparency, agile "ping pong" and thus overheads and rework. Authors Fabio Calefato and I describe current technologies and tools for agile collaboration. I look forward to hearing from both readers and prospective column authors about this column and the technologies you want to know more about

    Towards social semantic suggestive tagging

    The organization of knowledge on the web is increasingly becoming a social task performed by online communities whose members share a common interest in classifying different types of information for later retrieval. Collaborative tagging systems allow people to organize a set of resources of interest through unconstrained annotations based on free keywords, commonly called tags. Suggestive tagging techniques support users in this organization process and have also been shown to foster quick convergence to a shared tag vocabulary. In this paper, we propose a tag recommender which relies on the content analysis of the resource to be tagged, as well as on the personal and collective tagging history. The main contribution of this work is a model which combines semantic content analysis methods with existing suggestive tagging techniques. The expected benefit is the improvement of the user experience in social bookmarking systems, and more generally in collaborative tagging systems

    Pynblint: A Static Analyzer for Python Jupyter Notebooks

    Jupyter Notebook is the tool of choice of many data scientists in the early stages of ML workflows. The notebook format, however, has been criticized for inducing bad programming practices; indeed, researchers have already shown that open-source repositories are inundated by poor-quality notebooks. Low-quality output from the prototypical stages of ML workflows constitutes a clear bottleneck towards the productization of ML models. To foster the creation of better notebooks, we developed Pynblint, a static analyzer for Jupyter notebooks written in Python. The tool checks the compliance of notebooks (and surrounding repositories) with a set of empirically validated best practices and provides targeted recommendations when violations are detected.
    CCS Concepts: • Software and its engineering → Software maintenance tools; Software configuration management and version control systems; • Human-centered computing → Collaborative and social computing systems and tools

    Global Software Engineering: Challenges and solutions

    Today’s software industry is more global than ever before. The idea of developing major software systems in one location or by one single team belongs to the past. Over the past decade, research on GSE has uncovered many challenges associated with operating over physical, temporal, cultural, and linguistic distances. Unfortunately, these challenges have become even more personal to many more in 2020 due to the disruption provoked by the COVID-19 pandemic. Studies have shown how organizations have struggled to smoothly transition to virtual work, and while Global Software Engineering (GSE) is part of everyday life by now, succeeding in the global software industry remains challenging, with a considerable share of global projects still not meeting expectations, especially regarding cost savings and time to market. Albeit known, distances are still causing severe breakdowns in cooperation within virtual teams and among distributed ones. As such, there is a considerable gap to fill in establishing how to manage such challenges effectively. During its 14th edition held in Montreal, Canada, on 24–26 May 2019, co-located with ICSE, the International Conference on Global Software Engineering (ICGSE 2019) opened a call for papers for the present JSS Special Issue with the goal of advancing research that focused on providing evidence of working solutions to the GSE challenges and needs. Both extended papers from ICGSE 2019 and original manuscripts were eligible for submission. As a result of the call, 16 papers were submitted, highlighting the interest of the international GSE research community in the topics of the special issue. After an internal review performed by the Guest Editors and the Special Issue Editor, 11 papers were moved into review and assigned to three expert reviewers selected from academia and industry. As a result of this careful review process, four high-quality papers have been accepted, resulting in a 25% acceptance rate. We take the opportunity to congratulate the authors of the accepted papers, thank all who submitted a contribution to this special issue, and all the reviewers for their precious hard work

    Professional Insights into Benefits and Limitations of Implementing MLOps Principles

    Machine Learning Operations (MLOps) has emerged as a set of practices that combines development, testing, and operations to deploy and maintain machine learning applications. Objective: In this paper, we assess the benefits and limitations of using the MLOps principles in online supervised learning. Method: We conducted two focus group sessions on the benefits and limitations of applying MLOps principles for online machine learning applications with six experienced machine learning developers. Results: The focus group revealed that machine learning developers see many benefits of using MLOps principles but also that these do not apply to all the projects they worked on. According to experts, this investment tends to pay off for larger applications with continuous deployment that require well-prepared automated processes. However, for initial versions of machine learning applications, the effort taken to implement the principles could enlarge the project’s scope and increase the time needed to deploy a first version to production. The discussion brought up that most of the benefits are related to avoiding error-prone manual steps, enabling the application to be restored to a previous state, and having a robust continuous automated deployment pipeline. Conclusions: It is important to balance the trade-offs of investing time and effort in implementing the MLOps principles considering the scope and needs of the project, favoring such investments for larger applications with continuous model deployment requirements

    Assessing the Use of AutoML for Data-Driven Software Engineering

    Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies

    KGTorrent: A Dataset of Python Jupyter Notebooks from Kaggle

    Computational notebooks have become the tool of choice for many data scientists and practitioners for performing analyses and disseminating results. Despite their increasing popularity, the research community cannot yet count on a large, curated dataset of computational notebooks. In this paper, we fill this gap by introducing KGTorrent, a dataset of Python Jupyter notebooks with rich metadata retrieved from Kaggle, a platform hosting data science competitions for learners and practitioners at any level of expertise. We describe how we built KGTorrent, and provide instructions on how to use it and refresh the collection to keep it up to date. Our vision is that the research community will use KGTorrent to study how data scientists, especially practitioners, use Jupyter Notebook in the wild and identify potential shortcomings to inform the design of its future extensions

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
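    The two counting schemes compared in this abstract can be illustrated with a minimal sketch. It is not the study's actual procedure: the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are all hypothetical, and the sketch assumes that two authors are co-cited once per citing paper whenever both appear among the counted authors of that paper's references (including co-authors of the same cited work).

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations (hypothetical helper, for illustration only).

    citing_papers: one reference list per citing paper; each reference is an
                   ordered list of the cited work's author names.
    max_authors:   how many leading authors of each cited work are counted
                   (1 = first-author counting, 5 = first-five counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the counted authors across everything this paper cites.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Assumption: each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data (invented): two citing papers and the works they reference.
papers = [
    [["Smith", "Lee"], ["Chen"]],
    [["Chen", "Smith"], ["Garcia"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

    In this toy data, Smith is second author of one cited work, so first-author counting misses one of the two papers that cite both Smith and Chen, while first-five counting picks up both.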

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship