1,720,971 research outputs found

    Time-varying graph representation learning via higher-order skip-gram with negative sampling

    Full text link
    Representation learning models for graphs are a successful family of techniques that project nodes into feature spaces that can be exploited by other machine learning algorithms. Since many real-world networks are inherently dynamic, with interactions among nodes changing over time, these techniques can be defined both for static and for time-varying graphs. Here, we show how the skip-gram embedding approach can be generalized to perform implicit tensor factorization on different tensor representations of time-varying graphs. We show that higher-order skip-gram with negative sampling (HOSGNS) is able to disentangle the role of nodes and time, with a small fraction of the number of parameters needed by other approaches. We empirically evaluate our approach using time-resolved face-to-face proximity data, showing that the learned representations outperform state-of-the-art methods when used to solve downstream tasks such as network reconstruction. Good performance on predicting the outcome of dynamical processes such as disease spreading shows the potential of this method to estimate contagion risk, providing early risk awareness based on contact tracing data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1140/epjds/s13688-022-00344-8

    Value is in the Eye of the Beholder: A Framework for an Equitable Graph Data Evaluation

    No full text
    Proprietary data is a valuable asset used to develop predictive algorithms that benefit a wide range of users, including customers, business owners, and decision-makers. Consequently, there is a growing interest in developing safe and robust techniques for sharing, learning models, and distributing predictions across a wide spectrum of potential stakeholders. However, a structured process to assess the value of data assets, and thus enabling collaborations among stakeholders, remains largely unexplored. This is particularly challenging when the data to be shared has a networked structure, where increasing the shared data samples potentially connects information observed by different data owners, providing new knowledge that is unavailable to any data owner individually. Here, we propose E-GraDE, a framework that assists organizations in assessing the value of their networked data to better address graph machine learning tasks. This framework includes a step-by-step analysis of the requirements of different stakeholders, such as the accuracy or fairness requisites of the models, ensuring a fair evaluation process and stronger alignment in the development of a data federation consortium. Additionally, we propose an approach to estimate the value of networked data to be shared while disclosing only a small fraction of the original information. We support our approach with extensive computational experiments, analysing each part of it through simulated use cases

    FairLens: Auditing black-box clinical decision support systems

    No full text
    The pervasive application of algorithmic decision-making is raising concerns on the risk of unintended bias in AI systems deployed in critical settings such as healthcare. The detection and mitigation of model bias is a very delicate task that should be tackled with care and involving domain experts in the loop. In this paper we introduce FairLens, a methodology for discovering and explaining biases. We show how this tool can audit a fictional commercial black-box model acting as a clinical decision support system (DSS). In this scenario, the healthcare facility experts can use FairLens on their historical data to discover the biases of the model before incorporating it into the clinical decision flow. FairLens first stratifies the available patient data according to demographic attributes such as age, ethnicity, gender and healthcare insurance; it then assesses the model performance on such groups highlighting the most common misclassifications. Finally, FairLens allows the expert to examine one misclassification of interest by explaining which elements of the affected patients’ clinical history drive the model error in the problematic group. We validate FairLens’ ability to highlight bias in multilabel clinical DSSs introducing a multilabel-appropriate metric of disparity and proving its efficacy against other standard metrics

    Explainability Methods for Natural Language Processing: Applications to Sentiment Analysis

    Full text link
    Sentiment analysis is the process of classifying natural lan-guage sentences as expressing positive or negative sentiments, and it is a crucial task where the explanation of a prediction might arguably be as necessary as the prediction itself. We analysed di fierent explanation techniques, and we applied them to the classification task of Sentiment Analysis. We explored how attention-based techniques can be exploited to extract meaningful sentiment scores with a lower computational cost than existing XAI methods

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer that rather than express opinions propels discourses; an interviewer that is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent form the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.information science;Pearson correlation;cosine;similarity measure;author cocitation analysis

    Topic tomographies (Toptom): A visual approach to distill information from media streams

    Full text link
    In this paper we present Top Tom, a digital platform whose goal is to provide analytical and visual solutions for the exploration of a dynamic corpus of user‐generated messages and media articles, with the aim of i) distilling the information from thousands of documents in a low‐dimensional space of explainable topics, ii) cluster them in a hierarchical fashion while allowing to drill down to details and stories as constituents of the topics, iii) spotting trends and anomalies. Top Tom implements a batch processing pipeline able to run both in near‐real time with time stamped data from streaming sources and on historical data with a temporal dimension in a cold start mode. The resulting output unfolds along three main axes: time, volume and semantic similarity (i.e. topic hierarchical aggregation). To allow the browsing of data in a multiscale fashion and the identification of anomalous behaviors, three visual metaphors were adopted from biological and medical fields to design visualizations, i.e. the flowing of particles in a coherent stream, tomographic cross sectioning and contrast‐like analysis of biological tissues. The platform interface is composed by three main visualizations with coherent and smooth navigation interactions: calendar view, flow view, and temporal cut view. The integration of these three visual models with the multiscale analytic pipeline proposes a novel system for the identification and exploration of topics from unstructured texts. We evaluated the system using a collection of documents about the emerging opioid epidemics in the United States

    RV4JaCa—Towards Runtime Verification of Multi-Agent Systems and Robotic Applications

    Full text link
    This paper presents a Runtime Verification (RV) approach for Multi-Agent Systems (MAS) using the JaCaMo framework. Our objective is to bring a layer of security to the MAS. This is achieved keeping in mind possible safety-critical uses of the MAS, such as robotic applications. This layer is capable of controlling events during the execution of the system without needing a specific implementation in the behaviour of each agent to recognise the events. In this paper, we mainly focus on MAS when used in the context of hybrid intelligence. This use requires communication between software agents and human beings. In some cases, communication takes place via natural language dialogues. However, this kind of communication brings us to a concern related to controlling the flow of dialogue so that agents can prevent any change in the topic of discussion that could impair their reasoning. The latter may be a problem and undermine the development of the software agents. In this paper, we tackle this problem by proposing and demonstrating the implementation of a framework that aims to control the dialogue flow in a MAS; especially when the MAS communicates with the user through natural language to aid decision-making in a hospital bed allocation scenario
    corecore