
    ISIC_WSM: Generating Weak Segmentation Maps for the ISIC archive

    Recognizing skin cancer in time could greatly increase patients' chances of recovery. For this reason, in recent years, numerous decision support systems have been proposed to help dermatologists in this diagnosis. These systems are generally based on Convolutional Neural Networks and are used for both segmentation and classification of lesions. Although their main goal is to correctly recognize the lesion type, the preliminary segmentation step has been shown to increase the performance of the classifier. This is not surprising, because physicians also use information on the shape of the lesion to make a diagnosis. Thanks to the ISIC archive, a huge number of skin lesion images, along with the corresponding metadata (type, position, dimension, etc.), are publicly available to train a deep neural network but, unfortunately, only a small fraction of them are labeled for segmentation. To overcome this limitation, this paper proposes a weakly supervised approach to extract the segmentation label maps from the entire ISIC archive. Moreover, to demonstrate the quality of the proposed approach, the generated supervisions were first compared with those available in ISIC and then used to train a segmentation network, whose performance was evaluated against that obtained using only the small set of ISIC label maps. To foster reproducibility and to promote future research in lesion segmentation and classification, the generated ISIC Weak Segmentation Map (ISIC_WSM) dataset has been released. As far as we know, this is the first dataset that contains segmentation supervisions for clinical images of skin lesions. (c) 2022 Elsevier B.V. All rights reserved
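
Comparing generated label maps with reference ones is typically done with an overlap score. The following is a minimal sketch, assuming binary masks stored as nested lists of 0/1 and assuming intersection-over-union (IoU) and Dice as the measures. The abstract does not say which metric the authors used, and `iou_and_dice` is a hypothetical helper name.

```python
def iou_and_dice(pred, ref):
    """Return (IoU, Dice) for two binary masks of equal shape (nested lists of 0/1)."""
    inter = union = pred_sum = ref_sum = 0
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            inter += p & r      # pixel set in both masks
            union += p | r      # pixel set in at least one mask
            pred_sum += p
            ref_sum += r
    if union == 0:
        # Both masks are empty: treat this as perfect agreement (a convention).
        return 1.0, 1.0
    return inter / union, 2 * inter / (pred_sum + ref_sum)


if __name__ == "__main__":
    weak_mask = [[1, 1], [0, 0]]   # illustrative generated mask
    ref_mask = [[1, 0], [0, 0]]    # illustrative reference mask
    print(iou_and_dice(weak_mask, ref_mask))
```

Both scores range from 0 (no overlap) to 1 (identical masks); Dice weights the intersection more heavily than IoU.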

    Development of an automated moderator for deliberative events

    Online communication platforms have revolutionized interpersonal interactions by transcending geographical barriers. While facilitating connectivity, these platforms have introduced challenges such as overcoming linguistic differences and preventing the diffusion of spam and offensive content. This is particularly pertinent in the context of deliberative events, where online platforms could be used to extend the inclusion of citizens in democratic decision-making. In traditional deliberative events, human moderators and translators were used to facilitate conversation; however, the need for these figures imposed a limit on both the number of deliberative events that could be organized and the number of participants. In response, this paper proposes an automated moderator for deliberative events. The moderator is developed in Python for the online communication platform Discord and can be used, thanks to the integrated AI (Artificial Intelligence) tools, to automatically manage conversation agendas, prevent spam and inappropriate language, analyze the sentiment of the conversation, and translate messages into multiple languages. In particular, three classifiers, based on a pre-trained BERT (Bidirectional Encoder Representations from Transformers), were fine-tuned for spam detection, toxic comment classification, and sentiment analysis. These allow the moderator to automatically detect and remove spam and offensive messages in different languages, send warnings to users, alert administrators, and, after repeated warnings, impose bans. Additionally, a built-in translator, based on Meta’s No Language Left Behind (NLLB) model, translates messages into five languages (Italian, English, French, German, and Polish). The developed bot was tested in a simulated deliberative event on a Discord server, demonstrating its ability to manage conversations and prevent linguistic abuse
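
The warn-then-ban behaviour described above can be sketched as a small per-user counter. This is only an illustration under stated assumptions: the class name `EscalationPolicy`, the `max_warnings` threshold, and the boolean `is_flagged` input (standing in for the output of the spam and toxicity classifiers) are all hypothetical, and no Discord or BERT code is shown.

```python
from collections import defaultdict


class EscalationPolicy:
    """Tracks flagged messages per user and picks a moderation action."""

    def __init__(self, max_warnings=3):
        # Assumed threshold; the abstract only says "after repeated warnings".
        self.max_warnings = max_warnings
        self.warnings = defaultdict(int)

    def handle(self, user_id, is_flagged):
        """Return 'allow', 'warn' or 'ban' for a single message."""
        if not is_flagged:
            return "allow"
        self.warnings[user_id] += 1
        if self.warnings[user_id] > self.max_warnings:
            return "ban"
        return "warn"


if __name__ == "__main__":
    policy = EscalationPolicy(max_warnings=2)
    for flagged in (False, True, True, True):
        print(policy.handle("user-1", flagged))
```

Keeping the policy separate from the classifiers and from the chat platform makes the escalation rule easy to test on its own.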

    Towards a comprehensive characterization of arteries and veins in retinal imaging

    Retinal fundus imaging is crucial for diagnosing and monitoring eye diseases, which are often linked to systemic health conditions such as diabetes and hypertension. Current deep learning techniques often narrowly focus on segmenting retinal blood vessels, lacking a more comprehensive analysis and characterization of the retinal vascular system. This study fills this gap by proposing a novel, integrated approach that leverages multiple stages to accurately determine vessel paths and extract informative features from them. The segmentation of veins and arteries, achieved through a deep semantic segmentation network, is used by a newly designed algorithm to reconstruct individual vessel paths. The reconstruction process begins at the optic disc, identified by a localization network, and uses a recurrent neural network to predict the vessel paths at various junctions. The different stages of the proposed approach are validated both qualitatively and quantitatively, demonstrating robust performance. The proposed approach enables the extraction of critical features at the individual vessel level, such as vessel tortuosity and diameter. This work lays the foundation for a comprehensive retinal image evaluation, going beyond isolated tasks like vessel segmentation, with significant potential for clinical diagnosis
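
As an illustration of a per-vessel feature, tortuosity is often measured as the ratio between the length of the vessel path and the straight-line distance between its endpoints. The sketch below assumes this arc-to-chord definition and a path given as a list of (x, y) points; the abstract does not state which tortuosity measure the authors compute, and `arc_chord_tortuosity` is a hypothetical name.

```python
import math


def arc_chord_tortuosity(path):
    """Ratio of path length to the straight-line distance between its endpoints."""
    if len(path) < 2:
        raise ValueError("need at least two points")
    # Sum of the lengths of consecutive segments along the path.
    arc = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    chord = math.dist(path[0], path[-1])
    if chord == 0:
        raise ValueError("closed path: chord length is zero")
    return arc / chord


if __name__ == "__main__":
    straight = [(0, 0), (1, 0), (2, 0)]
    bent = [(0, 0), (3, 4), (6, 0)]
    print(arc_chord_tortuosity(straight), arc_chord_tortuosity(bent))
```

A perfectly straight vessel gives a value of 1, and larger values indicate a more winding path.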

    Automatic image classification for the urinoculture screening

    Urinary tract infections (UTIs) are considered to be the most common bacterial infection: it is estimated that about 150 million UTIs occur worldwide each year, giving rise to roughly $6 billion in healthcare expenditures and resulting in 100,000 hospitalizations. Nevertheless, it is difficult to accurately assess the incidence of UTIs, since an accurate diagnosis depends both on the presence of symptoms and on a positive urinoculture, whereas in most outpatient settings this diagnosis is made without an ad hoc analysis protocol. In the traditional urinoculture test, a sample of midstream urine is put onto a Petri dish, where a growth medium favors the proliferation of germ colonies. Then, the infection severity is evaluated through visual inspection by a human expert, an error-prone and lengthy process. In this paper, we propose a fully automated system for urinoculture screening that can provide quick and easily traceable results for UTIs. Based on advanced image processing and machine learning tools, the infection type recognition, together with the estimation of the bacterial load, can be automatically carried out, yielding accurate diagnoses. The proposed AID (Automatic Infection Detector) system provides support during the whole analysis process: first, digital color images of Petri dishes are automatically captured, then specific preprocessing and spatial clustering algorithms are applied to isolate the colonies from the culture ground and, finally, an accurate classification of the infections and their severity evaluation are performed. The AID system speeds up the analysis, contributes to the standardization of the process, allows result repeatability, and reduces costs. Moreover, the continuous transition between sterile and external environments (typical of the standard analysis procedure) is completely avoided

    Leveraging Synthetic Data for Zero–Shot and Few–Shot Circle Detection in Real–World Domains

    Circle detection plays a pivotal role in computer vision, underpinning applications from industrial inspection and bioinformatics to autonomous driving. Traditional methods, however, often struggle with real–world complexities, as they demand extensive parameter tuning and adaptation across different domains. In this paper, we present the Synthetic Circle Dataset (SynCircle), a large synthetic image dataset designed to train a YOLO v10 network for circle detection. The YOLO v10 network, pre–trained solely on synthetic data, demonstrates remarkable off–the–shelf performance that surpasses conventional methods in various practical scenarios. Furthermore, we show that incorporating just a few labeled real images for fine–tuning can significantly boost performance, reducing the need for large annotated datasets. To promote reproducibility and streamline adoption, we publicly release both the trained YOLO v10 weights and the full SynCircle dataset
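
To make the idea of synthetic training data concrete, the sketch below draws one filled circle on a blank image and emits a bounding-box label in the normalized YOLO text-label convention (class, x_center, y_center, width, height). It is a toy example under stated assumptions: the function name, image size, radius range, and single-class setup are hypothetical and are not taken from the SynCircle dataset, which is likely far more varied.

```python
import random


def make_circle_sample(size=64, seed=0):
    """Return (image, label) for one synthetic circle.

    image: size x size nested list of 0/1 pixels.
    label: (class_id, x_center, y_center, width, height), normalized to [0, 1].
    """
    rng = random.Random(seed)
    r = rng.randint(4, size // 4)            # radius in pixels (assumed range)
    cx = rng.randint(r, size - 1 - r)        # keep the circle fully inside the image
    cy = rng.randint(r, size - 1 - r)
    image = [[1 if (x - cx) ** 2 + (y - cy) ** 2 <= r * r else 0
              for x in range(size)] for y in range(size)]
    # The circle covers pixels cx - r .. cx + r inclusive, i.e. 2r + 1 pixels.
    box = (2 * r + 1) / size
    label = (0, (cx + 0.5) / size, (cy + 0.5) / size, box, box)
    return image, label


if __name__ == "__main__":
    image, label = make_circle_sample(seed=1)
    print(" ".join(str(v) for v in label))
```

Because the geometry is generated, the label is exact by construction, which is the main appeal of synthetic data for detection tasks.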

    Weak supervision for generating pixel–level annotations in scene text segmentation

    Providing pixel–level supervisions for scene text segmentation is inherently difficult and costly, so only a few small datasets are available for this task. To address the scarcity of training data, previous approaches based on Convolutional Neural Networks (CNNs) rely on a synthetic dataset for pre–training. However, synthetic data cannot reproduce the complexity and variability of natural images. In this work, we propose to use a weakly supervised learning approach to reduce the domain–shift between synthetic and real data. Leveraging the bounding–box supervision of the COCO–Text and the MLT datasets, we generate weak pixel–level supervisions of real images. In particular, the COCO–Text–Segmentation (COCO_TS) and the MLT–Segmentation (MLT_S) datasets are created and released. These two datasets are used to train a CNN, the Segmentation Multiscale Attention Network (SMANet), which is specifically designed to handle some peculiarities of the scene text segmentation task. The SMANet is trained end–to–end on the proposed datasets, and the experiments show that COCO_TS and MLT_S are a valid alternative to synthetic images, making it possible to use only a fraction of the training samples, with a significant improvement in performance

    Diff-Props: is Semantics Preserved within a Diffusion Model?

    The ambition to create increasingly realistic images has driven researchers to develop increasingly powerful models, capable of generalizing and generating high-resolution images, even in a multimodal setup (e.g., from textual input). Among the most recent generative networks, Stable Diffusion Models (SDMs) have achieved state-of-the-art results, showing great generative capabilities but also a high degree of complexity, both in terms of training and interpretability. Indeed, the impressive generalization capability of pre-trained SDMs has pushed researchers to exploit their internal representation to perform downstream tasks (e.g., classification and segmentation). Understanding how well the model preserves semantic information is fundamental to improving its performance. Our approach, namely Diff-Props, analyses the features extracted from the U-Net within a Stable Diffusion Model to unveil how Stable Diffusion retains the semantic information of an image in a pre-trained setup. Exploiting a set of different distance metrics, Diff-Props aims to analyse how features at different depths contribute to preserving the meaning of the objects in the image

    An analysis of pre-trained stable diffusion models through a semantic lens

    Recently, generative models for images have garnered remarkable attention, due to their effective generalization ability and their capability to generate highly detailed and realistic content. Indeed, the success of generative networks (e.g., BigGAN, StyleGAN, Diffusion Models) has driven researchers to develop increasingly powerful models. As a result, we have observed an unprecedented improvement in terms of both image resolution and realism, making generated images indistinguishable from real ones. In this work, we focus on a family of generative models known as Stable Diffusion Models (SDMs), which have recently emerged due to their ability to generate images in a multimodal setup (i.e., from a textual prompt) and have outperformed adversarial networks by learning to reverse a diffusion process. Given that the complexity of these models makes them hard to retrain, researchers have started to exploit pre-trained SDMs to perform downstream tasks (e.g., classification and segmentation), where semantics plays a fundamental role. In this context, understanding how well the model preserves semantic information may be crucial to improving its performance. This paper presents an approach aimed at providing insights into the properties of a pre-trained SDM through a semantic lens. In particular, we analyze the features extracted by the U-Net within an SDM to explore whether and how the semantic information of an image is preserved in its internal representation. For this purpose, different distance measures are compared, and an ablation study is performed to select the layer (or combination of layers) of the U-Net that best preserves the semantic information. We also seek to understand whether semantics are preserved when the image undergoes simple transformations (e.g., rotation, flip, scale, padding, crop, and shift) and for different numbers of diffusion denoising steps. To evaluate these properties, we consider popular benchmarks for semantic segmentation tasks (e.g., COCO and Pascal-VOC). Our experiments suggest that the first encoder layer at resolution effectively preserves semantic information. However, increasing inference steps (even for a minimal amount of noise) and applying various image transformations can affect the diffusion U-Net’s internal feature representation. Additionally, we present some examples taken from a video benchmark (DAVIS dataset), where we investigate whether an object instance within a video preserves its internal representation even after several frames. Our findings suggest that the internal object representation remains consistent across multiple frames in a video, as long as the configuration changes are not excessive
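
Comparing internal features with a distance measure can be illustrated with cosine distance between two feature vectors, for example the features of an image region before and after a transformation. This is a minimal sketch that assumes cosine distance as one candidate measure and plain Python lists as features; the abstract does not list the distance measures the authors compared, and `cosine_distance` is a hypothetical helper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    original = [0.2, 0.9, 0.1]       # illustrative feature vector
    transformed = [0.25, 0.85, 0.1]  # illustrative features after a transformation
    print(cosine_distance(original, transformed))
```

A distance near 0 means the two feature vectors point in almost the same direction, which would be read as the representation being preserved; the measure ignores vector magnitude.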