    Detection of red and white blood cells from microscopic blood images using a region proposal approach

    In this paper, we propose a novel and efficient method for detecting and quantifying red and white blood cells from microscopic blood images. Laboratory tests that use a cell counter or a flow cytometer can perform a complete blood count (CBC) rapidly. Nonetheless, a manual blood smear inspection is still needed, both to provide a human check of the counter's results and to monitor patients under therapy. Moreover, it allows for describing the cells' appearance as well as any abnormalities. However, manual analysis is lengthy and repetitive, and its result can be subjective and error-prone. In contrast, by using image processing techniques, the proposed system is entirely automated. The main effort is devoted both to achieving high accuracy and to overcoming the typical differences in the condition of blood smear images that computer-aided methods encounter. The method is based on Edge Boxes, which is considered a state-of-the-art region proposal approach. By incorporating knowledge-based constraints into the Edge Boxes detection process, we can find cell proposals rapidly and efficiently. We tested the proposed approach on the Acute Lymphoblastic Leukaemia Image Database (ALL-IDB), a well-known public dataset proposed for leukaemia detection, and the Malaria Parasite Image Database (MP-IDB), a recently proposed dataset for malaria detection. Experimental results were excellent in both cases, outperforming the state of the art on ALL-IDB and creating a strong baseline on MP-IDB, demonstrating that the proposed method can work well on different datasets and different types of images.

    Human-in-the-loop cross-domain person re-identification

    Person re-identification is a challenging cross-camera matching problem, which is inherently subject to domain shift. To mitigate it, many solutions have been proposed so far, based on four kinds of approaches: supervised and unsupervised domain adaptation, direct transfer, and domain generalisation; in particular, the first two approaches require target data during system design, labelled and unlabelled respectively. In this work, we consider a very different approach, known as human-in-the-loop (HITL), which consists of exploiting the user's feedback on target data processed during system operation to improve re-identification accuracy. Although HITL seems particularly suited to this application, given the inherent interaction with a human operator, HITL methods have been proposed for person re-identification by only a few works so far, and with a different purpose than addressing domain shift. However, we argue that HITL deserves further consideration in person re-identification, also as a potential alternative solution against domain shift. To substantiate our view, we consider simple HITL implementations which do not require model re-training or fine-tuning: they are based on well-known relevance feedback algorithms for content-based image retrieval, and on novel versions of them that we devise specifically for person re-identification. We then conduct an extensive, cross-data set experimental evaluation of our HITL implementations on benchmark data sets, and compare them with a large set of existing methods against domain shift, belonging to the four categories mentioned above. Our results provide evidence that HITL can be as effective as, or even outperform, existing ad hoc solutions against domain shift for person re-identification, even under the simple implementations we consider. We believe that these results can foster further research on HITL in the person re-identification field, where, in our opinion, its potential has not been thoroughly explored so far.
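The abstract above mentions relevance feedback algorithms from content-based image retrieval without naming them. As a rough illustration only, the sketch below uses the classic Rocchio update, which is an assumption on our part and not necessarily the algorithm the authors adopt. The function names, the weights `alpha`, `beta`, `gamma`, and the use of cosine similarity are all hypothetical choices made for this example.

```python
import numpy as np


def rocchio_update(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query feature vector towards images the operator marked as
    matches and away from those marked as non-matches (Rocchio's formula).
    The default weights are illustrative, not taken from the paper."""
    updated = alpha * np.asarray(query, dtype=float)
    if len(positives) > 0:
        updated = updated + beta * np.mean(np.asarray(positives, dtype=float), axis=0)
    if len(negatives) > 0:
        updated = updated - gamma * np.mean(np.asarray(negatives, dtype=float), axis=0)
    return updated


def rank_gallery(query, gallery):
    """Return gallery indices sorted by decreasing cosine similarity to the query."""
    q = np.asarray(query, dtype=float)
    g = np.asarray(gallery, dtype=float)
    # A small epsilon guards against division by zero for all-zero vectors.
    sims = (g @ q) / (np.linalg.norm(g, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)
```

In a feedback round under these assumptions, the operator would label a few top-ranked gallery images, `rocchio_update` would refine the query vector, and `rank_gallery` would be called again with the refined query. No model re-training is involved, which matches the constraint stated in the abstract.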

    On the effectiveness of synthetic data sets for training person re-identification models

    Person re-identification is a prominent topic in computer vision due to its security-related applications, and to the fact that issues such as variations in illumination, background, pedestrian pose and clothing appearance make it a very challenging task in real-world scenarios. State-of-the-art supervised methods require a huge manual annotation effort for training data and exhibit limited generalisation capability to unknown target domains. Synthetic data sets have recently been proposed as one possible solution to mitigate these problems, aimed at improving generalisation capability by encompassing a wider range of variations in the above-mentioned visual factors, with no need for manual annotation. However, existing synthetic data sets differ in many aspects, including the number of images, identities and cameras, and in their degree of photorealism, and there is not yet a clear understanding of how all such factors affect person re-identification performance. This work makes a first step towards filling this gap through an in-depth empirical investigation, where we use existing synthetic data sets for model training and real benchmark ones for performance evaluation. Our results provide interesting insights towards developing effective synthetic data sets for person re-identification.

    Scene-specific crowd counting using synthetic training images

    Crowd counting is a computer vision task on which considerable progress has recently been made thanks to convolutional neural networks. However, it remains a challenging task even in scene-specific settings, in real-world application scenarios where no representative images of the target scene are available, not even unlabelled ones, for training or fine-tuning a crowd counting model. Inspired by previous work in other computer vision tasks, we propose a simple but effective solution for the above application scenario, which consists of automatically building a scene-specific training set of synthetic images. Our solution requires neither manual annotation effort from end users nor the collection of representative images of the target scene. Extensive experiments on several benchmark data sets show that the proposed solution can improve the effectiveness of existing crowd counting methods.

    An Empirical Evaluation of Nuclei Segmentation from H&E Images in a Real Application Scenario

    Cell nuclei segmentation is a challenging task, especially in real applications, when the target images differ significantly from one another. This task is also challenging for methods based on convolutional neural networks (CNNs), which have recently boosted the performance of cell nuclei segmentation systems. However, when training data are scarce or not representative of deployment scenarios, such methods may overfit to varying degrees, and may generalise poorly to images that differ from the ones used for training. In this work, we focus on real-world, challenging application scenarios in which no annotated images from a given dataset are available, or in which few images (even unlabelled) of the same domain are available to perform domain adaptation. To simulate this scenario, we performed extensive cross-dataset experiments on several CNN-based state-of-the-art cell nuclei segmentation methods. Our results show that some of the existing CNN-based approaches are capable of generalising to target images which resemble the ones used for training. In contrast, their effectiveness degrades considerably when target and source images differ significantly in colour and scale.

    On The Potential of Image Moments for Medical Diagnosis

    Medical imaging is widely used for diagnosis and postoperative or post-therapy monitoring. The ever-increasing number of images produced has encouraged the introduction of automated methods to assist doctors or pathologists. In recent years, especially after the advent of convolutional neural networks, many researchers have focused on this approach, considering it to be the only method for diagnosis since it can perform a direct classification of images. However, many diagnostic systems still rely on handcrafted features to improve interpretability and limit resource consumption. In this work, we focused our efforts on orthogonal moments, first by providing an overview and taxonomy of their macrocategories and then by analysing their classification performance on very different medical tasks represented by four public benchmark data sets. The results confirmed that convolutional neural networks achieved excellent performance on all tasks. Despite comprising far fewer features than those extracted by the networks, orthogonal moments proved to be competitive with them, showing comparable and, in some cases, better performance. In addition, the Cartesian and harmonic categories showed a very low standard deviation, indicating their robustness in medical diagnostic tasks. We strongly believe that the integration of the studied orthogonal moments can lead to more robust and reliable diagnostic systems, considering the performance obtained and the low variation of the results. Finally, since they have been shown to be effective on both magnetic resonance and computed tomography images, they can be easily extended to other imaging techniques.

    How Realistic Should Synthetic Images Be for Training Crowd Counting Models?

    Using synthetic images has been proposed to avoid collecting and manually annotating a sufficiently large and representative training set for several computer vision tasks, including crowd counting. While existing methods for crowd counting are based on generating realistic images, we start investigating how crowd counting accuracy is affected by increasing the realism of synthetic training images. Preliminary experiments on state-of-the-art CNN-based methods, focused on image background and pedestrian appearance, show that realism in both of them is beneficial to different extents, depending on the kind of model (regression- or detection-based) and on pedestrian size in the images.

    BLUES: Before-reLU-EStimates Bayesian Inference for Crowd Counting

    Ensuring the trustworthiness of artificial intelligence and machine learning systems is becoming a crucial requirement given their widespread applications, including crowd counting, which we focus on in this work. This is often addressed by integrating uncertainty measures into their predictions. Most Bayesian uncertainty quantification techniques use a Gaussian approximation of the output, whose variance is interpreted as the uncertainty measure. However, in the case of neural network models for crowd counting based on density estimation, where the ReLU activation function is used for the output units, such a prior may lead to an approximated distribution with a significant mass on negative values, although they cannot be produced by the ReLU activation. Interestingly, we found that this is related to “false positive” pedestrian localisation errors in the density map. We propose to address this issue by shifting the Bayesian Inference Before the reLU EStimates (BLUES). This modification allows us to estimate a probability distribution on both the people density and the people presence in each pixel. From these, we compute a crowd segmentation map, which we exploit for filtering out false positive localisations. Results on several benchmark data sets provide evidence that our BLUES approach improves the accuracy of the estimated density map and the quality of the corresponding uncertainty measure.
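To make the idea of placing the Gaussian before the ReLU more concrete, here is a minimal per-pixel sketch. It assumes the pre-activation value is distributed as N(mu, sigma^2) and applies the standard rectified-Gaussian identities; the function name, the inputs `mu` and `sigma`, and the reading of P(z > 0) as a "presence" probability are our assumptions for illustration, not the paper's stated formulation.

```python
import math


def rectified_gaussian(mu, sigma):
    """For z ~ N(mu, sigma^2) and y = ReLU(z) = max(0, z), return
    (P(z > 0), E[y]). Hypothetical helper, not the authors' code."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    a = mu / sigma
    # Standard normal CDF and PDF evaluated at a = mu / sigma.
    cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    p_positive = cdf                      # probability that the ReLU output is non-zero
    expected = mu * cdf + sigma * pdf     # mean of the rectified Gaussian
    return p_positive, expected
```

Under these assumptions, thresholding the first returned value across pixels would yield a binary map, which is one plausible way to obtain something like the crowd segmentation map the abstract describes. The threshold itself would be a further design choice that the abstract does not specify.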

    On the Evaluation of Video-Based Crowd Counting Models

    Crowd counting is a challenging and relevant computer vision task. Most of the existing methods are image-based, i.e., they only exploit the spatial information of a single image to estimate the corresponding people count. Recently, video-based methods have been proposed to improve counting accuracy by also exploiting temporal information coming from the correlation between adjacent frames. In this work, we point out the need to properly evaluate the specific contribution of temporal information over that of spatial information. This issue has not been discussed by existing work, and in some cases such evaluation has been carried out in a way that may lead to overestimating the contribution of the temporal information. To address this issue, we propose a categorisation of existing video-based models, discuss how the contribution of the temporal information has been evaluated by existing work, and propose an evaluation approach aimed at providing a more complete evaluation for two different categories of video-based methods. We finally illustrate our approach, for a specific category, through experiments on several benchmark video data sets.