    Exploring few-shot text line segmentation approaches in challenging ancient manuscripts

    Text line segmentation is a critical component of document layout analysis, particularly for ancient handwritten manuscripts. Its primary goal is to accurately extract individual text lines, a step that significantly influences subsequent tasks such as optical character recognition, text transcription, and information extraction. However, segmenting text lines in historical manuscripts is particularly challenging due to irregular handwriting, faded ink, and complex layouts with overlapping lines and non-linear text flows. Additionally, the limited availability of large annotated datasets makes fully supervised learning approaches impractical for these documents. In this paper, we explore the applicability of three prominent semantic segmentation models when applied in a few-shot learning setting, using only a small number of labeled examples per manuscript. Our results demonstrate the challenges of addressing text line segmentation in the context of scarce labeled data. This provides a promising avenue for future research in document analysis for historical manuscripts

    In-domain versus out-of-domain transfer learning for document layout analysis

    Data availability is a major concern in the field of document analysis, especially for tasks that require highly precise ground truths on which to train deep learning models. A notable example is document layout analysis in handwritten documents, which requires pixel-precise segmentation maps to highlight the different layout components of each document page. These segmentation maps are typically very time-consuming to produce and require a high degree of domain knowledge to define, as they are intrinsically characterized by the content of the text. For this reason, in the present work we explore the effects of different initialization strategies for deep learning models employed for this type of task, relying on both in-domain and cross-domain datasets for their pre-training. To test the employed models, we use two publicly available datasets with heterogeneous characteristics regarding both their structure and the languages of the contained documents. We show that a combination of cross-domain and in-domain transfer learning approaches leads to the best overall performance of the models and also speeds up their convergence

    U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts

    Document Layout Analysis, which is the task of identifying different semantic regions inside a document page, is a subject of great interest for both computer scientists and humanities scholars, as it represents a fundamental step towards further analysis tasks for the former and a powerful tool to improve and facilitate the study of the documents for the latter. However, many of the works currently present in the literature, especially when it comes to the available datasets, fail to meet the needs of both worlds and, in particular, tend to lean towards the needs and common practices of the computer science side, leading to resources that are not representative of the humanities' real needs. For this reason, the present paper introduces U-DIADS-Bib, a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities. Furthermore, we propose a novel, computer-aided segmentation pipeline to alleviate the burden of the time-consuming manual annotation needed to generate the ground truth segmentation maps. Finally, we present a standardized few-shot version of the dataset (U-DIADS-BibFS), with the aim of encouraging the development of models and solutions able to address this task with as few samples as possible, which would allow for more effective use in a real-world scenario, where collecting a large number of segmentations is not always feasible

    A One-Shot Learning Approach to Document Layout Segmentation of Ancient Arabic Manuscripts

    Document layout segmentation is a challenging task due to the variability and complexity of document layouts. Ancient manuscripts in particular are often damaged by age, have very irregular layouts, and are characterized by progressive editing from different authors over a large time window. All these factors make the semantic segmentation of specific areas, such as main text and side text, very difficult. However, the study of these manuscripts is fundamental for historians and humanists, so much so that in recent years the demand for machine learning approaches aimed at simplifying the extraction of information from these documents has consistently increased, making document layout analysis an increasingly important research area. For machine learning techniques to be applied effectively to this task, however, a large number of correctly and precisely labeled images is required for training. This is a limitation for this field of research, as ground truth must be precisely and manually crafted by expert humanists, making it a very time-consuming process. In this paper, with the aim of overcoming this limitation, we present an efficient document layout segmentation framework which, while being trained on only one labeled page per manuscript, still achieves state-of-the-art performance compared to other popular approaches trained on all the available data when tested on a challenging dataset of ancient Arabic manuscripts

    Dynamic instance generation for few-shot handwritten document layout segmentation (short paper)

    Historical handwritten document analysis is an important activity to retrieve information about our past. Given that this type of process is slow and time-consuming, the humanities community is searching for new techniques that could aid them in this activity. Document layout analysis is a branch of machine learning that aims to extract semantic information from digitised documents. Here we propose a new framework for handwritten document layout analysis that differs from the current state of the art in two respects: it features few-shot learning, thus allowing for good results with little manually labelled data, and it includes a dynamic instance generation process. Our results were obtained using the DIVA-HisDB dataset

    Multi-modal Analysis of Bi-Parametric MRI Slices for Lesion Detection in Prostate Cancer Screening

    Prostate cancer is a leading cause of cancer-related mortality among men, with early detection playing a critical role in improving patient outcomes. While deep learning has significantly advanced prostate cancer detection in Magnetic Resonance Imaging (MRI), most existing models depend on full 3D scans, posing challenges for real-time deployment in clinical practice. Our work addresses this problem by leveraging a slice-based approach that balances accuracy with efficiency, ensuring scalability for real-world applications. We present a dual-branch, multi-modal deep learning framework focusing on individual bi-parametric MRI slices to detect areas of clinically significant prostate cancer (csPCa). The proposed approach combines pixel-level segmentation via UNet++ and instance-level classification using EfficientNet to enhance lesion localization and reduce false positives. We evaluated our framework on the PI-CAI (Prostate Imaging: Cancer AI) dataset, achieving performance that surpasses that of other 2D segmentation models. The study highlights the feasibility of slice-based MRI analysis for prostate cancer screening in resource-limited clinical settings and the potential of this approach as a clinical decision support tool that reduces interpretation burden and aids radiologists in prostate cancer screening

    FL-W3S: Cross-domain federated learning for weakly supervised semantic segmentation of white blood cells

    Background: Segmentation models for clinical data experience severe performance degradation when trained on a single client from one domain and distributed to other clients from a different domain. Federated Learning (FL) provides a solution by enabling multi-party collaborative learning without compromising the confidentiality of clients' private data. Methods: In this paper, we propose a cross-domain FL method for Weakly Supervised Semantic Segmentation (FL-W3S) of white blood cells in microscopic images. We perform model training on multiple clients with different data distributions to obtain a global aggregated model using only image-level class labels for semantic segmentation of white blood cells. A multi-class token transformer model learns the relationship between patch tokens and class tokens during collaborative learning and generates class-specific localization maps for mask predictions. To rectify the localization maps, we use patch-level pairwise affinity obtained from patch-to-patch transformer attention. Results: We evaluate the performance of the proposed semantic segmentation method on two datasets of white blood cells from different domains. Our experimental results show that, on the two datasets, the proposed method improves performance by 2.56% and 1.39% over existing state-of-the-art methods. Conclusion: The combination of federated learning for collaborative model training while preserving data privacy, alongside white blood cell segmentation techniques for precise cell identification, enhances diagnostic accuracy and personalized treatment strategies in clinical applications, particularly in hematology and pathology. More specifically, it involves isolating white blood cells from blood smears for further analysis such as automated blood cell counting, morphological analysis, cell classification, disease diagnosis and monitoring

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
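
The two counting schemes contrasted in this abstract can be illustrated with a minimal sketch. It is not the study's actual procedure: the function name `cocitation_counts`, the toy author names, and the rule that a pair is counted at most once per citing paper (with co-authors of the same cited work also treated as co-cited) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a collection of citing papers.

    citing_papers: list of reference lists; each reference list is a list
        of cited works; each cited work is an ordered list of author names.
    max_authors: how many leading authors of each cited work are considered
        (1 = first-author counting, 5 = first-five-author counting).

    Assumption: a pair of authors is counted once per citing paper in which
    both appear among the considered authors of the cited works.
    """
    counts = Counter()
    for reference_list in citing_papers:
        cited_authors = set()
        for authors in reference_list:
            cited_authors.update(authors[:max_authors])
        # Sorting makes each pair a canonical (alphabetical) tuple.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
citing_papers = [
    [["Adams", "Baker"], ["Clark"]],
    [["Adams", "Davis"], ["Clark", "Baker"]],
]

first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
```

On this toy data, first-author counting sees only the Adams/Clark pair, while first-five counting also surfaces pairs involving Baker and Davis, who never appear as first authors.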

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship