Ottimizzazione di Algoritmi per l’Elaborazione di Immagini Binarie (Optimization of Algorithms for Binary Image Processing)
The procedure of making an algorithm more efficient in terms of memory requirements or execution time is called optimization, and it is a crucial step in image and video processing. Optimization rarely yields an algorithm that is optimal in an absolute sense; more often it settles on a trade-off between time and memory. In many scenarios, however, the total execution time required to complete a task is the most restrictive constraint. Binary image processing algorithms, for example, are a fundamental pre- and post-processing operation in most state-of-the-art image and video analysis pipelines, even when those pipelines are based on deep learning. A fast implementation is therefore crucial, especially when the pipelines must run in real-time scenarios where compromising output quality or resorting to more powerful hardware is not an option.
This thesis introduces and explores different approaches for optimizing binary image processing algorithms that can be modeled with decision tables. Many problems can be defined this way: Connected Component Labeling, Thinning, Chain Code, and morphological operators are some of them. In general, any algorithm in which the output value of each image pixel is computed from the value of the pixel itself and of some of its neighbors can be defined using decision tables.
Focusing on Connected Component Labeling, this thesis analyzes the state-of-the-art approaches for both sequential CPU-based and parallel CPU- and GPU-based environments, with particular attention to how performance can be measured fairly. We then introduce novel approaches that further reduce total execution time, showing how these techniques generalize to any algorithm modeled with decision tables. Finally, we present a framework that automatically applies many of these optimization strategies to a given problem. The framework, called GRAPHGEN, takes as input a definition of the problem in terms of conditions to check and actions to perform, and produces as output C++ code that includes all the required optimizations. Algorithms generated with GRAPHGEN perform significantly better than previous state-of-the-art algorithms on both real-world and synthetic datasets.
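As a rough illustration of what "modeled with decision tables" means, the sketch below labels 4-connected components with a two-pass scan driven by an explicit table. It is not GRAPHGEN output and not the thesis's algorithm: the three-pixel mask (current pixel x, upper neighbor p, left neighbor s), the action names, and the function `label_4conn` are all assumptions chosen for brevity.

```python
# Decision table for 4-connectivity labeling (illustrative, not from the thesis).
# Conditions: x = current pixel, p = pixel above, s = pixel to the left (1 = foreground).
# Each key (x, p, s) maps to the action to perform on the current pixel.
DECISION_TABLE = {
    (0, 0, 0): "nothing", (0, 0, 1): "nothing",
    (0, 1, 0): "nothing", (0, 1, 1): "nothing",
    (1, 0, 0): "new_label",
    (1, 1, 0): "copy_p",
    (1, 0, 1): "copy_s",
    (1, 1, 1): "merge_p_s",
}


def find(parent, a):
    """Return the representative of label a, using path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def label_4conn(img):
    """Two-pass labeling of a binary image given as a list of rows of 0/1."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # index 0 is the background label
    for r in range(h):
        for c in range(w):
            x = img[r][c]
            p = img[r - 1][c] if r > 0 else 0
            s = img[r][c - 1] if c > 0 else 0
            action = DECISION_TABLE[(x, p, s)]
            if action == "new_label":
                parent.append(len(parent))
                labels[r][c] = len(parent) - 1
            elif action == "copy_p":
                labels[r][c] = labels[r - 1][c]
            elif action == "copy_s":
                labels[r][c] = labels[r][c - 1]
            elif action == "merge_p_s":
                a = find(parent, labels[r - 1][c])
                b = find(parent, labels[r][c - 1])
                parent[max(a, b)] = min(a, b)
                labels[r][c] = min(a, b)
    # Second pass: replace provisional labels with consecutive final labels.
    final = {0: 0}
    for r in range(h):
        for c in range(w):
            root = find(parent, labels[r][c])
            labels[r][c] = final.setdefault(root, len(final))
    return labels, len(final) - 1
```

Calling `label_4conn([[1, 0, 1], [1, 0, 1], [1, 1, 1]])` finds a single U-shaped component, because the two provisional labels created on the first row are merged when the scan reaches the bottom-right pixel. Keeping the table as data, separate from the scan loop, is the property that makes it possible to derive an optimized decision tree from the same rules.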
Indexing of Historical Document Images: Ad Hoc Dewarping Technique for Handwritten Text
This work presents XDOCS, a research project aimed at making a variety of historical documents published on the web accessible to a much wider audience. The paper gives an overview of the indexing process that will be used to achieve this goal, focusing on the adopted dewarping technique. The proposed approach relies on a transformation model that maps the projection of a curved surface to a 2D rectangular area. The novelty of this work is that dewarping can be applied to document images containing both handwritten and typewritten text.
A Hierarchical Quasi-Recurrent approach to Video Captioning
Video captioning has attracted considerable attention thanks to the use of Recurrent Neural Networks, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme that can discover and exploit the layered structure of the video. Unlike the established encoder-decoder approach, in which a video is encoded continuously by a recurrent layer, we propose to employ Quasi-Recurrent Neural Networks, extending their basic cell with a boundary detector that recognizes discontinuity points between frames or segments and modifies the temporal connections of the encoding layer accordingly. We assess our approach on a large-scale dataset, the Montreal Video Annotation dataset. Experiments demonstrate that our approach can find suitable levels of representation of the input information while reducing the computational requirements.
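One way to picture a boundary detector acting on a quasi-recurrent layer is sketched below. It uses the standard QRNN "fo-pooling" recurrence and, as an assumption not stated in the abstract, resets the carried state to zero whenever a boundary flag is set. The gate values are passed in directly as plain floats (in a real QRNN they come from convolutions over the input), and the function name `fo_pool_with_boundaries` is hypothetical.

```python
def fo_pool_with_boundaries(z, f, o, s):
    """QRNN-style fo-pooling over a sequence, with hard state resets at boundaries.

    z, f, o: per-step candidate, forget-gate, and output-gate values (floats).
    s: per-step boundary flags (1 means a new segment starts at this step).
    Returns the list of hidden values h_t = o_t * c_t.
    """
    c_prev, hidden = 0.0, []
    for z_t, f_t, o_t, s_t in zip(z, f, o, s):
        # Assumed boundary behavior: drop the state carried from the previous segment.
        c_prev = c_prev * (1 - s_t)
        # Standard fo-pooling update: blend the old state with the new candidate.
        c_t = f_t * c_prev + (1.0 - f_t) * z_t
        hidden.append(o_t * c_t)
        c_prev = c_t
    return hidden
```

With constant inputs `z = [1, 1, 1]`, `f = [0.5, 0.5, 0.5]`, `o = [1, 1, 1]`, the state accumulates across steps when no boundary is flagged, while a flag on the last step makes that step start again from an empty state.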
A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes: Implementation and Reproducibility Notes
This paper provides a detailed description of how to install, set up, and use the YACCLAB benchmark to test the algorithms published in "A Heuristic-Based Decision Tree for Connected Components Labeling of 3D Volumes," underlining how the parameters affect experimental results.
One DAG to Rule Them All
In this paper, we present novel strategies for optimizing the performance of many binary image processing algorithms. These strategies are collected in an open-source framework, GRAPHGEN, which automatically generates optimized C++ source code implementing the desired optimizations. Starting simply from a set of rules, the algorithms introduced with the GRAPHGEN framework can generate decision trees with minimum average path length (possibly taking image pattern frequencies into account), apply state prediction, and compress code by means of Directed Rooted Acyclic Graphs (DRAGs). Moreover, the proposed algorithmic solutions allow different optimization techniques to be combined, significantly improving performance. Our proposal is showcased on three classical and widely employed algorithms: Connected Components Labeling, Thinning, and Contour Tracing. Compared with existing approaches, in 2D and 3D, implementations using the generated optimal DRAGs perform significantly better than previous state-of-the-art algorithms, both on CPU and GPU.
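The code-compression idea can be pictured as merging identical subtrees of a decision tree so that shared code is emitted only once, which turns the tree into a rooted directed acyclic graph. The sketch below shows only that basic step through hash-consing; it is not GRAPHGEN's actual procedure, and the tuple encoding of trees, the condition and action names, and the functions `compress` and `count_tree_nodes` are assumptions made for illustration.

```python
def compress(tree, pool=None):
    """Merge identical subtrees of a binary decision tree into shared nodes.

    A tree is either a leaf (a string naming an action) or a tuple
    (condition, subtree_if_false, subtree_if_true). Returns (root_id, pool),
    where pool maps each distinct node to an integer id.
    """
    if pool is None:
        pool = {}
    if isinstance(tree, str):
        key = ("leaf", tree)
    else:
        cond, if_false, if_true = tree
        left, _ = compress(if_false, pool)
        right, _ = compress(if_true, pool)
        # Two inner nodes are identical when condition and both children match.
        key = ("node", cond, left, right)
    if key not in pool:
        pool[key] = len(pool)
    return pool[key], pool


def count_tree_nodes(tree):
    """Number of nodes in the uncompressed tree."""
    if isinstance(tree, str):
        return 1
    return 1 + count_tree_nodes(tree[1]) + count_tree_nodes(tree[2])


# Hypothetical tree in which the subtree ("b", "A1", "A2") appears twice.
shared = ("b", "A1", "A2")
tree = ("a", shared, ("c", shared, "A1"))
root_id, pool = compress(tree)
print(count_tree_nodes(tree))  # 9 nodes in the tree
print(len(pool))               # 5 nodes in the compressed graph
```

Children are compressed before their parent, so two subtrees receive the same id exactly when they are structurally identical; a code generator could then emit one block per id and jump to it from every parent.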
Long-Range 3D Self-Attention for MRI Prostate Segmentation
The problem of prostate segmentation from Magnetic Resonance Imaging (MRI) is an intense research area, due to the increased use of MRI in the diagnosis and treatment planning of prostate cancer. The lack of clear boundaries and the huge variation of texture and shape between patients make the task very challenging, and the 3D nature of the data makes 2D segmentation algorithms suboptimal for the task.
With this paper, we propose a novel architecture to fill the gap between the most recent advances in 2D computer vision and 3D semantic segmentation. In particular, the designed model retrieves multi-scale 3D features with dilated convolutions and makes use of a self-attention transformer to gain a global field of view. The proposed Long-Range 3D Self-Attention block allows the convolutional neural network to build significant features by merging contextual information collected at various scales. Experimental results show that the proposed method improves the state-of-the-art segmentation accuracy on MRI prostate segmentation.
Optimizing Resource Allocation in Public Healthcare: A Machine Learning Approach for Length-of-Stay Prediction
Effective hospital resource management hinges on established metrics such as Length of Stay (LOS) and Prolonged Length of Stay (pLOS). Reducing pLOS is associated with improved patient outcomes and optimized resource utilization (e.g., bed allocation). This study investigates several Machine Learning (ML) models for both LOS and pLOS prediction. We conducted a retrospective study analyzing data from general inpatients discharged between 2022 and 2023 at a northern Italian hospital. Sixteen regression and twelve classification algorithms were compared in forecasting LOS as either a continuous or multi-class variable (1-3 days, 4-10 days, >10 days). Additionally, the same models were assessed for pLOS prediction (defined as LOS exceeding 8 days). All models were evaluated using two variants of the same dataset: one containing only structured data (e.g., demographics and clinical information), and a second one also containing features extracted from free-text diagnosis. Ensemble models, leveraging the combined strengths of multiple ML algorithms, demonstrated superior accuracy in predicting both LOS and pLOS compared to single-algorithm models, particularly when utilizing both structured and unstructured data extracted from diagnoses. Integration of ML, particularly ensemble models, has the potential to significantly improve LOS prediction and identify patients at high risk of pLOS. Such insights can empower healthcare professionals and bed managers to optimize patient care and resource allocation, promoting overall healthcare efficiency and sustainability.
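The prediction targets described above can be written down as two small labeling rules. The sketch below is only a reading of the abstract: it assumes LOS is recorded in whole days, and the function names `los_class` and `is_prolonged` are hypothetical rather than taken from the study.

```python
def los_class(los_days):
    """Map a length of stay in days to the three classes named in the abstract."""
    if los_days <= 3:
        return "1-3 days"
    if los_days <= 10:
        return "4-10 days"
    return ">10 days"


def is_prolonged(los_days, threshold_days=8):
    """pLOS flag: LOS strictly longer than the threshold (8 days in the study)."""
    return los_days > threshold_days
```

Note that the two targets are not nested: a 9-day stay falls in the middle class ("4-10 days") and is nonetheless flagged as prolonged.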
A Deep-Learning-Based Method for Real-Time Barcode Segmentation on Edge CPUs
Barcodes are a critical technology in industrial automation, logistics, and retail, enabling fast and reliable data capture. While deep learning has significantly improved barcode localization accuracy, most modern architectures remain too computationally demanding for real-time deployment on embedded systems without dedicated hardware acceleration. In this work, we present BaFaLo (Barcode Fast Localizer), an ultra-lightweight segmentation-based neural network for barcode localization. Our model is specifically optimized for real-time performance on low-power CPUs while maintaining high localization accuracy for both 1D and 2D barcodes. It features a two-branch architecture (a local feature extractor and a global context module) and is tailored for low-resolution inputs to further improve inference speed. We benchmark BaFaLo against several lightweight architectures for object detection or segmentation, including YOLO Nano, Fast-SCNN, BiSeNet V2, and ContextNet, using the BarBeR dataset. BaFaLo achieves the fastest inference time among all deep-learning models tested, operating at 57.62 ms per frame on a single CPU core of a Raspberry Pi 3B+. Despite its compact design, it achieves a decoding rate nearly equivalent to YOLO Nano for 1D barcodes and only 3.5 percentage points lower for 2D barcodes while being approximately nine times faster.
ClusterFix: A Cluster-Based Debiasing Approach without Protected-Group Supervision
The failures of Deep Networks can sometimes be ascribed to biases in the data or algorithmic choices. Existing debiasing approaches exploit prior knowledge to avoid unintended solutions; we acknowledge that, in real-world settings, it could be unfeasible to gather enough prior information to characterize the bias, or it could even raise ethical considerations. We hence propose a novel debiasing approach, termed ClusterFix, which does not require any external hint about the nature of biases. Such an approach alters the standard empirical risk minimization and introduces a per-example weight, encoding how critical and far from the majority an example is. Notably, the weights consider how difficult it is for the model to infer the correct pseudo-label, which is obtained in a self-supervised manner by dividing examples into multiple clusters. Extensive experiments show that the misclassification error incurred in identifying the correct cluster allows for identifying examples prone to bias-related issues. As a result, our approach outperforms existing methods on standard benchmarks for bias removal and fairness.
Sustainable Use of Resources in Hospitals: A Machine Learning-Based Approach to Predict Prolonged Length of Stay at the Time of Admission
Introduction. Length of Stay (LOS) and Prolonged Length of Stay (pLOS) are critical indicators of hospital efficiency. Reducing pLOS is crucial for patient safety, autonomy, and bed allocation. This study investigates different machine learning (ML) models to predict LOS and pLOS. Methods. We conducted a retrospective cohort study on a dataset of patients discharged from a northern Italian hospital between 2022 and 2023. We compared sixteen regression algorithms and twelve classification methods for predicting LOS as either a continuous or multi-class variable (1-3 days, 4-10 days, >10 days). We also evaluated pLOS prediction using the same models, with pLOS defined as any hospitalization with LOS longer than 8 days. We further analyzed all models using two versions of the same dataset: one containing only structured data (e.g., demographics and clinical information), and a second one that also contains features extracted from free-text diagnosis. Results. Our results indicate that ensemble models achieved the highest prediction accuracy for both LOS and pLOS, outperforming traditional single-algorithm models, particularly when using both structured and unstructured data extracted from diagnoses. Discussion. The integration of ML, particularly ensemble models, can significantly improve LOS prediction and identify patients at increased risk of pLOS. This information can guide healthcare professionals and bed managers in making informed decisions to enhance patient care and optimize resource allocation.
- …
