Methods and software for image reconstruction in different parallel environments.
For many years, digital images have been used in various areas of science, such as astronomy, biology and medicine, and in all cases they are unavoidably corrupted by noise and blur.
In particular, the noise is mainly due to the analog-to-digital conversion of the signal, while the blur usually results from the nature of the observations, such as the presence of the atmosphere, lens aberrations or light diffraction effects.
A possible strategy for restoring such images involves a statistical approach known as Maximum Likelihood (ML) estimation.
All the algorithms belonging to this group assume that the input image is affected by noise whose probability density is known, and attempt to find the most probable source image given the observation.
This leads to optimization problems whose objective functions depend on the assumed noise probability density.
Although these statistical deconvolution methods often give acceptable results in application fields such as fluorescence microscopy and astronomy, they are computationally intensive because the data are typically very large.
A possible solution to overcome this problem is the parallelization of the computation.
The most common parallelization strategies currently in use can be divided into two main categories.
First, there are algorithms implemented as multi-threaded applications and developed for symmetric multiprocessing (SMP) architectures.
This model assumes that all processing units (processors) are identical and share a single main memory.
Such parallel applications can run on modern PCs with multi-core CPUs, on workstations with several processors and shared global memory, or on other kinds of high-performance multiprocessor devices such as GPUs (Graphics Processing Units).
Secondly, algorithms can be designed for distributed memory architectures. In this case, each processor has its own private memory.
Computational tasks can only operate on local data and must communicate with other tasks to access their data.
Software developed with this approach can run on a computer cluster (a group of network-linked PCs or workstations) and can also work efficiently on SMP architectures, where it speeds up the computation by splitting the workload among threads.
However, deconvolution of large images may still be a problem. Standard computers usually contain only a few GB of memory and one processor with two or four cores.
Multiprocessor workstations are expensive and contain no more than four CPUs.
Some GPU architectures could be a good solution for achieving the best speed, thanks to their massively parallel structure.
In this thesis we propose two parallel versions of the Scaled Gradient Projection (SGP) method for solving the optimization problems arising in image deconvolution, including an extension that also reduces the boundary effects typically introduced by reconstruction algorithms.
The first version exploits the Message Passing Interface (MPI) environment and works efficiently on clusters of computers; the other exploits the NVIDIA CUDA framework for GPU devices.
The implementation was originally designed for 2-dimensional cases or multiple 2D images, but it has been extended to work with N-dimensional images.
The effectiveness of the parallel schemes has been evaluated in image deblurring on small- and large-scale problems, giving remarkable results and very promising speedups with respect to the scalar versions.
Intensive numerical experimentation on realistic images from applications in astronomy and microscopy showed that the proposed schemes are a very promising tool for achieving real-time deconvolution of very large images
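To illustrate how the objective function depends on the assumed noise density, the sketch below compares the negative log-likelihoods (up to additive constants) for two common noise models. The function names and the choice of Gaussian and Poisson noise are assumptions made for illustration; the abstract does not state which densities the thesis considers.

```python
import math

def gaussian_nll(model, data, sigma=1.0):
    """Negative log-likelihood (up to a constant) for i.i.d. Gaussian noise.

    Minimizing it is a least-squares problem.
    """
    return sum((m - d) ** 2 for m, d in zip(model, data)) / (2.0 * sigma ** 2)

def poisson_nll(model, data):
    """Negative log-likelihood (up to a constant) for Poisson noise.

    This is the generalized Kullback-Leibler divergence of the model
    from the data. Every model value is assumed to be strictly positive.
    """
    total = 0.0
    for m, d in zip(model, data):
        # Convention: 0 * log(0) = 0 for pixels with no detected counts.
        log_term = d * math.log(d / m) if d > 0 else 0.0
        total += log_term + m - d
    return total
```

Both functions vanish when the model reproduces the data exactly, but they penalize deviations differently, which is why the noise model changes the optimization problem to be solved.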
A Novel Real-Time Edge-Cloud Big Data Management and Analytics Framework for Smart Cities
Exposing city information to dynamic, distributed, powerful, scalable, and user-friendly big data systems is expected to open up a wide range of new opportunities; however, the size, heterogeneity and geographical dispersion of the data often make it difficult to combine, analyze and consume them in a single system. In the context of the H2020 CLASS project, we describe an innovative framework aiming to facilitate the design of advanced big-data analytics workflows. The proposal covers the whole compute continuum, from edge to cloud, and relies on a well-organized distributed infrastructure exploiting: a) edge solutions with advanced computer vision technologies enabling the real-time generation of “rich” data from a vast array of sensor types; b) cloud data management techniques offering efficient storage, real-time querying and updating of the high-frequency incoming data at different granularity levels. We specifically focus on obstacle detection and tracking for edge processing, and consider a traffic density monitoring application, with hierarchical data aggregation features for cloud processing; the discussed techniques will constitute the groundwork enabling many further services. The tests are performed on the real use-case of the Modena Automotive Smart Area (MASA)
Efficient multi-image deconvolution in astronomy
The deconvolution of astronomical images by the Richardson-Lucy method (RLM) is extended here to the problem of multiple image deconvolution and the reduction of boundary effects. We present the multiple-image RLM in its accelerated gradient version, SGP (Scaled Gradient Projection). Numerical simulations indicate that the approach can provide excellent results with a considerable reduction of the boundary effects. In addition, by applying GPUlib to the IDL code, we obtained a remarkable acceleration of up to two orders of magnitude
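As a rough illustration of the multiple-image Richardson-Lucy iteration, here is a minimal pure-Python sketch on 1D signals with circular convolution. It is not the authors' IDL code: the function names are hypothetical, each PSF is assumed to be nonnegative and to sum to 1, and neither the boundary-effect correction nor the SGP acceleration is included.

```python
def circ_conv(x, k):
    """Circular convolution (A x): a 1D stand-in for the blurring operator."""
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]

def circ_corr(y, k):
    """Circular correlation (A^T y): the adjoint of circ_conv."""
    n = len(y)
    return [sum(k[j] * y[(i + j) % n] for j in range(n)) for i in range(n)]

def multi_image_rl(images, psfs, iterations=50):
    """Multiple-image Richardson-Lucy for p images of the same object.

    Update: x <- (x / p) * sum_j A_j^T (g_j / (A_j x)).
    """
    n, p = len(images[0]), len(images)
    # Flat, positive starting image carrying the mean flux of the data.
    x = [sum(sum(g) for g in images) / (p * n)] * n
    for _ in range(iterations):
        update = [0.0] * n
        for g, k in zip(images, psfs):
            blurred = circ_conv(x, k)
            # Guard against division by zero where the model vanishes.
            ratio = [gi / max(bi, 1e-12) for gi, bi in zip(g, blurred)]
            back = circ_corr(ratio, k)
            update = [u + b for u, b in zip(update, back)]
        x = [xi * u / p for xi, u in zip(x, update)]
    return x
```

With normalized PSFs, each iteration keeps the estimate nonnegative and preserves the mean total flux of the input images, the two properties usually cited in favor of Richardson-Lucy.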
Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs
High-performance embedded platforms are increasingly adopting heterogeneous systems-on-chip (HeSoC) that couple multi-core CPUs with accelerators such as GPU, FPGA, or AI engines. Adopting HeSoCs in the context of real-time workloads is not immediately possible, though, as contention on shared resources like the memory hierarchy—and in particular the main memory (DRAM)—causes unpredictable latency increase. To tackle this problem, both the research community and certification authorities mandate (i) that accesses from parallel threads to the shared system resources (typically, main memory) happen in a mutually exclusive manner by design, or (ii) that per-thread bandwidth regulation is enforced. Such arbitration schemes provide timing guarantees, but make poor use of the memory bandwidth available in a modern HeSoC. Controlled Memory Request Injection (CMRI) is a recently-proposed bandwidth limitation concept that builds on top of a mutually-exclusive schedule but still allows the threads currently not entitled to access memory to use as much of the unused bandwidth as possible without losing the timing guarantee. CMRI has been discussed in the context of a multi-core CPU, but the same principle applies also to a more complex system such as an HeSoC. In this article, we introduce two CMRI schemes suitable for HeSoCs: Voluntary Throttling via code refactoring and Bandwidth Regulation via dynamic throttling. We extensively characterize a proof-of-concept incarnation of both schemes on two HeSoCs: an NVIDIA Tegra TX2 and a Xilinx UltraScale+, highlighting the benefits and the costs of CMRI for synthetic workloads that model worst-case DRAM access. We also test the effectiveness of CMRI with real benchmarks, studying the effect of interference among the host CPU and the accelerators
SGP-dec: A Scaled Gradient Projection method for 2D and 3D images deconvolution
SGP-dec is a Matlab package for the deconvolution of 2D and 3D images corrupted by Poisson noise. Following a maximum likelihood approach, SGP-dec computes a deconvolved image by early stopping an iterative method for the minimization of the generalized Kullback-Leibler divergence. The iterative minimization method implemented by SGP-dec is a Scaled Gradient Projection (SGP) algorithm that can be considered an acceleration of the Expectation Maximization method, also known as the Richardson-Lucy method. The main feature of the SGP algorithm is the combination of inexpensive diagonally scaled gradient directions with adaptive Barzilai-Borwein steplength rules specially designed for these directions; global convergence properties are ensured by exploiting a line-search strategy (monotone or nonmonotone) along the feasible direction. The SGP algorithm is intended to be used as an iterative regularization method; this means that a regularized reconstruction can be obtained by early stopping the SGP sequence. Several early stopping strategies can be selected, based on different criteria: maximum number of iterations, distance between successive iterations or function values, discrepancy principle; the user must choose a stopping criterion and fix suitable values for the parameters involved in the chosen criterion
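The ingredients listed above (diagonal scaling, a Barzilai-Borwein steplength, projection, and a line search along the feasible direction) can be sketched as follows. This is a simplified pure-Python illustration on 1D signals, not the SGP-dec implementation: it uses a single scaled Barzilai-Borwein rule instead of the adaptive alternation, a monotone Armijo search, only the nonnegativity constraint, and a small background `bg` to keep the model positive. All names and default values are assumptions.

```python
import math

def circ_conv(x, k):
    """Circular convolution (A x): a 1D stand-in for the imaging operator."""
    n = len(x)
    return [sum(k[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]

def circ_corr(y, k):
    """Circular correlation (A^T y): the adjoint of circ_conv."""
    n = len(y)
    return [sum(k[j] * y[(i + j) % n] for j in range(n)) for i in range(n)]

def kl_objective(x, data, psf, bg):
    """Generalized KL divergence of the model A x + bg from the data."""
    model = [m + bg for m in circ_conv(x, psf)]
    return sum((d * math.log(d / m) if d > 0 else 0.0) + m - d
               for d, m in zip(data, model))

def kl_gradient(x, data, psf, bg):
    """Gradient A^T (1 - data / (A x + bg)) of kl_objective."""
    model = [m + bg for m in circ_conv(x, psf)]
    return circ_corr([1.0 - d / m for d, m in zip(data, model)], psf)

def sgp(data, psf, iterations=100, bg=1e-3, beta=1e-4, theta=0.5,
        alpha_bounds=(1e-5, 1e5), scale_bound=1e10):
    n = len(data)
    x = [sum(data) / n] * n                 # flat, nonnegative starting image
    g = kl_gradient(x, data, psf, bg)
    alpha = 1.0
    for _ in range(iterations):
        # Diagonal scaling suggested by Richardson-Lucy, kept within bounds.
        scale = [min(max(xi, 1.0 / scale_bound), scale_bound) for xi in x]
        # Scaled gradient step followed by projection onto x >= 0.
        z = [max(0.0, xi - alpha * si * gi) for xi, si, gi in zip(x, scale, g)]
        d = [zi - xi for zi, xi in zip(z, x)]   # feasible direction
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:
            break                           # no descent left: stationary point
        # Monotone Armijo backtracking along the feasible direction.
        fx, lam = kl_objective(x, data, psf, bg), 1.0
        while lam > 1e-12:
            x_new = [xi + lam * di for xi, di in zip(x, d)]
            if kl_objective(x_new, data, psf, bg) <= fx + beta * lam * slope:
                break
            lam *= theta
        else:
            break                           # line search failed: stop
        g_new = kl_gradient(x_new, data, psf, bg)
        # Scaled Barzilai-Borwein steplength, clipped to alpha_bounds.
        s = [(a - b) / c for a, b, c in zip(x_new, x, scale)]
        sy = sum(si * (a - b) for si, a, b in zip(s, g_new, g))
        ss = sum(si * si for si in s)
        alpha = (alpha_bounds[1] if sy <= 0.0
                 else min(max(ss / sy, alpha_bounds[0]), alpha_bounds[1]))
        x, g = x_new, g_new
    return x
```

Because the scaling matrix is diagonal and the constraint is separable, the projection reduces to a component-wise `max(0, .)`. The fixed iteration count plays the role of the simplest early-stopping rule.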
SGP-IDL: a Scaled Gradient Projection method for image deconvolution in an Interactive Data Language environment
SGP-IDL is an Interactive Data Language (IDL) package for the single and multiple deconvolution of 2D images corrupted by Poisson noise, with the optional inclusion of a boundary effect correction. Following a maximum likelihood approach, SGP-IDL computes a deconvolved image by early stopping of the scaled gradient projection (SGP) algorithm for the solution of the optimization problem coming from the minimization of the generalized Kullback-Leibler divergence between the computed image and the observed image. The algorithms have also been implemented for Graphics Processing Units (GPUs)
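Early stopping needs a concrete rule. The sketch below shows how a few commonly used criteria could be coded; it is a hypothetical helper, not part of SGP-IDL. The rule names, the default tolerances, and the form of the discrepancy test, (2/n) KL <= tau with tau close to 1, are assumptions. The pixel count n is taken as the length of the current image, which presumes that data and reconstruction have the same size.

```python
import math

def should_stop(rule, k, x_prev, x_curr, f_prev, f_curr,
                max_iter=1000, tol=1e-4, tau=1.0):
    """Return True when iteration k should be the last one.

    x_prev, x_curr: previous and current images (flat lists).
    f_prev, f_curr: previous and current values of the KL divergence.
    """
    if k >= max_iter:                       # hard cap, applied to every rule
        return True
    if rule == "iterates":                  # relative change of the image
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_curr, x_prev)))
        norm = math.sqrt(sum(a * a for a in x_curr))
        return diff <= tol * norm
    if rule == "objective":                 # relative change of the KL value
        return abs(f_curr - f_prev) <= tol * abs(f_curr)
    if rule == "discrepancy":               # (2 / n) * KL <= tau
        return 2.0 * f_curr / len(x_curr) <= tau
    return False                            # "max_iter": only the cap applies
```

The first two rules detect stagnation of the iteration, whereas the discrepancy test stops once the fit to the data reaches the level expected from the noise.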
ShareBERT: Embeddings Are Capable of Learning Hidden Layers
The deployment of Pre-trained Language Models in memory-limited devices is hindered by their massive number of parameters, which motivated the interest in developing smaller architectures.
Established works in the model compression literature showcased that small models often present a noticeable performance degradation and need to be paired with transfer learning methods, such as Knowledge Distillation.
In this work, we propose a parameter-sharing method that consists of sharing parameters between embeddings and the hidden layers, enabling the design of near-zero parameter encoders. To demonstrate its effectiveness, we present an architecture design called ShareBERT, which can preserve up to 95.5% of BERT Base performance using only 5M parameters (21.9× fewer parameters) without the help of Knowledge Distillation. We demonstrate empirically that our proposal does not negatively affect the model's learning capabilities and that it is even beneficial for representation learning. Code will be available at https://github.com/jchenghu/sharebert
Efficient deconvolution methods for astronomical imaging: algorithms and IDL-GPU codes
Context. The Richardson-Lucy (RL) method is the most popular deconvolution method in Astronomy because it preserves the number of counts and the nonnegativity of the original object. Regularization is, in general, obtained by an early stopping of RL iterations; in the case of point-wise objects such as binaries or open star clusters, iterations can be pushed to convergence. However, it is well known that RL is not an efficient method: in most cases and, in particular, for low noise levels, acceptable solutions are obtained at the cost of hundreds or thousands of iterations. Therefore, several approaches for accelerating RL have been proposed. They are mainly based on the remark that RL is a scaled gradient method for the minimization of the Kullback-Leibler (KL) divergence, or Csiszar I-divergence, which represents the data-fidelity function in the case of Poisson noise. In this framework, a line search along the descent direction is considered for reducing the number of iterations.
Aims. In a recent paper, a general optimization method, denoted as scaled gradient projection (SGP) method, has been proposed for the constrained minimization of continuously differentiable convex functions. It is applicable to the nonnegative minimization of the KL divergence. If the scaling suggested by RL is used in this method, then it provides a considerable speedup of RL. Therefore the aim of this paper is to apply SGP to a number of imaging problems in Astronomy such as single image deconvolution, multiple image deconvolution and boundary effect correction.
Methods. Deconvolution methods are proposed by applying SGP to the minimization of the KL divergence for the imaging problems mentioned above and the corresponding algorithms are derived and implemented in IDL. For all the algorithms several stopping rules are introduced, including one based on a recently proposed discrepancy principle for Poisson data. For a further increase of efficiency, implementation on GPU (Graphic Processing Unit) is also considered.
Results. The proposed algorithms are tested on simulated images. The speedup of SGP methods with respect to the corresponding RL methods strongly depends on the problem and on the specific object to be reconstructed, and in our simulations it ranges from about 4 to more than 30. Moreover, significant speedups up to two orders of magnitude have been observed between the serial and parallel implementations of the algorithms. The codes are available upon request
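The remark that RL is a scaled gradient method can be made explicit. The notation below is an assumption introduced for illustration: $A$ is the imaging (blurring) matrix, normalized so that $A^T\mathbf{1}=\mathbf{1}$, $y$ is the detected image, $b$ is the background, and products and quotients of vectors are taken component-wise.

```latex
\begin{align}
f(x) &= \sum_i \left\{ y_i \ln\frac{y_i}{(Ax+b)_i} + (Ax+b)_i - y_i \right\}, \\
\nabla f(x) &= A^T\mathbf{1} - A^T\frac{y}{Ax+b}
             = \mathbf{1} - A^T\frac{y}{Ax+b}, \\
x^{(k+1)} &= x^{(k)}\, A^T\frac{y}{Ax^{(k)}+b}
           = x^{(k)} - x^{(k)}\,\nabla f\big(x^{(k)}\big).
\end{align}
```

Under these assumptions, an RL iteration is a gradient step with unit steplength and diagonal scaling $\mathrm{diag}(x^{(k)})$. SGP keeps a scaling of this kind and adds a variable steplength, a projection onto the constraints, and a line search.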
