Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)
343 research outputs found
Deep Learning-based Framework for Math Formulas Understanding
Extracting mathematical formulas from images of scientific documents and converting them into structured data for storage in a database is essential for their further use. However, recognizing and extracting math formulas automatically, rapidly, and effectively is challenging. To address this problem, we propose a deep learning system that uses formula combination features to train a YOLOv8 model, which detects and classifies formulas both embedded in and isolated from the text. For the extracted formulas, we built a robust end-to-end recognition system that identifies and classifies math symbols with Faster R-CNN object detection and then analyzes the formula layout with a Convolutional Graph Neural Network (ConvGNN), since a formula is better represented as a graph with complex relationships and interdependencies between objects. The ConvGNN predicts formula linkages without laborious feature engineering. Experimental results on the IBEM and CROHME 2019 datasets show that the proposed approach extracts isolated formulas with an mAP of 99.3% and embedded formulas with an mAP of 80.3%, detects symbols with an mAP of 87.3%, and analyzes formula layout with an accuracy of 92%. We also show that our system is competitive with related work.
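The abstract does not specify the ConvGNN architecture. As a rough illustration of how a graph convolution lets each symbol aggregate information from its spatial neighbours, here is a minimal pure-Python sketch of a generic graph-convolution layer using the symmetric-normalized propagation rule popularized by Kipf and Welling. The symbol graph, features, and weights are hypothetical; this is not the authors' model.

```python
import math

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n 0/1 adjacency between symbols, features: n x f, weights: f x o.
    """
    n = len(adj)
    # Self-loops let each symbol keep its own features during aggregation.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f_in, f_out = len(weights), len(weights[0])
    # Aggregate neighbour features, then apply the shared linear map and ReLU.
    agg = [[sum(norm[i][k] * features[k][f] for k in range(n)) for f in range(f_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(f_in))) for o in range(f_out)]
            for i in range(n)]

# Hypothetical graph for "x^2 + y": symbols x, 2, +, y; edges join spatial neighbours.
adj = [[0, 1, 1, 0],
       [1, 0, 0, 0],
       [1, 0, 0, 1],
       [0, 0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]  # made-up symbol features
weights = [[0.8, -0.2], [0.1, 0.9]]                           # made-up learned weights
print(gcn_layer(adj, features, weights))
```

Predicting linkages would additionally require a classifier over pairs of the resulting node embeddings; that step is omitted here.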
On Performance Analysis Of Diabetic Retinopathy Classification
This paper describes the classification of bulk OCT retinal fundus images into normal and diabetic retinopathy classes using three feature extraction techniques: intensity histogram features, the Gray Level Co-occurrence Matrix (GLCM), and the Gray Level Run Length Matrix (GLRLM). The three feature sets are compared under the same conditions. A total of 301 bulk OCT retinal fundus color images from the two classes, normal and diabetic retinopathy, were used. For feature extraction and classification, the images are first filtered with a fourth-order PDE. The most effective feature extraction method for OCT retinal fundus images is then identified.
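To make the GLCM features concrete, here is a minimal pure-Python sketch that builds a non-symmetric co-occurrence matrix for a single pixel offset and derives three common texture descriptors (contrast, energy, homogeneity). The offset, the number of gray levels, and the chosen descriptors are assumptions for illustration; the abstract does not say which GLCM settings or statistics were used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) at offset (dy, dx).

    image must already be quantized to integers in range(levels).
    """
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c]][image[r2][c2]] += 1
    return m

def glcm_features(m):
    """Normalize the counts to probabilities and compute texture statistics."""
    total = sum(sum(row) for row in m)
    p = [[v / total for v in row] for row in m]
    n = len(p)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in pairs),
        "energy": sum(p[i][j] ** 2 for i, j in pairs),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in pairs),
    }

# Small 4-level example image (horizontal neighbour offset).
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
print(glcm_features(glcm(img, levels=4)))
```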
Dr DAH-Unet: A modified UNet for Semantic Segmentation of MRI images for brain tumour detection
Applying sophisticated image processing techniques to brain MR images for medical image segmentation significantly improves tumor detection. Manually segmenting a brain tumor is time-consuming and requires a doctor's training and experience. To address this issue, we propose a modified UNet architecture, DAH-Unet, that combines residual blocks, a rebuilt atrous spatial pyramid pooling (ASPP) module, and depth-wise convolutions. We also propose a hybrid loss function that is explicitly boundary-aware. Experiments on two publicly available datasets show that the model outperforms existing semantic segmentation models on some metrics.
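One reason to use depth-wise convolutions in a UNet variant is the parameter saving. The arithmetic below compares a standard k x k convolution with a depth-wise separable one (a depth-wise k x k filter per input channel followed by a 1 x 1 point-wise convolution). The channel sizes are hypothetical and biases are ignored; the abstract does not give DAH-Unet's layer configuration.

```python
def standard_conv_params(k, c_in, c_out):
    # Each of the c_out filters spans k x k x c_in weights.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depth-wise: one k x k filter per input channel.
    # Point-wise: a 1 x 1 convolution mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128   # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, sep / std)    # ratio equals 1/c_out + 1/k**2
```

For a 3 x 3 kernel the separable layer needs roughly 1/9 of the weights, which is why it is a common choice for lightening encoder blocks.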
A Labeled Array Distance Metric for Measuring Image Segmentation Quality
This work introduces two new distance metrics for comparing labeled arrays, which are common outputs of image segmentation algorithms. Each pixel in an image is assigned a label; binary segmentation provides only two labels ('foreground' and 'background'), which can be represented by a simple binary matrix and compared using pixel differences. However, many segmentation algorithms output multiple regions in a labeled array. We propose two distance metrics, named LAD and MADLAD, that calculate the distance between two labeled images, so that the accuracy of different image segmentation algorithms can be evaluated by measuring their outputs against a 'ground truth' labeling. Both metrics run in O(N) time for images with N pixels and are designed to quickly identify similar labeled arrays, even when different labeling methods are used. We compare images labeled manually with those labeled by segmentation algorithms. This evaluation is crucial when a genetic algorithm searches through a space of segmentation algorithms and their hyperparameters for the optimal automated segmentation, which is the goal of our lab, SEE-Insight. By measuring the distance from the ground truth, these metrics help determine which algorithm provides the most accurate segmentation.
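The abstract does not define LAD or MADLAD, so the following is a hypothetical distance, not the authors' metric. It only illustrates the two properties the abstract emphasizes: a single O(N) pass over the pixels, and invariance to how the labels are named. Each label in one array is mapped to the label it overlaps most in the other; pixels not explained by that mapping count as mismatches, and both directions are averaged so that over- and under-segmentation are both penalized.

```python
from collections import Counter

def _mismatch(src, dst):
    """Pixels left unexplained when each src label maps to its most-overlapping dst label."""
    overlap = Counter(zip(src, dst))      # one pass over the N pixels
    best = {}
    for (s, _d), count in overlap.items():
        best[s] = max(best.get(s, 0), count)
    return len(src) - sum(best.values())

def label_distance(labels_a, labels_b):
    """Symmetric distance between two flattened labeled arrays.

    Returns 0.0 when the labelings agree up to a renaming of labels.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labeled arrays must have the same number of pixels")
    n = len(labels_a)
    return (_mismatch(labels_a, labels_b) + _mismatch(labels_b, labels_a)) / (2 * n)

ground_truth = [1, 1, 2, 2, 3, 3]
prediction = [7, 7, 5, 5, 9, 9]          # same regions, different label ids
print(label_distance(ground_truth, prediction))
```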
Robust fingerprint recognition approach based on diagonal slice of polyspectra in the polar space
Although fingerprint recognition is a mature technology and state-of-the-art commercial systems are used successfully in many real applications, not all problems have been solved, and research in the field remains very active. This paper presents a new approach for estimating the shift and rotation parameters between fingerprint images stored in a database. It operates on a third-order frequency-domain measure, the auto-bispectrum, which allows the shift and the rotation to be estimated separately. We propose using the diagonal slices of the auto- and cross-bispectrum and their spectra. The rotation parameters are estimated from the remaining polar-sampled third-order spectrum using cross-correlation; after compensating for rotation, the translational component can then be estimated easily, e.g., by phase correlation. We present experimental evidence of this performance and explain the mathematical reasons behind these characteristics in depth. In simulation, we compare our approach with other frequency-domain fingerprint recognition algorithms and find that it estimates the shift and rotation parameters better than the other methods.
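The abstract mentions phase correlation for recovering the translation once rotation has been compensated; in polar coordinates a rotation likewise appears as a circular shift along the angle axis. The sketch below shows the core idea in one dimension with a naive DFT: normalizing the cross-power spectrum keeps only the phase difference, and its inverse transform peaks at the shift. The signal is made up, and this is a generic illustration, not the bispectrum-based method of the paper.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for a small demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def phase_correlation_shift(reference, shifted, eps=1e-12):
    """Estimate the circular shift d such that shifted[t] == reference[t - d]."""
    f = dft(reference)
    g = dft(shifted)
    cross = []
    for gk, fk in zip(g, f):
        prod = gk * fk.conjugate()
        # Keep only the phase; eps guards frequencies with (near-)zero energy.
        cross.append(prod / max(abs(prod), eps))
    corr = dft(cross, inverse=True)
    # The inverse transform is (ideally) an impulse located at the shift.
    return max(range(len(corr)), key=lambda t: corr[t].real)

signal = [1.0, 5.0, 2.0, 8.0, 3.0, 0.0, 7.0, 4.0]   # made-up 1-D profile
moved = [signal[(t - 3) % len(signal)] for t in range(len(signal))]
print(phase_correlation_shift(signal, moved))       # expected: 3
```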
Off-line identifying Script Writers by Swin Transformers and ResNeSt-50
In this work, we present two advanced deep learning models for identifying script writers. The proposed systems use the recent Swin Transformer vision model and ResNeSt-50. The Swin Transformer is known for its robustness to variations and its ability to model long-range dependencies, which helps it capture context and make robust predictions. Trained extensively on large datasets of handwritten text samples, the Swin Transformer operates on sequences of image patches and learns a robust representation of each writer’s unique style. ResNeSt-50, a residual network built from split-attention blocks (a form of channel attention related to Squeeze-and-Excitation (SE)), uses its many layers to learn complex representations of a writer’s unique style and to distinguish between different writing styles with high precision. Its attention mechanism helps the model focus on distinctive handwriting characteristics and suppress noise. The experimental results demonstrate exceptional performance: the Swin Transformer achieves an accuracy of 98.50% (at patch level) on the CVL database, which consists of images with cursively handwritten German and English texts, and ResNeSt-50 achieves an accuracy of 96.61% (at page level) on the same database. This research advances writer identification by demonstrating the effectiveness of the Swin Transformer and ResNeSt-50, and the achieved accuracy underscores the potential of these models to process and understand complex handwriting effectively.
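Since the abstract notes that the Swin Transformer operates on sequences of image patches, the sketch below shows the first step of such a model: cutting a grayscale image into non-overlapping square patches and flattening each into a token vector, in row-major order. The 4 x 4 patch size matches the original Swin design, but the image is made up; the later linear embedding and the local-window self-attention that distinguish Swin are not shown.

```python
def partition_patches(image, patch=4):
    """Split a 2-D grayscale image into non-overlapping patch x patch tiles.

    Tiles are taken row by row and each is flattened into one token vector.
    """
    rows, cols = len(image), len(image[0])
    if rows % patch or cols % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            tokens.append([image[r][c]
                           for r in range(r0, r0 + patch)
                           for c in range(c0, c0 + patch)])
    return tokens

# Made-up 8 x 8 "handwriting" crop with pixel value r * 8 + c.
crop = [[r * 8 + c for c in range(8)] for r in range(8)]
tokens = partition_patches(crop)
print(len(tokens), len(tokens[0]))   # 4 tokens of 16 values each
```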
A Multimodal Biometric Authentication System Using Autoencoders and Siamese Networks for Enhanced Security
Ensuring secure and reliable identity verification is crucial, and biometric authentication plays a significant role in achieving it. However, relying on a single biometric trait (unimodal authentication) can limit accuracy and leave the system vulnerable to attacks. Multimodal authentication, which combines several biometric traits, can improve accuracy and security by exploiting their complementary strengths. Different biometric modalities, such as face, voice, fingerprint, and iris, have been studied and used extensively for user authentication in the literature. Our research introduces a highly effective multimodal biometric authentication system based on deep learning. We focus on two of the most user-friendly modalities: face and voice recognition. We use a convolutional autoencoder to extract features from face images and an LSTM autoencoder to extract features from voice data. These features are then concatenated into a joint feature representation, and a Siamese network carries out the final user identification step. We evaluated the model on the OMG-Emotion and RAVDESS datasets and achieved accuracies of 89.79% and 95% on RAVDESS and OMG-Emotion, respectively, using the combination of face and voice modalities.
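The fusion and verification steps can be sketched as follows. The face and voice embeddings are assumed to come from the two autoencoders; here they are plain lists. Fusion is simple concatenation, as the abstract states. The contrastive loss, its margin, and the distance threshold are assumptions: the abstract does not say how the Siamese network was trained or how decisions are made. In a real Siamese network both inputs also pass through the same weight-shared encoder, which is omitted.

```python
import math

def fuse(face_embedding, voice_embedding):
    """Joint representation: concatenate the two modality feature vectors."""
    return list(face_embedding) + list(voice_embedding)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(distance, same_person, margin=1.0):
    """Pull genuine pairs together; push impostor pairs at least `margin` apart."""
    if same_person:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

def verify(enrolled, probe, threshold=0.5):
    """Accept the probe if its fused embedding is close to the enrolled template."""
    return euclidean(enrolled, probe) < threshold

# Hypothetical embeddings for an enrolled user and a new probe.
template = fuse([0.2, 0.7], [0.1])
probe = fuse([0.25, 0.65], [0.12])
print(verify(template, probe))
```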
Infrared Thermography For Seal Defects Detection On Packaged Products: Unbalanced Machine Learning Classification With Iterative Digital Image Restoration
Non-destructive, online defect detection on seals is increasingly deployed in packaging processes, especially for food and pharmaceutical products. It is a key control step in these processes because it reduces the costs caused by defects.
To this end, this paper combines two cost-effective methods: machine learning algorithms and infrared thermography. However, performance can be limited when the training data are small, unbalanced, and affected by optical imperfections.
This paper proposes a classification method that addresses these limitations. Its accuracy exceeds 93% on two small training sets containing 2.5 to 10 times fewer negatives. The algorithm has a lower computational cost than deep learning approaches and does not require any prior statistical study characterizing the defects.
Rip Current: A Potential Hazard Zones Detection in Saint Martin’s Island using Machine Learning Approach
Beach hazards are any occurrences that can endanger people and their activities. A rip current is a current that flows against the incoming waves, away from the shore and toward the deep sea. Poorly managed beach access sometimes inadvertently leads unwary beachgoers into rip-prone areas, increasing the probability of drowning on that beach. This research proposes an approach for automatically detecting rip currents in areas of breaking waves, based on convolutional neural networks (CNNs) and machine learning algorithms (MLAs) for classification. Many people are unable to identify rip currents and therefore cannot avoid them. In addition, the lack of data for training and validating hazard detection systems hinders attempts to predict rip currents. Still images of the shore from security cameras and mobile phones are pervasive and are a possible source for measuring rip currents and managing this hazard accordingly. This work develops detection systems from still beach images, bathymetric images, and beach parameters using CNNs and MLAs. CNN-based detection models were implemented with beach images and bathymetric images as input features, and MLAs were applied to detect rip currents from beach parameters. Compared with the other detection models, the bathymetric image-based models achieve significantly higher accuracy and precision. For beach images, the CNN VGG16 model achieves a maximum accuracy of 91.13% (recall = 0.94, F1-score = 0.87). For bathymetric images, the best performance is obtained with the CNN DenseNet model, with an accuracy of 96.89% (recall = 0.97, F1-score = 0.92). The MLA-based model achieves an accuracy of 86.98% (recall = 0.89, F1-score = 0.90) with a random forest classifier.
Once the zones that continuously generate rip currents are known, the coastal region can be managed accordingly to prevent accidents caused by this coastal hazard.
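The abstract reports recall and F1-score but not precision. Because F1 = 2PR / (P + R), precision can be recovered as P = F1 · R / (2R − F1). The helper below applies that identity to the reported (rounded) figures, so the results are approximate: roughly 0.81 for VGG16 on beach images, 0.87 for DenseNet on bathymetric images, and 0.91 for the random forest.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for precision P."""
    return f1 * recall / (2 * recall - f1)

reported = {                      # (recall, F1) as given in the abstract
    "VGG16, beach images": (0.94, 0.87),
    "DenseNet, bathymetric images": (0.97, 0.92),
    "Random forest, beach parameters": (0.89, 0.90),
}
for model, (r, f1) in reported.items():
    print(f"{model}: precision ~ {implied_precision(r, f1):.2f}")
```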
Improved Classification of Histopathological images using the feature fusion of Thepade sorted block truncation code and Niblack thresholding
Histopathology is the study of disease-affected tissue, and it is particularly helpful for diagnosis and for determining how severe a disease is and how rapidly it is spreading. It also shows how to recognize a variety of human tissues and analyze the alterations caused by disease. Certain disease characteristics, such as lymphocytic infiltration of a malignancy, can be determined only from histopathological images. The histopathological image is the "gold standard" for diagnosing practically all forms of cancer. Early diagnosis and prognosis of cancer are essential for treatment and have become a priority in cancer research. The importance and advantages of classifying cancer patients into higher-risk and lower-risk groups have motivated many researchers to study and improve the application of machine learning (ML) methods, which makes it worthwhile to explore the performance of multiple ML algorithms in classifying these histopathological images. Feature extraction is crucial in ML for differentiating images. Features are distinctive identifiers that summarize an image, and a variety of handcrafted algorithms extract them to discriminate between images. This paper presents a fusion of features extracted with the Thepade sorted block truncation code (TSBTC) and the Niblack thresholding algorithm for the classification of histopathological images. Experimental validation uses the 960 images of the Kimiapath-960 histopathology dataset, with sensitivity, specificity, and accuracy as performance metrics. An ensemble of TSBTC N-ary and Niblack's thresholding features achieves an improved accuracy of 97.92% in 10-fold cross-validation.
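A minimal sketch of the feature fusion, under stated assumptions. TSBTC N-ary is implemented here as sorting all intensities, splitting them into N equal runs, and keeping each run's mean; Niblack thresholding uses the standard local rule T = m + k·s with local mean m and standard deviation s (k = −0.2 and a 3 x 3 window, clipped at the borders, are assumed values). How the paper turns the Niblack result into features is not stated, so taking the mean intensity of the pixels above and below the local threshold is a hypothetical choice. Only one grayscale plane is handled; color images would repeat this per channel.

```python
def tsbtc_features(pixels, n_ary=4):
    """Sorted BTC: sort intensities, split into n_ary near-equal runs, keep each run's mean."""
    ordered = sorted(pixels)
    n = len(ordered)
    feats = []
    for i in range(n_ary):
        part = ordered[i * n // n_ary:(i + 1) * n // n_ary]
        feats.append(sum(part) / len(part))
    return feats

def niblack_threshold(image, window=3, k=-0.2):
    """Binary map: 1 where a pixel exceeds local mean + k * local std (Niblack)."""
    rows, cols = len(image), len(image[0])
    half = window // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Local window, clipped at the image border.
            vals = [image[rr][cc]
                    for rr in range(max(0, r - half), min(rows, r + half + 1))
                    for cc in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(vals) / len(vals)
            std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
            out[r][c] = 1 if image[r][c] > mean + k * std else 0
    return out

def _mean(values):
    return sum(values) / len(values) if values else 0.0

def niblack_features(image, window=3, k=-0.2):
    """Hypothetical features: mean intensity above and below the local threshold."""
    mask = niblack_threshold(image, window, k)
    cells = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    above = [image[r][c] for r, c in cells if mask[r][c]]
    below = [image[r][c] for r, c in cells if not mask[r][c]]
    return [_mean(above), _mean(below)]

def fused_features(image, n_ary=4, window=3, k=-0.2):
    """Feature-level fusion by concatenating the two descriptors."""
    flat = [v for row in image for v in row]
    return tsbtc_features(flat, n_ary) + niblack_features(image, window, k)

patch = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # made-up tissue patch
print(fused_features(patch))
```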