Electronic Letters on Computer Vision and Image Analysis (ELCVIA - Universitat Autònoma de Barcelona)
343 research outputs found
Enhanced Bell Pepper and Grape Leaf Disease Classification Using a Depthwise Separable VGG19-Capsule Network
Crop disease is a significant problem in the agricultural sector, reducing food production and causing substantial economic losses for farmers. Computer vision and deep learning models can now detect and diagnose leaf diseases in their early stages, which may assist farmers and contribute to food security. This research introduces a hybrid Depth-wise Separable VGG19 and Capsule Network (VGG19-CapsNet) architecture for automated leaf disease detection and classification in bell pepper and grape plants. The novel contribution lies in the enhanced VGG19 architecture, which incorporates depth-wise separable convolution (DWSC), batch normalization, and a 40% dropout through convolutional layers introduced before the primary capsule layer. Features are extracted with VGG19, flattened into vectors, and used as input to the capsule layer, so that the capsule network captures spatial information and preserves the hierarchical relationships between features. A noteworthy aspect of this work is an ensemble activation function that fuses the Leaky Rectified Linear Unit (Leaky ReLU) and the Gaussian Error Linear Unit (GELU). The hybrid architecture, which uses DWSC and batch renormalization with a dropout rate of 0.4, a learning rate of 0.001, and a batch size of 9, captures complex patterns for categorizing diseases in bell pepper and grape plants. Combining the Leaky ReLU and GELU activation functions increases the non-linearity of the VGG19 model and improves classification performance. The proposed VGG19-CapsNet framework is developed and deployed on a 128-core Jetson Nano single-board computer with graphics processing support. The research outcomes set a benchmark for accuracy and present a paradigm shift in automated leaf disease classification.
The benchmark datasets PlantifyDr and PlantVillage, together with a custom dataset, are used to train and develop the proposed VGG19-CapsNet deep learning model. Through extensive comparative analyses on various datasets and field tests, the proposed architecture has demonstrated superior performance in terms of accuracy (99.81%, 99.84%), precision (99.84%, 99.84%), recall (99.79%, 99.84%), sensitivity (99.94%, 99.84%), F1-score (99.81%, 99.84%), and AUC (1.0, 1.0) for bell pepper and grape leaves across different datasets. It demonstrates the potential to transform agriculture with innovative methodologies tailored for bell pepper and grape diseases.
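As a rough illustration of why depth-wise separable convolution is attractive on a small device such as the Jetson Nano, the sketch below compares weight counts for a standard convolution and its depth-wise separable counterpart. The kernel size and channel widths are hypothetical values chosen for the example, not figures reported in the abstract, and biases are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise kernel x kernel conv plus a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape: 3 x 3 kernel, 256 input and 256 output channels.
standard = conv_params(3, 256, 256)
separable = depthwise_separable_params(3, 256, 256)
print(standard, separable, round(standard / separable, 2))
```

For this assumed layer shape the separable variant needs roughly one ninth of the weights, which is the general motivation for the technique rather than a measurement of the proposed model.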
A Panoptic Segmentation for Indoor Environments using MaskDINO: An Experiment on the Impact of Contrast
Robot perception involves recognizing the surrounding environment, particularly in indoor spaces like kitchens, classrooms, and dining areas. This recognition is crucial for tasks such as object identification. Objects in indoor environments can be categorized into "things," with fixed and countable shapes (e.g., tables, chairs), and "stuff," which lack a fixed shape and cannot be counted (e.g., sky, walls). Object detection and instance segmentation methods excel in identifying "things," with instance segmentation providing more detailed representations than object detection. Semantic segmentation, in turn, can identify both "things" and "stuff" but lacks segmentation at the object level. Panoptic segmentation, a fusion of instance and semantic segmentation, offers comprehensive object and stuff identification together with object-level segmentation. When implementing panoptic segmentation indoors, the variability of contrast in room conditions needs to be considered. High or low contrast in a room can reduce the clarity of an object's shape and thus affect the segmentation results for that object. We experimented with how contrast variations affect panoptic segmentation performance using the MaskDINO model, which ranks first on the panoptic quality (PQ) leaderboard. We then improved the model's generalization across contrast levels by re-optimizing it on a contrast-augmented dataset.
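The abstract does not state how the contrast-augmented dataset was produced. One common approach, shown here purely as an assumption, is to scale each pixel's distance from the image mean by a factor and clip the result to the valid range. The function name `adjust_contrast` and the toy grayscale image are hypothetical.

```python
def adjust_contrast(image, factor):
    """Scale pixel distances from the image mean; factor < 1 lowers contrast."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)

    def scale(p):
        value = mean + factor * (p - mean)
        return int(round(min(255, max(0, value))))  # clip to 8-bit range

    return [[scale(p) for p in row] for row in image]


# Toy 2 x 2 grayscale image (made-up values) with mean intensity 100.
image = [[40, 80], [120, 160]]
low = adjust_contrast(image, 0.5)   # pixels pulled toward the mean
high = adjust_contrast(image, 2.0)  # pixels pushed away from the mean, then clipped
print(low, high)
```

Applying several factors to each training image would give a contrast-varied set of the kind the abstract describes, though the actual factors and pipeline used by the authors are not given.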
Implementation of Explainable AI in Deep Learning Methods for Multiclass Classification of Plant Diseases in Mango Leaves
Maintaining optimal yield plays a crucial role in the prosperity of agriculture and, in turn, the economy of the country. One way to optimize this yield is early and accurate detection and diagnosis of crop diseases. Traditional methods that involve manual inspection tend to be tedious and often inaccurate. Hence, the use of machine learning and convolutional neural networks has proven to be of great advantage in terms of accuracy, reliability, and ease of implementation. This paper explores various deep learning models, namely AlexNet, ResNet, Swin Transformer, VGG-16, and ViT, for plant leaf disease detection and classification on a dataset of mango leaves and compares them on accuracy and loss. The models are then combined using feature fusion, and their accuracies are compared. Finally, a combination of ResNet and AlexNet is proposed, which achieves an accuracy of 99.97%. In addition, Grad-CAM (Gradient-weighted Class Activation Mapping) is implemented to highlight the important regions in the leaf images, which improves visualization. This can potentially provide accurate identification and classification of plant diseases based on leaf images.
State-of-the-art DNN techniques for lung cancer diagnosis using chest CT scans: A review
This paper reviews state-of-the-art literature on the early diagnosis of lung cancer with deep neural network techniques and chest CT scans. First, the significance of lung cancer and the need for this review are briefly introduced. The architectures of the deep neural networks, the evaluation methods, and a comprehensive review of recent progress in lung cancer diagnosis based on deep neural network techniques are then provided, followed by a comparative analysis of the literature. A critical discussion of the existing datasets, various methodologies, and challenges in diagnosis is presented. The performance of deep neural network-based techniques for segmentation, nodule detection, and nodule classification is also discussed. This review covers malignancy classification along with the nodule detection tasks. It may thus provide researchers with the information needed to prepare a robust methodology for early detection of lung cancer and hence proper diagnosis.
A Reversible Data Hiding Techniques For Improved Embedding Capacity Using Image Interpolation
High capacity steganography is still challenging today in the field of information security. The demand for the exact retrieval of the cover media from the stego-image after the extraction of secret data is also increasing. Using reversible information hiding techniques, the cover image can be recovered at the time of extraction of secret messages. Two techniques are proposed in this paper. In the first technique, the image is interpolated using a new interpolation technique, and the second technique uses a High Capacity Reversible Steganography using Multi-layer Embedding (CRS) method for image interpolation. In both techniques, the secret data are embedded in the cover image by an Exclusive OR (XOR) operation. The proposed techniques give high embedding capacity and preserve image quality. The experimental results show that the proposed techniques offer better results than the existing techniques.
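The general idea of interpolation-based reversible hiding with XOR can be sketched in one dimension. The example below is an illustration under stated assumptions, not the paper's method: it uses a simple neighbor-mean interpolation in place of the new interpolation technique and CRS, and the function names and pixel values are hypothetical. Original pixels are left untouched, so the cover is recovered exactly and the interpolated reference values can be recomputed to undo the XOR.

```python
def interpolate(cover):
    """Insert the floor-mean of each neighbouring pair between the two pixels."""
    out = []
    for left, right in zip(cover, cover[1:]):
        out += [left, (left + right) // 2]
    out.append(cover[-1])
    return out


def embed(cover, chunks):
    """XOR one small secret chunk into each interpolated (odd-index) pixel."""
    stego = interpolate(cover)
    for i, chunk in enumerate(chunks):
        stego[2 * i + 1] ^= chunk
    return stego


def extract(stego):
    """Recover the cover from even indices, then XOR again to get the secret."""
    cover = stego[0::2]
    reference = interpolate(cover)
    chunks = [stego[2 * i + 1] ^ reference[2 * i + 1]
              for i in range(len(cover) - 1)]
    return cover, chunks


cover = [100, 104, 96, 120]   # made-up pixel row
secret = [3, 0, 2]            # made-up 2-bit chunks, one per interpolated pixel
stego = embed(cover, secret)
recovered_cover, recovered_secret = extract(stego)
```

Because XOR is its own inverse, applying it a second time with the recomputed reference value returns the secret chunk, while the untouched original pixels give back the cover.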
Deep Learning Approach for the Morphological Differentiation of Corn Seed Types
Corn is one of Indonesia's main food ingredients and the second largest source of carbohydrates after rice. Classification of the type and quality of corn seeds is still conducted manually by farmers. This procedure is time-consuming and can result in inaccurate sorting. Morphological characteristics such as size, color, area, and seed shape are important for determining varieties. Measuring some of these attributes manually takes a long time and requires special expertise. Machine learning offers a suitable way to describe these characteristics. The machine learning method used is the CNN (Convolutional Neural Network), with the models ResNet101, ResNet50, VGG-19, and MobileNetV2. Model performance was analyzed using a confusion matrix. For the classification of corn seed varieties, the ResNet101 model showed an accuracy of 89.8%, a precision of 86.9%, a recall of 88.3% and an F1-score of 86.4%. The ResNet50 model showed an accuracy of 86.27%, a precision of 83.2%, a recall of 84.1% and an F1-score of 83.4%. The VGG-19 model showed an accuracy of 76.47%, a precision of 66.8%, a recall of 78.% and an F1-score of 71.1%. The MobileNetV2 model showed an accuracy of 73.34%, a precision of 69%, a recall of 69.8% and an F1-score of 69.8%.
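The four reported metrics can all be derived from a confusion matrix. The sketch below shows one way to do so, assuming macro averaging over classes (the abstract does not say which averaging scheme was used). The function name and the toy matrix are hypothetical and unrelated to the paper's data.

```python
def macro_metrics(cm):
    """Return accuracy and macro precision, recall, F1 from cm[actual][predicted]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy two-class matrix (made-up counts): rows are actual, columns are predicted.
accuracy, precision, recall, f1 = macro_metrics([[8, 2], [1, 9]])
print(accuracy, precision, recall, f1)
```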
Improving Slow-Moving Object Detection in Complex Environments Using a Feature Pooling Enhanced Encoder-Decoder Model: EDM-SMOD
The ability to detect moving objects is of great importance in a wide range of visual surveillance systems, playing a vital role in maintaining security and ensuring effective monitoring. The primary aim of such systems is to detect objects in motion and to handle real-world challenges effectively. Despite the existence of numerous methods, there remains room for improvement, particularly for slowly moving video sequences and unfamiliar video environments. In videos where slow-moving objects are confined to a small area, many traditional methods fail to detect the entire object. A spatial-temporal framework is an effective solution, and the selection of temporal, spatial, and fusion algorithms is crucial for effectively detecting slow-moving objects. This article addresses the detection of slowly moving objects in challenging videos by leveraging an encoder-decoder architecture that incorporates a modified VGG-16 model with a feature pooling framework. Several novel aspects characterize the proposed algorithm. It uses a pre-trained modified VGG-16 network as the encoder, employing transfer learning to enhance model efficacy. The encoder is designed with a reduced number of layers and incorporates skip connections to extract the fine and coarse-scale features that are crucial for local change detection. The feature pooling framework (FPF) combines different layers, including max pooling, convolutional, and several atrous convolutional layers with varying sampling rates. This integration preserves features of various dimensions, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that map the features to image space.
The performance of the developed technique is assessed in comparison to various existing methods, including CMRM, Hybrid algorithm, Fast valley, EPMCB, and MODCVS, and its effectiveness is shown through both subjective and objective analyses. It demonstrates superior performance, with an average F-measure (AF) value of 98.86% and a lower average misclassification error (AMCE) value of 0.85. Furthermore, the algorithm's effectiveness is validated on Imperceptible Video Configuration video setups, where it exhibits superior performance.
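The atrous (dilated) convolutions used in the feature pooling framework enlarge the receptive field without adding weights. A one-dimensional sketch, with a hypothetical signal and kernel rather than anything taken from the paper, shows how the sampling rate spaces out the kernel taps. As in most deep learning libraries, the operation is a cross-correlation without kernel flipping and without padding.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D cross-correlation with kernel taps spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # distance covered by the dilated kernel
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]   # made-up ramp signal
kernel = [1, 0, -1]              # simple difference filter

dense = atrous_conv1d(signal, kernel, rate=1)   # taps cover 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2) # same 3 weights cover 5 samples
print(dense, dilated)
```

Running several such branches with different rates in parallel and combining their outputs is the usual way to capture features at multiple scales, which matches the role the abstract gives to the FPF.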
Enhanced Bird Species Image Recognition and Classification using MobileNet and InceptionV3 Transfer learning Architectures
The proposed study explores the application of transfer learning techniques in bird species image classification, specifically focusing on the MobileNet and InceptionV3 models. Transfer learning is applied on the CUB-200-2011 dataset to enhance classification accuracy. The MobileNet model achieved an accuracy of 74.60%, outperforming InceptionV3, which recorded an accuracy of 64.00%. The corresponding loss values were 0.8685 for MobileNet and 1.128 for InceptionV3, highlighting MobileNet's superior alignment with actual class labels. Additionally, MobileNet demonstrated a precision range of 0.45 to 0.93, while InceptionV3's precision ranged from 0.65 to 0.81. MobileNet's F1-scores ranged from 0.40 to 0.91, in contrast to InceptionV3's lower F1-scores, indicating a more stable but less effective classification ability for InceptionV3. These findings underscore the potential of MobileNet as a lightweight, efficient alternative for wildlife image classification tasks, making it particularly suitable for deployment in resource-constrained environments. The developed user interface allows users to upload images and receive immediate classification results, further demonstrating the practical application of these models in conservation and biodiversity preservation efforts.
Supervised Deep Learning Approaches For Anomaly Detection And Recognition In Crowd Scenes
Awareness of public safety is increasing, and CCTV cameras are now installed at almost all public places. Automatic smart surveillance systems, however, are generally not available. This manuscript focuses on detecting and classifying abnormal events in surveillance video, especially in crowd environments. Abnormal event detection is a challenging task because the definition of abnormality is subjective: a normal event in one situation can be considered an abnormal event in another. In surveillance video with a dense crowd, automatic anomaly detection becomes very difficult because of clutter and severe occlusion.
This manuscript presents CNN (Convolutional Neural Network) and CNN-LSTM (Convolutional Neural Network-Long Short-Term Memory) based approaches for the detection and classification of abnormal events. The CNN architecture is developed from scratch and operates in the spatial domain. The LSTM architecture is developed for the temporal domain. Feature sequences are generated using the CNN model and given as input to the LSTM model. Experiments are carried out using five different publicly available benchmark datasets. The performance is measured by accuracy and area under the ROC (receiver operating characteristic) curve (AUC). The CNN-LSTM approach performs better than the CNN alone.
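The AUC used as an evaluation measure can be read as the probability that a randomly chosen abnormal clip receives a higher anomaly score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison. The function name, labels, and scores are made-up illustrations, not results from the manuscript.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]              # 1 = abnormal event (toy data)
scores = [0.1, 0.4, 0.35, 0.8]     # hypothetical anomaly scores
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of samples, so it suits small checks like this one. Larger evaluations typically use a sort-based computation that gives the same value.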
A Computational Approach to Color Vision Enhancement Using Deep Learning, Tensorflow and Keras
Individuals with color vision deficiency (CVD) often face obstacles in navigating and engaging with their surroundings because they have difficulty discerning colors accurately. Such limitations can hinder a range of daily activities, compelling these individuals to rely on external assistance for color-centric tasks, potentially curtailing their autonomy and inclusion. In response, our study centers on the design and implementation of a machine learning-driven color adaptation framework. Using the TensorFlow and Keras libraries, this system applies machine learning methods to detect and modify colors within visual content, thereby improving perceptibility for those with CVD. Our principal aim is to equip individuals with CVD with a practical tool that enhances color clarity in images and facilitates self-evaluation of their visual condition. The tool aims to bolster navigational capabilities, reduce reliance on external assistance for color-oriented activities, and advance inclusivity. Furthermore, our investigation examines the precision and dependability of the machine learning algorithms through testing and validation protocols, targeting robust performance across diverse contexts and image categories. A user-centric and easily navigable graphical user interface (GUI) is emphasized to accommodate users with varying technical proficiencies. Beyond the immediate technological impact, our research aspires to raise awareness and deepen understanding of color vision deficiency within the wider populace, thereby fostering a more equitable and accessible society.