    AI-Driven Precision Clothing Classification: Revolutionizing Online Fashion Retailing with Hybrid Two-Objective Learning

    No full text
    In the ever-expanding online fashion market, businesses in the clothing sales sector are presented with substantial growth opportunities. To utilize this potential, it is crucial to implement effective methods for accurately identifying clothing items. This entails a deep understanding of customer preferences, niche markets, tailored sales strategies, and an improved user experience. Artificial intelligence (AI) systems that can recognize and categorize clothing items play a crucial role in achieving these objectives, empowering businesses to boost sales and gain valuable customer insights. However, the challenge lies in accurately classifying diverse attire items in a rapidly evolving fashion landscape. Variations in styles, colors, and patterns make it difficult to consistently categorize clothing. Additionally, the quality of images provided by users varies widely, and background clutter can further complicate the task of accurate classification. Existing systems may struggle to provide the level of accuracy needed to meet customer expectations. To address these challenges, a meticulous dataset preparation process is essential. This includes careful data organization, the application of background removal techniques such as the GrabCut Algorithm, and resizing images for uniformity. The proposed solution involves a hybrid approach, combining the strengths of the ResNet152 and EfficientNetB7 architectures. This fusion of techniques aims to create a classification system capable of reliably distinguishing between various clothing items. The key innovation in this study is the development of a Two-Objective Learning model that leverages the capabilities of both ResNet152 and EfficientNetB7 architectures. This fusion approach enhances the accuracy of clothing item classification. The meticulously prepared dataset serves as the foundation for this model, ensuring that it can handle diverse clothing items effectively. 
    The proposed methodology offers a novel approach to image identification and feature extraction, achieving a classification accuracy of 94% together with stability and robustness.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type that takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
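
The two counting schemes contrasted in this abstract can be illustrated with a minimal sketch. It assumes each citing paper is given as a list of cited works and each cited work as an ordered list of author names. The function name `cocitation_counts`, the sample data, and the choice to count every pair of distinct authors once per citing paper (including coauthors of the same cited work) are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations, crediting only the first
    `max_authors` authors of each cited work."""
    counts = Counter()
    for references in citing_papers:
        # Authors credited as "cited" by this one citing paper.
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair of distinct authors is counted once per paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each with two cited works.
papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith", "Lee"], ["Chen", "Garcia"]],
]

first_author = cocitation_counts(papers, max_authors=1)  # traditional counting
first_five = cocitation_counts(papers, max_authors=5)    # first five authors
```

With first-author counting, Lee never appears in any pair because Lee is always a second author; with the first-five scheme, Lee is co-cited with the other authors. This is the kind of difference in input data that the abstract says leads to different maps.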

    Cross-modal Spectral Fusion Model for Referring Video Object Segmentation

    No full text
    Referring Video Object Segmentation (R-VOS) demands precise visual comprehension and sophisticated cross-modal reasoning to segment objects in videos based on descriptions from natural language. Addressing this challenge, we introduce the Cross-modal Spectral Fusion Model (CSF). Our model incorporates a Multi-Scale Spectral Fusion Module (MSFM), which facilitates robust global interactions between the modalities, and a Consensus Fusion Module (CFM) that dynamically balances multiple prediction vectors based on text features and spectral cues for accurate mask generation. Additionally, the Dual-stream Mask Decoder (DMD) enhances the segmentation accuracy by capturing both local and global information through parallel processing. Tested on three datasets, CSF surpasses existing methods in R-VOS, proving its efficacy and potential for advanced video understanding tasks. This work is supported by the National Natural Science Foundation of China (No. 91748107), the Special Research Fund (BOF) of Hasselt University (No. BOF23DOCBL11), and the Guangdong Innovative Research Team Program (No. 2014ZT05G157). Chen Junhong was sponsored by the China Scholarship Council (No. 202208440309).

    SMVT: Spectrum-Driven Multi-scale Vision Transformer for Referring Image Segmentation

    No full text
    Referring image segmentation is a challenging task at the intersection of computer vision and natural language processing, aiming to segment out an object referred to by a natural language expression from an image. Despite significant recent progress on this task, existing methods still face challenges in effectively integrating visual and language information and in enhancing the model's ability to capture fine-grained details within images. These challenges primarily originate from the lack of a mechanism capable of deeply and comprehensively fusing visual features with language features and effectively utilizing cross-modal features. To address these problems, we propose the Spectrum-driven Multi-scale Visual Transformer (SMVT), which incorporates two innovative designs: Spectrum-driven Fusion Attention (SFA) and the Cross-modal Feature Refinement Enhancement (CFRE) module. SFA, by guiding the fusion of visual and linguistic features at the spectral domain level, effectively captures fine-grained features in images and enhances the model's sensitivity to local spectral domain information, thereby responding more accurately to the detail requirements in language descriptions. The CFRE module, by refining and enhancing cross-modal features at different layers, improves their complementarity and the ability to capture fine-grained cross-modal features across layers, promoting the precise alignment of visual and language features. These two modules enable the SMVT to process visual and language information more effectively. Experiments on three benchmark datasets show that our method surpasses state-of-the-art approaches.

    3D-HRFC: 3D-Aware Image Generation at High Resolution with Faster Convergence

    No full text
    Learning 3D-aware generators from 2D image collections has attracted significant attention in the field of generative modeling. However, there are several challenges in generating high-resolution multi-view consistent images: 2D CNN-based approaches leverage upsampling layers to generate high-resolution images, which easily results in inconsistencies across multi-view images, while methods that generate images based on NeRF require tremendous memory space and a long time to converge. To this end, we propose a novel 3D-aware generative method named 3D-HRFC to generate high-resolution consistent images with faster convergence. Specifically, we first propose a depth fusion based super-resolution module that integrates the depth maps into the low-resolution images in order to generate consistent multi-view images. A skip super-resolution module is then devised to enhance the generation of the high-resolution images. To generate high-resolution consistent images and accelerate the model convergence, we devise a composite loss function that consists of adversarial loss, super-resolution loss, and content consistency. Extensive experiments conducted on the FFHQ and AFHQ-v2 Cats datasets illustrate that our proposed method can generate high-quality 3D-consistent images.

    DistillGrasp: Integrating Features Correlation with Knowledge Distillation for Depth Completion of Transparent Objects

    No full text
    Due to the visual properties of reflection and refraction, RGB-D cameras cannot accurately capture the depth of transparent objects, leading to incomplete depth maps. To fill in the missing points, recent studies tend to explore new visual features and design complex networks to reconstruct the depth; however, these approaches tremendously increase computation, and the correlation of different visual features remains a problem. To this end, we propose an efficient depth completion network named DistillGrasp, which distills knowledge from the teacher branch to the student branch. Specifically, in the teacher branch, we design a position correlation block (PCB) that leverages RGB images as the query and key to search for the corresponding values, guiding the model to establish correct correspondence between two features and transfer it to the transparent areas. For the student branch, we propose a consistent feature correlation module (CFCM) that retains the reliable regions of RGB images and depth maps respectively according to the consistency and adopts a CNN to capture the pairwise relationship for depth completion. To avoid the student branch only learning regional features from the teacher branch, we devise a distillation loss that considers not only the distance loss but also the object structure and edge information. Extensive experiments conducted on the ClearGrasp dataset show that our teacher network outperforms state-of-the-art methods in terms of accuracy and generalization, and that the student network achieves competitive results with a higher speed of 48 FPS. In addition, the significant improvement in a real-world robotic grasping system illustrates the effectiveness and robustness of our proposed system.

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
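
To make the comparison concrete, here is a minimal sketch of two of the measures named in this abstract, the Pearson correlation and the cosine, applied to cocitation profiles. The profile vectors and function names are hypothetical, and the two new measures the authors propose are not shown because the abstract does not define them.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two profile vectors.
    Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered vectors.
    Assumes neither vector is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles: counts of cocitations of two authors
# with the same four other authors.
profile_a = [4, 2, 0, 1]
profile_b = [2, 1, 0, 3]

# The same profiles after adding two authors with whom neither is cocited.
padded_a = profile_a + [0, 0]
padded_b = profile_b + [0, 0]

cos_before, cos_after = cosine(profile_a, profile_b), cosine(padded_a, padded_b)
r_before, r_after = pearson(profile_a, profile_b), pearson(padded_a, padded_b)
```

In this sketch the cosine is unchanged when zeros are appended to both profiles, whereas the Pearson correlation shifts. Whether this is the objection the authors raise against the Pearson correlation is not stated in the abstract.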