Deep Understanding of Shopper Behaviours and Interactions in Intelligent Retail Environment
In retail environments, understanding how shoppers move in the store’s spaces and interact with products is very valuable. Although the retail environment has several characteristics that favour computer vision, such as reasonable lighting, the large number and diversity of products sold and the potential ambiguity of shoppers’ movements mean that accurately measuring shopper behaviour is still challenging. Over the past years, machine-learning and feature-based tools for people counting, interaction analytics and re-identification were developed to learn shopper behaviours from occlusion-free RGB-D cameras in a top-view configuration. However, since the advent of multimedia big data, machine-learning approaches have evolved into deep learning approaches, which offer a more powerful and efficient way of dealing with the complexity of human behaviour.
Starting from this premise, this thesis addresses the evolution of three real systems: People Counting, Shopper Analytics and Re-Identification. The main goal is to develop deep learning architectures specifically designed for the retail environment. For this purpose, a novel deep learning framework, VRAI, is described. It uses three Convolutional Neural Networks (CNNs) to count the number of people passing or stopping in the camera area, perform top-view re-identification and measure shopper-shelf interactions from a single RGB-D video stream with near real-time performance.
The VRAI framework is evaluated on three new publicly available datasets: TVHeads for people counting, HaDa for shopper-shelf interactions and TVPR2 for people re-identification.
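The abstract states that VRAI counts people who pass through or stop in the camera area, but it does not say how the pass/stop decision is made. The following is a minimal sketch, assuming the head detector already yields per-person tracks with detection timestamps and that a simple dwell-time threshold separates the two cases; `Track`, `STOP_THRESHOLD_S` and the 3-second value are hypothetical, not part of the published framework.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical threshold (seconds): a person who stays in the camera area
# longer than this is counted as "stopping", otherwise as "passing".
STOP_THRESHOLD_S = 3.0


@dataclass
class Track:
    track_id: int
    # Times (in seconds) at which this person's head was detected.
    timestamps: List[float] = field(default_factory=list)


def dwell_time(track: Track) -> float:
    """Time between the first and the last detection of a track."""
    if not track.timestamps:
        return 0.0
    return max(track.timestamps) - min(track.timestamps)


def count_passing_stopping(tracks: List[Track], threshold: float = STOP_THRESHOLD_S) -> Tuple[int, int]:
    """Return (passing, stopping) counts based on dwell time."""
    passing = stopping = 0
    for track in tracks:
        if dwell_time(track) > threshold:
            stopping += 1
        else:
            passing += 1
    return passing, stopping


tracks = [
    Track(1, [0.0, 0.5, 1.2]),  # crosses the area quickly
    Track(2, [2.0, 4.0, 7.5]),  # lingers for 5.5 s
    Track(3, [10.0, 10.8]),
]
print(count_passing_stopping(tracks))  # (2, 1)
```

A fixed threshold is the simplest possible rule; in practice it would be tuned on annotated data such as TVHeads.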
Embedded Vision System for Real-Time Shelves Rows Detection for Planogram Compliance Check
In retail environments, monitoring store shelves is a key factor for retailers and brands to provide the best customer shopping experience and maximize sales. Computer vision and deep learning are well suited to this task and are already used to detect and recognize products displayed on shelves. Recently, retailers have started using autonomous robots to monitor store shelves. Commercial robotic solutions for full-store inventory, equipped with Radio Frequency Identification (RFID) technology or vision-based systems, are increasingly available on the market. Such robots usually roam the store performing inventory or planogram compliance checks. Detecting and recognizing products on a shelf, however, is not enough to obtain a complete picture of the shelf: its physical structure must also be taken into account in order to assign every detected product to its shelf row. Knowing exactly in which shelf row a product is displayed enables the calculation of specific Key Performance Indicators (KPIs) such as the Share of Shelf. In this paper, after analyzing the techniques currently used in the state of the art, we found that there is no reliable and lightweight solution for detecting shelf rows, so we provide an end-to-end solution for this task and prove the feasibility of the approach on a newly collected dataset.
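To illustrate why shelf rows matter for KPIs, here is a minimal sketch, assuming shelf rows have already been detected as horizontal lines (one y coordinate each) and that a product belongs to the row whose line is closest to the bottom of its bounding box. Share of Shelf is computed here as each brand's fraction of facings; the paper's exact assignment rule and KPI definition may differ, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A detected product facing: (brand, x_min, y_min, x_max, y_max) in pixels,
# with y growing downwards as in image coordinates.
Product = Tuple[str, float, float, float, float]


def assign_rows(products: List[Product], row_lines_y: List[float]) -> Dict[int, List[Product]]:
    """Assign each product to the shelf row whose (assumed horizontal) line
    is closest to the bottom edge of the product's bounding box."""
    if not row_lines_y:
        raise ValueError("at least one shelf row line is required")
    rows: Dict[int, List[Product]] = defaultdict(list)
    for product in products:
        bottom = product[4]
        row = min(range(len(row_lines_y)), key=lambda i: abs(row_lines_y[i] - bottom))
        rows[row].append(product)
    return dict(rows)


def share_of_shelf(products: List[Product]) -> Dict[str, float]:
    """Share of Shelf as each brand's fraction of facings (one detection = one facing)."""
    counts = Counter(p[0] for p in products)
    total = sum(counts.values())
    return {brand: n / total for brand, n in counts.items()} if total else {}


row_lines = [200.0, 400.0]  # y of two detected shelf rows
products = [
    ("A", 0, 120, 50, 198), ("B", 60, 110, 110, 199),
    ("A", 0, 320, 50, 401), ("A", 60, 310, 110, 398),
]
rows = assign_rows(products, row_lines)
print(share_of_shelf(rows[0]))   # {'A': 0.5, 'B': 0.5}
print(share_of_shelf(products))  # {'A': 0.75, 'B': 0.25}
```

The per-row figures differ from the whole-shelf figure, which is exactly the information lost when products are detected without knowing their row.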
Shelf Management: A deep learning-based system for shelf visual monitoring
Shelf monitoring plays a key role in optimizing retail shelf layout, enhancing the customer shopping experience and maximizing profit margins. Automating shelf audits involves the detection, localization and recognition of objects on store shelves, including diverse products with varying attributes in unconstrained environments. This facilitates the assessment of planogram compliance. Accurate product localization within shelves requires the identification of specific shelf rows. To address the current technological challenges, we introduce “Shelf Management”, a deep learning-based system carefully tailored to redesign shelf monitoring practices. Our system navigates the complexities of shelf monitoring by using advanced deep learning techniques and object detection and recognition models. In addition, a complex semantic module improves the accuracy of detecting products and assigning them to their designated shelf rows and locations. In particular, we recognize the lack of finely annotated datasets at the SKU level. As a contribution to the field, we provide annotations for two novel datasets: SHARD (SHelf mAnagement Row Dataset) and SHAPE (SHelf mAnagement Product dataset). These datasets not only provide valuable resources but also serve as benchmarks for further research in the field of retail. The complete pipeline uses a RetinaNet architecture for object detection (0.752 mAP), followed by a Deep Hough transform that detects shelf rows as semantic lines (F1 score of 97%), and a product recognition step in which a MobileNetV3 architecture trained with triplet loss serves as a feature extractor, combined with FAISS for fast image retrieval (93% top-1 recognition accuracy). Localization is achieved with a deterministic approach based on product detection and shelf row detection. Source code and datasets are available at https://github.com/rokopi-byte/shelf_management
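The recognition step described above is embedding-based retrieval: a crop is mapped to a feature vector and labelled with the SKU of its nearest gallery embedding. The sketch below shows that idea with an exhaustive cosine-similarity search in plain Python, which is what a flat (brute-force) index computes; FAISS performs the same search far faster at scale. The toy 3-D vectors and SKU names are hypothetical stand-ins for MobileNetV3 features.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top1(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Exhaustive search: return the gallery SKU whose embedding is most
    similar (cosine) to the query embedding, together with the similarity."""
    q = l2_normalize(query)
    best_sku, best_sim = "", -math.inf
    for sku, emb in gallery.items():
        sim = sum(a * b for a, b in zip(q, l2_normalize(emb)))
        if sim > best_sim:
            best_sku, best_sim = sku, sim
    return best_sku, best_sim


def top1_accuracy(queries: List[Sequence[float]], labels: List[str], gallery: Dict[str, Sequence[float]]) -> float:
    """Fraction of queries whose nearest gallery SKU matches the true label."""
    hits = sum(top1(q, gallery)[0] == y for q, y in zip(queries, labels))
    return hits / len(labels)


# Toy 3-D embeddings standing in for CNN features (hypothetical SKU names).
gallery = {"sku_cola": [1.0, 0.0, 0.0], "sku_water": [0.0, 1.0, 0.0], "sku_juice": [0.0, 0.0, 1.0]}
queries = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95], [0.6, 0.7, 0.0]]
labels = ["sku_cola", "sku_juice", "sku_cola"]
print(top1(queries[0], gallery)[0])             # sku_cola
print(top1_accuracy(queries, labels, gallery))  # 2 of 3 queries correct
```

Training with triplet loss is what makes this simple nearest-neighbour rule work: it pulls embeddings of the same SKU together and pushes different SKUs apart.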
GREEN PATH: an expert system for space planning and design by the generation of human trajectories
Public space is usually conceived as the place where people live, perceive, and interact with other people. The environment, in turn, affects people in several different ways. Environmental problems have a significant impact on humans, affecting all human activities, including health and socio-economic development. Thus, there is a need to rethink how space is used. Addressing the pressing needs raised by the climate emergency, the pandemic and digitization, this paper contributes by creating opportunities for generative approaches to space design and utilization. We propose GREEN PATH, an intelligent expert system for space planning. GREEN PATH uses human trajectories and deep learning methods to analyse and understand human behaviour and to offer insights to layout designers. In particular, we propose a Generative Adversarial Imitation Learning (GAIL) framework hybridised with classical reinforcement learning methods. One example of such a classical method is the use of continuous penalties, which allow us to model the shape of the trajectories and to introduce into training the bias required for generation. The structure of the framework and the formalisation of the problem allow the results to be evaluated in terms of both generation and prediction. The retail domain is chosen as the use case and serves as a demonstrator for optimising the layout of the environment and improving the shopping experience. Experiments were conducted on shoppers' trajectories collected from four different stores over two years.
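The abstract mentions evaluating GREEN PATH in terms of prediction but does not name the metrics. Average Displacement Error (ADE) and Final Displacement Error (FDE) are common choices for scoring predicted trajectories against real ones, so the sketch below shows them as an assumption, not as the paper's confirmed protocol. Trajectories are assumed to be equal-length lists of 2-D positions sampled at the same time steps.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _check(pred: List[Point], truth: List[Point]) -> None:
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")


def ade(pred: List[Point], truth: List[Point]) -> float:
    """Average Displacement Error: mean Euclidean distance over all time steps."""
    _check(pred, truth)
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def fde(pred: List[Point], truth: List[Point]) -> float:
    """Final Displacement Error: distance between the last predicted and true positions."""
    _check(pred, truth)
    return math.dist(pred[-1], truth[-1])


truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # metres, one point per time step
pred = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade(pred, truth), fde(pred, truth))  # 1.0 2.0
```

ADE rewards a trajectory that stays close along the whole path, while FDE only checks where the shopper ends up; reporting both separates shape errors from destination errors.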
Preterm infants’ limb-pose estimation from depth images using convolutional neural networks
Preterm infants' limb-pose estimation is a crucial but challenging task, which may improve patient care and help clinicians monitor infants' movements. Work in the literature either provides approaches to whole-body segmentation and tracking, which, however, have poor clinical value, or retrieves limb pose a posteriori from limb segmentation, which increases computational costs and introduces sources of inaccuracy. In this paper, we address the problem of limb-pose estimation from a different point of view. We propose a 2D fully convolutional neural network for roughly detecting limb joints and joint connections, followed by a regression convolutional neural network for accurate estimation of joint and joint-connection positions. Joints from the same limb are then connected with a maximum bipartite matching approach. Our analysis requires neither prior modeling of the infants' body structure nor any manual intervention. To develop and test the proposed approach, we built a dataset of four videos (video length = 90 s) recorded with a depth sensor in a neonatal intensive care unit (NICU) during actual clinical practice, achieving a median root mean square distance [pixels] of 10.790 (right arm), 10.542 (left arm), 8.294 (right leg) and 11.270 (left leg) with respect to the ground-truth limb pose. Estimating limb pose directly from depth images may represent a future paradigm for addressing the problem of preterm infants' movement monitoring and offer all possible support to clinicians in NICUs.
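The final step above connects joint candidates of adjacent types (for example shoulders and elbows) by maximum bipartite matching. Here is a minimal sketch of maximum-weight matching by exhaustive search, assuming a score matrix where higher means a more likely connection (for instance derived from the joint-connection maps); the paper's exact scoring and solver are not given in the abstract. Brute force is only reasonable for the handful of candidates expected per frame; for larger sets the Hungarian algorithm (e.g. SciPy's `linear_sum_assignment` with `maximize=True`) is the usual choice.

```python
from itertools import permutations
from typing import List, Tuple


def max_weight_matching(scores: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """Exhaustive maximum-weight bipartite matching.

    scores[i][j] is the connection score between candidate i of one joint type
    (e.g. shoulders) and candidate j of the adjacent joint type (e.g. elbows).
    Returns the matched (i, j) pairs and their total score. Every candidate on
    the smaller side is matched.
    """
    if not scores or not scores[0]:
        return [], 0.0
    n_a, n_b = len(scores), len(scores[0])
    if n_a > n_b:
        # Solve on the transposed matrix so the smaller side is enumerated,
        # then swap the indices back.
        transposed = [list(col) for col in zip(*scores)]
        pairs, total = max_weight_matching(transposed)
        return [(i, j) for j, i in pairs], total
    best_pairs: List[Tuple[int, int]] = []
    best_total = float("-inf")
    for cols in permutations(range(n_b), n_a):
        total = sum(scores[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_pairs, best_total = list(enumerate(cols)), total
    return best_pairs, best_total


# Two shoulder candidates vs two elbow candidates: greedily taking the highest
# score (0.9) first would give 0.9 + 0.1 = 1.0, whereas the optimum is 1.65.
pairs, total = max_weight_matching([[0.9, 0.8], [0.85, 0.1]])
print(pairs, round(total, 2))  # [(0, 1), (1, 0)] 1.65
```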
The babyPose dataset
The database described here contains data on preterm infants' movements acquired in neonatal intensive care units (NICUs). The data consist of 16 depth videos recorded during actual clinical practice. Each video consists of 1000 frames (i.e., 100 s). The dataset was acquired at the NICU of the Salesi Hospital, Ancona (Italy). Each frame was annotated with limb-joint locations. Twelve joints were annotated, i.e., left and right shoulder, elbow, wrist, hip, knee and ankle. The database is freely accessible at http://doi.org/10.5281/zenodo.3891404. This dataset represents a unique resource for artificial intelligence researchers who want to develop algorithms that provide healthcare professionals working in NICUs with decision support. The babyPose dataset is the first annotated dataset of depth images relevant to preterm infants' movement analysis.
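The twelve annotated joints group naturally into four limbs of three joints each (1000 frames over 100 s also implies 10 frames per second). The sketch below uses that grouping to compute a per-limb root mean square distance per frame and its median over a video, the kind of per-limb figure reported in the limb-pose paper above. The joint key names and the dictionary layout are hypothetical; the dataset's actual file format is not described here.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels

# The twelve annotated joints grouped into four limbs of three joints each.
# Key names are hypothetical; adapt them to the dataset's actual annotation files.
LIMBS: Dict[str, List[str]] = {
    "right_arm": ["right_shoulder", "right_elbow", "right_wrist"],
    "left_arm": ["left_shoulder", "left_elbow", "left_wrist"],
    "right_leg": ["right_hip", "right_knee", "right_ankle"],
    "left_leg": ["left_hip", "left_knee", "left_ankle"],
}


def limb_rmsd(pred: Dict[str, Point], truth: Dict[str, Point], joints: List[str]) -> float:
    """Root mean square distance between predicted and ground-truth joints of one limb in one frame."""
    return math.sqrt(sum(math.dist(pred[j], truth[j]) ** 2 for j in joints) / len(joints))


def median_limb_rmsd(pred_frames: List[Dict[str, Point]], truth_frames: List[Dict[str, Point]]) -> Dict[str, float]:
    """Per-limb median of the frame-wise RMSD over a video."""
    return {
        limb: median([limb_rmsd(p, t, joints) for p, t in zip(pred_frames, truth_frames)])
        for limb, joints in LIMBS.items()
    }


truth = {j: (0.0, 0.0) for joints in LIMBS.values() for j in joints}
pred = {j: (3.0, 4.0) for j in truth}  # every joint off by 5 px
print(median_limb_rmsd([pred], [truth]))  # 5.0 for every limb
```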
Social4Fashion: An intelligent expert system for forecasting fashion trends from social media contents
The fashion field is continually expanding and evolving, and social media play a significant role in shaping current fashion trends through the influence of online personalities such as influencers. As a result, fashion designers often turn to social media to gain insights into the latest trends and draw inspiration, whereas in the past they physically visited fashion districts. Automating and speeding up this process calls for an expert system; to this end, we created Social4Fashion, an end-to-end framework that leverages deep learning-based techniques to support creatives in their research and decision-making process, with the final goal of analyzing and predicting trends. The system consists of several steps, starting with automatic data collection from Instagram using hashtags provided by domain experts. Next, the retrieved images are filtered to remove non-fashion pictures, leaving only fashion-related ones for further processing. Then, to obtain more specific information about the images, any handbags present are detected and classified by type; finally, the dominant colors of the handbags are retrieved through clustering on the images. All the data collected by the system are then stored and analyzed via user-friendly dashboards designed to highlight relevant information for the analysis of current and future fashion trends. Results show the effectiveness of the proposed system, with an accuracy of 97% (95% confidence interval 0.95-1) for fashion image classification and a mAP of 0.77 (95% confidence interval 0.73-0.82) for handbag detection, which makes it suitable for fashion domain analysis. In addition, a novel fashion-related dataset has been made available to the research community as a result of this work. This system can greatly improve the way fashion trends are analyzed and enable more efficient and effective design processes in the future.
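The abstract says dominant handbag colors are obtained by clustering but does not name the algorithm. A common option is k-means on the RGB values of the pixels inside the detected handbag box, so the sketch below shows plain Lloyd's k-means under that assumption; the seeding rule (first k distinct pixels, instead of the more usual k-means++) and all names are simplifications for illustration.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def _nearest(pixel: Sequence[float], centroids: List[RGB]) -> int:
    """Index of the centroid closest to the pixel in RGB space."""
    return min(range(len(centroids)), key=lambda c: math.dist(pixel, centroids[c]))


def dominant_colors(pixels: List[RGB], k: int, iters: int = 20) -> List[Tuple[RGB, float]]:
    """Lloyd's k-means on RGB pixels. Returns (centroid, share of pixels) pairs,
    largest share first. Initial centroids are the first k distinct pixels,
    a simplification of the usual k-means++ seeding."""
    if not pixels:
        return []
    centroids: List[RGB] = []
    for p in pixels:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in pixels]
        updated: List[RGB] = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            # Move each centroid to the mean of its members (keep it if empty).
            updated.append(tuple(sum(ch) / len(members) for ch in zip(*members)) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [_nearest(p, centroids) for p in pixels]
    counts = [labels.count(c) for c in range(len(centroids))]
    ranked = sorted(zip(centroids, counts), key=lambda item: item[1], reverse=True)
    return [(color, n / len(pixels)) for color, n in ranked]


# Toy "handbag" pixels: mostly red with some blue.
pixels = [(250, 10, 10)] * 6 + [(10, 10, 240)] * 4
for color, share in dominant_colors(pixels, k=2):
    print(color, share)
```

Ranking clusters by pixel share gives the "dominant" color first; in a real pipeline the background around the bag would need to be masked out so that it does not form its own cluster.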
COIGAN: Controllable Object Inpainting Through Generative Adversarial Network for Defect Synthesis in Data Augmentation
Predictive maintenance is key to the safety of critical infrastructure such as bridges, dams and tunnels, where a failure can have catastrophic consequences in terms of human lives and costs. The surge in Artificial Intelligence-driven visual robotic inspection methods calls for high-quality datasets containing diverse defect classes, with several instances under different conditions (e.g., material, illumination). In this context, we introduce a Controllable Object Inpainting Generative Adversarial Network (COIGAN) that synthetically generates realistic images to augment defect datasets. The effectiveness of the model is quantitatively validated with the Fréchet Inception Distance (FID), which measures the similarity between generated and training samples. To further evaluate the impact of COIGAN-generated images, a segmentation task was conducted using key performance metrics such as segmentation accuracy, mAP, mIoU and F1 score, demonstrating that the synthetic images integrate seamlessly and produce results comparable to real defect images. Subsequently, COIGAN's generative capability was successfully applied to a defect-free dataset, inpainting defects into it for the segmentation task. The results showcase COIGAN's ability to learn defect patterns and apply them in new contexts, preserving the original features of the base image and allowing the creation of new datasets with a desired multi-class distribution. Specifically, in the context of predictive maintenance, COIGAN enriches datasets, enabling deep learning models to identify potential infrastructure anomalies more effectively. Project page: https://bit.ly/4bzxwqf
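For reference, the Fréchet Inception Distance used above is the standard Fréchet distance between two Gaussians fitted to Inception-network features of the real (training) images, with mean $\mu_r$ and covariance $\Sigma_r$, and of the generated images, with $\mu_g$ and $\Sigma_g$. Which Inception layer and sample sizes COIGAN used is not stated in the abstract.

```latex
\mathrm{FID} = \left\lVert \mu_r - \mu_g \right\rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values mean the generated feature distribution is closer to the real one; the distance is zero when the two fitted Gaussians coincide.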
Edge-AI for Buoy Detection and Mussel Farming: A Comparative Study of YOLO Frameworks
Despite advances in computer vision technologies, maritime environments continue to pose significant challenges. Varying weather conditions, dynamic water surfaces and the presence of both large and small objects hamper object detection and tracking and, more generally, the development of robust AI solutions for the maritime industry. To address this concern, we propose a lightweight deep learning approach for robust environment monitoring, tailored in particular to the detection of buoys such as those used for offshore submerged mussel-farming long-lines. Such industrial applications remain challenging because, owing to unstable internet connectivity, autonomous and efficient object detection cannot rely on external resources. Our model, built and benchmarked upon several You Only Look Once (YOLO) frameworks coupled with horizon line segmentation, leverages both custom and open-access data and is tested for deployment on an edge device for practical demonstration. The proposed method uses the Deep Hough Transform to determine the maritime horizon line and exclude far-off objects and land, making the system more robust to false positives. YOLOv3, YOLOv4, YOLOv5 and YOLOv8, along with their variants, were tested and evaluated on several performance and efficiency metrics. Preliminary findings indicate that YOLOv8-Nano was particularly effective, demonstrating high computational efficiency (7.2 GFLOPs) and real-time inference at 24.1 fps on an NVIDIA Jetson Nano, with a mAP of 69.10, achieving an optimal trade-off between efficiency and accuracy. Such enhanced object detection capabilities could substantially benefit the maritime industry, significantly improving operational safety and reducing the risk of economic losses and environmental damage.
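The horizon line is used to discard detections of far-off objects and land. A minimal sketch of that filtering step, assuming the Deep Hough Transform output has already been converted to two points on the horizon line in image coordinates, and that a detection is kept only if the bottom of its box (at the box centre column) lies below the line; the `margin` parameter and all names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels, y grows downwards
Point = Tuple[float, float]


def horizon_y(x: float, p1: Point, p2: Point) -> float:
    """y coordinate of the horizon line through p1 and p2 at column x."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the horizon line must not be vertical")
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def keep_below_horizon(boxes: List[Box], p1: Point, p2: Point, margin: float = 0.0) -> List[Box]:
    """Keep detections whose bottom edge (at the box centre column) lies more
    than `margin` pixels below the horizon; the rest are treated as far-off
    objects or land and discarded."""
    kept = []
    for box in boxes:
        x_min, _, x_max, y_max = box
        if y_max > horizon_y((x_min + x_max) / 2, p1, p2) + margin:
            kept.append(box)
    return kept


horizon = ((0.0, 100.0), (640.0, 120.0))  # slightly tilted horizon
boxes = [(300.0, 150.0, 340.0, 190.0),    # buoy on the water
         (500.0, 60.0, 560.0, 100.0)]     # object on the horizon / land
print(keep_below_horizon(boxes, *horizon))  # [(300.0, 150.0, 340.0, 190.0)]
```

Because this check is a few arithmetic operations per box, it adds negligible cost on an edge device compared with the detector itself.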
