    Automatic trademark detection and recognition in sport videos

    In this paper we describe a system for automatic detection and recognition of trademarks in sports videos. We propose a compact representation of trademarks based on SIFT feature points and a matching algorithm to robustly detect and retrieve trademarks in a variety of different sports video types. Trademark localization is performed through robust clustering of matched feature points in the video frame. A supervised machine learning approach is used to automatically adapt the similarity threshold used to assess the trademark matches. Experimental results are provided, along with an analysis of the precision and recall. Results show that our proposed technique is efficient and effectively detects and classifies trademarks.

    Automatic detection and recognition of players in soccer videos

    An application for content-based annotation and retrieval of videos can be found in the sports domain, where videos are annotated in order to produce short summaries for news and sports programmes, edited by reusing the video clips that show important highlights and the players involved in them. The problem of detecting and recognizing faces in broadcast videos is a widely studied topic. However, in the case of sports videos in general, and soccer videos in particular, the current techniques are not suitable for the task of face detection and recognition, due to the high variations in pose, illumination, scale and occlusion that may happen in an uncontrolled environment. In this paper we present a method for face detection and recognition, with an associated metric, that copes with these problems. The face detection algorithm adds a filtering stage to the Viola and Jones AdaBoost detector, while the recognition algorithm exploits i) local features to describe a face, without requiring a precise localization of the distinguishing parts of a face, and ii) the set of poses to describe a person and perform a more robust recognition.

    Trademark matching and retrieval in sports video databases

    In this paper we describe a system for detection and retrieval of trademarks appearing in sports videos. We propose a compact representation of trademarks and video frame content based on SIFT feature points. This representation can be used to robustly detect, localize, and retrieve trademarks as they appear in a variety of different sports video types. Classification of trademarks is performed by matching a set of SIFT feature descriptors for each trademark instance against the set of SIFT features detected in each frame of the video. Localization is performed through robust clustering of matched feature points in the video frame. Experimental results are provided, along with an analysis of the precision and recall. Results show that our proposed technique is efficient and effectively detects and classifies trademarks.
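
The matching and localization steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 3-dimensional toy descriptors stand in for real SIFT descriptors, the nearest/second-nearest ratio test and its 0.8 threshold are assumptions, and the median-based outlier rejection is a simple stand-in for the robust clustering the abstract mentions. The names `match_descriptors`, `localize`, `ratio` and `max_dev` are hypothetical.

```python
import math
from statistics import median


def match_descriptors(trademark, frame, ratio=0.8):
    """Match each trademark descriptor to its nearest frame descriptor.

    `frame` is a list of ((x, y), descriptor) pairs. A match is kept only
    when the nearest distance is clearly smaller than the second nearest
    (ratio test; an assumption, not stated in the abstract).
    Returns the frame positions of the accepted matches.
    """
    matched_points = []
    for d in trademark:
        ranked = sorted((math.dist(d, desc), pos) for pos, desc in frame)
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matched_points.append(ranked[0][1])
    return matched_points


def localize(points, max_dev=50.0):
    """Bounding box of the matched points that lie near their median.

    Points farther than `max_dev` pixels from the median location are
    treated as outliers (a simple stand-in for robust clustering).
    """
    if not points:
        return None
    cx = median(p[0] for p in points)
    cy = median(p[1] for p in points)
    inliers = [p for p in points if math.dist(p, (cx, cy)) <= max_dev]
    if not inliers:
        return None
    xs = [p[0] for p in inliers]
    ys = [p[1] for p in inliers]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy data: three trademark descriptors and four frame features.
TRADEMARK = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
FRAME = [
    ((100, 100), (0.9, 0.1, 0.0)),
    ((110, 105), (0.1, 0.9, 0.0)),
    ((105, 110), (0.0, 0.1, 0.9)),
    ((400, 300), (0.5, 0.5, 0.5)),  # unrelated background feature
]

if __name__ == "__main__":
    print(localize(match_descriptors(TRADEMARK, FRAME)))
```

In a real system the descriptors would come from a SIFT extractor and the acceptance threshold would be tuned, or learned as the first abstract in this list describes.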

    Soccer players identification based on visual local features

    Semantic detection and recognition of objects and events contained in a video stream has to be performed in order to provide content-based annotation and retrieval of videos. This annotation is done so that the video material can be reused at a later stage, e.g. to produce new TV programmes. A typical example is that of sports videos, where videos are annotated in order to reuse the video clips that show key highlights and players to produce short summaries for news and sports programmes. In order to select the most interesting actions among all the possibly detected highlights, further analysis is required: the shots that contain a key action are typically followed by close-ups of the players that take part in the action. Therefore the automatic identification of these players would add considerable value both to the annotation and retrieval of the key highlights and key players of a sports event. The problem of detecting and recognizing faces in broadcast videos is a widely studied topic. However, in the case of soccer videos, and sports videos in general, the current techniques are not suitable for the task of face recognition, due to the high variations in pose, illumination, scale and occlusion that may happen in an uncontrolled environment. In this paper we present a method that copes with these problems. It exploits local features to describe a face, without requiring a precise localization of the distinguishing parts of a face, and uses the set of poses to describe a person and perform a more robust recognition. A similarity metric based on the number of matched interest points, able to cope with different face sizes, is also presented and experimentally validated.
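
The abstract does not give the formula for its similarity metric, so the following is only a sketch of the general idea under stated assumptions: the score is the number of matched interest points divided by the smaller of the two point counts (a hypothetical normalization chosen here so that large and small faces are comparable), and a person represented by several poses is scored by the best-matching pose. The names `face_similarity` and `identify` are hypothetical.

```python
def face_similarity(n_matched, n_query, n_pose):
    """Matched interest points normalized by the smaller point count.

    The normalization is an assumption made for this sketch; it keeps the
    score in [0, 1] regardless of how many points each face yields.
    """
    denom = min(n_query, n_pose)
    return n_matched / denom if denom else 0.0


def identify(candidates):
    """Pick the person whose best pose is most similar to the query face.

    `candidates` maps a person name to a list of
    (n_matched, n_query, n_pose) tuples, one per stored pose.
    Returns (best_person, per_person_scores).
    """
    scores = {
        person: max((face_similarity(*pose) for pose in poses), default=0.0)
        for person, poses in candidates.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical match counts for one query face with 40 interest points.
    example = {
        "player_a": [(5, 40, 50), (20, 40, 80)],
        "player_b": [(8, 40, 20)],
    }
    print(identify(example))
```

Taking the maximum over poses reflects the abstract's point that a set of poses describes a person; other aggregation rules would be equally plausible.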

    MICC-UNIFI at ImageCLEF 2013 Scalable Concept Image Annotation

    In this paper we report on MICC's participation in the Scalable Concept Image Annotation subtask of the ImageCLEF Photo Annotation and Retrieval competition. Our goal has been to investigate the applicability to web images of data-driven methods that have obtained good results in the field of social image annotation and retrieval. These methods have typically been applied to tasks such as tag ranking, tag suggestion and refinement. Since they do not require a training stage, they can be applied in cases in which the set of annotation keywords can vary greatly over time or when the set of images to be analyzed is very large.

    Data-driven approaches for social image and video tagging

    The large success of online social platforms for creation, sharing and tagging of user-generated media has led to a strong interest by the multimedia and computer vision communities in research on methods and techniques for annotating and searching social media. Visual content similarity, geo-tags and tag co-occurrence, together with social connections and comments, can be exploited to perform tag suggestion as well as to perform content classification and clustering and enable more effective semantic indexing and retrieval of visual data. However, there is a need to overcome the relatively low quality of these metadata (user-produced tags and annotations are known to be ambiguous, imprecise and/or incomplete, excessively personalized and limited) and, at the same time, to take into account the ‘web-scale’ quantity of media and the fact that social network users continuously add new images and create new terms. We will review the state-of-the-art approaches to automatic annotation and tag refinement for social images, considering also the temporal patterns of their usage, and discuss extensions to tag suggestion and localization in web video sequences.

    Video Event Annotation using Ontologies with Temporal Reasoning

    Annotation and retrieval tools for multimedia digital libraries have to cope with the complexity of multimedia content. In particular, when dealing with video content, annotation and retrieval tools have to use appropriate knowledge structures that can effectively relate high-level concepts to low- and mid-level visual features and, at the same time, integrate temporal information, which is crucial when defining an abstract model for video. In this paper we present multimedia ontologies that include both linguistic and visual ontologies. Moreover, provided that appropriate low-level descriptors are used to detect simple events, subjects or objects, we propose using the Semantic Web Rule Language (SWRL) to provide a formal definition of complex events based on temporal relations between simple entities. Results for complex event inferencing are shown for the news broadcast domain.
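
The idea of defining a complex event through a temporal relation between simple events can be sketched in plain Python. This is an illustration only: the paper uses SWRL rules over an ontology, whereas the code below hard-codes a single "A is followed by B within a short gap" rule. The event labels (`anchor`, `report`, `story`), the `max_gap` parameter and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str
    start: float  # seconds
    end: float    # seconds


def precedes(a, b, max_gap):
    """True if `a` ends before (or exactly when) `b` starts, within `max_gap`."""
    return a.end <= b.start and (b.start - a.end) <= max_gap


def infer_complex(events, first, second, name, max_gap=2.0):
    """Emit a complex event for every `first` event followed by a `second`.

    The inferred event spans from the start of the first simple event to
    the end of the second one.
    """
    inferred = []
    for a in events:
        if a.label != first:
            continue
        for b in events:
            if b.label == second and precedes(a, b, max_gap):
                inferred.append(Event(name, a.start, b.end))
    return inferred


if __name__ == "__main__":
    detected = [
        Event("anchor", 0, 10),
        Event("report", 11, 40),
        Event("report", 100, 120),  # too far from the anchor shot
    ]
    print(infer_complex(detected, "anchor", "report", "story"))
```

A rule language such as SWRL expresses the same condition declaratively, so new complex events can be added without changing code.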

    An evaluation of nearest-neighbor methods for tag refinement

    The success of media sharing and social networks has led to the availability of extremely large quantities of images that are tagged by users. The need for methods that manage the combination of media and metadata efficiently and effectively poses significant challenges. In particular, automatic image annotation of social images has become an important research topic for the multimedia community. In this paper we propose and thoroughly evaluate the use of nearest-neighbor methods for tag refinement. Extensive and rigorous evaluation using two standard large-scale datasets shows that the performance of these methods is comparable with that of more complex and computationally intensive approaches and that, unlike the latter, nearest-neighbor methods can be applied to web-scale data.
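
A generic neighbor-voting scheme gives a feel for how nearest-neighbor tag refinement can work without a training stage. This sketch is not one of the specific methods evaluated in the paper: the 2-dimensional features, Euclidean distance, `k`, and the `min_votes` threshold are all assumptions, and `refine_tags` is a hypothetical name.

```python
import math
from collections import Counter


def refine_tags(query_feat, dataset, k=3, min_votes=2):
    """Return the tags supported by at least `min_votes` of the k nearest images.

    `dataset` is a list of (feature_vector, tags) pairs. Tags of the query
    image that its visual neighbors do not share are dropped, and tags the
    neighbors agree on are added.
    """
    neighbors = sorted(dataset, key=lambda item: math.dist(query_feat, item[0]))[:k]
    votes = Counter(tag for _, tags in neighbors for tag in set(tags))
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Toy collection: three visually similar beach images and one street image.
DATASET = [
    ((0.0, 0.0), ["beach", "sea"]),
    ((0.1, 0.0), ["sea", "sky"]),
    ((0.0, 0.2), ["beach", "sea", "sun"]),
    ((5.0, 5.0), ["car", "street"]),
]

if __name__ == "__main__":
    # The user's original tags for the query image were ["sea", "mycat"].
    print(refine_tags((0.05, 0.05), DATASET))
```

Because the only per-query work is a neighbor search and a vote count, this style of method scales with the quality of the nearest-neighbor index rather than with a model retraining cost.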

    Evaluating temporal information for social image annotation and retrieval

    Can we use the temporal evolution of annotations in Web images to improve tasks such as annotation, indexing and retrieval? This important question is the main motivation for this work. Typically, visual content, text and metadata are used to improve these tasks. A characteristic that has so far received less attention is the temporal aspect of social media production and tagging. The main contribution of this paper is a thorough analysis of the temporal aspects of two popular datasets commonly used for tasks such as tag ranking, tag suggestion and tag refinement, namely NUS-WIDE and MIR-Flickr-1M. The correlation of the time series of the tags with Google searches shows that, for certain concepts, web information sources may be beneficial for annotating social media.
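
Correlating a tag's usage over time with search interest can be sketched with a plain Pearson coefficient. The abstract does not say which correlation measure was used, so Pearson is an assumption here, and the monthly counts below are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical monthly counts: uses of a tag vs. searches for the same term.
    tag_counts = [12, 15, 40, 90, 35, 14]
    search_volume = [20, 22, 55, 100, 50, 25]
    print(round(pearson(tag_counts, search_volume), 3))
```

A coefficient close to 1 for a concept would suggest that external search trends track its tagging activity, which is the kind of signal the abstract reports for certain concepts.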

    Social media annotation

    The large success of online social platforms for creation, sharing and tagging of user-generated media has led to a strong interest by the multimedia and computer vision communities in research on methods and techniques for annotating and searching social media. Visual content similarity, geo-tags and tag co-occurrence, together with social connections and comments, can be exploited to perform tag suggestion as well as to perform content classification and clustering and enable more effective semantic indexing and retrieval of visual data. However, there is a need to compensate for the relatively low quality of these metadata (user-produced tags and annotations are known to be ambiguous, imprecise and/or incomplete, overly personalized and limited) and, at the same time, to take into account the 'web-scale' quantity of media and the fact that social network users continuously add new images and create new terms. We will review the state-of-the-art approaches to automatic annotation and tag refinement for social images and discuss extensions to tag suggestion and localization in web video sequences.