Computational methods for a class of constrained rate-distortion functions

Author: Serra, Giuseppe
Publication date: 2024

Learning rules for semantic video event annotation

Authors: Serra, Giuseppe; Del Bimbo, A.; Bertini, M.
Publication date: 01/01/2008

Automatic semantic annotation of video events has received considerable attention from the scientific community in recent years, since event recognition is an important task in many applications. Events can be defined by spatio-temporal relations and by properties of objects and entities that change over time; some events can be described by a set of patterns. In this paper we present a framework for semantic video event annotation that exploits an ontology model, referred to as Pictorially Enriched Ontology, and rule-based ontology reasoning. The proposed ontology model includes high-level concepts, concept properties and concept relations, used to define the semantic context of the examined domain, and concept instances which, with their visual descriptors, enrich the video semantic annotation. The ontology is defined using the Web Ontology Language (OWL) standard. Events are recognized using patterns defined by rules that take into account high-level concepts and concept instances. To learn these rules, we propose an adaptation of the First Order Inductive Learner (FOIL) technique to the Semantic Web Rule Language (SWRL) standard. We validate our approach on the TRECVID 2005 broadcast news collection, detecting events related to airplanes, such as taxiing, flying, landing and taking off. The promising experimental performance demonstrates the effectiveness of the proposed framework.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia
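
The rule-learning step mentioned in the abstract of "Learning rules for semantic video event annotation" can be illustrated with a minimal sketch of FOIL-style greedy literal selection. This is a propositional simplification written for illustration only: it does not handle first-order variables, it does not produce SWRL, and it is not the authors' adaptation. The attribute names (on_runway, speed_increasing, in_sky) and the toy shots are hypothetical.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL gain for adding one literal to a rule body.

    p0, n0: positives/negatives covered before adding the literal.
    p1, n1: positives/negatives covered after adding it.
    Simplifying assumption: every positive still covered counts once (t = p1).
    """
    if p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)

def learn_rule(examples, target, literals):
    """Greedily add the highest-gain literal until no negative is covered."""
    covered = list(examples)
    rule = []
    while any(not e[target] for e in covered):
        p0 = sum(1 for e in covered if e[target])
        n0 = len(covered) - p0
        best, best_gain = None, 0.0
        for lit in literals:
            if lit in rule:
                continue
            subset = [e for e in covered if e[lit]]
            p1 = sum(1 for e in subset if e[target])
            gain = foil_gain(p0, n0, p1, len(subset) - p1)
            if gain > best_gain:  # strict: ties keep the earlier literal
                best, best_gain = lit, gain
        if best is None:  # no literal improves the rule; stop
            break
        rule.append(best)
        covered = [e for e in covered if e[best]]
    return rule

# Hypothetical toy shots; only the first two show a take-off.
LITERALS = ["on_runway", "speed_increasing", "in_sky"]
EXAMPLES = [
    {"on_runway": True,  "speed_increasing": True,  "in_sky": False, "taking_off": True},
    {"on_runway": True,  "speed_increasing": True,  "in_sky": False, "taking_off": True},
    {"on_runway": True,  "speed_increasing": False, "in_sky": False, "taking_off": False},
    {"on_runway": False, "speed_increasing": True,  "in_sky": True,  "taking_off": False},
    {"on_runway": False, "speed_increasing": False, "in_sky": True,  "taking_off": False},
]

if __name__ == "__main__":
    body = learn_rule(EXAMPLES, "taking_off", LITERALS)
    print("taking_off <- " + " AND ".join(body))
```

On this toy data the learned body combines on_runway and speed_increasing, because neither literal alone excludes every negative shot.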

Wearable Vision for Retrieving Architectural Details in Augmented Tourist Experiences

Authors: Cucchiara, Rita; Serra, Giuseppe; Alletto, Stefano
Publication date: 01/01/2015

Interest in cultural cities is constantly growing, and so is the demand for new multimedia tools and applications that enrich the experience of visiting them. In this paper we propose an egocentric vision system to enhance tourists' cultural heritage experience. Using a wearable board and a glass-mounted camera, visitors can retrieve architectural details of the historical building they are observing and receive related multimedia content. To obtain an effective retrieval procedure we propose a visual descriptor based on the covariance of local features. Unlike common Bag of Words approaches, our feature vector does not rely on a generated visual vocabulary, which removes the dependence on a specific dataset and reduces the computational cost. 3D modeling is used to localize the visitor precisely, which allows browsing visible relevant details that the user might otherwise miss. Experimental results on a publicly available cultural heritage dataset show that the proposed feature descriptor outperforms Bag of Words techniques.

Archivio istituzionale della ricerca - Università degli Studi di Udine
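
The idea of a descriptor built from the covariance of local features, as described in the abstract of "Wearable Vision for Retrieving Architectural Details in Augmented Tourist Experiences", can be sketched as follows. This is a generic covariance descriptor under stated assumptions, not the authors' exact pipeline: the local features are assumed to be already extracted as equal-length numeric vectors, and any further processing applied in the paper is not reproduced. The function names are hypothetical.

```python
def covariance_descriptor(features):
    """Return the d x d sample covariance matrix of n local feature vectors.

    features: list of n vectors (lists of floats), each of length d, n >= 2.
    No visual vocabulary is needed: the descriptor depends only on the
    features of the image being described.
    """
    n = len(features)
    d = len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        diff = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b]
    # Unbiased estimate: divide by n - 1.
    return [[c / (n - 1) for c in row] for row in cov]

def flatten_upper(cov):
    """Keep the upper triangle of the symmetric matrix as a fixed-length vector."""
    d = len(cov)
    return [cov[a][b] for a in range(d) for b in range(a, d)]

if __name__ == "__main__":
    # Three hypothetical 2-dimensional local features from one image region.
    feats = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    print(flatten_upper(covariance_descriptor(feats)))
```

The resulting vector has d(d+1)/2 entries regardless of how many local features the image yields, which is what makes it usable without a dataset-specific vocabulary.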

Computation of rate-distortion-perception function under f-divergence perception constraints

Authors: Serra, Giuseppe; Stavrou, Photios A.; Kountouris, Marios
Publication date: 2023

EURECOM Repository

On the computation of the Gaussian rate-distortion-perception function

Authors: Serra, Giuseppe; Stavrou, Photios A.; Kountouris, Marios
Publication date: 2024

EURECOM Repository

On the rate-distortion-perception function for Gaussian processes

Authors: Serra, Giuseppe; Stavrou, Photios A.; Kountouris, Marios
Publication date: 2025

EURECOM Repository

Learning Video Retrieval Models with Relevance-Aware Online Mining

Authors: Falcon, Alex; Serra, Giuseppe; Lanz, Oswald
Publication date: 01/01/2022

Due to the number of videos and related captions uploaded every hour, deep learning-based solutions for cross-modal video retrieval are attracting more and more attention. A typical approach consists of learning a joint text-video embedding space, where the similarity of a video and its associated caption is maximized, whereas a lower similarity is enforced with all the other captions, called negatives. This approach assumes that only the video and caption pairs in the dataset are valid, but different captions (positives) may also describe the video's visual contents, hence some of them may be wrongly penalized. To address this shortcoming, we propose Relevance-Aware Negatives and Positives mining (RANP) which, based on the semantics of the negatives, improves their selection while also increasing the similarity of other valid positives. We explore the influence of these techniques on two video-text datasets: EPIC-Kitchens-100 and MSR-VTT. By using the proposed techniques, we achieve considerable improvements in terms of nDCG and mAP, leading to state-of-the-art results, e.g. +5.3% nDCG and +3.0% mAP on EPIC-Kitchens-100. We share code and pretrained models at https://github.com/aranciokov/ranp

Archivio della ricerca - Fondazione Bruno Kessler
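
The "typical approach" described in the abstract of "Learning Video Retrieval Models with Relevance-Aware Online Mining" can be sketched as a hinge-based ranking loss over a joint embedding space. This sketch covers only that baseline, in which every other caption in the batch is treated as a negative; it does not implement RANP. The margin value, the use of cosine similarity, and the function names are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranking_loss(video_embs, caption_embs, margin=0.2):
    """Mean hinge loss over all (video i, caption j != i) pairs.

    caption_embs[i] is the caption paired with video_embs[i]; every other
    caption in the batch is treated as a negative, which is exactly the
    assumption the abstract criticizes.
    """
    total, count = 0.0, 0
    for i, video in enumerate(video_embs):
        pos_sim = cosine(video, caption_embs[i])
        for j, caption in enumerate(caption_embs):
            if j == i:
                continue
            # Penalize negatives that come within `margin` of the paired caption.
            total += max(0.0, margin + cosine(video, caption) - pos_sim)
            count += 1
    return total / count

if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # captions match their videos
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # captions match the wrong videos
    print(ranking_loss(videos, aligned), ranking_loss(videos, swapped))
```

With this loss, a caption that also describes the video correctly but is not its dataset pair is still pushed away, which is the shortcoming the abstract sets out to address.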

Computation of the multivariate Gaussian rate-distortion-perception function

Authors: Serra, Giuseppe; Stavrou, Photios A.; Kountouris, Marios
Publication date: 2024

EURECOM Repository

Wearable Vision for Retrieving Architectural Details in Augmented Tourist Experiences

Authors: Serra, Giuseppe; Cucchiara, Rita; Alletto, Stefano
Publication date: 01/01/2015

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Copula-based estimation of continuous sources for a class of constrained rate-distortion-functions

Authors: Serra, Giuseppe; Stavrou, Photios A.; Kountouris, Marios
Publication date: 2024

EURECOM Repository