
    Exploring the Synthetic Speech Attribution Problem Through Data-Driven Detectors

    In recent years, numerous techniques to manipulate multimedia data and generate hyper-realistic synthetic content have been presented. Such inauthentic data are hazardous, as they can pose serious threats when misused. This has led the forensic community to propose multiple approaches to tackle both detection and attribution problems. Solving the detection problem consists in determining whether some given data is genuine or fake. Solving the attribution problem consists in determining which specific technique has been used to manipulate or generate the observed data. In this paper we address the attribution problem on synthetic speech. We consider a set of methods initially proposed for synthetic speech detection, and adapt them to identify which speech generation algorithm has been used to synthesize a speech track. Our goal is to probe the versatility of these systems and verify how far the detection and attribution tasks are from each other. We test the models in a closed-set scenario and compare their performance with that of a well-established baseline. Moreover, we propose different solutions to address the task in an open-set situation. The encouraging results show that the considered methods can provide a representation of the input signal that is meaningful for both detection and attribution.

    All-for-One and One-for-All: Deep Learning-Based Feature Fusion for Synthetic Speech Detection

    Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever, leading to possible threats and dangers from malicious users. In the audio field, we are witnessing the growth of speech deepfake generation techniques, which call for the development of synthetic speech detection algorithms to counter possible malicious uses such as fraud or identity theft. In this paper, we consider three different feature sets proposed in the literature for the synthetic speech detection task and present a model that fuses them, achieving better overall performance than state-of-the-art solutions. The system was tested on different scenarios and datasets to prove its robustness to anti-forensic attacks and its generalization capabilities.

    Time Scaling Detection and Estimation in Audio Recordings

    The widespread diffusion of user-friendly editing software for audio signals has made audio tampering accessible to anyone. Therefore, it is increasingly necessary to develop forensic methodologies aimed at verifying whether a given audio content has been digitally manipulated. Among the many available audio editing techniques, a very common one is time scaling, i.e., altering the temporal evolution of an audio signal without affecting any pitch component. For instance, this can be used to slow down or speed up speech recordings, thus enabling the creation of natural-sounding fake speech compositions. In this work, we propose to blindly detect and estimate the time scaling applied to an audio signal. To expose time scaling, we leverage a Convolutional Neural Network that analyzes the Log-Mel Spectrogram and the phase of the Short Time Fourier Transform of the input audio signal. The proposed technique is tested on different audio datasets, considering various time scaling implementations and challenging cross-test scenarios.
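
As a rough illustration of the kind of input described above, the following minimal sketch computes a log-magnitude spectrogram and the STFT phase of a signal with NumPy. It is not the authors' pipeline: the frame length, hop size, Hann window, function names, and the test tone are all assumptions chosen for illustration, and the mel filterbank projection that a Log-Mel Spectrogram would require is omitted.

```python
import numpy as np

def stft(signal, frame_len=512, hop=128):
    """Complex STFT of a 1-D signal, shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

def spectral_features(signal, frame_len=512, hop=128, eps=1e-10):
    """Stack log-magnitude and phase into a 2-channel array.

    A mel filterbank would be applied to the magnitude to obtain a
    Log-Mel Spectrogram; that step is left out of this sketch.
    """
    spec = stft(signal, frame_len, hop)
    log_mag = np.log(np.abs(spec) + eps)   # eps avoids log(0)
    phase = np.angle(spec)                 # values in [-pi, pi]
    return np.stack([log_mag, phase])

# Hypothetical test signal: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
features = spectral_features(tone)
```

The two channels of `features` share the same time-frequency grid, so they could be fed to a convolutional network as a two-channel image; whether the original work combines them this way is not stated in the abstract.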

    Multimodal Violence Detection in Videos

    Effective tools for detecting violence are in high demand, especially when dealing with video streams. Such tools have a wide range of applications, from forensics and law enforcement to parental control over the ever-increasing amount of videos available online. Prior studies showed that deep learning has great potential in detecting violence, but they focus on detecting violence in general or on only specific cases of violent behavior. While the concept of violence is broad and highly subjective, simpler concepts such as fights, explosions, and gunshots convey the idea of violence while being more objective. Even though these concepts relate to the same broader idea of violence, they differ widely in whether they convey the idea of movement, involve the presence of a specific object, or generate distinctive sounds. In this study, we propose to analyze different concepts related to violence and how to better describe them by exploring visual and auditory cues, in order to reach a robust method to detect violence.

    Identification and recognition of landmine internal structure scattering contribution from GPR data

    The aim of the study was to quantify the potential increase in the information level produced by an increase in the data dimensionality, i.e., from analysing a 1D signature to investigating a 3D GPR volume. The experimental campaign was carried out using two different neutralised landmines, characterised by different internal structures and buried in controlled conditions. The acquisition of a single one-dimensional signature of the target has the advantage of being almost effortless, but shows significant limitations in achieving adequate performance, in particular for landmines with an irregular internal structure. This is a consequence of the impossibility of effectively separating the different scattering contributions. Likewise, despite producing a clearer and more intuitive image of the target, a single 2D profile is not able to provide reliable performance, so there is little benefit in acquiring one, as its results remain ambiguous. The analysis of a 3D volume, instead, allows for an accurate delineation of the internal structure of the target, providing a reliable solution to the critical issue of complex target design.