
    Image and video compression

    Prediction and Optimization of Speech Intelligibility in Adverse Conditions

    In digital speech-communication systems such as mobile phones, public address systems and hearing aids, conveying the message is one of the most important goals. This can be challenging since the intelligibility of the speech may be harmed at various stages before, during and after the transmission process from sender to receiver. Causes of such adverse conditions include background noise, an unreliable internet connection during a Skype conversation or a hearing impairment of the receiver. To overcome this, many speech-communication systems include speech processing algorithms, such as noise reduction, to compensate for these signal degradations. To determine the effect of these signal-processing-based solutions on speech intelligibility, the speech signal has to be evaluated by means of a listening test with human listeners. However, such tests are costly and time consuming. As an alternative, reliable and fast machine-driven intelligibility predictors are of interest, since they might replace listening tests, at least in some stages of the algorithm development process. Two important issues exist with current intelligibility predictors. (1) Many of these methods cannot reliably predict the effect of more advanced nonlinear signal processing algorithms on speech intelligibility. (2) Typically, these measures are based on very complex auditory models or use average statistics of minutes of running speech, which makes it difficult to design new (real-time) speech processing solutions in an optimal manner given such a measure. To this end we propose several new measures which predict the intelligibility of nonlinearly processed speech well. The newly proposed measures have low computational complexity and are mathematically tractable, which makes them suitable for optimizing new signal processing solutions that aim to improve speech intelligibility.
    Mediamatics
    Electrical Engineering, Mathematics and Computer Science

    Resilient video coding for wireless and peer-to-peer networks

    Electrical Engineering, Mathematics and Computer Science

    Quantization-based watermarking: Methods for amplitude scale estimation, security, and linear filtering invariance

    Electrical Engineering, Mathematics and Computer Science

    Modeling Audio Fingerprints: Structure, Distortion, Capacity

    An audio fingerprint is a compact low-level representation of a multimedia signal. An audio fingerprint can be used to identify audio files or fragments in a reliable way. The use of audio fingerprints for identification consists of two phases. In the enrollment phase, known content is fingerprinted and ingested into a database, together with all relevant metadata. In the identification phase, unknown audio content is fingerprinted, and the fingerprints form the query to the database. The query fingerprint is compared to the fingerprints in the database. If a similar fingerprint is found in the database, the relevant metadata corresponding to the fingerprint is returned. In this thesis we develop models for audio fingerprints. The emphasis here is on fingerprint extraction and the properties of the fingerprint, not on matching the query fingerprint to the fingerprints in the database, and the actual identification. We also do not develop new practical fingerprinting algorithms. There is a wide variety of applications for audio fingerprinting, including broadcast monitoring, audience measurement, forensic applications, blacklisting of unauthorized content, 'name that tune' services and linking of special offers to television or radio commercials. Content which uses the same recorded source material, but which is in a different representation, or distorted in different ways, will generate similar audio fingerprints. This distinguishes audio fingerprints from hashes and content-based retrieval. The hash of an audio file changes when one sample changes. Two perceptually equal audio items can have completely different hash values, but will generate similar fingerprints. Content-based retrieval looks for audio items which apply to a similar concept, like the same genre, artist or style, while fingerprinting looks for the reuse of the recorded content. Of course, the exact requirements for a fingerprinting system strongly depend on the application.
    Relevant aspects for the topics discussed in this thesis are the robustness, uniqueness, accuracy (notably the False Acceptance Rate and False Rejection Rate), granularity and the size of the fingerprints. In this thesis we make three contributions in the form of models. First, we model the structure of a particular type of audio fingerprint, the Philips Robust Hash (PRH). The PRH fingerprint extracts a series of spectral energy related features from the audio signal, which are represented efficiently but coarsely as a binary time-series. The time-series captures the temporal and spectral dynamics of the audio signal, and has a very particular structure mainly depending on a limited number of parameters in the fingerprint extraction. The model describes the structure of the PRH as a function of a number of parameters. It can be used for better understanding and potentially optimizing the fingerprinting system. We experimentally verify the model on synthetic Gaussian iid data, and conclude that the model captures the structure of the PRH fingerprint well. This analysis was reformulated and extended by Balado, Hurley, McCarthy and Silvestre. Second, we observe that distortions in the audio are reflected in changes in the corresponding fingerprint. This kind of distortion affects the quality of the audio signal and changes the resulting fingerprint. The idea is to estimate the amount of distortion on the audio signal by comparing the corresponding fingerprint to a reference fingerprint extracted from a high quality copy of the same audio. In this way one could extend the functionality of a fingerprinting system. We implement and compare the behaviour of a number of algorithms from literature, and observe similar behaviour of the distance between corresponding fingerprints due to compression. We model the effect of particular distortions in the audio due to compression or additive white noise on the difference introduced in the PRH fingerprints.
    The main result of our modeling effort is a closed form relation between Signal-to-Noise Ratio (SNR) and average fingerprint distance for PRH audio fingerprints of independent identically distributed (iid) signals. We also experimentally verify the developed models. The model fits perfectly for synthetic signals, and captures the behavior observed in a wider variety of fingerprinting algorithms on actual music. Third, we consider an information theoretical framework developed by Westover and O'Sullivan (WOS). The main question is 'how many signals can be identified by a fingerprinting system, under certain conditions'. The conditions relate to characteristics of the fingerprint (size of the fingerprint, and representation of the fingerprint), and characteristics of the environment in which the system operates (representation and statistical characteristics of the signals that need to be identified, how much distortion is allowed). We use the results of the model developed for the PRH fingerprint to estimate up to how many signals can be identified with a binary fingerprint like the PRH. Finally, we check whether the changes in the fingerprints we observe in practice due to distortions in the audio signals, and which have been modeled in this thesis, fit in the information theoretical framework of the WOS model. We outline the differences in the WOS model compared to practical implementations. We finish with a list of recommendations on extending the models to jointly consider distortion and uniqueness characteristics; to take more distortion types into account, and to extend to images and video; to develop an evaluation framework for audio fingerprinting; to integrate psycho-acoustics; and to develop a theoretical framework for comparing specific algorithms to the capacity bound.
    Mediamatics
    Electrical Engineering, Mathematics and Computer Science
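
The contrast the abstract draws between hashes and fingerprints can be illustrated with a small sketch. The fingerprint below is a hypothetical one-bit-per-frame energy feature chosen for brevity; it is not the Philips Robust Hash, and the names `toy_fingerprint`, `bit_error_rate` and the frame length of 64 samples are assumptions made for this example only. It shows that a cryptographic hash changes when a single sample changes, while the binary fingerprint of a distorted copy stays close to the original in bit error rate.

```python
import hashlib
import random
import struct


def frame_energies(samples, frame_len=64):
    """Energy of consecutive non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def toy_fingerprint(samples, frame_len=64):
    """One bit per frame transition: 1 if the frame energy rises, else 0.
    Illustrative only; far simpler than any fingerprint from the literature."""
    e = frame_energies(samples, frame_len)
    return [1 if nxt > cur else 0 for cur, nxt in zip(e, e[1:])]


def bit_error_rate(fp_a, fp_b):
    """Fraction of positions where two equal-length fingerprints differ."""
    return sum(a != b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def sha256_hex(samples):
    """Cryptographic hash of the raw sample values."""
    return hashlib.sha256(struct.pack(f"{len(samples)}d", *samples)).hexdigest()


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(64 * 257)]   # synthetic Gaussian iid signal
noisy = [x + rng.gauss(0.0, 0.1) for x in clean]         # additive white noise, about 20 dB SNR
unrelated = [rng.gauss(0.0, 1.0) for _ in range(len(clean))]
one_sample_changed = list(clean)
one_sample_changed[1000] += 1e-3                         # alter a single sample

fp_clean = toy_fingerprint(clean)
ber_noisy = bit_error_rate(fp_clean, toy_fingerprint(noisy))
ber_unrelated = bit_error_rate(fp_clean, toy_fingerprint(unrelated))
ber_one_sample = bit_error_rate(fp_clean, toy_fingerprint(one_sample_changed))
hashes_differ = sha256_hex(clean) != sha256_hex(one_sample_changed)

print("hash changed by one sample:", hashes_differ)
print("BER, one sample changed:   ", ber_one_sample)
print("BER, noisy copy:           ", ber_noisy)
print("BER, unrelated signal:     ", ber_unrelated)
```

Changing one sample can affect at most one frame energy, and therefore at most two fingerprint bits, whereas the hash value changes completely. The bit error rate for unrelated content is expected to be near one half.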

    Private Computing with Untrustworthy Proxies

    The objective of this thesis is to preserve privacy for the user while untrustworthy proxies are involved in the communication and computation, i.e. private computing. A basic example of private computing is an access control system (proxy) which grants access (or not) to users based on fingerprints. For privacy reasons the user does not want to reveal his fingerprint to the system, since he does not trust the system to store his fingerprint securely. The system uses a mechanism to compare a new fingerprint with previously collected fingerprints, in order to verify the identity of the user. The challenge is that fingerprints, even if they are from the same user, are never exactly equal like passwords are. This makes fingerprints hard to compare, especially when the system should not learn anything from these fingerprints other than whether they are equal or not. This thesis addresses two problems within private computing. First, the problem of letting an untrustworthy proxy collect private information from various sources is investigated. The challenge is to let the untrustworthy proxy perform the collection of the selected information, while guaranteeing confidentiality of the inputs and outputs. Second, the problem of letting an untrustworthy proxy compare the collected private information is addressed. The challenge is to let the untrustworthy proxy compute a comparison function without being able to learn the actual inputs, but being allowed to learn the outcome of the function. The problem is similar to the Millionaires' problem known from Multi-Party Computation; however, in the private computing case the untrustworthy proxy learns the outcome of the computation without having to inform the users. For the selection and collection problem two approaches are addressed. First, the parallel selection and collection approach is considered, whereby an untrustworthy proxy collects information simultaneously from various sources without losing the users' privacy.
    The problem is presented within a location-based services (LBS) scenario with the goal of protecting private location data. The solution is based on two distinct oblivious transfers and the use of homomorphic encryption. Second, the sequential selection and collection approach is considered, where information is collected from various sources based on a fixed itinerary before returning with the results to the proxy. The solution is provided using threshold signature schemes and hash chaining. Furthermore, a mechanism is constructed which ensures that the itinerary is completed even if one of the sources is unavailable. Two approaches are addressed for the comparison problem. First, a single comparison is undertaken, where the untrustworthy proxy computes one inequality function. The solution is to use a bit-wise comparison protocol and reconstruct it in such a way that the proxy leaks one bit of information (the result of the comparison) but nothing else. The reconstruction of the protocol is based on multiple homomorphic encryptions and decryptions using ElGamal. Finally, the multiple comparison problem is addressed, which can be applied to the fingerprint matching problem as described above. The challenge is to let an untrustworthy proxy compare multiple inequality functions, learning only whether all of the functions satisfied the comparison conditions or some failed, while letting the proxy remain oblivious to which conditions failed. The output of the function also only leaks one bit of information. The solution is based on the same bit-wise comparison protocol as the single comparison, but it is reconstructed using a different homomorphic encryption scheme and by extending the hiding function used for comparison. This thesis demonstrates that private computing protocols can be designed to protect the privacy of the users while providing functionality in the cryptographic domain.
    Moreover, the presented protocols can also be applied within other applications where untrustworthy proxies are unavoidable.
    Multimedia Signal Processing Group
    Electrical Engineering, Mathematics and Computer Science
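
The abstract relies on homomorphic encryption with ElGamal but does not spell out the protocols. As a minimal sketch of the underlying property only, and not of the thesis's comparison protocol, the following shows that textbook ElGamal is multiplicatively homomorphic: multiplying two ciphertexts component-wise yields an encryption of the product of the plaintexts. The modulus `P`, base `G` and all function names are assumptions chosen for illustration; the parameters are toy-sized and not secure.

```python
import secrets

# Toy parameters, assumed for illustration only (not secure).
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # base element; correctness of decryption does not depend on its order


def keygen():
    """Return (secret key x, public key h = G^x mod P)."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)


def encrypt(h, m):
    """Textbook ElGamal encryption of m in [1, P-1] under public key h."""
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P


def decrypt(x, ct):
    """Recover m = c2 * c1^(-x) mod P, using Fermat's little theorem for the inverse."""
    c1, c2 = ct
    return (c2 * pow(c1, P - 1 - x, P)) % P


def multiply(ct_a, ct_b):
    """Component-wise product of ciphertexts: an encryption of the product of plaintexts."""
    return (ct_a[0] * ct_b[0]) % P, (ct_a[1] * ct_b[1]) % P


secret_key, public_key = keygen()
product_ct = multiply(encrypt(public_key, 6), encrypt(public_key, 7))
recovered = decrypt(secret_key, product_ct)
print(recovered)  # prints 42
```

A party holding only the public key can thus combine encrypted values without learning them, which is the kind of operation an untrustworthy proxy is allowed to perform in such a setting.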

    Object-based Video Segmentation with Region Labeling

    Electrical Engineering, Mathematics and Computer Science

    Geometric Distortion in Image and Video Watermarking. Robustness and Perceptual Quality Impact

    The main focus of this thesis is the problem of geometric distortion in image and video watermarking. We discuss two aspects of the geometric distortion problem, namely watermark desynchronization and perceptual quality assessment. Furthermore, this thesis also discusses the challenges of watermarking data compressed at low bit-rates. The main contributions of this thesis are:
    • A watermarking algorithm suitable for low bit-rate video has been proposed.
    • Two different approaches have been proposed to deal with the watermark desynchronization problem.
    • A novel approach has been proposed to quantify the perceptual quality impact of geometric distortion.
    Electrical Engineering, Mathematics and Computer Science

    Distributed Video Coding (DVC): Motion estimation and DCT quantization in low complexity video compression

    The main focus of video encoding in the past twenty years has been on video broadcasting. A video is captured and encoded by professional equipment and then watched on various consumer devices. Consequently, the focus was to increase the quality and to keep down the decoder complexity. In more recent years we observe a shift in user behavior, from solely consuming video to also producing and sharing video. As opposed to professional cameras, such constrained media devices are limited by the encoder complexity. This thesis addresses Distributed Video Coding (DVC) as a possible solution for very low complexity video encoding. Straightforward intra coding techniques at the encoder are combined with exploiting motion information at the decoder side. In particular, the thesis focuses on the problems that typically emerge when exploiting temporal correlation solely at the decoder. The thesis covers performance limitations of different DVC aspects, namely channel coding, motion estimation at the decoder and quantization. All proposed schemes focus on allowing real-time encoding. In channel coding, we investigate decoder-based modeling. In motion estimation at the decoder, we focus on true motion-based extrapolation. In quantization, we propose a trade-off between adaptivity and overhead. Finally, we compare the derived solutions for each DVC aspect with its counterpart in conventional video coding. We find that DVC can outperform intra coding with a similar encoder complexity. However, for a less constrained encoder complexity, conventional inter coding outperforms DVC by a large margin.
    TU Delft Mediamatica
    Electrical Engineering, Mathematics and Computer Science