
    Content-based multimedia retrieval: indexing and diversification

    The demand for efficient systems that facilitate searching in multimedia databases and collections is increasing rapidly. Application domains include criminology, musicology, trademark registration, medicine and image or video retrieval on the web. This thesis discusses content-based retrieval techniques that can be applied to these databases. The most important operation to support is query-by-example: retrieve items that are to some extent similar to a given query. In chapter 2 we study vantage indexing, a generic indexing approach not limited to a specific domain, because it only requires a (preferably metric) way of calculating a distance between the objects. With vantage indexing, objects are no longer compared directly; instead, we compare how closely each of them resembles a set of reference objects: the vantage objects. We present a new way of selecting vantage objects, and demonstrate experimentally the scalability of the approach and the improvement in retrieval performance over existing methods. In many applications, it is possible to represent the objects by graphs. In chapter 3 we propose an indexing strategy for these scenarios. The assumption is that the graph's topology can be used to describe the underlying object. This topology is stored in the Laplacian matrix, and the sorted set of eigenvalues (spectrum) of this matrix is used as an indexing signature. To account for partial similarity, the difference in spectra of many subgraphs is considered as well. Both theory and experimental work support the claim that similarity in Laplacian spectrum predicts similarity between the original objects. The proposed representation outperforms existing methods in various application domains. A possible limitation of the approach presented in chapter 3 is that two graphs with the same topology share the same representation, while the underlying objects may be different.
In chapter 4, we therefore enrich the graph representation by storing additional object properties. We develop a complex-valued analogue of the Laplacian matrix, a Hermitian matrix, and use its eigenvector associated with the second smallest eigenvalue as indexing signature. This eigenvector is known to be informative about the graph, and can be reused to partition the graph into meaningful subgraphs, resulting in fewer subgraphs to compare for partial similarity. We provide a successful instance of this framework within the context of 3D object retrieval. In chapter 5, we claim that good retrieval results are not only relevant to the query; they should also reflect the diversity of the relevant objects that are present in the collection. Given an initial result set for a user query (image search), we propose to cluster the retrieved images based on their visual characteristics and to show the most important objects from each cluster. We first dynamically determine appropriate weights of visual features for a specific query. These weights are used in a dynamic ranking function that is deployed in a clustering technique to obtain a diverse ranking based on cluster representatives. We provide three lightweight and efficient clustering algorithms, and show that the algorithmic output coincides consistently with the results of a user study.
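
The vantage indexing idea from chapter 2 of the abstract above can be illustrated with a minimal sketch. It is not the thesis's method: the function names, the 2D points, the Euclidean distance and the filter-then-verify query are all assumptions chosen for illustration, and the thesis's vantage object selection strategy is not modelled. Each object is stored as its vector of distances to the vantage objects, and the triangle inequality gives a lower bound on the true distance that lets a range query skip most direct comparisons.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical object type for this sketch


def euclidean(a: Point, b: Point) -> float:
    """A metric distance; any metric could be substituted."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def build_index(objects: Sequence[Point], vantage: Sequence[Point],
                dist: Callable[[Point, Point], float]) -> List[List[float]]:
    """Store, for every object, its distances to all vantage objects."""
    return [[dist(o, v) for v in vantage] for o in objects]


def range_query(q: Point, radius: float, objects: Sequence[Point],
                index: List[List[float]], vantage: Sequence[Point],
                dist: Callable[[Point, Point], float]) -> List[Point]:
    """Return all objects within `radius` of q, pruning with the index."""
    q_sig = [dist(q, v) for v in vantage]
    results = []
    for obj, sig in zip(objects, index):
        # For a metric, |d(q, v) - d(o, v)| <= d(q, o) for every vantage
        # object v, so the largest such difference is a safe lower bound.
        lower_bound = max(abs(a - b) for a, b in zip(q_sig, sig))
        if lower_bound > radius:
            continue  # cannot be a match; no direct comparison needed
        if dist(q, obj) <= radius:  # verification step (an assumption here)
            results.append(obj)
    return results


objects = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
vantage = [(0.0, 0.0), (10.0, 10.0)]
index = build_index(objects, vantage, euclidean)
matches = range_query((0.5, 0.0), 1.0, objects, index, vantage, euclidean)
```

Because the lower bound never exceeds the true distance, the pruned query returns the same objects as a brute-force scan. The bound is only valid when the distance is a metric, which is consistent with the abstract's preference for a metric distance.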

    Supervised learning algorithms for visual object categorization

    This thesis presents novel techniques that help image recognition systems better understand image content. More specifically, it looks at the algorithmic aspects and at experimental verification to demonstrate the capability of the proposed algorithms. These techniques aim to improve the three major components that are part of current state-of-the-art image recognition systems. This thesis offers four algorithms implementing different strategies to classify images into the correct categories automatically. A set of images from all categories is selected and labeled, and then serves as training data for learning models that categorize other images. To show the effectiveness of the proposed algorithms, all approaches were validated on several standard datasets, namely PASCAL 2006 and 2007, Caltech-101 and Corel. Each proposed algorithm is explained in detail in a separate chapter of this thesis.

    Affective appraisal of virtual environments

    Interactive navigable 3D visualisations of built and natural environments have become commonplace in design and planning of urban environments and landscapes, and are regarded as potent prototyping and communication tools. In training applications, for instance for fire fighters, virtual environments displayed on desktop monitors or on projection screens are used to represent situations and scenarios that cannot be created in the real world for reasons of safety, cost, and time. A valid simulated environment should induce not only a cognitive, but also an emotional response in the observer equivalent to the response to the real environment. For visualisations, this means that viewers experience the ‘ambience’ of a place as they would in the real environment. In training environments and serious games, the emotional response of the trainee to the virtual scenario should be similar to the response in a real, often stressful, situation. A common assumption is that the highest possible level of photorealism ensures a valid representation. However, the focus in the development of virtual environments is on spatial tasks or other cognitive tasks, not on the affective qualities of an environment. The virtual environments do not contain the required information for that purpose, which can be visual, but also of other sensory modalities (audio, tactile and olfactory). In eight empirical studies (lab experiments and field studies) we examined factors in the content and representation of the virtual environment, and factors related to user characteristics, and their effects on the emotional response of users. We found that when users appraise a virtual environment they do not distinguish between their appraisals of the represented environment, of the representation, and of their pre-existing individual mental representations. The mental representation fills in more information than the user is aware of. 
The user’s emotional state, induced by factors other than the virtual environment, such as cybersickness, may influence the appraisal of the virtual environment. The absence of personal involvement, factors that diminish the perceived (graphics and audio) quality of the 3D environment, and factors that distract the attention of the user attenuate the impact of cues and thereby the intensity of the emotional response. The importance of personal involvement and the context of use of a visualisation or a virtual training for their validity are generally underestimated. We discern three categories of features of real environments that can be used to guide the modeling process: features relating to spatial layout and functionality, to the meaning and function of elements of the environment, and to ambient conditions. We developed a comprehensive framework containing factors (such as features of the environment, representational modifiers and response moderators) that influence and modify the appraisal process in virtual environments, and that can be used for measuring emotional responses in virtual environments. We complete our research with guidelines for the development and use of desktop virtual environments for visualisations and training applications.

    Optimization of polyhedral terrains

    A digital terrain model is a representation of a real-world terrain in a computer. Terrain models play an important role in geographic information systems, where they are used for numerous purposes. For example, a terrain model can be used to simulate rainfall and to predict terrain areas that are prone to flooding. Or it can help to locate the best spot to place a fire lookout tower in a wilderness area, such that the terrain area visible from the tower is as large as possible. One of the main ways to represent a terrain is by a triangulation: some points are sampled from the real terrain, and they are connected by triangles that cover the whole terrain area. This results in a subdivision into triangles. It is well known that when triangulations are used for terrain modeling, several geometric aspects of the triangulation have an important effect. In particular, it is an established fact that long and skinny triangles should be avoided. Moreover, for terrain modeling one cannot ignore that every point has an elevation, so there are other requirements the triangulation must fulfill in order to provide a reasonable model for a terrain. For example, terrains formed by natural processes have few depressions. Depending on the triangulation chosen, the number of resulting depressions can be excessively high. If the terrain is meant to be used for hydrologic simulations (such as rainfall simulation), these spurious pits will cause serious disruptions in the water flow, rendering the model useless. In this case, depressions are an example of artifacts of the triangulation that should be avoided. This thesis presents new automated methods to improve terrain models, by finding triangulations with well-shaped triangles and, at the same time, as few artifacts as possible. The methods proposed can produce a more accurate and reliable representation of terrains.
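
The observation that the number of depressions depends on the triangulation chosen can be illustrated with a small sketch. It is not one of the thesis's methods: the function name, the four-point terrain and the simplification that any vertex strictly lower than all its neighbours counts as a pit (with no special handling of boundary vertices) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Triangle = Tuple[int, int, int]  # indices into the elevation list


def count_local_minima(elevation: Sequence[float],
                       triangles: Sequence[Triangle]) -> int:
    """Count vertices that are strictly lower than all adjacent vertices."""
    neighbours: Dict[int, Set[int]] = defaultdict(set)
    for a, b, c in triangles:
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))
    return sum(
        1
        for v, adjacent in neighbours.items()
        if all(elevation[v] < elevation[u] for u in adjacent)
    )


# Four sample points at the corners of a square, listed in cyclic order.
# Vertices 0 and 2 are low; vertices 1 and 3 are high.
elevation: List[float] = [1.0, 5.0, 2.0, 5.0]

# The same points admit two triangulations, one per diagonal.
diagonal_0_2 = [(0, 1, 2), (0, 2, 3)]  # edge joins the two low vertices
diagonal_1_3 = [(0, 1, 3), (1, 2, 3)]  # edge joins the two high vertices

pits_a = count_local_minima(elevation, diagonal_0_2)
pits_b = count_local_minima(elevation, diagonal_1_3)
```

With the diagonal between the two low vertices, vertex 2 has a lower neighbour and only vertex 0 is a local minimum. With the other diagonal, both low vertices are surrounded by higher ones, so the same sample points yield two pits.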

    Music information retrieval based on tonal harmony

    With the emergence of large-scale digitalisation of music, content-based methods to maintain, structure, and provide access to digital music repositories have become increasingly important. This doctoral dissertation covers a wide range of methods that aim to aid in the organisation of music information. From both a practical and a cognitive point of view, it is logical to structure musical content by defining similarity relations between documents. Consequently, the notion of music similarity has become a fundamental concept within the area of music information retrieval (MIR) research. In this dissertation we study a particular type of music similarity: the similarity of musical harmony. Because both musically trained and untrained listeners have extensive knowledge about music, it is rather unlikely that all information needed for sound similarity judgement can be found in the musical information source alone. Therefore, to be able to place chord sequences in the context of Western tonal harmony, we investigate two approaches towards automatic harmony analysis. Although the first generative grammar-based solution yields good results on a small dataset, it exposed some practical challenges that prevented it from being extended to process larger datasets. Hence, the second harmonic analysis solution exploits state-of-the-art functional programming techniques, like type-level computation and error-correcting parsers, to meet these challenges. This model, named HarmTrace, is fast, flexible, and returns analyses that are in accordance with harmony theory. We evaluate these harmonic annotations, which explain the role of a chord in its tonal context, both qualitatively and quantitatively, and show how they can aid in harmonic similarity estimation and automatic chord transcription. We investigate three novel approaches to harmonic similarity: a geometric, a local alignment, and a common embeddable subtree based approach.
The geometric approach, named TPSD, uses music-theoretically motivated step functions to assess the similarity of two chord sequences; the common embeddable subtree approach estimates harmonic similarity by matching hierarchical harmonic analysis annotations; and the local alignment solution uses context-aware substitution functions to align sequences of chords. For each of these harmonic similarity solutions, the adjustable parameters are discussed and evaluated. For the evaluation a large new chord sequence corpus is assembled consisting of 5028 different chord sequences, some of which describe the same song. The results show that an alignment approach that uses the HarmTrace harmony model performs best in retrieving these similar chord sequences. All proposed similarity measures rely on the availability of sequences of symbolic chord labels. To extend the application domain, we demonstrate how automatic chord transcription from musical audio can be improved by exploiting our model of tonal harmony.
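
The local alignment approach mentioned in the abstract above can be sketched with the classic Smith-Waterman recurrence applied to chord labels. This is a generic illustration, not the dissertation's measure: the function name, the chord labels and the scoring values are assumptions, and a plain match/mismatch score stands in for the context-aware substitution functions the abstract describes.

```python
from typing import List, Sequence


def local_alignment_score(a: Sequence[str], b: Sequence[str],
                          match: int = 2, mismatch: int = -1,
                          gap: int = -1) -> int:
    """Best Smith-Waterman local alignment score between two chord sequences.

    The scoring values are placeholders; a harmony-aware method would
    replace the equality test with a chord substitution function.
    """
    previous: List[int] = [0] * (len(b) + 1)
    best = 0
    for x in a:
        current = [0]
        for j, y in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if x == y else mismatch)
            score = max(
                0,                     # start a fresh local alignment
                diagonal,              # align chord x with chord y
                previous[j] + gap,     # skip chord x
                current[j - 1] + gap,  # skip chord y
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


query = ["C", "F", "G", "C"]
embedded = ["Am", "C", "F", "G", "C", "Dm"]   # contains the query intact
interrupted = ["C", "F", "Am", "G", "C"]      # one inserted chord
unrelated = ["D", "E"]

score_embedded = local_alignment_score(query, embedded)
score_interrupted = local_alignment_score(query, interrupted)
score_unrelated = local_alignment_score(query, unrelated)
```

Only two rows of the dynamic programming table are kept, which is enough when just the score is needed; recovering the aligned chords themselves would require the full table and a traceback.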

    Comparing Harmonic Similarity Measures

    We present an overview of the most recent developments in polyphonic music retrieval and an experiment in which we compare two harmonic similarity measures. In contrast to earlier work, in this paper we specifically focus on the symbolic chord description as the primary musical representation and the similarity between sequences of these descriptions. In the experiment we compare a geometrical and an alignment approach to harmonic similarity, and measure the effects of chord description detail and a priori key information on retrieval performance. For this experiment a large new chord sequence corpus is assembled. The results show that a computationally costly alignment approach significantly outperforms a much faster geometrical approach in most cases, that a priori key information boosts retrieval performance, and that using a triadic chord representation yields significantly better results than using simpler or more complex chord representations.

    Space-Efficient Hidden Surface Removal

    We propose a space-efficient algorithm for hidden surface removal that combines one of the fastest previous algorithms for that problem with techniques based on bit manipulation. Such techniques have been successfully used in other settings, for example to reduce working space for several graph algorithms. However, bit manipulation is not usually employed in geometric algorithms because the standard model of computation (the real RAM) does not support it. For this reason, we first revisit our model of computation to have a reasonable theoretical framework. Under this framework we show how the use of a bit representation for the union of triangles, in combination with rank-select data structures, allows us to implicitly compute the union of n triangles with roughly O(1) bits per union boundary vertex. This allows us to reduce the required working space by a factor of Θ(log n) while maintaining the running time.
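
The rank and select operations that the abstract above relies on can be illustrated with a toy bit vector. This sketch only shows what the two queries compute; it is not the paper's data structure and is not space-efficient, since it stores bits in a Python list. The class name, the block size and the binary-search implementation of select are assumptions for illustration.

```python
from typing import Iterable, List


class BitVector:
    """Toy rank/select structure with per-block prefix counts."""

    def __init__(self, bits: Iterable[int], block: int = 4) -> None:
        self.bits: List[int] = list(bits)
        self.block = block
        # prefix[k] = number of 1 bits in the first k blocks
        self.prefix: List[int] = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, i: int) -> int:
        """Number of 1 bits among bits[0:i]."""
        b = i // self.block
        return self.prefix[b] + sum(self.bits[b * self.block:i])

    def select1(self, k: int) -> int:
        """Index of the k-th 1 bit, counting from k = 1."""
        if k < 1 or k > self.rank1(len(self.bits)):
            raise ValueError("k is out of range")
        lo, hi = 0, len(self.bits) - 1
        while lo < hi:  # smallest index whose inclusive rank reaches k
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo


bv = BitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
ones_before_4 = bv.rank1(4)
fourth_one = bv.select1(4)
```

Rank and select are inverse in the sense that `bv.rank1(bv.select1(k) + 1) == k` for every valid `k`, which is what lets a bit vector act as an implicit index into a compactly stored sequence.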