1,721,004 research outputs found

    Locality-Sensitive Hashing of Curves

    We study data structures for storing a set of polygonal curves in R^d such that, given a query curve, we can efficiently retrieve similar curves from the set, where similarity is measured using the discrete Fréchet distance or the dynamic time warping distance. To this end we devise the first locality-sensitive hashing schemes for these distance measures. A major challenge is posed by the fact that these distance measures internally optimize the alignment between the curves. We give solutions for different types of alignments including constrained and unconstrained versions. For unconstrained alignments, we improve over a result by Indyk [SoCG 2002] for short curves. Let n be the number of input curves and let m be the maximum complexity of a curve in the input. In the particular case where m <= (a/(4d)) log n, for some fixed a > 0, our solutions imply an approximate near-neighbor data structure for the discrete Fréchet distance that uses space in O(n^(1+a) log n) and achieves query time in O(n^a log^2 n) and constant approximation factor. Furthermore, our solutions provide a trade-off between approximation quality and computational performance: for any parameter k in [m], we can give a data structure that uses space in O(2^(2k) m^(k-1) n log n + nm), answers queries in O(2^(2k) m^k log n) time and achieves approximation factor in O(m/k).
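
To make the similarity measure concrete, here is a minimal Python sketch of the discrete Fréchet distance, computed with the standard dynamic program over prefix pairs. The second function, `grid_hash`, is an illustrative assumption and not something the abstract describes: it snaps vertices to a shifted grid and collapses consecutive duplicates, which is one natural way to obtain a hash key under which nearby curves tend to collide. The names `discrete_frechet`, `grid_hash`, `cell` and `shift` are hypothetical.

```python
from math import dist, floor


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences p and q."""
    n, m = len(p), len(q)
    # table[i][j] holds the distance restricted to prefixes p[:i+1] and q[:j+1].
    table = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            elif i == 0:
                best_prev = table[0][j - 1]
            elif j == 0:
                best_prev = table[i - 1][0]
            else:
                best_prev = min(table[i - 1][j], table[i][j - 1], table[i - 1][j - 1])
            # The alignment cost is the largest pairwise distance along the way.
            table[i][j] = max(d, best_prev)
    return table[n - 1][m - 1]


def grid_hash(curve, cell, shift):
    """Assumed illustration: snap vertices to a shifted grid, drop repeats."""
    snapped = [
        tuple(floor((x + s) / cell) for x, s in zip(point, shift))
        for point in curve
    ]
    key = [snapped[0]]
    for grid_point in snapped[1:]:
        if grid_point != key[-1]:  # collapse consecutive duplicates
            key.append(grid_point)
    return tuple(key)
```

In a randomized scheme the `shift` would be drawn at random per hash function; it is passed explicitly here so that the example stays deterministic.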

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
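
The two counting schemes compared above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the study's actual procedure: each citing paper is represented as a list of cited works, each cited work as its author list in byline order, and a pair of authors is counted at most once per citing paper. The function name `cocitation_counts`, the parameter `max_authors` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs that are cited together by the same paper.

    max_authors=1 mimics first-author counting; max_authors=5 mimics
    counting the first five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper (an assumption).
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample: two citing papers, each with two cited works.
sample = [
    [["Smith", "Lee"], ["Jones", "Kim"]],
    [["Smith"], ["Kim", "Jones"]],
]
first_author_only = cocitation_counts(sample, max_authors=1)
first_five = cocitation_counts(sample, max_authors=5)
```

In this sketch co-authors of the same cited work also count as co-cited once more than the first author is considered; whether that is desirable is a design choice the abstract does not settle.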

    Constrained Clustering Problems and Parity Games

    Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. We study several clustering objectives. We begin with studying the Euclidean k-center problem. The k-center problem is a classical combinatorial optimization problem which asks to select k centers and assign each input point in a set P to one of the centers, such that the maximum distance of any input point to its assigned center is minimized. The Euclidean k-center problem assumes that the input set P is a subset of a Euclidean space and that each location in the Euclidean space can be chosen as a center. We focus on the special case with k = 1, the smallest enclosing ball problem: given a set of points in m-dimensional Euclidean space, find the smallest sphere enclosing all the points. We combine known results about convex optimization with structural properties of the smallest enclosing ball to create a new algorithm. We show that on instances with rational coefficients our new algorithm computes the exact center of the optimal solution and has a worst-case run time that is polynomial in the size of the input. We use the new algorithm to show that we can solve the Euclidean k-center problem in polynomial time for constant k and dimension m. The general unconstrained clustering problems are mostly very well studied. The k-center problem, for example, allows for elegant 2-approximation algorithms (Gonzalez 1985; Hochbaum and Shmoys 1986). However, the situation becomes significantly more difficult when constraints are added to the problem. We first look at fair clustering. The fairness constraint is motivated by the fact that the general process of computing a clustering may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters, especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a model for fair clustering requiring the representation in each cluster to (approximately) preserve the global fraction of each protected class. Restricting to two protected classes, they developed both a 4-approximation algorithm for the fair k-center problem and an O(t)-approximation algorithm for the fair k-median problem, where t is a parameter for the fairness model. For multiple protected classes, the best known result is a 14-approximation algorithm for fair k-center (Rösner and Schmidt 2018). We extend and improve the known results. Firstly, we give a 5-approximation algorithm for the fair k-center problem with multiple protected classes. Secondly, we propose a relaxed fairness notion under which we can give bicriteria constant-factor approximation algorithms for the fair version of all of the classical clustering objectives (k-center, k-supplier, k-median, k-means and facility location). The latter approximation algorithms are achieved by a framework that takes an arbitrary existing unfair (integral) solution and a fair (fractional) LP solution and combines them into an essentially fair clustering with a weakly supervised rounding scheme. In this way, a fair clustering can be established belatedly, in a situation where for example the centers are already fixed. The second clustering constraint we study is privacy: here, we are asked to only open a center when at least l points will be assigned to it. We raise the question whether a general method can be derived to turn an approximation algorithm for a clustering problem with some constraints into an approximation algorithm that additionally respects privacy. We show how to combine privacy with several other constraints and obtain approximation algorithms for the k-center problem with several combinations of constraints. In this dissertation we also study parity games, two-player games played on a directed graph. We study the case in which one of the two players controls only a small number k of nodes and the other player controls the n-k other nodes of the game. Our main result is a fixed-parameter-tractable algorithm that solves bipartite parity games in time k^{O(sqrt{k})} O(n^3), and general parity games in time (p+k)^{O(sqrt{k})} O(pnm), where p is the number of distinct priorities and m is the number of edges. For all games with k = o(n) this improves the previously fastest algorithm by Jurdziński, Paterson, and Zwick (2008). We also obtain novel kernelization results and an improved deterministic algorithm for parity games on graphs with small average node-degree.
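
As an illustration of the unconstrained baseline mentioned above, the following Python sketch implements the farthest-first traversal commonly attributed to Gonzalez (1985), which yields a 2-approximation for k-center when centers are chosen among the input points. It is a minimal sketch of that classical method only; it does not reproduce the dissertation's exact smallest-enclosing-ball algorithm or any of its fair or private variants. The function name `gonzalez_k_center` and the choice of the first input point as the initial center are assumptions.

```python
from math import dist


def gonzalez_k_center(points, k):
    """Farthest-first traversal: returns (centers, covering radius)."""
    centers = [points[0]]  # arbitrary first center (assumed: first point)
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Pick the point that is currently worst served.
        farthest = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[farthest])
        nearest = [
            min(old, dist(p, points[farthest]))
            for old, p in zip(nearest, points)
        ]
    return centers, max(nearest)


centers, radius = gonzalez_k_center([(0, 0), (1, 0), (10, 0), (11, 0)], 2)
```

Each round costs one pass over the points, so the sketch runs in O(nk) distance evaluations.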

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
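
The abstract does not say why the Pearson correlation is considered unsatisfactory, so the following Python sketch only illustrates one property that is often raised in this discussion and may differ from the authors' own argument: appending authors with whom neither profile is co-cited (joint zeros) changes the Pearson correlation but leaves the cosine untouched. The profiles are made-up numbers and the function names are hypothetical.

```python
from math import sqrt


def cosine(x, y):
    """Cosine of the angle between two co-citation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation, i.e. the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Made-up co-citation profiles of two authors over three other authors.
x, y = [3, 1, 2], [1, 3, 2]
# The same profiles after adding three authors co-cited with neither.
x_padded, y_padded = x + [0, 0, 0], y + [0, 0, 0]

print(cosine(x, y), cosine(x_padded, y_padded))
print(pearson(x, y), pearson(x_padded, y_padded))
```

With these numbers the Pearson correlation moves from a perfectly negative value to a clearly positive one after padding, while the cosine is identical in both cases.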

    On Discrete and Geometric Firefighting

    Wildfires ravaging forests around the globe cost lives, homes and billions in damages every year, which motivates the study of effective firefighting. In the area of theoretical computer science, several different models inspired by firefighting have been established and studied to find efficient firefighting strategies. In Hartnell’s firefighter problem, a fire burns through the vertices in a graph in rounds: in each round, the fire spreads from each burning vertex to all adjacent vertices. A firefighter is tasked with protecting as many vertices of the graph as possible by blocking a single vertex each round. While a plethora of results have been obtained with respect to specific graph classes or variations of the firefighter’s power, the modus operandi of the fire rarely changes. We study a new model generalizing the one used by Hartnell to better simulate the varying speeds at which a fire spreads through different terrain types, by incorporating a fire resistance and energy for each vertex. We present an efficient algorithm to track the fire propagation in a given graph G = (V,E) in time O(|E| · |V|), which is particularly efficient in graphs with bounded vertex degree, where its runtime lies in O(|V| log |V|). We also obtain polynomial-time algorithms for two problems regarding protecting a set of vertices along the boundary of a hexagonal cell graph. We show NP-completeness for an inverted third problem, where the goal is instead to ignite a set of target cells given a set of starter cells. We also examine the unique features of the model and propose a number of new questions utilizing them. In the second part of this thesis we focus on geometric firefighting as introduced by Bressan. In that model, the fire burns a region of the Euclidean plane that grows over time with unit speed, and has to be contained by building barrier curves with some building speed v. In the original problem, the exact necessary building speed to contain the fire is not known, as a gap remains between the best known strategy for a speed of v > 2 and a lower bound of v > 1. The difficulty in closing this gap seems to lie with the following question: to contain a fire, should one build an enclosing barrier at maximum speed, or is it better to invest some time in building extra delaying barriers that will not be part of the final enclosure but can slow the fire down during construction? To get a step closer to the answer to this question, we mainly study a variant of the original problem, where a fire spreading at unit speed according to the L1-metric is to be contained in an open half-plane. Towards that goal, the firefighter is allowed to build one infinite enclosing barrier along the x-axis, and vertical delaying barriers attached to it. We prove that at least a building speed of v > 1.6 is necessary to contain the fire in this variant, while providing a strategy that succeeds for a speed v > 1.8772. We also study some smaller variants of both this and the original problem. In the final part of this thesis, we study the Minimum Enclosing Ball problem in high dimensions: given a set of n points in the d-dimensional Euclidean space, find the ball of minimum radius containing all points. This is a classic clustering problem and has been studied extensively in the past, often together with its generalization, the Euclidean k-center problem. Among the known results are polynomial-time algorithms to obtain optimal solutions for fixed k and d by Megiddo and Welzl, polynomial-time (1+ε)-approximation algorithms for k = 1 but arbitrary d (see Bădoiu and Clarkson or Kumar et al.), as well as, most recently, a polynomial-time algorithm for instances with rational coefficients by Rösner. However, it cannot be solved in polynomial time for general d. We provide a simple gradient-descent based (1+ε)-approximation algorithm that runs in time O(nd · 1/ε) and improves on the similar core-set based approximation algorithms.
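
The round structure of Hartnell's basic model described above can be simulated with a few lines of Python. This is a minimal sketch of the classical model only, not of the thesis's generalized model with fire resistance and energy. It assumes that in every round the firefighter moves first and the fire spreads afterwards, and that the defence strategy is given as a fixed list of vertices; the names `simulate_firefighter`, `adjacency` and `defence_order` are hypothetical.

```python
def simulate_firefighter(adjacency, start, defence_order):
    """Simulate Hartnell's model; return the number of vertices never burned.

    adjacency     -- dict mapping each vertex to a list of its neighbours
    start         -- vertex where the fire breaks out
    defence_order -- vertices the firefighter tries to protect, one per round
    """
    burning = {start}
    protected = set()
    plan = iter(defence_order)
    while True:
        # Assumed order: the firefighter protects one free vertex first ...
        for vertex in plan:
            if vertex not in burning and vertex not in protected:
                protected.add(vertex)
                break
        # ... then the fire spreads to all unprotected neighbours.
        newly_burning = {
            w for u in burning for w in adjacency[u]
        } - burning - protected
        if not newly_burning:
            return len(adjacency) - len(burning)
        burning |= newly_burning


# A path 0 - 1 - 2 - 3 - 4 with the fire starting at one end.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
saved_with_defence = simulate_firefighter(path, 0, [1])
saved_without_defence = simulate_firefighter(path, 0, [])
```

Planned vertices that are already burning or protected are skipped, so a poorly chosen plan simply wastes no round in this sketch.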

    Subtrajectory Clustering, Curve Averaging and the Complexity of Underlying Range Spaces

    In this thesis, we study the clustering of spatial data with a focus on trajectory data. Trajectory data appears in various applications. These range from the recorded positions of moving objects (e.g. animals, humans, vehicles) to the change of measurements over time (e.g. biomarkers, electricity demand, temperature, sea level). A trajectory is usually modeled as a polygonal curve that is derived from the data by linear interpolation between consecutive observations. A clustering area that we are particularly interested in is subtrajectory clustering, which consists of finding recurring patterns in trajectory data. We model subtrajectory clustering as a set cover problem and measure similarity based on the Fréchet distance. Given a polygonal curve with n vertices, the goal is to find the smallest set of center curves of complexity l such that each point on the input curve is part of a subcurve that has Fréchet distance of at most a given Delta to at least one of the center curves. We design bicriteria approximation algorithms for this NP-hard problem. If there exists a solution of size k, then our algorithms find solutions of size O(k l log(k) log(l)) that solve the problem under distance O(Delta). The expected running time and space requirement of our algorithms are polynomial in k, l, n, 1/Delta and the arclength of the input curve. Our approach uses a variation of the multiplicative weight update method on a simplified version of the problem. The second clustering problem that we study is curve averaging: the problem of optimizing the center curve for a fixed set of curves. In particular, we study a widely used heuristic for curve averaging under the dynamic time warping (DTW) distance called the DTW Barycenter Averaging (DBA) algorithm. The algorithm is very similar to the popular k-means algorithm. Given an initial center curve, it alternates between assignment and update steps until convergence. We study the number of iterations that DBA performs until it converges to a locally optimal solution. We assume that DBA is given n polygonal curves of m points in R^d and a parameter k that specifies the length of the average curve to be computed. We conduct experiments that support the general view that DBA converges fast in practice. In contrast, we show that in the worst case the number of iterations can be exponential in k. This gap between practical performance and worst-case analysis suggests that the worst-case behaviour is likely degenerate. To analyze the number of iterations on non-degenerate input, we further study DBA in the model of smoothed analysis. This model is based on bounding the expected number of iterations in the worst case under random perturbation of the input. We achieve a bound that is polynomial in k, n and d, and for constant n/d is also polynomial in m. Additionally, we study the complexity of range spaces underlying clustering problems where ranges are balls that are implicitly given by a center and a radius Delta and include all elements that are at a distance of at most Delta to the center. As distance measures, we consider Hausdorff distance, Fréchet distance and DTW. As centers and elements of the ground set, we consider polygonal curves in R^d and polygonal regions in R^2. To measure the complexity, we bound the VC-dimension and the shattering dimension of the resulting range spaces. Our approach is based on splitting range queries for the considered range spaces into simple predicates that can be determined by sign values of polynomials. This enables us to bound the VC-dimension and shattering dimension based on the number of cells in the arrangement of zero sets of these polynomials.
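
The alternation between assignment and update steps in DBA can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation analyzed in the thesis: the DTW cost of matching two points is taken to be the squared Euclidean distance, the length k of the average curve is fixed by the initial center, and the loop stops as soon as an iteration leaves the center unchanged. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (assumed DTW cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_alignment(center, curve):
    """Return (DTW cost, optimal warping path as index pairs)."""
    k, m = len(center), len(curve)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = sq_dist(center[i - 1], curve[j - 1]) + step
    # Backtrack from the last cell to recover one optimal alignment.
    path, i, j = [], k, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[k][m], path


def dba_iteration(center, curves):
    """One assignment step followed by one update step."""
    assigned = [[] for _ in center]
    for curve in curves:
        _, path = dtw_alignment(center, curve)
        for i, j in path:
            assigned[i].append(curve[j])  # assignment step
    # Update step: every center point becomes the mean of its assigned points.
    return [
        tuple(sum(coords) / len(pts) for coords in zip(*pts))
        for pts in assigned
    ]


def dba(center, curves, max_iterations=100):
    """Iterate until the center stops changing; return (center, iterations)."""
    for iteration in range(1, max_iterations + 1):
        new_center = dba_iteration(center, curves)
        if new_center == center:
            return center, iteration
        center = new_center
    return center, max_iterations


curves = [[(0.0,), (2.0,)], [(0.0,), (4.0,)]]
average, iterations = dba([(0.0,), (1.0,)], curves)
```

Because a warping path touches every index of the center at least once, each center point always has at least one assigned point to average over.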

    Dispelling the Myths Behind First-author Citation Counts

    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore how much author rankings produced by different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    No full text
    Not provided