Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

Author: Chris Schwiegelshohn
Alexander Munteanu
Publication date: 19/12/2017

We present a technical survey on state-of-the-art approaches to data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching, and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview of lower-bounding techniques.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Digital Library of Gesellschaft für Informatik e.V.

On Finding the Jaccard Center

Author: Chris Schwiegelshohn
Marc Bury
Publication date: 01/01/2017

We initiate the study of finding the Jaccard center of a given collection N of sets. For two sets X, Y, the Jaccard index is defined as $|X \cap Y| / |X \cup Y|$ and the corresponding distance is $1 - |X \cap Y| / |X \cup Y|$. The Jaccard center is a set C minimizing the maximum distance to any set of N. We show that the problem is NP-hard to solve exactly, and that it admits a PTAS while no FPTAS can exist unless P = NP. Furthermore, we show that the problem is fixed-parameter tractable in the maximum Hamming norm between the Jaccard center and any input set. Our algorithms are based on a compression technique similar in spirit to coresets for the Euclidean 1-center problem. In addition, we show that, in contrast to the median problem previously studied by Chierichetti et al. (SODA 2010), the continuous version of the Jaccard center problem admits a simple polynomial-time algorithm.
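
The definitions in this abstract can be made concrete with a short sketch. It only evaluates the Jaccard index, the Jaccard distance, and the objective that a Jaccard center minimizes; it is not the algorithm from the paper. The function names and the convention that two empty sets have index 1 are assumptions made for illustration.

```python
def jaccard_index(x, y):
    """Return |X ∩ Y| / |X ∪ Y| for two finite collections treated as sets."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumption: two empty sets are treated as identical (index 1).
        return 1.0
    return len(x & y) / len(union)


def jaccard_distance(x, y):
    """Return the Jaccard distance 1 - |X ∩ Y| / |X ∪ Y|."""
    return 1.0 - jaccard_index(x, y)


def center_objective(center, collection):
    """Maximum Jaccard distance from a candidate center to any set in the collection.

    A Jaccard center is a set minimizing this value; this helper only
    evaluates the objective for one given candidate.
    """
    return max(jaccard_distance(center, s) for s in collection)
```

For example, `center_objective({2, 3}, [{1, 2, 3}, {2, 3, 4}])` evaluates the candidate center `{2, 3}` against a collection of two sets.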

DROPS Dagstuhl Research Online Publication Server

Archivio della ricerca- Università di Roma La Sapienza

Sketch 'Em All: Fast Approximate Similarity Search for Dynamic Data Streams

Author: Chris Schwiegelshohn
Marc Bury
Mara Sorella
Publication date: 01/01/2018

Recommender systems are an integral part of many web applications. With increasingly larger user bases, scalability has become an important issue. Many of the most scalable algorithms with respect to both space and running time are based on locality-sensitive hashing (LSH). However, a significant drawback is that these methods are only able to handle insertions to user profiles and tend to perform poorly when items may be removed. We initiate the study of scalable locality-sensitive hashing for dynamic input. Specifically, using the Jaccard index as the similarity measure, we design (1) a sketching algorithm for similarity estimation via a black-box reduction to l0 norm estimation and (2) a locality-sensitive hashing scheme maintainable in fully dynamic data streams that quickly filters out low-similarity pairs. Our algorithms have little to no overhead in terms of running time compared to previous LSH approaches for the insertion-only case, and drastically outperform previous algorithms in the case of deletions.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

PEPPA: a project for evolutionary predator prey algorithms

Author: Chris Schwiegelshohn
Christiane Küch
Katja Losemann
Hendrik Blom
Publication date: 01/01/2009

The predator-prey model, based on aspects of the natural interplay of predators and prey, has become an alternative method for tackling multi-objective optimization problems. In this process, each predator targets a single objective, and it is expected that the joint influence of all predators affects the prey population in such a way that good solutions survive. This paper describes PEPPA, a modular software framework for designing and analyzing predator-prey models. It allows users to model arbitrary world environments, complex predator behavior, and dynamic prey adaptation. Further, PEPPA provides various tools for modeling, visualization, and parallelization. We explain the architecture and handling of the framework and provide exemplary results on a simple multi-objective benchmark problem.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Fair Projections as a Means Towards Balanced Recommendations

Author: Böhm Matteo
Fazzone Adriano
Leonardi Stefano
Schwiegelshohn Chris
Anagnostopoulos Aris
Becchetti Luca
Menghini Cristina
Publication date: 01/01/2024

The goal of recommender systems is to provide users with suggestions that match their interests, with the eventual goal of increasing their satisfaction, as measured by the number of transactions (clicks, purchases, etc.). Often, this leads to providing recommendations that are of a particular type. For some contexts (e.g., browsing videos for information) this may be undesirable, as it may enforce the creation of filter bubbles. This is because of the existence of underlying bias in the input data of prior user actions. Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. In this paper, we consider both the densest subgraph and the k-clustering problem, two primitives that are being used by some recommender systems. We are given a coloring on the nodes (respectively, the points) and aim to compute a fair solution S, consisting of a subgraph or a clustering, such that none of the colors is disparately impacted by the solution. Unfortunately, introducing fair solutions typically makes these problems substantially more difficult. Unlike the unconstrained densest subgraph problem, which is solvable in polynomial time, the fair densest subgraph problem is NP-hard even to approximate. For k-clustering, the fairness constraints make the problem very similar to capacitated clustering, which is notoriously hard even to approximate. Despite such negative premises, we are able to provide positive results in important use cases. In particular, we are able to prove that a suitable spectral embedding allows recovery of an almost optimal, fair, dense subgraph hidden in the input data, whenever one is present, a result that is further supported by experimental evidence. We also show a polynomial-time 2-approximation algorithm for the fair densest subgraph problem, assuming that there exist only two colors and both colors occur equally often in the graph. This result turns out to be optimal assuming the small set expansion hypothesis. For fair k-clustering, we show that we can recover high-quality fair clusterings effectively and efficiently. For the special case of k-median and k-center, we offer additional, fast and simple approximation algorithms as well as new hardness results. The above theoretical findings drive the design of heuristics, which we experimentally evaluate on a scenario based on real data, in which our aim is to strike a good balance between diversity and highly correlated items from Amazon co-purchasing graphs and Facebook contacts.

Archivio della ricerca- Università di Roma La Sapienza

Solving the Minimum String Cover Problem

Author: Chris Schwiegelshohn
Stefan Canzar
Sven Rahmann
Tobias Marschall
Publication date: 01/01/2012

A string cover C of a set of strings S is a set of substrings from S such that every string in S can be written as a concatenation of the strings in C. Given costs assigned to each substring from S, the Minimum String Cover (MSC) problem asks for a cover of minimum total cost. This NP-hard problem has so far only been approached from a purely theoretical perspective. A previous integer linear programming (ILP) formulation was designed for a special case, in which each string in S must be generated by a (small) constant number of substrings. If this restriction is removed, the ILP has an exponential number of variables, for which we show the pricing problem to be NP-hard. We propose an alternative flow-based ILP formulation of polynomial size, whose structure is particularly favorable for a Lagrangian relaxation approach. By making use of the strong bounds obtained through a repeated shortest path computation in a branch-and-bound manner, we show for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time. We also provide and solve real-world instances derived from the classic text “Alice in Wonderland”. On almost all instances, our Lagrangian relaxation approach outperforms a CPLEX-based implementation by an order of magnitude. Our software is available under the terms of the GNU General Public License.
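
The string cover definition in this abstract can be illustrated with a small checker. This is a sketch of the definition only: it verifies that a candidate set is a valid cover and sums its cost, and it does not attempt the flow-based ILP or the Lagrangian relaxation described above. All function names and the dictionary-based cost input are assumptions made for illustration.

```python
def can_be_written(s, cover):
    """Check whether s is a concatenation of strings from cover (reuse allowed)."""
    # reachable[i] is True when the prefix s[:i] can be written using cover.
    reachable = [True] + [False] * len(s)
    for end in range(1, len(s) + 1):
        for piece in cover:
            start = end - len(piece)
            if piece and start >= 0 and reachable[start] and s[start:end] == piece:
                reachable[end] = True
                break
    return reachable[len(s)]


def is_string_cover(strings, cover):
    """Check the definition: cover consists of substrings of the input strings,
    and every input string is a concatenation of cover elements."""
    only_substrings = all(any(piece in s for s in strings) for piece in cover)
    return only_substrings and all(can_be_written(s, cover) for s in strings)


def cover_cost(cover, cost):
    """Total cost of a cover, given a hypothetical mapping substring -> cost."""
    return sum(cost[piece] for piece in cover)
```

For instance, `is_string_cover(["abc", "bc"], {"a", "bc"})` checks whether the two pieces `a` and `bc` suffice to write both input strings.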

Crossref

CWI's Institutional Repository

Archivio della ricerca- Università di Roma La Sapienza

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
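
The difference between the two counting schemes can be sketched as follows. The data layout (each citing paper as a list of cited works, each cited work as an ordered author list), the function name, and the rule that a pair is counted once per citing paper are assumptions made for illustration; the study's exact counting procedure may differ.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often two authors are cited together by the same paper.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names (hypothetical input format).
    max_authors: 1 gives first-author counting, 5 considers the first
    five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts
```

Calling the function with `max_authors=1` and then with `max_authors=5` on the same input shows which author pairs only become visible once non-first authors are counted.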

E-LIS

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository

Appropriate Similarity Measures for Author Cocitation Analysis

Author: Waltman L.R.
Eck N.J.P. van

We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors' cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.

Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
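
The two named measures, the Pearson correlation and the cosine, can be compared on cocitation profiles with a short sketch. It represents a profile as a plain list of cocitation counts, which is an assumption for illustration, and it does not cover the two further measures the abstract mentions without naming. Returning 0.0 for a zero or constant profile is also an assumed convention, since both measures are undefined in that case.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length profiles of cocitation counts."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: undefined cases are reported as 0.0.
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine(centered_u, centered_v)
```

Because the Pearson correlation centers each profile by its mean, appending authors with whom neither profile has any cocitations (extra zero entries) changes its value, whereas the cosine stays the same.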

Research Papers in Economics