1,720,965 research outputs found
Spectral relaxations and fair densest subgraphs
Reducing hidden bias in the data and ensuring fairness in algorithmic data analysis has recently received significant attention. In this paper, we address the problem of identifying a densest subgraph, while ensuring that none of one binary protected attribute is disparately impacted. Unfortunately, the underlying algorithmic problem is NP-hard, even in its approximation version: approximating the densest fair subgraph with a polynomial-time algorithm is at least as hard as the densest subgraph problem of at most k vertices, for which no constant approximation algorithms are known. Despite such negative premises, we are able to provide approximation results in two important cases. In particular, we are able to prove that a suitable spectral embedding allows recovery of an almost optimal, fair, dense subgraph hidden in the input data, whenever one is present, a result that is further supported by experimental evidence. We also show a polynomial-time, -approximation algorithm, whenever the underlying graph is itself fair. We finally prove that, under the small set expansion hypothesis, this result is tight for fair graphs. The above theoretical findings drive the design of heuristics, which we experimentally evaluate on a scenario based on real data, in which our aim is to strike a good balance between diversity and highly correlated items from Amazon co-purchasing graphs
Polynomial Time Approximation Schemes for All 1-Center Problems on Metric Rational Set Similarities
In this paper, we investigate algorithms for finding centers of a given collection N of sets. In particular, we focus on metric rational set similarities, a broad class of similarity measures including Jaccard and Hamming. A rational set similarity S is called metric if D= 1 - S is a distance function. We study the 1-center problem on these metric spaces. The problem consists of finding a set C that minimizes the maximum distance of C to any set of N. We present a general framework that computes a (1 + ε) approximation for any metric rational set similarity
Commitment and Slack for Online Load Maximization
We consider a basic admission control problem in which jobs with deadlines arrive online and our goal is to maximize the total volume of executed job processing times. We assume that the deadlines have a slack of at least ε, that is, each deadline d satisfies d≥ (1+ε)· p+r with processing time p and release date r. In addition, we require the admission policy to support immediate commitment, that is, upon a job's submission, we must immediately make the decision of if and where we schedule the job, and this decision is irreversible. Our main contribution is a deterministic algorithm with nearly optimal competitive ratio for load maximization on multiple machines in the non-preemptive model. Previous results either only held for a single machine, did not support commitment, or required job preemption and migration
(1 + ε)-approximate incremental matching in constant deterministic amortized time
We study the matching problem in the incremental setting, where we are given a sequence of edge insertions and aim at maintaining a near-maximum cardinality matching of the graph with small update time. We present a deterministic algorithm that, for any constant ε > 0, maintains a (1 + ε)-approximate matching with constant amortized update time per insertion
Algorithms for fair k-clustering with multiple protected attributes
We study fair center based clustering problems. In an influential paper, Chierichetti, Kumar, Lattanzi and Vassilvitskii (NIPS 2017) consider the problem of finding a good clustering, say of women and men, such that every cluster contains an equal number of women and men. They were able to obtain a constant factor approximation for this problem for most center based k-clustering objectives such as k-median, k-means, and k-center. Despite considerable interest in extending this problem for multiple protected attributes (e.g. women and men, with or without citizenship), so far constant factor approximations for these problems have remained elusive except in special cases. We settle this question in the affirmative by giving the first constant factor approximation for a wide range of center based k-clustering objectives
Random projection to preserve patient privacy
With the availability of accessible and widely used cloud services, it is natural that large components of healthcare systems migrate to them; for example, patient databases can be stored and processed in the cloud. Such cloud services provide enhanced flexibility and additional gains, such as availability, ease of data share, and so on. This trend poses serious threats regarding the privacy of the patients and the trust that an individual must put into the healthcare system itself. Thus, there is a strong need of privacy preservation, achieved through a variety of different approaches. In this paper, we study the application of a random projection-based approach to patient data as a means to achieve two goals: (1) provably mask the identity of users under some adversarial-attack settings, (2) preserve enough information to allow for aggregate data analysis and application of machine-learning techniques. As far as we know, such approaches have not been applied and tested on medical data. We analyze the tradeoff between the loss of accuracy on the outcome of machine-learning algorithms and the resilience against an adversary. We show that random projections proved to be strong against known input/output attacks while offering high quality data, as long as the projected space is smaller than the original space, and as long as the amount of leaked data available to the adversary is limited
Algorithms for fair team formation in online labour marketplaces
As freelancing work keeps on growing almost everywhere due to a sharp decrease in communication costs and to the widespread of Internet-based labour marketplaces (e.g., guru.com, feelancer.com, mturk.com, upwork.com), many researchers and practitioners have started exploring the benefits of outsourcing and crowdsourcing [13, 14, 16, 23, 25, 29]. Since employers often use these platforms to find a group of workers to complete a specific task, researchers have focused their efforts on the study of team formation and matching algorithms and on the design of effective incentive schemes [2-4, 17]. Nevertheless, just recently, several concerns have been raised on possibly unfair biases introduced through the algorithms used to carry out these selection and matching procedures. For this reason, researchers have started studying the fairness of algorithms related to these online marketplaces [8, 19], looking for intelligent ways to overcome the algorithmic bias that frequently arises. Broadly speaking, the aim is to guarantee that, for example, the process of hiring workers through the use of machine learning and algorithmic data analysis tools does not discriminate, even unintentionally, on grounds of nationality or gender. In this short paper, we define the Fair Team Formation problem in the following way: given an online labour marketplace where each worker possesses one or more skills, and where all workers are divided into two or more not overlapping classes (for examples, men and women), we want to design an algorithm that is able to find a team with all the skills needed to complete a given task, and that has the same number of people from all classes. We provide inapproximability results for the Fair Team Formation problem together with four algorithms for the problem itself. We also tested the effectiveness of our algorithmic solutions by performing experiments using real data from an online labor marketplace
Structural Results on Matching Estimation with Applications to Streaming
We study the problem of estimating the size of a matching when the graph is revealed in a streaming fashion. Our results are multifold:1.We give a tight structural result relating the size of a maximum matching to the arboricityα of a graph, which has been one of the most studied graph parameters for matching algorithms in data streams. One of the implications is an algorithm that estimates the matching size up to a factor of (α+ 2) (1 + ε) using O~ (αn2 / 3) space in insertion-only graph streams and O~ (αn4 / 5) space in dynamic streams, where n is the number of nodes in the graph. We also show that in the vertex arrival insertion-only model, an (α+ 2) approximation can be achieved using only O(log n) space.2.We further show that the weight of a maximum weighted matching can be efficiently estimated by augmenting any routine for estimating the size of an unweighted matching. Namely, given an algorithm for computing a λ-approximation in the unweighted case, we obtain a 2 (1 + ε) · λ approximation for the weighted case, while only incurring a multiplicative logarithmic factor in the space bounds. The algorithm is implementable in any streaming model, including dynamic streams.3.We also investigate algebraic aspects of computing matchings in data streams, by proposing new algorithms and lower bounds based on analyzing the rank of the Tutte-matrix of the graph. In particular, we present an algorithm determining whether there exists a matching of size k using O(k2log n) space.4.We also show a lower bound of Ω (n1-ε) space for small approximation factors to the maximum matching size in insertion-only streams. This lower bound also holds for approximating the rank of a matrix
Fair Coresets and Streaming Algorithms for Fair k-means
We study fair clustering problems as proposed by Chierichetti et al. [CKLV17]. Here, points have a sensitive attribute and all clusters in the solution are required to be balanced with respect to it (to counteract any form of data-inherent bias). Previous algorithms for fair clustering do not scale well. We show how to model and compute so-called coresets for fair clustering problems, which can be used to significantly reduce the input data size. We prove that the coresets are composable [IMMM14] and show how to compute them in a streaming setting. This yields a streaming PTAS for fair k-means in the case of two colors (and exact balances). Furthermore, we extend techniques due to Chierichetti et al. [CKLV17] to obtain an approximation algorithm for k-means, which leads to a constant factor algorithm in the streaming model when combined with the coreset
- …
