1,721,334 research outputs found

    StochColor: Stochastic Coloring based Graph Partitioning

    No full text
    Graph partitioning is a classical problem in computer science. Most algorithms consist of heuristic, spectral and stochastic flow based methods. In this paper a novel technique for graph partitioning is presented. The proposed algorithm, called StochColor extracts partitions from the most likely state of a stochastic graph coloring process. Empirical results show that StochColor is comparable to or significantly better than state of the art spectral clustering and stochastic flow based methods, across a variety of applications.Pathak, Nishith; Banerjee, Arindam; Srivastava, Jaideep. (2010). StochColor: Stochastic Coloring based Graph Partitioning. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215829

    Bayesian Co-clustering

    No full text
    In recent years, co-clustering has emerged as a powerful data mining tool that can analyze dyadic data connecting two entities. However, almost all existing co-clustering techniques are partitional, and allow individual rows and columns of a data matrix to belong to only one cluster. Several current applications, such as recommendation systems and market basket analysis, can substantially benefit from a mixed membership of rows and columns. In this paper, we present Bayesian co-clustering (BCC) models, that allow a mixed membership in row and column clusters. BCC maintains separate Dirichlet priors for rows and columns over the mixed membership and assumes each observation to be generated by an exponential family distribution corresponding to its row and column clusters. We propose a fast variational algorithm for inference and parameter estimation. The model is designed to naturally handle sparse matrices as the inference is done only based on the non-missing entries. In addition to finding co-cluster structure in observations, the model outputs a low dimensional co-embedding, and accurately predicts missing values in the original matrix. We demonstrate the efficacy of the model through experiments on both simulated and real data.Shan, Hanhuai; Banerjee, Arindam. (2008). Bayesian Co-clustering. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215765

    Online Quadratically Constrained Convex Optimization with Applications to Risk Adjusted Portfolio Selection

    No full text
    While online convex optimization has emerged as a powerful large scale optimization approach, much of existing literature assumes a simple way to project onto a given feasible set. The assumption is often not true, and the projection step usually becomes the key computational bottleneck. Motivated by applications in risk-adjusted portfolio selection, in this paper we consider online quadratically constrained convex optimization problems, where the feasible set involves intersections of ellipsoids. We show that regret guarantees for the online problem can be achieved by solving a suitable quadratically constrained quadratic program (QCQP) at each step, and present an efficient algorithm for solving QCQPs based on the alternating directions method. We then specialize the general framework to risk adjusted portfolio selection. Through extensive experiments on two real world stock datasets, our proposed algorithm RAMP is shown to significantly outperform existing approaches at any given risk level and match the performance of the best heuristics which do not accommodate risk constraints.Das, Puja; Banerjee, Arindam. (2012). Online Quadratically Constrained Convex Optimization with Applications to Risk Adjusted Portfolio Selection. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215886

    Meta Algorithms for Portfolio Selection

    No full text
    We consider the problem of sequential portfolio selection in the stock market. There are theoretically well grounded algorithms for the problem, such as Universal Portfolio (UP), Exponentiated Gradient (EG) and Online Newton Step (ONS). Such algorithms enjoy the property of being universal, i.e., having low regret with the best constant rebalanced portfolio. However, the practical performance of such popular algorithms is sobering compared to heuristics such as Anticor, which have no theoretical guarantees but perform surprisingly well in practice. Motivated by such discrepancies, in this paper we focus on designing meta algorithms for portfolio selection which can leverage the best of both worlds. Such algorithms work with a pool of base algorithms and use online learning to redistribute wealth among the base algorithms. We develop two meta-algorithms: MA_EG which uses online gradient descent following EG and MA_ONS which uses online Newton step following ONS. If one of the base algorithms is universal, it follows that the meta-algorithm is universal. Through extensive experiments on two real stock market datasets, we show that the meta-algorithms are competitive and often better than the best base algorithm, including heuristics, while maintaining the guarantee of being an universal algorithm.Das, Puja; Banerjee, Arindam. (2010). Meta Algorithms for Portfolio Selection. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215840

    A Unified View of Graph-based Semi-Supervised Learning: Label Propagation, Graph-Cuts, and Embeddings

    No full text
    Recent years have seen a growing number of graph-based semi-supervised learning methods. While the literature currently contains several of these methods, their relationships with one another and with other graph-based data analysis algorithms remain unclear. In this paper, we present a unified view of graph-based semi-supervised learning. Our framework unifies three important and seemingly unrelated approaches to semi-supervised learning, viz label propagation, graph cuts and manifold embeddings. We show that most existing label propagation methods solve a special case of a generalized label propagation (GLP) formulation which is a constrained quadratic program involving a graph Laplacian. Different methods arise simply based on the choice of the Laplacian and the nature of the constraints. Further, we show that semi-supervised graph-cut problems can also be viewed and solved as special cases of the GLP formulation. In addition, we show that semi-supervised non-linear manifold embedding methods also solve variants of the GLP problem and propose a novel family of semi-supervised algorithms based on existing embedding methods. Finally, we present comprehensive empirical performance evaluation of the existing label propagation methods as well as the new ones derived from manifold embedding. The new family of embedding based label propagation methods are found to be competitive on several datasets.Agovic, Amrudin; Banerjee, Arindam. (2009). A Unified View of Graph-based Semi-Supervised Learning: Label Propagation, Graph-Cuts, and Embeddings. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215799

    Aggregation-Induced Modulation of the Optoelectronic Properties of Carbon Dots and Removal of Cd2+ Ions with Sustainable Use in Photocurrent Generation

    No full text
    Herein, we report the room-temperature fabrication of carbon dots (C-dots) using UV light irradiation within 15 min. The formation pathway has been investigated by time-dependent mass spectroscopic and transmission electron microscopic (TEM) studies. The inherent insolubility of carbon dots in organic solvents has been overcome by employing the phase-transfer strategy, and the C-dots have been found to remain in aggregated state in organic solvents. Interestingly, aggregation leads to the generation of a considerable amount of current conductivity in carbon dots, as is evident by current-voltage (I-V) measurements. This is not attainable by carbon dots in aqueous solution. However, the aggregates of C-dots do not show any photocurrent response. The phase transfer of C-dots has also been utilized to remove the toxic Cd2+ions from contaminated water by dragging them into the organic solvents with carbon dots. The C-dots-metal hybrid obtained by the removal of Cd2+with carbon dots in organic medium shows a very good photoresponsive nature with significant photogain. This suggests proper utilization of the removed (from polluted water) cadmium ions in a sustainable manner for photocurrent generation

    Stock Portfolio Selection Using Two-tiered Lazy Updates

    Full text link
    Adviser: Dr. Arindam BanerjeePeople make and lose vast sums of money every day on stock exchanges around the world. This research focused on developing a computer algorithm to build profitable portfolios, while taking into account transaction costs associated with trading stocks. The theory behind our algorithm is based on a subset of Machine Learning called Online Learning. Online Learning makes updated decisions as new information is provided. For our case, a new decision is made each day on what stocks to buy/sell based on transaction costs and the previous day’s stock performance. The Lazy Update part of our algorithm seeks to minimize the quantity of trading, since this leads to transaction costs being incurred. Our algorithm builds on prior work and dynamically learns which sectors to invest in and takes into account risk, which has not been considered before in the literature. Our Online Lazy Updates algorithm runs at a low level on choosing stocks within a sector, and at a high level on choosing the best sectors to invest in. We successfully establish our ability to be profitable with transaction costs on real-world datasets.This research was supported by the Undergraduate Research Opportunities Program (UROP).Cook, Alexander; Johnson, Nicholas; Banerjee, Arindam. (2014). Stock Portfolio Selection Using Two-tiered Lazy Updates. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/163162

    Common Component Analysis for Multiple Covariance Matrices

    No full text
    We consider the problem of finding a suitable common low-dimensional subspace for accurately representing a given set of covariance matrices. When the set contains only one covariance matrix, the subspace is given by Principal Component Analysis (PCA). For multiple covariance matrices, we term the problem Common Component Analysis (CCA). While CCA can be posed as a tensor decomposition problem, standard approaches to tensor decomposition have two critical issues: (i) Tensor decomposition methods are iterative and rely on the initialization. A bad initialization may lead to poor local optima; (ii) For a given level of approximation error, one does not know how to choose a suitable low dimensionality. In this paper, we present a detailed analysis of CCA which yields an effective initialization and iterative algorithms for the problem. The proposed methodology has provable approximation guarantees w.r.t. the global optimum, and also allows one to choose the dimensionality for a given level of approximation error. We also establish conditions under which the methodology will obtain the global optimum. We illustrate the effectiveness of the proposed method through extensive experiments on synthetic data as well as two real stock market datasets, where major financial events can be visualized in low dimensions.Wang, Huahua; Banerjee, Arindam; Boley, Daniel. (2010). Common Component Analysis for Multiple Covariance Matrices. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215835

    Bayesian Cluster Ensembles

    No full text
    Cluster ensembles provide a framework for combining multiple base clusterings of a dataset to generate a stable and robust consensus clustering. There are important variants of the basic cluster ensemble problem, notably including cluster ensembles with missing values, as well as row-distributed or column-distributed cluster ensembles. Existing cluster ensemble algorithms are applicable only to a small subset of these variants. In this paper, we propose Bayesian Cluster Ensembles (BCE), which is a mixed-membership model for learning cluster ensembles, and is applicable to all the primary variants of the problem. We propose two methods, respectively based on variational approximation and Gibbs sampling, for learning a Bayesian cluster ensemble. We compare BCE extensively with several other cluster ensemble algorithms, and demonstrate that BCE is not only versatile in terms of its applicability, it mostly outperforms the other algorithms in terms of stability and accuracy.Wang, Hongjun; Shan, Hanhuai; Banerjee, Arindam. (2008). Bayesian Cluster Ensembles. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215771

    Anomaly Detection for Discrete Sequences: A Survey

    No full text
    This survey attempts to provide a comprehensive and structured overview of the existing research for the problem of detecting anomalies in discrete sequences. The aim is to provide a global understanding of the sequence anomaly detection problem and how techniques proposed for different domains relate to each other. Our specific contributions are as follows: We identify three distinct formulations of the anomaly detection problem, and review techniques from many disparate and disconnected domains that address each of these formulations. Within each problem formulation, we group techniques into categories based on the nature of the underlying algorithm. For each category, we provide a basic anomaly detection technique, and show how the existing techniques are variants of the basic technique. This approach shows how different techniques within a category are related or different from each other. Our categorization reveals new variants and combinations that have not been investigated before for anomaly detection. We also provide a discussion of relative strengths and weaknesses of different techniques. We show how techniques developed for one problem formulation can be adapted to solve a different formulation; thereby providing several novel adaptations to solve the different problem formulations. We highlight the applicability of the techniques that handle discrete sequences to other related areas such as online anomaly detection and time series anomaly detection.Chandola, Varun; Banerjee, Arindam; Kumar, Vipin. (2009). Anomaly Detection for Discrete Sequences: A Survey. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215802
    corecore