    PARMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library

    Karypis, George; Schloegel, Kirk; Kumar, Vipin. (1997). PARMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215345

    Language and Library Support for Climate Data Applications

    Associated research group: Minnesota Extensible Language Tools
    Van Wyk, Eric; Kumar, Vipin; Steinbach, Michael; Boriah, Shyam; Choudhary, Alok. (2009). Language and Library Support for Climate Data Applications. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/217360

    METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices

    METIS is copyrighted by the Regents of the University of Minnesota. This work was supported by IST/BMDO through Army Research Office contract DA/DAAH04-93-G-0080, and by Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by Minnesota Supercomputer Institute, Cray Research Inc., and by the Pittsburgh Supercomputing Center.
    Karypis, George; Kumar, Vipin. (1997). METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215346

    A Computationally Efficient and Statistically Powerful Framework for Searching High-order Epistasis with Systematic Pruning and Gene-set Constraints

    This paper has not yet been submitted.
    Fang, Gang; Haznadar, Majda; Wang, Wen; Steinbach, Michael; Van Ness, Brian; Kumar, Vipin. (2010). A Computationally Efficient and Statistically Powerful Framework for Searching High-order Epistasis with Systematic Pruning and Gene-set Constraints. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215831

    Similarity Measures for Categorical Data--A Comparative Study

    Measuring similarity or distance between two entities is a key step for several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well understood, but for categorical data, the similarity computation is not straightforward. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances, but their relative performance has not been evaluated. In this paper we study the performance of a variety of similarity measures in the context of a specific data mining task: outlier detection. Results on a variety of data sets show that while no one measure dominates others for all types of problems, some measures achieve consistently high performance.
    Chandola, Varun; Boriah, Shyam; Kumar, Vipin. (2007). Similarity Measures for Categorical Data--A Comparative Study. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215736
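
    The abstract above does not name the measures it compares. Purely as a point of reference, the sketch below shows the simple overlap (matching) measure, a common baseline for categorical data: the fraction of attributes on which two records agree. The function name and the tuple-of-strings record layout are assumptions made for illustration, not taken from the paper.

```python
from typing import Sequence


def overlap_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Return the fraction of attributes on which two categorical records agree.

    Hypothetical helper: illustrates the basic overlap measure only,
    not any specific data-driven measure from the study.
    """
    if len(x) != len(y):
        raise ValueError("records must have the same number of attributes")
    if not x:
        raise ValueError("records must have at least one attribute")
    # Count positions where the two categorical values match exactly.
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


if __name__ == "__main__":
    a = ("red", "small", "round")
    b = ("red", "large", "round")
    print(overlap_similarity(a, b))  # two of the three attributes match
```

    Overlap treats every mismatch as equally severe and ignores how frequent each value is, which is the kind of limitation that data-driven measures are generally meant to address.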

    Design of Scalable Parallel Classification Algorithms for Mining Large Datasets

    In this paper, we present ScalParC (Scalable Parallel Classifier), a new parallel formulation of a decision tree based classification process. Like other state-of-the-art decision tree classifiers such as SPRINT, ScalParC is suited for handling large datasets. We show that the existing parallel formulation of SPRINT is unscalable, whereas ScalParC is shown to be scalable in both runtime and memory requirements. We present the experimental results of classifying up to 6.4 million records on up to 128 processors of a Cray T3D, in order to demonstrate the scalable behavior of ScalParC. A key component of ScalParC is the parallel hash table. The proposed parallel hashing paradigm can be used to parallelize other algorithms that require many concurrent updates to a large hash table.
    Joshi, Mahesh; Karypis, George; Kumar, Vipin. (1998). Design of Scalable Parallel Classification Algorithms for Mining Large Datasets. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215372

    Min-Apriori: An Algorithm for Finding Association Rules in Data with Continuous Attributes

    This work was supported by NSF ASC-9634719, by Army Research Office contract DNDAAH04-95-1-0538, by Army High Performance Computing Research Center cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Additional support was provided by the IBM Partnership Award, and by the IBM SUR equipment grant. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute.
    Han, Eui-Hong; Karypis, George; Kumar, Vipin. (1997). Min-Apriori: An Algorithm for Finding Association Rules in Data with Continuous Attributes. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215354

    Summarization - Compressing Data into an Informative Representation Report

    Summarization is an important problem in many domains involving large datasets. Summarization can be essentially viewed as transformation of data into a concise yet meaningful representation which could be used for efficient storage or manual inspection. In this paper, we formulate the problem of summarization of a large dataset of transactions as an optimization problem involving two objective functions - compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We propose data mining techniques to obtain a summary for a given set of transactions while optimizing these two objective functions. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a meaningful representation. We first present a direct application of a standard clustering scheme to generate summaries. We then show how this could be significantly improved by using a multi-step approach which involves generating candidate summaries for a dataset using association analysis and then choosing a subset of these candidates as the summary with the desired compaction and information content. We present results of experiments conducted on real and artificial datasets to demonstrate the effectiveness of our techniques.
    Chandola, Varun; Kumar, Vipin. (2005). Summarization - Compressing Data into an Informative Representation Report. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215665

    Supplement for "Contextual Time Series Change Detection"

    Time series data are common in a variety of fields ranging from economics to medicine and manufacturing. As a result, time series analysis and modeling has become an active research area in statistics and data mining. In this paper, we focus on a type of change we call contextual time series change (CTC) and propose a novel two-stage algorithm to address it. In contrast to traditional change detection methods, which consider each time series separately, CTC is defined as a change relative to the behavior of a group of related time series. As a result, our proposed method is able to identify novel types of changes not found by other algorithms. We demonstrate the unique capabilities of our approach with several case studies on real-world datasets from the financial and Earth science domains.
    Chen, Xi; Steinhaeuser, Karsten; Boriah, Shyam; Chatterjee, Snigdhansu; Kumar, Vipin. (2013). Supplement for "Contextual Time Series Change Detection". Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215905

    Characterizing Pattern based Clustering

    Recently, there has been considerable interest in using association patterns for clustering. Although several interesting algorithms have been developed, further investigation is needed to characterize (1) the benefits of using association patterns and (2) the most effective way of using them for clustering. To that end, we present a new clustering technique, bisecting K-means Clustering with pAttern Preservation (K-CAP), which exploits key properties of the hyperclique association pattern and bisecting k-means. Experimental results on document data show that, in terms of entropy, K-CAP can perform substantially better than the standard bisecting k-means algorithm when data sets contain clusters of widely different sizes--the typical situation. Furthermore, because hyperclique patterns can be found much more efficiently than other types of association patterns, K-CAP retains the appealing computational efficiency of bisecting k-means.
    Xiong, Hui; Steinbach, Michael; Ruslim, Arifin; Kumar, Vipin. (2005). Characterizing Pattern based Clustering. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215656
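
    The abstract above evaluates clusterings "in terms of entropy" without giving a formula. The sketch below assumes the usual size-weighted definition: compute the class-label entropy inside each cluster, then average over clusters weighted by cluster size (lower is better, 0 means every cluster is pure). The function name and the parallel-list input layout are hypothetical, and the exact definition used in the report may differ.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def clustering_entropy(clusters: Sequence[Hashable],
                       classes: Sequence[Hashable]) -> float:
    """Size-weighted average of per-cluster class entropy, in bits.

    clusters[i] is the cluster id assigned to item i; classes[i] is its
    true class label. Assumed definition, for illustration only.
    """
    if len(clusters) != len(classes):
        raise ValueError("clusters and classes must have the same length")
    n = len(clusters)
    if n == 0:
        raise ValueError("at least one item is required")

    # Tally the class labels that fall into each cluster.
    by_cluster: Dict[Hashable, Counter] = {}
    for cluster_id, label in zip(clusters, classes):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1

    total = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Shannon entropy of the class distribution inside this cluster.
        h = -sum((m / size) * math.log2(m / size) for m in counts.values())
        # Weight each cluster by the fraction of items it contains.
        total += (size / n) * h
    return total


if __name__ == "__main__":
    # Two clusters, each an even mix of two classes.
    print(clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"]))
```

    Because each cluster is weighted by its size, one large impure cluster raises the score more than several small impure ones, which matters when cluster sizes differ widely.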