PARMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library

Authors: Karypis, George; Schloegel, Kirk; Kumar, Vipin
Publication date: 01/01/1997

Karypis, George; Schloegel, Kirk; Kumar, Vipin. (1997). PARMETIS: Parallel Graph Partitioning and Sparse Matrix Ordering Library. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215345

University of Minnesota Digital Conservancy

Language and Library Support for Climate Data Applications

Authors: Steinbach, Michael; Choudhary, Alok; Boriah, Shyam; Van Wyk, Eric; Kumar, Vipin
Publication date: 01/01/2009

Associated research group: Minnesota Extensible Language Tools

Van Wyk, Eric; Kumar, Vipin; Steinbach, Michael; Boriah, Shyam; Choudhary, Alok. (2009). Language and Library Support for Climate Data Applications. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/217360

University of Minnesota Digital Conservancy

METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices

Authors: Karypis, George; Kumar, Vipin
Publication date: 01/01/1997

METIS is copyrighted by the Regents of the University of Minnesota. This work was supported by IST/BMDO through Army Research Office contract DA/DAAH04-93-G-0080, and by the Army High Performance Computing Research Center under the auspices of the Department of the Army, Army Research Laboratory cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by the Minnesota Supercomputer Institute, Cray Research Inc., and the Pittsburgh Supercomputing Center.

Karypis, George; Kumar, Vipin. (1997). METIS: A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215346

University of Minnesota Digital Conservancy

A Computationally Efficient and Statistically Powerful Framework for Searching High-order Epistasis with Systematic Pruning and Gene-set Constraints

Authors: Steinbach, Michael; Haznadar, Majda; Wang, Wen; Fang, Gang; Van Ness, Brian; Kumar, Vipin
Publication date: 21/06/2010

This paper has not yet been submitted.

Fang, Gang; Haznadar, Majda; Wang, Wen; Steinbach, Michael; Van Ness, Brian; Kumar, Vipin. (2010). A Computationally Efficient and Statistically Powerful Framework for Searching High-order Epistasis with Systematic Pruning and Gene-set Constraints. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215831

University of Minnesota Digital Conservancy

Similarity Measures for Categorical Data--A Comparative Study

Authors: Chandola, Varun; Boriah, Shyam; Kumar, Vipin
Publication date: 15/10/2007

Measuring similarity or distance between two entities is a key step for several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well understood, but for categorical data the similarity computation is not straightforward. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances, but their relative performance has not been evaluated. In this paper we study the performance of a variety of similarity measures in the context of a specific data mining task: outlier detection. Results on a variety of data sets show that while no one measure dominates the others for all types of problems, some measures achieve consistently high performance.

Chandola, Varun; Boriah, Shyam; Kumar, Vipin. (2007). Similarity Measures for Categorical Data--A Comparative Study. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215736

University of Minnesota Digital Conservancy
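
As a rough illustration of the setting described in the abstract above (a similarity measure for categorical records, used for outlier detection), the sketch below uses the simple overlap measure, which counts the fraction of attributes on which two records agree. The nearest-neighbour scoring rule, the function names, and the parameter `k` are assumptions made for this example; they are not taken from the paper and do not represent its data-driven measures or its evaluation protocol.

```python
from typing import List, Sequence


def overlap_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of attributes on which two categorical records agree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if len(a) == 0:
        raise ValueError("records must have at least one attribute")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def knn_outlier_scores(records: Sequence[Sequence[str]], k: int = 2) -> List[float]:
    """Hypothetical scoring rule: 1 minus the mean similarity of each record
    to its k most similar other records. Higher scores are more outlying."""
    if len(records) < 2:
        raise ValueError("need at least two records")
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = []
    for i, rec in enumerate(records):
        # Similarities to every other record, most similar first.
        sims = sorted(
            (overlap_similarity(rec, other)
             for j, other in enumerate(records) if j != i),
            reverse=True,
        )
        nearest = sims[:k]
        scores.append(1.0 - sum(nearest) / len(nearest))
    return scores
```

Calling `knn_outlier_scores` on a small list of equal-length tuples returns one score per record; a record that shares no attribute values with any other record receives the maximum score of 1.0. Overlap treats every mismatch equally, which is the limitation that data-driven measures aim to address.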

Design of Scalable Parallel Classification Algorithms for Mining Large Datasets

Authors: Karypis, George; Joshi, Mahesh; Kumar, Vipin
Publication date: 01/01/1998

In this paper, we present ScalParC (Scalable Parallel Classifier), a new parallel formulation of a decision tree based classification process. Like other state-of-the-art decision tree classifiers such as SPRINT, ScalParC is suited for handling large datasets. We show that the existing parallel formulation of SPRINT is unscalable, whereas ScalParC is scalable in both runtime and memory requirements. To demonstrate the scalable behavior of ScalParC, we present experimental results of classifying up to 6.4 million records on up to 128 processors of a Cray T3D. A key component of ScalParC is the parallel hash table. The proposed parallel hashing paradigm can be used to parallelize other algorithms that require many concurrent updates to a large hash table.

Joshi, Mahesh; Karypis, George; Kumar, Vipin. (1998). Design of Scalable Parallel Classification Algorithms for Mining Large Datasets. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215372

University of Minnesota Digital Conservancy

Min-Apriori: An Algorithm for Finding Association Rules in Data with Continuous Attributes

Authors: Han, Eui-Hong; Karypis, George; Kumar, Vipin
Publication date: 01/01/1997

This work was supported by NSF ASC-9634719, by Army Research Office contract DA/DAAH04-95-1-0538, and by Army High Performance Computing Research Center cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Additional support was provided by the IBM Partnership Award and by the IBM SUR equipment grant. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute.

Han, Eui-Hong; Karypis, George; Kumar, Vipin. (1997). Min-Apriori: An Algorithm for Finding Association Rules in Data with Continuous Attributes. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215354

University of Minnesota Digital Conservancy

Summarization - Compressing Data into an Informative Representation Report

Authors: Chandola, Varun; Kumar, Vipin
Publication date: 08/06/2005

Summarization is an important problem in many domains involving large datasets. It can be viewed as the transformation of data into a concise yet meaningful representation that can be used for efficient storage or manual inspection. In this paper, we formulate the problem of summarizing a large dataset of transactions as an optimization problem involving two objective functions: compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm, and data mining techniques to obtain a summary for a given set of transactions while optimizing these two objective functions. We illustrate one application of summarization in the field of network data, where we show how our technique can be used to summarize network traffic into a meaningful representation. We first present a direct application of a standard clustering scheme to generate summaries. We then show how this can be significantly improved by a multi-step approach that generates candidate summaries for a dataset using association analysis and then chooses a subset of these candidates as the summary with the desired compaction and information content. We present results of experiments conducted on real and artificial datasets to demonstrate the effectiveness of our techniques.

Chandola, Varun; Kumar, Vipin. (2005). Summarization - Compressing Data into an Informative Representation Report. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215665

University of Minnesota Digital Conservancy

Supplement for "Contextual Time Series Change Detection"

Authors: Chen, Xi; Chatterjee, Snigdhansu; Boriah, Shyam; Kumar, Vipin; Steinhaeuser, Karsten
Publication date: 25/01/2013

Time series data are common in a variety of fields ranging from economics to medicine and manufacturing. As a result, time series analysis and modeling has become an active research area in statistics and data mining. In this paper, we focus on a type of change we call contextual time series change (CTC) and propose a novel two-stage algorithm to address it. In contrast to traditional change detection methods, which consider each time series separately, CTC is defined as a change relative to the behavior of a group of related time series. As a result, our proposed method is able to identify novel types of changes not found by other algorithms. We demonstrate the unique capabilities of our approach with several case studies on real-world datasets from the financial and Earth science domains.

Chen, Xi; Steinhaeuser, Karsten; Boriah, Shyam; Chatterjee, Snigdhansu; Kumar, Vipin. (2013). Supplement for "Contextual Time Series Change Detection". Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215905

University of Minnesota Digital Conservancy

Characterizing Pattern based Clustering

Authors: Ruslim, Arifin; Steinbach, Michael; Xiong, Hui; Kumar, Vipin
Publication date: 19/04/2005

Recently, there has been considerable interest in using association patterns for clustering. Although several interesting algorithms have been developed, further investigation is needed to characterize (1) the benefits of using association patterns and (2) the most effective way of using them for clustering. To that end, we present a new clustering technique, bisecting K-means Clustering with pAttern Preservation (K-CAP), which exploits key properties of the hyperclique association pattern and bisecting k-means. Experimental results on document data show that, in terms of entropy, K-CAP can perform substantially better than the standard bisecting k-means algorithm when data sets contain clusters of widely different sizes, which is the typical situation. Furthermore, because hyperclique patterns can be found much more efficiently than other types of association patterns, K-CAP retains the appealing computational efficiency of bisecting k-means.

Xiong, Hui; Steinbach, Michael; Ruslim, Arifin; Kumar, Vipin. (2005). Characterizing Pattern based Clustering. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/215656

University of Minnesota Digital Conservancy