
    A mutation profile for top-kappa patient search exploiting Gene-Ontology and orthogonal non-negative matrix factorization

    Motivation: As the quantity of genomic mutation data increases, the likelihood of finding patients with similar genomic profiles, for various disease inferences, increases. However, so does the difficulty in identifying them. Similarity search based on patient mutation profiles can solve various translational bioinformatics tasks, including prognostics and treatment efficacy predictions for better clinical decision making through large volumes of data. However, this is a challenging problem due to the heterogeneous and sparse characteristics of the mutation data as well as their high dimensionality. Results: To solve this problem we introduce a compact representation and search strategy based on Gene-Ontology and orthogonal non-negative matrix factorization. Statistical significance between the identified cancer subtypes and their clinical features is computed for validation; results show that our method can identify and characterize clinically meaningful tumor subtypes comparable to or better than the recently introduced Network-Based Stratification method in most datasets, while enabling real-time search. To the best of our knowledge, this is the first attempt to simultaneously characterize and represent somatic mutational data for efficient search purposes.

    Fast triangular mesh approximation of surface data using wavelet coefficients

    This paper proposes a new, fast, triangular mesh approximation method for the 3D visualization of surface data. Using spatio-frequency localization characteristics and directional information of wavelet coefficients, we determine local complexities of surface data and approximate the data to a proper triangular mesh. The proposed algorithm is quite simple, and the computational cost is low due to the direct use of wavelet coefficients for vertex removal. The computer simulation results for terrain data show that the proposed algorithm is excellent for fast 3D visualization.

    A neural network for 500 word vocabulary word spotting using non-uniform units

    We introduce acoustic sub-word units to neural networks for speaker-independent continuous speech recognition. The functions of segmenting input and detecting words are implemented with networks of simple structures. The non-uniform unit which we introduce in this research can model phoneme variations caused by co-articulation spread over several phonemes and between words. These units can be segmented by the network according to stationary and transition parts of speech without iteration or without considering all possible position shifts. A word lexicon can be trained by the network, which can effectively memorize all transcription variations in the training utterances of words. The results of speaker-independent word spotting of 520 words with TIMIT data are described. (C) 2000 Elsevier Science Ltd. All rights reserved

    PEBL: WEB PAGE CLASSIFICATION WITHOUT NEGATIVE EXAMPLES

    Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of nonhomepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. This paper presents a framework, called Positive Example Based Learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called Mapping-Convergence (M-C), to achieve classification accuracy (with positive and unlabeled data) as high as that of a traditional SVM (with positive and negative data). M-C runs in two stages: the mapping stage and the convergence stage. In the mapping stage, the algorithm uses a weak classifier that draws an initial approximation of "strong" negative data. Based on the initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) which maximizes margins to progressively improve the approximation of negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as the traditional SVMs.
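The two-stage loop described in this abstract can be illustrated with a minimal one-dimensional sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the data are single numbers, every negative is assumed to lie to the right of every positive, the weak classifier is a hypothetical distance threshold (`strong_dist`), and the internal margin-maximizing classifier is the 1D hard-margin separator (the midpoint between the largest positive and the smallest negative) standing in for a full SVM.

```python
def mapping_stage(positives, unlabeled, strong_dist):
    """Weak classifier (hypothetical): unlabeled points lying more than
    strong_dist to the right of the positive mean are "strong" negatives."""
    center = sum(positives) / len(positives)
    strong = [x for x in unlabeled if x - center > strong_dist]
    rest = [x for x in unlabeled if x - center <= strong_dist]
    return strong, rest


def max_margin_boundary(positives, negatives):
    """1D hard-margin separator, assuming all negatives lie to the right
    of all positives: the midpoint between the closest pair."""
    return (max(positives) + min(negatives)) / 2.0


def mapping_convergence(positives, unlabeled, strong_dist):
    """Grow the negative set until no remaining unlabeled point is
    classified as negative; return the final boundary and bookkeeping."""
    negatives, remaining = mapping_stage(positives, unlabeled, strong_dist)
    if not negatives:
        raise ValueError("mapping stage found no strong negatives")
    iterations = 0
    while True:
        boundary = max_margin_boundary(positives, negatives)
        newly_negative = [x for x in remaining if x > boundary]
        if not newly_negative:
            return boundary, negatives, remaining, iterations
        negatives = negatives + newly_negative
        remaining = [x for x in remaining if x <= boundary]
        iterations += 1


positives = [0.0, 1.0, 2.0]
unlabeled = [0.5, 1.5, 5.5, 6.5, 8.0, 12.0, 14.0]
boundary, negatives, remaining, iterations = mapping_convergence(
    positives, unlabeled, strong_dist=10.0
)
print(boundary, sorted(negatives), remaining, iterations)
```

On this toy data the boundary starts far from the positives and moves toward them each time newly classified negatives are added, stopping once the remaining unlabeled points all fall on the positive side.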

    PRIVACY-PRESERVING SVM CLASSIFICATION

    Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.
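The abstract does not spell out the protocol, so the following is only a sketch of one algebraic fact that makes a shared model plausible for vertically partitioned data: with a linear kernel, the Gram matrix of the full data equals the sum of the Gram matrices each party computes on its own columns. The function names and the toy data are hypothetical, and the plain summation shown here is not privacy-preserving in itself; a real protocol would need a secure way to combine the local matrices, which is not shown.

```python
def local_gram(rows):
    """Linear-kernel Gram matrix (pairwise dot products) over the
    attributes that a single party holds."""
    n = len(rows)
    return [
        [sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
        for i in range(n)
    ]


def add_matrices(m1, m2):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


# Toy data: three records with three attributes, split by column.
full = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
party_a = [row[:2] for row in full]   # party A holds attributes 0 and 1
party_b = [row[2:] for row in full]   # party B holds attribute 2

combined = add_matrices(local_gram(party_a), local_gram(party_b))

# The combined matrix matches the Gram matrix of the undivided data.
assert combined == local_gram(full)
```

Because a kernel SVM can be trained from the Gram matrix alone, neither party needs to see the other's raw attribute values in this setting, provided the summation step itself is protected.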