Han, Buhm

Author: Han Buhm
Publication date: 17/03/2016

UIN (Universitas Islam Negeri) Sunan Kalijaga, Yogyakarta: E-Journal Lembaga Penelitian dan Pengembangan Masyarakat

Pseudobulk with proper offsets has the same statistical properties as generalized linear mixed models in single-cell case-control studies

Authors: Lee Hanbin, Han Buhm
Publication date: 2024

Motivation: Generalized linear mixed models (GLMMs), such as the negative-binomial or Poisson linear mixed model, are widely applied to single-cell RNA sequencing data to compare transcript expression between different conditions determined at the subject level. However, the model is computationally intensive, and its statistical performance relative to pseudobulk approaches is poorly understood. Results: We propose offset-pseudobulk as a lightweight alternative to GLMMs. We prove that a count-based pseudobulk equipped with a proper offset variable has the same statistical properties as GLMMs in terms of both point estimates and standard errors. We confirm our findings using simulations based on real data. Offset-pseudobulk is substantially faster (>10x) and numerically more stable than GLMMs.

SNU Open Repository and Archive

A theory-based practical solution to correct for sex-differential participation bias

Authors: Lee Hanbin, Han Buhm
Publication date: 27/06/2022

Most genomic cohorts are retrospective, meaning that the exposures and outcomes are determined prior to sample collection. Therefore, a spurious association between an exposure and an outcome can arise if both variables affect study participation. Such concerns were raised in previous studies questioning the representativeness of the UK Biobank. Recently, a genome-wide association study (GWAS) on biological sex found many autosomal hits and non-negligible autosomal heritability, which the authors attributed to selection bias. In this study, we propose a simple and practical method that can overcome sex-driven selection bias based on theoretical analysis and simulations. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-022-02703-0

SNU Open Repository and Archive

PubMed Central

Exploration of errors in variance caused by using the first-order approximation in Mendelian randomization

Authors: Kim Hakin, Han Buhm, Kim Kunhee
Publication date: 2022

Mendelian randomization (MR) uses genetic variation as a natural experiment to investigate the causal effects of modifiable risk factors (exposures) on outcomes. Two-sample Mendelian randomization (2SMR) is widely used to measure causal effects between exposures and outcomes via genome-wide association studies. 2SMR can increase statistical power by utilizing summary statistics from large consortia such as the UK Biobank. However, a first-order approximation of the standard error is commonly used when applying 2SMR. This approximation can underestimate the variance of causal effects in MR, which can lead to an increased false-positive rate. An alternative is to use the second-order approximation of the standard error, which can considerably correct for the deviation of the first-order approximation. In this study, we simulated MR to show the degree to which the first-order approximation underestimates the variance. We show that, depending on the specific situation, the first-order approximation can underestimate the variance by almost half when compared to the true variance, whereas the second-order approximation is robust and accurate.

SNU Open Repository and Archive

Hap-seq: An Optimal Algorithm for Haplotype Phasing with Imputation Using Sequencing Data

Authors: He Dan, Han Buhm, Eskin Eleazar
Publication date: 2013

Inference of haplotypes, or the sequence of alleles along each chromosome, is a fundamental problem in genetics and is important for many analyses, including admixture mapping, identifying regions of identity by descent, and imputation. Traditionally, haplotypes are inferred from genotype data obtained from microarrays using information on population haplotype frequencies inferred from either a large sample of genotyped individuals or a reference dataset such as the HapMap. Since the availability of large reference datasets, modern approaches for haplotype phasing along these lines are closely related to imputation methods. When applied to data obtained from sequencing studies, a straightforward way to obtain haplotypes is to first infer genotypes from the sequence data and then apply an imputation method. However, this approach does not take into account that alleles on the same sequence read originate from the same chromosome. Haplotype assembly approaches take advantage of this insight and predict haplotypes by assigning the reads to chromosomes in such a way that minimizes the number of conflicts between the reads and the predicted haplotypes. Unfortunately, assembly approaches require very high sequencing coverage and are usually not able to fully reconstruct the haplotypes. In this work, we present a novel approach, Hap-seq, which is simultaneously an imputation and assembly method that combines information from a reference dataset with the information from the reads using a likelihood framework. Our method applies a dynamic programming algorithm to identify the predicted haplotype, which maximizes the joint likelihood of the haplotype with respect to the reference dataset and the haplotype with respect to the observed reads. We show that our method requires only low sequencing coverage and can reconstruct haplotypes containing both common and rare alleles with higher accuracy compared to the state-of-the-art imputation methods.

Crossref

SNU Open Repository and Archive

Analysis of differences in human leukocyte antigen between the two Wellcome Trust Case Control Consortium control datasets

Authors: Han Buhm, Jang Chloe Soohyun, Choi Wanson, Cook Seungho
Publication date: 2019

© 2019, Korea Genome Organization. The Wellcome Trust Case Control Consortium (WTCCC) study was a large genome-wide association study that aimed to identify common variants associated with seven diseases. That study combined two control datasets (58C and UK Blood Services) as shared controls. Prior to using the combined controls, the WTCCC performed analyses to show that the genomic content of the control datasets was not significantly different. Recently, the analysis of human leukocyte antigen (HLA) genes has become prevalent due to the development of HLA imputation technology. In this project, we extended the between-control homogeneity analysis of the WTCCC to HLA. We imputed HLA information in the WTCCC control dataset and showed that the HLA content was not significantly different between the two control datasets, suggesting that the combined controls can be used as controls for HLA fine-mapping analysis based on HLA imputation.

SNU Open Repository and Archive

Structural Alignment of Pseudoknotted RNA

Authors: Zhang Shaojie, Han Buhm, Dost Banu, Bafna Vineet
Publication date: 2008

In this paper, we address the problem of discovering novel non-coding RNA (ncRNA) using primary sequence and secondary structure conservation, focusing on ncRNA families with pseudoknotted structures. Our main technical result is an efficient algorithm for computing an optimum structural alignment of an RNA sequence against a genomic substring. This algorithm has two applications. First, by scanning a genome, we can identify novel (homologous) pseudoknotted ncRNA, and second, we can infer the secondary structure of the target aligned sequence. We test an implementation of our algorithm (PAL) and show that it has near-perfect behavior for predicting the structure of many known pseudoknots. Additionally, it can detect the true homologs with high sensitivity and specificity in controlled tests. We also use PAL to search the entire viral genome and mouse genome for novel homologs of some viral and eukaryotic pseudoknots, respectively. In each case, we have found strong support for novel homologs. © Mary Ann Liebert, Inc. 2008

SNU Open Repository and Archive

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

MicroPredict: predicting species-level taxonomic abundance of whole-shotgun metagenomic data using only 16S amplicon sequencing data

Authors: Kim Hakin, Kim Donghyun, Han Buhm, Jang Chloe Soohyun
Publication date: 2024

Background: The importance of the human microbiome in the analysis of various diseases is emerging. The two main methods used to profile the human microbiome are 16S rRNA gene sequencing (16S sequencing) and whole-genome shotgun sequencing (WGS). Owing to the full coverage of the genome in sequencing, WGS has multiple advantages over 16S sequencing, including higher taxonomic profiling resolution at the species-level and functional profiling analysis. However, 16S sequencing remains widely used because of its relatively low cost. Although WGS is the standard method for obtaining accurate species-level data, we found that 16S sequencing data contained rich information to predict high-resolution species-level abundances with reasonable accuracy. Objective: In this study, we proposed MicroPredict, a method for accurately predicting WGS-comparable species-level abundance data using 16S taxonomic profile data. Methods: We employed a mixed model using two key strategies: (1) modeling both sample- and species-specific information for predicting WGS abundances, and (2) accounting for the possible correlations among different species. Results: We found that MicroPredict outperformed the other machine learning methods. Conclusion: We expect that our approach will help researchers accurately approximate the species-level abundances of microbiome profiles in datasets for which only cost-effective 16S sequencing has been applied.

SNU Open Repository and Archive

IPED: Inheritance Path-based Pedigree Reconstruction Algorithm Using Genotype Data

Authors: He Dan, Parida Laxmi, Han Buhm, Wang Zhanyong, Eskin Eleazar
Publication date: 01/10/2013

The problem of inference of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. Various methods have been proposed to automate the process of pedigree reconstruction given the genotypes or haplotypes of a set of individuals. Current methods, unfortunately, are very time-consuming and inaccurate for complicated pedigrees, such as pedigrees with inbreeding. In this work, we propose an efficient algorithm that is able to reconstruct large pedigrees with reasonable accuracy. Our algorithm reconstructs the pedigrees generation by generation, backward in time from the extant generation. We predict the relationships between individuals in the same generation using an inheritance path-based approach implemented with an efficient dynamic programming algorithm. Experiments show that our algorithm runs in linear time with respect to the number of reconstructed generations, and therefore, it can reconstruct pedigrees that have a large number of generations. Indeed, it is the first practical method for reconstruction of large pedigrees from genotype data.

Crossref

SNU Open Repository and Archive

eScholarship - University of California

PASTRY: achieving balanced power for detecting risk and protective minor alleles in meta-analysis of association studies with overlapping subjects

Authors: Kim Hakin, Han Buhm, Jang Chloe Soohyun, Kim Emma E.
Publication date: 2024

Background: Meta-analysis is a statistical method that combines the results of multiple studies to increase statistical power. When multiple studies participating in a meta-analysis utilize the same public dataset as controls, the summary statistics from these studies become correlated. To solve this challenge, Lin and Sullivan proposed a method to provide an optimal test statistic adjusted for the correlation. This method quickly became the standard practice. However, we identified an unexpected power asymmetry phenomenon in this standard framework. This can lead to unbalanced power for detecting protective minor alleles and risk minor alleles. Results: We found that the power asymmetry of the current framework is mainly due to the errors in approximating the correlation term. We then developed a meta-analysis method based on an accurate correlation estimator, called PASTRY (A method to avoid Power ASymmeTRY). PASTRY outperformed the standard method on both simulated and real datasets in terms of the power symmetry. Conclusions: Our findings suggest that PASTRY can help to alleviate the power asymmetry problem. PASTRY is available at https://github.com/hanlab-SNU/PASTRY. This work was supported by the National Research Foundation of Korea (NRF) (Grant number 2022R1A2B5B02001897) funded by the Korean government, Ministry of Science and ICT. This work was also supported by the Creative-Pioneering Researchers Program funded by Seoul National University (SNU). This study was supported by the BK21 FOUR Biomedical Science Program at Seoul National University (SNU).

SNU Open Repository and Archive