Single-cell generalized trend model (scGTM): a flexible and interpretable model of gene expression trend along cell pseudotime.
Motivation: Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. Most existing methods rely on nonparametric regression models for their flexibility; however, nonparametric models often provide trends too complex to interpret. Other existing methods use interpretable but restrictive models. Since model interpretability and flexibility are both indispensable for understanding biological processes, the single-cell field needs a model that improves the interpretability and largely maintains the flexibility of nonparametric regression models.
Results: Here, we propose the single-cell generalized trend model (scGTM) for capturing a gene's expression trend, which may be monotone, hill-shaped or valley-shaped, along cell pseudotime. The scGTM has three advantages: (i) it can capture non-monotonic trends that are easy to interpret, (ii) its parameters are biologically interpretable and trend informative, and (iii) it can flexibly accommodate common distributions for modeling gene expression counts. To tackle the complex optimization problems, we use the particle swarm optimization algorithm to find the constrained maximum likelihood estimates for the scGTM parameters. As an application, we analyze several single-cell gene expression datasets using the scGTM and show that scGTM can capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying biological processes.
Availability and implementation: The Python package scGTM is open-access and available at https://github.com/ElvisCuiHan/scGTM.
Supplementary information: Supplementary data are available at Bioinformatics online.
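The abstract above mentions particle swarm optimization (PSO) for constrained maximum likelihood estimation. The following is a minimal, generic sketch of that idea only, not the scGTM implementation: the function `pso_maximize`, the toy Poisson likelihood with a single rate parameter, the count data, and the box constraints are all assumptions made for illustration.

```python
import math
import random


def pso_maximize(objective, bounds, n_particles=30, n_iter=200, seed=0):
    """Maximize objective(x) over a box with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive weight, social weight
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the box so the constraint holds at every step.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical expression counts for one gene (illustration only).
counts = [2, 3, 1, 4, 0, 2, 5, 3]


def poisson_loglik(params):
    """Poisson log-likelihood of the counts for a single rate parameter."""
    rate = params[0]
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)


# Loose box: the estimate should approach the sample mean of the counts.
estimate, loglik = pso_maximize(poisson_loglik, bounds=[(0.01, 20.0)])
# Tight box that excludes the sample mean: the estimate moves to the boundary.
constrained, _ = pso_maximize(poisson_loglik, bounds=[(3.0, 20.0)])
```

Clamping positions to the box is one simple way to enforce constraints in PSO; penalty terms are a common alternative, and the abstract does not say which approach scGTM uses.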
PyWGCNA: a Python package for weighted gene co-expression network analysis
MOTIVATION: Weighted gene co-expression network analysis (WGCNA) is frequently used to identify modules of genes that are co-expressed across many RNA-seq samples. However, the current R implementation is slow, is not designed to compare modules between multiple WGCNA networks, and its results can be hard to interpret as well as to visualize. We introduce the PyWGCNA Python package, which is designed to identify co-expression modules from large RNA-seq datasets. PyWGCNA has a faster implementation than the R version of WGCNA and several additional downstream analysis modules for functional enrichment analysis using GO, KEGG, and REACTOME, inter-module analysis of protein-protein interactions, as well as comparison of multiple co-expression modules to each other and/or external lists of genes such as marker genes from single cell.
RESULTS: We apply PyWGCNA to two distinct datasets of brain bulk RNA-seq from MODEL-AD to identify modules associated with the genotypes. We compare the resulting modules to each other to find shared co-expression signatures in the form of modules with significant overlap across the datasets.
AVAILABILITY AND IMPLEMENTATION: The PyWGCNA library for Python 3 is available on PyPi at pypi.org/project/PyWGCNA and on GitHub at github.com/mortazavilab/PyWGCNA. The data underlying this article are available in GitHub at github.com/mortazavilab/PyWGCNA/tutorials/5xFAD_paper
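The abstract above describes comparing modules across networks to find significant overlap. As a rough illustration of how such a comparison can be scored, the sketch below computes a Jaccard index and a one-sided hypergeometric p-value for two gene sets. It does not use the PyWGCNA API, and the abstract does not state which statistic the package applies; the function names, gene identifiers, and background size are hypothetical.

```python
from math import comb


def jaccard(module_a, module_b):
    """Jaccard index: shared genes divided by genes in either module."""
    a, b = set(module_a), set(module_b)
    return len(a & b) / len(a | b)


def overlap_pvalue(module_a, module_b, n_background):
    """One-sided hypergeometric p-value, P(overlap >= observed).

    Assumes both modules are drawn from the same background of
    n_background genes.
    """
    a, b = set(module_a), set(module_b)
    observed = len(a & b)
    total = comb(n_background, len(b))
    upper = min(len(a), len(b))
    tail = sum(
        comb(len(a), i) * comb(n_background - len(a), len(b) - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical modules from two networks sharing a 10-gene background.
module_1 = {"g1", "g2", "g3", "g4"}
module_2 = {"g3", "g4", "g5"}
similarity = jaccard(module_1, module_2)
p_value = overlap_pvalue(module_1, module_2, n_background=10)
```

For this toy input the two modules share 2 of 5 distinct genes, so the Jaccard index is 0.4, and the overlap p-value is 40/120, which would not be called significant.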
dsRID: in silico identification of dsRNA regions using long-read RNA-seq data
MOTIVATION: Double-stranded RNAs (dsRNAs) are potent triggers of innate immune responses upon recognition by cytosolic dsRNA sensor proteins. Identification of endogenous dsRNAs helps to better understand the dsRNAome and its relevance to innate immunity related to human diseases.
RESULTS: Here, we report dsRID (double-stranded RNA identifier), a machine-learning-based method to predict dsRNA regions in silico, leveraging the power of long-read RNA-sequencing (RNA-seq) and molecular traits of dsRNAs. Using models trained with PacBio long-read RNA-seq data derived from Alzheimer's disease (AD) brain, we show that our approach is highly accurate in predicting dsRNA regions in multiple datasets. Applied to an AD cohort sequenced by the ENCODE consortium, we characterize the global dsRNA profile with potentially distinct expression patterns between AD and controls. Together, we show that dsRID provides an effective approach to capture global dsRNA profiles using long-read RNA-seq data.
AVAILABILITY AND IMPLEMENTATION: Software implementation of dsRID, and genomic coordinates of regions predicted by dsRID in all samples are available at the GitHub repository: https://github.com/gxiaolab/dsRID
Virtual Tissue Expression Analysis
Motivation: Bulk RNA expression data is widely accessible, whereas single-cell data is relatively scarce in comparison. However, single-cell data offers profound insights into the cellular composition of tissues and cell type-specific gene regulation, both of which remain hidden in bulk expression analysis.
Results: Here, we present tissueResolver, an algorithm designed to extract single-cell information from bulk data, enabling us to attribute expression changes to individual cell types. When validated on simulated data, tissueResolver outperforms competing methods. Additionally, our study demonstrates that tissueResolver reveals cell type-specific regulatory distinctions between the activated B-cell-like (ABC) and germinal center B-cell-like (GCB) subtypes of diffuse large B-cell lymphomas (DLBCL).
Availability and implementation: R package available at https://github.com/spang-lab/tissueResolver. Code for reproducing the results of this paper is available at https://github.com/spang-lab/tissueResolver-docs1.
Supplementary material: Supplementary material and additional analyses are available online.
Scywalker: scalable end-to-end data analysis workflow for long-read single-cell transcriptome sequencing
Motivation: Existing nanopore single-cell data analysis tools showed severe limitations in handling current data sizes.
Results: We introduce scywalker, an innovative and scalable package developed to comprehensively analyze long-read sequencing data of full-length single-cell or single-nuclei cDNA. We developed novel scalable methods for cell barcode demultiplexing and single-cell isoform calling and quantification and incorporated these in an easily deployable package. Scywalker streamlines the entire analysis process, from sequenced fragments in FASTQ format to demultiplexed pseudobulk isoform counts, into a single command suitable for execution on either server or cluster. Scywalker includes data quality control, cell type identification, and an interactive report. Assessment of datasets from the human brain, Arabidopsis leaves, and previously benchmarked data from mixed cell lines demonstrates excellent correlation with short-read analyses at both the cell-barcoding and gene quantification levels. At the isoform level, we show that scywalker facilitates the direct identification of cell-type-specific expression of novel isoforms.
Availability and implementation: Scywalker is available on github.com/derijkp/scywalker under the GNU General Public License (GPL) and at https://zenodo.org/records/13359438/files/scywalker-0.108.0-Linux-x86_64.tar.gz
SpaCell: integrating tissue morphology and spatial gene expression to predict disease cells
Spatial transcriptomics technology is increasingly being applied because it enables the measurement of spatial gene expression in an intact tissue along with imaging morphology of the same tissue. However, current analysis methods for spatial transcriptomics data do not use image pixel information, thus missing the quantitative links between gene expression and tissue morphology. We developed a user-friendly deep learning software, SpaCell, to integrate millions of pixel intensity values with thousands of gene expression measurements from spatially-barcoded spots in a tissue. We show the integration approach outperforms the use of gene-count data alone or imaging data alone to build deep learning models to identify cell types or predict labels of tissue images with high resolution and accuracy. The SpaCell package is open source under an MIT license and it is available at https://github.com/BiomedicalMachineLearning/SpaCell. Supplementary data are available at Bioinformatics online.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
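The two counting schemes compared in this abstract can be sketched in a few lines. This is an illustrative assumption about how the counts might be computed, not the study's actual procedure: the function `cocitation_counts`, the author names, and the toy reference lists are hypothetical, and the sketch treats co-authors of the same cited work as co-cited with each other, which is one possible definitional choice.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations across citing papers.

    citing_papers: list of reference lists; each reference is a list of
        author names in byline order.
    max_authors: number of leading authors credited per cited work
        (1 = traditional first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the credited authors of this citing paper, once each.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of credited authors is co-cited once per paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical data: two citing papers, each citing two works.
papers = [
    [["Adams", "Baker"], ["Chen"]],
    [["Adams", "Diaz"], ["Chen", "Baker"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

On this toy input, first-author counting yields a single co-cited pair (Adams and Chen, twice), while counting up to five authors yields six pairs, showing how the second scheme brings non-first authors into the co-citation matrix.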
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Uncovering key transcription factors in breast cancer subtypes using matrix factorization
Breast cancer is the most common cancer type in women, and response to treatment varies immensely between subtypes. As of today, patients with Basal-like breast cancer lack targeted treatment, which leads to poor prognosis for this group. Other subtypes could also benefit from a more targeted treatment. The molecular characteristics of each subtype remain an active area of research, and transcription factors that drive the subtypes need to be investigated in order to provide potential targets for more effective treatments. The molecular characteristics of each breast cancer subtype were inferred from ATAC-seq and RNA-seq data from 70 breast cancer patients, using two different matrix factorization methods. The first analysis used non-negative matrix factorization (NMF) on two separate data sets: one for ATAC-seq data, and one for RNA-seq data. The samples were clustered into five groups, based on molecular patterns shared within the groups, for both data sets. The DNA regions that were specifically open for each group were investigated for enriched transcription factor binding sites. The same was done for the promoter regions of the genes that were highly expressed in each group. The Basal-like subtype achieved the most successful clustering, and transcription factors likely to drive this subtype were uncovered. Transcription factors responsible for driving a collective group of estrogen receptor-positive (ER+) subtypes were also uncovered. The second analysis used Multi-Omics Factor Analysis (MOFA) to integrate the ATAC-seq and RNA-seq data in one combined analysis. The main purpose of this analysis was to support the findings of the first analysis, and possibly improve the clustering. The integration of multi-omics data resulted in two clusters, separating the Basal-like subtype from the rest of the subtypes. The clustering was not improved. However, some of the key transcription factors found for each group supported the results of the NMF analysis.
