Single-cell generalized trend model (scGTM): a flexible and interpretable model of gene expression trend along cell pseudotime.

Author: Weng Kee Wong
Elvis Han Cui
Jingyi Jessica Li
Dongyuan Song
Publication date: 27/06/2022

Motivation: Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. Most existing methods rely on nonparametric regression models for their flexibility; however, nonparametric models often provide trends too complex to interpret. Other existing methods use interpretable but restrictive models. Since model interpretability and flexibility are both indispensable for understanding biological processes, the single-cell field needs a model that improves the interpretability and largely maintains the flexibility of nonparametric regression models. Results: Here, we propose the single-cell generalized trend model (scGTM) for capturing a gene's expression trend, which may be monotone, hill-shaped or valley-shaped, along cell pseudotime. The scGTM has three advantages: (i) it can capture non-monotonic trends that are easy to interpret, (ii) its parameters are biologically interpretable and trend informative, and (iii) it can flexibly accommodate common distributions for modeling gene expression counts. To tackle the complex optimization problems, we use the particle swarm optimization algorithm to find the constrained maximum likelihood estimates for the scGTM parameters. As an application, we analyze several single-cell gene expression datasets using the scGTM and show that scGTM can capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying biological processes. Availability and implementation: The Python package scGTM is open-access and available at https://github.com/ElvisCuiHan/scGTM. Supplementary information: Supplementary data are available at Bioinformatics online.

Crossref

eScholarship - University of California

PyWGCNA: a Python package for weighted gene co-expression network analysis

Author: Farilie Reese
Narges Rezaie
Ali Mortazavi
Publication date: 01/07/2023

MOTIVATION: Weighted gene co-expression network analysis (WGCNA) is frequently used to identify modules of genes that are co-expressed across many RNA-seq samples. However, the current R implementation is slow, is not designed to compare modules between multiple WGCNA networks, and its results can be hard to interpret and to visualize. We introduce the PyWGCNA Python package, which is designed to identify co-expression modules from large RNA-seq datasets. PyWGCNA has a faster implementation than the R version of WGCNA and several additional downstream analysis modules for functional enrichment analysis using GO, KEGG, and REACTOME, inter-module analysis of protein-protein interactions, as well as comparison of multiple co-expression modules to each other and/or to external lists of genes such as marker genes from single-cell data. RESULTS: We apply PyWGCNA to two distinct datasets of brain bulk RNA-seq from MODEL-AD to identify modules associated with the genotypes. We compare the resulting modules to each other to find shared co-expression signatures in the form of modules with significant overlap across the datasets. AVAILABILITY AND IMPLEMENTATION: The PyWGCNA library for Python 3 is available on PyPI at pypi.org/project/PyWGCNA and on GitHub at github.com/mortazavilab/PyWGCNA. The data underlying this article are available in GitHub at github.com/mortazavilab/PyWGCNA/tutorials/5xFAD_paper.
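
The weighted co-expression idea behind WGCNA can be illustrated with a minimal sketch. This is not the PyWGCNA API: the function name soft_threshold_adjacency, the toy data, and the soft-thresholding power beta=6 are assumptions chosen for illustration. The sketch builds an unsigned adjacency matrix by raising absolute Pearson correlations between genes to the power beta, which keeps strong correlations and shrinks weak ones toward zero.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency |cor(gene_i, gene_j)| ** beta.

    expr is a samples x genes array; beta is an assumed soft-thresholding power.
    """
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlation
    adj = np.abs(corr) ** beta              # soft thresholding
    np.fill_diagonal(adj, 0.0)              # ignore self-connections
    return adj

# Toy data (hypothetical): genes 0 and 1 share a signal, gene 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
expr = np.column_stack([
    base + 0.1 * rng.normal(size=50),
    base + 0.1 * rng.normal(size=50),
    rng.normal(size=50),
])
adj = soft_threshold_adjacency(expr)
print(np.round(adj, 3))
```

In a full analysis, modules would then be found by clustering genes on a dissimilarity derived from this adjacency; that step is omitted here.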

Crossref

eScholarship - University of California

dsRID: in silico identification of dsRNA regions using long-read RNA-seq data

Author: Zhiheng Liu
Mudra Choudhury
Ryo Yamamoto
Xinshu Xiao
Publication date: 23/10/2023

MOTIVATION: Double-stranded RNAs (dsRNAs) are potent triggers of innate immune responses upon recognition by cytosolic dsRNA sensor proteins. Identification of endogenous dsRNAs helps to better understand the dsRNAome and its relevance to innate immunity related to human diseases. RESULTS: Here, we report dsRID (double-stranded RNA identifier), a machine-learning-based method to predict dsRNA regions in silico, leveraging the power of long-read RNA-sequencing (RNA-seq) and molecular traits of dsRNAs. Using models trained with PacBio long-read RNA-seq data derived from Alzheimer's disease (AD) brain, we show that our approach is highly accurate in predicting dsRNA regions in multiple datasets. Applied to an AD cohort sequenced by the ENCODE consortium, we characterize the global dsRNA profile with potentially distinct expression patterns between AD and controls. Together, we show that dsRID provides an effective approach to capture global dsRNA profiles using long-read RNA-seq data. AVAILABILITY AND IMPLEMENTATION: Software implementation of dsRID, and genomic coordinates of regions predicted by dsRID in all samples are available at the GitHub repository: https://github.com/gxiaolab/dsRID

Crossref

eScholarship - University of California

Virtual Tissue Expression Analysis

Author: Altenbuchinger Michael
Schön Marian
Spang Rainer
Hüttl Paul
Simeth Jakob
Nozari Zahra
Huttner Michael
Mathelier Anthony
Schmidt Tobias
Publication date: 01/01/2024

Motivation: Bulk RNA expression data is widely accessible, whereas single-cell data is relatively scarce in comparison. However, single-cell data offers profound insights into the cellular composition of tissues and cell type-specific gene regulation, both of which remain hidden in bulk expression analysis. Results: Here, we present tissueResolver, an algorithm designed to extract single-cell information from bulk data, enabling us to attribute expression changes to individual cell types. When validated on simulated data, tissueResolver outperforms competing methods. Additionally, our study demonstrates that tissueResolver reveals cell type-specific regulatory distinctions between the activated B-cell-like (ABC) and germinal center B-cell-like (GCB) subtypes of diffuse large B-cell lymphomas (DLBCL). Availability and implementation: R package available at https://github.com/spang-lab/tissueResolver. Code for reproducing the results of this paper is available at https://github.com/spang-lab/tissueResolver-docs1. Supplementary material: Supplementary material and additional analyses available online.

GRO.publications

GRO.publications (Univ. Göttingen)

Scywalker: scalable end-to-end data analysis workflow for long-read single-cell transcriptome sequencing

Author: Rademakers Rosa
Duchateau Lena
Van Breusegem Frank
Willems Patrick
Van Dongen Jasper
Seyfferth Carolin
Eekhout Thomas
De Coster Wouter
De Rijk Peter
Joris Geert
Strazisar Mojca
Rombauts Stephane
Sleegers Kristel
Watzeels Tijs
De Rybel Bert
Küçükali Fahri
De Deyn Lara
De Pooter Tim
Faura Júlia
Publication date: 01/01/2024

Motivation: Existing nanopore single-cell data analysis tools showed severe limitations in handling current data sizes. Results: We introduce scywalker, an innovative and scalable package developed to comprehensively analyze long-read sequencing data of full-length single-cell or single-nuclei cDNA. We developed novel scalable methods for cell barcode demultiplexing and single-cell isoform calling and quantification and incorporated these in an easily deployable package. Scywalker streamlines the entire analysis process, from sequenced fragments in FASTQ format to demultiplexed pseudobulk isoform counts, into a single command suitable for execution on either server or cluster. Scywalker includes data quality control, cell type identification, and an interactive report. Assessment of datasets from the human brain, Arabidopsis leaves, and previously benchmarked data from mixed cell lines demonstrates excellent correlation with short-read analyses at both the cell-barcoding and gene quantification levels. At the isoform level, we show that scywalker facilitates the direct identification of cell-type-specific expression of novel isoforms. Availability and implementation: Scywalker is available on github.com/derijkp/scywalker under the GNU General Public License (GPL) and at https://zenodo.org/records/13359438/files/scywalker-0.108.0-Linux-x86_64.tar.gz

Ghent University Academic Bibliography

SpaCell: integrating tissue morphology and spatial gene expression to predict disease cells

Author: Xiao Tan
Minh Tran
Quan Nguyen
Andrew Su
Publication date: 01/12/2019

Spatial transcriptomics technology is increasingly being applied because it enables the measurement of spatial gene expression in an intact tissue along with imaging morphology of the same tissue. However, current analysis methods for spatial transcriptomics data do not use image pixel information, thus missing the quantitative links between gene expression and tissue morphology. We developed a user-friendly deep learning software, SpaCell, to integrate millions of pixel intensity values with thousands of gene expression measurements from spatially-barcoded spots in a tissue. We show the integration approach outperforms the use of gene-count data alone or imaging data alone to build deep learning models to identify cell types or predict labels of tissue images with high resolution and accuracy. The SpaCell package is open source under an MIT license and it is available at https://github.com/BiomedicalMachineLearning/SpaCell. Supplementary data are available at Bioinformatics online.

Crossref

UQ eSpace (University of Queensland)

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
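
The two counting schemes compared in this abstract can be sketched in a few lines. This is an illustrative sketch only, not the study's actual procedure: the function name cocitation_counts, the max_authors parameter, and the author names are hypothetical. It also assumes that co-authors of the same cited work count as co-cited under the multi-author scheme, which is a design choice the abstract does not specify.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_papers, max_authors=1):
    """Count author pairs cited together by the same citing paper.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 gives first-author counting,
    max_authors=5 considers the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts

# Hypothetical citing papers, each with two cited works.
citing_papers = [
    [["Adams", "Baker"], ["Clark"]],
    [["Adams", "Davis"], ["Clark", "Baker"]],
]
first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(first_author)
print(first_five)
```

On this toy input, first-author counting sees only the Adams and Clark pair, while counting up to five authors also brings Baker and Davis into the co-citation data.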

E-LIS

Variations on the Author

Author: Sayad Cecilia
Publication date: 01/01/2016

“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

Crossref

Kent Academic Repository

Uncovering key transcription factors in breast cancer subtypes using matrix factorization

Author: Klokkerud Solveig Margrete Knoph
Publication date: 01/01/2020

Breast cancer is the most common cancer type in women, and response to treatment varies immensely between subtypes. As of today, patients with Basal-like breast cancer lack targeted treatment, which leads to poor prognosis for this group. Other subtypes could also benefit from more targeted treatment. The molecular characteristics of each subtype remain an active area of research, and transcription factors that drive the subtypes need to be investigated in order to provide potential targets for more effective treatments. The molecular characteristics of each breast cancer subtype were inferred from ATAC-seq and RNA-seq data from 70 breast cancer patients, using two different matrix factorization methods. The first analysis used non-negative matrix factorization (NMF) on two separate data sets: one for ATAC-seq data, and one for RNA-seq data. The samples were clustered into five groups, based on molecular patterns shared within the groups, for both data sets. The DNA regions that were specifically open for each group were investigated for enriched transcription factor binding sites. The same was done for the promoter regions of the genes that were highly expressed in each group. The Basal-like subtype achieved the most successful clustering, and transcription factors likely to drive this subtype were uncovered. Transcription factors responsible for driving a collective group of estrogen receptor-positive (ER+) subtypes were also uncovered. The second analysis used Multi-Omics Factor Analysis (MOFA) to integrate the ATAC-seq and RNA-seq data in one combined analysis. The main purpose of this analysis was to support the findings of the first analysis, and possibly improve the clustering. The integration of multi-omics data resulted in two clusters, separating the Basal-like subtype from the rest of the subtypes. The clustering was not improved. However, some of the key transcription factors found for each group supported the results of the NMF analysis.
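
As a rough illustration of how NMF can group samples, the sketch below factorizes a small non-negative features x samples matrix with the standard multiplicative update rules and assigns each sample to its dominant factor. It is not the thesis's pipeline: the function name nmf, the toy matrix, the rank of 2 (the abstract uses five groups), and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate V (features x samples, non-negative) as W @ H.

    Uses multiplicative updates that minimize the Frobenius norm of V - W @ H.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy matrix (hypothetical): two groups of samples with distinct active features.
rng = np.random.default_rng(1)
V = 0.1 * rng.random((6, 6))
V[:3, :3] += 5.0   # features 0-2 are high in samples 0-2
V[3:, 3:] += 5.0   # features 3-5 are high in samples 3-5

W, H = nmf(V, rank=2)
labels = H.argmax(axis=0)  # cluster label = factor with the largest weight per sample
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(labels, round(float(rel_error), 4))
```

The columns of W play the role of group-specific molecular patterns, and the rows of H give each sample's weight on those patterns.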

NORA - Norwegian Open Research Archives