Applying large scale metanalysis of transcriptomic data to uncover hyper-responsive genes and prediction via machine learning
With the increasing adoption of high-throughput transcriptomic technologies, there have been efforts to leverage pre-existing datasets to improve the biological interpretability of bulk transcriptomic data analyses. Researchers have observed that genes exhibit markedly different responsiveness to perturbation, i.e. a subset of genes is more likely to exhibit large changes in expression. This thesis expands on this area by developing a novel approach to bulk transcriptomic data analysis which leverages pre-existing datasets. Additionally, this thesis showcases a novel approach to feature selection for scRNA-seq datasets which also leverages pre-existing transcriptomic data to improve the clustering of annotated scRNA-seq datasets. Finally, this thesis utilises machine learning models to identify the key genomic and transcript-based features of these genes that explain the differences in genes’ responsiveness to perturbation.
Datasets in support of the Southampton doctoral thesis 'Applying large scale metanalysis of transcriptomic data to uncover hyper-responsive genes and prediction via machine learning'
The SQLite databases contain the outputs from the large scale analysis of pre-existing RNA-seq and microarray datasets performed in chapter 2. Both SQLite databases contain the outputs of limma, a package used to perform differentially expressed gene (DEG) analysis, applied to the datasets from Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo/. The schema for both databases is as follows: the data table contains the outputs and statistics from limma, and the meta table contains metadata about the number of treated and control samples, the type of experiment conducted and the tissue used. These datasets were used to derive the priors used in chapters 3 to 5, based on the proportion of datasets wherein a given gene is identified as differentially expressed, i.e. has a p-value below 0.05. Due to the size of the file, this is only available on request; please use https://library.soton.ac.uk/datarequest
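As a rough illustration of how such a prior could be derived, the sketch below builds a tiny in-memory SQLite database and computes, per gene, the proportion of datasets in which its p-value falls below 0.05. The column names (dataset_id, gene, p_value), the function name gene_priors and the toy rows are assumptions made for illustration only; the actual schema of the thesis databases may differ, so the readme should be consulted first.

```python
import sqlite3

# Hypothetical miniature of the "data" table; the real column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (dataset_id TEXT, gene TEXT, p_value REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("GSE_A", "GENE1", 0.01),
        ("GSE_B", "GENE1", 0.20),
        ("GSE_C", "GENE1", 0.03),
        ("GSE_D", "GENE1", 0.04),
        ("GSE_A", "GENE2", 0.50),
        ("GSE_B", "GENE2", 0.02),
        ("GSE_C", "GENE2", 0.70),
        ("GSE_D", "GENE2", 0.90),
    ],
)


def gene_priors(connection, alpha=0.05):
    """Return {gene: fraction of datasets with p_value < alpha}."""
    query = """
        SELECT gene, AVG(CASE WHEN p_value < ? THEN 1.0 ELSE 0.0 END)
        FROM data
        GROUP BY gene
    """
    return dict(connection.execute(query, (alpha,)).fetchall())


priors = gene_priors(conn)
print(priors)
```

In this toy data GENE1 is significant in three of four datasets and GENE2 in one of four, so GENE1 receives the larger prior.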
The machine_learning_input.csv file is a comma-delimited file containing the genomic and transcript-based features used to predict a gene's prior in the machine learning models.
For more information see the readme file.
The RNK files are tab-delimited files. The first column of each .RNK file is the gene, whilst the second is the rank from 1 to 0. These files were used to assess the enrichment of desired DEGs across 22 perturbation studies in chapter 2 using GSEA: https://www.gsea-msigdb.org/gsea/index.jsp. A value of 1 represents the top-ranked, highest-priority gene, whilst 0 represents the lowest priority for a given gene.
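A minimal sketch of reading such a file is given below. It assumes two tab-separated columns and no header row, which should be checked against the real files; the gene names, the values and the helper read_rnk are made up for illustration.

```python
import csv
import io

# Toy stand-in for a .RNK file: gene <TAB> value between 1 and 0.
# Assumes no header row; gene names are hypothetical.
rnk_text = "GENE_B\t0.5\nGENE_A\t1.0\nGENE_C\t0.0\n"


def read_rnk(handle):
    """Return (gene, value) pairs ordered from highest to lowest priority."""
    rows = [(gene, float(value)) for gene, value in csv.reader(handle, delimiter="\t")]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


ranked = read_rnk(io.StringIO(rnk_text))
print(ranked)
```

For a real file, the io.StringIO object would be replaced by an open file handle.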
The .RDS images are the R images used for the novel GEOreflect approach for ranking DEGs in bulk transcriptomic data developed in chapter 3. They are also needed to run the RShiny application used to showcase the method, the code for which can be found on GitHub (https://github.com/brandoncoke/GEOreflect) as well as in GEOreflect_bulk_DEG_analysis.tar. The .RDS files require R and the readRDS() function to load into the environment, and contain the percentile matrices used to calculate a platform p-value rank. Within the GEOreflect_bulk_DEG_analysis.tar file is an R script, GEOreflect_functions.R, which, when sourced after loading one of the .RDS images into the R environment, enables the user to perform the GEOreflect method. For bulk RNA-seq transcriptomic datasets, load the percentile_matrix_p_value_RNAseq.RDS image. Alternatively, when analysing GPL570 microarray datasets, the percentile_matrix.RDS file needs to be loaded into the R environment and the appropriate R function then needs to be applied to the DEG list. To run the RShiny application, ensure both .RDS files are in the directory with the app.R file, i.e. after using git clone https://github.com/brandoncoke/GEOreflect, move both .RDS files into the GEOreflect directory of the cloned repository.
The csv files with scRNA-seq appended to their names contain the normalised mutual information, adjusted Rand index and Silhouette coefficient obtained when using 6 single cell RNA-sequencing techniques: GEOreflect, Seurat's vst method, CellBRF, genebasis and CellBRF with the 3 sigma rule imposed. This analysis was carried out in chapter 3. These .csvs use their GEO identifier in the file name or, for Zheng et al.'s data from 10x Genomics, the name assigned to it via the DuoClustering2018 R package.
The machine_learning_input.csv file is a comma-delimited file containing the genomic and transcript-based features used to predict a gene's prior in the machine learning models. The inputs from this file were used to develop the machine learning models used in chapter 5. In the first row, gene is the HGNC identifier for the genes, whilst the min_to_be_sig column represents a gene's CDF value at 0.05 for its p-value distribution obtained from the RNA-seq datasets, i.e. the target y for the regressor model. The sd column is unused and was only relevant when calculating the priors using GPL570 microarray data, where there can be redundant probes resulting in multiple priors for the same gene. In that case, this column would represent the standard deviation.
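The sketch below shows one way the file could be split into a feature matrix and the regression target. Only the gene, min_to_be_sig and sd columns come from the description above; the feature columns gc_content and transcript_length, the toy values and the helper split_features_target are hypothetical placeholders.

```python
import csv
import io

# Toy stand-in for machine_learning_input.csv.
# gc_content and transcript_length are hypothetical feature columns.
csv_text = (
    "gene,gc_content,transcript_length,min_to_be_sig,sd\n"
    "GENE1,0.41,2100,0.30,\n"
    "GENE2,0.55,980,0.12,\n"
)


def split_features_target(handle, target="min_to_be_sig", ignore=("gene", "sd")):
    """Return gene names, feature rows X and target values y."""
    genes, X, y = [], [], []
    for row in csv.DictReader(handle):
        genes.append(row["gene"])
        y.append(float(row[target]))
        # Every remaining column is treated as a numeric feature.
        X.append([float(v) for k, v in row.items() if k != target and k not in ignore])
    return genes, X, y


genes, X, y = split_features_target(io.StringIO(csv_text))
print(genes, X, y)
```

The unused sd column is dropped alongside the gene identifier, so X holds only predictor values and y holds the min_to_be_sig target.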
Knockdown proteomics reveals USP7 as a regulator of cell-cell adhesion in colorectal cancer via AJUBA
Ubiquitin-specific protease 7 (USP7) is implicated in many cancers including colorectal cancer in which it regulates cellular pathways such as Wnt signalling and the P53-MDM2 pathway. With the discovery of small-molecule inhibitors, USP7 has also become a promising target for cancer therapy, and therefore systematically identifying USP7 deubiquitinase interaction partners and substrates has become an important goal. In this study, we selected a colorectal cancer cell model that is highly dependent on USP7 and in which USP7 knockdown significantly inhibited colorectal cancer cell viability, colony formation, and cell-cell adhesion. We then used inducible knockdown of USP7 followed by LC-MS/MS to quantify USP7 dependent proteins. We identified the Ajuba LIM domain protein as an interacting partner of USP7 through co-IP, its substantially reduced protein levels in response to USP7 knockdown, and its sensitivity to the specific USP7 inhibitor FT671. The Ajuba protein has been shown to have oncogenic functions in colorectal and other tumours, including regulation of cell-cell adhesion. We show that knockdown of either USP7 or Ajuba results in a substantial reduction of cell-cell adhesion, with concomitant effects on other proteins associated with adherens junctions. Our findings underline the role of USP7 in colorectal cancer through its protein interaction networks and show that the Ajuba protein is a component of USP7 protein networks present in colorectal cancer.
Multi-omics analysis reveals key immunogenic signatures induced by oncolytic Zika virus infection of paediatric brain tumour cells
Brain tumours disproportionately affect children and are the largest cause of paediatric cancer-related death. Novel therapies that engage the immune system, such as oncolytic viruses (OVs), hold great promise and are desperately needed. Zika virus (ZIKV) infects and destroys aggressive cells from multiple paediatric central nervous system (CNS) tumours. Despite this, the molecular mechanisms underpinning this response are largely unknown. We comprehensively investigate the transcriptomic response of paediatric medulloblastoma and atypical teratoid rhabdoid tumour (ATRT) cells to ZIKV infection. We observe conserved TNF signalling and cytokine signalling-related signatures and show that the TNF-alpha signalling pathway is implicated in oncolysis by reducing the viability of ZIKV-infected brain tumour cells. Our findings highlight TNF-alpha as a potential prognostic marker for oncolytic ZIKV (oZIKV) therapy. Complementing our analysis with a 49-plex ELISA, we demonstrate that ZIKV infection induces a clinically relevant and diverse pro-inflammatory brain tumour cell secretome, including TNF-alpha. We assess publicly available scRNA-Seq data to model how ZIKV-induced secretome paracrine and endocrine signalling may orchestrate the anti-tumoural immune response during oZIKV infection of brain tumours. Our findings significantly contribute to understanding the molecular mechanisms governing oZIKV infection and will help pave the way towards oZIKV therapy.
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type that only counts the first one among a cited work's authors on the one hand, and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
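The two counting schemes compared in this abstract can be sketched as follows. This is an illustrative simplification, not the study's procedure: the author names and the function cocitation_counts are made up, and authors of the same cited work are also counted as co-cited here, which is only one possible convention.

```python
from collections import Counter
from itertools import combinations

# Each citing paper is a list of cited works; each cited work is its
# ordered author list. All names are made up for illustration.
citing_papers = [
    [["Smith", "Jones"], ["Lee", "Kim", "Park"]],
    [["Smith", "Jones"], ["Kim", "Lee"]],
]


def cocitation_counts(papers, max_authors=1):
    """Count author pairs cited together, using the first max_authors of each work."""
    counts = Counter()
    for cited_works in papers:
        authors = set()
        for work in cited_works:
            authors.update(work[:max_authors])
        # Each unordered pair is counted once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


first_author = cocitation_counts(citing_papers, max_authors=1)
first_five = cocitation_counts(citing_papers, max_authors=5)
print(first_author)
print(first_five)
```

With first-author counting, Lee and Kim never appear together because each is a first author in a different citing paper; counting the first five authors links them in both papers.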
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance. Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
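The abstract does not state the specific objection to the Pearson correlation. One property often raised in this debate is that padding two cocitation profiles with shared zeros (authors cocited with neither) changes the Pearson correlation but leaves the cosine untouched. The sketch below illustrates that property on made-up profiles; it is not the example used in the paper.

```python
import math


def cosine(x, y):
    """Cosine similarity between two equal-length profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical cocitation profiles of two authors.
a = [4, 0, 2, 1]
b = [0, 3, 1, 2]

# The same profiles after adding six authors cocited with neither.
a_padded = a + [0] * 6
b_padded = b + [0] * 6

print(cosine(a, b), cosine(a_padded, b_padded))
print(pearson(a, b), pearson(a_padded, b_padded))
```

The cosine is identical before and after padding, whereas the Pearson correlation moves from strongly negative to near zero.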
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.
