    TPpred3 detects and discriminates mitochondrial and chloroplastic targeting peptides in eukaryotic proteins

    Molecular recognition of N-terminal targeting peptides is the most common mechanism controlling the import of nuclear-encoded proteins into mitochondria and chloroplasts. When experimental information is lacking, computational methods can annotate targeting peptides and determine their cleavage sites, which helps characterize protein localization, function, and mature protein sequences. Discriminating mitochondrial from chloroplastic propeptides is particularly relevant when annotating the proteomes of photosynthetic eukaryotes, which are endowed with both types of sequences. Here, we introduce TPpred3, a computational method that, given any eukaryotic protein sequence, performs three tasks: (i) the detection of targeting peptides; (ii) their classification as mitochondrial or chloroplastic; and (iii) the precise localization of the cleavage sites in an organelle-specific framework. Our implementation builds on our previously introduced TPpred. We integrate a new N-to-1 Extreme Learning Machine specifically designed for the classification task (ii). For the last task, we introduce an organelle-specific Support Vector Machine that exploits sequence motifs retrieved with an extensive motif-discovery analysis of a large set of mitochondrial and chloroplastic proteins. We show that TPpred3 outperforms the state-of-the-art methods in all three tasks.

    INPS: predicting the impact of non-synonymous variations on protein stability from sequence

    Motivation: A tool for reliably predicting the impact of variations on protein stability is extremely important both for protein engineering and for understanding the effects of Mendelian and somatic mutations in the genome. Next Generation Sequencing studies are constantly increasing the number of known protein sequences. Given the huge disproportion between protein sequences and structures, there is a need for tools that annotate the effect of mutations starting from the protein sequence, without relying on the structure. Here, we describe INPS, a novel approach for annotating the effect of non-synonymous mutations on protein stability from sequence. INPS is based on SVM regression and is trained to predict the thermodynamic free energy change upon single-point variations in protein sequences. Results: We show that INPS performs similarly to the state-of-the-art methods based on protein structure when tested in cross-validation on a non-redundant dataset. INPS also performs very well on a newly generated dataset consisting of a number of variations occurring in the tumor suppressor protein p53. Our results suggest that INPS is suited for computing the effect of non-synonymous polymorphisms on protein stability when the protein structure is not available. We also show that INPS predictions are complementary to those of the state-of-the-art, structure-based method mCSM. When the two methods are combined, the overall prediction on the p53 set scores significantly higher than those of the single methods.

    Protein Sequence Annotation by Means of Community Detection

    In the postgenomic era, different electronic procedures are available for protein sequence annotation, the process of enriching any protein with structural and functional features after electronic translation from its corresponding gene or mRNA. The demand for reliable annotation systems is particularly urgent given the volume of genomic data produced daily by next generation sequencing machines. In this paper we present a procedure that enhances the annotation performance of the previously described Bologna Annotation Resource (BAR+). BAR is based on clustering of the graphs representing the similarity between a large number of protein sequences, and here we apply community detection algorithms to detect subclusters within any graph. When the cluster is endowed with specific Gene Ontology terms associated with both Biological Process and Molecular Function, our procedure allows a fine tuning of the annotation process and generates subclusters that group proteins sharing strictly related GO terms.

    SVMyr: A Web Server Detecting Co- and Post-translational Myristoylation in Proteins

    Myristoylation (MYR) is a protein modification where a myristoyl group is covalently attached to an N-terminal glycine residue, either during translation (co-translation) or after it (post-translation). Myristoylated proteins have a role in signal transduction, apoptosis, and pathogen-mediated processes, and their prediction can help in functionally annotating the fraction of proteins undergoing MYR in different proteomes. Here we present SVMyr, a web server allowing the detection of both co- and post-translational myristoylation sites, based on Support Vector Machines (SVM). The input encodes composition and physicochemical features of the octapeptides known to act as substrates and to physically interact with N-myristoyltransferases (NMTs), the enzymes catalyzing the myristoylation reaction. In cross validation, the method scores an Area Under the Curve (AUC) of 0.92 and a Matthews Correlation Coefficient (MCC) of 0.61. When benchmarked on an independent dataset including 88 experimentally detected medium/high-confidence co-translational myristoylation sites and 528 negative examples, SVMyr outperforms available methods, with AUC and MCC equal to 0.91 and 0.58, respectively. A unique feature of SVMyr is the ability to predict post-translational myristoylation sites by coupling the trained SVMs with the detection of caspase cleavage sites, identified by searching regular motifs matching upstream caspase cleavage sites, as reported in the literature. Finally, SVMyr confirms 96% of the UniProt set of electronically annotated myristoylated proteins (31,048) and identifies putative myristoylomes in eight different proteomes, also highlighting new putative NMT substrates. SVMyr is freely available through a user-friendly web server at https://busca.biocomp.unibo.it/lipipred. (c) 2022 The Author(s). Published by Elsevier Ltd.
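
The benchmark above is strongly imbalanced (88 positives against 528 negatives), which is why MCC is reported alongside AUC. The following is a minimal sketch of how MCC is computed from a binary confusion matrix, and of how plain accuracy can look good on such a dataset even for a useless predictor. The function names and the confusion-matrix counts are illustrative assumptions; only the class sizes come from the abstract, and none of the counts are results from SVMyr.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC of a binary confusion matrix (0.0 when it is undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # At least one row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the fraction of correctly classified examples."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical predictor that labels every example as negative.
# Class sizes (88 positives, 528 negatives) follow the abstract;
# the predictions themselves are invented for illustration.
all_negative = dict(tp=0, tn=528, fp=0, fn=88)

print("accuracy:", round(accuracy(**all_negative), 3))
print("MCC:", matthews_corrcoef(**all_negative))
```

In this toy case accuracy is high simply because negatives dominate, while MCC drops to zero, which is the behaviour that makes it a more informative summary for imbalanced benchmarks.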

    INPS-MD: A web server to predict stability of protein variants from sequence and structure

    Protein function depends on its structural stability. The effects of single point variations on protein stability can elucidate the molecular mechanisms of human diseases and help in developing new drugs. Recently, we introduced INPS, a method that predicts the effect of variations on protein stability from protein sequence and whose performance is competitive with the available state-of-the-art tools. RESULTS: In this article, we describe INPS-MD (Impact of Non-synonymous variations on Protein Stability-Multi-Dimension), a web server for the prediction of protein stability changes upon single point variation from protein sequence and/or structure. Here, we complement INPS with a new predictor (INPS3D) that exploits features derived from protein 3D structure. INPS3D achieves a Pearson's correlation with experimental ΔΔG values of 0.58 in cross validation and of 0.72 on a blind test set. The sequence-based INPS scores slightly lower than the structure-based INPS3D, and both compare well with the state-of-the-art methods on the same blind test sets. AVAILABILITY AND IMPLEMENTATION: INPS and INPS3D are available at the same web server: http://inpsmd.biocomp.unibo.i

    CoCoNat: A Deep Learning–Based Tool for the Prediction of Coiled-coil Domains in Protein Sequences

    Coiled-coil domains (CCDs) are structural motifs observed in proteins in all organisms that perform several crucial functions. The computational identification of CCD segments over a protein sequence is of great importance for its functional characterization. This task can essentially be divided into three separate steps: the detection of segment boundaries, the annotation of the heptad repeat pattern along the segment, and the classification of its oligomerization state. Several methods have been proposed over the years addressing one or more of these predictive steps. In this protocol, we illustrate how to make use of CoCoNat, a novel approach based on protein language models, to characterize CCDs. CoCoNat is, at its release (August 2023), the state of the art for CCD detection. The web server allows users to submit input protein sequences and visualize the predicted domains after a few minutes. Optionally, precomputed segments can be provided to the model, which will predict the oligomerization state for each of them. CoCoNat can be easily integrated into biological pipelines by downloading the standalone version, which provides a single executable script to produce the output

    A natural upper bound to the accuracy of predicting protein stability changes upon mutations

    Accurate prediction of protein stability changes upon single-site variations (ΔΔG) is important for protein design, as well as for our understanding of the mechanisms of genetic diseases. The performance of high-throughput computational methods to this end is evaluated mostly through the Pearson correlation coefficient between predicted and observed data, assuming that the upper bound would be 1 (perfect correlation). However, the performance of these predictors can be limited by the distribution and noise of the experimental data. Here we estimate, for the first time, a theoretical upper bound to the ΔΔG prediction performance imposed by the intrinsic structure of currently available ΔΔG data.
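
The idea that experimental noise caps the achievable Pearson correlation can be illustrated with a small simulation. This is a toy sketch, not the estimate derived in the paper: it assumes Gaussian true ΔΔG values with standard deviation `sigma_true`, independent Gaussian measurement noise with standard deviation `sigma_noise`, and a hypothetical predictor that returns the true value exactly. Under those assumptions the expected correlation is `sigma_true / sqrt(sigma_true**2 + sigma_noise**2)`, which is below 1 whenever noise is present. All names and numeric values are illustrative.

```python
import math
import random


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_perfect_predictor(sigma_true, sigma_noise, n=20000, seed=0):
    """Correlate noise-free 'predictions' with noisy simulated measurements."""
    rng = random.Random(seed)
    true_ddg = [rng.gauss(0.0, sigma_true) for _ in range(n)]
    measured = [t + rng.gauss(0.0, sigma_noise) for t in true_ddg]
    return pearson(true_ddg, measured)


# Assumed spreads, chosen only for illustration (arbitrary units).
sigma_true, sigma_noise = 1.0, 1.0
r = simulate_perfect_predictor(sigma_true, sigma_noise)
expected = sigma_true / math.sqrt(sigma_true**2 + sigma_noise**2)
print(f"simulated r = {r:.3f}, analytic value = {expected:.3f}")
```

Even a predictor with no error of its own cannot reach a correlation of 1 against noisy measurements in this model, which is the intuition behind evaluating methods against a data-dependent upper bound rather than against 1.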

    E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence

    Knowledge of the solvent accessibility of residues in a protein is essential for different applications, including the identification of interacting surfaces in protein-protein interactions and the characterization of variations. We describe E-pRSA, a novel web server to estimate Relative Solvent Accessibility values (RSAs) of residues directly from a protein sequence. The method exploits two complementary Protein Language Models to provide fast and accurate predictions. When benchmarked on different blind test sets, E-pRSA performs at the state of the art and outperforms a previous method we developed, DeepREx, which was based on sequence profiles derived from Multiple Sequence Alignments. The E-pRSA web server is freely available at https://e-prsa.biocomp.unibo.it/main/ where users can submit single-sequence and batch jobs.

    DeepSig is a software package and web server to predict signal peptides in proteins

    The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization. Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best-performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification. DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it

    ISPRED4 is a web server based on machine-learning for the prediction of protein-protein interaction sites in protein structures

    The identification of protein-protein interaction (PPI) sites is an important step towards characterizing how proteins are functionally integrated in the complexity of the cell. Experimental methods are costly and time-consuming, and computational tools for predicting PPI sites can fill gaps in present knowledge of PPIs. We present ISPRED4, an improved structure-based predictor of PPI sites on unbound monomer surfaces. ISPRED4 relies on machine-learning methods and incorporates features extracted from protein sequence and structure. Cross-validation experiments carried out on a new dataset of 151 high-resolution protein complexes indicate that ISPRED4 achieves a per-residue Matthews Correlation Coefficient of 0.48 and an overall accuracy of 0.85. Benchmarking results show that ISPRED4 is one of the top-performing PPI site predictors developed so far. The web server is available at https://ispred4.biocomp.unibo.i