    ProSeqViewer: an interactive, responsive and efficient TypeScript library for visualization of sequences and alignments in web applications

    Summary: Biological data is ever-increasing in amount and complexity. The mapping of this data to biological entities such as nucleotide and amino acid sequences supports biological data analysis, classification and prediction. Sequence alignments and comparisons allow the transfer of knowledge to evolutionarily related entities, the mapping of functional domains, and the identification of binding and modification sites. To support these types of studies we developed ProSeqViewer, a tool to visualize annotations on single sequences and multiple sequence alignments. This multifunctional library was developed as a modular component that can be integrated into static or dynamic web resources, and it supports intuitive visualization of sequence features. ProSeqViewer is extremely lightweight, fast, interactive, dynamic and responsive, and it works at any screen size. It generates pure HTML, which is compatible with any browser and operating system. ProSeqViewer can exchange events with other visualization components and is already used by multiple biological databases. Availability and implementation: ProSeqViewer is an open-source TypeScript library compatible with state-of-the-art website environments. The source code and extensive documentation, including use cases, are available from the URL: https://github.com/BioComputingUP/ProSeqViewer

    INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity

    Identifying protein functions is useful for numerous applications in biology. The prediction of Gene Ontology (GO) functional terms from sequence remains, however, a challenging task, as shown by the recent CAFA experiments. Here we present INGA, a web server developed to predict protein function from a combination of three orthogonal approaches. Sequence similarity and domain architecture searches are combined with protein-protein interaction network data to derive consensus predictions for GO terms using functional enrichment. The INGA server can be queried both programmatically through RESTful services and through a web interface designed for usability. The web interface supports each GO term prediction by showing the annotating sequences. INGA was validated on the CAFA-1 data set and was recently shown to perform consistently well in the CAFA-2 blind test. The INGA web server is available from URL: http://protein.bio.unipd.it/inga
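    The abstract states that evidence is combined "using functional enrichment" but does not give the statistic. A common choice for GO term enrichment is the one-sided hypergeometric test, so the sketch below assumes that test: it asks how surprising it is that k of n related proteins (for example interaction partners or sequence hits) carry a term that K of N background proteins carry. The term labels, background counts and the `alpha` cut-off are hypothetical, and this is not INGA's actual implementation.

```python
from collections import Counter
from math import comb
from typing import Dict, List, Set, Tuple


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric probability P(X >= k).

    N: background proteins, K: background proteins annotated with the term,
    n: proteins in the sample (e.g. interaction partners or sequence hits),
    k: sample proteins annotated with the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(
    sample: List[Set[str]],
    background_counts: Dict[str, int],
    background_size: int,
    alpha: float = 0.01,  # hypothetical significance cut-off
) -> List[Tuple[str, float]]:
    """Return (term, p-value) pairs with p <= alpha, most significant first."""
    n = len(sample)
    counts = Counter(term for terms in sample for term in terms)
    results = []
    for term, k in counts.items():
        p = enrichment_p_value(k, n, background_counts[term], background_size)
        if p <= alpha:
            results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Five hypothetical interaction partners of a query protein.
    partners = [{"term_A"}, {"term_A"}, {"term_A", "term_B"}, {"term_A"}, {"term_A"}]
    background = {"term_A": 10, "term_B": 500}
    print(enriched_terms(partners, background, background_size=1000))
```

    Here the rare term_A, shared by all five partners, passes the cut-off, while the common term_B, seen in only one partner, does not.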

    MobiDB-lite: Fast and highly specific consensus prediction of intrinsic disorder in proteins

    Intrinsic disorder (ID) is established as an important feature of protein sequences. Its use in proteome annotation is, however, hampered by the availability of many methods with similar performance at the single-residue level, most of which have not been optimized to predict long ID regions of a size comparable to domains. Here, we focus on providing a single consensus-based prediction, MobiDB-lite, optimized for highly specific (i.e., few false positives) predictions of long disorder. The method uses eight different predictors to derive a consensus, which is then filtered to remove spurious short predictions. Consensus prediction is shown to outperform the single methods when annotating long ID regions. MobiDB-lite can be useful in large-scale annotation scenarios and has already been integrated into the MobiDB, DisProt and InterPro databases.
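    The two stages described above, a consensus over eight predictors followed by removal of short regions, can be sketched as follows. The abstract does not give the exact consensus rule or length cut-off, so the sketch assumes a simple per-residue agreement vote (`AGREEMENT`, here 5 of 8 predictors) and a minimum region length (`MIN_LENGTH`, here 20 residues); both values and all function names are placeholders, not necessarily MobiDB-lite's published parameters.

```python
from typing import List, Tuple

# Hypothetical parameters: agreement of at least 5 of 8 predictors and a
# minimum region length of 20 residues.
AGREEMENT = 0.625
MIN_LENGTH = 20


def consensus(predictions: List[List[bool]], agreement: float = AGREEMENT) -> List[bool]:
    """Mark a residue disordered when enough predictors agree."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictors must cover the same sequence")
    n = len(predictions)
    return [sum(p[i] for p in predictions) / n >= agreement for i in range(length)]


def regions(mask: List[bool]) -> List[Tuple[int, int]]:
    """Return 0-based, inclusive (start, end) runs of True values."""
    runs, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(mask) - 1))
    return runs


def drop_short(mask: List[bool], min_length: int = MIN_LENGTH) -> List[bool]:
    """Remove consensus regions shorter than min_length."""
    kept = [False] * len(mask)
    for start, end in regions(mask):
        if end - start + 1 >= min_length:
            kept[start:end + 1] = [True] * (end - start + 1)
    return kept


if __name__ == "__main__":
    # Six of eight toy predictors call residues 0-24 disordered.
    preds = [[True] * 25 + [False] * 5 for _ in range(6)] + [[False] * 30 for _ in range(2)]
    print(regions(drop_short(consensus(preds))))  # [(0, 24)]
```

    Filtering after the vote, rather than before, means a long region survives even if individual predictors fragment it, which matches the stated goal of few false positives for long disorder.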

    SODA: prediction of protein solubility from disorder and aggregation propensity

    Solubility is an important, albeit not well understood, feature determining protein behavior. It is of paramount importance in protein engineering, where similarly folded proteins may behave in very different ways in solution. Here we present SODA, a novel method to predict changes in protein solubility based on several physico-chemical properties of the protein. SODA uses the aggregation propensity and intrinsic disorder of the protein sequence, plus hydrophobicity and secondary structure preferences, to estimate changes in solubility. It has been trained and benchmarked on two different datasets. Comparison with other recently published methods shows that SODA has state-of-the-art performance and is particularly well suited to predicting mutations that decrease solubility. The method is fast, returning results for single mutations in seconds. A usage example estimating the full repertoire of mutations for a human germline antibody highlights several solubility hotspots on the surface. The web server, complete with a RESTful interface and extensive help, can be accessed from URL: http://protein.bio.unipd.it/soda
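    SODA's score combines aggregation propensity, disorder, hydrophobicity and secondary-structure preferences, and the abstract does not give the formula. The sketch below isolates only one ingredient, comparing a wild-type and a mutant sequence profile for a single substitution, and uses the Kyte-Doolittle hydropathy scale as a stand-in for the hydrophobicity term. `solubility_change_proxy`, the window size and the sign convention are assumptions for illustration, not SODA's scoring function.

```python
from typing import List

# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> List[float]:
    """Sliding-window mean hydropathy, with windows truncated at the sequence ends."""
    half = window // 2
    profile = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        profile.append(sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment))
    return profile


def solubility_change_proxy(seq: str, pos: int, new_aa: str, window: int = 7) -> float:
    """Hypothetical proxy: minus the summed hydropathy increase caused by a
    single substitution. Negative values flag a likely loss of solubility."""
    if not 0 <= pos < len(seq) or new_aa not in KYTE_DOOLITTLE:
        raise ValueError("invalid position or residue")
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    wild_profile = hydropathy_profile(seq, window)
    mutant_profile = hydropathy_profile(mutant, window)
    return -sum(m - w for m, w in zip(mutant_profile, wild_profile))


if __name__ == "__main__":
    seq = "MKTDEESGG"
    print(solubility_change_proxy(seq, 3, "L"))  # negative: D -> L is more hydrophobic
    print(solubility_change_proxy(seq, 3, "K"))  # positive: D -> K is less hydrophobic
```

    Scanning every position with every residue in this way mirrors, in spirit only, the full-repertoire mutation scan described for the antibody example.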

    Protein Sequence Annotation by Means of Community Detection

    In the postgenomic era, different electronic procedures are available for protein sequence annotation, the process of enriching any protein, after electronic translation from its corresponding gene or mRNA, with structural and functional features. The demand for reliable annotation systems is particularly urgent given the volume of genomic data produced daily by next-generation sequencing machines. In this paper we present a procedure that enhances the annotation performance of the previously described Bologna Annotation Resource (BAR+). BAR is based on clustering of the graphs representing the similarity between a large number of protein sequences, and here we apply community detection algorithms to detect subclusters within any graph. When a cluster is endowed with specific Gene Ontology terms associated with both Biological Process and Molecular Function, our procedure allows fine tuning of the annotation process and generates subclusters that group proteins sharing strictly related GO terms.
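    The abstract does not say which community detection algorithm is applied to the similarity graphs. As a stand-in, the sketch below uses weighted asynchronous label propagation, one common community detection method, on a toy similarity graph; the protein names, edge weights and the `max_iter` cap are hypothetical. Nodes are visited in a fixed order and ties are broken by the smallest label, so the result is deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (protein_a, protein_b, similarity weight)


def label_propagation(edges: List[Edge], max_iter: int = 100) -> Dict[str, str]:
    """Weighted asynchronous label propagation with deterministic tie-breaking."""
    adjacency: Dict[str, Dict[str, float]] = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = adjacency[a].get(b, 0.0) + weight
        adjacency[b][a] = adjacency[b].get(a, 0.0) + weight
    nodes = sorted(adjacency)
    labels = {node: node for node in nodes}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in nodes:
            # Sum the similarity weight each neighbouring label contributes.
            scores: Dict[str, float] = defaultdict(float)
            for neighbour, weight in adjacency[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            candidates = sorted(label for label, s in scores.items() if s == best)
            # Keep the current label if it is already among the best (avoids
            # needless flips); otherwise adopt the smallest best label.
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    return labels


def communities(labels: Dict[str, str]) -> List[List[str]]:
    """Group nodes by final label into sorted subclusters."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Two tightly similar triads joined by one weak similarity edge.
    toy_edges = [
        ("P1", "P2", 1.0), ("P1", "P3", 1.0), ("P2", "P3", 1.0),
        ("P4", "P5", 1.0), ("P4", "P6", 1.0), ("P5", "P6", 1.0),
        ("P3", "P4", 0.1),
    ]
    print(communities(label_propagation(toy_edges)))  # [['P1', 'P2', 'P3'], ['P4', 'P5', 'P6']]
```

    The weak P3-P4 edge keeps the graph connected as a single cluster, yet the two densely connected triads emerge as separate subclusters, which is the kind of split the procedure uses to refine GO annotation.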