
    ncRNAScan: A pipeline to identify putative novel ncRNAs from deep sequencing data

    No full text
    <p><strong>ncRNAScan</strong> is a pipeline that extracts putative novel ncRNAs ab initio, given a list of transcripts in GTF format assembled from deep sequencing data (e.g., RNA-Seq) and annotation data.</p>
<p>The pipeline script binds together the functionality of the following tools / scripts: cuffcompare, categorize_ncRNAs.pl, get_unique_features.pl, fetch_seq_from_ucsc.pl, RNAfold, Infernal and Coding Potential Calculator (CPC.sh). Transcriptome assembly tools such as Cufflinks produce a set of assembled transcripts in GTF format. ncRNAScan uses this output, together with known gene annotation, to extract putative ncRNAs constructed by the ab initio assemblers. The pipeline relies on the FPKM / RPKM values generated by these assemblers to assess confidence in the de novo transcripts, and validates them against known reference gene and non-coding RNA information to identify putative novel ncRNAs.</p>
<a name="user-content-ioroutine-" class="anchor" href="#ioroutine-"></a>IO::Routine
<ul>
<li><p>The scripts use the custom <a href="https://github.com/biocoder/Perl-for-Bioinformatics/tree/master/IO-Routine">IO::Routine</a> Perl module.</p></li>
<li><p>If you are installing the <strong>ncRNAScan</strong> pipeline, the IO::Routine module is installed automatically.</p></li>
</ul>
<a name="user-content--ncrnascan-" class="anchor" href="#-ncrnascan-"></a>☲☴ ncRNAScan
<ul>
<li><p>Head over to the <a href="https://github.com/biocoder/Perl-for-Bioinformatics/tree/master/NGS-Utils">NGS-Utils</a> directory for the script list.</p></li>
<li>
<p><strong>Install ncRNAScan and all its dependencies (Mac and Linux):</strong></p>
<pre><code>cd /to/your/preferred/install/path
curl -O https://raw.githubusercontent.com/biocoder/Perl-for-Bioinformatics/master/NGS-Utils/ncRNAScan
perl ncRNAScan -setup</code></pre>
</li>
<li>
<p>Documentation:</p>
<pre><code>perl ncRNAScan -h</code></pre>
<p>or</p>
<pre><code>perldoc ncRNAScan</code></pre>
<p>or, to get help documentation for individual modules:</p>
<pre><code>perl ncRNAScan -h cuff
perl ncRNAScan -h cat
perl ncRNAScan -h get
perl ncRNAScan -h fetch
perl ncRNAScan -h cpc
perl ncRNAScan -h rna
perl ncRNAScan -h inf</code></pre>
</li>
<li>
<p>Known issues:</p>
<ul>
<li>If pipeline setup fails because of the XML::Parser module, you need to install the XML parser C libraries.</li>
<li>
<p>On Ubuntu / Debian based Linux distributions, as the root user, run:</p>
<pre><code>apt-get install libexpat1 libexpat1-dev</code></pre>
</li>
<li>
<p>On RedHat / Fedora / CentOS based Linux distributions, as the root user, run:</p>
<pre><code>yum install expat expat-devel</code></pre>
</li>
</ul>
</li>
<li>
<p>Caveats:</p>
<ul>
<li>The pipeline script relies heavily on standard Linux core utilities and has only been tested in the Bash shell.</li>
</ul>
</li>
</ul>
<a name="user-content-citation" class="anchor" href="#citation"></a>Citation
<p>Konganti, Kranti (2014). ncRNAScan: A pipeline to identify novel ncRNAs from deep sequencing data. ZENODO. <a href="http://dx.doi.org/10.5281/zenodo.10308">10.5281/zenodo.10308</a></p>
<p>Cheers,</p>
<p>BioCoder</p>
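The FPKM-based confidence step described above can be sketched in Python. This is a minimal illustration, not ncRNAScan's own code: it assumes Cufflinks-style `transcripts.gtf` records that carry an `FPKM` attribute on `transcript` lines, and the function names and the `min_fpkm` cutoff of 1.0 are hypothetical choices.

```python
import re

# Matches GTF attribute pairs such as: FPKM "12.5";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def confident_transcripts(gtf_lines, min_fpkm=1.0):
    """Yield transcript_ids whose 'transcript' record has FPKM >= min_fpkm.

    min_fpkm is a hypothetical cutoff, not a documented pipeline default.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only transcript-level records carry the summary FPKM
        attrs = parse_attributes(cols[8])
        tid = attrs.get("transcript_id")
        if tid is not None and float(attrs.get("FPKM", "0")) >= min_fpkm:
            yield tid


if __name__ == "__main__":
    gtf = [
        'chr1\tCufflinks\ttranscript\t100\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "12.5";',
        'chr1\tCufflinks\texon\t100\t300\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "12.5";',
        'chr1\tCufflinks\ttranscript\t2000\t2600\t1000\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "0.3";',
    ]
    print(list(confident_transcripts(gtf)))  # ['CUFF.1.1']
```

Reading only `transcript` lines avoids counting the same transcript once per exon.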

    Perl-for-Bioinformatics: lncRNApipe: A pipeline to identify putative novel lncRNAs from deep sequencing data

    No full text
    <p><strong>lncRNApipe</strong> is a pipeline that extracts putative novel lncRNAs ab initio, given a list of transcripts in GTF format assembled from deep sequencing data (e.g., RNA-Seq) and annotation data.</p>
<p>The pipeline script binds together the functionality of the following tools / scripts: cuffcompare, categorize_ncRNAs.pl, get_unique_features.pl, fetch_seq_from_ucsc.pl, RNAfold, Infernal and Coding Potential Calculator (CPC.sh). Transcriptome assembly tools such as Cufflinks produce a set of assembled transcripts in GTF format. lncRNApipe uses this output, together with known gene annotation, to extract putative lncRNAs constructed by the ab initio assemblers. The pipeline relies on the FPKM / RPKM values generated by these assemblers to assess confidence in the de novo transcripts, and validates them against known reference gene and non-coding RNA information to identify putative novel lncRNAs.</p>
<p><strong>The quality of the predicted novel lncRNAs depends heavily on supplying the most up-to-date known gene and / or ncRNA annotation file(s) to the pipeline.</strong></p>
<a class="anchor" href="#ioroutine-"><span class="octicon octicon-link"></span></a>IO::Routine
<ul>
<li><p>The scripts use the custom <a href="https://github.com/biocoder/Perl-for-Bioinformatics/tree/master/IO-Routine">IO::Routine</a> Perl module.</p></li>
<li><p>If you are installing the <strong>lncRNApipe</strong> pipeline, the IO::Routine module is installed automatically.</p></li>
<li><p>Requires the Bio::SeqIO module to be installed and available.</p></li>
</ul>
<a class="anchor" href="#-lncrnapipe-"><span class="octicon octicon-link"></span></a>☲☴ lncRNApipe
<ul>
<li><p>Head over to the <a href="https://github.com/biocoder/Perl-for-Bioinformatics/tree/master/NGS-Utils">NGS-Utils</a> directory for the script list.</p></li>
<li>
<p><strong>Install lncRNApipe and all its dependencies (Mac and Linux):</strong></p>
<pre><code>cd /to/your/preferred/install/path
curl -O https://raw.githubusercontent.com/biocoder/Perl-for-Bioinformatics/master/NGS-Utils/lncRNApipe
perl lncRNApipe -setup</code></pre>
</li>
<li>
<p>Documentation:</p>
<pre><code>perl lncRNApipe -h</code></pre>
<p>or</p>
<pre><code>perldoc lncRNApipe</code></pre>
<p>or, to get help documentation for individual modules:</p>
<pre><code>perl lncRNApipe -h cuff
perl lncRNApipe -h cat
perl lncRNApipe -h get
perl lncRNApipe -h fetch
perl lncRNApipe -h cpc
perl lncRNApipe -h rna
perl lncRNApipe -h inf</code></pre>
</li>
<li>
<p>Known issues:</p>
<ul>
<li>If pipeline setup fails because of the XML::Parser module, you need to install the XML parser C libraries.</li>
<li>
<p>On Ubuntu / Debian based Linux distributions, as the root user, run:</p>
<pre><code>apt-get install libexpat1 libexpat1-dev</code></pre>
</li>
<li>
<p>On RedHat / Fedora / CentOS based Linux distributions, as the root user, run:</p>
<pre><code>yum install expat expat-devel</code></pre>
</li>
</ul>
<ul>
<li><em><strong>RNAfold:</strong></em> RNAfold is slow and does not work for sequences longer than 10,000 bp. I am working on including an alternative secondary structure prediction program in place of RNAfold. Meanwhile, you can skip the RNAfold module by omitting the --rnafold option when running lncRNApipe.</li>
</ul>
</li>
<li>
<p>Caveats:</p>
<ul>
<li>The pipeline script relies heavily on standard Linux core utilities and has only been tested in the Bash shell.</li>
<li>Please use absolute path names. For example, instead of <code>lncRNApipe -run ./lncRNApipe_output ...</code>, use <code>lncRNApipe -run /data/lncRNApipe_output ...</code>.</li>
</ul>
</li>
</ul>
<a class="anchor" href="#citation"><span class="octicon octicon-link"></span></a>Citation
<p>Konganti, Kranti (2015). lncRNApipe: A pipeline to identify putative novel lncRNAs from deep sequencing data. <a href="https://github.com/biocoder/Perl-for-Bioinformatics/releases">https://github.com/biocoder/Perl-for-Bioinformatics/releases</a></p>
<p>Cheers,</p>
<p>BioCoder</p>
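Because RNAfold is reported above to fail on sequences longer than 10,000 bp, one workaround is to pre-screen the FASTA input before folding. The sketch below is hypothetical: the helper names and the choice to set long sequences aside are assumptions, not lncRNApipe behavior; it only illustrates the length check.

```python
MAX_RNAFOLD_LEN = 10_000  # limit stated in the lncRNApipe known issues


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, max_len=MAX_RNAFOLD_LEN):
    """Separate records short enough to fold from those exceeding max_len."""
    foldable, too_long = [], []
    for header, seq in records:
        (foldable if len(seq) <= max_len else too_long).append((header, seq))
    return foldable, too_long


if __name__ == "__main__":
    fasta = [">tx1", "ACGU" * 10, ">tx2", "A" * 10_001]
    ok, skipped = split_by_length(read_fasta(fasta))
    print([h for h, _ in ok], [h for h, _ in skipped])  # ['tx1'] ['tx2']
```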

    ncRNAScan: A pipeline to identify putative novel ncRNAs from deep sequencing data

    No full text
    <p><strong>ncRNAScan</strong> is a pipeline that extracts putative novel ncRNAs ab initio, given a list of transcripts in GTF format assembled from deep sequencing data (e.g., RNA-Seq) and annotation data.</p>
<p>The pipeline script binds together the functionality of the following tools / scripts: cuffcompare, categorize_ncRNAs.pl, get_unique_features.pl, fetch_seq_from_ucsc.pl, RNAfold, Infernal and Coding Potential Calculator (CPC.sh). Transcriptome assembly tools such as Cufflinks produce a set of assembled transcripts in GTF format. ncRNAScan uses this output, together with known gene annotation, to extract putative ncRNAs constructed by the ab initio assemblers. The pipeline relies on the FPKM / RPKM values generated by these assemblers to assess confidence in the de novo transcripts, and validates them against known reference gene and non-coding RNA information to identify putative novel ncRNAs.</p>
<p>IO::Routine</p>
<ul>
<li>
<p>The scripts use the custom IO::Routine Perl module.</p>
</li>
<li>
<p>If you are installing the <strong>ncRNAScan</strong> pipeline, the IO::Routine module is installed automatically.</p>
</li>
</ul>
<p>☲☴ ncRNAScan</p>
<ul>
<li>
<p>Head over to the NGS-Utils directory for the script list.</p>
</li>
<li>
<p><strong>Install ncRNAScan and all its dependencies (Mac and Linux):</strong></p>
<pre><code>cd /to/your/preferred/install/path
curl -O https://raw.githubusercontent.com/biocoder/Perl-for-Bioinformatics/master/NGS-Utils/ncRNAScan
perl ncRNAScan -setup</code></pre>
</li>
<li>
<p>Documentation:</p>
<pre><code>perl ncRNAScan -h</code></pre>
<p>or</p>
<pre><code>perldoc ncRNAScan</code></pre>
<p>or, to get help documentation for individual modules:</p>
<pre><code>perl ncRNAScan -h cuff
perl ncRNAScan -h cat
perl ncRNAScan -h get
perl ncRNAScan -h fetch
perl ncRNAScan -h cpc
perl ncRNAScan -h rna
perl ncRNAScan -h inf</code></pre>
</li>
<li>
<p>Known issues:</p>
<ul>
<li>If pipeline setup fails because of the XML::Parser module, you need to install the XML parser C libraries.</li>
<li>
<p>On Ubuntu / Debian based Linux distributions, as the root user, run:</p>
<pre><code>apt-get install libexpat1 libexpat1-dev</code></pre>
</li>
<li>
<p>On RedHat / Fedora / CentOS based Linux distributions, as the root user, run:</p>
<pre><code>yum install expat expat-devel</code></pre>
</li>
</ul>
<ul>
<li><em><strong>RNAfold:</strong></em> RNAfold is slow and does not work for sequences longer than 10,000 bp. I am working on including an alternative secondary structure prediction program in place of RNAfold. Meanwhile, you can skip the RNAfold module by omitting the --rnafold option when running ncRNAScan.</li>
</ul>
</li>
<li>
<p>Caveats:</p>
<ul>
<li>The pipeline script relies heavily on standard Linux core utilities and has only been tested in the Bash shell.</li>
</ul>
</li>
</ul>
<p>Citation</p>
<p>Konganti, Kranti (2014). ncRNAScan: A pipeline to identify putative novel ncRNAs from deep sequencing data. ZENODO. 10.5281/zenodo.10566</p>
<p>Cheers,</p>
<p>BioCoder</p>
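Validation against reference annotation is often done with cuffcompare class codes, where `u` marks an unknown, intergenic transfrag. Whether ncRNAScan filters on exactly this code is an assumption; the sketch below only shows how such a filter could read the `class_code` attribute from a cuffcompare `combined.gtf`-style file. The function name is hypothetical.

```python
import re

# Matches GTF attribute pairs such as: class_code "u";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def intergenic_transcripts(combined_gtf_lines):
    """Return sorted transcript_ids whose cuffcompare class_code is 'u'."""
    keep = set()
    for line in combined_gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # comments and malformed lines have fewer columns
        attrs = dict(ATTR_RE.findall(cols[8]))
        if attrs.get("class_code") == "u" and "transcript_id" in attrs:
            keep.add(attrs["transcript_id"])  # exon lines repeat the same id
    return sorted(keep)


if __name__ == "__main__":
    gtf = [
        'chr1\tCufflinks\texon\t100\t300\t.\t+\t.\ttranscript_id "TCONS_1"; class_code "u";',
        'chr1\tCufflinks\texon\t500\t700\t.\t+\t.\ttranscript_id "TCONS_1"; class_code "u";',
        'chr2\tCufflinks\texon\t100\t300\t.\t-\t.\ttranscript_id "TCONS_2"; class_code "=";',
    ]
    print(intergenic_transcripts(gtf))  # ['TCONS_1']
```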

    Biocoder: A programming language for standardizing and automating biology protocols

    Full text link
    Abstract
    Background: Published descriptions of biology protocols are often ambiguous and incomplete, making them difficult to replicate in other laboratories. However, there is increasing benefit to formalizing the descriptions of protocols, as laboratory automation systems (such as microfluidic chips) are becoming increasingly capable of executing them. Our goal in this paper is to improve both the reproducibility and automation of biology experiments by using a programming language to express the precise series of steps taken.
    Results: We have developed BioCoder, a C++ library that enables biologists to express the exact steps needed to execute a protocol. In addition to being suitable for automation, BioCoder converts the code into a readable, English-language description for use by biologists. We have implemented over 65 protocols in BioCoder; the most complex of these was successfully executed by a biologist in the laboratory using BioCoder as the only reference. We argue that BioCoder exposes and resolves ambiguities in existing protocols, and could provide the software foundations for future automation platforms. BioCoder is freely available for download at http://research.microsoft.com/en-us/um/india/projects/biocoder/.
    Conclusions: BioCoder represents the first practical programming system for standardizing and automating biology protocols. Our vision is to change the way that experimental methods are communicated: rather than publishing a written account of the protocols used, researchers will simply publish the code. Our experience suggests that this practice is tractable and offers many benefits. We invite other researchers to leverage BioCoder to improve the precision and completeness of their protocols, and also to adapt and extend BioCoder to new domains.
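To make the idea of a protocol expressed as code and rendered into English concrete, here is a toy Python sketch in which each step is an object that turns itself into an imperative sentence. It is not BioCoder's API (BioCoder is a C++ library); the `Step` class, its fields, and `render` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One hypothetical protocol action, stated with explicit parameters."""
    action: str          # verb, e.g. "Incubate"
    target: str          # what the action applies to
    condition: str = ""  # optional setting, e.g. "at 37 °C"
    duration: str = ""   # optional time, e.g. "10 min"

    def to_english(self) -> str:
        """Render this step as a single imperative sentence."""
        words = [self.action, self.target]
        if self.condition:
            words.append(self.condition)
        if self.duration:
            words.append(f"for {self.duration}")
        return " ".join(words) + "."


def render(protocol: List[Step]) -> str:
    """Number the steps and join them into a readable description."""
    return "\n".join(f"{i}. {step.to_english()}" for i, step in enumerate(protocol, start=1))


if __name__ == "__main__":
    protocol = [
        Step("Add", "50 µl of buffer to tube A"),
        Step("Incubate", "tube A", "at 37 °C", "10 min"),
    ]
    print(render(protocol))
    # 1. Add 50 µl of buffer to tube A.
    # 2. Incubate tube A at 37 °C for 10 min.
```

Making fields such as `condition` and `duration` explicit is what forces a protocol author to state values that prose often leaves implicit.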

    SBEToolbox: A Matlab Toolbox for Biological Network Analysis

    No full text
    <p>We present SBEToolbox (Systems Biology and Evolution Toolbox), an open-source Matlab toolbox for biological network analysis. It takes a network file as input, calculates a variety of centralities and topological metrics, clusters nodes into modules, and displays the network using different graph layout algorithms. Its straightforward implementation and high-level functions allow the functionality to be easily extended or tailored by developing custom plugins. SBEGUI, a menu-driven graphical user interface (GUI) for SBEToolbox, gives programmers and non-programmers alike easy access to various network and graph algorithms.</p>
<p><strong>Documentation:</strong> https://wsgi-promis.tamu.edu/projects/sbetoolbox/wiki/SBEToolbox_User_Manual</p>
<p>Reference:</p>
<p><strong>Konganti K, Wang G, Yang E, Cai JJ* (2013). SBEToolbox: a Matlab Toolbox for Biological Network Analysis.</strong> Evolutionary Bioinformatics 2013:9 355-362</p>
<p>Check out the repository:</p>
<ul>
<li>
<p><strong>Stable</strong></p>
<ul>
<li>
<p>Git:</p>
<pre><code>git clone https://github.com/biocoder/SBEToolbox.git
cd SBEToolbox
git checkout v1.3.2</code></pre>
</li>
<li>
<p>Subversion:</p>
<pre><code>svn co https://github.com/biocoder/SBEToolbox/tags/v1.3.2 SBEToolbox_v1.3.2</code></pre>
</li>
</ul>
</li>
<li>
<p><strong>Development</strong></p>
<ul>
<li>
<p>Git:</p>
<pre><code>git clone https://github.com/biocoder/SBEToolbox.git</code></pre>
</li>
<li>
<p>Subversion:</p>
<pre><code>svn co https://github.com/biocoder/SBEToolbox/trunk</code></pre>
</li>
</ul>
</li>
</ul>
<p>Download source code:</p>
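As one example of the centralities mentioned above, normalized degree centrality divides each node's degree by n - 1. The Python sketch below is illustrative only: SBEToolbox itself is written in Matlab, and the function name and edge-list input format are assumptions rather than the toolbox's interface.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected, unweighted graph.

    Each node's number of distinct neighbors is divided by n - 1, where n is
    the number of nodes that appear in at least one non-loop edge.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # self-loops do not add to degree here
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


if __name__ == "__main__":
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
    print(degree_centrality(edges))  # A: 1.0, B and C: ≈ 0.667, D: ≈ 0.333
```

Using sets of neighbors means duplicate edges in the input file are not double-counted.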

    BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models

    Full text link
    Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1,026 Python functions and 1,243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations, and always by >3%). The results highlight two key aspects of successful models: (1) they accommodate a long prompt (> 2,600 tokens) with full context, including functional dependencies; (2) they contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% vs. up to 25%). Availability and implementation: Code is available at https://github.com/gersteinlab/biocoder and https://biocoder-benchmark.github.io/
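Pass@K is commonly computed with the unbiased estimator introduced alongside the HumanEval benchmark: with n generated samples of which c pass, pass@k = 1 - C(n - c, k) / C(n, k). The abstract does not state which estimator BioCoder uses, so treat this as an assumption; a direct implementation is:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generations (c of them correct), passes."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples for one problem, 3 of them pass.
    print(pass_at_k(10, 3, 1))  # ≈ 0.3
    print(pass_at_k(10, 3, 5))  # ≈ 0.917
```

A benchmark score is then the mean of this value over all problems; computing it from n > k samples gives a lower-variance estimate than drawing exactly k samples.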

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results obtained with two types of co-citation counting: the traditional type, which counts only the first author of each cited work, and a non-traditional type, which takes into account the first five authors of each cited work. Results indicate that the picture produced by non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the one produced by traditional first-author co-citation counting. Reasons for these effects are discussed.
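The two counting schemes compared above can be contrasted with a small sketch. Assumptions not taken from the study: each citing paper contributes at most one co-citation per author pair, and co-authors of the same cited work also count as co-cited. The hypothetical `max_authors` parameter switches between first-author counting (1) and first-five-author counting (5).

```python
from collections import Counter
from itertools import combinations


def author_cocitations(citing_papers, max_authors=1):
    """Count, for each author pair, the citing papers that co-cite them.

    citing_papers: one reference list per citing paper; each reference is a
    list of author names in byline order.
    max_authors=1 -> traditional first-author counting.
    max_authors=5 -> counts the first five authors of every cited work.
    """
    counts = Counter()
    for references in citing_papers:
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each citing paper adds at most 1 to any pair (assumption).
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: two citing papers, each citing two works.
    papers = [
        [["Smith", "Lee"], ["Chen"]],
        [["Lee", "Smith"], ["Chen", "Park"]],
    ]
    print(author_cocitations(papers, max_authors=1))  # 2 pairs, each co-cited once
    print(author_cocitations(papers, max_authors=5))  # 6 pairs; Lee-Smith co-cited twice
```

Even on this toy input, first-author counting never links Smith and Lee, while five-author counting does, which hints at why the two schemes yield different author groupings.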

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often appears as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss which similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings are of high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
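The abstract does not spell out its argument against the Pearson correlation. One property often raised in this debate, used here purely for illustration, is that Pearson's r changes when both profiles gain shared zero counts (authors with whom neither is cocited), whereas the cosine does not. A minimal Python sketch, with Pearson computed as the cosine of mean-centered vectors; the function names and data are hypothetical.

```python
from math import sqrt


def cosine(x, y):
    """Cosine similarity of two cocitation count profiles.

    Returns 0.0 when either profile has zero length (a convention).
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centered profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


if __name__ == "__main__":
    x, y = [2, 1], [1, 2]                   # hypothetical cocitation counts
    print(cosine(x, y), pearson(x, y))      # ≈ 0.8 and ≈ -1.0
    x0, y0 = x + [0, 0], y + [0, 0]         # add two authors neither is cocited with
    print(cosine(x0, y0), pearson(x0, y0))  # ≈ 0.8 and ≈ 0.636
```

Appending shared zeros swings Pearson's r from about -1 to about 0.64, while the cosine is unchanged, because centering lets the zeros shift both means.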