A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data
The analysis of polygenic phenotypic characteristics, such as quantitative traits or inheritable diseases, requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms (SNPs) as genetic markers. Combining these technologies with pooling of segregants, as performed in bulk segregant analysis, should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. We propose a hidden Markov model to analyze the marker data obtained by bulk segregant next-generation sequencing. The model includes several states, each associated with a different probability of observing the same or a different nucleotide in an offspring as compared to the parent. Moving from one molecular marker to the next may involve a transition between the states of the model. After estimating the transition probabilities and the state-related probabilities of nucleotide (dis)similarity, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions are likely to contain trait-related genes. The application of the model is illustrated on data from a study of ethanol tolerance in yeast. Software is written in R. R functions, R scripts and documentation are available at www.ibiostat.be/software/bioinformatics.
The authors are grateful to Steve Swinnen, Thiago Pais, Maria R. Foulquie-Moreno, and Johan M. Thevelein of the Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven and Department of Molecular Microbiology, VIB for providing the data. This work was supported by University Hasselt [B09N106 to J.C.] and the IAP Research Network of the Belgian state (Belgian Science Policy) [P7/06 to J.C. and T.B.].
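A minimal sketch of the decoding step described above, assuming the transition and emission probabilities have already been estimated (the estimation itself, e.g. by Baum–Welch, is not shown). It is written in Python rather than the authors' R code, codes each SNP as 1 if the pooled offspring nucleotide differs from the parent and 0 otherwise, and uses posterior (per-SNP) decoding; the function names and the two-state example parameters are hypothetical.

```python
"""Posterior decoding for a hidden Markov model with Bernoulli emissions (illustrative sketch)."""


def posterior_states(obs, start, trans, p_diff):
    """Return per-SNP posterior state probabilities via a scaled forward-backward pass.

    obs: list of 0/1 observations (1 = nucleotide differs from the parent)
    start: initial state probabilities
    trans: trans[i][j] = P(state j at next SNP | state i at current SNP)
    p_diff: p_diff[s] = P(observation 1 | state s)
    """
    k = len(start)

    def emit(s, o):
        return p_diff[s] if o == 1 else 1.0 - p_diff[s]

    # Forward pass, rescaled at each SNP to avoid numerical underflow.
    alpha, scale = [], []
    prev = [start[s] * emit(s, obs[0]) for s in range(k)]
    for t in range(len(obs)):
        if t > 0:
            prev = [emit(s, obs[t]) * sum(prev[r] * trans[r][s] for r in range(k))
                    for s in range(k)]
        c = sum(prev)
        prev = [a / c for a in prev]
        alpha.append(prev)
        scale.append(c)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in obs]
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = [sum(trans[s][r] * emit(r, obs[t + 1]) * beta[t + 1][r] for r in range(k))
                   / scale[t + 1] for s in range(k)]

    # Posterior P(state s at SNP t | all observations), normalized per SNP.
    post = []
    for a, b in zip(alpha, beta):
        joint = [x * y for x, y in zip(a, b)]
        z = sum(joint)
        post.append([x / z for x in joint])
    return post


def most_probable_states(post):
    """Pick the state with the highest posterior probability at each SNP."""
    return [max(range(len(p)), key=p.__getitem__) for p in post]


# Hypothetical two-state example: state 0 = unlinked region (50% mismatches),
# state 1 = region linked to the trait (10% mismatches).
obs = [1, 0, 1, 1, 0, 1] + [0] * 15 + [1, 0, 1, 1, 0, 1]
post = posterior_states(obs, start=[0.5, 0.5],
                        trans=[[0.9, 0.1], [0.1, 0.9]], p_diff=[0.5, 0.1])
states = most_probable_states(post)
print(states)
```

With these assumed parameters, the long run of parental nucleotides in the middle is assigned to the "linked" state, while the mismatch-rich flanks stay in the "unlinked" state.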
Computational methods in HDXMS
Hydrogen/deuterium exchange (HDX) has been applied since the 1930s as an analytical tool to study the structure and dynamics of (small) biomolecules. The popularity of HDX for studying proteins has increased drastically over the last two decades owing to its successful combination with mass spectrometry (MS). Along with this growth in popularity, several technological advances have been made, such as improved quenching and fragmentation. As a consequence of these experimental improvements and the increased use of protein HDXMS, large amounts of complex data are generated, which require appropriate analysis. Computational analysis of HDXMS requires several steps. A typical workflow for proteins consists of identification of (non-)deuterated peptides or fragments of the protein under study (local analysis), or identification of the deuterated protein as a whole (global analysis); determination of the deuteration level; estimation of the protection extent or exchange rates of the labile backbone amide hydrogen atoms; and a statistically sound interpretation of the estimated protection extent or exchange rates. Several algorithms specifically designed for HDX analysis have been proposed. They range from procedures that focus on one specific step in the analysis of HDX data to complete HDX workflow analysis tools. In this review, we provide an overview of the computational methods and discuss outstanding challenges.
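The deuteration-level step of the workflow can be illustrated with the commonly used centroid approach. The sketch below is an illustration under stated assumptions, not any specific tool's algorithm: it takes the intensity-weighted mean m/z of a peptide isotope envelope, converts the centroid shift to daltons via the charge state, and optionally rescales by a fully deuterated control to correct for back exchange. All function names and values are hypothetical.

```python
"""Centroid-based deuterium uptake for one peptide isotope envelope (illustrative sketch)."""


def centroid_mz(mz, intensity):
    """Intensity-weighted mean m/z of an isotope envelope."""
    total = sum(intensity)
    return sum(m * i for m, i in zip(mz, intensity)) / total


def deuterium_uptake(mz_t, int_t, mz_0, int_0, charge):
    """Absolute uptake in Da: centroid m/z shift (labelled minus unlabelled) times charge."""
    return (centroid_mz(mz_t, int_t) - centroid_mz(mz_0, int_0)) * charge


def corrected_uptake(m_t, m_0, m_full, n_exchangeable):
    """Back-exchange-corrected uptake, scaled by a fully deuterated control.

    m_t, m_0, m_full: centroid masses of the labelled, undeuterated and fully deuterated samples
    n_exchangeable: number of exchangeable backbone amide hydrogens in the peptide
    """
    return (m_t - m_0) / (m_full - m_0) * n_exchangeable


# Hypothetical 2+ peptide whose whole envelope shifts by 1.0 m/z after labelling.
ints = [100, 60, 20]
mz_0 = [500.0, 500.5, 501.0]
mz_t = [501.0, 501.5, 502.0]
uptake = deuterium_uptake(mz_t, ints, mz_0, ints, charge=2)
print(round(uptake, 6))
```

Multiplying the m/z shift by the charge works because the proton masses cancel when two envelopes of the same charge state are subtracted.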
Experimental design in quantitative proteomics
Metabolites and proteins are potential biomarkers that can be identified with the help of mass spectrometry (MS). However, MS measurements are prone to various random and systematic errors. The sensitivity of the technology to these errors poses practical challenges, including concerns about the reproducibility of MS-based assays and the possibility of false findings. Given this sensitivity, the proper design of MS-based experiments is of utmost importance. In this chapter, we review the basic experimental-design tools that can be used to prevent errors that might cause misleading findings in MS-based experiments. We also present results of an experiment aimed at investigating the variability of the intensity measurements produced by a MALDI-TOF mass spectrometer. Knowledge of the potential sources of systematic and random errors is fundamental to properly designing an MS experiment.
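Randomization and blocking are classic design tools of the kind reviewed here. As an illustration that is not taken from the chapter, the sketch below assigns equally sized case and control groups to blocks, such as measurement days or MALDI target plates, so that each block holds one sample per group in a random run order. Group labels, sample IDs and the function name are hypothetical.

```python
"""Randomized complete block allocation of MS runs (illustrative sketch)."""
import random


def block_randomize(groups, seed=None):
    """Assign samples to blocks so each block holds one sample per group, in random order.

    groups: dict mapping a group label to a list of sample IDs; all lists must have equal length
    Returns a list of blocks (e.g. days or target plates), each a list of (group, sample) pairs.
    """
    rng = random.Random(seed)
    sizes = {len(samples) for samples in groups.values()}
    if len(sizes) != 1:
        raise ValueError("all groups need the same number of samples")
    n_blocks = sizes.pop()
    # Shuffle within groups so the sample-to-block assignment is random.
    shuffled = {g: rng.sample(samples, len(samples)) for g, samples in groups.items()}
    blocks = []
    for b in range(n_blocks):
        block = [(g, shuffled[g][b]) for g in shuffled]
        rng.shuffle(block)  # random run order within the block
        blocks.append(block)
    return blocks


design = block_randomize({"case": ["c1", "c2", "c3"], "control": ["k1", "k2", "k3"]}, seed=1)
for day, runs in enumerate(design, start=1):
    print(day, runs)
```

Because every block contains both groups, a day-to-day or plate-to-plate shift affects cases and controls equally instead of being confounded with the group effect.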
A Conceptual Framework for Abundance Estimation of Genomic Targets in the Presence of Ambiguous Short Sequencing Reads
RNA sequencing (RNA-seq) is widely used to study gene, transcript, or exon expression. To quantify the expression level, millions of short sequenced reads need to be mapped back to a reference genome or transcriptome. Read mapping finds a location at which a read is identical or similar to the reference. Based on this alignment, expression summaries, that is, read counts, are generated. However, reads may be matched to multiple locations. Such ambiguously mapped reads are often ignored in the analysis, which is a potential loss of information and may bias expression estimates. We present the general principles underlying multiread allocation and unbiased estimation of the expression level of genes, exons, or transcripts in the presence of multiply mapped reads. The underlying principles are derived from a theoretical concept that identifies important sources of information such as the number of uniquely mapped reads, the total target length, and the length of the shared target regions. We show with simulation studies that methods incorporating some or all of these sources of information estimate the expression levels of genes, exons, and/or transcripts with higher precision and accuracy than methods that do not use this information. We identify important sources of information that should be taken into account by methods that estimate the abundance of genes, exons, and/or transcripts to achieve good precision and accuracy.
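To make multiread allocation concrete, the sketch below uses one simple proportional scheme: each multiread is split across its candidate targets in proportion to their unique-read density (uniquely mapped reads divided by the length of the region unique to that target). This scheme is an assumption for illustration and not necessarily the estimator derived in the framework; the function name and counts are hypothetical.

```python
"""Proportional allocation of multiply mapped reads (illustrative sketch)."""


def allocate_multireads(unique_counts, unique_lengths, multireads):
    """Estimate read counts per target, splitting each multiread by unique-read density.

    unique_counts: {target: number of uniquely mapped reads}
    unique_lengths: {target: length (bp) of the region unique to that target}
    multireads: list of tuples, each listing the targets one multiread maps to
    """
    density = {t: unique_counts[t] / unique_lengths[t] for t in unique_counts}
    estimate = {t: float(c) for t, c in unique_counts.items()}
    for targets in multireads:
        weights = [density[t] for t in targets]
        total = sum(weights)
        for t, w in zip(targets, weights):
            # Fall back to an even split when no candidate target has unique support.
            estimate[t] += w / total if total > 0 else 1.0 / len(targets)
    return estimate


counts = allocate_multireads(
    unique_counts={"A": 90, "B": 10},
    unique_lengths={"A": 900, "B": 1000},
    multireads=[("A", "B")] * 11,
)
print(counts)
```

Here gene A has ten times the unique-read density of gene B, so it receives ten of the eleven shared reads instead of half of them, as an even split would give.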
A “Refined Hydrogen Rule” and a “Refined Hydrogen and Halogen Rule” for Organic Molecules
Deriving chemical formulas of organic molecules from spectral information with heuristic rules is a commonly recurring task. The computational effort and the potentially extensive list of candidate formulas put a strain on the downstream analysis. In this paper, we introduce a set of redefined heuristics, based on the hydrogen and halogen rules, that reduce the computational burden and the number of candidate formulas for organic molecules such as peptides and lipids.
Claesen, J (reprint author), SCK CEN, Microbiol Unit, Boeretang 200, B-2400 Mol, Belgium; Hasselt Univ, Data Sci Inst, I BioStat, Hasselt, Belgium.
The (generalized) hydrogen rule for organic molecules
Dear editor, We would like to bring to your attention the existence of the hydrogen rule.[1,2] This rule states that for a neutral organic molecule of the form $\mathrm{C}_{n_C}\mathrm{H}_{n_H}\mathrm{N}_{n_N}\mathrm{O}_{n_O}\mathrm{S}_{n_S}$, the sum of its nominal mass and the number of hydrogen atoms is divisible by four. Or, more formally,

$$(n_H + m_{nom}) \bmod 4 = 0, \qquad (1)$$

where $m_{nom}$ denotes the nominal mass of the neutral molecule. The hydrogen rule can be generalized to all neutral organic molecules, except hypervalent molecules and free radicals, as follows:

$$\left[\sum_{Y} n_Y \times m_Y + m_{nom}\right] \bmod 4 = 0, \qquad (2)$$

where the sum is over all elements $Y$ with an odd nominal mass $m_Y$, and $n_Y$ denotes the number of $Y$ atoms. The (generalized) hydrogen rule can be derived as a consequence of the definition of the nominal mass of a molecule[2] and the molecular formula for organic molecules. For a detailed derivation, we refer the reader to Claesen et al.[3] We illustrate the (generalized) hydrogen rule with two examples: angiotensin II and the halogenated fatty acid 2-bromo-2-chloroacetic acid. For angiotensin II, with molecular formula C50H71N13O12, the sum of the number of hydrogens, 71, and the nominal mass, 1045, equals 1116, which is a multiple of four: 71 + 1045 = 1116 = 4 × 279. Hence, the elemental composition of angiotensin II is compliant with the hydrogen rule (1). For 2-bromo-2-chloroacetic acid, the nominal mass is 172 Da. The elemental composition of this molecule is C2H2BrClO2, which is compliant with the generalized hydrogen rule (2): 2 × 1 + 1 × 79 + 1 × 35 + 172 = 288 = 4 × 72. The hydrogen rule can be used to filter a set of predicted molecular formulae. Let us consider arginine, C6H14N4O2, which has a monoisotopic mass of 174.1117 Da. There are three theoretically possible elemental compositions of the form $\mathrm{C}_{n_C}\mathrm{H}_{n_H}\mathrm{N}_{n_N}\mathrm{O}_{n_O}\mathrm{S}_{n_S}$ within a 20-ppm wide mass-tolerance window, i.e., C4H12N7O1, C6H14N4O2, and C8H16N1O3.
All three elemental compositions have a nominal mass of 174 Da. The molecular formula C6H14N4O2 is the only one that is compliant with the hydrogen rule (1), because 14 + 174 = 188 = 4 × 47, whereas 12 + 174 = 186 and 16 + 174 = 190 are not multiples of four. The elemental composition of organic compounds can also be predicted with the help of the hydrogen rule. For the monoisotopic mass of 174.1117 Da, pacMASS[3] predicts only one molecular formula, C6H14N4O2, which is the elemental composition of arginine. While the nitrogen rule[2,4] is an accepted definition in mass spectrometry, the related hydrogen rule is less well known. Given its validity for organic molecules commonly studied in mass spectrometry, such as proteins, peptides, and lipids, and given its practical use in molecular formula filtering and prediction, we propose to consider the term "hydrogen rule" as a new definition in the field of mass spectrometry.
ORCID
Jürgen Claesen https://orcid.org/0000-0001-7615-5322
Dirk Valkenborg https://orcid.org/0000-0002-1877-3496
REFERENCES
1. Claesen J, Valkenborg D, Burzykowski T. De novo prediction of the elemental composition of peptides and proteins based on a single mass. J Mass Spectrom. 2019. https://doi.
Claesen, J (reprint author), SCK CEN, Microbiol Unit, Boeretang 200, B-2400 Mol, Belgium.
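Rules (1) and (2) translate directly into a formula filter. The Python sketch below assumes the nominal masses listed in `NOMINAL_MASS` (a small, illustrative element set) and does not check for hypervalent molecules or free radicals, for which the rule is not claimed to hold. It reproduces the arginine example from the letter; the function names are hypothetical.

```python
"""(Generalized) hydrogen rule check for neutral organic molecules (illustrative sketch)."""

# Nominal masses (mass number of the most abundant isotope) for a few common elements.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32,
                "F": 19, "Cl": 35, "Br": 79, "I": 127}


def nominal_mass(formula):
    """Nominal mass of a molecule given as {element: count}."""
    return sum(NOMINAL_MASS[el] * n for el, n in formula.items())


def satisfies_hydrogen_rule(formula):
    """Rule (2): sum of n_Y * m_Y over odd-mass elements, plus m_nom, is divisible by 4.

    For CHNOS formulas, H is the only odd-mass element, so this reduces to rule (1).
    """
    odd_term = sum(n * NOMINAL_MASS[el] for el, n in formula.items()
                   if NOMINAL_MASS[el] % 2 == 1)
    return (odd_term + nominal_mass(formula)) % 4 == 0


# Candidate formulas within 20 ppm of 174.1117 Da, as listed in the letter.
candidates = {
    "C4H12N7O1": {"C": 4, "H": 12, "N": 7, "O": 1},
    "C6H14N4O2": {"C": 6, "H": 14, "N": 4, "O": 2},
    "C8H16N1O3": {"C": 8, "H": 16, "N": 1, "O": 3},
}
print([name for name, f in candidates.items() if satisfies_hydrogen_rule(f)])
```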
The isotope distribution: A rose with thorns
The isotope distribution, which reflects the number and probabilities of occurrence of different isotopologues of a molecule, can be calculated theoretically. With the current generation of (ultra)high-resolution mass spectrometers, the isotope distribution of molecules can be measured with high sensitivity, resolution, and mass accuracy. However, the observed isotope distribution can differ substantially from the expected one. Although such differences can complicate the analysis and interpretation of mass spectral data, they can be helpful in a number of specific applications. These applications include, but are not limited to, the identification of peptides in proteomics, the elucidation of the elemental composition of small organic molecules and metabolites, and wading through peaks in mass spectra of complex bioorganic mixtures such as petroleum and humus. In this review, we give a nonexhaustive overview of factors that affect the observed isotope distribution, such as elemental isotope deviations, ion sampling, ion interactions, electronic noise and dephasing, centroiding, and apodization. These factors occur at different stages of obtaining the isotope distribution: during the collection of the sample, during the ionization and intake of a molecule in a mass spectrometer, during the mass separation and detection of ionized molecules, and during signal processing.
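The theoretical (aggregated) isotope distribution can be computed by convolving the isotope distributions of the individual atoms. This minimal sketch groups isotopologues by nominal mass offset (M, M+1, M+2, ...) and ignores isotopic fine structure. The abundance values are representative figures assumed for illustration; as this review stresses, real samples can deviate from them. Function names are hypothetical.

```python
"""Aggregated isotope distribution by repeated convolution (illustrative sketch)."""

# Representative isotopic abundances, indexed by nominal mass offset from the lightest isotope.
# These values are assumptions; natural abundances vary between samples.
ABUNDANCE = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_len):
    """Distribution of the summed mass offset of two independent parts, truncated to max_len."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            if i + j < len(out):
                out[i + j] += pa * pb
    return out


def isotope_distribution(formula, max_len=10):
    """Probabilities of the M, M+1, M+2, ... isotopologue groups of a CHNOS formula."""
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ABUNDANCE[element], max_len)
    return dist


arginine = {"C": 6, "H": 14, "N": 4, "O": 2}
print([round(p, 4) for p in isotope_distribution(arginine)[:4]])
```

Atom-by-atom convolution is simple and adequate for small molecules; for large proteins, faster schemes (e.g. convolving per element with binomial or multinomial terms) are usually preferred.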
A varying-coefficient model for the analysis of methylation sequencing data
DNA methylation is an important epigenetic modification involved in gene regulation. Advances in next-generation sequencing technology have enabled the retrieval of DNA methylation information at single-base resolution. However, due to the sequencing process and the limited amount of isolated DNA, DNA methylation data are often noisy and sparse, which complicates the identification of differentially methylated regions (DMRs), especially when few replicates are available. We present a varying-coefficient model for detecting DMRs by using single-base-resolved methylation information. The model simultaneously smooths the methylation profiles and allows detection of DMRs, while accounting for additional covariates. The proposed model takes into account possible overdispersion by using a beta-binomial distribution. The overdispersion itself can be modeled as a function of the genomic region and explanatory variables. We illustrate the properties of the proposed model by applying it to two real-life case studies.
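The beta-binomial ingredient of the model can be sketched on its own. Below, `pi` plays the role of the mean methylation level at a site and `rho` the overdispersion; in the varying-coefficient model both would depend on genomic position and covariates, which this Python sketch does not attempt. The mean/overdispersion parameterization shown is one standard choice and is an assumption here.

```python
"""Beta-binomial probabilities for methylated read counts (illustrative sketch)."""
from math import comb, exp, lgamma


def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, pi, rho):
    """P(k methylated reads out of n) with mean methylation pi and overdispersion rho in (0, 1).

    Uses alpha = pi * (1 - rho) / rho and beta = (1 - pi) * (1 - rho) / rho, so that
    E[k] = n * pi and Var[k] = n * pi * (1 - pi) * (1 + (n - 1) * rho).
    """
    a = pi * (1 - rho) / rho
    b = (1 - pi) * (1 - rho) / rho
    return comb(n, k) * exp(log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))


n, pi, rho = 20, 0.3, 0.1
probs = [beta_binomial_pmf(k, n, pi, rho) for k in range(n + 1)]
mean_k = sum(k * p for k, p in enumerate(probs))
var_k = sum((k - mean_k) ** 2 * p for k, p in enumerate(probs))
print(round(mean_k, 4), round(var_k, 4))
```

The factor 1 + (n − 1)ρ shows how the variance exceeds the binomial variance nπ(1 − π); with ρ → 0 the distribution approaches the binomial.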
Deuteros 2.0: peptide-level significance testing of data from hydrogen deuterium exchange mass spectrometry
Summary: Hydrogen deuterium exchange mass spectrometry (HDX-MS) is becoming increasingly routine for monitoring changes in the structural dynamics of proteins. Differential HDX-MS allows comparison of protein states, such as in the absence or presence of a ligand. This can be used to attribute changes in conformation to binding events, allowing the mapping of entire conformational networks. As more states are introduced to the system under study, the number of necessary cross-state comparisons quickly increases. There are currently very few software packages that offer quick and informative comparison of HDX-MS datasets, and even fewer that offer statistical analysis and advanced visualization. Following feedback on our original software, Deuteros, we present Deuteros 2.0, which has been redesigned from the ground up to fulfill a greater role in the HDX-MS analysis pipeline. Deuteros 2.0 features facilities for back-exchange correction, data summarization, peptide-level statistical analysis and advanced data plotting.
This work was supported by grants from the Biotechnology and Biological Sciences Research Council (BBSRC) Doctoral Training Partnership [BB/M009513/1] and The Leverhulme Trust [RPG-2019-178].
The authors thank Justin Benesch, Angela Gehrckens (University of Oxford) and Ruyu Jia (King’s College London) for software testing and feedback.
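Peptide-level significance testing compares replicate uptake values of the same peptide between two protein states. As a sketch that is not necessarily the test implemented in Deuteros 2.0, the function below computes Welch's t statistic and its Welch–Satterthwaite degrees of freedom with the Python standard library; the function name and uptake values are hypothetical.

```python
"""Welch's t statistic for one peptide's deuterium uptake in two states (illustrative sketch)."""
from math import sqrt
from statistics import mean, variance


def welch_t(uptake_a, uptake_b):
    """Return (t, df) for replicate uptake values (Da) of one peptide in states A and B."""
    va = variance(uptake_a) / len(uptake_a)  # squared standard error of mean A
    vb = variance(uptake_b) / len(uptake_b)  # squared standard error of mean B
    t = (mean(uptake_a) - mean(uptake_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(uptake_a) - 1) + vb ** 2 / (len(uptake_b) - 1))
    return t, df


# Hypothetical triplicate uptake values for one peptide, apo versus ligand-bound.
apo = [4.1, 4.3, 4.0]
holo = [3.2, 3.5, 3.3]
print(welch_t(apo, holo))
```

A two-sided p-value can then be obtained from the t distribution, for example with `2 * scipy.stats.t.sf(abs(t), df)`, or in one step with `scipy.stats.ttest_ind(a, b, equal_var=False)`. With many peptides, the resulting p-values would additionally need a multiple-testing correction.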
Differences in the Elemental Isotope Definition May Lead to Errors in Modern Mass-Spectrometry-Based Proteomics
The elemental isotope definition used to calculate the theoretical masses and isotope distribution of (bio)molecules is considered to be a fixed, universal standard in mass-spectrometry-based proteomics. However, this is an incorrect assumption. In view of the ongoing advances in mass spectrometry technology, and in particular the ever-increasing mass precision, the elemental isotope definition and its variations should be taken into account. We illustrate the effect of the elemental isotope uncertainty on the theoretical and experimental masses with theoretical calculations and examples.
The authors are grateful to the editor and the reviewers for their insightful comments. All of these comments were most helpful and have resulted in an improved text. The authors would like to acknowledge IUPAC and M. Wieser for granting us permission to use Figure 3. D.V. acknowledges support from the SBO grant "InSPECtor" (120025) of the Flemish agency for Innovation by Science and Technology (ENT). F.L. acknowledges support from the Research Foundation - Flanders (FWO).
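A small calculation shows why the assumed isotope definition matters at current mass precision. The sketch computes the average mass of arginine (C6H14N4O2) under two assumed 13C fractions. The isotope masses are standard values, while the abundances, in particular the alternative 13C fraction of 0.0096, are illustrative assumptions rather than figures from the paper; names are hypothetical.

```python
"""Sensitivity of a theoretical average mass to the assumed 13C abundance (illustrative sketch)."""

# Isotope masses (Da) with representative abundances; the abundance values are assumptions.
ISOTOPES = {
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "N": [(14.0030740, 0.99636), (15.0001089, 0.00364)],
    "O": [(15.9949146, 0.99757), (16.9991317, 0.00038), (17.9991596, 0.00205)],
}
C12, C13 = 12.0, 13.0033548


def average_mass(formula, c13_fraction=0.0107):
    """Abundance-weighted (average) mass of a CHNO formula, given an assumed 13C fraction."""
    total = formula.get("C", 0) * ((1 - c13_fraction) * C12 + c13_fraction * C13)
    for element, count in formula.items():
        if element != "C":
            total += count * sum(mass * abundance for mass, abundance in ISOTOPES[element])
    return total


arginine = {"C": 6, "H": 14, "N": 4, "O": 2}
reference = average_mass(arginine)
alternative = average_mass(arginine, c13_fraction=0.0096)  # hypothetical lower 13C fraction
print(f"{reference:.4f} Da vs {alternative:.4f} Da; shift {reference - alternative:.4f} Da")
```

The shift equals 6 × (m(13C) − m(12C)) × 0.0011, roughly 0.0066 Da, or several tens of ppm at this mass, which is well above the accuracy of modern high-resolution instruments.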
- …
