    The effect of data transformation on low-dimensional integration of single-cell RNA-seq

    Background: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. Results: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering in low-dimensional space, such as that generated by UMAP or PCA. Moreover, these changes in low-dimensional space also significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models. Conclusions: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score in low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be carefully considered in the integrative analysis of multiple scRNA-seq datasets. Open-Access-Publikationsfonds 202

    Ensemble multi-objective hyperparameter optimization for the classification of imbalanced heart disease data

    http://dx.doi.org/10.13039/501100002347 Federal Ministry of Education and Research Bonn Office; http://dx.doi.org/10.13039/501100001659 German Research Foundation

    BenchXAI: Comprehensive benchmarking of post-hoc explainable AI methods on multi-modal biomedical data

    http://dx.doi.org/10.13039/501100002347 Bundesministerium für Bildung und Forschung; http://dx.doi.org/10.13039/501100004189 Max-Planck-Gesellschaft; http://dx.doi.org/10.13039/501100014840 Gemeinsame Bundesausschuss; http://dx.doi.org/10.13039/501100003385 Georg-August-Universität Göttingen; http://dx.doi.org/10.13039/501100001659 German Research Foundation

    Species-agnostic transfer learning for cross-species transcriptomics data integration without gene orthology

    Abstract: Model organisms such as mice and zebrafish play a crucial role in biomedical research, because novel hypotheses are often developed or validated in them. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete and entails a significant information loss during gene-ID conversion. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology but also identifies similar GO terms among the most influential genes composing the latent space for integration. Subsequently, during the alignment of latent spaces, each composed of species-specific genes, it is possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach with four different single-cell sequencing datasets focusing on cell-type prediction and compared it against related machine-learning approaches. In summary, the developed model outperforms related methods working without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our novel approach allows knowledge transfer beyond species barriers without depending on known gene orthology, instead utilizing the entire gene sets.

    Integrated Statistical Learning of Metabolic Ion Mobility Spectrometry Profiles for Pulmonary Disease Identification

    Exhaled air carries information on human health status. Ion mobility spectrometry combined with a multi-capillary column (MCC/IMS) is a well-known technology for detecting volatile organic compounds (VOCs) within human breath. This technique is relatively inexpensive, robust and easy to use in everyday practice. However, the potential of this methodology depends on the successful application of computational approaches for finding relevant VOCs and classifying patients into disease-specific profile groups based on the detected VOCs. We developed an integrated state-of-the-art system using sophisticated statistical learning techniques for VOC-based feature selection and supervised classification into patient groups. We analyzed breath data from 84 volunteers, each suffering from either chronic obstructive pulmonary disease (COPD) or both COPD and bronchial carcinoma (COPD + BC), as well as from 35 healthy volunteers comprising a control group (CG). We standardized and integrated several statistical learning methods to provide a broad overview of their potential for distinguishing the patient groups. We found that there is strong potential for separating MCC/IMS chromatograms of healthy controls and COPD patients (best accuracy COPD vs CG: 94%). However, further examination of the impact of bronchial carcinoma on COPD/no-COPD classification performance is necessary (best accuracy CG vs COPD vs COPD + BC: 79%). We also extracted 20 high-scoring VOCs that allowed differentiating COPD patients from healthy controls. We conclude that these statistical learning methods have a generally high accuracy when applied to well-structured medical MCC/IMS data.

    Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence

    The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNNs) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.

    Fostering reproducibility, reusability, and technology transfer in health informatics

    Summary: Computational methods can transform healthcare. In particular, health informatics with artificial intelligence has shown tremendous potential when applied in various fields of medical research and has opened a new era for precision medicine. The development of reusable biomedical software for research or clinical practice is time-consuming and requires rigorous compliance with quality requirements as defined by international standards. However, research projects rarely implement such measures, hindering smooth technology transfer into the research community or manufacturers as well as reproducibility and reusability. Here, we present a guideline for quality management systems (QMS) for academic organizations incorporating the essential components while confining the requirements to an easily manageable effort. It provides a starting point to implement a QMS tailored to specific needs effortlessly and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability. Ultimately, the emerging standardized workflows can pave the way for an accelerated deployment in clinical practice.

    On the Importance of Statistics in Breath Analysis -- Hope or Curse?

    As we saw at the 2013 Breath Analysis Summit, breath analysis is a rapidly evolving field. Increasingly sophisticated technology is producing huge amounts of complex data. A major barrier now faced by the breath research community is the analysis of these data. Emerging breath data require sophisticated, modern statistical methods to allow for a careful and robust deduction of real-world conclusions.

    Explainable Artificial Intelligence on Biosignals for Clinical Decision Support

    Lower Saxony Vorab of the Volkswagen Foundation and the Ministry for Science and Culture of Lower Saxony; Innovationsausschuss beim Gemeinsamen Bundesausschuss (G-BA)