Enhancing coding potential prediction for short sequences using complementary sequence features and feature selection
The identification of coding potential in DNA sequences is of major importance in bioinformatics, where it is often used to assist expert systems that automatically try to recognize genes in genomes. For longer sequences, the identification of coding potential tends to be easier due to a better signal-to-noise ratio, whereas for very short sequences the issue becomes more problematic. In this paper, we present new methods that specifically aim at a better prediction of coding potential in short sequences. To this end, we combine different, complementary sequence features with a feature selection strategy. Results comparing the new classifiers to state-of-the-art models show that our new approach significantly outperforms the existing methods when applied to short sequences.
Evaluating feature attribution methods in the image domain
Feature attribution maps are a popular approach to highlight the most important pixels in an image for a given prediction of a model. Despite a recent growth in popularity and available methods, the objective evaluation of such attribution maps remains an open problem. Building on previous work in this domain, we investigate existing quality metrics and propose new variants of metrics for the evaluation of attribution maps. We confirm a recent finding that different quality metrics seem to measure different underlying properties of attribution maps, and extend this finding to a larger selection of attribution methods, quality metrics, and datasets. We also find that metric results on one dataset do not necessarily generalize to other datasets, and methods with desirable theoretical properties do not necessarily outperform computationally cheaper alternatives in practice. Based on these findings, we propose a general benchmarking approach to help guide the selection of attribution methods for a given use case. Implementations of attribution metrics and our experiments are available online (https://github.com/arnegevaert/benchmark-general-imaging).
Modeling intercellular communication from transcriptomics data by linking ligands to target genes
Deciphering how cells communicate is necessary to gain better insight into fundamental biology and into diseases in which cell-cell communication processes are dysregulated (e.g. cancer and COVID-19). Studying intercellular communication is, however, very challenging. Thanks to transcriptomics technologies, it is now possible to determine the gene expression of interacting cells. However, inferring cell-cell communication from these transcriptomics data requires advanced algorithms. During this PhD, a new algorithm, NicheNet, was developed that makes it possible to study how signals produced by one cell can influence gene expression in another cell. As a result, NicheNet can generate hypotheses about which communication patterns are crucial in a given biological system. This was illustrated in a study on Kupffer cells in which several of NicheNet's hypotheses could be validated. Although NicheNet has proven to be a useful method, it has several limitations. Therefore, in the last part of this PhD, a new algorithm, MultiNicheNet, was developed. MultiNicheNet builds on NicheNet to better analyze datasets from large patient cohorts. This allows better hypotheses to be generated about the role of cell-cell communication in various diseases. In summary, this thesis describes the development and application of new algorithms to study cell-cell communication based on transcriptomics data.
Structure learning to unravel mechanisms of the immune system
The cells of our immune system play an essential role in protecting us from infections from pathogens such as viruses or harmful bacteria. In the context of a disease, the different types of immune cells perform special roles and interact, resulting in a finely orchestrated immune response. However, this complex immune response can in some cases be disrupted. For instance, the cells that are supposed to fight a disease can be silenced. This phenomenon can be observed in tumors, in which cells can start proliferating abnormally without being controlled by a functional immune response.
Understanding how the immune system works in the context of a disease is therefore of crucial importance if we want to find efficient therapies. The cells from the immune system can now be thoroughly studied with technologies that generate unprecedented amounts of information on these cells’ shape, type, and on the molecules that they contain. This enormous amount of data represents a challenge for the doctors who need to analyse it. In this context, many computational tools are being developed to automate the analysis of medical data. These computational tools tackle typical data analysis issues, such as preprocessing (to obtain clean, noise-free data), feature selection (to identify cell features of interest), clustering (to identify groups of similar cells), trajectory inference (to identify developmental processes), and network inference (to identify genes that can influence other genes), among others.
The topic of this thesis is the application and design of computational solutions for single-cell data analysis. In the first part of this thesis, we essentially focus on identifying structure in this type of data. We first present a new computational tool for trajectory inference, TinGa, that can identify cell developmental trajectories in a fast and flexible way. Trajectories are typically identified by compressing the information contained in thousands of genes into a low-dimensional space. We thus secondly present an exploratory study, in which we aimed at computing an optimal low-dimensional space in which the identification of a trajectory would be facilitated. Thirdly, we applied trajectory inference as well as a new network inference method, BRED, to gain biological insight on the response of CD8 T cells upon an acute viral infection. We identified two sources of memory along the developmental trajectory followed by activated CD8 T cells, and we characterised these two memory precursor populations. Finally, we report our results on a multi-omics study that aimed at unraveling differences between patients that were tolerant to a graft transplantation and patients who developed graft-versus-host disease. By integrating three different types of data, we were able to uncover the crucial role between an activated state and a steady state of the immune system in these patients.
Computational tools make it possible to analyse new types of large-scale datasets in a fast and efficient way. By automating analyses that were previously performed manually, they present multiple advantages. First, they make it possible to analyse data of unprecedented size and complexity. Secondly, they significantly reduce the time typically needed for the analysis of any type of data. Lastly, they lead to more robust results, since correctly set up computational experiments can be repeated by different persons and will lead to identical results. Altogether, the development and application of computational tools can lead to more robust and reproducible single-cell omics research.
Optimizing computational cytometry tools for clinical applications
Translating computational cytometry tools to the clinic is not straightforward and requires close collaboration between bioinformaticians, biostatisticians and wet-lab scientists. In this thesis, an overview of the challenges in translational machine learning is provided, where we go over each challenge and provide ways to counter it. The acute myeloid leukemia (AML) study, an example of translational computational cytometry, explores the immunophenotype of this malignancy in great detail and uncovers more ways to subdivide patients and discover important cell types in predicting disease outcomes, which positively impacts personal treatment strategies. FlowSOM, the clustering tool used in this study, is translated to Python to improve user-friendliness and accessibility while keeping equal performance and improving speed and memory usage. Finally, this thesis introduces funkyheatmap, a visualization package for mixed-type data frames. This is useful to visualize, for example, the heterogeneous clinical metadata of AML patients in a clear and interpretable way.
Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods
