Parallel Lossy Compression for Large FASTQ Files
In this paper we present a parallel version of the BFQzip algorithm, which we introduced in [Guerrini et al., BIOSTEC – BIOINFORMATICS 2022]. BFQzip modifies the bases and the quality scores by taking both kinds of information into account at the same time, while preserving variant calling, so that the resulting FASTQ file compresses better than the original data. Here, we introduce a strategy that splits the FASTQ file into t blocks and processes them independently in parallel with the BFQzip algorithm. The resulting blocks, with modified bases and smoothed qualities, are merged in order and compressed. We show that our strategy can improve the compression ratio of large FASTQ files by taking advantage of the redundancy of reads. When splitting into blocks, however, reads belonging to the same portion of the genome could end up in different blocks. We therefore analyze how reordering reads before splitting the input FASTQ can improve the compression ratio as the number of threads increases. We also propose a paired-end mode that exploits paired-end information by processing blocks of FASTQ files in pairs. Availability: The software is freely available at https://github.com/veronicaguerrini/BFQzi
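The split, process, and merge pipeline described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `process_block` is a hypothetical placeholder that merely caps quality scores and is not BFQzip (which also modifies bases using eBWT-based information), and the helper names and the use of gzip for the final compression step are illustrative choices, not the tool's actual interface.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parse_fastq(text):
    """Group FASTQ text into 4-line records: (header, bases, '+', qualities)."""
    lines = text.strip().split("\n")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def split_blocks(records, t):
    """Split the records into t contiguous blocks whose sizes differ by at most one."""
    size, rem = divmod(len(records), t)
    blocks, start = [], 0
    for i in range(t):
        end = start + size + (1 if i < rem else 0)
        blocks.append(records[start:end])
        start = end
    return blocks


def process_block(block, cap="?"):
    """Hypothetical stand-in for BFQzip: caps every quality character at `cap`
    ('?' is Phred 30 in Phred+33 encoding). BFQzip itself also modifies bases
    using eBWT-based contexts; none of that is reproduced here."""
    return [(header, bases, plus, "".join(min(q, cap) for q in qual))
            for header, bases, plus, qual in block]


def parallel_compress(text, t):
    """Split into t blocks, process them independently, merge in order, compress."""
    blocks = split_blocks(parse_fastq(text), t)
    with ThreadPoolExecutor(max_workers=t) as pool:
        # pool.map returns results in input order, so blocks are merged in order.
        processed = list(pool.map(process_block, blocks))
    merged = "".join("\n".join(rec) + "\n" for block in processed for rec in block)
    return gzip.compress(merged.encode("ascii"))
```

A thread pool keeps the sketch short; because pure-Python work is CPU-bound and limited by the GIL, a real implementation would more likely use separate processes or native threads. The read reordering discussed in the abstract would fit between `parse_fastq` and `split_blocks`.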
Metagenomic analysis through the extended Burrows-Wheeler transform
Background: The development of Next Generation Sequencing (NGS) has had a major impact on the study of genetic sequences. Among the problems that researchers in the field face, one of the most challenging is the taxonomic classification of metagenomic reads, i.e., identifying the microorganisms present in a sample collected directly from the environment. The analysis of environmental samples (metagenomes) is particularly important for determining the microbial composition of different ecosystems, and it is used in a wide variety of fields: for instance, metagenomic studies in agriculture can help in understanding the interactions between plants and microbes, while in ecology they can provide valuable insights into the functions of environmental communities. Results: In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform (eBWT), and we scan the required data structures sequentially, so that unknown sequences from large collections can be analyzed using little internal memory. The tool LiME (Lightweight Metagenomics via eBWT) is available at https://github.com/veronicaguerrini/LiME. Conclusions: To assess the reliability of our approach, we ran several experiments on NGS data from two simulated metagenomes taken from a benchmarking analysis and on a real metagenome from the Human Microbiome Project. The results on the simulated data show that LiME is competitive with widely used taxonomic classifiers. It achieves high levels of precision and specificity (e.g., 99.9% of the positive control reads are correctly assigned, and less than 0.01% of the negative control reads are classified) while keeping a high sensitivity.
On the real metagenome, we show that LiME delivers classification results comparable to those of MagicBlast. Overall, the experiments confirm the effectiveness of our method and its high accuracy, even on negative control samples.
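To make the underlying data structure concrete, the following naive Python sketch computes the extended Burrows-Wheeler transform of a small collection as originally defined by Mantaci et al., that is, by sorting all conjugates (rotations) of all strings according to their infinite repetitions, together with a per-position label recording which sequence each symbol comes from (often called a document array). It is quadratic and meant only for toy inputs; LiME's own construction and any end-of-string markers it may use are not reproduced, and the function names are hypothetical.

```python
from functools import cmp_to_key


def omega_cmp(u, v):
    """Compare the infinite words u^ω and v^ω (u, v non-empty).
    By the Fine and Wilf periodicity lemma, comparing the first |u|+|v|
    symbols of each repetition is enough to decide the order."""
    n = len(u) + len(v)
    a = (u * (n // len(u) + 1))[:n]
    b = (v * (n // len(v) + 1))[:n]
    return (a > b) - (a < b)


def ebwt_with_document_array(strings):
    """Return (eBWT, document array) for a list of non-empty strings."""
    conjugates = []
    for doc_id, s in enumerate(strings):
        for i in range(len(s)):
            conjugates.append((s[i:] + s[:i], doc_id))
    # Sort conjugates by ω-order (stable sort keeps input order for ties).
    conjugates.sort(key=cmp_to_key(lambda x, y: omega_cmp(x[0], y[0])))
    ebwt = "".join(rot[-1] for rot, _ in conjugates)  # last symbol of each conjugate
    doc_array = [doc_id for _, doc_id in conjugates]  # origin of each eBWT symbol
    return ebwt, doc_array


print(ebwt_with_document_array(["ac", "ab"]))  # ('bcaa', [1, 0, 1, 0])
```

For a single primitive string the result coincides with the rotation-based BWT; for example, `"banana"` yields `"nnbaaa"`.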
The Burrows-Wheeler Transform of an Elastic-Degenerate String
Degenerate strings (DS) and elastic-degenerate strings (EDS) are a way to represent, in compact form, sets of strings with a high degree of similarity. They can be particularly useful in fields such as text processing or, in computational biology, the study of DNA mutations, where several variants of a sequence must be managed efficiently. In practice, a degenerate string is a string whose symbols, called degenerate symbols, can have several alternatives (hence a degenerate symbol is a set). In the literature, different constraints have been imposed on degenerate symbols. For example, a symbol may be restricted to i) a set of letters of the alphabet, ii) a set of strings of the same length, or iii) a set of strings of variable length (including the empty string). We consider the latter, most general form, known as elastic-degenerate strings. Our contribution is the introduction of the Burrows-Wheeler transform of an elastic-degenerate string (EDS-BWT). We show that the EDS-BWT is reversible and that it can be used to solve the pattern matching problem, i.e., finding a standard string pattern within an EDS, by adapting the properties of the classical Burrows-Wheeler transform. Finally, we implemented EDS-BWT encoding/decoding and the prototype edsBWTSearch, and we experimentally compared our pattern matching approach with other existing tools that handle elastic-degenerate strings.
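The EDS-BWT is not described here in enough detail to implement, but the classical mechanism it adapts can be illustrated. The Python sketch below builds the BWT of an ordinary string by sorting its rotations and counts pattern occurrences with backward search (LF-mapping). It handles plain strings only, not degenerate symbols; it assumes `$` does not occur in the text, and its naive rank computation is for illustration only.

```python
def bwt(text):
    """Classical BWT: append the sentinel '$', sort all rotations, keep last column."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(r[-1] for r in rotations)


def backward_search(L, pattern):
    """Count occurrences of `pattern` in the text whose BWT is L."""
    # C[c] = number of symbols in L that are smaller than c.
    C, total = {}, 0
    for c in sorted(set(L)):
        C[c] = total
        total += L.count(c)

    def occ(c, i):
        """Naive rank: occurrences of c in L[:i] (a real index uses O(1) rank)."""
        return L[:i].count(c)

    lo, hi = 0, len(L)  # current range of sorted rotations prefixed by the suffix read so far
    for c in reversed(pattern):  # extend the match one symbol to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


L = bwt("banana")
print(L, backward_search(L, "ana"))  # annb$aa 2
```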
Lightweight Metagenomic Classification via eBWT
The development of Next Generation Sequencing has had a major impact on the study of genetic sequences and, in particular, on the advancement of metagenomics, whose aim is to identify the microorganisms present in a sample collected directly from the environment. In this paper, we describe a new lightweight alignment-free and assembly-free framework for metagenomic classification that compares each unknown sequence in the sample to a collection of known genomes. We take advantage of the combinatorial properties of an extension of the Burrows-Wheeler transform, and we scan the required data structures sequentially, so that unknown sequences from large collections can be analyzed using little internal memory. To the best of our knowledge, this is the first approach that is assembly-free and alignment-free and is not based on k-mers. Our experiments confirm the effectiveness of our approach and its high accuracy, even on negative control samples: only 1 out of 5,726,358 randomly shuffled short reads is classified. Finally, the results are comparable with those achieved by read-mapping classifiers and by k-mer-based classifiers.
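The following toy sketch illustrates, under clearly stated assumptions, the general idea of classifying reads without alignment by checking which genome suffixes end up next to read suffixes in sorted order. It is not LiME's algorithm: materializing explicit suffixes, grouping adjacent suffixes whose longest common prefix is at least `k`, and voting once per read suffix are simplifications chosen for illustration, and `classify_reads`, `lcp`, and `k` are hypothetical names.

```python
from collections import Counter, defaultdict


def lcp(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def classify_reads(reads, genomes, k):
    """Toy classifier (assumption, not LiME): assign each read to the genome whose
    suffixes most often share a sorted cluster (common prefix >= k) with its suffixes."""
    entries = []  # (suffix, kind, label) with kind "G" for genomes, "R" for reads
    for name, seq in genomes.items():
        entries += [(seq[i:], "G", name) for i in range(len(seq))]
    for rid, seq in reads.items():
        entries += [(seq[i:], "R", rid) for i in range(len(seq))]
    entries.sort(key=lambda e: e[0])

    votes = defaultdict(Counter)
    start = 0
    for i in range(1, len(entries) + 1):
        # Close the current cluster when the next suffix shares fewer than k symbols.
        if i == len(entries) or lcp(entries[i - 1][0], entries[i][0]) < k:
            cluster = entries[start:i]
            genomes_in = {label for _, kind, label in cluster if kind == "G"}
            for _, kind, label in cluster:
                if kind == "R":
                    for g in genomes_in:
                        votes[label][g] += 1
            start = i
    return {rid: (votes[rid].most_common(1)[0][0] if votes[rid] else None)
            for rid in reads}
```

Ties are broken by the order in which genomes first received votes, and reads with no votes are left unclassified (`None`). A lightweight tool of the kind described above would instead scan precomputed eBWT-based arrays sequentially rather than keep all suffixes in memory.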
Women and Power in Etruria: Between Literary Sources and Archaeological Documentation
The paper aims to provide a synthetic framework and some reflections on the condition of women in Etruria, with particular regard to the relationship between women and power. Studies from the last thirty years, analyzing literary and archaeological sources, have outlined an increasingly realistic profile of the Etruscan woman who, despite differences due to chronology and site, undoubtedly appears shaped by cultural and social models not shared by contemporary civilizations. Analyzing literary, epigraphic and archaeological evidence, we will highlight some of the aspects that allow us to grasp the peculiarities of the condition of women in Etruria. We will pay particular attention to the data from the necropoleis of Bologna dating from between the second half of the 6th and the middle of the 4th century BC, which have been studied for years by the Chair of Etruscology of the University of Bologna. The analysis of the grave goods and the exegesis of the extraordinary iconographic heritage of the stelai, the most distinctive and coherent corpus of funerary monuments of Etruscan Bologna, offer an exceptional research sample that prompts reflections on the female world.
Study and Development of a System for the End-of-Line Inspection of Aluminium and Cast Iron Castings
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field under study than the one produced through traditional first-author co-citation counting. Reasons for these effects are discussed.
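The two counting schemes compared in this study can be sketched as follows. The function is an illustrative assumption, not the study's exact procedure: each citing paper contributes at most one co-citation to every unordered pair of distinct authors found among the counted authors of its references, with `max_authors=1` giving traditional first-author counting and `max_authors=5` the first-five-authors variant. How the study treats co-authors of the same cited work is not stated in the abstract, and this sketch simply counts them as co-cited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """citing_papers: list of reference lists; each reference is a list of author
    names in byline order. Returns a Counter over sorted author pairs."""
    counts = Counter()
    for refs in citing_papers:
        authors = set()
        for author_list in refs:
            authors.update(author_list[:max_authors])  # only the first max_authors count
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1  # at most one co-citation per pair per citing paper
    return counts


papers = [[["Smith", "Lee"], ["Chen"]], [["Lee", "Smith"], ["Chen", "Wu"]]]
print(cocitation_counts(papers, max_authors=1))
print(cocitation_counts(papers, max_authors=5))
```

With first-author counting, Lee and Smith are never co-cited in this toy data; with the first five authors, they are co-cited in both papers, which illustrates how the wider counting window changes the co-citation matrix.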
