Search CORE

1,721,104 research outputs found

CSC: Circular Strings Comparison

Author: GROSSI ROBERTO
Ahmad Retha
Fatima Vayani
Solon P. Pissis
Costas S. Iliopoulos
Robert Mercas
PISANTI NADIA
Publication venue
Publication date: 01/01/2016
Field of study

Description: Given two sequences x and y, CSC finds the cyclic rotation of x (or an approximation of it) that minimises the blockwise q-gram distance from y. Installation: To compile CSC, please follow the instructions given in file INSTALL. Citation: Roberto Grossi, Costas S. Iliopoulos, Robert Mercas, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Fatima Vayani: Circular Sequence Comparison with q-grams. WABI 2015: 203-216. License: GNU GPLv3 License; Copyright (C) 2015 Solon P. Pissis, Ahmad Retha and Fatima Vayani

Archivio della Ricerca - Università di Pisa

Datasets used in "Fast phylogenetic inference from typing data"

Author: Solon P Pissis
Maxime Crochemore
Alexandre P Francisco
João A Carriço
Cátia Vaz
Bruno Ribeiro-Gonçalves
Publication venue
Publication date: 14/07/2020
Field of study

Real datasets used in the paper "Fast phylogenetic inference from typing data", by João A Carriço, Maxime Crochemore, Alexandre P Francisco, Solon P Pissis, Bruno Ribeiro-Gonçalves, and Cátia Vaz, Algorithms for Molecular Biology 13 (1), 1-14, 2018. Check the paper for more information on the datasets

ZENODO

Substring Complexity in Sublinear Space

Author: Gabriele Fici
Giulia Bernardini
Paweł Gawrychowski
and Solon P. Pissis.
Publication venue
Publication date: 28/11/2023
Field of study

Shannon’s entropy is a definitive lower bound for statistical compression. Unfortunately, no such clear measure exists for the compressibility of repetitive strings. Thus, ad hoc measures are employed to estimate the repetitiveness of strings, e.g., the size z of the Lempel–Ziv parse or the number r of equal-letter runs of the Burrows-Wheeler transform. A more recent one is the size γ of a smallest string attractor. Let T be a string of length n. A string attractor of T is a set of positions of T capturing the occurrences of all the substrings of T. Unfortunately, Kempa and Prezza [STOC 2018] showed that computing γ is NP-hard. Kociumaka et al. [LATIN 2020] considered a new measure of compressibility that is based on the function S_T(k) counting the number of distinct substrings of length k of T, also known as the substring complexity of T. This new measure is defined as δ = sup{S_T(k)/k, k ≥ 1} and lower bounds all the relevant ad hoc measures previously considered. In particular, δ ≤ γ always holds and δ can be computed in O(n) time using Θ(n) working space. Kociumaka et al. showed that one can construct an O(δ log n/(δ))-sized representation of T supporting efficient direct access and efficient pattern matching queries on T. Given that for highly compressible strings, δ is significantly smaller than n, it is natural to pose the following question: Can we compute δ efficiently using sublinear working space? It is straightforward to show that in the comparison model, any algorithm computing δ using O(b) space requires Ω(n^{2-o(1)}/b) time through a reduction from the element distinctness problem [Yao, SIAM J. Comput. 1994]. We thus wanted to investigate whether we can indeed match this lower bound. We address this algorithmic challenge by showing the following bounds to compute δ: - O((n3log b)/b2) time using O(b) space, for any b ∈ [1,n], in the comparison model. - Õ(n2/b) time using Õ(b) space, for any b ∈ [√n,n], in the word RAM model. This gives an Õ(n^{1+ε})-time and Õ(n^{1-ε})-space algorithm to compute δ, for any 0 < ε ≤ 1/2. Let us remark that our algorithms compute S_T(k), for all k, within the same complexities

Archivio istituzionale della ricerca - Università di Palermo

Towards Distance-Based Phylogenetic Inference in Average-Case Linear-Time

Author: Pissis Solon P.
Maxime Crochemore
Vaz Cátia
Francisco Alexandre P.
Cátia Vaz
Solon P. Pissis
Pissis Solon
Pissis Solon P.; id_orcid
A. P. Francisco
Crochemore Maxime
Publication venue
Publication date: 01/01/2017
Field of study

Computing genetic evolution distances among a set of taxa dominates the running time of many phylogenetic inference methods. Most of genetic evolution distance definitions rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles. We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well known phylogenetic inference method

RCIPL Repositório Científico do Instituto Politécnico de Lisboa

INESC-ID RCAAP Portal

DROPS Dagstuhl Research Online Publication Server

King's Research Portal

LIPIcs, Volume 312, WABI 2024, Complete Volume

Author: Pissis Solon P.
Sung Wing-Kin
Publication venue
Publication date: 01/01/2024
Field of study

LIPIcs, Volume 312, WABI 2024, Complete Volum

DROPS Dagstuhl Research Online Publication Server

LIPIcs, Volume 128, CPM'19, Complete Volume

Author: P. Pissis Solon
Pisanti Nadia
Publication venue
Publication date: 01/01/2019
Field of study

LIPIcs, Volume 128, CPM'19, Complete Volum

DROPS Dagstuhl Research Online Publication Server

RSDS: Reverse-Safe-data-structure

Author: Chen Huiping
Loukides Grigorios
Pissis Solon
Publication venue
Publication date: 01/01/2020
Field of study

RSDS: Reverse-safe Text Indexing Copyright (C) 2020 Grigorios Loukides, Solon P. Pissis, Huiping Chen This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version

RSDS: Reverse-Safe-data-structure

Author: Chen H. (Huiping)
Loukides G. (Grigorios)
Pissis S. (Solon)
Publication venue
Publication date: 01/01/2020
Field of study

CWI's Institutional Repository

Front Matter, Table of Contents, Preface, Conference Organization

Author: Pissis Solon P.
Sung Wing-Kin
Publication venue
Publication date: 01/01/2024
Field of study

Front Matter, Table of Contents, Preface, Conference Organizatio

DROPS Dagstuhl Research Online Publication Server

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

Author: Pissis Solon P.
Retha Ahmad
Pissis Solon P.; id_orcid
Publication venue
Publication date: 01/01/2018
Field of study

An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm^2+N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N * ceil[m/w]) with pre-processing time and space O(m * ceil[m/w]), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M. A straightforward generalization of the existing bit-vector algorithm would require time O(N * ceil[M/w]) with pre-processing time and space O(M * ceil[M/w]), which is prohibitive in practice. We present a new on-line O(N * ceil[M/w])-time algorithm with pre-processing time and space O(M). We present experimental results using both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome showing that a significant number of previously discovered MAWs are in fact false-positives when a population's variants are considered

DROPS Dagstuhl Research Online Publication Server

King's Research Portal