325 research outputs found

Front Matter, Table of Contents, Preface, Conference Organization

Author: Solon P. Pissis
Nadia Pisanti
Publication date: 01/01/2019

Archivio della Ricerca - Università di Pisa

DROPS Dagstuhl Research Online Publication Server

CSC: Circular Strings Comparison

Author: Roberto Grossi
Ahmad Retha
Fatima Vayani
Solon P. Pissis
Costas S. Iliopoulos
Robert Mercas
Nadia Pisanti
Publication date: 01/01/2016

Description: Given two sequences x and y, CSC finds the cyclic rotation of x (or an approximation of it) that minimises the blockwise q-gram distance from y.
Installation: To compile CSC, please follow the instructions given in file INSTALL.
Citation: Roberto Grossi, Costas S. Iliopoulos, Robert Mercas, Nadia Pisanti, Solon P. Pissis, Ahmad Retha, Fatima Vayani: Circular Sequence Comparison with q-grams. WABI 2015: 203-216.
License: GNU GPLv3 License; Copyright (C) 2015 Solon P. Pissis, Ahmad Retha and Fatima Vayani
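The description of CSC can be illustrated with a naive sketch that tries every rotation of x and keeps the one closest to y. This is not the CSC implementation: the function names, the block-splitting rule (equal-sized blocks, last one possibly shorter) and the assumption that x and y have the same length are all hypothetical choices made for illustration.

```python
from collections import Counter


def qgram_profile(s, q):
    """Count every substring of length q in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a, b, q):
    """Sum of absolute differences between the q-gram counts of a and b."""
    pa, pb = qgram_profile(a, q), qgram_profile(b, q)
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def blocks(s, beta):
    """Split s into beta blocks of size ceil(len(s) / beta) (assumed rule)."""
    size = -(-len(s) // beta)
    return [s[i:i + size] for i in range(0, len(s), size)]


def blockwise_qgram_distance(a, b, q, beta):
    """Sum of q-gram distances between corresponding blocks of a and b."""
    return sum(qgram_distance(u, v, q)
               for u, v in zip(blocks(a, beta), blocks(b, beta)))


def best_rotation(x, y, q, beta):
    """Brute force: evaluate all len(x) rotations, return (offset, rotated x)."""
    r = min(range(len(x)),
            key=lambda k: blockwise_qgram_distance(x[k:] + x[:k], y, q, beta))
    return r, x[r:] + x[:r]
```

For example, `best_rotation("GATTACA", "TACAGAT", 2, 2)` evaluates seven rotations and selects the one that starts at offset 3 of x. The brute force costs one distance computation per rotation, which is the cost a dedicated tool would aim to avoid.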

Archivio della Ricerca - Università di Pisa

LIPIcs, Volume 128, CPM'19, Complete Volume

Author: Solon P. Pissis
Nadia Pisanti
Publication date: 01/01/2019

DROPS Dagstuhl Research Online Publication Server

Dictionary Matching in Elastic-Degenerate Texts with Applications in Searching VCF Files On-line

Author: Solon P. Pissis
Ahmad Retha
Publication date: 01/01/2018

An elastic-degenerate string is a sequence of n sets of strings of total length N. It was introduced to represent multiple sequence alignments of closely-related sequences in a compact form. For a standard pattern of length m, pattern matching in an elastic-degenerate text can be solved on-line in time O(nm^2+N) with pre-processing time and space O(m) (Grossi et al., CPM 2017). A fast bit-vector algorithm requiring time O(N * ⌈m/w⌉) with pre-processing time and space O(m * ⌈m/w⌉), where w is the size of the computer word, was also presented. In this paper we consider the same problem for a set of patterns of total length M. A straightforward generalization of the existing bit-vector algorithm would require time O(N * ⌈M/w⌉) with pre-processing time and space O(M * ⌈M/w⌉), which is prohibitive in practice. We present a new on-line O(N * ⌈M/w⌉)-time algorithm with pre-processing time and space O(M). We present experimental results using both synthetic and real data demonstrating the performance of the algorithm. We further demonstrate a real application of our algorithm in a pipeline for discovery and verification of minimal absent words (MAWs) in the human genome, showing that a significant number of previously discovered MAWs are in fact false positives when a population's variants are considered.
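To make the problem statement concrete, here is a naive on-line sketch of single-pattern matching in an elastic-degenerate text. It is not the bit-vector algorithm of the paper: it simply keeps the set of pattern prefix lengths that can end at the current segment boundary. The representation (a list of sets of Python strings), the function name `ed_match` and the choice to report segment indices are assumptions made for illustration; the pattern is assumed non-empty.

```python
def ed_match(segments, pattern):
    """Return the indices of segments in which an occurrence of pattern ends."""
    m = len(pattern)
    active = set()  # j in active: pattern[:j] ends exactly at the previous boundary
    ends = []
    for idx, seg in enumerate(segments):
        found = False
        nxt = set()
        for s in seg:
            # Case 1: the whole pattern lies inside one string of this segment.
            if pattern in s:
                found = True
            # Case 2: continue partial matches that began in earlier segments.
            for j in active:
                rest = pattern[j:]
                if s.startswith(rest):
                    found = True           # the remainder is consumed inside s
                elif rest.startswith(s):
                    nxt.add(j + len(s))    # s is swallowed, match still open
            # Case 3: start a new partial match at a suffix of s.
            for j in range(1, min(m, len(s) + 1)):
                if s.endswith(pattern[:j]):
                    nxt.add(j)
        if found:
            ends.append(idx)
        active = nxt
    return ends
```

With `segments = [{"A"}, {"C", "CG", ""}, {"T"}]` the represented strings are ACT, ACGT and AT, so the patterns CGT and AT are reported as ending in the last segment while AGT is not found. The empty string in a segment is handled by Case 2, which carries every open prefix across unchanged.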

DROPS Dagstuhl Research Online Publication Server

King's Research Portal

LIPIcs, Volume 312, WABI 2024, Complete Volume

Author: Solon P. Pissis
Wing-Kin Sung
Publication date: 01/01/2024

DROPS Dagstuhl Research Online Publication Server

Faster Approximate Elastic-Degenerate String Matching - Part A

Author: Solon P. Pissis
Jakub Radoszewski
Wiktor Zuba
Publication date: 01/01/2025

An elastic-degenerate (ED) string T̃ is a sequence T̃ = T̃[1] ⋯ T̃[n] of n finite sets of strings. The cardinality m of T̃ is the total number of strings in T̃[i], for all i ∈ [1..n]. The size N of T̃ is the total length of all m strings of T̃. ED strings have been introduced to represent a set of closely-related DNA sequences. Let P = P[1..p] be a pattern of length p and k > 0 be an integer. We consider the problem of k-Approximate ED String Matching (EDSM): searching k-approximate occurrences of P in the language of T̃. We call k-Approximate EDSM under the Hamming distance, k-Mismatch EDSM; and we call k-Approximate EDSM under edit distance, k-Edit EDSM. Bernardini et al. (Theoretical Computer Science, 2020) showed a simple O(kmp + kN)-time algorithm for k-Mismatch EDSM and an O(k²mp + kN)-time algorithm for k-Edit EDSM. We improve the dependency on k in both results, obtaining an Õ(k^{2/3}mp + √k N)-time algorithm for k-Mismatch EDSM and an Õ(kmp + kN)-time algorithm for k-Edit EDSM. Bernardini et al. (Theory of Computing Systems, 2024) presented several algorithms for 1-Approximate EDSM working in Õ(np² + N) time. They have also left the possibility to generalize these solutions for k > 1 as an open problem. We improve the runtime of their solution for 1-Mismatch and 1-Edit EDSM from Õ(np² + N) to O(np² + N). We further show algorithms for k-Approximate EDSM for the Hamming and edit distances working in Õ(np² + N) time, for any constant k > 0. Finally, we show how our techniques can be applied to improve upon the complexity of the k-Approximate ED String Intersection and k-Approximate Doubly EDSM problems that were introduced very recently by Gabory et al. (Information and Computation, 2025).
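The definition of k-Mismatch EDSM can be checked on tiny inputs with an exhaustive sketch that expands the whole language of the ED string. This is exponential in n and has nothing to do with the algorithms of the paper; the function names and the list-of-lists representation are hypothetical.

```python
from itertools import product


def hamming_occurs(text, pattern, k):
    """True if some length-|pattern| window of text has at most k mismatches."""
    p = len(pattern)
    return any(
        sum(a != b for a, b in zip(text[i:i + p], pattern)) <= k
        for i in range(len(text) - p + 1)
    )


def k_mismatch_edsm_bruteforce(segments, pattern, k):
    """Pick one string per segment in every possible way and test each result."""
    return any(
        hamming_occurs("".join(choice), pattern, k)
        for choice in product(*segments)
    )
```

For `segments = [["A"], ["C", "CG", ""], ["T"]]` the language is {ACT, ACGT, AT}; the pattern AGT has no exact occurrence but matches ACT with one mismatch, so the check succeeds for k = 1 and fails for k = 0.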

VU Research Portal

CWI's Institutional Repository

DROPS Dagstuhl Research Online Publication Server

Front Matter, Table of Contents, Preface, Conference Organization

Author: Solon P. Pissis
Wing-Kin Sung
Publication date: 01/01/2024

DROPS Dagstuhl Research Online Publication Server

Internal Shortest Absent Word Queries

Author: Panagiotis Charalampopoulos
Solon P. Pissis
Golnaz Badkobeh
Publication date: 01/01/2021

Given a string T of length n over an alphabet Σ ⊂ {1,2,…,n^{O(1)}} of size σ, we are to preprocess T so that given a range [i,j], we can return a representation of a shortest string over Σ that is absent in the fragment T[i]⋯T[j] of T. For any positive integer k ∈ [1, log log_σ n], we present an O((n/k)⋅log log_σ n)-size data structure, which can be constructed in O(n log_σ n) time, and answers queries in time O(log log_σ k).
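The query answered by this data structure can be stated as a brute-force sketch that needs no preprocessing: enumerate candidate words by increasing length and return the first one that does not occur in the fragment. The function name, the 0-based inclusive indexing (the abstract indexes from 1) and the lexicographic tie-breaking are assumptions made for illustration.

```python
from itertools import product


def shortest_absent_word(text, i, j, alphabet):
    """Return a shortest word over alphabet that is absent from text[i..j]."""
    frag = text[i:j + 1]
    length = 1
    while True:
        # All substrings of the current length that occur in the fragment.
        present = {frag[s:s + length] for s in range(len(frag) - length + 1)}
        # Try candidates in lexicographic order; the first missing one wins.
        for cand in product(sorted(alphabet), repeat=length):
            word = "".join(cand)
            if word not in present:
                return word
        length += 1
```

On `text = "abaab"` with `alphabet = "ab"`, the full range contains a, b, aa, ab and ba, so the answer is "bb"; restricting the range to the first two letters gives "aa". The loop always stops because a fragment of length ℓ has no substrings of length ℓ + 1.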

Goldsmiths Research Online

VU Research Portal

CWI's Institutional Repository

DROPS Dagstuhl Research Online Publication Server

Size-Constrained Weighted Ancestors with Applications

Author: Philip Bille
Solon P. Pissis
Yakov Nekrich
Publication date: 01/01/2024

The weighted ancestor problem on a rooted node-weighted tree T is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require Ω(log log n) time for queries provided O(n poly log n) space is available, where n is the input size. The weighted ancestor problem has attracted a lot of attention from the combinatorial pattern matching community due to its direct application to suffix trees. In this formulation of the problem, the nodes are weighted by string depth. This research has culminated in a data structure for weighted ancestors in suffix trees with O(1) query time and an O(n)-time construction algorithm [Belazzougui et al., CPM 2021]. In this paper, we consider a different version of the weighted ancestor problem, where the nodes are weighted by any function weight that maps each node of T to a positive integer, such that weight(u) ≤ size(u) for any node u and weight(u₁) ≤ weight(u₂) if node u₁ is a descendant of node u₂, where size(u) is the number of nodes in the subtree rooted at u. In the size-constrained weighted ancestor (SWA) problem, for any node u of T and any integer k, we are asked to return the lowest ancestor w of u with weight at least k. We show that for any rooted tree with n nodes, we can locate node w in O(1) time after O(n)-time preprocessing. In particular, this implies a data structure for the SWA problem in suffix trees with O(1) query time and O(n)-time preprocessing, when the nodes are weighted by weight. We also show several string-processing applications of this result.
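As a baseline for the SWA query, the following sketch walks up parent pointers until the weight threshold is met. It takes time proportional to the depth of u rather than the O(1) of the paper. The array-based tree encoding, the function name and the convention that u counts as its own ancestor are assumptions made for illustration.

```python
def lowest_ancestor_with_weight(parent, weight, u, k):
    """Return the lowest ancestor w of u (u included) with weight[w] >= k.

    parent[v] is the parent of node v, or None for the root.
    Weights must not decrease on the way to the root, so the first
    node that reaches the threshold is the lowest one.
    Returns None if even the root is too light.
    """
    w = u
    while w is not None and weight[w] < k:
        w = parent[w]
    return w
```

Take a five-node tree with `parent = [None, 0, 0, 1, 1]` and subtree sizes as weights, `weight = [5, 3, 1, 1, 1]`. A query for leaf 3 with k = 2 returns node 1, with k = 4 it returns the root, and with k = 6 it returns None.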

Michigan Technological University

VU Research Portal

CWI's Institutional Repository

DROPS Dagstuhl Research Online Publication Server

Online Research Database In Technology

Efficient Index for Weighted Sequences

Author: Solon P. Pissis
Jakub Radoszewski
Tomasz Kociumaka
Carl Barton
Publication date: 01/01/2016

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet, a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold 1/z, we say that a pattern string P matches a weighted text at position i if the product of probabilities of the letters of P at positions i,...,i+|P|-1 in the text is at least 1/z. In this article, we present an O(nz)-time construction of an O(nz)-sized index that can answer pattern matching queries in a weighted text in optimal time, improving upon the state of the art by a factor of z log z. Other applications of this data structure include an O(nz)-time construction of the weighted prefix table and an O(nz)-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.
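The matching condition with threshold 1/z can be written down directly as an index-free scan. This is only the definition turned into code, not the O(nz) index of the article; the representation of the weighted text as a list of letter-to-probability dictionaries and the function name are hypothetical. Exact fractions are used so that the comparison with 1/z is not affected by floating-point rounding.

```python
from fractions import Fraction
from math import prod


def weighted_occurrences(wtext, pattern, z):
    """Return all positions i where pattern matches wtext with probability >= 1/z.

    wtext[i] maps each letter to its probability at position i;
    letters missing from the dictionary have probability 0.
    """
    m = len(pattern)
    threshold = Fraction(1, z)
    return [
        i
        for i in range(len(wtext) - m + 1)
        if prod(wtext[i + t].get(c, 0) for t, c in enumerate(pattern)) >= threshold
    ]
```

With the three-position text `[{"a": 1/2, "b": 1/2}, {"a": 1}, {"a": 1/4, "b": 3/4}]` (written as fractions) and z = 4, the pattern aa matches at positions 0 and 1 with probabilities 1/2 and 1/4, while ab matches only at position 1. Tightening the threshold to z = 2 leaves only position 0 for aa.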

DROPS Dagstuhl Research Online Publication Server

King's Research Portal