1,721,052 research outputs found
Refined tagging of complex verbal phrases for the Italian language
A verb phrase is a syntactic unit consisting of one verbal form, combined with any other elements, representing the verbal part of the speech. In Italian, as in many other languages, the verb phrase is the central element in a sentence. In this paper, we investigate the problem of the automatic recognition of complex verb phrases in the Italian language, where the wide variety of syntactic units and the complexity of morphology make the problem more difficult to solve than in English. In particular we propose an hybrid approach which faces the recognition and the disambiguation of Italian verb phrases by using language generation. We provide also a web tool1 for testing and querying our method. The level of accuracy and the grade of detail reached by our solution is higher than any other known approach
Alignment of Sequences Allowing for Non-overlapping Unbalanced Translocations of Adjacent Factors
Unbalanced translocations take place when two unequal chromosome sub-sequences swap, resulting in an altered genetic sequence. Such large-scale gene modification are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity. However, despite of their central role in genomic sequence analysis, little attention has been devoted to the problem of aligning sequences allowing for this kind of modification. In this paper we investigate the sequence alignment problem when the edit operations are non-overlapping unbalanced translocations of adjacent factors. Specifically, we present an alignment algorithm for the problem working in -time and -space, where m is the length of the involved sequences. To the best of our knowledge this is the first solution in literature for the alignment problem allowing for unbalanced translocations of factors
An efficient skip-search approach to swap matching
The swap matching problem consists in finding all occurrences of a pattern x of length m in a text y of length n, allowing for disjoint local swaps of characters in the pattern. In 2003, Amir et al. solved the problem in O(n log m log σ) worst-case time complexity, where σ is the size of the alphabet. In recent years, much research has focused on practical solutions and efficient algorithms have been devised by means of the bit-parallel simulation of non-deterministic automata. In this paper, we present a new efficient algorithm for the swap matching problem based on character comparison and structured as a generalization of the Skip-Search algorithm for the exact string matching problem. Although our solution has a quadratic worst-case time complexity, it shows a sub-linear behaviour on average. According to experimental results, our algorithm obtains in most practical cases the best running times, when compared against the most effective solutions. The gain in speed-up, in terms of running times, is up to 48%. This makes the new algorithm one of the most efficient solutions in practical cases
433. Faro (S.), évêque de Meaux
433. Faro (S.), évêque de Meaux. In: Molinier Auguste. Les Sources de l'histoire de France - Des origines aux guerres d'Italie (1494). I. Époque primitive, mérovingiens et carolingiens. Paris : A. Picard et fils, 1901. p. 138
Improved characters distance sampling for online and offline text searching
Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text
Quantum Path Parallelism: A Circuit-Based Approach to Text Searching
Text searching problems are fundamental problems in computer science whose applications are found in a variety of fields. The string matching problem is the common framework for the class of text searching problems where the objective is to find all, possibly approximate, occurrences of a given pattern of length m within a larger text of length n. In this paper, we present the quantum path parallelism approach, a general strategy based on quantum computation that can be easily adapted to a variety of nonstandard text searching problems. Our method translates a text searching problem to an automata-based string recognition problem, associating each possible approximation of the pattern with a different path accepted by the automaton. Under favourable conditions, our proposed method solves the approximate search problem in O(nmlog2(n)) time, offering a quadratic speed-up over classical solutions. In other cases such speed-up is achieved only for short patterns, i.e. assuming m=O(log(n)). As a demonstration of the flexibility of our method, we show how it can be adapted for solving some approximate string matching problems, obtaining a significant quantum advantage
Speeding up string matching by weak factor recognition
String matching is the problem of finding all the substrings of a text which match a given pattern. It is one of the most investigated problems in computer science, mainly due to its very diverse applications in several fields. Recently, much research in the string matching field has focused on the efficiency and flexibility of the searching procedure and quite effective techniques have been proposed for speeding up the exist- ing solutions. In this context, algorithms based on factors recognition are among the best solutions.
In this paper, we present a simple and very efficient algorithm for string matching based on a weak factor recognition and hashing. Our algorithm has a quadratic worst-case running time. However, despite its quadratic complexity, experimental results show that our algorithm obtains in most cases the best running times when compared, un- der various conditions, against the most effective algorithms present in literature. In the case of small alphabets and long patterns, the gain in running times reaches 28%. This makes our proposed algorithm one of the most flexible solutions in practical cases
Sequence searching allowing for non-overlapping adjacent unbalanced translocations
Unbalanced translocations are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity, a major genetic event causing inactivation of tumor suppressor genes. Despite of their central role in genomic sequence analysis, little attention has been devoted to the problem of matching sequences allowing for this kind of chromosomal alteration.
In this paper we investigate the approximate string matching problem when the edit operations are non-overlapping unbalanced translocations of adjacent factors. In particular, we first present a O(nm3)-time and O(m2)-space algorithm based on the dynamic-programming approach. Then we improve our first result by designing a second solution which makes use of the Directed Acyclic Word Graph of the pattern. In particular, we show that under the assumptions of equiprobability and independence of characters, our algorithm has a O(n log2σ m) average time complexity, for an alphabet of size σ, still maintaining the O(nm3)-time and the O(m2)-space complexity in the worst case. To the best of our knowledge this is the first solution in literature for the approximate string matching problem allowing for unbalanced translocations of factors
Efficient Online String Matching Based on Characters Distance Text Sampling
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string matching is an efficient approach recently introduced in order to overcome the prohibitive space requirements of an index construction, on the one hand, and drastically reduce searching time for the online solutions, on the other hand. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character and then to search online the sampled data for any occurrence of the sampled pattern, before verifying the original text. From a theoretical point of view we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-time complexity. From a practical point of view it turns out that our solution shows a sub-linear behaviour in practice and speeds up online searching by a factor of up to 9, using limited additional space whose amount goes from 11 to 2.8% of the text size, with a gain up to 50% if compared with previous solutions
Linear and efficient string matching algorithms based on weak factor recognition
We present a simple and very efficient algorithm for string matching based on the combination of weak factor recognition and hashing. Despite its quadratic worst-case running time, our algorithm exhibits a sublinear behaviour. We also propose some practical improvements of our algorithm and a variant with a linear worst- case time complexity. Experimental results show that, in most cases, some of the variants of our algorithm obtain the best running times when compared, under various conditions, against the most effective algorithms present in the literature. For instance, in the case of small alphabets and long patterns, the gain in running time is up to 18%. This makes our proposed algorithm one of the most flexible solutions in practical cases
- …
