QuestionCube: A framework for question answering
QuestionCube is a framework for Question Answering (QA) that combines several techniques to retrieve passages containing the exact answers to natural language questions. It exploits: (a) Natural Language Processing algorithms for analysing questions and candidate answers in both English and Italian; (b) probabilistic Information Retrieval models for retrieving candidate answers; and (c) Machine Learning methods for question classification. The data source for the answers is a collection of unstructured text documents stored in search indices. This paper gives an overview of the QuestionCube framework architecture, together with a description of Wikiedi, a QA system for Wikipedia built on the proposed framework.
Encoding syntactic dependencies using Random Indexing and Wikipedia as a corpus
Distributional approaches are based on a simple hypothesis: the meaning of a word can be inferred from its usage. Applying that idea to the vector space model makes it possible to construct a WordSpace in which words are represented by points in a geometric space. Similar words lie close together in this space, and the definition of "word usage" depends on the definition of the context used to build the space, which can be the whole document, the sentence in which the word occurs, a fixed window of words, or a specific syntactic context. However, in its original formulation WordSpace can take into account only one definition of context at a time. We propose an approach based on vector permutation and Random Indexing to encode several syntactic contexts in a single WordSpace. We build our WordSpace from the WaCkypedia EN corpus, a 2009 dump of the English Wikipedia (about 800 million tokens) annotated with syntactic information provided by a full dependency parser. The effectiveness of our approach is evaluated using the GEometrical Models of natural language Semantics (GEMS) 2011 Shared Evaluation data.
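The idea of encoding several syntactic contexts in one space can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality, the number of non-zero entries, the use of a cyclic rotation as the permutation, the relation names, and the helper names (`index_vector`, `permute`, `build_wordspace`) are all assumptions made for illustration. Each context word gets a sparse random index vector, and that vector is rotated by a relation-specific amount before being added to the vector of the word it depends on.

```python
import math
import random

DIM = 512      # vector dimensionality (assumed value, not from the abstract)
NONZERO = 8    # number of +1/-1 entries per index vector (assumed value)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary random vector, seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def permute(vec, shift):
    """Cyclic rotation by `shift` positions; each relation uses its own shift."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift] if shift else list(vec)


def build_wordspace(triples, relation_shift):
    """For each head word, sum the permuted index vectors of its dependents."""
    space = {}
    for head, relation, dependent in triples:
        context = permute(index_vector(dependent), relation_shift[relation])
        target = space.setdefault(head, [0] * DIM)
        for i, value in enumerate(context):
            target[i] += value
    return space


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy (head, relation, dependent) triples; relation names are hypothetical.
triples = [
    ("eats", "subj", "cat"), ("eats", "obj", "fish"),
    ("devours", "subj", "cat"), ("devours", "obj", "fish"),
    ("chases", "subj", "fish"), ("chases", "obj", "cat"),
]
shifts = {"subj": 1, "obj": 2}
space = build_wordspace(triples, shifts)

same_roles = cosine(space["eats"], space["devours"])
swapped_roles = cosine(space["eats"], space["chases"])
```

In this toy setup "eats" and "devours" share the same dependents in the same roles, so their vectors coincide, while "chases" sees the same words in swapped roles and ends up with a different vector. A single unpermuted space could not make that distinction, which is the limitation the abstract describes.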
Natural browsing
Natural Browsing is an ongoing industrial research project which aims to develop a framework able to automatically build a knowledge base from unstructured data. The project relies on NLP methods and Semantic Web technologies in order to mine facts from data.
UNIBA-CORE: Combining Strategies for Semantic Textual Similarity.
This paper describes the UNIBA participation in the Semantic Textual Similarity (STS) core task 2013. We exploited three different systems for computing the similarity between two texts. One system serves as the baseline and is the best model that emerged from our previous participation in STS 2012. It is based on a distributional model of semantics that also takes into account the syntactic structures that glue words together. In addition, we investigated two learning strategies that exploit both syntactic and semantic features. The former combines the best machine learning techniques trained on the 2012 training and test sets. The latter tries to overcome the limitation of working with different datasets with varying characteristics by selecting only the most suitable dataset for training.
