Topic Modelling Games
This paper presents a new topic modelling framework inspired by game theoretic principles. It is formulated as a normal form game in which words are represented as players and topics as strategies that the players select. The strategies of each player are modelled with a probability distribution guided by a utility function that the players try to maximize. This function induces players to select strategies similar to those selected by similar players and to choose strategies not shared with those selected by dissimilar players. The proposed framework is compared with state-of-the-art models, demonstrating good performance on standard benchmarks.
Evolutionary game theoretic models for natural language processing
This thesis aims to discover new learning algorithms inspired by principles of biological evolution, which are able to exploit relational and contextual information, viewing clustering and classification problems from a dynamical system perspective. In particular, we have investigated how game theoretic models can be used to solve different Natural Language Processing tasks. Traditional studies of language have used a game-theoretic perspective to study how language evolves over time and how it emerges in a community but, to the best of our knowledge, this is the first attempt to use game theory to solve specific problems in this area.
These models are based on the concept of equilibrium, a state of a system that emerges after a series of interactions among its elements. Starting from a situation in which there is uncertainty about a particular phenomenon, they describe how a disequilibrium state resolves into equilibrium. The games are situations in which a group of objects has to be classified or clustered and each of them has to choose its class from a predefined set. The choice of each player is influenced by the choices of the others, and a player's satisfaction with the outcome of a game is determined by a payoff function, which the players try to maximize. After a series of interactions the players learn to play their best strategies, leading to an equilibrium state and to the resolution of the problem.
From a machine-learning perspective this approach is appealing, because it can be employed as an unsupervised, semi-supervised or supervised learning model. We have used it to solve the word sense disambiguation problem. We cast this task as a constraint satisfaction problem, where each word to be disambiguated is constrained to choose the most coherent sense among those available, according to the senses that the words around it are choosing. This formulation ensures the maintenance of textual coherence and has been tested against state-of-the-art algorithms, with higher and more stable results.
We have also used a game theoretic formulation to improve the clustering results of the dominant set clustering and non-negative matrix factorization techniques. We evaluated our system on different document datasets through different approaches, achieving results that outperform state-of-the-art algorithms.
This work opened new perspectives in game theoretic models, demonstrating that these approaches are promising and that they can also be employed for the resolution of new problems.
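The equilibrium-seeking process described in this abstract can be illustrated with a small sketch. The abstract does not name the update rule, so the discrete replicator dynamics used here is an assumption, as are the function name `replicator_step`, the toy similarity weights, and the identity payoff matrix (which simply rewards two players for choosing the same class).

```python
def replicator_step(strategies, weights, payoff):
    """One discrete replicator-dynamics update for every player (assumed rule).

    strategies[i][h]: probability that player i chooses class h
    weights[i][j]:    similarity between players i and j
    payoff[h][k]:     payoff for choosing h against a co-player choosing k
    """
    n, m = len(strategies), len(strategies[0])
    updated = []
    for i in range(n):
        # Expected payoff of each pure strategy against the other players.
        utilities = []
        for h in range(m):
            u = 0.0
            for j in range(n):
                if j != i:
                    u += weights[i][j] * sum(
                        payoff[h][k] * strategies[j][k] for k in range(m)
                    )
            utilities.append(u)
        average = sum(p * u for p, u in zip(strategies[i], utilities))
        if average > 0:
            # Strategies that do better than average gain probability mass.
            updated.append([p * u / average for p, u in zip(strategies[i], utilities)])
        else:
            updated.append(list(strategies[i]))
    return updated


# Toy game: players 0 and 1 are similar, players 2 and 3 are similar.
weights = [
    [0.0, 1.0, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 1.0],
    [0.1, 0.1, 1.0, 0.0],
]
payoff = [[1.0, 0.0], [0.0, 1.0]]  # reward agreement on the same class
strategies = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.5, 0.5]]

for _ in range(50):
    strategies = replicator_step(strategies, weights, payoff)

labels = [max(range(2), key=lambda h: row[h]) for row in strategies]
```

In this toy run the two undecided players are pulled toward the class preferred by their most similar neighbour, which is the kind of resolution from uncertainty to equilibrium that the abstract describes.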
Linguistically Based QA by Dynamic LOD Access from Logical Form
We present a system for Question Answering which computes a prospective answer from Logical Forms (hence LFs) produced by a full-fledged NLP system for text understanding, and then maps the result onto schemata in SPARQL to be used for accessing the Semantic Web. As an intermediate step, and whenever there are complex concepts to be mapped, the system looks for a corresponding amalgam in YAGO classes. This is what happens when the query to be constructed has [president,'United States'] as its goal, and the amalgam search produces the complex concept [PresidentOfTheUnitedStates]. When no class has been recovered, as for instance in the query related to the complex structure [5th,president,'United States'], the system knows that the cardinal figure '5th' behaves like a quantifier restricting the class of [PresidentOfTheUnitedStates]. In fact LFs are organized with a restricted ontology made up of 7 types: FOCus, PREDicate, ARGument, MODifier, ADJunct, QUANTifier, INTensifier, CARDinal. In addition, every argument has a Semantic Role to tell Subject from Object and Referential from non-Referential predicates. Another important step in the computation of the final LF is the translation of the interrogative pronoun into a corresponding semantic class word taken from general nouns, in our case the highest concepts of the WordNet hierarchy.
The result is mapped into classes, properties, and restrictions (filters) as for instance in the question:
Who was the wife of President Lincoln ?
which becomes the final LF:
be-[focus-person, arg-[wife/theme_bound], arg-['Lincoln'/theme-[mod-[pred-['President']]]]]
and is then turned into the SPARQL expression,
?x dbpedia-owl:spouse :Abraham_Lincoln
where "dbpedia-owl:spouse" is produced by searching the DBpedia properties and, in case of failure, by looking into the synset associated with the concept WIFE. In particular, the concept "Abraham_Lincoln" is derived from DBpedia by the association of a property and an entity name, "President" and "Lincoln", which contextualizes the reference of the name to the appropriate referent in the world.
It is just by the internal structure of the Logical Form that we are able to produce a suitable and meaningful context for concept disambiguation. Logical Forms are the final output of a complex system for text understanding - GETARUNS - which can deal with different levels of syntactic and semantic ambiguity in the generation of a final structure, by accessing computational lexica equipped with sub-categorization frames and appropriate selectional restrictions applied to the attachment of complements and adjuncts. The system also produces pronominal binding and instantiates the implicit arguments, if needed, in order to complete the required Predicate Argument structure which is licensed by the semantic component.
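As an illustration of the last mapping step only, the triple pattern quoted in this abstract can be wrapped into a complete SPARQL query. This is a minimal sketch, not the GETARUNS implementation: the helper `build_select` is hypothetical, and the two prefix IRIs are assumed to be the usual DBpedia ontology and resource namespaces, which the abstract does not state.

```python
# Assumed namespace bindings for the prefixes used in the abstract's example.
PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "": "http://dbpedia.org/resource/",
}


def build_select(subject, predicate, obj, prefixes=PREFIXES):
    """Wrap a single triple pattern into a SPARQL SELECT query string.

    Hypothetical helper: every term starting with '?' is treated as a
    variable and projected in the SELECT clause.
    """
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    variables = " ".join(t for t in (subject, predicate, obj) if t.startswith("?"))
    body = f"  {subject} {predicate} {obj} ."
    return f"{header}\nSELECT {variables} WHERE {{\n{body}\n}}"


# Triple pattern quoted in the abstract for "Who was the wife of President Lincoln ?"
query = build_select("?x", "dbpedia-owl:spouse", ":Abraham_Lincoln")
print(query)
```

The sketch covers only the string assembly; the harder steps described above (choosing the property from DBpedia or a WordNet synset, and resolving "President" plus "Lincoln" to a single entity) are not modelled here.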
Transductive Learning Games for Word Sense Disambiguation
Word Sense Disambiguation (WSD) is the task of identifying the intended sense of a word in a computational manner based on the context in which it appears. Understanding the ambiguity of natural languages is considered an AI-hard problem. Computational problems like this are the central objectives of Artificial Intelligence (AI) and Natural Language Processing (NLP) because they aim to solve the epistemological question of how the mind works. WSD has been studied since the beginning of NLP, and today it is a central topic of this discipline.
Semantics for social media
In this paper we present four experiments on the analysis of Italian social media texts using a linguistically-based semantic approach. The experiments are, respectively: two on newspaper articles about two political crises, one on a Twitter corpus centered on political themes, and one on a case study of strategic plan programs of candidates to the presidency of our university. The analyses, carried out by the same system, focus on semantic features of texts, highlighting three main traits: “factivity” or factuality, “subjectivity” and polarity. The system uses semantic knowledge derived from deep linguistic analysis at propositional level to classify texts at a fine-grained level. As will be shown in the paper, linguistically-based semantic information allows for a neat distinction of writing styles when comparing newspapers, for irony detection in tweets, and, to different degrees, for making readability judgements.
Game Theory Meets Embeddings: a Unified Framework for Word Sense Disambiguation
Game-theoretic models, thanks to their intrinsic ability to exploit contextual information, have been shown to be particularly suited for the Word Sense Disambiguation task. They represent ambiguous words as the players of a non-cooperative game and their senses as the strategies that the players can select in order to play the games. The interaction among the players is modeled with a weighted graph and the payoff as an embedding similarity function that the players try to maximize. The impact of the word and sense embedding representations in the framework has been tested and analyzed extensively: experiments on standard benchmarks show state-of-the-art performance, and different tests hint at the usefulness of using disambiguation to obtain contextualized word representations.
Document Clustering Games
In this article we propose a new model for document clustering, based on game theoretic principles. Each document to be clustered is represented as a player, in the game theoretic sense, and each cluster as a strategy that the players have to choose in order to maximize their payoff. The geometry of the data is modeled as a graph, which encodes the pairwise similarity between documents, and the games are played among similar players. In each game the players update their strategies according to which strategy has been effective in previous games. The Dominant Set clustering algorithm is used to find the prototypical elements of each cluster. This information is used to divide the players into two disjoint sets, one collecting labeled players, which always play a definite strategy, and the other collecting unlabeled players, which update their strategy at each iteration of the games. The evaluation of the system was conducted on 13 document datasets and shows that the proposed method performs well compared to different document clustering algorithms.
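The split between labeled and unlabeled players can be sketched on toy data. Several pieces here are assumptions rather than the paper's method: the prototypes are hand-picked instead of being found with Dominant Set clustering, the similarity graph uses cosine similarity over made-up term counts, and the update rule is a discrete replicator dynamics. The names `cosine` and `cluster_game` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_game(vectors, prototypes, n_clusters, iterations=30):
    """Cluster documents by letting unlabeled players learn from labeled ones.

    prototypes maps a document index to its fixed cluster (labeled players).
    In this sketch the prototypes are supplied by hand, not discovered.
    """
    n = len(vectors)
    # Similarity graph: no self-loops, so a player never plays against itself.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    strategies = []
    for i in range(n):
        if i in prototypes:
            # Labeled players always play one definite strategy.
            strategies.append([1.0 if c == prototypes[i] else 0.0
                               for c in range(n_clusters)])
        else:
            # Unlabeled players start undecided.
            strategies.append([1.0 / n_clusters] * n_clusters)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in prototypes:
                updated.append(strategies[i])
                continue
            utilities = [sum(sim[i][j] * strategies[j][c] for j in range(n))
                         for c in range(n_clusters)]
            average = sum(p * u for p, u in zip(strategies[i], utilities))
            if average > 0:
                updated.append([p * u / average
                                for p, u in zip(strategies[i], utilities)])
            else:
                updated.append(strategies[i])
        strategies = updated
    return [max(range(n_clusters), key=lambda c: s[c]) for s in strategies]


# Made-up term counts over the vocabulary
# [game, player, payoff, speech, prosody, tts].
documents = [
    [3, 2, 1, 0, 0, 0],  # labeled prototype of cluster 0
    [0, 0, 0, 2, 3, 1],  # labeled prototype of cluster 1
    [2, 1, 2, 0, 1, 0],  # unlabeled
    [0, 1, 0, 3, 1, 2],  # unlabeled
]
labels = cluster_game(documents, prototypes={0: 0, 1: 1}, n_clusters=2)
```

Keeping the labeled players fixed means new documents can be added as unlabeled players without recomputing the prototypes, at the cost of relying entirely on how good those prototypes are.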
Document Clustering Games in Static and Dynamic Scenarios
In this work we propose a game theoretic model for document clustering. Each document to be clustered is represented as a player and each cluster as a strategy. By interacting with other players, the players receive a reward that they try to maximize by choosing their best strategies. The geometry of the data is modeled with a weighted graph that encodes the pairwise similarity among documents, so that similar players are constrained to choose similar strategies, updating their strategy preferences at each iteration of the games. We used different approaches to find the prototypical elements of the clusters and with this information we divided the players into two disjoint sets, one collecting players with a definite strategy and the other collecting players that try to learn from others the correct strategy to play. The latter set of players can be considered as new data points that have to be clustered according to previous information. This representation is useful in scenarios in which the data are streamed continuously. The evaluation of the system was conducted on 13 document datasets using different settings. It shows that the proposed method performs well compared to different document clustering algorithms.
Semantics and Discourse Processing for Expressive TTS
In this paper we present ongoing work to produce an expressive TTS reader that can be used both in text and dialogue applications. The system has been previously used to read (English) poetry and it has now been extended to apply to short stories. The text is fully analyzed both at phonetic and phonological level, and at syntactic and semantic level. The core of the system is the Prosodic Manager, which takes as input discourse structures and relations and uses this information to modify parameters for the TTS accordingly. The text is transformed into poem-like structures, where each line corresponds to a Breath Group, semantically and syntactically consistent. Stanzas correspond to paragraph boundaries. Analogical parameters are related to ToBI theoretical indices but their number is doubled.
Analysis of Italian Word Embeddings
In this work we analyze the performance of two of the most widely used word embedding algorithms, skip-gram and continuous bag of words, on the Italian language. These algorithms have many hyper-parameters that have to be carefully tuned in order to obtain accurate word representations in vector space. We provide an extensive analysis and evaluation, showing which parameter configurations are best for specific analogy tasks.
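For readers unfamiliar with analogy tasks, the following sketch shows how such an evaluation is commonly scored. The abstract does not specify the scoring method, so the vector-offset rule with cosine similarity is an assumption; the two-dimensional vectors are hand-made for illustration and are not trained embeddings, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset rule.

    The target vector is b - a + c; the answer is the vocabulary word
    (excluding a, b and c) whose vector is closest to it by cosine.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


def analogy_accuracy(embeddings, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(embeddings, a, b, c) == expected
                  for a, b, c, expected in questions)
    return correct / len(questions)


# Hand-made toy vectors (NOT trained embeddings), chosen so that the
# offset between 'uomo' and 'donna' equals that between 're' and 'regina'.
toy_embeddings = {
    "uomo": [1.0, 0.0],
    "donna": [1.0, 1.0],
    "re": [3.0, 0.0],
    "regina": [3.0, 1.0],
    "mela": [0.0, 3.0],
}
questions = [
    ("uomo", "donna", "re", "regina"),
    ("re", "regina", "uomo", "donna"),
]
accuracy = analogy_accuracy(toy_embeddings, questions)
```

With trained embeddings, the same accuracy would be computed once per hyper-parameter configuration, which is how different settings can be compared on a given analogy task.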
