Contextualized BERT Sentence Embeddings for Author Profiling: The Cost of Performances
Knowing the real identity of an online subject is a highly relevant issue in User Profiling, especially for analyses of digital sources such as social media. A user's digital identity does not always expose explicit data about her offline life, such as age, gender, or work. This makes the task of user profiling complex and its results incomplete. For many years this issue has received considerable attention from the community, which has developed several solutions, some based on machine learning, to estimate user characteristics. The increasing diffusion of deep learning approaches has, on the one hand, brought a considerable increase in predictive performance but, on the other hand, produced models that cannot be interpreted and that require very high computational power. Given the effectiveness of language models pre-trained on extensive data for many natural language processing and classification tasks, we propose a BERT-based approach (BERT-DNN) for the author profiling task as well. In a first analysis, we compared the results obtained by our model with those of more classical approaches. A critical analysis followed, in which we examine the advantages and disadvantages of these approaches, also in terms of the resources needed to run them. The results obtained by our model are encouraging in terms of reliability but very disappointing if we consider the computational power required to run it.
An investigation on the impact of natural language on conversational recommendations
In this paper, we investigate the combination of Virtual Assistants and Conversational Recommender Systems (CoRSs) by designing and implementing a framework named ConveRSE for building chatbots that can recommend items from different domains and interact with the user through natural language. A user experiment was carried out to understand how natural language influences both the cost of interaction and the recommendation accuracy of a CoRS. Experimental results show that natural language can indeed improve user experience, but some critical aspects of the interaction should be mitigated appropriately.
Generating post hoc review-based natural language justifications for recommender systems
In this article, we present a framework to build post hoc natural language justifications that support the suggestions generated by a recommendation algorithm. Our methodology is based on the intuition that review excerpts contain much relevant information that can be used to justify a recommendation; thus, we propose a black-box explanation strategy that takes as input a recommended item and a set of reviews and builds as output a post hoc natural language justification that is completely independent of the underlying recommendation model. To validate our claims, we also introduce three different implementations of our conceptual framework. The first one uses natural language processing and sentiment analysis techniques to identify relevant and distinguishing aspects discussed in the reviews, and combines review excerpts mentioning these aspects into a natural language justification that is presented to the target user. The second implementation extends the first one by introducing automatic aspect extraction and text summarization, which are exploited to generate a single synthesis of the main characteristics of the item, which is used as the justification. Finally, the third implementation tackles the problem of generating a context-aware justification, that is to say, a justification that varies with the contextual situation, by automatically learning a lexicon for each contextual setting and using that lexicon to diversify the justifications. In the experimental evaluation, we carried out three user studies in different domains, and the results showed that our methodology is able to make the recommendation process more transparent, engaging, and trustworthy for the users, thus confirming the validity of the intuitions behind this work.
Adapting a Large Language Model to the Legal Domain: A Case Study in Italian
This work presents a methodology for adapting an open Large Language Model (LLM) to the Italian legal domain. We construct a legal document corpus from the Normattiva website and develop a custom scraper to ensure high-quality text extraction. The resulting corpus is used to adapt the Llama-3.1-8b model through continuous pre-training and Low-Rank Adaptation (LoRA). The adapted model's performance is evaluated by assessing its ability to complete sentences coherently within the new domain. Results demonstrate that the adapted model surpasses the original model across all metrics, across various prompt lengths and different sizes of the training corpus.
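For readers unfamiliar with LoRA, the standard formulation can be written as follows. This is the general definition of the technique, not a detail taken from this abstract: the rank r, the scaling factor α, and the choice of which weight matrices are adapted are assumptions that the abstract does not specify.

```latex
% Standard LoRA update: the pre-trained weight W_0 stays frozen,
% and only the low-rank factors B and A are trained.
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad
W_0 \in \mathbb{R}^{d \times k},\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Because only B and A are updated, the number of trainable parameters per adapted matrix drops from d·k to r·(d + k).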
Humanoid Robots and Conversational Recommender Systems: A Preliminary Study
Conversational Recommender Systems (CoRSs) implement a paradigm in which users can interact with the system to define their preferences and discover items that best fit their needs. When the CoRS is implemented as a dialog agent, user and recommender interact by exchanging text messages. However, there is little evidence on how effective the interaction is when the CoRS is implemented through a Social Humanoid Robot. In this paper, we evaluate the possibility of introducing an interface based on a Social Humanoid Robot in ConveRSE, a domain-independent framework for the development of Conversational Recommender Systems. The novel interface is compared against the existing chatbot-based one. The objective is to discover whether the framework can adapt to the new interface without worsening user experience and accuracy. We carried out a preliminary study involving 20 subjects. Results showed that, even though there are differences in how users approach the system through the two interfaces, there is no significant difference in its performance.
Analysis of lexical semantic changes in corpora with the diachronic engine
With the growing availability of digitized diachronic corpora, the need for tools capable of taking into account the diachronic component of corpora becomes ever more pressing. Recent works on diachronic embeddings show that computational approaches to the diachronic analysis of language are promising, but they are not user friendly for people without a technical background. This paper presents the Diachronic Engine, a system for the diachronic analysis of the lexical features of corpora. Diachronic Engine computes word frequency, concordances, and collocations taking into account the temporal dimension. It is also able to compute temporal word embeddings and time series that can be exploited for lexical semantic change detection.
A comparative study of approaches for the diachronic analysis of the Italian language
In recent years, there has been a significant increase in interest in lexical semantic change detection. Many approaches, datasets, and evaluation strategies exist for detecting semantic drift. Most of those approaches rely on diachronic word embeddings. Some of them are created by post-processing static word embeddings, while others produce dynamic word embeddings in which vectors share the same geometric space across all time slices. The large majority of the methods use English as the target language for the diachronic analysis, while other languages remain under-explored. In this work, we compare state-of-the-art approaches in computational historical linguistics to evaluate the pros and cons of each model, and we present the results of an in-depth analysis conducted using an Italian diachronic corpus. Specifically, several approaches based on both static and dynamic embeddings are implemented and evaluated using the Kronos-It dataset. We train all word embeddings on the Italian Google n-gram corpus. The main result of the evaluation is that all approaches fail to significantly reduce the number of false-positive change points, which confirms that lexical semantic change detection is still a challenging task.
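As a rough illustration of how change points can be read off diachronic embeddings, the sketch below scores a word by the cosine distance between its vectors in consecutive time slices and flags slices where the distance exceeds a threshold. It assumes the vectors already share one geometric space. The function names, the threshold value, and the toy vectors are hypothetical and are not taken from the paper or from Kronos-It.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (1.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (1.0 - dot / norm) if norm else 1.0

def change_points(series, threshold=0.5):
    """Return the periods whose vector drifted from the previous slice by more than threshold.

    series: list of (period, vector) pairs ordered in time, all in the same space.
    threshold: hypothetical cut-off; a real system would tune or estimate it.
    """
    points = []
    for (_, prev), (period, curr) in zip(series, series[1:]):
        if cosine_distance(prev, curr) > threshold:
            points.append(period)
    return points

# Toy example: the word's vector is stable until 1950, then rotates sharply.
toy_series = [(1900, [1.0, 0.0]), (1950, [0.9, 0.1]), (2000, [0.0, 1.0])]
detected = change_points(toy_series)
```

A fixed threshold like this one is exactly the kind of rule that tends to produce false-positive change points, which is the failure mode the abstract reports.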
Random indexing for content-based recommender systems
The use of Vector Space Models (VSM) in the area of Information Retrieval is an established practice, thanks to their clean and solid formalism, which allows us to easily represent objects in a vector space and to perform calculations on them. The goal of this work is to investigate the impact of VSM on Recommender Systems (RS) performance. Specifically, we introduce two approaches: the first is based on a dimensionality reduction technique called Random Indexing, while the second extends the first by integrating a negation operator implemented in the Semantic Vectors open-source package. The results that emerged from the experimental evaluation confirmed the predictive accuracy of the model. This work summarizes the results already presented at the RecSys 2010 Doctoral Consortium.
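The two approaches named above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: each term receives a sparse random index vector, an item is the sum of its terms' vectors, and negation is assumed here to be orthogonal projection (removing from the liked-items vector its component along the disliked-items vector). All function names, the dimension, and the toy items are hypothetical.

```python
import math
import random

def index_vector(term, dim=300, nonzero=10):
    """Sparse ternary random vector for a term, reproducible for the same term."""
    rng = random.Random(term)  # string seed, so each term always maps to the same vector
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec

def item_vector(terms, dim=300):
    """Represent an item as the sum of the index vectors of its terms."""
    total = [0.0] * dim
    for term in terms:
        total = [a + b for a, b in zip(total, index_vector(term, dim))]
    return total

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def negate(positive, negative):
    """Assumed negation: subtract from `positive` its projection onto `negative`."""
    denom = dot(negative, negative)
    if denom == 0:
        return list(positive)
    scale = dot(positive, negative) / denom
    return [p - scale * n for p, n in zip(positive, negative)]

# Toy items described by content terms.
sci_fi_a = item_vector(["space", "alien", "robot"])
sci_fi_b = item_vector(["space", "alien", "ship"])
romance = item_vector(["romance", "wedding", "paris"])

# Profile built from a liked item, with a disliked item negated out.
profile = negate(sci_fi_a, romance)
```

Ranking candidate items by `cosine(profile, item)` then gives a simple content-based recommender in which disliked content contributes nothing to the profile.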
