Search CORE

1,721,031 research outputs found

MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)

Authors: Navigli Roberto; Tedeschi Simone
Publication date: 01/01/2022

Named Entity Recognition (NER) is the task of identifying named entities in texts and classifying them through specific semantic categories, a process which is crucial for a wide range of NLP applications. Current datasets for NER focus mainly on coarse-grained entity types, tend to consider a single textual genre and to cover a narrow set of languages, thus limiting the general applicability of NER systems. In this work, we design a new methodology for automatically producing NER annotations, and address the aforementioned limitations by introducing a novel dataset that covers 10 languages, 15 NER categories and 2 textual genres. We also introduce a manually-annotated test set, and extensively evaluate the quality of our novel dataset on both this new test set and standard benchmarks for NER. In addition, in our dataset, we include: i) disambiguation information to enable the development of multilingual entity linking systems, and ii) image URLs to encourage the creation of multimodal systems. We release our dataset at https://github.com/Babelscape/multinerd

Archivio della ricerca - Università di Roma La Sapienza

NER4ID at SemEval-2022 Task 2: Named Entity Recognition for Idiomaticity Detection

Authors: Navigli Roberto; Tedeschi Simone
Publication date: 01/01/2022

Idioms are lexically-complex phrases whose meaning cannot be derived by compositionally interpreting their components. Although the automatic identification and understanding of idioms is essential for a wide range of Natural Language Understanding tasks, they are still largely under-investigated. This motivated the organization of the SemEval-2022 Task 2, which is divided into two multilingual subtasks: one about idiomaticity detection, and the other about sentence embeddings. In this work, we focus on the first subtask and propose a Transformer-based dual-encoder architecture to compute the semantic similarity between a potentially-idiomatic expression and its context and, based on this, predict idiomaticity. Then, we show how and to what extent Named Entity Recognition can be exploited to reduce the degree of confusion of idiom identification systems and, therefore, improve performance. Our model achieves 92.1 F1 in the one-shot setting and shows strong robustness towards unseen idioms achieving 77.4 F1 in the zero-shot setting. We release our code at https://github.com/Babelscape/ner4id
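The similarity step described in this abstract can be illustrated with a minimal sketch. It assumes the two encoders have already produced fixed-size vectors, and it assumes, purely for illustration, that a low similarity between expression and context signals idiomatic use. The toy vectors, the 0.5 threshold and the function names are hypothetical and are not taken from the NER4ID code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def predict_idiomatic(expression_vec, context_vec, threshold=0.5):
    """Hypothetical decision rule: low similarity -> idiomatic (assumption)."""
    return cosine_similarity(expression_vec, context_vec) < threshold


# Toy stand-ins for encoder outputs (hypothetical values).
expression_vec = [1.0, 0.0, 0.0]
literal_context_vec = [0.9, 0.1, 0.0]
figurative_context_vec = [0.0, 1.0, 0.0]

print(predict_idiomatic(expression_vec, literal_context_vec))
print(predict_idiomatic(expression_vec, figurative_context_vec))
```

In a real system the vectors would come from the trained encoders and the decision rule would be learned rather than fixed by hand.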

Archivio della ricerca - Università di Roma La Sapienza

L’elasticità della domanda di sigarette rispetto al prezzo [The price elasticity of cigarette demand]

Author: Tedeschi Simone
Publication date: 01/01/2016

IRIS Unicas (Università degli Studi di Cassino e del Lazio Meridionale)

Archivio della Ricerca - Università di Roma 3

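Volatilità dei consumi: il ruolo della proprietà immobiliare nella protezione dai rischi non assicurabili e le preferenze delle famiglie Italiane [Consumption volatility: the role of home ownership in protecting against uninsurable risks, and the preferences of Italian households]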

Author: Tedeschi Simone
Publication date: 01/01/2015

IRIS Unicas (Università degli Studi di Cassino e del Lazio Meridionale)

Archivio della Ricerca - Università di Roma 3

ID10M: Idiom Identification in 10 Languages

Authors: Martelli Federico; Navigli Roberto; Tedeschi Simone
Publication date: 01/01/2022

Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M

Archivio della ricerca - Università di Roma La Sapienza

Micro Data Fusion of Italian Expenditures and Incomes Surveys

Authors: Tedeschi Simone; Pisano Elena
Publication date: 01/01/2014

The aim of this work is to match household consumption information from Indagine sui Consumi delle Famiglie (Household Budget Survey, HBS) by the Italian National Statistical Institute (ISTAT) with Indagine sui Bilanci delle Famiglie Italiane (Survey of Households' Income and Wealth, SHIW) by the Bank of Italy for the year 2010. The work offers a review of the main matching methodologies, coupled with a discussion of the underlying hypotheses (such as the Conditional Independence Assumption, CIA) which, in our case, are less demanding to assume given the presence of consumption aggregates as common variables between the two surveys. Moreover, some tests measuring the validity of the matching procedure are presented in order to check the preservation of joint distributions. The resulting sample is expected to allow better distributional and micro-econometric analyses on consumption, income and wealth (e.g. Engel curves, consumption age/income profiles). Moreover, the very detailed integrated dataset would constitute a platform for an integrated microsimulation analysis of direct, indirect and wealth tax reforms which, so far, has not been feasible taking available sample surveys separately. Our matching achieves a good preservation of the marginal distributions of all consumption aggregates from the donor survey. However, a thorough comparison of the original distributions suggests that the HBS is a convenient donor for the imputation of non-durable commodities only. Consumption aggregates closer to the concept of wealth (such as durables and the extraordinary expenditure for dwelling maintenance) or savings (such as mortgages and private pensions) prove to be better assessed by the longer, and more issue-specific, recall of the SHIW. As a secondary outcome, the information derived from HBS on non-durables entails an increase in the dispersion and an upward adjustment of consumption profiles in the synthetic distribution relative to SHIW. This also implies a lower average propensity to save for the household sector, which gets closer to the National Accounts figures.
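The idea of matching on common variables can be illustrated with a minimal sketch. The abstract does not say which matching method the authors use, so the nearest-neighbour (distance hot-deck) rule below is a swapped-in, commonly used technique and not a reproduction of their procedure. All field names and values are hypothetical toy data.

```python
def squared_distance(a, b, keys):
    """Squared Euclidean distance over the variables common to both surveys."""
    return sum((a[k] - b[k]) ** 2 for k in keys)


def fuse(recipients, donors, keys, target):
    """Copy `target` from the nearest donor record into each recipient record."""
    fused = []
    for rec in recipients:
        donor = min(donors, key=lambda d: squared_distance(rec, d, keys))
        merged = dict(rec)  # leave the input records unmodified
        merged[target] = donor[target]
        fused.append(merged)
    return fused


# Hypothetical donor records (HBS-like): common variables plus the target.
donors = [
    {"food_spending": 400.0, "household_size": 2, "non_durables": 900.0},
    {"food_spending": 700.0, "household_size": 4, "non_durables": 1600.0},
]
# Hypothetical recipient records (SHIW-like): common variables plus income.
recipients = [
    {"food_spending": 420.0, "household_size": 2, "income": 2500.0},
    {"food_spending": 680.0, "household_size": 4, "income": 4100.0},
]

fused = fuse(recipients, donors, ["food_spending", "household_size"], "non_durables")
for row in fused:
    print(row)
```

In practice the common variables would be standardised before computing distances, since unscaled variables with large ranges dominate the match.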

IRIS Unicas (Università degli Studi di Cassino e del Lazio Meridionale)

Archivio della Ricerca - Università di Roma 3

La riforma della tassazione del tabacco in Italia: effetti e criticità [The reform of tobacco taxation in Italy: effects and critical issues]

Authors: Liberati Paolo; Tedeschi Simone; Crespi Francesco
Publication date: 01/01/2016

Archivio della Ricerca - Università di Roma 3

Towards comprehensive and efficient information extraction across languages

Author: Tedeschi Simone
Publication date: 24/01/2025

The exponential growth of textual data shared online has created an urgent need for methods that can effectively extract, structure, and interpret information from vast and varied texts. Information Extraction (IE), a key area within Natural Language Processing (NLP), addresses this need by transforming unstructured text into structured formats enabling automated text analytics and decision-making. However, existing IE systems face substantial challenges in scalability and generalization. These challenges include limited labeled data for low-resource languages, computational demands that restrict accessibility to only well-resourced institutions, and a predominant focus on popular entities. Additionally, most IE tasks are entity-centric tasks (e.g. Named Entity Recognition, Entity Disambiguation, and Relation Extraction), thus overlooking the broader contextual richness present in many texts. This thesis aims at advancing the field of IE by tackling these critical issues through novel resources, methodologies, and theoretical approaches aimed at fostering a multilingual, scalable, and semantically-enriched IE framework. To bridge the multilingual gap, we leverage a combination of neural and knowledge-based approaches and create multilingual datasets for NER and Relation Extraction, ensuring that IE systems can operate effectively across diverse linguistic settings. On the computational front, we propose optimizations designed to reduce the resource requirements of IE models, especially in the context of Entity Disambiguation, enabling broader adoption of NLP technologies by reducing dependence on high-performance hardware and extensive labeled datasets. Additionally, this work challenges traditional IE frameworks by expanding the focus beyond named entities to encompass abstract concepts, idiomatic expressions, and tail entities, which are essential for a more nuanced and comprehensive understanding of texts. 
Through these contributions, this research aims to establish a robust foundation for multilingual, resource-efficient IE systems that can meet the evolving demands of global text analytics across varied languages, domains, and cultural contexts. Finally, to further encourage the usage and development of multilingual IE systems, we publicly release all the artifacts -- datasets and models -- introduced in this thesis

Archivio della ricerca - Università di Roma La Sapienza

Preferences for public education spending in hierarchical education systems: theory and empirical evidence from OECD countries

Authors: Di Gioacchino Debora; Sabani Laura; Tedeschi Simone
Publication date: 01/01/2018

This paper analyses the factors affecting preferences for public education spending, focusing on household income and other individual characteristics as well as on institutional features. Standard redistributive arguments à la Meltzer and Richard (1981) suggest that the impact of household income on preferences should be negative, since richer families are likely to oppose the redistributive effect of public funding. However, the empirical evidence does not seem to confirm this prediction. To shed some light on this issue, our proposed interpretative key hinges on the hierarchical structure of the education system. To this purpose, we set up a model in which agents are heterogeneous in terms of income and education and human capital is produced in a two-tier education system. We show that individual preferences for public education spending are affected by household income and by variables related to the socioeconomic context, such as income inequality and social inclusiveness of the education system, which determine the ultimate redistributive effect of public spending. We are able to test some of the predictions of our model using individual-level data from ISSP (2006 wave). The econometric analysis points out that household income is, unambiguously, a negative predictor of preferences when considering openly redistributive education expenses. By contrast, when considering general schooling expenses, the intensity and even the direction of the income effect are affected by income inequality and by the social inclusiveness of the education system. We also assess the presence of significant residual variability in the income coefficient, due to unobserved factors, which for the most part arises at the individual within-country level rather than at the cross-country level.

IRIS Unicas (Università degli Studi di Cassino e del Lazio Meridionale)

Archivio della Ricerca - Università di Roma 3

Smokers are different: The impact of price increases on smoking reduction and downtrading

Authors: Crespi Francesco; Paradiso Massimo; Tedeschi Simone; Liberati Paolo; Scialà Antonio
Publication date: 01/01/2020

Using data from an ad hoc survey conducted in July 2016 on Italian smokers’ habits, we investigate how different categories of smokers react to different types of price changes by means of latent class econometric analysis. While the previous literature focused on the effects of general price changes and overlooked substitution effects among brands, the present analysis reveals that the probability of reducing cigarette consumption is always higher for uniform rather than uneven price increases across brands. Moreover, downtrading to cheaper products is found to increase with the size of price changes, provided that these are uneven across brands. Finally, we provide a range for the implicit elasticity of cigarette demand. While being inelastic on average, it ranges between 0.2 and 0.9 depending on the smoker category. These findings have important implications for the design of both health and tax policies, as they provide new insights into the potential reactions of smokers to policy interventions.
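To give a sense of what the reported range means, here is a small worked sketch. It assumes that 0.2 and 0.9 are absolute values of the price elasticity of demand and applies the usual first-order approximation (percentage change in quantity is roughly minus the elasticity times the percentage change in price). The 10% price increase and the function name are hypothetical and do not come from the paper.

```python
def quantity_change_pct(elasticity_abs, price_change_pct):
    """Approximate % change in quantity demanded for a given % price change.

    First-order approximation: %dQ ~ -|elasticity| * %dP.
    """
    return -elasticity_abs * price_change_pct


# Hypothetical scenario: a 10% price increase at both ends of the range.
for elasticity_abs in (0.2, 0.9):
    change = quantity_change_pct(elasticity_abs, 10.0)
    print(f"elasticity {elasticity_abs}: about {change:.1f}% change in quantity")
```

Under these assumptions, the same price increase would reduce consumption by roughly 2% for the least responsive category and roughly 9% for the most responsive one.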

IRIS Unicas (Università degli Studi di Cassino e del Lazio Meridionale)

Archivio della Ricerca - Università di Roma 3