1,720,974 research outputs found

Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation

Author: Corbara Silvia
Moreo Alejandro
Publication venue
Publication date: 01/01/2024
Field of study

Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author (A) or by someone else (Ā). It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style, or by imitating the style of another author. In this paper, we investigate the potential benefits of augmenting the classifier training set with (negative) synthetic examples. These synthetic examples are generated to imitate the style of A. We analyze the improvements in the classifier predictions that this augmentation brings to bear in the task of AV in an adversarial setting. In particular, we experiment with three different generator architectures (one based on Recurrent Neural Networks, another based on small-scale transformers, and another based on the popular GPT model) and with two training strategies (one inspired by standard Language Models, and another inspired by Wasserstein Generative Adversarial Networks). We evaluate our hypothesis on five datasets (three of which have been specifically collected to represent an adversarial setting) and using two learning algorithms for the AV classifier (Support Vector Machines and Convolutional Neural Networks). This experimentation yields negative results, revealing that, although our methodology proves effective in many adversarial settings, its benefits are too sporadic for a pragmatical application.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Enhancing Adversarial Authorship Verification with Data Augmentation

Author: Corbara Silvia
Moreo Alejandro
Publication venue
Publication date: 01/01/2023
Field of study

Archivio istituzionale della Ricerca - Scuola Normale Superiore

MedLatinEpi and MedLatinLit: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts

Author: Moreo Alejandro
Corbara Silvia
Sebastiani Fabrizio
Tavoni Mirko
Publication venue
Publication date: 01/01/2022
Field of study

We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets, we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author. We also make available the source code of the authorship verification system we have used, thus allowing our experiments to be reproduced, and to be used as baselines, by other researchers. We also describe the application of the above authorship verification system, using these datasets as training data, for investigating the authorship of two medieval epistles whose authorship has been disputed by scholars.

Crossref

Archivio istituzionale della Ricerca - Scuola Normale Superiore

The Epistle to Cangrande Through the Lens of Computational Authorship Verification

Author: Moreo Alejandro
Corbara Silvia
Sebastiani Fabrizio
Tavoni Mirko
Publication venue
Publication date: 01/01/2019
Field of study

The Epistle to Cangrande is one of the most controversial among the works of Italian poet Dante Alighieri. For more than a hundred years now, scholars have been debating its real paternity, i.e., whether it should be considered a true work by Dante or a forgery by an unnamed author. In this work we address this philological problem through the methodologies of (supervised) Computational Authorship Verification, by training a classifier that predicts whether a given work is by Dante Alighieri or not. We discuss the system we have set up for this endeavour, the training set we have assembled, the experimental results we have obtained, and some issues that this work leaves open.

Crossref

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Explainable authorship identification in cultural heritage applications

Author: Corbara Silvia
Sebastiani Fabrizio
Monreale Anna
Moreo Alejandro
Setzu Mattia
Publication venue
Publication date: 01/01/2024
Field of study

While a substantial amount of work has recently been devoted to improving the accuracy of computational Authorship Identification (AId) systems for textual data, little to no attention has been paid to endowing AId systems with the ability to explain the reasons behind their predictions. This substantially hinders the practical application of AId methods, since the predictions returned by such systems are hardly useful unless they are supported by suitable explanations. In this article, we explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId, with a focus on explanations addressed to scholars working in cultural heritage. In particular, we assess the relative merits of three different types of XAI techniques (feature ranking, probing, factual and counterfactual selection) on three different AId tasks (authorship attribution, authorship verification and same-authorship verification) by running experiments on real AId textual data. Our analysis shows that, while these techniques make important first steps towards XAI, more work remains to be done to provide tools that can be profitably integrated into the workflows of scholars.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

L’Epistola a Cangrande al vaglio della Computational Authorship Verification: risultati preliminari (con una postilla sulla cosiddetta “XIV Epistola di Dante Alighieri”)

Author: Corbara Silvia
Sebastiani Fabrizio
Moreo Alejandro
Tavoni Mirko
Publication venue
Publication date: 01/01/2020
Field of study

In this work we apply techniques from computational Authorship Verification (AV) to the problem of detecting whether the “Epistle to Cangrande” is an authentic work by Dante Alighieri or is instead the work of a forger. The AV algorithm we use is based on “machine learning”: the algorithm “trains” an automatic system (a “classifier”) to detect whether a certain Latin text is Dante’s or not Dante’s, by exposing it to a corpus of example Latin texts by Dante and example Latin texts by authors coeval to Dante. The detection is based on the analysis of a set of stylometric features, i.e., style-related linguistic traits whose usage frequencies tend to represent an author’s unconscious “signature”. The analysis carried out in this work suggests that, of the two parts into which the Epistle is traditionally subdivided, neither is Dante’s. Experiments in which we have applied our AV system to each text in the corpus suggest that the system has a fairly high degree of accuracy, thus lending credibility to its hypothesis about the authorship of the Epistle. In the last section of this paper we apply our system to what has been hypothesized to be “Dante’s 14th Epistle”; the system rejects, with very high confidence, the hypothesis that this epistle might be Dante’s.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Investigating topic-agnostic features for authorship tasks in Spanish political speeches

Author: Corbara Silvia
Chulvi Berta
Moreo Alejandro
Rosso Paolo
Publication venue
Publication date: 01/01/2022
Field of study

Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, via different approaches of text distortion used to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when potentially learning from topical information.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Same or Different? Diff-Vectors for Authorship Analysis

Author: Moreo Alejandro
Corbara Silvia
Sebastiani Fabrizio
Publication venue
Publication date: 01/01/2023
Field of study

Crossref

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Rhythmic and psycholinguistic features for authorship tasks in the Spanish parliament : evaluation and analysis

Author: Corbara Silvia
Chulvi Berta
Moreo Alejandro
Rosso Paolo
Publication venue
Publication date: 01/01/2022
Field of study

Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender, age, etc. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inferences grounded on the topics that certain authors tend to write about. In this paper, we present a series of experiments evaluating the use of topic-agnostic feature sets for Authorship Identification and Author Profiling tasks in Spanish political language. In particular, we propose to employ features based on rhythmic and psycholinguistic patterns, obtained via different approaches of text masking that we use to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by a BETO transformer, when the latter is trained on the original text, i.e., potentially learning from topical information. Moreover, we further investigate the results for the different authors, showing that variations in performance are partially explainable in terms of the authors’ political affiliation and communication style.

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Author Instructions

Author: Instructions Author
Publication venue
Publication date: 04/11/2013
Field of study

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)