A computational morphological analysis of Italian verbal system
Verbal inflection in Italian, as in other Romance languages, is complex. Its complexity derives not just from the number of forms, each associated with a distinct set of morphosyntactic properties (mood, tense, person, number), but also, and especially, from the variability of those forms.
Structuring the verbal lexicon into classes can account for variability in the endings of inflected forms (the desinences), but it cannot account for variability in the stem, because too many classes would be needed to cover all these phenomena of allomorphy. The traditional approach requires the speaker to memorize, as exceptions, a list of the forms whose stem differs from that of other forms in the same paradigm, or in particular from the citation form of the lexeme (the infinitive, for Italian verbs).
Over the last twenty years, there has been considerable interest in the paradigmatic distribution of allomorphy, that is, in how the variation (traditionally, “irregularity”) between the forms of a paradigm (not only of verbs, but also of nouns and adjectives) rests on regular patterns.
This interest has at least three motivations. The first is purely technical: the desire to store morphological information as compactly as possible, so as to build computationally efficient applications that parse, interpret, analyse, translate or produce texts (or speech) without having to consult enormous amounts of redundant data. The second is cognitive: studies of analogical associations, and of how these associations form regular patterns, can contribute to our understanding of how the brain works. The third is didactic, since language learning and teaching can benefit greatly from knowledge of such patterns of association and how they operate.
In practice, this line of research analyses the paradigmatic structure of inflection: it decomposes the paradigm into zones whose forms are built on potentially distinct stems, examines the formal relations (at the phonological level) between these stems, and studies the chains of predictability that allow speakers to handle both regular and irregular lexemes.
In this work I have carried out an analysis of the Italian verbal system. Following a Word and Paradigm perspective, and researchers who have studied inflectional morphology with a paradigmatic approach, my goal was to build algorithms and programs that compute the relations between the word forms making up the full conjugation of a sample of Italian verbs. The set of verbs considered covers all conjugation models, including highly irregular verbs.
The contribution to inflectional morphology comprises the following points:
– the analysis is carried out on phonetic forms rather than orthographic forms. To this end I have developed a database that generates the forms for all paradigm cells in phonetic transcription.
– the analysis is fully automated. I have implemented all the necessary algorithms in Java, so that after any change to the database (to add lexemes or correct mistakes), or even after switching to an entirely different data set in order to analyse other languages, the whole computation takes only a few minutes to run.
– the analysis does not assume that inflection happens at the end of the word, that is, by suffixation: the algorithms developed work on the same principles with discontinuous inflection (as found in Semitic languages or, in part, in Greek and German)
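To illustrate the last point, here is a minimal sketch. It is not the Java implementation the abstract describes. It compares two forms of the same lexeme with a longest-common-subsequence alignment, which isolates shared phonological material without assuming that it sits at the start of the word. The function name `shared_skeleton`, the simplified transcriptions, and the choice of longest common subsequence as the comparison are all assumptions made for illustration.

```python
def shared_skeleton(a: str, b: str) -> str:
    """Return one longest common subsequence of two segment strings.

    Hypothetical helper: each character is treated as one phonological
    segment, which is a simplification of real phonetic transcription.
    """
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward through the table to recover the shared segments.
    shared = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(shared)


# Suffixal inflection (Italian, orthographic approximation): the shared
# material is a contiguous prefix.
suffixal = shared_skeleton("parlo", "parli")

# Discontinuous inflection (German "singen" / "gesungen", rough
# transcription without stress marks): the shared material is interrupted
# by a prefix and a vowel change.
discontinuous = shared_skeleton("zɪŋən", "gəzʊŋən")
```

Because the alignment makes no reference to position in the word, the same routine handles both the suffixal and the discontinuous pair. A real analysis would also need to record where the non-shared segments fall.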
Dataset of the UniverS-Ita project
The dataset contains the full set of metadata for the texts collected within the PRIN 2017 project "UniverS-Ita. L'italiano scritto degli studenti universitari: quadro sociolinguistico, tendenze tipologiche, implicazioni didattiche". The project aimed to map the formal writing skills of the Italian university population through a sample of 2,137 students from 44 universities, representative by geographical and disciplinary area. Participants wrote a text (of 250-500 words) on a common prompt and completed a socio-biographical questionnaire of 58 questions (not all of them appear in the dataset, because some did not receive a statistically relevant number of responses). The texts produced were then analysed both quantitatively and qualitatively. This resource makes it possible to identify systematic correlations between features of the texts and the profiles of their writers
Text Mining and Variants of Correspondence Analysis for analysing written Italian of University students
This paper investigates the textual similarities and dissimilarities in the written Italian used by university students from different departments in the North, Centre and South of Italy. The students’ text data are part of a large survey dataset, collected from the University of Bologna, that aims to analyse an alleged decline in the Italian language and to highlight peculiar linguistic features of the Italian used by university students.
The text data come from a sample of 2159 participants belonging to different departments of Italian universities.
Here we focus on studying the association between the written production of semiformal texts of Italian students and the University geographical area (North, Centre and South of Italy) through a non-symmetrical variant of simple correspondence analysis
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
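The difference between the two counting schemes can be sketched in a few lines. This is an illustration under stated assumptions, not the study's procedure. The function name `cocitation_counts`, the input layout (each citing paper is a list of cited works, each cited work a list of author names in byline order), and the rule that a pair is counted once per citing paper are all assumptions. Co-authors of the same cited work are also counted as co-cited here, which is a simplification.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often pairs of authors are cited together.

    max_authors=1 mimics first-author counting; max_authors=5 considers
    the first five authors of each cited work. Both the data layout and
    the once-per-citing-paper rule are illustrative assumptions.
    """
    counts = Counter()
    for references in citing_papers:
        cited_authors = set()
        for authors in references:
            cited_authors.update(authors[:max_authors])
        # Each unordered author pair is counted once per citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Toy data with made-up author labels: two citing papers, two cited works each.
papers = [
    [["A", "B"], ["C"]],
    [["A"], ["D", "B"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

In this toy data, author B never appears in first position, so first-author counting records no co-citations for B, while the first-five scheme pairs B with A in both citing papers.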
Variations on the Author
“Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship
Appropriate Similarity Measures for Author Cocitation Analysis
We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
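For readers unfamiliar with the two established measures named above, the following sketch computes both on toy cocitation profiles. The profiles are invented, the function names are hypothetical, and the example does not reproduce the paper's argument or its two new measures. It shows only one mathematical property: appending entries where both profiles are zero leaves the cosine unchanged but shifts the Pearson correlation.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centred_x = [a - mean_x for a in x]
    centred_y = [b - mean_y for b in y]
    return cosine(centred_x, centred_y)


# Invented cocitation profiles of two authors over three other authors.
x = [3, 1, 0]
y = [1, 3, 0]

# The same profiles after adding two authors cocited with neither.
x_padded = x + [0, 0]
y_padded = y + [0, 0]

cosine_before, cosine_after = cosine(x, y), cosine(x_padded, y_padded)
pearson_before, pearson_after = pearson(x, y), pearson(x_padded, y_padded)
```

Here `cosine_before` and `cosine_after` agree, whereas `pearson_after` is larger than `pearson_before`, because the added zeros change the means used for centring.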
Dispelling the Myths Behind First-author Citation Counts
We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how different from each other author rankings resulting from different citation counting methods actually are, and to demonstrate the capability of emerging data and tools on the Web in supporting more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and other citation counting methods, and high costs of using more realistic citation counting methods that are not well-supported by the ISI databases. It is argued that increasingly available digital full text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods
koamabayili/VECTRON-author-checklist: VECTRON author checklist
We have done our best to complete the author checklist relating to the use of animals in the hut study. Note that the objective for the hut study was to evaluate the IRS treatment applications for residual efficacy against Anopheles mosquitoes, including the local An. coluzzii mosquito population. Cows were only used to attract mosquitoes into the huts and no tests were carried out directly on the cows. The author checklist is intended for use with studies where experiments are carried out on animals, which is why we have had such difficulty in completing this for the hut study, as many of the questions do not relate to how the cows were used
