DataverseNO
    2167 research outputs found

    Four North Saami Ambipositions

    We present a study of four North Sámi adpositions that can be used as both prepositions and postpositions and thus be termed “ambipositions”. We advance three hypotheses concerning 1) dialectal differences in use of ambipositions in North Sámi, 2) differences between their use as prepositions and postpositions, and 3) a possible typological correlation between the frequency of ambipositions and the extent to which position is used to differentiate meaning, with North Sámi at the high end of this scale. Our study tests these hypotheses against two databases representing the use of ambipositions in newspapers and in literature. The data consists of example sentences tagged for the position of the adposition, the meaning of the adposition and the source of the sentence. Chi-square tests are used to show where there are significant differences between the prepositional and postpositional uses and between uses in the eastern, central, and western portions of the North Sámi dialect continuum.
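
    The kind of chi-square test mentioned above can be sketched as follows. This is a minimal illustration and not the authors' own analysis: the counts are hypothetical and are not taken from the dataset, the function names `chi_square` and `p_value` are made up for this sketch, and the p-value uses closed forms that only hold for 1 or 2 degrees of freedom.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value(statistic, dof):
    """Upper-tail probability, using closed forms for 1 and 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if dof == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


# HYPOTHETICAL counts, not taken from the dataset.
# Rows: preposition, postposition. Columns: eastern, central, western.
counts = [
    [30, 50, 20],
    [70, 50, 80],
]
statistic, dof = chi_square(counts)
print(f"chi-square = {statistic:.2f}, df = {dof}, p = {p_value(statistic, dof):.4g}")
```

    For larger tables, a statistics package would normally supply the p-value; the closed forms here only keep the sketch free of dependencies.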

    Russian -n'ki words

    The database includes Russian words ending in -n'ki of the bain'ki 'sleep' type.

    Variation in pri- and pod- attenuatives in Russian

    The database contains the results of a psycholinguistic experiment targeted at variation in the prefix marking of Russian attenuative verbs. The Experiment_Results.xlsx spreadsheet contains all responses from each participant, for target stimuli and for the fillers. No personal information about the participants is included in the files; the informants are identified by numbers from 1 to 122. Experiment_Results.xlsx contains six spreadsheets that include the following information (more detailed descriptions are provided in the file itself): all responses of the informants, prefixes used, whether the prefixes used matched the prefixes in the RNC, the data on frequency, information about the experimental context etc. Statistical analysis was performed using the statistical software R, and statistical code is available in R_script_Chapters_7_8.R. The ctree and cforest analyses were performed on the Chapter8_dataset_c_tree_forest.csv file.

    Romanian Weak Pronoun Choice Data

    The following corpus study shows that soft linguistic constraints are hard to describe and operationalize. In specific contexts, some Romanian clitic pronouns allow a choice between phonological hosts such as in că-mi dai cartea vs. că îmi dai cartea both meaning [that you give me the book]. What determines the choice between subjunction că in că-mi and prosthetic î in îmi (cf. Lombard 1976)? Popescu (2003, p. 160) argues for speech rate as surface realization trigger (monosyllabic că-mi in fast speech vs. bisyllabic că îmi in normal speech), while Dindelegan (2013, p. 388) argues for register rules (informal că-mi vs. formal că îmi). This means that formal, written language represents one extreme of a formality scale while informal, spoken language the other. Thus, a Romanian corpus of official documents, such as legal texts, is expected to contain only or significantly many forms with prosthetic î for constellations with otherwise optional variants. To test these two hypotheses, the Romanian part of the JRC-Acquis corpus (http://ec.europa.eu/dgs/jrc/) has been tagged with the RACAI tagger (http://www.racai.ro). The resulting corpus of 62,650,821 tokens (including punctuation) has been evaluated with respect to the phenomena under scrutiny. Taking into account specific hosts, enclitic forms have been compared with their î-prosthetic counterparts. The numbers show almost no or statistically insignificant difference in usage for some specific host+clitic pairs (e.g., 3886 să îşi vs. 3852 să-şi [that to himself/herself], 200 ce îi vs. 110 ce-i [what to him/her]). From a usage-based perspective, these findings are clear arguments both against the register rules purported by Dindelegan (2013) and against a pure speech rate hypothesis as in Popescu (2003).
Since the JRC-Acquis corpus is translated from English by different translators, perhaps both native and non-native speakers of Romanian, a further corpus of original Romanian legal texts is being compiled for further analysis and comparison. The full dataset consists of (1) two tgz files containing the POS-tagged data extracted from the JRC-Acquis corpus (enclitic forms and î-prosthetic forms), in an XML format; (2) a description file documenting that format; and (3) a draft of the article as a PDF file, for linguistic background.
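
    The register hypothesis predicts that a formal legal corpus contains only, or predominantly, î-prosthetic forms. A minimal sketch of how the two host+clitic counts quoted in the abstract translate into enclitic shares is given below. Only the four counts come from the abstract; the helper name `enclitic_share` is illustrative, and the calculation is plain arithmetic rather than the authors' evaluation procedure or a significance test.

```python
# Counts quoted in the abstract: (î-prosthetic form count, enclitic form count).
pairs = {
    "să îşi / să-şi": (3886, 3852),
    "ce îi / ce-i": (200, 110),
}


def enclitic_share(prosthetic, enclitic):
    """Fraction of occurrences realized with the enclitic (hyphenated) form."""
    return enclitic / (prosthetic + enclitic)


shares = {label: enclitic_share(*counts) for label, counts in pairs.items()}
for label, share in shares.items():
    print(f"{label}: enclitic share = {share:.1%}")
```

    In both pairs the enclitic variant accounts for a substantial part of the occurrences, which is the pattern the abstract cites against a strict register rule.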

    Replication data for: Prefix variation in путать: в-, за-, пере- and с-

    This case study of the four Natural Perfectives of the Russian simplex verb путать ‘tangle’ sheds light on the following questions: Is it possible to predict the choice of prefix when there is prefix variation in Russian? And if yes, how? Since these questions are particularly relevant for second-language learners, the author also discusses how the present study and similar ones can be used to make second-language learning of Russian more effective. The analysis is based on a database of 630 sentences from the Russian National Corpus (RNC) and takes two factors into consideration: type of construction and semantic category of the internal argument. The uploaded data contain 3 files: "Database, everything": Each sentence is tagged according to prefix, form of the verb (Active vs Passive), type of construction and semantic category of the internal argument. The four types of constructions and four types of semantic categories are explained with examples from the database inside the article. "Database_simplified": This version of the database contains the three parameters for the sentences: prefix, type of construction and semantic category of the internal argument. The simplified database was created to do statistical analyses in R. "R_putat": The R script that was used in order to produce the cTree which is presented in the article.
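
    As a rough illustration of how a three-parameter table like the simplified database can be summarized before any modelling, the sketch below cross-tabulates prefix against construction type. It is not the author's R script: the rows, the field names (`prefix`, `construction`, `semantic_category`) and the placeholder labels (`type_1`, `category_A`, and so on) are all hypothetical, since the actual file layout and category names are not given here.

```python
from collections import Counter

# HYPOTHETICAL rows standing in for tagged sentences; field names and
# category labels are placeholders, not the dataset's real values.
rows = [
    {"prefix": "за-", "construction": "type_1", "semantic_category": "category_A"},
    {"prefix": "за-", "construction": "type_1", "semantic_category": "category_B"},
    {"prefix": "с-", "construction": "type_2", "semantic_category": "category_A"},
    {"prefix": "пере-", "construction": "type_2", "semantic_category": "category_C"},
    {"prefix": "в-", "construction": "type_3", "semantic_category": "category_D"},
]


def cross_tabulate(records, row_key, col_key):
    """Count how often each (row_key value, col_key value) pair occurs."""
    return Counter((record[row_key], record[col_key]) for record in records)


table = cross_tabulate(rows, "prefix", "construction")
for (prefix, construction), n in sorted(table.items()):
    print(f"{prefix:6} {construction:8} {n}")
```

    The same helper can be pointed at `semantic_category` instead of `construction` to inspect the second factor.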

    Replication data for: Allomorphs of French de in coordination: a reproducible study

    It is known that French de ‘of’ can take wide scope in coordination—that is, the coordination can optionally be reduced by omitting the second de: de X et/ou (de) Y, meaning roughly ‘of X and/or (of) Y’. De has an allomorph d’ that is used when the following word begins with a vowel. This paper shows, using a large written corpus, that the two allomorphs, de and d’, do not behave the same when it comes to reduction/wide scope. Two main factors seem to be at play: resistance of the d’ allomorph to taking wide scope, and hiatus avoidance between et/ou (which are both vowel-final) and a following vowel-initial word. The existence of phonological factors that affect reduction rate implies that the grammar and/or processing architecture must retrieve some phonological information about X and Y before the final “decision” about reduction is made—or that the phonology is powerful enough to delete the second de on its own. This paper also aims to make a methodological contribution to reproducibility. The web materials accompanying the paper (scripts and documentation) allow the reader to reproduce all the steps of the data processing analysis, starting from a publicly available corpus

    Spanish deverbal adjectives (1): ivo

    Attested adjectives in -ivo in contemporary Spanish, with information about their bases, their semantic interpretation and their morphological properties
