Replication Data for: Putting the sorting hat on J.K. Rowling’s reader. A digital inquiry into the age of the implied readership of the Harry Potter series
Code (Jupyter Notebook) for calculating various textual features of the Harry Potter series (among others MATTR, readability, and lexical density). The generated topic models can also be downloaded (pickle, CSV, and PDF visuals).
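As a rough illustration of one of the features named above, MATTR (moving-average type-token ratio) is commonly defined as the mean type-token ratio over all sliding windows of a fixed length. The sketch below follows that general definition; the function name `mattr`, the default window size, and the handling of short texts are assumptions, not taken from the notebook itself.

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over sliding windows of `window` tokens.

    Assumption: texts shorter than the window fall back to a plain
    type-token ratio; the original notebook may handle this differently.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    # One type-token ratio per window position, then the average of those ratios.
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)
```

Because every window has the same length, the measure is less sensitive to text length than a plain type-token ratio, which is presumably why it is useful when comparing books of different sizes.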
Middle Dutch syllabified words
Specifics of the data:
Text file (syllabified_crm.txt) containing 43,710 syllabified Middle Dutch words, taken from the Corpus Van Reenen-Mulder. This corpus, created by Pieter van Reenen and Maaike Mulder at the Free University Amsterdam, contains about 2,500 Middle Dutch charters. It has about 750,000 tokens. The charters were written in the Netherlands and Flanders between 1300 and 1400.
The 43,710 syllabified words in this list are the unique words of the Corpus Van Reenen-Mulder. Some tokens from the corpus were excluded when assembling the data set, however, because they contained diacritic symbols that indicate abbreviations, clitics, or unclear parts in the original charter.
A dash-symbol (-) is used as separator.
Apart from the entire data set, this DOI also includes:
A pdf-file visualizing the data set
The splits used for the automatic syllabification experiment by Haverals, Kestemont & Karsdorp (2018).
A gold standard out-of-corpus sample of 1,748 Middle Dutch words, taken at random from the Cd-rom Middelnederlands, also used in the above-mentioned syllabification experiment.
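The dash-separated format described above can be read with a few lines of Python. This is a minimal sketch that assumes one syllabified word per line; the description does not spell out the file layout, and the function names `split_syllables` and `load_syllabified` are hypothetical.

```python
def split_syllables(entry, sep="-"):
    """Split one dash-separated entry into its syllables, ignoring empty parts."""
    return [part for part in entry.strip().split(sep) if part]


def load_syllabified(lines):
    """Map each unsyllabified word to its list of syllables.

    Assumption: each line holds exactly one syllabified word.
    """
    lexicon = {}
    for line in lines:
        syllables = split_syllables(line)
        if syllables:  # skip blank lines
            lexicon["".join(syllables)] = syllables
    return lexicon
```

With the file at hand, usage would look like `load_syllabified(open("syllabified_crm.txt", encoding="utf-8"))`, where the UTF-8 encoding is likewise an assumption.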
Discoverability in a Digital Library: A Study of "Rabbit Holes" within Gallica's Corpus
The phenomenon of aimless web navigation, often compared to falling "down the rabbit hole," brings to light significant aspects of the Internet's "long tail" concept. This research examines whether longer, non-goal-oriented web sessions genuinely lead users into the long tail of digital libraries, thereby exploring the discoverability of cultural heritage. The focus of this study is on Gallica, the French national library's online platform. This work aims to identify and characterize such sessions within Gallica, defining rabbit holes as long and diversified navigation sessions. The difficulty lies in identifying rabbit holes within server logs, which requires a mixed-methods approach involving interviews, qualitative studies, and simple statistical analyses. Despite Gallica's lack of hypertextual structure, we show that users do engage in rabbit hole-like behavior, navigating through keyword searches and filters. The study's findings align with user testimonies. A crucial conclusion is that rabbit holes in Gallica do not generally lead users to less-consulted content. This limitation is attributed to the search engine, which users must somewhat "hack" to navigate effectively. Enhancing Gallica's discoverability tools without compromising the existing user experience is essential for improving content accessibility.
Middle Dutch syllabified words
Specifics of the data:
Text file containing 43,703 syllabified Middle Dutch words, taken from the Corpus Van Reenen-Mulder. This database, created by Pieter van Reenen and Maaike Mulder at the Free University Amsterdam, contains about 2,500 Middle Dutch charters. It has about 750,000 tokens. The charters were written in the Netherlands and Flanders between 1300 and 1400.
The 43,703 syllabified words in this list are the unique words of the Corpus Van Reenen-Mulder. This number is an approximation, however, because some words contain diacritic symbols that indicate abbreviations, clitics, or unclear parts in the original charter. These words were disregarded when assembling the data.
A dash-symbol (-) is used as separator.
Dataset of Middle Dutch lexical stress patterns and syllabifications
This dataset consists of 48,219 Middle Dutch words taken from a total of 205 rhymed texts of the Cd-rom Middelnederlands (1998). All of these words have been assigned a syllabification and lexical stress pattern.
E.g.: proevede is syllabified as proe-ve-de and has a stress index set at -3, which means that – counting from the rightmost syllable – the third syllable receives stress.
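The right-to-left stress index in this example maps directly onto negative list indexing. The following sketch shows how such an index could be resolved to a syllable; the function names `stressed_syllable` and `mark_stress` are hypothetical and do not come from the dataset's accompanying code.

```python
def stressed_syllable(syllabified, stress_index, sep="-"):
    """Return the syllable that a right-to-left stress index points at.

    A stress index of -1 is the last syllable, -2 the second to last, etc.
    """
    syllables = syllabified.split(sep)
    if not -len(syllables) <= stress_index <= -1:
        raise ValueError("stress index out of range for this word")
    return syllables[stress_index]


def mark_stress(syllabified, stress_index, sep="-"):
    """Upper-case the stressed syllable, e.g. for quick visual inspection."""
    syllables = syllabified.split(sep)
    target = stressed_syllable(syllabified, stress_index, sep)
    # Convert the negative index to a position counted from the left.
    position = len(syllables) + stress_index
    syllables[position] = target.upper()
    return sep.join(syllables)
```

For the word above, `stressed_syllable("proe-ve-de", -3)` yields `proe`, the third syllable counted from the right.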
This upload contains the following files:
The JSON-file (compressed), which was used as input data for a machine learning algorithm trained for the automatic syllabification and stress assignment of Middle Dutch polysyllabic words (for the code of this experiment, see GitHub)
An Excel-file, containing the same data as the JSON (for more convenient reference)
A split file (compressed), used in the training process of the above-mentioned experiment
A pdf-file with some insightful illustrations about the contents of the dataset
This dataset is part of the research of Wouter Haverals (FWO, University of Antwerp), carried out under the supervision of prof. Mike Kestemont and em. prof. Frank Willaert.
From exemplar to copy: the scribal appropriation of a Hadewijch manuscript computationally explored
Abstract: This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. A) and Brussels, KBR, 2877-2878 (ms. B). On the basis of codicological and contextual arguments, it is assumed that the scribe who produced B used A as an exemplar. While the similarities in both layout and content between the two manuscripts are striking, the present article seeks to identify the differences. After all, regardless of the intention to produce a copy that closely follows the exemplar, subtle linguistic variation is apparent. Divergences relate to spelling conventions, but also to the way in which words are abbreviated (and the extent to which abbreviations occur). The present study investigates the spelling profiles of the scribes who produced mss. A and B in a computational way. In the first part of this study, we will present both manuscripts in more detail, after which we will consider prior research carried out on scribal profiling. The current study both builds and expands on Kestemont (2015). Next, we outline the methodology used to analyse and measure the degree of scribal appropriation that took place when ms. B was copied from the exemplar ms. A. After this, we will discuss the results obtained, focusing on the scribal variation that can be found both at the level of individual words and n-grams. To this end, we use machine learning to identify the most distinctive features that separate manuscript A from B. Finally, we look at possible diachronic trends in the appropriation by B's scribe of his exemplar. We argue that scribal takeovers in the exemplar impact the practice of the copying scribe, while transitions to a different content matter cause little to no effect.
De maat van het Middelnederlands : een digitaal onderzoek naar de prosodische en ritmische kenmerken van Middelnederlandse berijmde literatuur (The measure of Middle Dutch: a digital study of the prosodic and rhythmic features of Middle Dutch rhymed literature)
Abstract: What do we mean when we say that the rhythm of a literary text is 'halting' or 'flowing'? The rhythmic properties of literature are usually described in intuitive and vague terms, and that is certainly the case for Middle Dutch literature. The many rhymed texts from the earliest period of our literary history are regularly described in terms such as those mentioned above. Often, however, it is unclear what exactly is meant by them. The study of Middle Dutch verse rhythm therefore has much to gain from a computational approach, in which prejudice and personal taste are set aside. This dissertation investigates the possibility of reconstructing and studying the rhythm of Middle Dutch texts by starting from the actual textual material. The result is a so-called 'automatic scansion machine', which makes a prediction about the rhythmic pronunciation of these medieval verses.
Across the Pages: A Comparative Study of Reader Response to Web Novels in Chinese and English on Qidian and WebNovel
The evolution of online reading platforms has transformed engagement with fiction, with platforms like WebNovel bridging cultural boundaries through translated Chinese web novels. This study employs topic modeling to compare reader responses to the same stories published in Chinese on Qidian and in English on WebNovel, focusing on English and Chinese language comments. We identify shared and unique themes, revealing that while both communities emphasize characterization and story development, cultural-specific expressions and platform dynamics shape readers’ interactions. Our findings underscore the nuanced interplay between language, culture, and the affordances of digital platforms in shaping global literary consumption and community engagement.
