University of Tartu

DSpace at Tartu University Library


108495 research outputs found


Surface-Level Morphological Segmentation of Low-resource Inuktitut Using Pre-trained Large Language Models

Author: Stenlund Mathias
Myneni Hemanadhan
Riedel Morris
Publication venue: University of Tartu Library
Publication date: 2025

Segmenting languages based on morpheme boundaries, instead of relying on language-independent segmentation algorithms like Byte-Pair Encoding (BPE), has been shown to benefit downstream Natural Language Processing (NLP) task performance. This can, however, be tricky for polysynthetic languages like Inuktitut due to a high morpheme-to-word ratio and the lack of appropriately sized annotated datasets. We demonstrate the potential of using pre-trained Large Language Models (LLMs) for surface-level morphological segmentation of Inuktitut by treating it as a binary classification task. We fine-tune on tasks derived from automatically annotated Inuktitut words written in Inuktitut syllabics. Our approach shows good potential when compared to previous neural approaches. We share our best model to encourage further studies on downstream NLP tasks for Inuktitut written in syllabics.

Lattice @MultiGEC-2025: A Spitful Multilingual Language Error Correction System Using LLaMA

Author: Seminck Olga
Dupont Yoann
Dehouck Mathieu
Wang Qi
Durandard Noé
Novikov Margo
Publication venue: University of Tartu Library
Publication date: 2025

This paper reports on our submission to the NLP4CALL shared task on Multilingual Grammatical Error Correction (MultiGEC-2025) (Masciolini et al., 2025). We developed two approaches: fine-tuning a large language model, LLaMA 3.0 (8B), for each MultiGEC corpus, and a pipeline based on the encoder-based language model XLM-RoBERTa. During development, the first method significantly outperformed the second, except for languages that are poorly supported by LLaMA 3.0 and have limited MultiGEC training data. Therefore, our official results for the shared task were produced using the neural network system for Slovenian, while fine-tuned LLaMA models were used for the eleven other languages. In this paper, we first introduce the shared task and its data. Next, we present our two approaches, as well as a method to detect cycles in the LLaMA output. We also discuss a number of hurdles encountered while working on the shared task.

The BRAGE Benchmark: Evaluating Zero-shot Learning Capabilities of Large Language Models for Norwegian Customer Service Dialogues

Author: Riess Mike
Jørgensen Tollef Emil
Publication venue: University of Tartu Library
Publication date: 2025

This study explores the capabilities of open-weight Large Language Models in a zero-shot learning setting through the BRAGE benchmark, which tests their ability to classify the content of customer service dialogues in Norwegian from a single instruction. By comparing results against widely used downstream tasks such as question-answering and named entity recognition, we find that (1) specific instruction models greatly exceed base models on the benchmark, (2) both English and multilingual instruction models outperform the tested Norwegian models of similar sizes, and (3) the difference between base and instruction models is less pronounced than in other generative tasks, suggesting that BRAGE is a challenging benchmark, requiring precise and generalizable instruction-tuning.

Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)

Publication venue: University of Tartu Library
Publication date: 2025

https://aclanthology.org/2025.resourceful-1.0

Match ‘em: Multi-Tiered Alignment for Error Analysis in ASR

Author: Parsons Phoebe
Kvale Knut
Svendsen Torbjørn
Salvi Giampiero
Publication venue: University of Tartu Library
Publication date: 2025

We introduce “Match ‘em”: a new framework for aligning output from automatic speech recognition (ASR) with reference transcriptions. It allows a more detailed analysis of the errors produced by end-to-end ASR systems than word error rate (WER) does. Match ‘em performs the alignment on both the word and character level, each relying on information from the other to provide the most meaningful global alignment. At the character level, we define a speech production motivated character similarity metric. At the word level, we rely on character similarities to define word similarity and, additionally, we reconcile compounding (insertion or deletion of spaces). We evaluated Match ‘em on transcripts of three European languages produced by wav2vec2 and Whisper. We show that Match ‘em results in more similar word substitution pairs and that compound reconciling can capture a broad range of spacing errors. We believe Match ‘em to be a valuable tool for ASR error analysis across many languages.

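Erikaelsed taimed globaalmuutuste ajastul: kohalike, maastiku- ja kliimategurite roll [Heterostylous plants in an era of global change: the role of local, landscape and climatic factors]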

Author: Kivastik Marianne
Publication venue: Tartu Ülikooli Kirjastus
Publication date: 09/04/2025

The electronic version of the dissertation does not include the publications.
Recent rapid land-use changes, along with shifts in climate, pose significant threats to various aspects of biodiversity. Among plants, animal-pollinated species, such as heterostylous species, are considered to be especially vulnerable to landscape changes because of their complex mating system. Heterostyly is a floral polymorphism characterised by the presence of two or three morphologically different flower types (morphs) in a species. These morphs differ from each other in the positioning of sexual organs within the flower. The purpose of such floral polymorphism is to promote outcrossing between the opposite morphs via insect pollination and to avoid selfing within the same morph or individual. The frequency of morphs within a population should be relatively equal to ensure maximal reproductive fitness. The aim of the thesis was to explore the role of different local, landscape and climatic factors on heterostylous species and uncover the genetic and evolutionary consequences of these processes. My thesis shows that population size is one of the central factors causing morph frequencies to shift away from the optimal balance in heterostylous plant populations. Contrary to expectations, the studies showed a dominance of one morph type over the other in Primula veris populations. In addition, the results indicate that morph balance can also be affected by different landscape and climatic factors, such as the spread of built-up areas, loss of natural habitats and increased rainfall. Finally, I show that populations with unbalanced morph ratios may experience lower genetic diversity, which can, in turn, decrease the fitness and adaptability of such populations.
The results from this thesis indicate the vulnerability of heterostylous species to rapid land-use changes and highlight the need to maintain and protect natural habitats to secure the viability of animal-pollinated plant species.
https://www.ester.ee/record=b574140

Development of a mobile application for Estonian vocabulary learning: a cross-platform solution using react native

Author: Kapustinskii Nikolai
Publication venue: Tartu Ülikooli Narva kolledž
Publication date: 01/01/2025

https://www.ester.ee/record=b574008

How Well do LLMs know Finno-Ugric Languages? A Systematic Assessment

Author: Kuulmets Hele-Andra
Purason Taido
Fishel Mark
Publication venue: University of Tartu Library
Publication date: 2025

We present a systematic evaluation of the multilingual capabilities of open large language models (LLMs), specifically focusing on five Finno-Ugric (FiU) languages. Our investigation covers multiple prompting strategies across several benchmarks and reveals that Llama-2 7B and Llama-2 13B perform weakly on most FiU languages. In contrast, Llama 3.1 models show impressive improvements, even for extremely low-resource languages such as Võro and Komi, indicating successful cross-lingual knowledge transfer inside the models. Finally, we show that stronger base models outperform weaker, language-adapted models, thus emphasizing the importance of the base model in successful language adaptation.

Teachers' perceived assessment of cooperation with the school psychologist

Author: Kont Kätriin
Publication venue: Tartu Ülikool
Publication date: 01/01/2025

The aim of the study was to examine how Estonian teachers perceive cooperation with school psychologists, which of the school psychologist's tasks teachers consider important, whether they are satisfied with the current cooperation, and what could be improved. To this end, I conducted interviews with 8 Estonian teachers. I used qualitative content analysis to analyse the data. The results showed that the school psychologist's tasks that matter to teachers are supporting, counselling and assessing students, referring students onward where necessary, and planning interventions. Teachers find cooperation with the school psychologist very valuable and highly beneficial. Teachers are very satisfied with the cooperation. To develop the cooperation further, teachers suggested more thorough training for teachers, clarity about school psychologists' tasks, and more support specialists practising in educational institutions.

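Pelgalt poliitika? Venemaa Ukraina-teemaliste narratiivide taastootmine Jaapani teadlas- ja haritlaskonnas aastatel 2014–2019 [Merely politics? The reproduction of Russia's narratives on Ukraine among Japanese scholars and intellectuals, 2014–2019]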

Author: Hosaka Sanshiro
Publication venue: Tartu Ülikooli Kirjastus
Publication date: 12/03/2025

How were Russia’s narratives on the so-called “Ukraine crisis” reproduced and normalized in liberal democracies? The case of Japan demonstrates the remarkable role scholars and intellectuals play in conveying Moscow’s strategic narratives to national audiences. A content analysis of 460 texts published between 2014 and 2019 shows how the reproduction of Russia’s narratives on the events in Ukraine is associated with various factors, including authors’ affiliations with area studies, participation in the Kremlin-sponsored Valdai Discussion Club, and their adoption of overarching narratives concerning international affairs, Russia’s identity, and Japan’s relations with Russia. Contrary to expectations, affiliation with Russian studies was not a useful gauge for the reproduction of Russian narratives, while Valdai Club participation was. Skepticism toward mainstream discourse and the perception of Russia as defensive and victimized also played major roles, alongside the endorsement of the Russian historical narrative of “fraternal nations.” Discourse analysis revealed three dimensions of narratives on Ukraine. First, a Russocentric ontology diminishes Ukraine’s agency in international politics. Scholars normalized Russia’s illegal annexation of Crimea by citing Russia’s alleged ontological security, a perspective that relies on Russian historical narratives and disregards Ukraine’s own.
Second, an anti-Western counterhegemonic stance that interprets Western actions as Russophobic sought alternative explanations for Putin’s aggressive policies. Third, academics’ ill-preparedness to detect the Kremlin’s manipulation led to serious methodological bias, notably in cases of fieldwork in the “Donetsk People’s Republic.” These narratives shaped Tokyo’s foreign policy. A cohort of Valdai experts and former diplomats, labeling Ukrainians as “neo-Nazis,” argued that Tokyo should prioritize resolving the Northern Territories issue with Moscow and not be distracted by Ukraine. A purported “Sino-Russia alliance” was instrumentalized as a scarecrow to persuade decision-makers to maintain business as usual with Russia and decouple it from China, Tokyo’s primary security concern. Only Russia’s full-scale invasion has disillusioned the Japanese leadership and public.
https://www.ester.ee/record=b573632

60,617 full texts

108,495 metadata records

Updated in last 30 days.

DSpace at Tartu University Library is based in Estonia
