CLARIN-PL


504 research outputs found


Corpus of Russian Local Press of the Millennium Period (1996-2006)

Author: Fedorushkov Yury
Publication venue: Adam Mickiewicz University
Publication date: 14/04/2023

Corpus of Russian Local Press of the Millennium Period (1996-2006): selected archives (spanning 1995/1996 to 2006) of 280 local newspapers from 86 federal subjects of the Russian Federation (2005-2006): oblasts (provinces), republics, krais (territories), autonomous okrugs (with a substantial ethnic minority), federal cities, and autonomous oblasts.

Mochnacki

Author: Mędrzecka Anna
Bernaś Tomasz
Publication venue: CLARIN-PL
Publication date: 22/11/2023

Corpus of texts by Mochnacki

MultiCo

Author: Karpiński Maciej
Katarzyna Klessa
Ewa Jarmołowicz-Nowikow
Janusz Taborek
Brygida Sawicka-Stępińska
Michał Piosik
Publication venue: Adam Mickiewicz University, Poznań
Publication date: 31/12/2023

The MultiCo multimodal corpus is one of the outcomes of the project "Digital Research Infrastructure for the Humanities and Arts Studies DARIAH-PL." This project was funded by POIR 4.2 of the European Regional Development Fund from 2021 to 2023 and was carried out by a consortium of academic institutions across Poland, which included Adam Mickiewicz University, Poznań. The MultiCo multimodal corpus was developed at the Faculty of Modern Languages of Adam Mickiewicz University in Poznań. The motivation behind creating the corpus stems from contemporary research on interpersonal communication. These studies confirm that, in order to understand and model the multifaceted process of communication, it is essential to study and describe not only speech but also other components of communication, such as gestures, facial expressions, and body posture. The MultiCo corpus was designed to support and facilitate this type of research approach. The corpus contains over 15 hours of recordings and consists of three sections:
- Monologs representing persuasion in parliamentary speeches and motivational talks (TEDx),
- Dialogs based on task-oriented activities recorded in a lab setting,
- Multilogs illustrating discussions with multiple participants, exemplified by conversations on current sports events (TVP Sport 4-4-2).
The monolog and multilog sections are based on materials available in public media or archives, while the dialog section includes task-oriented dialogs originally designed and recorded specifically for this resource.

Word Sense Disambiguation Based on Iterative Activation Spreading with Contextual Embeddings for Sense Matching

Author: Janz Arkadiusz
Piasecki Maciej
Publication venue: Global Wordnet Association
Publication date: 01/01/2023

Many knowledge-based solutions have been proposed to solve the Word Sense Disambiguation (WSD) problem with limited annotated resources. Such WSD algorithms can cover very large sense repositories, but they are still outperformed by supervised ones on benchmark data. In this paper, we start with an analysis identifying key properties and issues in the application of spreading activation algorithms to knowledge-based WSD, e.g. the influence of local network structures, interaction with context information, and sense frequency. Taking our observations as a point of departure, we introduce a novel solution with new context-to-sense matching using BERT embeddings, an iterative parallel spreading activation function, and selective sense alignment using contextual BERT embeddings. The proposed solution achieves performance beyond the state of the art among contemporary knowledge-based WSD approaches for both English and Polish data.

Korpus próbny

Author: Kowalski Jan
Publication venue: korpus
Publication date: 15/01/2022

Wikinews items from Wikipedia

Polish WSD Datasets

Author: Janz Arkadiusz
Baran Joanna
Oleksy Marcin
Dziob Agnieszka
Publication venue: Wrocław University of Technology
Publication date: 11/04/2022

Data and code for the paper published at ICCS 2022: "A Unified Sense Inventory for Word Sense Disambiguation in Polish". The code is available at https://gitlab.clarin-pl.eu/team-semantics/wsd-researc

DiaBiz

Author: Pęzik Piotr
Krawentek Gosia
Karasińska Sylwia
Wilk Paweł
Rybińska Paulina
Cichosz Anna
Peljak-Łapińska Angelika
Deckert Mikołaj
Adamczyk Michał
Publication venue: University of Lodz
Publication date: 01/01/2022

The DiaBiz corpus is a dialog corpus comprising recordings and annotated transcriptions of phone-based customer-agent interactions in several key business domains.

Korpus - Wikinews

Author: of Wroclaw University
Publication venue: University of Wroclaw
Publication date: 23/01/2022

Corpus with texts on various topics from the world and technology

StudEmo - corpus of consumer reviews annotated with emotions

Author: Ngo Anh
Candri Argi
Ferdinan Teddy
Kocoń Jan
Korczyński Wojciech
Publication venue: Wrocław University of Science and Technology
Publication date: 20/05/2022

Human emotional perception is subjective by nature: different individuals may express different emotions regarding the same textual content. Existing datasets for emotion analysis commonly depend on a single ground truth per data sample, derived from majority voting or averaging the opinions of all annotators. We introduce a new non-aggregated dataset, StudEmo, that contains 5,182 customer reviews, each annotated by 25 people with intensities of eight emotions from Plutchik's model, extended with valence and arousal. We also propose three personalized models that use not only textual content but also the individual human perspective, providing the model with different approaches to learning human representations. The experiments were carried out as multitask classification on two datasets: our StudEmo dataset and the GoEmotions dataset, which contains 28 emotional categories. The proposed personalized methods significantly improve prediction results, especially for emotions that have low inter-annotator agreement.

War with striped beetle in main Polish communist party newspaper "Trybuna Ludu" 1950-1965

Author: Pawliszak Piotr
Konopko Adam
Steciąg Magdalena
Publication venue: Uniwersytet Gdański
Publication date: 04/01/2022

Articles from the main Polish communist party newspaper "Trybuna Ludu" concerning the battle against the potato beetle allegedly dropped by the US Government on Poland and other socialist countries.

40 full texts

504 metadata records

Updated in last 30 days.
