504 research outputs found

    Corpus of Russian Local Press of the Millennium Period (1996-2006)

    No full text
    Corpus of Russian Local Press of the Millennium Period (1996-2006): selected archives (spanning 1995/1996 to 2006) of 280 local newspapers from 86 subjects of the Russian Federation (2005-2006): Oblasts (provinces), Republics, Krais (territories), Autonomous Okrugs (with a substantial ethnic minority), Federal cities, and Autonomous Oblasts

    Mochnacki

    No full text
    Corpus of Mochnacki's texts

    MultiCo

    No full text
    The MultiCo multimodal corpus is one of the outcomes of the project "Digital Research Infrastructure for the Humanities and Arts Studies DARIAH-PL." The project was funded by POIR 4.2 of the European Regional Development Fund from 2021 to 2023 and was carried out by a consortium of academic institutions across Poland, which included Adam Mickiewicz University, Poznań. The corpus was developed at the Faculty of Modern Languages of Adam Mickiewicz University in Poznań.

    The motivation for creating the corpus stems from contemporary research on interpersonal communication. These studies confirm that, in order to understand and model the multifaceted process of communication, it is essential to study and describe not only speech but also other components of communication, such as gestures, facial expressions, and body posture. The MultiCo corpus was designed to support and facilitate this type of research.

    The corpus contains over 15 hours of recordings and consists of three sections:
    - Monologs representing persuasion in parliamentary speeches and motivational talks (TEDex),
    - Dialogs based on task-oriented activities recorded in a lab setting,
    - Multilogs illustrating discussions with multiple participants, exemplified by conversations on current sports events (TVP Sport 4-4-2).

    The monolog and multilog sections are based on materials available in public media or archives, while the dialog section consists of task-oriented dialogs designed and recorded specifically for this resource

    Word Sense Disambiguation Based on Iterative Activation Spreading with Contextual Embeddings for Sense Matching

    Full text link
    Many knowledge-based solutions have been proposed to solve the Word Sense Disambiguation (WSD) problem with limited annotated resources. Such WSD algorithms are able to cover very large sense repositories, but are still outperformed by supervised ones on benchmark data. In this paper, we start with an analysis identifying key properties and issues in the application of spreading activation algorithms in knowledge-based WSD, e.g. the influence of local network structures, interaction with context information, and sense frequency. Taking our observations as a point of departure, we introduce a novel solution with new context-to-sense matching using BERT embeddings, an iterative parallel spreading activation function, and selective sense alignment using contextual BERT embeddings. The proposed solution obtains performance beyond the state of the art for contemporary knowledge-based WSD approaches on both English and Polish data

    Korpus próbny

    No full text
    Wikinews articles from Wikipedia

    Polish WSD Datasets

    No full text
    Data and code for the paper published at ICCS 2022: "A Unified Sense Inventory for Word Sense Disambiguation in Polish". The code is available at https://gitlab.clarin-pl.eu/team-semantics/wsd-researc

    DiaBiz

    No full text
    The DiaBiz corpus is a dialog corpus comprising recordings and annotated transcriptions of phone-based customer-agent interactions in several key business domains

    Korpus - Wikinews

    No full text
    Corpus of texts on various topics in world affairs and technology

    StudEmo - corpus of consumer reviews annotated with emotions

    No full text
    Human emotional perception is subjective by nature: different individuals may express different emotions regarding the same textual content. Existing datasets for emotion analysis commonly depend on a single ground truth per data sample, derived from majority voting or from averaging the opinions of all annotators. We introduce a new non-aggregated dataset, StudEmo, that contains 5,182 customer reviews, each annotated by 25 people with intensities of eight emotions from Plutchik's model, extended with valence and arousal. We also propose three personalized models that use not only the textual content but also the individual human perspective, providing the model with different approaches to learning human representations. The experiments were carried out as multitask classification on two datasets: our StudEmo dataset and the GoEmotions dataset, which contains 28 emotional categories. The proposed personalized methods significantly improve prediction results, especially for emotions that have low inter-annotator agreement

    War with striped beetle in main Polish communist party newspaper "Trybuna Ludu" 1950-1965

    No full text
    Articles from the main Polish communist party newspaper "Trybuna Ludu" concerning the battle against the potato beetle, allegedly dropped by the US Government on Poland and other socialist countries

    40 full texts

    504 metadata records
    Updated in last 30 days.
    CLARIN-PL