CTexT Afrikaans GloVe Word Embeddings

Author: Eiselen Roald
Publication date: 10/01/2022

The CTexT Afrikaans GloVe Word Embeddings is a 300-dimensional Afrikaans embedding model based on the Global Vectors architecture (Pennington et al., 2014) that provides real-valued vector representations for Afrikaans text. The embedding model was trained on a corpus of 230 million words

SADiLaR Language Resource Repository

CTexT fastText Skipgram String Embeddings

Author: Eiselen Roald
Publication date: 10/01/2022

The CTexT Afrikaans fastText Skipgram String Embeddings is a 300-dimensional Afrikaans embedding model based on the Skipgram fastText architecture that provides real-valued vector representations for Afrikaans text. The embedding was trained on a corpus of 230 million words

SADiLaR Language Resource Repository

CTexT Afrikaans FLAIR Part of Speech tagger model

Author: Eiselen Roald
Publication date: 10/01/2022

The CTexT Afrikaans FLAIR Part of Speech tagger model is a neural part-of-speech tagger model based on the FLAIR framework (Akbik et al. 2019), and includes Afrikaans GloVe (Pennington et al., 2014) and FLAIR embeddings (Akbik et al. 2018) from the CTexT Afrikaans word and string embeddings. The model is trained on a collection of 100 000 part-of-speech annotated tokens, including the NCHLT Afrikaans annotated data

SADiLaR Language Resource Repository

CTexT Afrikaans FLAIR String Embeddings

Author: Eiselen Roald
Publication date: 10/01/2022

The CTexT Afrikaans FLAIR String Embeddings are two Afrikaans embedding models based on the FLAIR architecture (Akbik et al. 2018, 2019) that provide real-valued vector representations for Afrikaans text. The embeddings were trained on a corpus of 230 million words

SADiLaR Language Resource Repository

CTexT Afrikaans FLAIR Named Entity Recognition model

Author: Eiselen Roald
Publication date: 10/01/2022

The CTexT Afrikaans FLAIR Named Entity Recognition model is a neural NER model based on the FLAIR framework (Akbik et al. 2019), and includes Afrikaans fastText (Bojanowski et al., 2017) and FLAIR embeddings (Akbik et al. 2018) from the CTexT Afrikaans word and string embeddings. The model is trained on the NCHLT Afrikaans Named Entity Annotated Corpus

SADiLaR Language Resource Repository

CTexT Afrikaans fastText CBoW String Embeddings

Author: Eiselen Roald
Publication date: 10/01/2022

The CTexT Afrikaans fastText CBoW String Embeddings is a 300-dimensional Afrikaans embedding model based on the Continuous Bag of Words fastText architecture that provides real-valued vector representations for Afrikaans text. The embedding was trained on a corpus of 230 million words

SADiLaR Language Resource Repository

Author Instructions

Author: Instructions Author
Publication date: 04/11/2013

Crossref

Cartographic Perspectives (E-Journal - North American Cartographic Information Society, NACIS)

Designing a South African multilingual learner corpus of academic texts (SAMuLCAT)

Authors: Carstens Adelia, Eiselen Roald
Publication date: 01/01/2019

This article provides an overview of the process and initial outcomes of designing a multilingual corpus of academic texts produced by university students with different mother tongues in South Africa, with a view to making it available as an open resource for pedagogical applications and research. We first give an overview of the history of corpus development for pedagogical purposes world-wide, with particular emphasis on learner corpora, and highlight the absence of a South African corpus of academic learner texts. Thereafter, the objectives of the corpus project are outlined. The remainder of the article describes and justifies the design features of the corpus as well as the process of setting up the data management system to facilitate the collection of the learner texts and their integration with the metadata. We conclude with a summary of the current status of the project, including the limitations, and a preview of the way forward. This research was made possible with support from the South African Centre for Digital Language Resources (SADiLaR). https://www.tandfonline.com/loi/rlms20 2020-07-22 hj2019 Unit for Academic Literac

UPSpace at the University of Pretoria

Going Beyond Counting First Authors in Author Co-citation Analysis

Author: Zhao Dangzhi
Publication date: 01/01/2005

The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed

E-LIS
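
The two counting schemes contrasted in the co-citation abstract above can be illustrated with a minimal sketch. The data, the function name `cocitation_counts` and the parameter `max_authors` are hypothetical and do not come from the study. The sketch also assumes that two authors are co-cited once per citing paper whose reference list mentions both of them, and that co-authors of the same cited work count as co-cited with each other; the study itself may handle these details differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations over a set of citing papers.

    citing_papers: list of reference lists; each reference is a list of
                   author names in byline order (hypothetical toy format).
    max_authors:   how many leading authors of each cited work are counted
                   (1 = first-author counting, 5 = first-five counting).
    """
    counts = Counter()
    for references in citing_papers:
        # Gather every author this citing paper cites under the chosen scheme.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of cited authors is co-cited once per citing paper.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data (invented): two citing papers, each citing two works.
papers = [
    [["Smith", "Lee"], ["Garcia"]],
    [["Smith"], ["Garcia", "Chen"]],
]

first = cocitation_counts(papers, max_authors=1)  # first-author counting
multi = cocitation_counts(papers, max_authors=5)  # first-five-authors counting

print(first)  # only the pair ("Garcia", "Smith") appears, with count 2
print(len(multi))  # 5 distinct author pairs under the wider scheme
```

On this toy input the wider scheme surfaces pairs involving second authors (Lee, Chen) that first-author counting never sees, which is the kind of difference the abstract compares.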

Exploring Afrikaans word embeddings with analogies and nearest neighbours

Authors: Eiselen Roald, Gaustad Tanja
Publication date: 25/01/2023

This paper presents an exploration of word embeddings for Afrikaans using the analogies and nearest neighbours methodologies. We compare the results on three types of embeddings (fastText, FLAIR and GloVe) on a novel analogy data set for Afrikaans, inspired by the Bigger Analogy Test Set: BATS (Gladkova et al. 2016). Our analysis shows that for Afrikaans, similar to English, the types of embeddings influence the quality of analogies found for different linguistic tasks. Our investigation also demonstrates, however, that these Afrikaans embeddings do not encode as clear a linguistic representation as with English embeddings. The exact reason for this is subject to future work, but the added morphological complexity and the lack of data most likely play a role

UP Journals (Univ. of Pretoria)
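
The analogy and nearest-neighbour methodologies named in the abstract above can be sketched as follows. This is an illustration only: the three-dimensional vectors are invented toy values, not taken from the CTexT embeddings, and the vector-offset method used here (often called 3CosAdd) is an assumption, since the abstract does not state which scoring method the paper uses. The helper names `cosine`, `nearest_neighbours` and `analogy` are hypothetical.

```python
import numpy as np

# Invented toy vectors for a few Afrikaans words (not real embedding values).
vectors = {
    "man": np.array([1.0, 0.0, 0.0]),
    "vrou": np.array([0.0, 1.0, 0.0]),
    "koning": np.array([1.0, 0.0, 1.0]),
    "koningin": np.array([0.0, 1.0, 1.0]),
    "hond": np.array([0.5, 0.5, 0.0]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest_neighbours(query_vec, exclude=(), k=1):
    """Return the k vocabulary words most similar to query_vec."""
    scored = [
        (word, cosine(query_vec, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in scored[:k]]


def analogy(a, b, c, k=1):
    """Answer 'a is to b as c is to ?' with the vector-offset method."""
    target = vectors[b] - vectors[a] + vectors[c]
    # The three query words are excluded, as is common in analogy tests.
    return nearest_neighbours(target, exclude={a, b, c}, k=k)


print(analogy("man", "koning", "vrou"))  # ['koningin']
print(nearest_neighbours(vectors["koning"], exclude={"koning"}))  # ['man']
```

With real embeddings the same functions would be applied to vectors loaded from a trained model, and accuracy would be measured as the share of analogy questions whose expected answer is ranked first.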