SADiLaR Language Resource Repository
    536 research outputs found

    Autshumato English-Setswana Parallel Corpora

    No full text
    Aligned parallel corpora for the language pair English-Setswana. The data is given as two separate UTF-8 text files, with one aligned segment per line. The data was specifically selected and formatted for use in the training of machine translation systems. Further clean-up and processing might be required, depending on the task for which the data is reused.
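
The two-file layout described above pairs segments by line position: line n of one file corresponds to line n of the other. A minimal Python sketch of reading such a pair follows. The function name `read_aligned_pairs` and any file paths passed to it are hypothetical, since the listing does not give the actual file names in the corpus.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_aligned_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) segment pairs from two line-aligned UTF-8 files.

    Assumption: segment n of the source file is the translation
    counterpart of segment n of the target file.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which lets us
        # detect a line-count mismatch instead of silently truncating.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files are not aligned: one file ends before line {line_no}"
                )
            # Strip only the line terminator, keeping other whitespace.
            yield s.rstrip("\r\n"), t.rstrip("\r\n")
```

Raising on a line-count mismatch is a design choice for this sketch: a misaligned pair of files would otherwise produce wrong translation pairs without any warning.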

    Autshumato English-isiZulu Parallel Corpora

    No full text
    Aligned parallel corpora for the language pair English-isiZulu. The data is given as two separate UTF-8 text files, with one aligned segment per line. The data was specifically selected and formatted for use in the training of machine translation systems. Further clean-up and processing might be required, depending on the task for which the data is reused.

    CTexT Afrikaans fastText CBoW String Embeddings

    No full text
    The CTexT Afrikaans fastText CBoW String Embeddings is a 300-dimensional Afrikaans embedding model based on the Continuous Bag of Words fastText architecture that provides real-valued vector representations for Afrikaans text. The embedding was trained on a corpus of 230 million words.

    Autshumato Monolingual Xitsonga Corpus

    No full text
    Monolingual corpus for Xitsonga. The data is given as a single UTF-8 text file, with one segment per line. The data was specifically selected and formatted for use in the training of machine translation systems. Further clean-up and processing might be required, depending on the task for which the data is reused.

    WAT quotation collection

    No full text
    Collection of short quotations/excerpts from a variety of books (fiction, non-fiction and academic).

    N|uu language archive

    No full text
    This collection contains information that forms the basis of the N|uu dictionary, which contains a word list for N|uu with translations into Afrikaans, Nama, and English.

    Multilingual spelling checker lexicons

    No full text
    Spelling checker lexicons for 10 South African languages. The lexicons were created by collecting data from various sources and were manually reviewed by language experts according to the standard written orthography. For each language there are four different lexicon files: abbreviations..txt (abbreviations and abbreviation compounds); lowercase..txt (words that are correct when written in lower case); offensive..txt (words that are potentially offensive, obscene, racist, or should not be suggested by a spelling checker for some other reason); uppercase..txt (words that should only be written with one or more capitalised characters, such as person and place names).
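
As an illustration of how the four lexicon roles described above might interact in a spelling checker, here is a minimal Python sketch. It is not the repository's own tooling: the function names `is_accepted` and `filter_suggestions` are hypothetical, the lexicons are assumed to have been loaded into in-memory sets already, and the rule that a lowercase-lexicon word is also accepted when capitalised (for example at the start of a sentence) is an assumption rather than something the description states.

```python
from typing import Iterable, List, Set


def is_accepted(word: str, lowercase: Set[str], uppercase: Set[str]) -> bool:
    """Return True if `word` is considered correctly spelled."""
    # Entries in the uppercase lexicon (e.g. person and place names)
    # must match exactly, capitalisation included.
    if word in uppercase:
        return True
    # Assumption: a lowercase-lexicon word stays valid when it is
    # capitalised, so compare its lower-cased form.
    return word.lower() in lowercase


def filter_suggestions(candidates: Iterable[str], offensive: Set[str]) -> List[str]:
    """Drop candidates listed in the offensive lexicon, keeping order."""
    return [c for c in candidates if c.lower() not in offensive]
```

Under these assumptions, a name from the uppercase lexicon is rejected when typed entirely in lower case, while an ordinary word is accepted in either form. Offensive entries are only withheld from suggestions; the sketch does not decide whether they count as correctly spelled.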

    Sesotho syllabification systems

    No full text
    This package contains two syllabification systems for Sesotho (rule-based and TeX-based).

    Afrikaans morphological evaluative constructions dataset

    No full text
    A dataset of Afrikaans morphological evaluative constructions (MECs) and their word frequency classes. The MECs have been compiled using extracted constructions from the corpus collections accessible through the Virtual Institute for Afrikaans (VivA). The files are grouped into affixoids, compounds, affixes and other types of MECs. This dataset forms the basis of the description of Afrikaans MECs in a PhD thesis.

    Multilingual Linguistic Terminology

    No full text
    Multilingual Linguistic Terminology Project: termbanks of linguistic terminology for South African languages. Version 1.0. https://linguisticterminology.wordpress.com/ Languages included: Setswana (tsn), isiZulu (zul), isiXhosa (xho), Sesotho sa Leboa (nso), Tshivenda (ven), Sesotho (sot), Xitsonga (tso), isiNdebele (nde) and Siswati (ssw).

    8 full texts and 536 metadata records. Updated in the last 30 days.