SADiLaR Language Resource Repository
    536 research outputs found

    NCHLT isiXhosa word2vec-Skipgram embeddings

    No full text
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiXhosa text
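Static embeddings of this kind are usually compared with cosine similarity. The listing does not state the file format or the vector dimension, so the sketch below works on plain lists of floats and leaves loading aside; the function name `cosine_similarity` is illustrative and not part of the resource.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Cosine similarity is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate nearly orthogonal vectors.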

    NCHLT Sesotho word2vec-CBOW embeddings

    No full text
    Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Sesotho text

    NCHLT Setswana RoBERTa language model

    No full text
    Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned for any downstream task. The model can be used either as a masked LM or as an embedding model to provide real-valued vectorised representations of words or string sequences for Setswana text

    Autshumato English-Tshivenḓa Parallel Corpora

    No full text
    Aligned parallel corpora for the following language pair: English-Tshivenḓa. Data was crawled from various multilingual government websites, sourced from translated material, and created by translating English sentences into Tshivenḓa. The data is given as two separate UTF-8 text files, with each aligned segment on its own line
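Given the layout described above (two UTF-8 text files, one aligned segment per line), the pairs can be read by walking both files in step. This is a minimal sketch under that assumption only; the function name `read_parallel` is hypothetical, and the actual file names in the download are not given in the listing.

```python
from itertools import zip_longest


def read_parallel(src_path, tgt_path):
    """Yield (source, target) segment pairs from two line-aligned UTF-8 files."""
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, which exposes
        # a length mismatch instead of silently truncating.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files are not aligned: one file ends before line {line_no}"
                )
            yield s.rstrip("\n"), t.rstrip("\n")
```

Calling `list(read_parallel(english_file, tshivenda_file))` with the two paths from the download would give a list of segment pairs. Raising on a length mismatch is a design choice: a missing line in either file would otherwise shift every later pair out of alignment.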

    NCHLT isiNdebele FLAIR-backward embeddings

    No full text
    Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for isiNdebele text

    NCHLT Xitsonga FLAIR-forward embeddings

    No full text
    Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Xitsonga text

    NCHLT Setswana FLAIR-backward embeddings

    No full text
    Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Setswana text

    NCHLT isiZulu word2vec-Skipgram embeddings

    No full text
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiZulu text

    NCHLT Sesotho fastText-Skipgram embeddings

    No full text
    Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Sesotho text

    NCHLT Tshivenḓa FLAIR-forward embeddings

    No full text
    Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Tshivenḓa text

    8 full texts, 536 metadata records. Updated in last 30 days.