    Autshumato Setswana Monolingual Corpora

    Setswana monolingual corpus, a deliverable of the Autshumato project. The data is provided as a UTF-8 text file, with each sentence on a new line. NOTE: A newer version is available for the English-Setswana Monolingual Corpus. See https://hdl.handle.net/20.500.12185/58
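
The one-sentence-per-line layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the corpus distribution: the function name `iter_sentences` and the file name in the usage note are hypothetical, and skipping blank lines is an assumption about how the file might be laid out.

```python
from pathlib import Path
from typing import Iterator


def iter_sentences(path: str) -> Iterator[str]:
    """Yield one sentence per non-empty line of a UTF-8 text file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:  # assumption: blank lines carry no sentence
                yield sentence
```

For example, `sentences = list(iter_sentences("setswana_corpus.txt"))` would load the whole corpus into memory, where `setswana_corpus.txt` stands in for the actual file name; iterating lazily instead keeps memory use flat for large files.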

    Autshumato English-Setswana Parallel Corpora

    Aligned English-Setswana parallel corpus. The set combines data translated by professional translators, translated file pairs sourced from translators, and data obtained from government websites and documents. The data is provided as six separate UTF-8 text files, with each aligned sentence pair on a new line.

    NCHLT Setswana FLAIR-forward embeddings

    Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Setswana text.

    NCHLT Setswana fastText-CBoW embeddings

    Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Setswana text.

    NCHLT Sesotho fastText-Skipgram embeddings

    Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Sesotho text.

    NCHLT Afrikaans word2vec-CBOW embeddings

    Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Afrikaans text.

    NCHLT isiZulu word2vec-Skipgram embeddings

    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiZulu text.

    NCHLT Xitsonga word2vec-Skipgram embeddings

    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Xitsonga text.

    NCHLT Sepedi word2vec-Skipgram embeddings

    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Sepedi text.

    NCHLT isiNdebele fastText-CBoW embeddings

    Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for isiNdebele text.