Autshumato Setswana Monolingual Corpora
Setswana monolingual corpus, produced as a deliverable of the Autshumato project. The data is given as a UTF-8 text file, with each sentence on a new line.
NOTE: There is a newer version for English-Setswana Monolingual Corpus. See https://hdl.handle.net/20.500.12185/58
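Since the corpus is described as a UTF-8 text file with one sentence per line, reading it can be sketched roughly as follows. This is only an illustration: the function name `read_sentences` and any file name passed to it are hypothetical, and skipping blank lines is an assumption rather than something the corpus description states.

```python
def read_sentences(path):
    """Yield sentences from a UTF-8 text file holding one sentence per line.

    Blank lines are skipped (an assumption; the corpus description does not
    say whether blank lines occur).
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:
                yield sentence


def count_sentences(path):
    """Count the sentences in the file without loading it all into memory."""
    return sum(1 for _ in read_sentences(path))
```

Reading lazily with a generator keeps memory use flat, which may matter if the corpus file is large.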
Autshumato English-Setswana Parallel Corpora
Aligned English-Setswana parallel corpus. This set contains data translated by professional translators, data sourced as translated file pairs from translators, and data obtained from government websites and documents. The data is given as six separate UTF-8 text files, with each aligned sentence pair on a new line.
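A line-aligned parallel corpus is usually consumed by walking two files in step. The sketch below assumes that each language sits in its own file and that line i of the English file corresponds to line i of the Setswana file; the description above does not spell out the exact layout of the six files, so treat this as one plausible reading. The function name `read_parallel` and both path arguments are hypothetical.

```python
from itertools import zip_longest


def read_parallel(source_path, target_path):
    """Yield (source, target) sentence pairs from two line-aligned UTF-8 files.

    Assumes line i of source_path aligns with line i of target_path.
    Raises ValueError if one file runs out of lines before the other.
    """
    with open(source_path, encoding="utf-8") as source, \
            open(target_path, encoding="utf-8") as target:
        pairs = zip_longest(source, target)
        for number, (src_line, tgt_line) in enumerate(pairs, start=1):
            if src_line is None or tgt_line is None:
                raise ValueError(f"files differ in length at line {number}")
            yield src_line.rstrip("\n"), tgt_line.rstrip("\n")
```

Using `zip_longest` rather than `zip` makes a length mismatch fail loudly instead of silently dropping the trailing sentences of the longer file.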
NCHLT Setswana FLAIR-forward embeddings
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embeddings provide real-valued vector representations for Setswana text.
NCHLT Setswana fastText-CBoW embeddings
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embeddings provide real-valued vector representations for Setswana text.
NCHLT Sesotho fastText-Skipgram embeddings
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embeddings provide real-valued vector representations for Sesotho text.
NCHLT Afrikaans word2vec-CBOW embeddings
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embeddings provide real-valued vector representations for Afrikaans text.
NCHLT isiZulu word2vec-Skipgram embeddings
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embeddings provide real-valued vector representations for isiZulu text.
NCHLT Xitsonga word2vec-Skipgram embeddings
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embeddings provide real-valued vector representations for Xitsonga text.
NCHLT Sepedi word2vec-Skipgram embeddings
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embeddings provide real-valued vector representations for Sepedi text.
NCHLT isiNdebele fastText-CBoW embeddings
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embeddings provide real-valued vector representations for isiNdebele text.
