SADiLaR Language Resource Repository
536 research outputs found
NCHLT isiXhosa word2vec-Skipgram embeddings
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiXhosa text
NCHLT Sesotho word2vec-CBOW embeddings
Static word embeddings for the continuous bag-of-words (CBOW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Sesotho text
NCHLT Setswana RoBERTa language model
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and is not fine-tuned for any downstream task. It can be used either as a masked LM or as an embedding model that provides real-valued vector representations of words or string sequences for Setswana text
Autshumato English-Tshivenḓa Parallel Corpora
Aligned parallel corpora for the English-Tshivenḓa language pair. Data was crawled from various multilingual government websites, sourced from translated material, and created by translating English sentences into Tshivenḓa. The data is provided as two separate UTF-8 text files, with each aligned segment on its own line
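Because the corpus ships as two line-aligned UTF-8 files, segment pairs can be recovered by reading both files in step. The following is a minimal sketch under that assumption only; the function name `read_parallel` and the file names in the usage note are hypothetical and are not taken from the resource itself.

```python
from itertools import zip_longest


def read_parallel(src_path, tgt_path):
    """Yield (source, target) segment pairs from two line-aligned UTF-8 files.

    Assumes line N of the source file is the translation counterpart of
    line N of the target file, as the corpus description states.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length mismatch
        # is reported instead of silently dropping trailing segments.
        for line_no, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                raise ValueError(
                    f"files are not aligned: one file ends before line {line_no}"
                )
            yield s.rstrip("\n"), t.rstrip("\n")
```

A call such as `pairs = list(read_parallel("english.txt", "tshivenda.txt"))` (placeholder file names) would give a list of tuples suitable for further filtering or training. Failing loudly on a length mismatch is a design choice here, since a single missing line would otherwise shift every later pair out of alignment.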
NCHLT isiNdebele FLAIR-backward embeddings
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for isiNdebele text
NCHLT Xitsonga FLAIR-forward embeddings
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Xitsonga text
NCHLT Setswana FLAIR-backward embeddings
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Setswana text
NCHLT isiZulu word2vec-Skipgram embeddings
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiZulu text
NCHLT Sesotho fastText-Skipgram embeddings
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Sesotho text
NCHLT Tshivenḓa FLAIR-forward embeddings
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Tshivenḓa text