SADiLaR Language Resource Repository
536 research outputs found
NCHLT isiXhosa word2vec-CBOW embeddings
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiXhosa text
NCHLT Afrikaans FLAIR-backward embeddings
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Afrikaans text
NCHLT Sepedi word2vec-CBOW embeddings
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Sepedi text
NCHLT Afrikaans fastText-CBoW embeddings
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Afrikaans text
NCHLT isiXhosa FLAIR-forward embeddings
Contextual word/string embeddings for the forward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for isiXhosa text
NCHLT Tshivenḓa GloVe embeddings
Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations for Tshivenḓa text
NCHLT Tshivenḓa FLAIR-backward embeddings
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Tshivenḓa text
NCHLT Xitsonga RoBERTa language model
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and is not fine-tuned for any downstream task. It can be used either as a masked LM or as an embedding model that provides real-valued vector representations of words or string sequences for Xitsonga text
NCHLT Setswana fastText-CBoW embeddings
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Setswana text
NCHLT Afrikaans RoBERTa language model
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and is not fine-tuned for any downstream task. It can be used either as a masked LM or as an embedding model that provides real-valued vector representations of words or string sequences for Afrikaans text