SADiLaR Language Resource Repository
536 research outputs found
NCHLT Xitsonga fastText-Skipgram embeddings
Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Xitsonga text
USAf National Language Resources Audit 2023
This report documents the findings of a comprehensive language resources audit conducted by the South African Centre for Digital Language Resources (SADiLaR), with the support of the Board of Universities South Africa (USAf) and the Community of Practice for the teaching and learning of African Languages (CoPAL), a sub-committee of USAf. USAf mandated an audit of all public higher education institutions, from which conclusions can be drawn and recommendations made about the language resources that already exist at universities, the milestones already achieved and envisaged, and the language resources universities still require to successfully implement the National Language Policy Framework for Public Higher Education Institutions. As a first phase, the audit was therefore conducted to determine the readiness of higher education institutions to implement the Policy Framework. It comprised an in-depth analysis of staff and student perspectives on issues related to advancing multilingualism and the availability of language resources across five domains: (1) institutional information, (2) language services, (3) teaching and learning practices, (4) communication and administration, and (5) student life and co-curricular activities. The audit results yielded valuable insights into the challenges and prospects associated with advancing multilingualism to meet the imperatives of the National Language Policy Framework for Public Higher Education Institutions
NCHLT Sepedi FLAIR-backward embeddings
Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Sepedi text
NCHLT isiXhosa fastText-CBoW embeddings
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for isiXhosa text
NCHLT Tshivenḓa word2vec-Skipgram embeddings
Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Tshivenḓa text
NCHLT isiZulu word2vec-CBOW embeddings
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiZulu text
NCHLT Setswana GloVe embeddings
Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations for Setswana text
NCHLT isiZulu fastText-CBoW embeddings
Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for isiZulu text
NCHLT isiXhosa RoBERTa language model
Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and not fine-tuned for any downstream process. The model can be used either as a masked LM or as an embedding model to provide real-valued vectorised representations of words or string sequences for isiXhosa text
NCHLT Afrikaans word2vec-CBOW embeddings
Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Afrikaans text
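Note on using the embeddings listed above: the sketch below illustrates, under stated assumptions, two ideas the descriptions rely on. First, fastText represents a word through its character n-grams, which is what "word and subword embeddings" refers to. Second, static embeddings map each word to a real-valued vector that can be compared with cosine similarity. The function names, the n-gram range of 3 to 6 (the fastText default, not confirmed for the NCHLT models), and the toy vectors are all hypothetical and do not come from the repository. The sketch also omits the whole-word sequence that fastText adds alongside the n-grams.

```python
import math


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word, fastText style.

    The word is wrapped in '<' and '>' boundary markers first.
    The 3..6 range is an assumed default, not a documented NCHLT setting.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity is undefined for a zero vector; return 0 by convention
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for entries of a static embedding table.
toy_vectors = {
    "alpha": [1.0, 0.0, 1.0],
    "beta": [1.0, 0.0, 0.5],
    "gamma": [0.0, 1.0, 0.0],
}

if __name__ == "__main__":
    print(char_ngrams("ab", 3, 3))
    print(cosine_similarity(toy_vectors["alpha"], toy_vectors["beta"]))
    print(cosine_similarity(toy_vectors["alpha"], toy_vectors["gamma"]))
```

With real embeddings, the toy table would be replaced by vectors loaded from the downloaded resource. The file format and loading step depend on the individual release and are not shown here.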