SADiLaR Language Resource Repository
    536 research outputs found

    NCHLT Xitsonga fastText-Skipgram embeddings

    No full text
    Static word and subword embeddings for the Skipgram flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for Xitsonga text
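
    The subword part of fastText can be shown with a short sketch. In Bojanowski et al. (2017), each word is wrapped in the boundary symbols `<` and `>` and represented by its character n-grams, so a vector for an unseen word can be built from the vectors of its n-grams. The sketch assumes fastText's default n-gram range of 3 to 6, because the settings used for the NCHLT models are not stated here. It also uses a hypothetical in-memory `ngram_vectors` dictionary in place of the released model file. Averaging the n-gram vectors that are found is one reasonable composition, not necessarily how a given toolkit reads these files.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word` wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def subword_vector(word, ngram_vectors, dim):
    """Average the vectors of the n-grams of `word` found in `ngram_vectors`.

    `ngram_vectors` is a hypothetical dict standing in for the model's
    n-gram table. Unseen n-grams are skipped, and a zero vector is
    returned if none are found.
    """
    found = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not found:
        return [0.0] * dim
    return [sum(column) / len(found) for column in zip(*found)]
```

    For example, `char_ngrams("mati", 3, 3)` yields `['<ma', 'mat', 'ati', 'ti>']`. The boundary markers let a prefix such as `<ma` be distinguished from the same letters occurring inside a word.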

    USAf National Language Resources Audit 2023

    Full text link
    This report documents the findings of a comprehensive language resources audit conducted by the South African Centre for Digital Language Resources (SADiLaR), with the support of the Board of Universities South Africa (USAf) and the Community of Practice for the teaching and learning of African Languages (CoPAL), a sub-committee of USAf. USAf mandated that the audit cover all public higher education institutions. The aim was to draw conclusions and make recommendations about the language resources universities already have, the milestones they have achieved or envisage, and the language resources they still require to implement the National Language Policy Framework for Public Higher Education Institutions successfully. As a first phase, the audit therefore set out to determine how ready higher education institutions are to implement the Policy Framework. It comprised an in-depth analysis of staff and student perspectives on advancing multilingualism and on the availability of language resources across five domains: (1) institutional information, (2) language services, (3) teaching and learning practices, (4) communication and administration, and (5) student life and co-curricular activities. The audit results yielded valuable insights into the challenges and prospects of advancing multilingualism to meet the imperatives of the National Language Policy Framework for Public Higher Education Institutions.

    NCHLT Sepedi FLAIR-backward embeddings

    No full text
    Contextual word/string embeddings for the backward flavour of the FLAIR architecture (Akbik et al., 2018). The embedding provides real-valued vector representations for Sepedi text
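
    A backward character language model, such as the one behind this embedding, reads text from right to left, so it predicts each character from the characters that follow it. The hypothetical helper below only builds the (context, target) training pairs for such a model by reversing the string. The fixed `context` length is a simplification for illustration, since the LSTM in Akbik et al. (2018) carries its state across the whole sequence.

```python
def backward_lm_pairs(text, context=5):
    """(context, target) pairs for a right-to-left character LM.

    The text is reversed, so each target character is predicted from up to
    `context` characters that follow it in the original text. The context
    is given in the reading order of the reversed string.
    """
    rev = text[::-1]
    pairs = []
    for i in range(1, len(rev)):
        pairs.append((rev[max(0, i - context):i], rev[i]))
    return pairs
```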

    NCHLT isiXhosa fastText-CBoW embeddings

    No full text
    Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for isiXhosa text
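
    fastText does not store a separate row for every possible character n-gram. Instead, it hashes each n-gram into a fixed number of buckets that share rows of the input matrix. The sketch below reimplements the 32-bit FNV-1a hash used by the reference fastText `Dictionary::hash`, including its sign extension of bytes at or above 0x80. The bucket count of 2,000,000 is the fastText default and is only an assumption about how these models were trained. `ngram_row` is a hypothetical helper.

```python
FNV_OFFSET = 2166136261
FNV_PRIME = 16777619


def fasttext_hash(s: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of `s`, fastText-style."""
    h = FNV_OFFSET
    for b in s.encode("utf-8"):
        # The reference code casts each byte to a signed 8-bit value before
        # widening it, so bytes >= 0x80 are sign-extended to 32 bits.
        h ^= (b | 0xFFFFFF00) if b >= 0x80 else b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def ngram_row(ngram, nwords, bucket=2_000_000):
    """Row of the input matrix for `ngram`; rows 0..nwords-1 hold whole words."""
    return nwords + fasttext_hash(ngram) % bucket
```

    Because distinct n-grams can fall into the same bucket, they share a vector. This trade-off caps memory at `nwords + bucket` rows regardless of how many n-grams the corpus contains.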

    NCHLT Tshivenḓa word2vec-Skipgram embeddings

    No full text
    Static word embeddings for the Skipgram flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Tshivenḓa text
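
    The Skipgram flavour of word2vec trains each word to predict the words around it. The following is a minimal sketch of how (center, context) training pairs are formed, assuming a fixed symmetric window. The reference word2vec tool also samples a smaller effective window per position and subsamples frequent words, and this sketch omits both steps.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```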

    NCHLT isiZulu word2vec-CBOW embeddings

    No full text
    Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for isiZulu text
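
    The CBoW flavour reverses the direction of prediction. The surrounding words predict the centre word, and the model's hidden layer is the average of the context word vectors. The sketch below shows both steps with a fixed window and a hypothetical toy `vectors` table. It leaves out the output layer, for which word2vec uses negative sampling or hierarchical softmax.

```python
def cbow_examples(tokens, window=2):
    """(context_words, target) examples with a fixed symmetric window."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            examples.append((context, target))
    return examples


def cbow_hidden(context, vectors):
    """Hidden layer of CBoW: the mean of the context word vectors."""
    dim = len(next(iter(vectors.values())))
    h = [0.0] * dim
    for word in context:
        for k, x in enumerate(vectors[word]):
            h[k] += x
    return [x / len(context) for x in h]
```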

    NCHLT Setswana GloVe embeddings

    No full text
    Static word embedding model based on the Global Vectors architecture (Pennington et al., 2014). The embeddings provide real-valued vector representations for Setswana text
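
    GloVe fits word vectors to global co-occurrence statistics rather than to individual context windows. Pennington et al. (2014) minimise the weighted least-squares objective J = Σ f(X_ij)(w_i·w̃_j + b_i + b̃_j − log X_ij)². In this objective, f(x) = (x/x_max)^α for x < x_max and 1 otherwise, with x_max = 100 and α = 3/4 in the paper. A pair of words d positions apart adds 1/d to the count. The sketch below computes the counts and the objective for hypothetical toy parameters. It does not perform the AdaGrad optimisation used in the paper.

```python
import math
from collections import defaultdict


def cooccurrence(tokens, window=10):
    """Symmetric co-occurrence counts; a pair d words apart adds 1/d."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            other = tokens[i + d]
            counts[(word, other)] += 1.0 / d
            counts[(other, word)] += 1.0 / d
    return counts


def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(counts, w, w_ctx, b, b_ctx):
    """Weighted least-squares objective summed over non-zero counts."""
    total = 0.0
    for (i, j), x in counts.items():
        dot = sum(p * q for p, q in zip(w[i], w_ctx[j]))
        total += weight(x) * (dot + b[i] + b_ctx[j] - math.log(x)) ** 2
    return total
```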

    NCHLT isiZulu fastText-CBoW embeddings

    No full text
    Static word and subword embeddings for the continuous bag of words (CBoW) flavour of the fastText architecture (Bojanowski et al., 2017). The embedding provides real-valued vector representations for isiZulu text

    NCHLT isiXhosa RoBERTa language model

    No full text
    Contextual masked language model based on the RoBERTa architecture (Liu et al., 2019). The model is trained as a masked language model and is not fine-tuned for any downstream task. It can be used either as a masked LM or as an embedding model that provides real-valued vector representations of words or string sequences for isiXhosa text
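
    Masked language model training hides part of the input and asks the model to recover it. The sketch below follows the recipe RoBERTa inherits from BERT. Each token is selected with probability 0.15, and a selected token is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged otherwise. RoBERTa applies this masking dynamically, drawing a new pattern each time a sequence is seen, which corresponds to calling the function again for each pass. The `<mask>` string and the toy vocabulary are assumptions, not the NCHLT model's tokenizer.

```python
import random


def mask_tokens(tokens, vocab, mask_token="<mask>", p=0.15, rng=None):
    """BERT-style 80/10/10 masking; returns (inputs, labels).

    labels[i] holds the original token where position i was selected for
    prediction, and None elsewhere (positions the loss ignores).
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < p:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = mask_token          # 80%: replace with the mask token
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)   # 10%: replace with a random token
            # remaining 10%: keep the original token
    return inputs, labels
```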

    NCHLT Afrikaans word2vec-CBOW embeddings

    No full text
    Static word embeddings for the continuous bag of words (CBoW) flavour of the word2vec (w2v) architecture (Mikolov et al., 2013). The embedding provides real-valued vector representations for Afrikaans text
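
    A common way to use static embeddings such as these is to compare words by cosine similarity. The sketch below assumes the vectors have already been loaded into a dictionary. The three Afrikaans words and their two-dimensional vectors are made-up toy values, not taken from the released model.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(word, vectors, k=3):
    """The k words most similar to `word`, with their cosine scores."""
    query = vectors[word]
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors for illustration only.
toy = {"hond": [1.0, 0.0], "kat": [0.9, 0.1], "huis": [0.0, 1.0]}
print(nearest("hond", toy, k=1))
```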

    8 full texts · 536 metadata records
    Updated in the last 30 days.