Autshumato English-Siswati Parallel Corpora
Aligned parallel corpora for the following language pair: English-SiSwati. The data is given as four separate UTF-8 text files, with each segment on a newline. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. The dataset contains the following types of bilingual data: translations from English to Siswati and crawled parallel data for English-Siswati. The dataset comprises a total of 114,839 segments with 2,002,293 English words and 1,423,414 SiSwati words.
(A new version issued since the title was changed
Say it in Siswati
Doctor Educationis. Say it in siSwati is a practical course manual for beginners. It is
intended to be used in conjunction with a series of language laboratory
tapes, either for individual or group instruction; but could also be used
independently, preferably with the aid of a siSwati speaker. The main aim
has been to introduce all the principal structures and to present these in
terms of familiar objects and everyday situations as far as possible.
Starting from a selected basic vocabulary, the drills concentrate on fitting
words together effectively and fluently, and the stock of words is gradually
expanded. Appendices provide extra phonological and grammatical information
if required, and a repertoire of traditional songs, followed by a glossary.
The course represents a revised and expanded version of a siSwati Language
Manual devised in 1972 for teaching British volunteers who were going out to
Swaziland to undertake projects for Voluntary Service Overseas and International
Voluntary Service. Grateful acknowledgement is due, particularly,
to Mrs. Gladys Mkhonta and Mr. A.B. Ngcobo who gave valuable assistance in
the preparation of scripts and drills, and also to Messrs. Titus Ngubeni,
Reuben Zondi, Derek Hlanze, Edward Dlamini, Nicholas Dlamini, Reginald Dladla,
Clifford Magongo and Miss Jane Maseko, whose voices are recorded on the tapes.
Fundamentally, preparation of the manual would not have been possible at all
without the generosity of the School of Oriental and African Studies, University
of London, in sponsoring the necessary linguistic and musical field research
Autshumato Monolingual Siswati Corpus
Monolingual corpus for SiSwati. The data is given as a single UTF-8 text file, with each segment on a newline. The dataset contains existing data sourced for the DSAC-funded Autshumato project as well as new data sourced for the SADiLaR: Parallel corpora for English into SiSwati project. The data comprises a total of 138,651 segments with 1,536,356 SiSwati words
IsiZulu News (articles and headlines) and Siswati News (headlines) Corpora - za-isizulu-siswati-news-2022
IsiZulu news and Siswati news Corpora (mixed lengths). Dataset for both isiZulu news (articles and headlines) and Siswati news headlines. The data was scraped from the internet: from the Isolezwe news website http://www.isolezwe.co.za and from public posts on the SABC news LigwalagwalaFM Facebook page https://www.facebook.com/ligwalagwalafm/, respectively
Minimality in siSwati
A research report submitted in fulfillment of the requirements for the Master of Arts in Linguistics, in the Faculty of Humanities, Wits School of Art, University of the Witwatersrand, Johannesburg, 2024. Many languages have minimal prosodic restrictions on the size of well-formed words. This study explores word minimality restrictions on the siSwati Prosodic Word, with emphasis on how the grammar of the language repairs subminimal constructions. It provides evidence for word minimality in different forms of the Verb and the Noun within the siSwati grammar. It further illustrates that siSwati grammar triggers different augmentation strategies across various morphosyntactic domains. The dissertation provides a formal Optimality Theory analysis of the minimality restrictions on the PWord, highlighting how minimality effects in siSwati pattern with other Bantu languages in general and Nguni languages in particular. This work demonstrates that the Prosodic Hierarchy and its domains determine whether the siSwati grammar triggers or blocks augmentation to satisfy minimality constraints. The aim of this study is to present the first comprehensive account of repair strategies used in siSwati to maintain preferred phonological structures, highlighting the importance of the syllable and word as essential levels of phonological analysis in this language and others like it. Findings reveal that the language selects phonological or morphological augmentation to parse grammatical constructions that are minimally well-formed in all surface representations in the siSwati grammar. The requirements for minimality evident from this study are the same crosslinguistically, with siSwati and Xitsonga employing a suffixal morpheme as opposed to the prefixal morpheme employed by all the other Nguni languages in the imperative. In Nguni languages prefixing augmentation is unmarked while suffixing augmentation is marked.
Additionally, the results of this analysis are compared to those of other Southern Bantu languages in an effort to situate siSwati within its language family, thereby contributing, in a small but significant way, to linguistic typology.
Dataset for Siswati: Parallel textual data for English and Siswati and monolingual textual data for Siswati
This data article presents a dataset for Siswati, a Bantu language of the Nguni group that is one of the eleven official South African languages and the official language of Eswatini (together with English). The dataset contains parallel textual data between English and Siswati as well as monolingual data for Siswati and was developed for use as training data for machine translation systems, specifically the Autshumato machine translation project. Both corpora can also be used for development and evaluation of Natural Language Processing (NLP) core technologies for Siswati. In addition, the data lends itself to corpus linguistic studies. The article describes how the data was collected, what types of texts it contains and what clean-up was done. It also provides an overview of the number of words contained in the datasets
Some aspects of siSwati phonology
A thesis submitted in fulfilment of the requirements for the Doctor of Philosophy in International Relations to the Faculty of Humanities, School of Literature, Language and Media, University of the Witwatersrand, 2022. The study examines siSwati segmental phonology. It highlights how various phonological processes eliminate dispreferred phonological structures, as conditioned by the morphological domains in which they occur. I use hiatus resolution patterns, loanword adaptation, /mu/ reduction, and word minimality to present evidence for the siSwati syllable structure and permissible minimal word size. Firstly, the study demonstrates how the selection of hiatus resolution patterns is contingent upon the morphological context in which they occur, displaying the intricate relationship in the phonology-morphology interface. The study also presents an analysis of loanword nativisation in siSwati to further account for how the siSwati grammar eliminates mismatched output forms. The analysis of /mu/ reduction provides evidence for single C-Slot and V-Slot specification in the siSwati grammar. Lastly, word minimality effects demonstrate the strategies that siSwati uses to maintain its preferred minimal word size. Leaning on native speaker intuition, the analysis employs Optimality Theory (Prince & Smolensky, 1993/2004) to present a unified account of markedness and faithfulness constraint interaction in parsing CV syllables and minimally well-formed Prosodic Words. Analytical insights from Feature Geometry (Clements & Hume, 1995) are used to explain feature spreading in the epenthesis patterns attested in the language. The model is deployed to account for the representation of complex segments such as NCs and CGs in the grammar, displaying how they optimally fit into the preferred CV syllable structure.
The goal in each of the phonological processes under investigation is to ensure that all output forms are harmonious with the siSwati CV syllable template and word minimality restrictions in the grammar. The study places the syllable at the centre of phonological analysis, highlighting how markedness and faithfulness constraints in the various phonological processes under investigation conspire to eliminate ill-formed phonological structures in all surface forms. This thesis is motivated by the desire to ensure that siSwati grammar parses onsetful syllables and minimally well-formed Prosodic Words in all surface representations.
The great siSwati locative shift
In siSwati the accumulation of a number of changes in the morphology and syntax of locative phrases has led to a more fundamental shift of restructuring of the underlying grammatical system (the great siSwati locative shift) so that locatives in siSwati are no longer, as in Proto-Bantu and most other present-day Bantu languages, part of the noun class system, but are prepositional. This shift explains aspects of changes in the siSwati locative system which are not otherwise independently motivated, including the degrammaticalization of a historic noun class marker into a preposition and distinct relative clause marking of locatives, and provides an example of a complex, systematic historical change of a sub-system of the grammar
Mid vowel assimilation in siSwati
Previous analyses of the behaviour of siSwati mid vowels show conflicting results. According to Ziervogel and Mabuza, and Taljaard and Snyman, siSwati mid vowels are raised to close mid [e, o] when preceding the high vowels [i, u] (e.g. [likʰeʃi] ‘lift’ (n.), [inɮovu] ‘elephant’) but remain open mid [ɛ, ɔ] before [-high] vowels (e.g. [liʦemba] ‘hope’ (n.), [ibola] ‘football’), suggesting that siSwati has vowel height assimilation and/or ATR assimilation. However, Kockaert, in his acoustic analysis of the same vowels in the same environments, disputes this description. He concludes that there is no significant difference in the F1, F2, and F3 frequency values of these vowels. Results of an experiment that I have conducted show that the phonological environment does influence the quality of siSwati mid vowels. The change, though, is evidence not of harmony but of co-articulation
