Lacuna Fund has shared a selection of projects supported in its second cohort. Ten teams will create openly accessible text and speech datasets to fuel natural language processing (NLP) technologies across 29 languages drawn from Western, Eastern, and Southern Africa.
One project seeks to address the absence of an Igbo spoken corpus for NLP tasks. Existing corpora, such as the Igbo Web Corpus (IgWaC) and literary texts, are either unannotated or not archived for research and NLP use.
Create a multimodal speech dataset for Bemba, a language spoken by 30% of Zambia’s population as either a first or second language. Bemba is the country’s most widely spoken language yet lacks significant resources.
Build NLP text and speech datasets for under-researched East African languages spoken across Tanzania, Uganda, and Kenya. The project will provide high-quality text and speech data for Runyankore-Rukiga, Swahili, Lumasaaba, Acholi, and Luganda.
Decolonize African scientific writing by creating a multilingual parallel corpus of African research, translating African preprint research papers released on AfricArXiv into six of the continent’s diverse languages.
Develop a part-of-speech (POS) and named entity recognition (NER) corpus covering 20 African languages.
Build a phonetically balanced speech corpus focused on the financial domain, enabling greater access to digital financial services among Ghanaian Twi and Ga speakers. This should help spur AI innovations that bring the full benefits of the digital age to all Ghanaians, irrespective of their social status.
Almost a year after our launch, we are proud to announce our second cohort of projects, whose teams will create openly accessible text and speech datasets fueling #NLP technologies in 29 languages, engaging dozens of institutions across Africa and beyond: https://t.co/foyN32VMJa pic.twitter.com/cX1yMFmvSq
— Lacuna Fund (@LacunaFund) April 28, 2021
You can find out more about the project here