Introducing the Swahili News Dataset for Topic Classification

by Davis David, September 27th, 2021 (3 min read)

Too Long; Didn't Read

Swahili (also known as Kiswahili) is one of the most widely spoken languages in Africa, with 100–150 million speakers across East Africa. News in local languages plays an important cultural role in many African countries. The goal of this project was to build an open-source text dataset of Swahili news articles. I focused on collecting news in categories such as local, international, business or financial, health, sports, and entertainment. Because the dataset is open source, NLP practitioners can access it and learn from it.

Davis David

@davisdavid

Data Scientist | AI Practitioner | Software Developer | Technical Writer
