geez.org

Welcome to The Ge'ez Frontier Foundation's Data Archive

What You Will Find Here

Datasets

data.geez.org — Datasets of linguistic interest are provided in delimited-text and spreadsheet formats. A goal for these disparate datasets is to bring them together in a common RDF representation.

E-Books

ebooks.geez.org — A number of books unencumbered by copyright restrictions have been digitized as plain text. The goals for the book collection are to correct errors in the text, apply formatting, and then publish the works in popular e-book formats.

Corpus

https://github.com/geezorg/enh-corpus — The newspaper article corpus of the Ethiopian News Headlines service, comprising 13,079 articles from 126 newspapers published between late Hamle 1989 and Yekatit 1997 (Ethiopian calendar).

Fonts

fonts.geez.org — A Gallery of Ethiopic Fonts: an inventory of free and commercial Ethiopic typefaces, intended both to record a visual history of computer typeface development and to spread awareness of the typeface styles available.

Status

Preliminary. Document artifacts are being collected, rediscovered, reviewed and sorted from various old hard drives and backup media. Following this collection phase, the repository will most likely be split in two: one repository for datasets and one for e-books. Refinement and documentation of the assets will be ongoing.

Issues with the artifacts abound; specific defects are being tracked in the repositories' issue trackers.