The British Library is starting round two of its Optical Character Recognition (OCR) contest for historical Arabic scientific manuscripts, and is calling on anyone with OCR software to apply. More details can be found here: https://blogs.bl.uk/digital-scholarship/2019/02/automatic-transcription-of-historical-arabic-scientific-manuscripts-round-2.html
In the RBSCL Digital Collections, the lack of quality Arabic OCR software has hampered research use as well as some potentially interesting projects. Most notably, AUC's student newspaper The Caravan (also known as Caravan, Campus Caravan, Caravan Weekly, and AUC Review) has been published in English and Arabic since its founding in 1925. Unfortunately, the Arabic OCR text is generally less accurate and less user-friendly than the English, which limits its usefulness. It would also be interesting to study how Arabic and English word usage has changed over time, but without good Arabic OCR that would require a lot of manual data entry.
Hopefully this round of the British Library contest makes progress in much the way the first did.