The British Library is starting round two of its Optical Character Recognition (OCR) contest for historical Arabic scientific manuscripts, and is calling on anyone with OCR software to apply. More details can be found here: https://blogs.bl.uk/digital-scholarship/2019/02/automatic-transcription-of-historical-arabic-scientific-manuscripts-round-2.html

In the RBSCL Digital Collections, the lack of quality Arabic OCR software has hampered research use as well as some potentially interesting projects. Most notably, AUC's student newspaper The Caravan (aka Caravan, Campus Caravan, Caravan Weekly, and AUC Review) has been published in English and Arabic since its founding in 1925. Unfortunately, the OCR text for the Arabic pages is generally less accurate and less user-friendly than the English, which makes the Arabic content harder to search and use. Additionally, it could be interesting to study how Arabic and English word usage changes over time, but without good Arabic OCR this would require a lot of manual data entry.
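
To give a rough idea of what a word-usage study could look like once reliable text is available, here is a minimal Python sketch. It assumes each OCR'd page is available as a (year, text) pair; that input shape, the function names, and the sample sentences are all made up for illustration and do not describe how the Digital Collections are actually stored or processed.

```python
import re
from collections import Counter, defaultdict

# Matches runs of basic Latin or Arabic letters; digits and punctuation are dropped.
# This is a deliberately simple tokenizer, not a proper Arabic or English one.
TOKEN = re.compile(r"[A-Za-z\u0621-\u064A]+")


def word_counts_by_year(pages):
    """Count tokens per year.

    `pages` is an iterable of (year, text) pairs, one per OCR'd page
    (a hypothetical input shape chosen for this example).
    """
    counts = defaultdict(Counter)
    for year, text in pages:
        counts[year].update(tok.lower() for tok in TOKEN.findall(text))
    return counts


def relative_frequency(counts, word):
    """For each year, return the share of all tokens that are `word`."""
    result = {}
    for year, counter in sorted(counts.items()):
        total = sum(counter.values())
        result[year] = counter[word] / total if total else 0.0
    return result


# Invented sample text, standing in for real OCR output.
sample_pages = [
    (1925, "The students elected a new council"),
    (1925, "الطلاب في الجامعة"),
    (1960, "Students and faculty met; students voted"),
]

counts = word_counts_by_year(sample_pages)
freq = relative_frequency(counts, "students")
```

Dividing by the yearly token total keeps years with more scanned pages from dominating the comparison. With poor OCR, misrecognized words simply end up counted as different tokens, which is why the quality of the Arabic text matters so much for this kind of analysis.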

Hopefully this round of the British Library contest makes progress in much the same way the first one did.
