The British Library is starting round two of its Optical Character Recognition (OCR) contest for historical Arabic scientific manuscripts, and is calling on anyone with OCR software to apply. More details can be found here: https://blogs.bl.uk/digital-scholarship/2019/02/automatic-transcription-of-historical-arabic-scientific-manuscripts-round-2.html

In the RBSCL Digital Collections, the lack of quality OCR software has hampered research use as well as some potentially interesting projects. Most notably, AUC's student newspaper The Caravan (aka Caravan, Campus Caravan, Caravan Weekly, and AUC Review) has been published in English and Arabic since its founding in 1925. Unfortunately, the Arabic OCR text is generally not as accurate or user-friendly as the English, which limits its usefulness. It would also be interesting to study how Arabic and English word usage changes over time, but without good Arabic OCR that would require a great deal of manual data entry.
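To give a rough idea of what such a word-usage study could look like once reliable text is available, here is a minimal Python sketch. It is not an existing RBSCL tool: the function names, the `(year, text)` input format, and the sample sentences are all made up for illustration, and the tokenizer is deliberately naive (it simply splits on anything that is not a Unicode letter, so Arabic diacritics would break words apart).

```python
import re
from collections import Counter, defaultdict

# Runs of Unicode letters (Latin and Arabic alike). A simplification:
# digits, punctuation, and combining diacritics all act as separators.
TOKEN = re.compile(r"[^\W\d_]+")


def word_counts_by_year(pages):
    """Build a per-year word counter from (year, text) pairs."""
    counts = defaultdict(Counter)
    for year, text in pages:
        counts[year].update(token.lower() for token in TOKEN.findall(text))
    return counts


def relative_frequency(counts, word):
    """Return {year: share of that year's tokens equal to `word`}."""
    word = word.lower()
    result = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        result[year] = counts[year][word] / total if total else 0.0
    return result


# Hypothetical sample data, not real Caravan text.
sample_pages = [
    (1925, "The students elected a new council. الطلاب في الجامعة"),
    (1926, "Students and students again. الجامعة ترحب بالطلاب"),
]

counts = word_counts_by_year(sample_pages)
print(relative_frequency(counts, "students"))
print(relative_frequency(counts, "الجامعة"))
```

With real OCR output, each page's text file would be paired with its issue year and fed in the same way. The results are only as trustworthy as the OCR behind them, which is exactly why the Arabic side of this analysis is currently out of reach.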

Hopefully this round of the British Library contest makes progress in much the same way the first one did.

