More than 31,000 manuscripts in Armenian have survived since the 5th century. They are kept at the BnF in Paris, at the Matenadaran in Yerevan and at other libraries worldwide.
These manuscripts are invaluable and, in some cases, unpublished sources for the history of civilisations, geography, canon law, theology and arithmetic. They form a unique heritage to preserve and to pass down.
Strongly involved in the preservation of Classical Armenian, the Calfa team is now working on a new technological project to support the digitization of this handwritten heritage.
Using neural networks of our own design, we are currently developing an artificial intelligence able to read manuscripts in Classical Armenian: an OCR (Optical Character Recognition) engine for the automated transcription of scanned manuscripts.
We want to offer an innovative technical solution to institutions and private individuals who wish to promote their heritage. Integrated into digitization workflows, our engine makes texts accessible and readable. Combined with our linguistic tools, it then opens texts to keyword search, critical edition, analysis and translation.
We are convinced that these cutting-edge technologies will extend beyond Classical Armenian and work for other languages.
Developing such an engine is possible today thanks to pioneering deep learning technologies.
Our team brings together specialists from several disciplines to build a high-performing tool: PhD students in AI, linguists, paleographers, and engineers in natural language processing, machine learning and image processing.
For the first time, these various skills are combined in one project to put artificial intelligence at the service of the Armenian heritage.
A smart system requires a massive amount of data to be effective. For an OCR engine, that data consists of pictures of handwritten characters.
To have these characters read, we are calling on every visitor who would like to help our project. On the Vision Calfa interface, you can help us process authentic manuscripts, whether you have only one minute or an hour. Your readings allow our system to train and support our research.
Today, 100 people are taking part in the project and have identified 153,926 characters. Click here to sign up and participate.
Illustrations: Ms W547, Collection of the Walters Museum, Baltimore, CC BY-NC-SA 3.0, and MAF55, photograph CNRS-IRHT. Copyright Musée Arménien de France, Paris.