From September 5 to 10, 2021, the 16th International Conference on Document Analysis and Recognition (ICDAR) will take place in Lausanne. It is the premier international event for scientists and practitioners involved in document analysis and recognition, a field of growing importance in the current age of digital transition. The conference will bring together hundreds of speakers from the public and private sectors to present the latest advances in the automatic analysis of printed and handwritten documents, as well as ongoing projects in Digital Humanities.
Our team will attend the conference, and we are very proud to present four papers on our latest developments and projects.
With Noëmie Lucas (GIS MOMM/CNRS) and Clément Salah (Sorbonne Université, Université de Lausanne), we will showcase our HTR models for manuscripts in Maghrebi Arabic scripts. The Arabic and Derived Script Analysis and Recognition workshop will be the occasion to release the RASAM dataset (Recognition and Analysis of Scripts in Arabic Maghrebi), which we compiled between January and April 2021 and which will help the research community make further progress on the processing of Arabic manuscripts.
To learn more (intermediate results, in French):
1. Chahan Vidal-Gorène, Noëmie Lucas, Clément Salah, Aliénor Decours-Perez, Boris Dupin, "RASAM - A Dataset for the Recognition and Analysis of Scripts in Arabic Maghrebi", Arabic and Derived Script Analysis and Recognition, ICDAR, 2021.
At the Computational Paleography workshop, we will present two papers. The first, in collaboration with Jean-Baptiste Camps (École Nationale des Chartes-PSL, Centre Jean Mabillon) and Marguerite Vernet (École Nationale des Chartes-PSL), addresses the automatic transcription of abbreviations in Latin manuscripts and compares different approaches, notably Calfa's. The second deals with the automatic identification of Armenian scripts: does artificial intelligence see something other than the erkat'agir, bolorgir, nōtrgir and šłagir scripts?
2. Jean-Baptiste Camps, Chahan Vidal-Gorène, Marguerite Vernet, "Handling Heavily Abbreviated Manuscripts: HTR engines vs text normalisation approaches", Computational Paleography, ICDAR, 2021.
3. Chahan Vidal-Gorène, Aliénor Decours-Perez, "A Computational Approach of Armenian Paleography", Computational Paleography, ICDAR, 2021.
We will introduce, for the first time, our annotation platform, Calfa Vision, dedicated to the assisted transcription of handwritten and printed documents and to automated data creation for oriental and under-resourced languages such as Armenian, Syriac or Arabic. The models integrated into the platform are specialized for the collections currently processed on it and are evaluated on standard datasets from the literature.
To learn more about Calfa Vision: Access to Calfa Vision
4. Chahan Vidal-Gorène, Boris Dupin, Aliénor Decours-Perez, Thomas Riccioli, "A Modular and Automated Annotation Platform for Handwritings: Evaluation on Under-resourced Languages", ICDAR, 2021.