Publications for the International Conference on Language Resources and Evaluation
Calfa was selected for LREC, the international conference organized by ELRA-ELDA. Two articles were published for the conference: one about Calfa itself and one about its research on natural language processing (NLP) for the Armenian language.
The 12th International Conference on Language Resources and Evaluation, organized by the European Language Resources Association, was set to take place in Marseille in 2020. This biennial conference brings together the public and private actors involved in natural language processing. Since the first LREC, held in Granada in 1998, it has become the major event on Language Resources (LRs) and Evaluation for Language Technologies (LT). The aim of LREC is to provide an overview of the state of the art, explore new R&D directions and emerging trends, and exchange information on LRs and their applications, evaluation methodologies and tools, ongoing and planned activities, industrial uses and needs, and requirements coming from the e-society, with respect to policy issues as well as technological and organizational ones.
Armenian is still regarded today as a poorly endowed language in terms of digital resources, especially given the multiplicity of its varieties (Classical Armenian or Grabar, Western Armenian, Eastern Armenian, and dialects), even though the number of databases and systems dedicated to natural language processing (OCR, HTR, lemmatizers, etc.) keeps increasing. Natural language processing for Armenian is an open and prolific field of study.
Although the conference was cancelled due to the Covid-19 pandemic, Calfa published two articles on natural language processing for the Armenian language:
- Languages Resources for Poorly Endowed Languages: The Case Study of Classical Armenian (see article) This article presents the state of the art of the digital resources available for Classical Armenian, through the case study of the construction of the Calfa platform (digitization, OCR, updating and interconnection of existing printed language resources), and discusses the opportunities that crowdsourcing offers to poorly endowed languages. In particular, it describes in detail the content of the dictionaries available on the platform.
- Lemmatization and POS-tagging process by using joint learning approach. Experimental results on Classical Armenian, Old Georgian, and Syriac (see article) This article was to be presented at the Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA). Written in collaboration with the GREgORI project of the Université de Louvain (Belgium), it sets out a modus operandi for the automatic processing of texts in classical languages and discusses the challenges of creating annotated corpora, supported by preliminary experimental results that remain to be completed. So far, the models developed achieve 91% accuracy for in-context lemmatization. This first result will be improved in the coming weeks and months with new data provided by Calfa, in particular from its ongoing digitization projects.