Professionals
Rare language or dialect, handwriting legible by specialists only: an OCR/HTR model customized on your documents is capable of providing a qualitative transcription.
The Research Plan allows you to bring your own data to train a model to transcribe a very complex corpus, for the maximum price of 1€ per page.
1. We define together the requirements of your project
Depending on the specificities of your documents, we estimate the expected recognition quality and the data required by the project.
1. You provide the transcribed data
If the data is to be created : we define together the data necessary for your model, we provide access to our semi-automated annotation tool online Calfa Vision and assistance provided by an annotation project manager.
3. We train a dedicated model specialized on your data
Depending on the results achieved, you can add a second set of data to your project for fine-tuning.
4. Your whole corpus is processed with the final model
You receive the whole transcription of your corpus.
About 6,000 pages of a collection of Arabic manuscripts from the 15th century have been processed for automated transcription, using an HTR model customized with the data created by the research laboratory during a collaborative data creation workshop.
Languages: all
Layout: all
The Research Plan includes: assistance provided by an annotation project manager during data creation (if the step is needed), data reception and preparation, training of a customized model, text recognition of the corpus, data formatting and results delivery
Format de sortie : raw text (TXT, XML), text with coordinates in the page (XML)
Research Plan: 3,500€ including the processing of 3,500 pages (1€/page)
Additional pages: 0.40€/page rate
Before subscribing to this plan, we can offer an assessment of your corpus to check the feasibility of a specialized model training and whether it can yield good results.
If data creation is required, the Research Plan includes support from one of our annotation project managers for this step.
Upon assessing your corpus, we will also provide you with an estimate of the volume of data required.
You can also choose to delegate us the task of data creation as part of a custom project.
Please note that as part of the Research Plan, the decicated model created is not open source and cannot be downloaded.
You have a question regarding the Research Plan or the feasibility of OCR/HTR processing on your corpus. Contact us.