Calfa OCR
Dedicated to oriental languages and manuscripts

Use AI-based models to run text recognition and extract data from your scanned documents, archives, and books, and bring them to their full digital potential.

Powerful generic models for text recognition in Arabic and Armenian

To process printed or simple handwritten documents in Arabic or Armenian script, our ready-to-use general AI models offer you an economical solution with excellent performance. Try it now online.

Creation of custom AI models able to read the most complex corpora

For any corpus in a non-Western language or dialect, or written in a complex hand, get unmatched text recognition accuracy by using a custom AI model.

Our offers

Generic models

Direct OCR/HTR

Immediate processing of regular documents by our general AI models

Processing

Per page
TXT and XML

Custom

Research Plan

Bring your own data or create it online

Development

1 model customized on your corpus

Processing

3,500 pages included
TXT and XML

Custom

Custom project

Data creation

By our experts according to your requirements

Development

Customized models for each of your needs

Processing

Large page volumes
Custom data formatting

Overcome the obstacles

Text recognition on the most complex handwriting

95% to 99% accuracy on average

Page layout detection and analysis

Titles, subtitles, notes, etc. are detected and labelled

Curved or vertical lines of text

All line orientations are natively supported

Noisy scans, damaged pages

Handled through customized training

Mixed alphabets and multiple languages

Processed even when mixed within the text

Left-to-right, right-to-left, top-to-bottom...

Effective whatever the reading direction

Use Cases

Analysis of Ancient Greek manuscripts of the Iliad, to automatically distinguish main text from paraphrase and commentary in a very complex layout.

Indexing and digitization of the Armenian Dulaurier collection held at the BULAC in Paris.

Creation of digital editions of the Corpus Scriptorum Christianorum Orientalium (CSCO) texts from printed books.

Technical features

Supported languages

Arabic languages, Armenian, Chinese, Hebrew, Georgian, Syriac, Ancient Greek, Modern Greek ...

Other languages on demand.

Features

Text Recognition on very complex writings (handwritten and printed)

Page layout analysis

Auto keywording and semantic classification

Curved text, vertical lines, damaged pages, noisy scans

Mixed alphabets and multiple languages

Data input

PDF
Image file (JPG, PNG, TIFF...)
Color and B&W
IIIF server

Data output

TXT, DOC, ODT, PDF

IIIF server

PDF with text overlay

ALTO

PageXML

Others on demand
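
For readers who plan to consume the ALTO output, here is a minimal sketch of pulling plain text lines out of an ALTO file. It relies only on the general ALTO convention of `TextLine` elements containing `String` elements with a `CONTENT` attribute. The sample XML and the function names `local_name` and `alto_lines` are hypothetical illustrations, not actual Calfa OCR output or API.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace so the code works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return one string per TextLine, joining its String/@CONTENT values with spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


# Hypothetical sample, written by hand for this illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Calfa"/><SP/><String CONTENT="OCR"/></TextLine>
    <TextLine><String CONTENT="Armenian"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(SAMPLE):
        print(line)
```

Matching on the local tag name rather than a fixed namespace is a design choice made here so the same sketch can read files produced against different ALTO versions.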

Frequently Asked Questions

Can I use Calfa OCR for handwritten pages?

Yes, Calfa OCR is specially developed to recognize manuscripts. The oldest manuscript we have processed dates from the 9th century, the most recent from the 20th.

Does it work with all kinds of handwriting?

Calfa OCR can be run on many writing styles. When necessary, for very unusual handwriting, we include a training phase in the project to adapt the recognition.

What is the recognition rate of Calfa OCR?

The recognition rate is the percentage of text that is correctly recognized, compared with the source document. It varies depending on the handwriting style, font layout and scan quality. Feel free to request a demo to see the recognition rate Calfa OCR can reach on your documents.
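
To make the idea of a recognition rate concrete, here is a minimal sketch of one common way to measure it: character-level accuracy based on the edit distance between a reference transcription and the OCR output. This is an assumption for illustration only; the page does not state which metric Calfa OCR reports, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Percentage of reference characters recovered correctly (floored at 0)."""
    if not reference:
        return 100.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 100.0 * (1 - errors / len(reference)))


if __name__ == "__main__":
    # One wrong character out of ten in the reference.
    print(character_accuracy("abcdefghij", "abcdefghiX"))
```

Under this definition, one wrong character in a ten-character reference counts as one error out of ten, so the rate depends directly on how closely the output matches a manually verified transcription.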

Does the OCR also work with typed documents?

Yes, Calfa OCR also recognizes typed documents such as newspaper pages, typewritten letters, etc., in the supported languages.

