
Calfa OCR
Dedicated to oriental languages and manuscripts

Process large volumes of scanned documents, archives and books, extract their data, and unlock their full digital potential.

Supported languages

Arabic languages, Armenian, Chinese, Hebrew, Georgian, Syriac, Ancient Greek, Modern Greek...

Other languages on demand.

Features

  • Text recognition on very complex scripts (handwritten and printed)
  • Page layout analysis
  • Automatic keywording and semantic classification
  • Curved text, vertical lines, damaged pages, noisy scans
  • Mixed alphabets and multilingual documents
  • Data input

    • PDF
    • Image files (JPG, PNG, TIFF...)
    • Color and B&W
    • IIIF server

  • Data output

    • TXT, DOC, ODT, PDF
    • IIIF server
    • PDF with text overlay
    • ALTO
    • PageXML
    • Others on demand

Specificities

Powerful and unique features

Text recognition on the most complex handwriting

95% to 99% accuracy on average

Page layout detection and analysis

Titles, subtitles, notes, etc. are detected and labelled

Curved or vertical lines of text

All line orientations are natively supported

Noisy scans, damaged pages

Handled through custom training

Mixed alphabets and multiple languages

Recognized even when mixed within the same text

Left-to-right, right-to-left, top-to-bottom...

Effective whatever the reading direction

Our offers


Research plan

For researchers who want an OCR engine dedicated to their corpus, we have designed a plan that includes:

  • access to a semi-automated annotation tool that makes preparing training data faster and easier
  • up to 3 training runs to adapt the text recognition models to the corpus
  • recognition of 3,000 pages, with additional pages at an attractive price

Intended for researchers and research labs. Contact us.

Custom-made Calfa OCR

Custom project

We provide text recognition as a service, with powerful custom features, to meet the challenges of your digitization projects:

  • layout and text recognition models trained specifically on your documents
  • custom deliverables and output formats
  • a wide range of text analysis and document understanding features
  • data exchange through a private online repository or API

Intended for businesses, institutions and governments. Contact us.


Try it

Send us a sample of the documents you want to digitize and get a Calfa OCR demonstration.

Contact us

Frequently Asked Questions

1. Can I use Calfa OCR for handwritten pages?

Yes, Calfa OCR is specifically developed to recognize manuscripts. The oldest manuscript we have processed dates from the 9th century, the most recent from the 20th.

2. Does it work with all kinds of handwriting?

Calfa OCR handles many writing styles. For very unusual handwriting, we include a training phase in the project to adapt the recognition.

3. What is the recognition rate of Calfa OCR?

The recognition rate is the percentage of the document's text that is recognized correctly. It varies with the handwriting style, the font, the layout and the scan quality. Request a demo to see the recognition rate Calfa OCR can reach on your documents.
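The answer above does not say exactly how the percentage is computed. A common convention in OCR evaluation is character accuracy, 100 × (1 − CER), where the character error rate (CER) is the edit distance between the recognized text and a reference transcription, divided by the length of the reference. The Python sketch below assumes that convention; `levenshtein` and `recognition_rate` are hypothetical helper names, not part of any Calfa API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: character-level rate in percent, 100 * (1 - CER), floored at 0."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 100.0 * (1.0 - cer))
```

For example, comparing the reference `"manuscript"` with the output `"manuscnpt"` counts two character edits against ten reference characters, a rate of 80%. Word-level accuracy is another common convention and can give noticeably different figures on the same page.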

4. Does the OCR also work with typed documents?

Yes, Calfa OCR also recognizes typed and printed documents, such as newspaper pages, typewritten texts and letters, in the supported languages.