Transformer-Based OCR
As you probably already know, Optical Character Recognition (OCR) is the electronic conversion of images of typed, handwritten, or printed text into machine-encoded text. The source can be a scanned document, a photo of a document, or subtitle text superimposed on an image.

Let’s understand how an OCR pipeline works before we dig deeper into Transformer-based OCR. A typical OCR pipeline consists of two modules:

1. A Text Detection Module
2. A Text Recognition Module

Text Detection Module

The Text Detection module, as the name suggests, detects where text is present in the source. It aims to localize all the text blocks within the image, either at the word level (individual words) or at the text-line level. This task is comparable to an object detection problem, except that the objects of interest are text blocks. Popular object detection models and frameworks include YOLOv4/5, Detectron, and Mask R-CNN. To understand Object Detection using YOLO cl...