Posts

Showing posts from September, 2019

Peeking into Graph Extraction using Infrrd’s IDC Platform

Whether you’re presenting an annual report, comparing sales figures, or highlighting a trend, visual representations such as graphs and charts help readers understand data quickly. However, in today’s hyper-interactive world, it’s hard to understand why data still continues to be represented as colorful graphs. Data trapped in these visual representations is hard to harness and hard to use for better decisions. With an increase in document volumes and a growing number of layouts, graph extraction has become a complex process, and optimizing it is a struggle for many organizations. Manual data retyping is prone to error rates of up to 90%, is time-consuming, and is therefore not scalable. This grinding process also requires significant rework and is an area that could benefit from AI-enabled automated platforms. Many enterprises have already understood this and are ready to jump right into automated solutions, implement them into their business processes, and reap...

What is the best OCR extraction method on printed text?

I spotted another interesting question on Quora related to machine learning & OCR. Here’s my answer: I will give you a consultant’s answer (you may not like it, but here goes): “It depends”. The ‘best’ OCR extraction method depends on the context of what you are trying to extract. My guess is that you are not asking about the OCR process itself, but rather about how to extract features from the text that OCR spits out. There are two broad approaches to extraction, depending on whether or not you know the kind of data you are dealing with (invoices, tax docs, grocery labels, etc.). DOMAIN-BASED OCR EXTRACTION: This approach helps when you know beforehand the kind of data extraction you are after. Let’s say you were trying to extract features of wines from a set of wine ratings and notes that you have OCR-ed. Before you can do the feature extraction, you may consider running topic modeling algorithms on a large collection of existing wine notes to fig...
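
The domain-based idea in the excerpt above can be illustrated with a rough sketch. It is not the author's method, and it does not run a real topic model: as a much simpler stand-in, it counts which terms recur across notes to surface a candidate vocabulary, then matches that vocabulary against a single note. The sample notes, the stop-word list, and the function names `candidate_terms` and `extract_features` are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical OCR output: a few wine notes, invented for illustration.
NOTES = [
    "Ripe cherry and oak on the nose, firm tannins, long finish.",
    "Bright citrus, light oak, crisp acidity and a short finish.",
    "Dark cherry, vanilla and soft tannins with a long finish.",
]

# Tiny stop-word list (an assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"and", "on", "the", "a", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(notes, min_docs=2):
    """Return terms that appear in at least `min_docs` notes, most frequent first."""
    doc_freq = Counter()
    for note in notes:
        # Count each term once per note (document frequency).
        doc_freq.update(set(tokenize(note)) - STOP_WORDS)
    return [term for term, n in doc_freq.most_common() if n >= min_docs]


def extract_features(note, vocabulary):
    """Return the vocabulary terms that occur in a single note, sorted."""
    return sorted(set(tokenize(note)) & set(vocabulary))


# Learn a candidate vocabulary from the collection, then apply it to one note.
vocabulary = candidate_terms(NOTES)
features = extract_features(NOTES[1], vocabulary)
```

In practice the frequency count would likely be replaced by a proper topic model trained on a much larger collection; the point here is only the two-stage shape: learn the domain vocabulary first, then extract against it.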