6,083 reads

Machine Learning is the Wrong Way to Extract Data From Most Documents

by Sensible | 6 min read | July 26th, 2022

Too Long; Didn't Read

In the late 1960s, the first OCR (optical character recognition) techniques turned scanned documents into raw text. Today, Google, Microsoft, and Amazon offer high-quality OCR as part of their cloud services. Yet documents remain underused in software toolchains, and valuable data languish in PDFs. The challenge has shifted from identifying the text in documents to turning it into structured data that software-based workflows can consume directly or that can be stored directly in a system of record. The best way to turn the vast majority of documents into structured data is to use a new generation of powerful, flexible templates that find data in a document much as a person would.
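To make the idea of a template that "finds data much as a person would" a little more concrete, here is a minimal Python sketch, assuming the document has already been run through OCR and reduced to a list of text lines. Each rule looks for a label (the anchor a human reader would scan for), then reads the value to its right or on the line below, and accepts it only if it matches an expected pattern. The `FieldRule` class, the `extract` function, and the sample policy text are hypothetical illustrations, not Sensible's product or API; real document templates would also use page layout and coordinates, which this sketch ignores.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldRule:
    """Hypothetical template rule: find a label, then read the value near it."""
    name: str      # key in the structured output
    anchor: str    # label text a person would look for, e.g. "Policy Number"
    pattern: str   # regex the extracted value must match


def extract(ocr_lines: List[str], rules: List[FieldRule]) -> Dict[str, Optional[str]]:
    """Apply each rule to OCR'd text lines and return structured data.

    Fields whose anchor or value cannot be found are set to None.
    """
    result: Dict[str, Optional[str]] = {}
    for rule in rules:
        result[rule.name] = None
        for i, line in enumerate(ocr_lines):
            pos = line.lower().find(rule.anchor.lower())
            if pos == -1:
                continue
            # Check the text to the right of the label first, then the next
            # line, since forms often print a value below its label.
            candidates = [line[pos + len(rule.anchor):]]
            if i + 1 < len(ocr_lines):
                candidates.append(ocr_lines[i + 1])
            for text in candidates:
                match = re.search(rule.pattern, text)
                if match:
                    result[rule.name] = match.group(0)
                    break
            if result[rule.name] is not None:
                break
    return result


# Hypothetical OCR output for a simple insurance form.
lines = [
    "ACME Insurance Co.",
    "Policy Number: PN-48213",
    "Effective Date",
    "07/26/2022",
]
rules = [
    FieldRule("policy_number", "Policy Number", r"[A-Z]{2}-\d+"),
    FieldRule("effective_date", "Effective Date", r"\d{2}/\d{2}/\d{4}"),
]
print(extract(lines, rules))
# {'policy_number': 'PN-48213', 'effective_date': '07/26/2022'}
```

Pairing each anchor with a value pattern is what keeps a rule like this from grabbing whatever text happens to sit next to a label: the date rule skips the empty remainder of the "Effective Date" line and accepts the date printed beneath it.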


About Author

Sensible (@sensible)
Fast & flexible data extraction from documents.
