OCR and Extraction

“NextAce has enabled us to bring about automation thus resulting in staff optimization. It’s a great tool which reduces opportunity for human error and speedier processing.”

Jyoti S. Khandelwal | Team Lead, Title Insurance Services, Altisource

Document Scanning and Extraction

Patented processes deliver accurate results

Optical Character Recognition (OCR) – Once all documents and images are retrieved, NextAce uses OCR to turn the images into text. OCR transforms the content of document images into computer-readable characters. Using a patented process, NextAce’s OCR collects over a megabyte of information from an image (the metadata) and stores it for use by other internal applications and processes. The NextAce OCR process typically uses multiple unique algorithms and processes to achieve a very high accuracy level. Each one plays a role in identifying the characters in a document, and their results are compared, weighted, and combined to increase accuracy.
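
To illustrate the idea of comparing, weighting, and combining results, here is a minimal Python sketch. It is not NextAce's patented process; the function name `combine_ocr_outputs`, the per-engine weights, and the assumption that every engine returns the same number of words are all hypothetical simplifications.

```python
from collections import defaultdict


def combine_ocr_outputs(candidates):
    """Combine several OCR readings of the same line by weighted voting.

    candidates: list of (text, weight) pairs, one per hypothetical OCR engine.
    Assumption: every reading splits into the same number of words, so the
    words can be compared position by position.
    """
    token_lists = [(text.split(), weight) for text, weight in candidates]
    length = len(token_lists[0][0])
    if any(len(tokens) != length for tokens, _ in token_lists):
        raise ValueError("all readings must contain the same number of words")

    combined = []
    for i in range(length):
        # Add up the weight behind each distinct reading of word i.
        scores = defaultdict(float)
        for tokens, weight in token_lists:
            scores[tokens[i]] += weight
        # Keep the reading with the highest total weight.
        combined.append(max(scores, key=scores.get))
    return " ".join(combined)


readings = [
    ("Deed of Trust", 0.5),
    ("Deed 0f Trust", 0.3),
    ("Dead of Trust", 0.4),
]
print(combine_ocr_outputs(readings))
```

In this sketch no single engine has to be right about every word; a misread such as "0f" is outvoted by the combined weight of the engines that read "of".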

Patterned Data Extraction™ (PDE) – The NextAce Patterned Data Extraction (PDE) process takes OCR and document data extraction to the next level. PDE recognizes and identifies each document and all aspects of each variable to establish its relevance, role, and criticality, and it ranks the information discovered. It sets the industry standard for accuracy in extracting data from unstructured OCR text.

This patented process uses patterns associated with each document type, state, and county, which instruct PDE as to the exact conditions under which to extract information. A given variable can have any number of patterns, and each pattern contains hundreds of associated conditions. NextAce may have to process a given document thousands of times to extract the relevant information correctly. It is this vast array of patterns and their associated conditions that allows NextAce to review virtually any document format and associate structured data with it. Of course, the time-consuming portion of the process lies in the initial setup and definition of the patterns for each variable. Fortunately, NextAce has been processing recorded and mortgage-related documents since its founding in 2003. Based on this experience, NextAce has substantially reviewed, defined, and created patterns for millions of documents and their associated variables.
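
As a rough illustration of patterns tied to a document type, state, and county, the Python sketch below applies a pattern only when its conditions hold. It is a simplified assumption, not the PDE implementation: the `Pattern` class, its fields, the `extract` function, and the sample regular expressions are all hypothetical, and real patterns would carry far more conditions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Pattern:
    """One hypothetical extraction pattern for a single variable."""
    variable: str
    regex: str          # must contain one capture group holding the value
    doc_type: str
    state: str
    county: str
    required_phrases: list = field(default_factory=list)

    def applies_to(self, doc_type, state, county, text):
        # Condition 1: the pattern is scoped to this document type and place.
        if (self.doc_type, self.state, self.county) != (doc_type, state, county):
            return False
        # Condition 2: every required phrase appears somewhere in the text.
        lowered = text.lower()
        return all(p.lower() in lowered for p in self.required_phrases)


def extract(patterns, doc_type, state, county, text):
    """Return {variable: value} using the first applicable pattern that matches."""
    results = {}
    for pattern in patterns:
        if pattern.variable in results:
            continue  # an earlier pattern already found this variable
        if not pattern.applies_to(doc_type, state, county, text):
            continue
        match = re.search(pattern.regex, text, re.IGNORECASE)
        if match:
            results[pattern.variable] = match.group(1).strip()
    return results


patterns = [
    Pattern("borrower", r"Borrower:\s*(.+)", "deed_of_trust", "CA", "Orange",
            ["deed of trust"]),
    Pattern("loan_amount", r"principal sum of \$([\d,]+\.\d{2})",
            "deed_of_trust", "CA", "Orange", ["deed of trust"]),
]
text = ("DEED OF TRUST\nBorrower: Jane Q. Public\n"
        "The principal sum of $250,000.00 is secured.")
print(extract(patterns, "deed_of_trust", "CA", "Orange", text))
```

Because a variable can have any number of patterns, the sketch tries them in order and keeps the first match; supporting a new document layout means adding patterns rather than changing the extraction code.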

NextAce utilizes several libraries and supporting databases to improve the overall accuracy of the extracted data, including counties, cities, states, dates, gender and ethnicity identification, correction tables for commonly misspelled words, regular expressions for locating data more accurately, and more.
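
Two of these supporting resources, a correction table and regular expressions, can be sketched in a few lines of Python. The table entries, the function names `correct_words` and `find_dates`, and the month/day/year date format are assumptions chosen for illustration, not NextAce's actual data or code.

```python
import re
from datetime import datetime

# Hypothetical correction table: misspelling -> correct spelling.
CORRECTIONS = {
    "morgage": "mortgage",
    "benificiary": "beneficiary",
    "recieved": "received",
}

# Assumed date format: month/day/four-digit year, e.g. 03/15/2004.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def correct_words(text):
    """Replace known misspellings, keeping a leading capital letter."""
    def fix(match):
        word = match.group(0)
        replacement = CORRECTIONS.get(word.lower())
        if replacement is None:
            return word
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", fix, text)


def find_dates(text):
    """Return valid dates found in the text as ISO strings (YYYY-MM-DD)."""
    dates = []
    for match in DATE_RE.finditer(text):
        month, day, year = map(int, match.groups())
        try:
            dates.append(datetime(year, month, day).date().isoformat())
        except ValueError:
            continue  # looks like a date but is not one, e.g. month 13
    return dates


sample = "Morgage recieved and recorded 03/15/2004"
print(correct_words(sample))
print(find_dates(sample))
```

The regular expression only locates candidates; validating each one with `datetime` discards strings that merely look like dates.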

