Drake Pocsatko: We have HOW many documents? Architecting Modern Document Processing

awschicago, 14 slides, Jun 24, 2024

About This Presentation

Drake Pocsatko
We have HOW many documents? Architecting Modern Document Processing
AWS Community Day Midwest 2024


Slide Content

MIDWEST | OHIO

We have HOW many documents?
Architecting Modern Document Processing
Drake Pocsatko
Sr Consultant Cloud Enablement
Slalom Ohio

Agenda
•About me
•What are OCR, NLP, and IDP?
•Why automate document processing?
•The Tools for the Job
•Sample Architectures
•Extraction Demo

About Me
•B.S. Computer Science & Engineering – The Ohio State University
•B.A. Physics – Washington & Jefferson College
•Pittsburgh, PA born and raised
•8 years in Columbus, OH
•~3 years with Slalom
•Slalom is a next-generation professional services company creating value at the
intersection of business, technology, and humanity, with markets all over the
world and local connections in each.
•Wife, Lauren, of 6 years. 2 dogs & a cat

What are OCR, NLP, and IDP?
•Optical Character Recognition (OCR)
  •The ability for software to recognize characters in an image and to convert those characters to text. [1]
•Natural Language Processing (NLP)
  •A machine learning technology that gives computers the ability to interpret, manipulate, and comprehend human language. [2]
•Intelligent Document Processing (IDP)
  •Automating the process of manual data entry from paper-based documents or document images to integrate with other digital business processes. [3]


[1] Getting started with optical character recognition – AWS
[2] What is Natural Language Processing (NLP)? – AWS
[3] What is Intelligent Document Processing? – AWS

Why Automate Document Processing?

The Problem with Document Processing
•Client “Z” – national healthcare payor
•Z still relies on paper documents: billions of pieces of paper per year, the
digital equivalent of petabytes of data.
•Z hires numerous contingent workers every year to process and adjudicate
these documents.
•High variance across documents AND within document types.

The Benefits of Intelligent Document Processing
•Manual processing for contracts
  •1 contract processed by 1 human per day, with an average rework rate of ~20-25%
•Intelligent processing for contracts
  •50K contracts processed by IDP every 8 hours, with an average of ~7% needing
  human-in-the-loop (HIL) review
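
To put the two bullets on the same scale, the arithmetic can be sketched as below. This is a rough illustration only: it treats the slide's figures as exact averages, and the function names are made up for this example.

```python
# Figures from the slide, treated as exact averages (an assumption for illustration).
MANUAL_CONTRACTS_PER_PERSON_DAY = 1
MANUAL_REWORK_RATE = (0.20, 0.25)   # ~20-25% of manually processed contracts
IDP_CONTRACTS_PER_BATCH = 50_000    # processed by IDP every 8 hours
IDP_HIL_RATE = 0.07                 # ~7% routed to human-in-the-loop review


def manual_person_days(contracts: int) -> float:
    """Person-days needed to process the contracts by hand."""
    return contracts / MANUAL_CONTRACTS_PER_PERSON_DAY


def manual_rework_range(contracts: int) -> tuple:
    """Low and high estimate of contracts needing rework after manual processing."""
    low, high = MANUAL_REWORK_RATE
    return round(contracts * low), round(contracts * high)


def idp_hil_reviews(contracts: int) -> int:
    """Contracts a human still has to review after IDP."""
    return round(contracts * IDP_HIL_RATE)


if __name__ == "__main__":
    batch = IDP_CONTRACTS_PER_BATCH
    print("Manual person-days for one IDP batch:", manual_person_days(batch))
    print("Manual rework range:", manual_rework_range(batch))
    print("IDP human reviews:", idp_hil_reviews(batch))
```

Under these assumptions, one 8-hour IDP batch covers what would take 50,000 person-days by hand, and leaves about 3,500 contracts for human review instead of 10,000 to 12,500 for rework.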

The Tools for the Job

OCR NLP

Sample Architecture

Extraction Workflow
OCR, no NLP
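
The diagram for this slide did not survive in the text, so the following is only a minimal sketch of what an OCR-only extraction step might look like. It assumes Amazon Textract is the OCR service (the slide text does not name it) and uses the `detect_document_text` call from boto3. The function name `extract_lines`, the 90% confidence threshold, and the idea of routing low-confidence lines to human review are illustrative assumptions, not the presenter's design.

```python
from typing import Any, List, Tuple


def extract_lines(
    textract_client: Any,
    bucket: str,
    key: str,
    min_confidence: float = 90.0,  # hypothetical threshold, tune per document type
) -> Tuple[List[str], List[str]]:
    """OCR one document stored in S3 and split its lines by confidence.

    textract_client is expected to behave like boto3.client("textract").
    Returns (accepted_lines, lines_needing_human_review).
    """
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    accepted: List[str] = []
    needs_review: List[str] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks; only whole lines are kept
        if block.get("Confidence", 0.0) >= min_confidence:
            accepted.append(block.get("Text", ""))
        else:
            needs_review.append(block.get("Text", ""))
    return accepted, needs_review
```

With real credentials this would be called as `extract_lines(boto3.client("textract"), "my-bucket", "scan.png")`, where the bucket and key are placeholders. Passing the client in keeps the function easy to exercise with a stub.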

Redaction Workflow
OCR & NLP
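
As with the extraction slide, the diagram is missing here, so this is a hedged sketch of the NLP half of a redaction step rather than the architecture that was presented. It assumes Amazon Comprehend is the NLP service (not named in the slide text) and uses its `detect_pii_entities` call on text that OCR has already produced. The function name `redact_pii`, the score threshold, and the mask string are hypothetical.

```python
from typing import Any


def redact_pii(
    comprehend_client: Any,
    text: str,
    min_score: float = 0.5,     # hypothetical threshold
    mask: str = "[REDACTED]",   # hypothetical mask string
) -> str:
    """Mask PII spans in OCR output text.

    comprehend_client is expected to behave like boto3.client("comprehend").
    Assumes the returned entity spans do not overlap.
    """
    response = comprehend_client.detect_pii_entities(Text=text, LanguageCode="en")
    entities = [
        e for e in response.get("Entities", [])
        if e.get("Score", 0.0) >= min_score
    ]
    # Replace from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: entity["BeginOffset"]] + mask + text[entity["EndOffset"]:]
    return text
```

Working back to front avoids recomputing offsets after each replacement, since the mask is usually a different length from the text it covers.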

Demo Time!