Web API for Ethiopic Script Optical Character Recognition

girmashume1 12 views 13 slides Mar 01, 2025


About This Presentation

A web API to perform OCR on Ethiopic script

Slide Content

By Menelik Berhan

About Myself

- I'm from Addis Ababa, Ethiopia.
- I studied Civil Engineering at Addis Ababa University.
- I'm learning at ALXSE to become a full-stack developer.
- I'm taking the Machine Learning and Deep Learning Specializations on Coursera.
- I'm the sole member of the team.

Background

I'm a long-time, self-confessed history 'fanatic', specifically of African history. In pursuing this interest, I try to read old Ethiopian historical documents written in Amharic and Ge'ez (both use the Ethiopic script). Most of these documents are available only in hard copy and are not easily accessible for research and analysis. For a long time, even before starting at ALX, I have wished for a way to digitize these documents and make them readily and easily accessible.

Background (continued)

As fate would have it, I got the opportunity to enroll at ALX and study computer programming. As my first step toward a solution for digitizing historical documents written in Ethiopic script, I developed a basic CLI app that performs OCR for the foundations portfolio project. My plan was to add a web API for the app in the specializations portfolio project (GitHub link to CLI OCR app).

The CLI app was a basic implementation built on Google's open-source Tesseract OCR engine, which uses an LSTM (long short-term memory) neural network model. Though I lacked the necessary background in AI, the implementation gave me the chance to learn the basics and apply them in the app.

Background (continued)

But the CLI app is a very basic implementation that lacks proper abstractions and structure. For example, most features are implemented as scripts and standalone functions, which significantly limits their reusability and scalability.

In addition, because of my limited knowledge of AI, it didn't make full use of the capabilities of the Tesseract LSTM engine. For example, Tesseract exposes more than 600 configurable parameters through its Python wrapper, and these can be tuned for more efficient and accurate results. Because of the time limit I had, I opted not to use them and implemented the CLI app with the defaults. The results were satisfactory, but not the best they could be.
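As a rough illustration of what moving beyond the defaults could look like, the sketch below builds the configuration string that pytesseract forwards to the Tesseract command line. The helper names `build_tesseract_config` and `ocr_image` are hypothetical, and the choice of the `amh` (Amharic) language pack, page segmentation mode 6 and the `preserve_interword_spaces` parameter are assumptions made for the example, not settings taken from the CLI app.

```python
from typing import Mapping, Optional


def build_tesseract_config(
    psm: int = 6,
    oem: int = 1,
    variables: Optional[Mapping[str, object]] = None,
) -> str:
    """Build the config string that pytesseract passes to the tesseract CLI.

    psm: page segmentation mode (6 = assume a single uniform block of text).
    oem: OCR engine mode (1 = LSTM neural network engine only).
    variables: extra Tesseract parameters, each emitted as `-c name=value`.
    """
    parts = [f"--oem {oem}", f"--psm {psm}"]
    for name, value in (variables or {}).items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


def ocr_image(path: str, lang: str = "amh", config: str = "") -> str:
    """Run OCR on one image file (hypothetical helper).

    Imports are local so build_tesseract_config can be used and tested
    on a machine without Tesseract, pytesseract or Pillow installed.
    """
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang, config=config)


config = build_tesseract_config(psm=6, variables={"preserve_interword_spaces": 1})
print(config)  # --oem 1 --psm 6 -c preserve_interword_spaces=1
```

Keeping the string builder separate from the OCR call means different parameter combinations can be compared on the same scans without touching the recognition code.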

Background (continued)

Now, after furthering my studies in Machine Learning by taking a couple of courses (thanks to Coursera's financial aid), I'm planning to:

- Overhaul the whole model by implementing logical abstractions and structure (using classes to abstract models such as images and PDFs).
- Add more features, such as saving input and output data and accuracy for future model training, and in-place output (writing outputs onto the input image or PDF).
- Mainly, implement a web API for the OCR app using concepts I've learnt in the specialization leg of the ALXSE course.
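One possible shape for the class-based abstraction of images and PDFs is sketched below. The class names (`OCRInput`, `ImageInput`, `PdfInput`) and the `make_input` factory are hypothetical and only illustrate the idea; the real app may be structured differently.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


@dataclass
class OCRInput(ABC):
    """Common interface: every input can yield a list of page images."""

    path: Path

    @abstractmethod
    def load_pages(self) -> List[object]:
        """Return the pages to be recognized as PIL images."""


@dataclass
class ImageInput(OCRInput):
    def load_pages(self) -> List[object]:
        from PIL import Image  # local import: only needed when loading

        return [Image.open(self.path)]


@dataclass
class PdfInput(OCRInput):
    dpi: int = 300  # assumed rendering resolution

    def load_pages(self) -> List[object]:
        from pdf2image import convert_from_path  # requires poppler

        return convert_from_path(str(self.path), dpi=self.dpi)


def make_input(path: str) -> OCRInput:
    """Pick the input class from the file extension."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return ImageInput(p)
    if suffix == ".pdf":
        return PdfInput(p)
    raise ValueError(f"Unsupported input type: {path}")


print(type(make_input("chronicle.pdf")).__name__)  # PdfInput
```

With this layout the OCR step only ever calls `load_pages()`, so supporting a new input type means adding one class rather than editing every script.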

Learning Objectives

In this project I plan to learn and develop my skills in:

- Developing logical abstractions and app structure
- Configuring, using and fine-tuning (training) LSTM neural networks
- Image processing
- Asynchronous processes in Python
- Handling user authentication and file transfer
- Managing NoSQL databases
- Mainly, implementing a fast and efficient API

Technologies

The API will be implemented using Python and several libraries and frameworks.

For the OCR process:

- pytesseract - a Python wrapper for Tesseract-OCR, Google's open-source text recognition (OCR) engine.
- opencv-python - a library of Python bindings designed to solve computer vision problems. It will be used to preprocess the input images for better results.
- pdf2image - a Python library that will be used for reading images from PDF files.
- fpdf2 - a Python library that will be used for writing OCR outputs to PDF files.
- python-docx - a Python library that will be used for writing OCR outputs to Microsoft Word (docx) files.

Technologies (continued)

For the web API:

- FastAPI - a modern, fast (high-performance) web framework for building APIs with Python 3.8+, based on standard Python type hints.
- Uvicorn - an ASGI (Asynchronous Server Gateway Interface) web server implementation for Python, used for production.
- MongoDB - a NoSQL database to be used for storing model and app data.
- SQLAlchemy - the Python SQL toolkit and object-relational mapper, used to manage connections to the database.
- Redis - a Python interface to the Redis key-value store, used for storing authentication-related user data.
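A minimal sketch of how an upload endpoint could look in FastAPI follows. The `/ocr` route, the `create_app` factory, the `is_supported` check and the `run_ocr` placeholder are all assumptions made for illustration; authentication and database access are left out. File uploads in FastAPI also require the `python-multipart` package.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".jpeg", ".jpg", ".png", ".pdf"}


def is_supported(filename: str) -> bool:
    """Accept only the input types listed for the API."""
    return Path(filename).suffix.lower() in SUPPORTED_SUFFIXES


def run_ocr(data: bytes, filename: str) -> str:
    """Placeholder for the real (blocking) Tesseract call."""
    return ""


def create_app():
    """Build the FastAPI app; imports are local so the helpers above
    can be used without FastAPI installed."""
    from fastapi import FastAPI, File, HTTPException, UploadFile
    from fastapi.concurrency import run_in_threadpool

    app = FastAPI(title="Ethiopic OCR API")

    @app.post("/ocr")
    async def ocr(file: UploadFile = File(...)):
        name = file.filename or ""
        if not is_supported(name):
            raise HTTPException(status_code=415, detail="Unsupported file type")
        data = await file.read()
        # OCR is blocking work, so it runs in a worker thread rather than
        # on the event loop that serves other requests.
        text = await run_in_threadpool(run_ocr, data, name)
        return {"filename": name, "text": text}

    return app
```

Assuming the code is saved as `main.py`, it could be served with `uvicorn main:create_app --factory`.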

Features

- Perform OCR on image files (jpeg, jpg, png)
- Perform OCR on PDF files
- Accept batch inputs
- Output OCR results as strings, a plain text file (txt), an MS Word file (docx) or a PDF file
- Option to join the output into one file for multiple input files
- Option to set the Tesseract configuration and the output file formatting parameters
- Option to display the average character recognition confidence level
- Option to select either simple or detailed image preprocessing
- Set and get default parameters (such as input/output directory, Tesseract configuration parameters, and formatting and style for output files)
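The average-confidence option could be computed along the lines below. This is a sketch under one assumption: the scores come from `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, which reports confidence per recognized word (with -1 for layout boxes that hold no text), so the result is a word-level average rather than a true per-character one. The function name is hypothetical.

```python
from typing import Mapping, Optional, Sequence


def average_confidence(data: Mapping[str, Sequence]) -> Optional[float]:
    """Average the confidence of recognized words.

    `data` is expected to have parallel "conf" and "text" lists, as in the
    dict returned by pytesseract.image_to_data. Entries with a negative
    confidence or empty text are layout boxes and are skipped.
    """
    scores = []
    for conf, text in zip(data["conf"], data["text"]):
        score = float(conf)  # some versions return strings, others numbers
        if score >= 0 and str(text).strip():
            scores.append(score)
    if not scores:
        return None  # nothing was recognized
    return sum(scores) / len(scores)


# Hand-made sample in the same shape as the image_to_data output.
sample = {"conf": ["-1", "91", "83.5", "-1"], "text": ["", "ሰላም", "ዓለም", ""]}
print(average_confidence(sample))  # 87.25
```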

Challenges

- Learning a new framework (FastAPI) in a short time
- Handling large data in web requests and responses asynchronously using Python
- Configuration of Tesseract
- Storage of inputs and outputs for future training and fine-tuning of the underlying AI model
- Creating a scalable app logic and structure
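For the asynchronous handling of batches, one approach is to push each blocking OCR call onto a worker thread and cap how many run at once. The sketch below uses only the standard library (`asyncio.to_thread` needs Python 3.9+). `blocking_ocr` is a stand-in that sleeps instead of calling Tesseract, and the `max_workers` limit of 2 is an arbitrary assumption.

```python
import asyncio
import time
from typing import List


def blocking_ocr(name: str) -> str:
    """Stand-in for a slow, blocking Tesseract call."""
    time.sleep(0.05)
    return f"text from {name}"


async def ocr_batch(names: List[str], max_workers: int = 2) -> List[str]:
    """Process a batch concurrently while limiting parallel OCR jobs."""
    semaphore = asyncio.Semaphore(max_workers)

    async def process_one(name: str) -> str:
        async with semaphore:
            # Run the blocking call in a thread so the event loop stays free.
            return await asyncio.to_thread(blocking_ocr, name)

    # gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(process_one(n) for n in names)))


results = asyncio.run(ocr_batch(["page1.png", "page2.png", "page3.png"]))
print(results)
```

The semaphore keeps a large batch from starting every OCR job at once, which matters because each Tesseract run is CPU and memory intensive.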
