Web API for Ethiopic script optical character recognition
girmashume1
13 slides
Mar 01, 2025
About This Presentation
A web API to perform OCR on Ethiopic script
Slide Content
By Menelik Berhan
About Myself
I’m from Addis Ababa, Ethiopia.
I studied Civil Engineering @ Addis Ababa University.
I’m learning @ALXSE to be a full stack developer.
I’m taking the Machine Learning & Deep Learning Specializations @Coursera.
I’m the sole member of the team.
Background
I’m a long-time, self-confessed history ‘fanatic’, specifically of African history. In pursuing this interest, I try to read old Ethiopian historical documents written in Amharic and Ge’ez (both use the Ethiopic script). Most of these documents are available only in hard copy and are not easily accessible for research and analysis.
For a long time, even before starting @ALX, I have wished for a way to digitize these documents and make them readily & easily accessible.
Background…
As fate would have it, I got an opportunity to enroll @ALX & study computer programming. As my first step in developing a solution for digitizing historical documents written in Ethiopic script, I developed a basic CLI app that performs OCR for the foundations portfolio project. My plan was to add a web API for the app in the specializations portfolio project (GitHub link to CLI OCR app).
The CLI app was a basic implementation built on Google’s open source Tesseract OCR engine, which uses an LSTM (long short-term memory) neural network model. Though I lacked the necessary knowledge base in the world of AI, during the implementation I got the chance to learn the basics and use them in the app.
Background…
But the CLI app is a very basic implementation that lacks proper abstractions and structure. For example, most features are implemented as scripts and standalone functions, which significantly limits their reusability & scalability.
In addition, due to my lack of deep knowledge in the world of AI, it didn’t fully utilize the capabilities of the Tesseract OCR LSTM engine. For example, Tesseract exposes more than 600 configurable parameters, settable through the Python wrapper, that can be used to get more efficient and accurate results. But due to the time limit I had, I opted not to use them and implemented the CLI app using the defaults. This made the results satisfactory, but not the best they could be.
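As an illustration of moving beyond the defaults, here is a minimal sketch of assembling a Tesseract configuration string. The helper name `build_tesseract_config` and the chosen values are assumptions for illustration, not part of the CLI app. The `--oem`, `--psm` and `-c name=value` options are standard Tesseract command-line options, and `preserve_interword_spaces` is one of the engine's parameters.

```python
import shlex


def build_tesseract_config(oem=1, psm=6, variables=None):
    """Build a Tesseract command-line config string (hypothetical helper).

    --oem 1 selects the LSTM engine, --psm sets the page segmentation mode,
    and each "-c name=value" pair overrides one engine parameter.
    """
    parts = ["--oem", str(oem), "--psm", str(psm)]
    # Sort the overrides so the resulting string is deterministic.
    for name, value in sorted((variables or {}).items()):
        parts += ["-c", f"{name}={value}"]
    # Quote each part so the string survives shell-style splitting.
    return " ".join(shlex.quote(part) for part in parts)


config = build_tesseract_config(psm=4, variables={"preserve_interword_spaces": 1})
print(config)

# With pytesseract installed, the string could be passed along as:
#   pytesseract.image_to_string(image, lang="amh", config=config)
```

Keeping the overrides in a dictionary would also make it straightforward to store them as user defaults later.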
Background…
Now, after furthering my studies in Machine Learning by taking a couple of courses (thanks to Coursera’s financial aid), I’m planning to:
Overhaul the whole model by implementing logical abstractions & structure (using classes to abstract models (like images, PDFs...))
Add more features (like saving input & output data and accuracy for future model training, in-place output (writing outputs on the input image or PDF)...)
And mainly, implement a web API for the OCR app using concepts I’ve learnt in the specialization leg of the ALXSE course.
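One possible shape for the planned class abstractions is sketched below. All names here (`OcrInput`, `ImageInput`, `PdfInput`, `make_input`) are hypothetical and only illustrate the idea of wrapping images and PDFs behind a common interface. The page count of a PDF is passed in by hand rather than read from the file.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OcrInput(ABC):
    """Common interface for anything the OCR pipeline can consume."""
    path: Path

    @abstractmethod
    def page_count(self) -> int:
        """Number of pages that will be sent to the OCR engine."""


@dataclass
class ImageInput(OcrInput):
    def page_count(self) -> int:
        return 1  # a single image is treated as one page


@dataclass
class PdfInput(OcrInput):
    pages: int = 1  # assumption: supplied by the caller in this sketch

    def page_count(self) -> int:
        return self.pages


def make_input(path, pdf_pages=1):
    """Pick the matching wrapper class from the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in {".jpeg", ".jpg", ".png"}:
        return ImageInput(path)
    if suffix == ".pdf":
        return PdfInput(path, pages=pdf_pages)
    raise ValueError(f"unsupported input type: {suffix or path.name}")


scan = make_input("chronicle.PDF", pdf_pages=12)
print(type(scan).__name__, scan.page_count())
```

With this layout, the rest of the pipeline can work against `OcrInput` without checking file types itself.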
Learning Objectives
In this project I plan to learn & develop my skills in:
Developing logical abstractions & app structure
Configuring, using and fine-tuning (training) LSTM neural networks
Image processing
Asynchronous processes in Python
Handling user authentication & file transfer
Managing NoSQL databases
And mainly, implementing a fast & efficient API
Technologies
The API will be implemented using Python & different libraries and frameworks.
For the OCR process:
pytesseract - a Python wrapper for Google's open source text recognition (OCR) engine, Tesseract-OCR.
opencv-python - a library of Python bindings designed to solve computer vision problems. Will be used for preprocessing the input images for better results.
pdf2image - a Python library that will be used for reading images from PDF files.
fpdf2 - a Python library that will be used for writing OCR outputs to PDF files.
python-docx - a Python library that will be used for writing OCR outputs to Microsoft Word (docx) files.
Technologies…
For the web API:
FastAPI - a modern, fast (high-performance) web framework for building APIs with Python 3.8+ based on standard Python type hints.
Uvicorn - an ASGI (Asynchronous Server Gateway Interface) web server implementation for Python, used for production.
MongoDB - a NoSQL database to be used for storing model & app data.
SQLAlchemy - the Python SQL toolkit and object relational mapper, used to manage connections to the database.
Redis - the Python interface to the Redis key-value store, used for storing authentication-related user data.
Features
Perform OCR on image files (jpeg, jpg, png)
Perform OCR on PDF files
Accept batch inputs
Output OCR results as strings, a plain text file (txt), an MS Word file (docx) or a PDF file
Option to join the output into one file for multiple input files
Option to set the Tesseract configuration and the output file formatting parameters
Option to display the average character recognition confidence level
Option to select either simple or detailed image preprocessing
Set and get default parameters (like input/output directory, Tesseract configuration parameters, and formatting and style for output files)
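The confidence feature could be computed roughly as follows. This is a sketch under assumptions: it takes a dictionary shaped like the one `pytesseract.image_to_data(..., output_type=Output.DICT)` returns, with parallel `text` and `conf` lists where `conf` is -1 for boxes that hold no recognized text. Those values are per word, so the result only approximates a character-level confidence. The function name and the sample data are made up for illustration.

```python
def mean_confidence(data):
    """Average the confidence of recognized words, ignoring empty boxes.

    Returns None when no word was recognized at all.
    """
    scores = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as int, float or str
        if conf >= 0 and text.strip():
            scores.append(conf)
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hand-written sample in the assumed shape (not real OCR output).
sample = {
    "text": ["", "ሰላም", "ዓለም", " "],
    "conf": ["-1", 90, 80.0, 95],
}
print(mean_confidence(sample))
```

Blank boxes are skipped even when they carry a non-negative score, so layout-only entries do not skew the average.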
Challenges
Learning a new framework (FastAPI) in a short time.
Handling large data in web requests and responses asynchronously using Python.
Configuration of Tesseract.
Storage of inputs & outputs for future training & fine-tuning of the underlying AI model.
Creating a scalable app logic and structure.
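For the asynchronous handling challenge, one common approach is to push blocking OCR work onto worker threads so the event loop stays free to serve other requests. The sketch below uses only the standard library (`asyncio.to_thread`, Python 3.9+). `run_ocr` is a hypothetical stand-in that sleeps instead of calling Tesseract, and the file names are made up.

```python
import asyncio
import time


def run_ocr(name):
    """Hypothetical stand-in for a blocking OCR call on one file."""
    time.sleep(0.05)  # simulates CPU or I/O bound work
    return f"text from {name}"


async def ocr_batch(names, max_parallel=2):
    """Run OCR on several files without blocking the event loop."""
    semaphore = asyncio.Semaphore(max_parallel)  # cap concurrent workers

    async def ocr_one(name):
        async with semaphore:
            # to_thread runs the blocking function in a worker thread.
            return await asyncio.to_thread(run_ocr, name)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(ocr_one(name) for name in names))


results = asyncio.run(ocr_batch(["a.png", "b.png", "c.pdf"]))
print(results)
```

The semaphore limit is a design choice: it keeps a large batch upload from starting one OCR job per file all at once.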