pythonpro.review.pptx

ShanmugaPriya886411 10 views 8 slides Oct 03, 2024



Slide Content

PDF TO TEXT CONVERTER
BY:
S.SHANMUGAPRIYA - 23BAD086
S.AARUMUGAPAVITHRA - 23BAD064
S.PRIYADHARSHINI - 23BAD082

PROBLEM STATEMENT: As the modern world becomes increasingly digital, extracting text from PDF documents is more and more necessary for tasks such as data analysis and content processing. Python has a versatile ecosystem of libraries that work with many file formats, including PDF.

Python libraries for PDF extraction: PyPDF2, Tika, Textract, PyMuPDF, pdftotext, PDFMiner, Tabula.

OBJECTIVE: To create a PDF-to-text converter for files that need to be analyzed and modified according to one's needs.

What is PyPDF2? PyPDF2 is a Python library for handling PDF files. It supports extracting text, merging files, splitting them into smaller parts, cropping pages, and manipulating them programmatically. This makes it straightforward to extract text from PDF files and work with it further.
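As a rough illustration of the extraction step described above, the sketch below reads a PDF page by page. It assumes PyPDF2 3.x (the `PdfReader` class, its `pages` attribute, and `extract_text()`); the function names `join_page_texts` and `pdf_to_text` are hypothetical and are not taken from the project's code.

```python
from typing import Iterable


def join_page_texts(pages: Iterable) -> str:
    """Concatenate the text of each page, separated by blank lines.

    `pages` is any iterable of objects exposing `extract_text()`,
    such as the `pages` of a PyPDF2 `PdfReader`.
    """
    texts = []
    for page in pages:
        # Image-only pages can yield an empty string (or None in some
        # versions), so fall back to "" before stripping.
        text = page.extract_text() or ""
        texts.append(text.strip())
    # Skip pages that produced no text at all.
    return "\n\n".join(t for t in texts if t)


def pdf_to_text(pdf_path: str) -> str:
    """Extract all text from the PDF at `pdf_path` (assumes PyPDF2 3.x)."""
    # Imported here so join_page_texts has no third-party dependency.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    return join_page_texts(reader.pages)
```

Calling `pdf_to_text("sample.pdf")` (a placeholder file name) would return the document's text as a single string. Scanned PDFs without a text layer would come back empty, since PyPDF2 does not perform OCR.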

OUTCOME: The primary outcome of the PDF-to-text converter is a plain text file containing the text extracted from the PDF.

Output file: The converter generates a text file (e.g., output.txt) with the extracted content. The user is either prompted to download this file, or it is saved automatically to a specified location.
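Saving the extracted content to a specified location could look like the minimal sketch below. The helper name `save_text` is hypothetical, and the default `output.txt` simply mirrors the example file name mentioned above; the project's actual saving logic is not shown in the slides.

```python
from pathlib import Path


def save_text(text: str, output_path: str = "output.txt") -> Path:
    """Write `text` to `output_path` as UTF-8 and return the resulting path."""
    path = Path(output_path)
    # Create the target folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Writing with an explicit UTF-8 encoding avoids platform-dependent defaults when the PDF contains non-ASCII characters.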

CURRENT STATUS: Front-end and back-end code has been implemented for a local server. The code has been tested with a local PDF file, which was converted to a text file that is downloaded automatically. A simple web interface has been implemented using HTML. The Python libraries PyPDF2 and Flask have been installed and imported for the back end.
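The upload-and-download flow described above might be wired together roughly as follows. This is a sketch under stated assumptions, not the project's code: it assumes Flask 2.x or later (for the `download_name` argument of `send_file`) and PyPDF2 3.x, and the route `/convert`, the form field name `pdf`, and the helper `text_download_name` are all hypothetical.

```python
import io
from pathlib import Path


def text_download_name(upload_name: str) -> str:
    """Map an uploaded file name such as 'report.pdf' to 'report.txt'."""
    stem = Path(upload_name or "").stem
    return f"{stem or 'output'}.txt"


def create_app():
    """Build the Flask app (assumes Flask >= 2.0 and PyPDF2 3.x)."""
    # Third-party imports live inside the factory so the helper above
    # stays usable without Flask or PyPDF2 installed.
    from flask import Flask, abort, request, send_file
    from PyPDF2 import PdfReader

    app = Flask(__name__)

    @app.route("/convert", methods=["POST"])  # hypothetical route
    def convert():
        upload = request.files.get("pdf")  # hypothetical form field name
        if upload is None or upload.filename == "":
            abort(400, description="No PDF file uploaded.")

        # Read the uploaded stream directly and collect text page by page.
        reader = PdfReader(upload.stream)
        text = "\n\n".join((page.extract_text() or "") for page in reader.pages)

        # as_attachment=True makes the browser download the response
        # instead of displaying it.
        return send_file(
            io.BytesIO(text.encode("utf-8")),
            mimetype="text/plain",
            as_attachment=True,
            download_name=text_download_name(upload.filename),
        )

    return app
```

For local testing, `create_app().run(debug=True)` would start Flask's development server; an HTML form that posts a file field named `pdf` to `/convert` with `enctype="multipart/form-data"` would then trigger the download.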

[Figures: a locally hosted simple converter interface; the converted text file]

PLATFORM AND LANGUAGES:
PROGRAMMING LANGUAGE: Python
FRONTEND ENVIRONMENT: HTML
BACKEND ENVIRONMENT: Flask (a lightweight web framework)
FRAMEWORKS AND LIBRARIES: Flask (backend), PyPDF2 (PDF processing)
DEVELOPMENT ENVIRONMENT: Local machine (using Python), with possible cloud platforms for deployment

REFERENCES:
https://www.geeksforgeeks.org/convert-pdf-to-txt-file-using-python/
https://ironpdf.com/python/blog/using-ironpdf-for-python/how-to-convert-pdf-to-text-python/
AI support: https://chatgpt.com/