3. SYSTEM DESIGN AND DEVELOPMENT
3.1 File Design
The File Design outlines how the system organizes, manages, and processes
PDF documents within the application. The primary function of the file design is
to ensure that uploaded PDFs are efficiently stored, easily accessed, and processed
to extract relevant information for user queries. When a user uploads a PDF
document, the system saves it in its original format in a predefined directory.
A folder structure categorizes the stored PDFs by their metadata or by user
input (e.g., document name, date of upload).
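As a minimal sketch of this storage step, the Python function below copies an uploaded PDF, unchanged, into a folder derived from the upload date. The function name store_pdf, the root folder "uploads", and the year/month layout are illustrative assumptions and not part of the system described here.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional

UPLOAD_ROOT = Path("uploads")  # hypothetical storage root


def store_pdf(source: Path, root: Path = UPLOAD_ROOT,
              uploaded_on: Optional[date] = None) -> Path:
    """Copy an uploaded PDF, in its original format, into root/YYYY/MM/."""
    uploaded_on = uploaded_on or date.today()
    target_dir = root / f"{uploaded_on:%Y}" / f"{uploaded_on:%m}"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Keep the document name, but never overwrite an earlier upload.
    target = target_dir / source.name
    counter = 1
    while target.exists():
        target = target_dir / f"{source.stem}_{counter}{source.suffix}"
        counter += 1

    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return target
```

Grouping by upload date keeps any single folder from growing without bound. Grouping by user or by document category would work the same way with a different target_dir.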
To facilitate quick retrieval, the system extracts the text content from the
PDFs and stores it in a structured format, such as plain text or JSON, in which
the text is segmented into logical units such as headings, paragraphs, tables,
and lists. This structured text is indexed so that it can be searched and
queried efficiently during later interaction with the chatbot. Each document
also undergoes a preprocessing step that cleans the extracted text by removing
irrelevant content such as footnotes, images, and other non-text elements.
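The cleaning, segmentation, and indexing steps can be sketched as follows. The sketch assumes the raw text has already been extracted from the PDF by some library, and it handles only headings and paragraphs. The heading pattern, the function names, and the simple word-to-section index are illustrative assumptions, not the system's actual implementation.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List, Set

# Assumed heuristic: a heading is a single line starting with a section number.
HEADING = re.compile(r"^\d+(\.\d+)*\.?\s+\S")


def clean(text: str) -> str:
    """Collapse whitespace runs and drop lines that are bare page numbers."""
    lines = [re.sub(r"\s+", " ", ln).strip() for ln in text.splitlines()]
    return "\n".join(ln for ln in lines if not ln.isdigit())


def segment(text: str) -> List[Dict[str, str]]:
    """Split cleaned text on blank lines and label each block."""
    sections = []
    for block in re.split(r"\n\s*\n", clean(text)):
        block = block.strip()
        if not block:
            continue
        is_heading = "\n" not in block and HEADING.match(block) is not None
        sections.append({
            "type": "heading" if is_heading else "paragraph",
            "text": block.replace("\n", " "),
        })
    return sections


def build_index(sections: List[Dict[str, str]]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the positions of the sections containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for position, section in enumerate(sections):
        for word in re.findall(r"[a-z0-9]+", section["text"].lower()):
            index[word].add(position)
    return index


raw = "3.1 File Design\n\nThe system   stores PDFs\nin a directory.\n\n11\n\nText is indexed."
sections = segment(raw)
index = build_index(sections)
print(json.dumps(sections, indent=2))
```

The list returned by segment can be written to a JSON file next to the stored PDF, and the index lets a query term be mapped to candidate sections without rescanning the full text.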
Furthermore, the system maintains logs and metadata associated with each
PDF (e.g., upload date, document size, file type) to ensure proper document
handling, troubleshooting, and auditing. In addition to text extraction, the system
may compress large documents before storing them to reduce disk space usage.
Overall, the file design emphasizes scalability, efficiency, and easy
access to documents, supporting a smooth user experience when querying large
volumes of data.
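One possible way to record the metadata and apply optional compression is sketched below. The 5 MiB threshold, the ".meta.json" sidecar file, the field names, and the choice of gzip are all assumptions made for illustration; the design above does not specify them.

```python
import gzip
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

SIZE_THRESHOLD = 5 * 1024 * 1024  # hypothetical cutoff: compress files over 5 MiB


def archive_with_metadata(pdf: Path, threshold: int = SIZE_THRESHOLD) -> dict:
    """Write a metadata record for a stored PDF and gzip it if it is large."""
    size = pdf.stat().st_size
    stored = pdf

    if size > threshold:
        # Replace the original file with a gzip-compressed copy.
        stored = pdf.with_name(pdf.name + ".gz")
        with pdf.open("rb") as src, gzip.open(stored, "wb") as dst:
            shutil.copyfileobj(src, dst)
        pdf.unlink()

    record = {
        "file_name": pdf.name,
        "file_type": pdf.suffix.lstrip(".").lower(),
        "size_bytes": size,
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
        "stored_as": stored.name,
        "compressed": stored != pdf,
    }
    # The sidecar file doubles as a simple per-document audit record.
    sidecar = pdf.with_name(pdf.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

Because many PDFs already compress their internal streams, the space saved by gzip can be small, so the threshold and the compression method are worth measuring on real documents before adopting them.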
3.2 Input Design
The Input Design outlines how data is received and processed by the
system. Below are the 10 key steps involved in the input design for the PDF
Question Answering Chatbot:
1. User Authentication:
The user may first need to log in to the system, especially in environments
where multiple users access and upload documents. Authentication can use a
simple username and password, or OAuth for added security.
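For the username and password option, a minimal sketch of password verification is shown below. It assumes passwords are stored as salted PBKDF2 hashes rather than in plain text. The function names and the iteration count are illustrative assumptions, and the OAuth option is not covered.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage alongside the username."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

At registration the system would store the salt and digest returned by hash_password. At login it would call verify_password with the submitted password and the stored values.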