Assignment-2 (Text summarization app)

Project Description

This project builds on Project 1 (https://github.com/BigDataIA-Fall2023-Team2/Assignment1/blob/main/part1/Readme.md) and uses the text extracted from PDF documents for summarization and question answering. Users can ask questions about the content of a PDF, and the app answers them by using OpenAI APIs to analyze the extracted text.

A distinctive feature of the application is its chunk-based processing. The extracted text is split into smaller, manageable chunks that are queried one at a time. As soon as a chunk yields an answer, the remaining chunks are skipped, which saves API calls and shortens response times. The goal of this approach is a responsive, intuitive way to interact with textual data.
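A minimal sketch of the chunking step, assuming chunk size is measured in words (the app may count tokens instead) and that neighbouring chunks share a few words. `chunk_text`, `max_words`, and `overlap` are hypothetical names, not taken from the repository:

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words, so text near a chunk
    boundary appears in both neighbouring chunks. Word counts are an
    assumed stand-in for the token limits of the model.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

For example, a ten-word text with `max_words=4` and `overlap=1` yields three chunks whose boundary words repeat: `"0 1 2 3"`, `"3 4 5 6"`, `"6 7 8 9"`.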

Application and Documentation Links

App link - https://team2assignment2.streamlit.app/

FastAPI docs (hosted on Railway) - https://team2assignment2.up.railway.app/docs

Project Resources

Google Codelabs link - https://codelabs-preview.appspot.com/?file_id=1OEvGnQV7FttHE2BCg1mvAkjxIKbMMB9NSFUwHtGoqcA#0

Google Colab notebook link - https://github.com/BigDataIA-Fall2023-Team2/Assignment2/blob/main/Assignment_2_PDF_QA_Cookbook.ipynb

Project Demo - https://www.youtube.com/watch?v=BtnFUVyTGwo

Tech Stack

Architecture diagram

Project Flow

The application lets users upload a PDF file or provide a link to one, then choose 'Nougat' or 'PyPDF' for text extraction. If 'Nougat' is selected, the user must also provide a Google Colab localtunnel link. Streamlit then parses the PDF with the chosen method (Nougat or PyPDF), and users can ask questions about the extracted text. For each question, Streamlit sends both the text and the question to FastAPI, which splits the text into smaller chunks and queries OpenAI with one chunk at a time until a satisfactory answer is found.
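The sequential, stop-at-first-answer querying described above can be sketched as follows. This is an illustration under assumptions, not the app's code: `answer_from_chunks`, `build_prompt`, and the `NOT FOUND` sentinel are hypothetical, and `ask` stands in for whatever function sends a prompt to OpenAI and returns the model's reply.

```python
from typing import Callable, Iterable, Optional

# Hypothetical sentinel the model is instructed to return when a chunk
# does not contain the answer.
NOT_FOUND = "NOT FOUND"


def build_prompt(chunk: str, question: str) -> str:
    """Build a prompt that restricts the model to a single chunk."""
    return (
        "Answer the question using only the context below. "
        f"If the context does not contain the answer, reply exactly '{NOT_FOUND}'.\n\n"
        f"Context:\n{chunk}\n\nQuestion: {question}"
    )


def answer_from_chunks(
    chunks: Iterable[str],
    question: str,
    ask: Callable[[str], str],
) -> Optional[str]:
    """Query chunks in order and return the first usable answer.

    Remaining chunks are never sent once an answer is found.
    Returns None if no chunk contains the answer.
    """
    for chunk in chunks:
        reply = ask(build_prompt(chunk, question)).strip()
        if reply and reply != NOT_FOUND:
            return reply  # early exit: skip the rest of the document
    return None
```

Passing `ask` in as a parameter keeps the loop testable without network access. In a deployed service it would wrap an OpenAI call, for example `client.chat.completions.create(...)` from the `openai` Python SDK (v1 and later). A sentinel string is only one simple way to decide whether an answer is "satisfactory"; the app may use a different criterion.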

Repository Structure

Contributions

Name	Contribution
Chinmay Gandi	Embeddings, demo replication in ipynb
Dhawal Negi	FastAPI, Railway, Streamlit, text chunking
Shardul Chavan	Fine-tuning, demo replication in ipynb

Additional Notes

WE ATTEST THAT WE HAVEN’T USED ANY OTHER STUDENTS’ WORK IN OUR ASSIGNMENT AND ABIDE BY THE POLICIES LISTED IN THE STUDENT HANDBOOK.

