Documentation to help set up and run the LangChain Document Analysis App. The app uses LangChain's ConversationalRetrievalChain with ChromaDB and PyPDF2 to load, store, and analyze PDF files. Google Docs links are also supported through LangChain's GoogleDriveLoader, which requires additional setup via OAuth2.
The interface is built with Streamlit, and a version of this application without Google Docs support is publicly deployed. For now, Google Docs support is only available in the locally hosted version (because OAuth2 requires generating a credentials.json and a token.json file), but I will try to make it publicly available ASAP!
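To make the load, store, and analyze steps concrete, here is a minimal sketch of how PyPDF2, ChromaDB, and ConversationalRetrievalChain can fit together. It is not the code in streamlit_app.py: the helper names (`chunk_text`, `read_pdf_text`, `build_chain`) and the chunk sizes are assumptions made for illustration, and the LangChain import paths match the classic package layout, so they may differ in newer releases.

```python
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def read_pdf_text(path: str) -> str:
    """Concatenate the text of every page in a PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # third-party, imported lazily

    reader = PdfReader(path)
    # extract_text() can return None for pages without a text layer.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_chain(chunks: List[str], openai_api_key: str):
    """Embed the chunks into Chroma and wrap a retriever in a chat chain."""
    # Assumption: classic LangChain import paths; newer versions moved these.
    from langchain.chains import ConversationalRetrievalChain
    from langchain.chat_models import ChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.vectorstores import Chroma

    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
    store = Chroma.from_texts(chunks, embeddings)
    llm = ChatOpenAI(openai_api_key=openai_api_key)
    return ConversationalRetrievalChain.from_llm(llm, retriever=store.as_retriever())
```

With the classic API, a chain built this way is called as `chain({"question": "...", "chat_history": []})` and the reply is read from the `"answer"` key of the result.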
graph TD
A[Start] --> B[Initialize Session States]
B --> C[Render User Interface]
C -->|Enter OpenAI Key| D[Process OpenAI Key]
D --> E[Check User Input]
C -->|Upload File| F[Load PDF Files]
F --> E
C -->|Enter Google Docs URL| G[Load Google Docs]
G --> E
E -->|Enter a Prompt| H{API Key Entered?}
H -->|Yes| I[Process Entered Prompt]
I --> J[Show Past Queries and Answers]
H -->|No| K[Show Error Message]
K --> J
J --> L[End]
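The prompt branch of the flowchart above can be sketched in plain Python. This is an illustrative sketch, not the app's actual code: a plain dict stands in for the Streamlit session state, and `init_session_state`, `handle_prompt`, `answer_fn`, and the error text are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (prompt, answer) pairs shown to the user


def init_session_state(state: Dict) -> Dict:
    """Create the keys the app needs if they are not present yet."""
    state.setdefault("api_key", "")
    state.setdefault("history", [])
    return state


def handle_prompt(
    state: Dict,
    prompt: str,
    answer_fn: Callable[[str, History], str],
) -> str:
    """Answer the prompt if an API key was entered, else return an error."""
    if not state["api_key"]:
        # "No" branch: show an error and leave the history untouched.
        return "Please enter your OpenAI API key first."
    # "Yes" branch: process the prompt, then record it for the history view.
    answer = answer_fn(prompt, state["history"])
    state["history"].append((prompt, answer))
    return answer
```

In the real app, `answer_fn` would be the call into the retrieval chain. Passing it in as a parameter keeps the branching logic easy to check without an API key.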
- Python 3.6 or later
- OpenAI API key
- (for Google Docs support) Google Cloud Platform (GCP) account with a project
- Create a virtual environment
python -m venv lcdocsenv
- Activate it:
- Windows:
.\lcdocsenv\Scripts\activate
- Mac/Linux:
source lcdocsenv/bin/activate
- Clone this repo:
git clone https://github.com/spycoderyt/langchaindocanalysis
- Go into the directory
cd langchaindocanalysis
- Install necessary Python packages using pip:
pip install -r requirements.txt
- Start the app
streamlit run streamlit_app.py
- Have an existing Google Cloud Project or create a new one:
- Enable the Google Drive API
- Authorize credentials for a desktop application
- Move the secret credentials .json file to the langchaindocanalysis directory
- Run the script to generate a token.json file:
python setup_gdrive_api.py
You won't have to sign in again as long as this file exists in your project directory.
- Start the app
streamlit run streamlit_app.py
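For readers curious what the token step involves, the sketch below shows one common way to turn a desktop-app credentials.json into a token.json with the google-auth-oauthlib package, plus a helper that pulls the document ID out of a Google Docs URL. It is an assumption about the general approach, not a copy of setup_gdrive_api.py: the function names, the read-only Drive scope, and the default file names are chosen for illustration.

```python
import os
import re
from typing import Optional

# Assumption: read-only Drive access is enough to load documents.
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def extract_doc_id(url: str) -> Optional[str]:
    """Return the document ID from a Google Docs URL, or None if absent."""
    match = re.search(r"/document/d/([A-Za-z0-9_-]+)", url)
    return match.group(1) if match else None


def ensure_token(
    credentials_path: str = "credentials.json",
    token_path: str = "token.json",
) -> str:
    """Run the browser sign-in once and cache the result in token_path."""
    if os.path.exists(token_path):
        # An existing token file means no new sign-in is needed.
        return token_path
    from google_auth_oauthlib.flow import InstalledAppFlow  # third-party

    flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
    creds = flow.run_local_server(port=0)  # opens a browser for consent
    with open(token_path, "w") as fh:
        fh.write(creds.to_json())
    return token_path
```

The extracted ID can then be passed to LangChain's GoogleDriveLoader, for example `GoogleDriveLoader(document_ids=[doc_id], credentials_path="credentials.json", token_path="token.json")`. Check the parameter names against the LangChain version you have installed.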
- You can change the OpenAI GPT Model in line 41 of streamlit_app.py.
- Feel free to send a pull request for bug fixes and adding additional features :)
LangChain chain used: ConversationalRetrievalChain
Inspiration: nicknochnack's Leveraging Your Own Documents in a Langchain Pipeline
👨🏾‍💻 Author: Jirat Chiaranaipanich