Fyllo Chatbase

This repository contains code for web scraping text data from a webpage and performing question answering using LangChain's tools and models. It provides scripts to extract information from a webpage and answer questions based on the extracted content using OpenAI's language models.

Introduction

This repository showcases how to use LangChain's tools and models for web scraping and question answering. It includes Python scripts that demonstrate how to:

Extract text data from a webpage using Trafilatura.
Split the extracted text data into smaller chunks for processing.
Utilize LangChain's question answering chains to answer user queries based on the extracted content.
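The chunking step above can be illustrated with a minimal, pure-Python sketch. It is not the splitter used by the scripts in this repository (those presumably rely on one of LangChain's text splitters); the function name split_into_chunks and the chunk_size and overlap parameters are hypothetical and chosen only for illustration.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Hypothetical helper for illustration; not part of this repository.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a chunk boundary still appears whole in at least one of them.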

Installation

To set up the repository and run the provided scripts, follow these steps:

Clone the repository to your local machine:

git clone https://github.com/anirudhjain26/fyllo-chatbase.git
cd fyllo-chatbase

Create a virtual environment (recommended) and activate it:

python3 -m venv venv
source venv/bin/activate

Install the required dependencies:

pip install -r requirements.txt

Usage

Web Scraping and Question Answering

Create a .env file in the root directory of the repository and set your OpenAI API key:

OPENAI_API_KEY="sk-your_key_here"

Replace sk-your_key_here with your actual OpenAI API key.
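To check that the key is visible before running the scripts, something like the following sketch may help. It uses only the standard library and a deliberately simplified .env parser; the scripts themselves presumably load the file with a package such as python-dotenv, which is an assumption. The names read_env_file and require_openai_key are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines from a .env file (simplified, illustrative)."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_openai_key(path: str = ".env") -> str:
    """Return the API key from the environment or the .env file, or raise."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY", "")
    if not key or key == "sk-your_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder")
    return key
```

Calling require_openai_key() from the repository root raises an error if the .env file is missing or still contains the placeholder value.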

1. Run the websiteLoader.py script to perform web scraping and question answering:

python websiteLoader.py

This script will extract information from the specified webpage and answer a predefined question using LangChain's models.
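The scraping half of this step can be sketched with Trafilatura's fetch_url and extract functions. This is an illustration under assumptions, not the contents of websiteLoader.py: the helper name scrape_page_text is hypothetical, and the trafilatura package must be installed for the call to succeed.

```python
from typing import Optional


def scrape_page_text(url: str) -> Optional[str]:
    """Download a page and return its main text, or None if that fails.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Imported inside the function so this module can be loaded
    # even where trafilatura is not installed.
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # page HTML, or None on failure
    if downloaded is None:
        return None
    # extract() returns the main text content, or None if nothing usable is found.
    return trafilatura.extract(downloaded)
```

For example, scrape_page_text("https://www.fyllo.in/") would attempt to return the main text of the Fyllo site; the result depends on the live page and on network access.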

2. Run the textLoader.py script to perform question answering on previously scraped text:

python textLoader.py

This script will use text from scrapedText.txt and answer a predefined question using LangChain's models.

Configuration

The repository includes the following files:

textLoader.py: Demonstrates how to load text data from a file and perform question answering using LangChain's tools. Requires a valid OpenAI API key and a .env file.
websiteLoader.py: Illustrates web scraping and question answering using LangChain's models. Scrapes live website data using the trafilatura module. Requires a valid OpenAI API key and a .env file.
scrapedText.txt: Contains the text data extracted from the website https://www.fyllo.in/.
