AzkaSahar/AI-Web-Scraper


🌐 AI-Powered Web Scraper + Q&A with Ollama + FAISS

Scrape website content, store it in a vector DB, and ask questions about it using a local LLM (Mistral via Ollama). Built with LangChain, FAISS, and Streamlit.


🛠 Features

  • 🌍 Web scraping using requests + BeautifulSoup
  • 🔍 Embedding text chunks via sentence-transformers
  • 💾 Semantic search using FAISS vector database
  • 🤖 Local LLM (Mistral via Ollama) for Q&A
  • 🖥️ Easy-to-use Streamlit UI
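
As a rough illustration of the scraping and chunking features, here is a minimal sketch that turns HTML into plain text and splits it into overlapping chunks. It is not taken from `ai_webscraper.py`: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs, and the names `html_to_text` and `chunk_text`, along with the `chunk_size` and `overlap` defaults, are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval. In the real app, each chunk would then be embedded with sentence-transformers before being added to the index.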

📦 Installation

1. Clone the repository

git clone https://github.com/AzkaSahar/AI-Web-Scraper.git
cd AI-Web-Scraper

2. Install dependencies

pip install -r requirements.txt

🧠 Prerequisites

  • Install and run Ollama on your machine
  • Pull the Mistral model:
ollama pull mistral
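
To confirm the prerequisites before launching the app, a small check like the following may help. It is a sketch, assuming Ollama is listening on its default local address (`http://localhost:11434`) and using its `/api/tags` endpoint, which lists locally available models. The helper names `has_model` and `ollama_has_model` are hypothetical and not part of this repository.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def has_model(tags_payload, name):
    """Return True if `name` appears among the models in an /api/tags response."""
    for model in tags_payload.get("models", []):
        # Model names carry a tag suffix, e.g. "mistral:latest".
        if model.get("name", "").split(":")[0] == name:
            return True
    return False


def ollama_has_model(name="mistral", base_url=OLLAMA_URL, timeout=5):
    """Ask the local Ollama server whether `name` has been pulled.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return has_model(json.load(resp), name)
```

If `ollama_has_model()` raises a connection error, Ollama is probably not running. If it returns `False`, run `ollama pull mistral` first.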

🚀 Usage

streamlit run ai_webscraper.py

💡 How it works:

  1. Enter a website URL
  2. The app scrapes the page, splits the text into chunks, embeds them, and stores the vectors in a FAISS index
  3. Ask a question; the app retrieves the most relevant chunks and passes them to the LLM
  4. The LLM answers based on that retrieved content
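
The retrieval and answering steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in `ai_webscraper.py`: a brute-force cosine-similarity search stands in for the FAISS index, the vectors are assumed to come from an embedding model, and the function names and prompt wording are hypothetical. The `ask_ollama` helper assumes Ollama's `/api/generate` endpoint on the default local port.

```python
import json
import math
import urllib.request


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def build_generate_payload(prompt, model="mistral"):
    """Request body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt, base_url="http://localhost:11434", timeout=120):
    """Send the prompt to a local Ollama server and return the answer text."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_generate_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

Restricting the prompt to the retrieved chunks keeps it short and gives the model only content from the scraped page. A FAISS index does the same nearest-neighbour lookup as `top_k`, but scales to far more vectors.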

🗂️ Optional Folder Structure (if you want to organize it)


├── ai_webscraper.py             # Main Streamlit script
├── requirements.txt
└── README.md

📝 License

MIT: free to use, modify, and distribute.
