GitHub - avrtt/QASATIK: LLM-based Q&A on preloaded docs, raw data, Wikipedia articles and scraped web pages with knowledge graphs, analytics, charts and Streamlit interface

Documentation

QASATIK is an LLM-based Q&A app for interrogating large volumes of documents, data files and web pages. It started as part of a freelance project of mine and was released as a standalone fork with the client's permission.

Built with Streamlit, QASATIK supports file uploads, online article scraping and querying through configurable LLM tooling (OpenAI, LangChain and LlamaIndex). It also provides interactive knowledge graph visualizations, analytics and charting utilities to help you explore and understand your data.

Features

Document & web Q&A: ask questions about your local documents, CSV/Excel spreadsheets, articles and web pages (data is ingested and stored in an SQLite database, and your questions are translated to SQL queries)
Direct Wikipedia Q&A: support for querying Wikipedia articles using the Wikipedia API
Active storage: uploaded files and scraped web content are persistently stored for fast future queries
Knowledge graph: automatically generate and display a knowledge graph based on your query results
Interactive analytics: visualize data trends with charting features and explore statistics and cost estimates through Plotly dashboards
Hugging Face QA demo: alternative Q&A pipeline using Hugging Face transformers with ensemble support
Extensible configuration: easily configure API keys, language models, cost estimation and more
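
The SQLite-backed flow described in the first feature can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not QASATIK's actual code: the `ingest_csv` helper, the `sales` table and the sample data are hypothetical, and the hard-coded SQL string stands in for what the language model would generate from a question.

```python
import csv
import io
import sqlite3


def ingest_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Load CSV text into a new SQLite table and return the number of rows stored.

    Hypothetical helper for illustration; every column is stored as TEXT.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    # Quote identifiers so column names containing spaces still work.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Made-up sample data, not the contents of data/sample.csv.
SAMPLE_CSV = "region,amount\nnorth,120\nsouth,80\nnorth,40\n"

conn = sqlite3.connect(":memory:")
stored = ingest_csv(conn, "sales", SAMPLE_CSV)

# In the app, an LLM would translate a question such as
# "What is the total amount per region?" into SQL; here it is hard-coded.
generated_sql = (
    "SELECT region, SUM(CAST(amount AS INTEGER)) AS total "
    "FROM sales GROUP BY region ORDER BY region"
)
totals = conn.execute(generated_sql).fetchall()
print(stored, totals)
```

Storing everything as TEXT keeps the loader short; a real ingester would likely infer column types so that generated SQL does not need explicit casts.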

Structure

.
├── .streamlit/
│   ├── config.toml
│   └── secrets.toml.sample
├── .vscode/
│   └── settings.json
├── components/
│   └── graph_vis.py
├── data/
│   ├── sample.csv
│   └── sample.xlsx
├── db/
│   └── gptdb.sqlite3
├── docs/
│   └── documentation.md
├── images/
│   └── logo.png
├── static/
│   ├── knowledge_graph.gv
│   └── knowledge_graph.png
├── storage/
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   ├── state_manager.py
│   ├── common.py
│   ├── prompts.py
│   ├── analytics.py
│   ├── web_scraper.py
│   ├── data_loader.py
│   ├── wikipedia_qa.py
│   ├── huggingface_qa.py
│   ├── llm_client.py
│   ├── query_data.py
│   ├── query_docs.py
│   ├── generate_knowledge_graph.py
│   └── visualization.py
├── tests/
│   ├── test_app_state.py # deprecated
│   ├── test_common.py # deprecated
│   ├── test_visualizations.py # deprecated
│   ├── test_data_loader.py
│   ├── test_web_scraper.py
│   ├── test_llm_client.py
│   └── test_query_data.py
├── debug/
│   └── streamlit_debug.py # deprecated
├── .gitignore
├── README.md
├── requirements.txt
└── run_app.sh

Dependencies

streamlit
langchain-experimental
llama-index
llama-cpp-python
sentence_transformers
weaviate-client
openai
sqlalchemy
debugpy
openpyxl
PyPDF2
pypdf
docx2txt
PyCryptodome
graphviz
networkx
beautifulsoup4
colorama
newspaper3k
htmldate
datefinder
retry
pandas
pytest
wikipedia
plotly
transformers
torch
huggingface_hub

Installation

Clone:

```
git clone git@github.com:avrtt/QASATIK.git
cd QASATIK
```

Create and activate a virtual environment (optional):

```
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
```

Install dependencies:

```
pip install -r requirements.txt
```

Configure secrets: copy .streamlit/secrets.toml.sample to .streamlit/secrets.toml and fill in your OpenAI and Weaviate API keys along with the deployment flag.
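
Before launching, it can help to check that the secrets were actually filled in. The sketch below assumes hypothetical key names (`OPENAI_API_KEY`, `WEAVIATE_API_KEY`, `DEPLOYED`); the real names are whatever secrets.toml.sample defines. The `missing_secrets` helper is not part of the project and accepts any mapping, so inside the app it could presumably be called with a dict-like object such as Streamlit's `st.secrets`.

```python
from typing import Any, Iterable, Mapping

# Hypothetical key names; use the ones listed in .streamlit/secrets.toml.sample.
REQUIRED_KEYS = ("OPENAI_API_KEY", "WEAVIATE_API_KEY", "DEPLOYED")


def missing_secrets(
    secrets: Mapping[str, Any], required: Iterable[str] = REQUIRED_KEYS
) -> list[str]:
    """Return the required keys that are absent or left blank."""
    missing = []
    for key in required:
        value = secrets.get(key)
        # False is a legitimate value for a flag, so only None and
        # blank strings count as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing


# Placeholder values for illustration only.
example = {"OPENAI_API_KEY": "sk-placeholder", "WEAVIATE_API_KEY": "", "DEPLOYED": False}
print(missing_secrets(example))
```

Reporting all missing keys at once, rather than failing on the first, saves a round trip when several values were left blank.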

Run:

```
chmod +x run_app.sh
./run_app.sh
```

You can also run the app using:

```
streamlit run src/main.py --server.port=4010
```

Usage

Upload files or enter URLs in the "Document Q&A" tab. Click "Index Documents" to build the index and then ask questions to get answers and visualizations.

Toggle between interactive graph views, static graph images, or raw JSON of the knowledge graph.
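
The static and raw views can be thought of as two serializations of the same edge list. The following sketch assumes the graph is a list of (subject, relation, object) triples, which is an assumption about the data shape rather than the format QASATIK actually uses; `to_dot` and `to_json` are hypothetical helpers. The DOT text is the kind of source that Graphviz renders into an image, as with static/knowledge_graph.gv and static/knowledge_graph.png.

```python
import json

Triple = tuple[str, str, str]

# Illustrative triples only; real ones would come from query results.
TRIPLES: list[Triple] = [
    ("QASATIK", "built_with", "Streamlit"),
    ("QASATIK", "stores_data_in", "SQLite"),
]


def _escape(text: str) -> str:
    """Escape backslashes and double quotes for a quoted DOT string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_dot(triples: list[Triple]) -> str:
    """Serialize triples as Graphviz DOT source with labeled edges."""
    lines = ["digraph knowledge_graph {"]
    for subject, relation, obj in triples:
        lines.append(
            f'  "{_escape(subject)}" -> "{_escape(obj)}" [label="{_escape(relation)}"];'
        )
    lines.append("}")
    return "\n".join(lines)


def to_json(triples: list[Triple]) -> str:
    """Serialize triples as a JSON document with node and edge lists."""
    nodes = sorted({name for s, _, o in triples for name in (s, o)})
    edges = [{"source": s, "target": o, "relation": r} for s, r, o in triples]
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)


print(to_dot(TRIPLES))
print(to_json(TRIPLES))
```

Keeping the triples as the single source of truth means each view is derived from the same data, so the three displays cannot drift apart.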

Testing

Run the unit tests (from the project root):

```
pytest tests/
```

Contributing

PRs and issues are welcome.

License

MIT
