QASATIK is an LLM-based Q&A app for interrogating large volumes of documents, data files and web pages. It began as part of a freelance project of mine and became a standalone fork with the client's permission.
Built with Streamlit, QASATIK supports file uploads, online article scraping and querying through configurable LLM tooling (OpenAI, LangChain and LlamaIndex). It also provides interactive knowledge graph visualizations, analytics and charting utilities to help you explore and understand your data.
- Document & web Q&A: ask questions about your local documents, CSV/Excel spreadsheets, articles and web pages (data is ingested into an SQLite database, and your questions are translated into SQL queries)
- Direct Wikipedia Q&A: support for querying Wikipedia articles using the Wikipedia API
- Active storage: uploaded files and scraped web content are persistently stored for fast future queries
- Knowledge graph: automatically generate and display a knowledge graph based on your query results
- Interactive analytics: visualize data trends with advanced charting features, explore statistics and cost estimates through Plotly dashboards
- Hugging Face QA demo: alternative Q&A pipeline using HF transformers with ensemble support
- Extensible configuration: easily configure API keys, language models, cost estimation and more
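The SQLite-backed flow mentioned in the first bullet can be sketched roughly as follows. This is a minimal illustration, not QASATIK's actual code: the sample data, the table name `sales`, the helper `ingest_csv` and the hand-written SQL string (standing in for what a language model would generate from a question) are all assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """region,amount
north,120
south,80
north,50
"""


def ingest_csv(conn: sqlite3.Connection, table: str, text: str) -> int:
    """Load CSV text into a new SQLite table and return the row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
row_count = ingest_csv(conn, "sales", SAMPLE_CSV)

# In the app, a language model would produce this SQL from a question such as
# "What is the total amount per region?"; here it is written by hand.
generated_sql = (
    "SELECT region, SUM(CAST(amount AS INTEGER)) AS total "
    "FROM sales GROUP BY region ORDER BY region"
)
totals = conn.execute(generated_sql).fetchall()
print(totals)
```

Keeping the data in SQLite means the model only has to see the table schema and the question, not every row, which is one plausible reason for routing tabular questions through SQL.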
.
├── .streamlit/
│ ├── config.toml
│ └── secrets.toml.sample
├── .vscode/
│ └── settings.json
├── components/
│ └── graph_vis.py
├── data/
│ ├── sample.csv
│ └── sample.xlsx
├── db/
│ └── gptdb.sqlite3
├── docs/
│ └── documentation.md
├── images/
│ └── logo.png
├── static/
│ ├── knowledge_graph.gv
│ └── knowledge_graph.png
├── storage/
├── src/
│ ├── __init__.py
│ ├── main.py
│ ├── config.py
│ ├── state_manager.py
│ ├── common.py
│ ├── prompts.py
│ ├── analytics.py
│ ├── web_scraper.py
│ ├── data_loader.py
│ ├── wikipedia_qa.py
│ ├── huggingface_qa.py
│ ├── llm_client.py
│ ├── query_data.py
│ ├── query_docs.py
│ ├── generate_knowledge_graph.py
│ └── visualization.py
├── tests/
│ ├── test_app_state.py # deprecated
│ ├── test_common.py # deprecated
│ ├── test_visualizations.py # deprecated
│ ├── test_data_loader.py
│ ├── test_web_scraper.py
│ ├── test_llm_client.py
│ └── test_query_data.py
├── debug/
│ └── streamlit_debug.py # deprecated
├── .gitignore
├── README.md
├── requirements.txt
└── run_app.sh
streamlit
langchain-experimental
llama-index
llama-cpp-python
sentence_transformers
weaviate-client
openai
sqlalchemy
debugpy
openpyxl
PyPDF2
pypdf
docx2txt
PyCryptodome
graphviz
networkx
beautifulsoup4
colorama
newspaper3k
htmldate
datefinder
retry
pandas
pytest
wikipedia
plotly
transformers
torch
huggingface_hub
- Clone:
  git clone git@github.com:avrtt/QASATIK.git
  cd QASATIK
- Create and activate a virtual environment (optional):
  python -m venv venv
  source venv/bin/activate # Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Configure secrets: copy .streamlit/secrets.toml.sample to .streamlit/secrets.toml and fill in your OpenAI and Weaviate API keys along with the deployment flag.
- Run:
  chmod +x run_app.sh
  ./run_app.sh
You can also run the app using:
streamlit run src/main.py --server.port=4010
Upload files or enter URLs in the "Document Q&A" tab. Click "Index Documents" to build the index and then ask questions to get answers and visualizations.
Toggle between interactive graph views, static graph images, or raw JSON of the knowledge graph.
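To make the relationship between the raw JSON view and the static image concrete, here is a minimal sketch of one set of triples rendered both as a JSON structure and as Graphviz DOT source. The triples, the JSON layout and the helpers `to_json` and `to_dot` are illustrative assumptions, not the format QASATIK actually emits.

```python
import json

# Hypothetical (subject, relation, object) triples, as might be extracted
# from a query result.
triples = [
    ("QASATIK", "built_with", "Streamlit"),
    ("QASATIK", "stores_data_in", "SQLite"),
]


def to_json(triples):
    """Return a node/edge dictionary suitable for a raw JSON view."""
    nodes = sorted({name for s, _, o in triples for name in (s, o)})
    edges = [{"source": s, "relation": r, "target": o} for s, r, o in triples]
    return {"nodes": nodes, "edges": edges}


def to_dot(triples):
    """Return Graphviz DOT source describing the same graph."""
    lines = ["digraph knowledge_graph {"]
    for s, r, o in triples:
        lines.append(f'    "{s}" -> "{o}" [label="{r}"];')
    lines.append("}")
    return "\n".join(lines)


graph_json = to_json(triples)
dot_source = to_dot(triples)
print(json.dumps(graph_json, indent=2))
print(dot_source)
```

DOT text like this can be saved to a `.gv` file and rendered to an image with the Graphviz command-line tool, for example `dot -Tpng knowledge_graph.gv -o knowledge_graph.png`.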
Run the unit tests (from the project root):
pytest tests/
PRs and issues are welcome.
MIT