LLM-based Q&A on preloaded docs, raw data, Wikipedia articles and scraped web pages with knowledge graphs, analytics, charts and Streamlit interface

avrtt/QASATIK

Documentation


QASATIK is an LLM-based Q&A app for interrogating large volumes of documents, data files and web pages. It started as part of a freelance project of mine and became a standalone fork with the client's permission.

Built with Streamlit, QASATIK supports file uploads, online article scraping and querying through configurable LLM integrations (OpenAI, LangChain and LlamaIndex). It also provides interactive knowledge graph visualizations, analytics and charting utilities to help you explore and understand your data.

Features

  • Document & web Q&A: ask questions about your local documents, CSV/Excel spreadsheets, articles and web pages (data is ingested and stored in an SQLite database, and your questions are translated into SQL queries)
  • Direct Wikipedia Q&A: support for querying Wikipedia articles using the Wikipedia API
  • Active storage: uploaded files and scraped web content are persistently stored for fast future queries
  • Knowledge graph: automatically generate and display a knowledge graph based on your query results
  • Interactive analytics: visualize data trends with advanced charting features, explore statistics and cost estimates through Plotly dashboards
  • Hugging Face QA demo: alternative Q&A pipeline using Hugging Face Transformers with ensemble support
  • Extensible configuration: easily configure API keys, language models, cost estimation and more

Structure

.
├── .streamlit/
│   ├── config.toml
│   └── secrets.toml.sample
├── .vscode/
│   └── settings.json
├── components/
│   └── graph_vis.py
├── data/
│   ├── sample.csv
│   └── sample.xlsx
├── db/
│   └── gptdb.sqlite3
├── docs/
│   └── documentation.md
├── images/
│   └── logo.png
├── static/
│   ├── knowledge_graph.gv
│   └── knowledge_graph.png
├── storage/
├── src/
│   ├── __init__.py
│   ├── main.py
│   ├── config.py
│   ├── state_manager.py
│   ├── common.py
│   ├── prompts.py
│   ├── analytics.py
│   ├── web_scraper.py
│   ├── data_loader.py
│   ├── wikipedia_qa.py
│   ├── huggingface_qa.py
│   ├── llm_client.py
│   ├── query_data.py
│   ├── query_docs.py
│   ├── generate_knowledge_graph.py
│   └── visualization.py
├── tests/
│   ├── test_app_state.py # deprecated
│   ├── test_common.py # deprecated
│   ├── test_visualizations.py # deprecated
│   ├── test_data_loader.py
│   ├── test_web_scraper.py
│   ├── test_llm_client.py
│   └── test_query_data.py
├── debug/
│   └── streamlit_debug.py # deprecated
├── .gitignore
├── README.md
├── requirements.txt
└── run_app.sh

Dependencies

streamlit
langchain-experimental
llama-index
llama-cpp-python
sentence_transformers
weaviate-client
openai
sqlalchemy
debugpy
openpyxl
PyPDF2
pypdf
docx2txt
PyCryptodome
graphviz
networkx
beautifulsoup4
colorama
newspaper3k
htmldate
datefinder
retry
pandas
pytest
wikipedia
plotly
transformers
torch
huggingface_hub

Installation

  1. Clone:

    git clone git@github.com:avrtt/QASATIK.git
    cd QASATIK
  2. Create and activate a virtual environment (optional):

    python -m venv venv
    source venv/bin/activate # Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure secrets: copy .streamlit/secrets.toml.sample to .streamlit/secrets.toml and fill in your OpenAI and Weaviate API keys along with the deployment flag.

  5. Run:

    chmod +x run_app.sh
    ./run_app.sh

    You can also run the app using:

    streamlit run src/main.py --server.port=4010

Usage

Upload files or enter URLs in the "Document Q&A" tab. Click "Index Documents" to build the index and then ask questions to get answers and visualizations.
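"Index Documents" follows the usual retrieval pattern: split documents into chunks, index them, then fetch the chunks most relevant to a question and pass them to the LLM as context. The stdlib sketch below illustrates that pattern only; it swaps the embedding-based indexing that LlamaIndex/LangChain would normally provide for simple word-count vectors with cosine similarity, and every function name in it is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 50) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str], size: int = 50) -> list[tuple[str, str, Counter]]:
    """Index every chunk of every document as (document name, chunk text, vector)."""
    return [(name, piece, vectorize(piece)) for name, text in docs.items() for piece in chunk(text, size)]


def retrieve(index: list[tuple[str, str, Counter]], question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(name, piece) for name, piece, _ in ranked[:k]]


if __name__ == "__main__":
    index = build_index({
        "intro.txt": "QASATIK answers questions about uploaded documents.",
        "graph.txt": "The knowledge graph links entities found in query results.",
    })
    # The retrieved chunk(s) would be inserted into the LLM prompt as context.
    print(retrieve(index, "What does the knowledge graph link?", k=1))
```

Embedding models capture synonyms and paraphrases that word counts miss, which is why a real index uses them; the chunk, score, rank flow stays the same.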

Toggle between interactive graph views, static graph images, or raw JSON of the knowledge graph.

Testing

Run the unit tests (from the project root):

pytest tests/

Contributing

PRs and issues are welcome.

License

MIT