DB-GPT: Revolutionizing Database Interactions with Private LLM Technology

Simplified Chinese | Discord | Documents | WeChat | Community

What is DB-GPT?

DB-GPT is an open-source framework for large models in the database field. Its purpose is to build infrastructure for the domain of large models, making it easier and more convenient to develop applications around databases. It simplifies the construction of database-based large model applications by developing technical capabilities such as:

SMMF (Service-oriented Multi-model Management Framework)
Text2SQL Fine-tuning
RAG (Retrieval Augmented Generation) framework and optimization
Data-Driven Agents framework collaboration
GBI (Generative Business Intelligence)

In the era of Data 3.0, enterprises and developers can build their own customized applications with less code, leveraging models and databases.

Demo

Run on an RTX 4090 GPU.

Chat Excel

Install

Usage Tutorial

Install
How to Use
How to Deploy LLM
How to Debug
FAQ

Features

Currently, we have released multiple key features, which are listed below to demonstrate our current capabilities:

Private Domain Q&A & Data Processing

The DB-GPT project offers a range of features to enhance knowledge base construction and enable efficient storage and retrieval of both structured and unstructured data. These include built-in support for uploading multiple file formats, the ability to integrate plug-ins for custom data extraction, and unified vector storage and retrieval capabilities for managing large volumes of information.
Multi-Data Source & GBI (Generative Business Intelligence)

The DB-GPT project enables seamless natural language interaction with various data sources, including Excel, databases, and data warehouses. It facilitates effortless querying and retrieval of information from these sources, allowing users to engage in intuitive conversations and obtain insights. Additionally, DB-GPT supports the generation of analysis reports, providing users with valuable summaries and interpretations of the data.
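As a rough illustration of the natural-language-to-SQL flow described above, the sketch below uses Python's built-in `sqlite3` module. It is not DB-GPT's implementation: `fake_llm` is a hypothetical stand-in for a real model call, and the `orders` table and the question are invented for the example.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> str:
    """Collect CREATE TABLE statements so a model can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(schema: str, question: str) -> str:
    """Combine the schema and the user's question into one prompt."""
    return f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a canned query."""
    return "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"


def ask(conn: sqlite3.Connection, question: str) -> list:
    """Turn a question into SQL (via the stand-in model) and execute it."""
    prompt = build_prompt(describe_schema(conn), question)
    sql = fake_llm(prompt)
    return conn.execute(sql).fetchall()


# Invented sample data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10), ("west", 5), ("east", 7)],
)
result = ask(conn, "What is the total order amount per region?")
print(result)
```

In a real deployment the canned query would come from a model, so generated SQL would typically be validated before it is executed.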
Multi-Agents & Plugins

DB-GPT supports custom plug-ins for performing tasks and natively supports the Auto-GPT plug-in model. Its agent protocol follows the Agent Protocol standard.
Automated Fine-tuning Text2SQL

An automated, lightweight fine-tuning framework built around large language models, Text2SQL datasets, and fine-tuning methods such as LoRA, QLoRA, and P-tuning, making Text2SQL fine-tuning as convenient as an assembly line. See DB-GPT-Hub.
SMMF (Service-oriented Multi-model Management Framework)

Broad model support covering dozens of large language models, both open-source models and API proxies, such as LLaMA/LLaMA2, Baichuan, ChatGLM, Wenxin, Tongyi, Zhipu, etc.
- Vicuna
- vicuna-13b-v1.5
- LLaMA2
- baichuan2-13b
- baichuan-7B
- chatglm-6b
- chatglm2-6b
- falcon-40b
- internlm-chat-7b
- Qwen-7B-Chat/Qwen-14B-Chat
- Support API Proxy LLMs
  - ChatGPT
  - Tongyi
  - Wenxin
  - ChatGLM
Privacy and Security

The privacy and security of data are ensured through various technologies, such as privatized large models and proxy desensitization.
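One way to picture proxy desensitization is to mask sensitive values before text leaves the local environment and restore them in the response. The sketch below is an assumption made for illustration, not DB-GPT's actual mechanism: the function names, the placeholder format, and the two regular expressions are all hypothetical.

```python
import re
from typing import Dict, Tuple

# Simplified patterns chosen for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d{7,}\b")


def desensitize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace emails and long digit runs with placeholder tokens."""
    mapping: Dict[str, str] = {}

    def _mask(kind: str):
        def repl(match):
            token = f"<{kind}_{len(mapping)}>"
            mapping[token] = match.group(0)
            return token
        return repl

    text = EMAIL.sub(_mask("EMAIL"), text)
    text = LONG_DIGITS.sub(_mask("NUMBER"), text)
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into text returned by the proxy model."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


original = "Contact alice@example.com or 13800001111"
masked, mapping = desensitize(original)
print(masked)
print(restore(masked, mapping))
```

Only the masked text would be sent to a remote proxy model; the mapping stays local so the reply can be restored afterwards.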
Supported Datasources

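| Datasource | Supported | Notes |
| --- | --- | --- |
| MySQL | Yes | |
| PostgreSQL | Yes | |
| Spark | Yes | |
| DuckDB | Yes | |
| SQLite | Yes | |
| MSSQL | Yes | |
| ClickHouse | Yes | |
| Oracle | No | TODO |
| Redis | No | TODO |
| MongoDB | No | TODO |
| HBase | No | TODO |
| Doris | No | TODO |
| DB2 | No | TODO |
| Couchbase | No | TODO |
| Elasticsearch | No | TODO |
| OceanBase | No | TODO |
| TiDB | No | TODO |
| StarRocks | No | TODO |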

Introduction

The architecture of the entire DB-GPT is shown.

The core capabilities mainly consist of the following parts:

Multi-Models: Supports multiple LLMs, such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, Qwen, and Vicuna, as well as proxy models such as ChatGPT, Baichuan, Tongyi, and Wenxin.
Knowledge-Based QA: Performs high-quality intelligent Q&A based on local documents such as PDF, Word, Excel, and other data.
Embedding: Unified data vector storage and indexing. Data is embedded as vectors and stored in vector databases, providing content similarity search.
Multi-Datasources: Connects different modules and data sources to achieve data flow and interaction.
Multi-Agents: Provides Agent and plugin mechanisms, allowing users to customize and enhance the system's behavior.
Privacy & Security: Data privacy is protected through privately deployed large models and proxy desensitization.
Text2SQL: We enhance Text-to-SQL performance by applying Supervised Fine-Tuning (SFT) on large language models.
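To make the Embedding capability above more concrete, here is a minimal sketch of vector storage with content similarity search. It is a toy under stated assumptions: the word-count `embed` function stands in for a learned embedding model, and `InMemoryVectorStore` is a hypothetical class standing in for a real vector database; neither is part of DB-GPT.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (an assumption for illustration)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self._items = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, top_k: int = 1) -> list:
        """Return the top_k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:top_k]]


store = InMemoryVectorStore()
store.add("DuckDB is an in-process analytical database")
store.add("LoRA is a parameter-efficient fine-tuning method")
store.add("Chroma stores embeddings for similarity search")
print(store.search("which database is analytical", top_k=1))
```

A production setup would replace the word counts with model-generated embeddings and the list scan with an indexed vector store.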

SubModule

DB-GPT-Hub: Improves Text-to-SQL performance by applying Supervised Fine-Tuning (SFT) on large language models.
DB-GPT-Plugins: Plugins for DB-GPT; can run Auto-GPT plugins directly.
DB-GPT-Web: Chat UI for DB-GPT.

Image

🌐 AutoDL Image

Language Switching

In the .env configuration file, modify the LANGUAGE parameter to switch to different languages. The default is English (Chinese: zh, English: en, other languages to be added later).
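The sketch below shows how a `LANGUAGE` setting with an English default could be read from `.env`-style text. Only the `LANGUAGE` key and the `zh`/`en` values come from the description above; the `read_language` helper and its parsing rules are assumptions for illustration and do not reflect how DB-GPT loads its configuration.

```python
def read_language(env_text: str, default: str = "en") -> str:
    """Return the LANGUAGE value from .env-style text, or the default."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "LANGUAGE":
            return value.strip().strip("'\"") or default
    return default


# Hypothetical .env contents switching the interface to Chinese.
sample_env = """
# language setting
LANGUAGE=zh
"""
print(read_language(sample_env))
```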

Contribution

Please run `black .` before submitting code. See the contributing guidelines for how to contribute.

RoadMap

KBQA RAG optimization

Multi Documents
- PDF
- Excel, CSV
- Word
- Text
- MarkDown
- Code
- Images
RAG
Graph Database
- Neo4j Graph
- Nebula Graph
Multi-Vector Database
- Chroma
- Milvus
- Weaviate
- PGVector
- Elasticsearch
- ClickHouse
- Faiss
Testing and Evaluation Capability Building
- Knowledge QA datasets
- Question collection [easy, medium, hard]
- Scoring mechanism
- Testing and evaluation using Excel + DB datasets

Multi Datasource Support

Multi Datasource Support
- MySQL
- PostgreSQL
- Spark
- DuckDB
- SQLite
- MSSQL
- ClickHouse
- Oracle
- Redis
- MongoDB
- HBase
- Doris
- DB2
- Couchbase
- Elasticsearch
- OceanBase
- TiDB
- StarRocks

Multi-Models And vLLM

Cluster Deployment
Fastchat Support
vLLM Support
Cloud-native environment and support for Ray environment
Service Registry (e.g., Nacos)
Compatibility with OpenAI's interfaces
Expansion and optimization of embedding models

Agents market and Plugins

Multi-agents framework
Custom plugin development
Plugin market
Integration with CoT
Enrich plugin sample library
Support for AutoGPT protocol
Integration of multi-agents and visualization capabilities, defining LLM+Vis new standards

Cost and Observability

Debugging
Observability
Cost & budgets

Text2SQL Finetune

Supported LLMs
- LLaMA
- LLaMA-2
- BLOOM
- BLOOMZ
- Falcon
- Baichuan
- Baichuan2
- InternLM
- Qwen
- XVERSE
- ChatGLM2
SFT Accuracy: As of October 10, 2023, a 13-billion-parameter open-source model fine-tuned with this project has surpassed GPT-4 in execution accuracy on the Spider evaluation dataset.
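For readers unfamiliar with the metric, execution accuracy compares the results of running a predicted query with the results of running the reference query. The sketch below is a simplified, order-insensitive version written against Python's built-in `sqlite3` module; it is not the official Spider evaluation script, and the `singer` table and queries are invented for the example.

```python
import sqlite3


def run(conn, sql):
    """Return the query's rows in sorted order, or None if the SQL fails."""
    try:
        return sorted(conn.execute(sql).fetchall(), key=repr)
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run(conn, gold)
        pred_rows = run(conn, predicted)
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Invented sample data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 25), ("Bob", 40)])

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    # Misspelled table name: fails to execute, counted as wrong.
    ("SELECT name FROM singers",
     "SELECT name FROM singer WHERE age > 30"),
]
accuracy = execution_accuracy(conn, pairs)
print(accuracy)
```

Because the comparison is on results rather than SQL text, two differently written queries count as a match when they return the same rows.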

More Information about Text2SQL finetune

Licence

The MIT License (MIT)

Contact Information

We are working on building a community. If you have any ideas for building the community, feel free to contact us.
