ERNIEBot Researcher

The web page shown in the figure below is used for topic research. Users enter keywords or natural-language sentences; the backend searches the provided literature for relevant content and then uses ERNIE to generate a research report.

Download link for the generated report: Report download

ERNIEBot Researcher is an autonomous agent designed to conduct comprehensive online research for a variety of tasks. It compiles detailed, factual, and unbiased research reports in Chinese, and can be deeply customized to work from specific resources, follow structured outlines, and incorporate lessons learned as needed. It draws on the recently proposed Plan-and-Solve approach and combines it with the strengths of the widely used RAG (retrieval-augmented generation) technique. Through multi-agent collaboration and efficient parallel processing, ERNIEBot Researcher addresses challenges such as speed bottlenecks, decision certainty, and reliability of results.

Why do we need ERNIEBot Researcher?

Reaching objective conclusions through manual research is time-consuming; finding the right resources and information can take weeks.
Current LLMs (Large Language Models) are trained on past, possibly outdated information, so they are prone to hallucination and can produce reports that are barely relevant to the research task.
Reports generated by LLMs generally do not include paragraph-level or sentence-level citations of literature sources, making the generated content difficult to trace and verify.

News and Updates

April 4, 2024: ERNIEBot Researcher was released. It supports ERNIEBot and ChatGPT as the LLMs for research tasks, and OpenAI Embedding and ERNIE-Embedding for embeddings.

Architecture

The main idea is to operate "planner" and "execution" agents. The planner generates questions for research, and the execution agents seek the most relevant information for each generated research question. Finally, the planner filters and aggregates all relevant information and creates a research report.

Agents utilize ERNIE-4.0 and ERNIE-LongText to complete research tasks. ERNIE-4.0 is primarily used for decision-making and planning, while ERNIE-LongText is mainly used for writing reports.
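
The planner/execution flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the project's code: plan_questions, retrieve, write_report, and the tiny in-memory KNOWLEDGE_BASE are hypothetical stand-ins for the LLM and retrieval calls, and keyword overlap is used in place of a real vector search.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory knowledge base: document name -> text.
KNOWLEDGE_BASE = {
    "agents.md": "An agent plans tasks and calls tools to act.",
    "rag.md": "RAG retrieves documents before the model generates text.",
}


def plan_questions(topic):
    """Planner step: stand-in for an LLM call that proposes research questions."""
    return [f"What is {topic}?", f"How is {topic} used with retrieval?"]


def retrieve(question):
    """Execution step: pick the document sharing the most words with the question."""
    q_words = set(question.lower().replace("?", "").split())

    def overlap(item):
        return len(q_words & set(item[1].lower().replace(".", "").split()))

    name, text = max(KNOWLEDGE_BASE.items(), key=overlap)
    return {"question": question, "source": name, "evidence": text}


def write_report(topic, findings):
    """Planner step: aggregate the findings, keeping a source tag per line."""
    lines = [f"Research report: {topic}"]
    for f in findings:
        lines.append(f"- {f['question']} {f['evidence']} [{f['source']}]")
    return "\n".join(lines)


def run_research(topic):
    questions = plan_questions(topic)
    # Execution agents handle the research questions in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        findings = list(pool.map(retrieve, questions))
    return write_report(topic, findings)


if __name__ == "__main__":
    print(run_research("RAG"))
```

Keeping the source name next to each piece of evidence is what makes sentence-level citation possible in the final report.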

Application Features

Create domain-specific agents based on research queries or tasks.
Generate a diverse set of research questions based on the content of the existing knowledge base, which collectively form an objective opinion on any given task.
For each research question, select information from the knowledge base that is relevant to the given question.
Filter and aggregate all information sources and generate the final research report.
Multiple report agents generate reports in parallel while maintaining a certain level of diversity.
Use chain-of-thought techniques to evaluate and rank the reports, reducing the effect of pseudo-randomness, and select the best one.
Revise and refine the report using a reflection mechanism.
Verify facts using retrieval-augmented techniques and chain of verification.
Enhance the overall readability of the report using a polishing mechanism, integrating more detailed descriptions.
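
The parallel generation, ranking, and reflection features listed above might fit together as in the sketch below. It is a simplified illustration under stated assumptions, not the project's implementation: draft_report, score, critique, and revise are hypothetical placeholders for LLM-backed agents, and the scoring rule (more citations is better) is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def draft_report(agent_id):
    """Stand-in for one report agent; a real agent would call an LLM."""
    return {
        "agent": agent_id,
        "text": f"draft {agent_id}",
        "citations": agent_id + 1,
        "revisions": 0,
    }


def score(report):
    """Stand-in for an LLM-based evaluation; here more citations is better."""
    return report["citations"]


def critique(report):
    """Reflection step: return a suggestion, or None if the report is accepted."""
    return None if "conclusion" in report["text"] else "add a conclusion"


def revise(report, suggestion):
    """Apply the suggestion; a real reviser would rewrite the text with an LLM."""
    return {
        **report,
        "text": report["text"] + " + conclusion",
        "revisions": report["revisions"] + 1,
    }


def best_report(num_agents=3, iterations=2):
    # Several report agents draft in parallel.
    with ThreadPoolExecutor(max_workers=num_agents) as pool:
        drafts = list(pool.map(draft_report, range(num_agents)))
    # Rank the drafts and keep the best one.
    best = max(drafts, key=score)
    # Reflect and revise for at most `iterations` rounds.
    for _ in range(iterations):
        suggestion = critique(best)
        if suggestion is None:
            break
        best = revise(best, suggestion)
    return best
```

Bounding the reflection loop by an iteration count keeps token usage predictable even if the critique step never fully accepts a report.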

Note

Generating a report takes more than 10 minutes. The more research agents are configured, the longer it takes and the more tokens it consumes.
The quality of the generated report depends on the quality of the input documents. The application suits sources such as web pages, journals, and corporate office documents; it is not suited to documents that contain little text or a large amount of irrelevant content.

Quick Start

Step 1: Download the project source code

git clone https://github.com/PaddlePaddle/ERNIE-SDK.git
cd ERNIE-SDK/ernie-agent/applications/erniebot_researcher

Step 2: Install dependencies

pip install -r requirements.txt

If the above command fails, create a fresh conda environment with Python 3.9 and install the dependencies again:

conda create -n researcher39 -y python=3.9 && conda activate researcher39
pip install -r requirements.txt

Install ernie-agent from source code:

cd ernie-agent
pip install -e .

Step 3: Download Chinese fonts

wget https://paddlenlp.bj.bcebos.com/pipelines/fonts/SimSun.ttf

Step 4: Build the document index

Two embedding types are supported: Azure OpenAI embedding (openai_embedding) and ernie_embedding. To use ernie_embedding, register and log in to the AI Studio Galaxy Community, obtain an Access Token from the Access Token page on AI Studio, and then set the following environment variables:

export EB_AGENT_ACCESS_TOKEN=<aistudio-access-token>
export AISTUDIO_ACCESS_TOKEN=<aistudio-access-token>
export EB_AGENT_LOGGING_LEVEL=INFO

To set up Azure OpenAI embedding, you need to configure the relevant OpenAI environment variables.

export AZURE_OPENAI_ENDPOINT="<your azure openai endpoint>"
export AZURE_OPENAI_API_KEY="<your azure openai api key>"
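
Before building the index, it can help to confirm that the variables for the chosen embedding type are set. The helper below is a hypothetical convenience, not part of the project. The variable names come from the export commands above; grouping them per embedding type in REQUIRED_ENV is an assumption.

```python
import os

# Assumed grouping of the variables exported above, keyed by embedding type.
REQUIRED_ENV = {
    "ernie_embedding": ["EB_AGENT_ACCESS_TOKEN", "AISTUDIO_ACCESS_TOKEN"],
    "openai_embedding": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
}


def missing_env(embedding_type, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    if embedding_type not in REQUIRED_ENV:
        raise ValueError(f"unknown embedding type: {embedding_type}")
    return [name for name in REQUIRED_ENV[embedding_type] if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env("ernie_embedding")
    if missing:
        print("Please export:", ", ".join(missing))
```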

We support file formats such as docx, pdf, and txt. Place these files in a single folder, then run the preprocessing command given at the end of this step to create an index. Subsequent reports will be generated based on these files.

For convenience in testing, we provide sample data:

wget https://paddlenlp.bj.bcebos.com/pipelines/erniebot_researcher_example.tar.gz
tar xvf erniebot_researcher_example.tar.gz

URL Data: If users have URLs corresponding to their files, they can provide a txt file containing these URLs. Each line of the txt file holds a URL followed by the corresponding file path, separated by a space, for example:

https://zhuanlan.zhihu.com/p/659457816 erniebot_researcher_example/Ai_Agent的起源.md

If no URL file is provided, each file's path is used as its URL by default.
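
As a rough illustration of the URL file format and the fallback rule, the sketch below parses such lines into a path-to-URL mapping. The function names parse_url_file and url_for are hypothetical and are not taken from tools/preprocessing.py.

```python
def parse_url_file(lines):
    """Map file path -> URL from lines of the form '<url> <file path>'."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so paths may contain spaces.
        # A line without a path raises ValueError here.
        url, path = line.split(maxsplit=1)
        mapping[path] = url
    return mapping


def url_for(path, mapping):
    """Fall back to the file path itself when no URL is listed."""
    return mapping.get(path, path)


example = [
    "https://zhuanlan.zhihu.com/p/659457816 erniebot_researcher_example/Ai_Agent的起源.md",
]
urls = parse_url_file(example)
```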

Abstract Data: Users can use the path_abstract parameter to provide the path to the abstracts for their files. The abstracts must be stored in a JSON file containing a list of dictionaries, each with three key-value pairs:

page_content : str, file abstract.
url : str, file URL link.
name : str, file name.

For example,

[{"page_content":"文件摘要","url":"https://zhuanlan.zhihu.com/p/659457816","name":"Ai_Agent的起源"},
...]

If you do not provide an abstract path, leave path_abstract at its default value. ERNIE-4.0 is then used to generate the abstracts automatically, and the generated abstracts are stored in abstract.json.
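
A hand-written abstract file can be checked against the three-key layout described above before indexing. The validator below is a minimal sketch under that assumption; validate_abstracts is a hypothetical helper, not a function provided by the project.

```python
import json

REQUIRED_KEYS = {"page_content", "url", "name"}


def validate_abstracts(json_text):
    """Parse the abstract JSON and check that each record has three string fields."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of dictionaries")
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a dictionary")
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        for key in REQUIRED_KEYS:
            if not isinstance(record[key], str):
                raise ValueError(f"record {i}: '{key}' must be a string")
    return records


sample = json.dumps(
    [{
        "page_content": "文件摘要",
        "url": "https://zhuanlan.zhihu.com/p/659457816",
        "name": "Ai_Agent的起源",
    }],
    ensure_ascii=False,  # keep Chinese text readable in the file
)
records = validate_abstracts(sample)
```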

Next, run:

python ./tools/preprocessing.py \
--index_name_full_text <the index name of your full text> \
--index_name_abstract <the index name of your abstract text> \
--path_full_text <the folder path of your full text> \
--url_path <the path of your url text> \
--path_abstract <the json path of your abstract text>

Step 5: Run

python demo.py --num_research_agent 1 \
    --index_name_full_text <your full text> \
    --index_name_abstract <your abstract text>

index_name_full_text: Path to the full-text knowledge base index
index_name_abstract: Path to the abstract knowledge base index
index_name_citation: Path to the citation index
num_research_agent: Number of agents generating the report
iterations: Number of reflection iterations
chatbot: Type of LLM, currently supports erniebot and chatgpt
report_type: Type of report, currently supports research_report
embedding_type: Type of embedding used, currently supports ernie_embedding and openai_embedding (azure)
save_path: Path to save the report
server_name: IP address of the web UI
server_port: Port number of the web UI
log_path: Path to save the logs
use_ui: Whether to use the web UI
use_reflection: Whether to use the reflection process
fact_checking: Whether to use the fact-checking process
framework: Underlying framework, currently supports langchain
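
When launching the demo from a script, the command line can be assembled programmatically. The sketch below is hypothetical: the three flags in the base command appear in the example above, while passing any other listed parameter as `--name value` is an assumption that should be checked against demo.py (boolean options in particular may be handled differently).

```python
import shlex


def build_demo_command(index_name_full_text, index_name_abstract,
                       num_research_agent=1, **extra):
    """Build the argument list for demo.py; extra options are passed as --name value."""
    cmd = [
        "python", "demo.py",
        "--num_research_agent", str(num_research_agent),
        "--index_name_full_text", index_name_full_text,
        "--index_name_abstract", index_name_abstract,
    ]
    for name, value in extra.items():
        # Assumption: each extra parameter maps to a flag of the same name.
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_demo_command("full_idx", "abs_idx", iterations=2)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the erniebot_researcher directory.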

Reference

[1] Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, Ee-Peng Lim: Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. ACL (1) 2023: 2609-2634

[2] Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, Zhaochun Ren: Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. EMNLP 2023: 14918-14937

❤️ Acknowledgements

We learned from the excellent framework design of Assaf Elovic's GPT Researcher, and we would like to thank the authors of GPT Researcher and their open-source community.
