This project evaluates the performance of Retrieval-Augmented Generation (RAG) systems using the Arize Phoenix platform. The goal was to monitor, analyze, and optimize the effectiveness of a question-answering (QA) system by comparing text segmentation techniques, metadata extraction methods, answer generation engines, and reranking models.
The system is built on three key components (wired together in the sketch after this list):
- LlamaIndex – a framework for implementing RAG systems,
- RAGAS – a library for generating test sets and evaluation metrics,
- Arize Phoenix – a real-time monitoring platform enabling the analysis of data processing pipelines.
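The wiring between these components is minimal. The sketch below shows the monitoring setup, assuming the PDF corpus sits in a local `./data` directory and that the `llama-index` and `arize-phoenix` integration packages are installed; the query string is purely illustrative.

```python
# Minimal monitoring setup: Phoenix traces every LlamaIndex call.
import phoenix as px
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    set_global_handler,
)

# Start the local Phoenix server and route LlamaIndex traces to it.
session = px.launch_app()
set_global_handler("arize_phoenix")

# Build a plain RAG pipeline over the PDF corpus (./data is an assumed path);
# every retrieval and LLM call from here on shows up as a trace.
documents = SimpleDirectoryReader("./data").load_data()
query_engine = VectorStoreIndex.from_documents(documents).as_query_engine()

print(query_engine.query("What degree programmes does the academy offer?"))
print(f"Phoenix UI: {session.url}")
```

With the handler set, the retrieval latency and LLM spans of each experiment can be inspected live in the Phoenix UI.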
The input data consisted of 12 PDF files (205 pages) containing information about the Polish-Japanese Academy of Information Technology. To protect privacy, sensitive data was removed.
- Test set: 20 queries (simple, multi-context, and reasoning-based) were generated with the GPT-4o-mini model; generation and scoring are sketched after this list.
- Evaluation: the RAGAS metrics context precision, faithfulness, and answer relevance were used.
- Monitoring: Arize Phoenix was employed to track query processing and response time.
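As a reference for how such a test set can be produced and scored, here is a sketch using the ragas 0.1-style API (class and function names have moved between RAGAS releases). The question-type distribution weights and the single scored record are illustrative assumptions, not values taken from the project.

```python
# Sketch: generate a 20-question test set and score a QA run with RAGAS.
from datasets import Dataset
from llama_index.core import SimpleDirectoryReader
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness
from ragas.testset.evolutions import multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator

documents = SimpleDirectoryReader("./data").load_data()

# 20 questions over the PDF corpus, mixing the three question types;
# the 50/25/25 split is an assumed distribution.
generator = TestsetGenerator.with_openai(generator_llm="gpt-4o-mini")
testset = generator.generate_with_llamaindex_docs(
    documents,
    test_size=20,
    distributions={simple: 0.5, multi_context: 0.25, reasoning: 0.25},
)

# After running each test question through the QA system, collect the
# answers and retrieved contexts; this single record is a placeholder.
records = {
    "question": ["What degree programmes does the academy offer?"],
    "answer": ["Computer Science, among other programmes."],
    "contexts": [["Excerpt retrieved from the admissions PDF..."]],
    "ground_truth": ["Computer Science and related programmes."],
}
result = evaluate(
    Dataset.from_dict(records),
    metrics=[context_precision, faithfulness, answer_relevancy],
)
print(result)
```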
- Best segmentation method: Token Splitter – achieved the highest context precision (95%) with good overall performance.
- Hierarchical Node Parser – had the shortest response time but lower context precision (76%).
- Enhancing metadata improved faithfulness (from 97% to 98%) and answer relevance (from 90% to 92%) but slightly reduced context precision (from 95% to 93%).
- SubQuestionQueryEngine outperformed HyDE in speed and response relevance, particularly for reasoning-based queries.
- bge-reranker-large raised context precision to 100% and reduced response time for reasoning tasks.
- Arize Phoenix enabled an in-depth analysis of RAG processes and optimization of the QA system.
- Best configuration: Token Splitter, metadata extraction, SubQuestionQueryEngine, and bge-reranker-large (assembled in the sketch below).
- The results indicate that real-time monitoring is crucial for identifying weaknesses in RAG systems and improving their efficiency.
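To make the winning setup concrete, the sketch below assembles it in LlamaIndex. The chunk size, overlap, top-k values, extractor choices, and tool name are illustrative assumptions; the report above does not specify them.

```python
# Sketch of the best configuration: Token Splitter + metadata extraction
# + SubQuestionQueryEngine + bge-reranker-large.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.extractors import QuestionsAnsweredExtractor, TitleExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

documents = SimpleDirectoryReader("./data").load_data()

# Token-based chunking, then metadata enrichment of every node
# (chunk_size/chunk_overlap are assumed values).
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=512, chunk_overlap=50),
        TitleExtractor(),
        QuestionsAnsweredExtractor(questions=3),
    ]
)
nodes = pipeline.run(documents=documents)

# Retrieve broadly, then let bge-reranker-large keep the best contexts.
reranker = SentenceTransformerRerank(model="BAAI/bge-reranker-large", top_n=3)
base_engine = VectorStoreIndex(nodes).as_query_engine(
    similarity_top_k=10, node_postprocessors=[reranker]
)

# Decompose multi-part questions into sub-questions over the same index.
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[
        QueryEngineTool(
            query_engine=base_engine,
            metadata=ToolMetadata(
                name="pjait_docs",
                description="PDF corpus about the Polish-Japanese Academy",
            ),
        )
    ]
)
print(query_engine.query("Compare the admission requirements across faculties."))
```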