Commit 786807f

Edouard-Legoupil committed May 23, 2024
1 parent 252523b commit 786807f
Showing 2 changed files with 2 additions and 2 deletions.
index.html (2 changes: 1 addition & 1 deletion)
@@ -156,7 +156,7 @@ <h2 class="anchored" data-anchor-id="executive-summary">Executive Summary</h2>
</div>
</div>
<p>Retrieval-augmented generation (RAG) is an AI Question-Answering framework that <a href="https://arxiv.org/pdf/2005.11401">surfaced in 2020</a> and that combines the capabilities of Large Language Models (LLMs) with information retrieval systems covering a specific domain of expertise (hereafter “evaluation reports”). This paper presents the challenges and opportunities associated with this approach in the context of evaluation. It then suggests a potential solution and way forward.</p>
- <p>First, we explain how to create an initial <a href="https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_recursivecharactertext_bert.docx" target="_blank">two-page evaluation brief</a> using an orchestration of functions and models from <a href="https://huggingface.co/docs/hub/index" target="_blank">Hugging Face Hub</a>. Rather than relying on ad-hoc user interactions through a <em>black-box point &amp; click</em> chat interface, a relevant alternative is to use a data science approach with documented and <strong>reproducible scripts</strong> that can directly output a Word document. The same approach could also be applied to other textual analysis needs, for instance: extracting causal chains from the transcriptions of Focus Group Discussions, performing a Quality Assurance review on key document, generating potential theories of change from needs assessment reports, or assessing whether programmatic evidence was sufficiently used when developing a Strategic Plan for an Operation.</p>
+ <p>First, we explain how to create an initial <a href="https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_recursivecharactertext_bert.docx" target="_blank">two-page evaluation brief</a> using an orchestration of functions and models from <a href="https://huggingface.co/docs/hub/index" target="_blank">Hugging Face Hub</a>. Rather than relying on ad-hoc user interactions through a <em>black-box point &amp; click</em> chat interface, a relevant alternative is to use a data science approach with documented and <strong>reproducible scripts</strong> that can directly output a Word document. The same approach could also be applied to other textual analysis needs, for instance: extracting causal chains from the transcriptions of Focus Group Discussions, performing a Quality Assurance review on key documents, generating potential theories of change from needs assessment reports, or assessing whether programmatic evidence was sufficiently used when developing a Strategic Plan for an Operation.</p>
<p>Second, we review the techniques that can be used to <strong>evaluate the performance</strong> of the summarisation scripts, both to optimize them and to minimize the risk of AI hallucinations. We generate alternative briefs (<a href="https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_mmr_recursivecharactertext_bge.docx" target="_blank">#2</a>, <a href="https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_parent_recursivecharactertext_bge.docx" target="_blank">#3</a>, <a href="https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_ensemble_recursivecharactertext_bge.docx" target="_blank">#4</a>) and then create a specific <a href="https://github.com/Edouard-Legoupil/rag_extraction/tree/main/dataset" target="_blank">test dataset</a> to explore the different metrics that can be used to evaluate the information retrieval process.</p>
<p>Last, we discuss how such an approach can inform decisions and strategies for an <strong>efficient AI deployment</strong>: while improving the RAG pipeline is the first important step, creating a training dataset with a human in the loop makes it possible to “ground-truth” and “fine-tune” an existing model. This not only further increases its performance but also ensures its reliability, both for evidence retrieval and, at a later stage, for learning across systems and contexts.</p>
<p>A short presentation is also <a href="https://edouard-legoupil.github.io/rag_extraction/prez/prez.html" target="_blank">available here</a>.</p>
index.qmd (2 changes: 1 addition & 1 deletion)
@@ -30,7 +30,7 @@ The key deliverable from an evaluation is usually a long report (*often over 60

Retrieval-augmented generation (RAG) is an AI Question-Answering framework that [surfaced in 2020](https://arxiv.org/pdf/2005.11401) and that combines the capabilities of Large Language Models (LLMs) with information retrieval systems covering a specific domain of expertise (hereafter "evaluation reports"). This paper presents the challenges and opportunities associated with this approach in the context of evaluation. It then suggests a potential solution and way forward.

- First, we explain how to create an initial [two-page evaluation brief](https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_recursivecharactertext_bert.docx){target="_blank"} using an orchestration of functions and models from [Hugging Face Hub](https://huggingface.co/docs/hub/index){target="_blank"}. Rather than relying on ad-hoc user interactions through a *black-box point & click* chat interface, a relevant alternative is to use a data science approach with documented and **reproducible scripts** that can directly output a Word document. The same approach could also be applied to other textual analysis needs, for instance: extracting causal chains from the transcriptions of Focus Group Discussions, performing a Quality Assurance review on key document, generating potential theories of change from needs assessment reports, or assessing whether programmatic evidence was sufficiently used when developing a Strategic Plan for an Operation.
+ First, we explain how to create an initial [two-page evaluation brief](https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_recursivecharactertext_bert.docx){target="_blank"} using an orchestration of functions and models from [Hugging Face Hub](https://huggingface.co/docs/hub/index){target="_blank"}. Rather than relying on ad-hoc user interactions through a *black-box point & click* chat interface, a relevant alternative is to use a data science approach with documented and **reproducible scripts** that can directly output a Word document. The same approach could also be applied to other textual analysis needs, for instance: extracting causal chains from the transcriptions of Focus Group Discussions, performing a Quality Assurance review on key documents, generating potential theories of change from needs assessment reports, or assessing whether programmatic evidence was sufficiently used when developing a Strategic Plan for an Operation.

Second, we review the techniques that can be used to **evaluate the performance** of the summarisation scripts, both to optimize them and to minimize the risk of AI hallucinations. We generate alternative briefs ([#2](https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_mmr_recursivecharactertext_bge.docx){target="_blank"}, [#3](https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_parent_recursivecharactertext_bge.docx){target="_blank"}, [#4](https://github.com/Edouard-Legoupil/rag_extraction/raw/main/generated/Evaluation_Brief_response_ensemble_recursivecharactertext_bge.docx){target="_blank"}) and then create a specific [test dataset](https://github.com/Edouard-Legoupil/rag_extraction/tree/main/dataset){target="_blank"} to explore the different metrics that can be used to evaluate the information retrieval process.
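
As an editorial illustration of the pipeline the executive summary above describes, the sketch below chains document chunking, BGE embeddings, MMR retrieval and a Hugging Face hosted model in the LangChain style suggested by the generated file names (recursivecharactertext, mmr, bge). The model identifiers, file path, chunk sizes, prompt and the hosted-inference call are assumptions made for illustration, not the repository's actual configuration.

```python
# Minimal RAG sketch (illustrative only, not the repository's actual code):
# chunk an evaluation report, embed the chunks, retrieve the most relevant
# ones for a question, and ask a hosted model to answer from that context.
from huggingface_hub import InferenceClient
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Load and chunk the report (path and chunk sizes are placeholders).
pages = PyPDFLoader("reports/evaluation_report.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)

# 2. Embed the chunks with a BGE model and index them in a local vector store.
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")
index = FAISS.from_documents(chunks, embeddings)

# 3. Retrieve diverse context for one of the brief's questions (MMR search).
retriever = index.as_retriever(search_type="mmr", search_kwargs={"k": 4})
question = "What are the main recommendations of the evaluation?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))

# 4. Ask a hosted instruction model to answer only from the retrieved context
#    (model name is an example; the call requires a Hugging Face API token).
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
answer = InferenceClient("mistralai/Mistral-7B-Instruct-v0.2").text_generation(prompt, max_new_tokens=512)
```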

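The summary's claim that reproducible scripts "can directly output a Word document" can be pictured with a few lines of python-docx; the section titles and layout below are invented placeholders and do not reproduce the structure of the briefs linked above.

```python
# Illustrative only: write answers generated by the RAG pipeline into a .docx brief.
from docx import Document

# In the real pipeline each entry would come from a question answered over the
# report; the texts here are placeholders.
sections = {
    "Background": "…generated text…",
    "Key findings": "…generated text…",
    "Recommendations": "…generated text…",
}

brief = Document()
brief.add_heading("Evaluation Brief", level=0)  # level 0 uses the Title style
for title, text in sections.items():
    brief.add_heading(title, level=1)
    brief.add_paragraph(text)
brief.save("Evaluation_Brief.docx")
```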

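To make the retrieval-evaluation step concrete, here is a plain-Python sketch of two common metrics, hit rate and mean reciprocal rank, scored against a hand-built test set of question and ground-truth passage pairs; the test item shown is a placeholder, and a framework such as Ragas would add generation-side metrics (faithfulness, answer relevancy) on top of this kind of check.

```python
# Illustrative only: score a retriever against a small ground-truth test set.
# Each item pairs a question with the passage the retriever should surface.
def hit_rate_and_mrr(retriever, test_set, k=4):
    hits, reciprocal_ranks = 0, []
    for item in test_set:
        docs = retriever.invoke(item["question"])[:k]
        # Rank of the first retrieved chunk containing the expected passage, if any.
        rank = next((i + 1 for i, d in enumerate(docs)
                     if item["ground_truth"] in d.page_content), None)
        if rank is None:
            reciprocal_ranks.append(0.0)
        else:
            hits += 1
            reciprocal_ranks.append(1.0 / rank)
    return hits / len(test_set), sum(reciprocal_ranks) / len(test_set)

# Placeholder item; a real test set would be curated by the evaluation team.
test_set = [{"question": "Which delivery modality was most cost-effective?",
             "ground_truth": "cash-based interventions"}]
# hit_rate, mrr = hit_rate_and_mrr(retriever, test_set)
```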