@ashah-aanya ashah-aanya commented Jul 30, 2025

This PR introduces a configurable, framework‑agnostic re‑prompting pipeline for iteratively improving LLM responses using AIMon’s detectors. The pipeline evaluates model outputs for instruction adherence, groundedness, and toxicity, and automatically generates corrective prompts until predefined stopping conditions (max iterations, latency limit, or adherence achieved) are met. This enables developers to wrap any black‑box LLM in an automated feedback loop to improve response instruction adherence by an average of 22%!

Here is the design doc for this API: Integrating Re-prompting Pipeline into the Python SDK
Here is a blog post detailing the pipeline and experimental results: Re-Prompting: A Smarter Loop for Smarter Models

Key Features & Highlights:

  • Configurable API: RepromptingConfig lets users define max iterations, latency limits, telemetry inclusion, and more.
  • Telemetry: TelemetryLogger tracks per‑iteration scores, failed instructions, stop reasons, and cumulative latency.
  • Corrective Prompts: Reprompter generates targeted, feedback‑driven prompt templates based on AIMon detector results (instruction adherence, groundedness, and toxicity). The return value is a Template, so users can substitute values inside their llm_fn and retain flexibility over both the re‑prompt template and the LLM call.
  • LLM Agnostic: Users provide their own LLM function (e.g., OpenAI, Anthropic, Together AI); see the usage sketch after this list.
  • Retrieval‑Independent: Users can pass in any context string (they can independently retrieve context from any retrieval framework such as LangChain or LlamaIndex, convert it to a string, and pass it into the pipeline).
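To make the intended developer experience concrete, here is a minimal usage sketch. The module paths, keyword and field names below (e.g. max_iterations, latency_limit_ms, reprompting_config, and the exact signature the pipeline expects for llm_fn) are illustrative assumptions; the design doc and Colab notebook linked above are the authoritative references.

```python
from string import Template

# Module paths below are assumed for illustration; see the design doc for the real layout.
from aimon.reprompting_api.config import RepromptingConfig
from aimon.reprompting_api.runner import run_reprompting_pipeline

def llm_fn(prompt_template: Template, user_query: str, context: str) -> str:
    # The Reprompter returns a Template; substitute your values, then call any
    # LLM you like (OpenAI, Anthropic, Together AI, ...). A canned string stands
    # in for a real model call here.
    prompt = prompt_template.substitute(user_query=user_query, context=context)
    return f"[model response to] {prompt[:80]}"

config = RepromptingConfig(
    max_iterations=3,        # stop after three corrective rounds (name assumed)
    latency_limit_ms=5000,   # or once cumulative latency exceeds 5 s (name assumed)
    return_telemetry=True,   # include per-iteration telemetry in the result
)

result = run_reprompting_pipeline(
    llm_fn=llm_fn,
    user_query="What is the refund window for online orders?",
    context="Refunds are accepted within 30 days of purchase for online orders.",
    instructions="Answer in one sentence using only the provided context.",
    reprompting_config=config,
)

print(result["generated_text"])   # best response after re-prompting
print(result.get("telemetry"))    # per-iteration scores, stop reasons, latency
```

Because llm_fn is just a Python callable, swapping providers or prompt formats only requires changing that one function.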

Files Added:

  • runner.py: High‑level run_reprompting_pipeline() function.
  • pipeline.py: RepromptingPipeline orchestrates the iterative workflow, including retries and stop‑condition checks.
  • config.py: RepromptingConfig class for setting key pipeline parameters (iterations, latency limits, telemetry).
  • telemetry.py: TelemetryLogger for structured, per‑iteration logging.
  • reprompter.py: Reprompter for constructing corrective prompts with detailed feedback.
  • utils.py: Extracts failed instructions, counts violations, and computes residual error scores.

Tests:

test_reprompting_cases.py: tests different facets and configurations of the pipeline. Each case passes if generated_text exists in the returned dictionary, and print statements show the progression of re‑prompting across iterations. It covers the following cases with various contexts and query types (a minimal illustrative check is sketched after this section):

  • Pipeline runs with latency limits (100 ms & 5000 ms).
  • Iteration limit enforced.
  • Works with empty context/instructions.
  • return_telemetry=False properly excludes telemetry from the final dictionary.

test_reprompting_success.py: a sample successful run with llm_fn calling Mistral 7B Instruct, with system_prompt, context, instructions, and user_query provided and telemetry returned.

test_reprompting_failures.py: tests failure modes. Each test case should trigger an error:

  • Empty query.
  • Invalid AIMon API key.
  • llm_fn failures (raises an exception, returns the wrong type, or returns None).

Implementation guide Colab notebook: https://colab.research.google.com/drive/1s6lup_4_v2YpE2vlPz-fkP6_BLloKouj?usp=sharing
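
For orientation only, here is a stripped-down check in the spirit of test_reprompting_cases.py; the stub llm_fn, the RepromptingConfig field return_telemetry, and the call signature are assumptions consistent with the usage sketch above, not the shipped tests.

```python
def stub_llm_fn(prompt_template, **kwargs):
    # Trivial stand-in for a real model: always returns a short grounded answer.
    return "Refunds are accepted within 30 days of purchase."

def test_returns_generated_text_without_telemetry():
    config = RepromptingConfig(return_telemetry=False)
    result = run_reprompting_pipeline(
        llm_fn=stub_llm_fn,
        user_query="What is the refund window?",
        context="Refunds are accepted within 30 days of purchase.",
        instructions="Answer in one sentence using only the context.",
        reprompting_config=config,
    )
    assert "generated_text" in result   # every run must return a response
    assert "telemetry" not in result    # excluded when return_telemetry=False
```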

Key questions:

  • I couldn't find an exponential‑backoff‑based retry in the SDK, so I implemented it independently (a sketch appears after this list). Does that look okay?
  • Ongoing discussions about the llm_fn implementation and UX.
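
For reference, the independent retry behaves roughly like the decorator sketched below. Only the exponential backoff and the re-raising of the last encountered exception are taken from this PR; the names, attempt count, and delays are illustrative.

```python
import functools
import time

def retry_with_backoff(max_attempts=3, base_delay=0.5):
    """Retry a callable with exponentially increasing delays; re-raise the last
    encountered exception once every attempt has failed."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    if attempt < max_attempts - 1:
                        time.sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...
            raise last_exc
        return wrapper
    return decorator
```

As described in the commits below, both the user-provided LLM call and the AIMon Detect call go through this kind of safe-call path.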

@ashah-aanya ashah-aanya requested a review from pjoshi30 July 30, 2025 05:15
ashah-aanya and others added 3 commits July 30, 2025 10:50
…k). Updating residual error score calculation to have no penalty for adhered instructions (follow probability >= 0.5), as per Alex's recommendation. Also updated the description of a test case.
@ashah-aanya ashah-aanya marked this pull request as ready for review July 30, 2025 20:31
@pjoshi30 pjoshi30 requested a review from devvratbhardwaj July 31, 2025 18:27
ashah-aanya and others added 4 commits July 31, 2025 11:55
Deleting Colab notebook demo as it is already linked elsewhere
…or to safely call the user-provided LLM function and the AIMon Detect function. Re-raises last encountered exception upon failure.

Also handled the division-by-zero error in get_penalized_average by returning -1 if the list of follow probabilities is empty, and added this to the documentation.
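
A sketch of the behavior described in these commits: the "no penalty at follow probability >= 0.5" rule and the -1 return for an empty list come from this PR, while the (1 - p) penalty applied to non-adhered instructions is an illustrative assumption.

```python
def get_penalized_average(follow_probabilities):
    # Empty list: no instructions to score, so signal with -1 (documented behavior).
    if not follow_probabilities:
        return -1
    # Adhered instructions (follow probability >= 0.5) add no penalty; the
    # (1 - p) penalty for non-adhered instructions is an illustrative choice.
    penalties = [0.0 if p >= 0.5 else (1.0 - p) for p in follow_probabilities]
    return sum(penalties) / len(penalties)
```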
@pjoshi30 pjoshi30 left a comment

Looks good to me! Looking forward to shipping this.

@ashah-aanya ashah-aanya merged commit 226279a into main Aug 3, 2025
1 check passed
@ashah-aanya ashah-aanya deleted the reprompting-api branch August 3, 2025 23:37