This is a lightweight, research‑grade agent framework for building multi‑stage prompt pipelines with GPT‑5.
With this framework, we produced a 1st Place weekly result in October, besting all other agents, along with several 2nd Place results in prior weeks. The first-place finish represented a 23% performance improvement over GPT-5 Pro and a remarkable 47% improvement over Grok-4 Search.
The framework lets you define custom pipelines as sequences of prompt stages, each with custom tools. While this pipeline was developed for the FutureX competition, it is flexible enough to support pipelines for other tasks.
FutureX victories validated the core of my prompt engineering work.
The core of my global agentic prompt protocol is:
- A context generation stage that runs before reasoning to boost reasoning capability. This stage uses minimal reasoning, both for cost and because of my observation that minimal/no-reasoning modes seem to have better "understanding". In my experience there is a bias/variance-style trade-off between reasoning and understanding in reasoning models: stronger reasoning tends to come at the cost of reduced understanding.
- A deep reasoning stage with specific instructions to iteratively surface uncertainties and assess self-confidence as part of an iterative loop. My observation is that many seemingly disparate prompt engineering techniques can be understood as "uncertainty resolvers": uncertainty can be resolved via generation, tools, code, and so on, so uncertainty resolution can be thought of as a universal technique for boosting LLM performance and a central technique of intelligence.
- Optionally, a formatting stage to ensure the answer is in the required format.
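As a rough illustration, the three stages map onto per-stage configuration along these lines (the class, stage names, prompt filenames, and settings below are illustrative assumptions, not the exact declarations in `agent_pipeline_declarations.py`):

```python
# Illustrative sketch only -- stage names, prompt files, and settings are
# examples, not the exact declarations used in the competition pipeline.
from dataclasses import dataclass, field

@dataclass
class StageConfig:
    name: str
    prompt_file: str          # Markdown prompt template for the stage
    reasoning_effort: str     # "minimal" | "medium" | "high"
    draft_count: int = 1      # >1 enables multi-draft generation for this stage
    tools: list = field(default_factory=list)

# Stage 1: context generation -- minimal reasoning, multiple drafts.
context_stage = StageConfig("context_generator", "stage1_context.md",
                            reasoning_effort="minimal", draft_count=2)

# Stage 2: deep reasoning -- high effort, tools available, iterative
# uncertainty surfacing happens inside the prompt instructions.
reasoning_stage = StageConfig("deep_researcher", "stage2_reasoning.md",
                              reasoning_effort="high",
                              tools=["web_search", "polymarket", "deribit", "odds"])

# Stage 3 (optional): formatting -- low effort, no tools.
formatting_stage = StageConfig("formatter", "stage3_format.md",
                               reasoning_effort="minimal")

pipeline = [context_stage, reasoning_stage, formatting_stage]
```

The actual pipelines are declared in `agent_pipeline_declarations.py` and driven by the Markdown prompts under `prompt_pipelines/`.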
Additionally, I enhanced my protocol with access to custom tools:
- Polymarket for odds
- Deribit for market data
- Odds for sports
- Built-in search
Initial testing showed SOTA performance in reasoning and in predicting the kinds of events where custom tools helped. However, the leading competitor, MiroFlow, scored higher on wide-search capability. After conducting an AI-assisted review of their project, I extended my agentic pipeline with the Serper API for wide-search capability; my observation is that models may be search-constrained.
In addition, I added multiple drafts to the first stage only, configured to 2 drafts in our winning run. This simulates the multi-draft reasoning present in models like GPT-5 Pro and Grok-4 Heavy; the leading competitor, MiroFlow, uses multi-draft consensus. I considered adopting a full multi-draft consensus approach, but it would have increased API costs; running multiple drafts only in the context generation stage minimizes the additional cost.
We achieved multiple strong 2nd place results and a 1st place finish in the last week we tested, which was the first week with all capabilities working. My protocol is best in class at reasoning, but the wide-search capability of our agents is still rudimentary, without extensive "sub-agent" style planning.
I hypothesize that combining my Global Agentic Pipeline with MiroFlow's improved wide-search capabilities would produce a new overall SOTA result: my method has already proven itself best in class at reasoning, and improved sub-agent style planning for wide search should close the remaining gap.
Unfortunately, costs weren't published, so it is not possible to assess results on a performance-to-cost basis. Running the pipeline costs roughly $0.30-0.45 per query in OpenAI charges, plus Serper costs (trivial).
Of note, during technical testing of the pipeline, I discovered that GPT-5 is easily confused by structured data containing ranges when the ranges are out of order, for example: A 90-100, B 80-90, C 120-150. I added specific instructions to correct this. This is an area OpenAI may want to research for improvement.
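The exact corrective instruction isn't reproduced here; as a simple illustrative mitigation (not the one used in the pipeline), such ranges can also be normalized before they reach the prompt:

```python
# Illustrative mitigation (not the instruction used in the pipeline):
# sort labeled ranges by their lower bound before including them in a prompt,
# so the model never sees them out of order.
ranges = {"A": (90, 100), "B": (80, 90), "C": (120, 150)}

ordered = sorted(ranges.items(), key=lambda item: item[1][0])
prompt_lines = [f"{label}: {lo}-{hi}" for label, (lo, hi) in ordered]
print("\n".join(prompt_lines))
# B: 80-90
# A: 90-100
# C: 120-150
```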
Build and run multiple custom pipelines:
- Custom per‑stage model settings: choose model, reasoning effort, temperature, token limits, tool_choice, parallel tool calls, and timeouts.
- Pipelines authored with simple Markdown prompts per stage; define/swap pipelines in a few lines of Python or via a JSON POST to the API.
- Multi‑draft Stage 1 for stronger context plans (configurable via `STAGE1_DRAFT_COUNT`).
- Pluggable tools: built‑in web search, Serper/serp.dev stubs, Polymarket, Deribit, Odds API; plus add your own Python function tools with JSON schema.
- Global Reasoning pipelines offering enhanced reasoning
- OpenAI‑compatible endpoints and SSE streaming (with optional heartbeats) for easy client integration.
- Structured artifacts and logs in `logs/` (`stage_responses.jsonl`, `pipeline_trace.md`), toggled by `FILE_LOGGING_ENABLED`.
- Works offline for demos/tests (stubbed search) and deploys cleanly to Cloud Run.
- Agent runner for running FutureX pipelines against cloud or local deployments. The agent runner can process many queries in parallel; up to 7-10 have been tested, but downstream APIs may throttle.
This agent framework was developed rapidly, as a means of proving out my prompt engineering work aimed at boosting the capabilities of GPT-5 and other LLMs in the real-world FutureX event competition. Built for speed of prototyping, it is not yet production-hardened; review and harden it before use in sensitive environments.
The codebase was AI-generated to rapidly test my ideas in the competition, without extensive human review. While the pipelines have worked for my purposes in the competition and were tested, the code was not developed for production use. It may nonetheless prove invaluable for independent researchers or as a local research agent pipeline.
It can be deployed to Google Cloud for running workloads, but a thorough code and security review is recommended before any production deployment.
Likewise, the technical documentation below was mostly AI-generated; it is assumed to be generally correct but has not been thoroughly human-reviewed.
- Clone the repository and enter the workspace:

```bash
git clone <repository-url>
cd global_agent_framework
```

- Create and activate a fresh Python 3.10+ virtual environment.
- Install requirements:

```bash
pip install -r requirements.txt
```

- Create a `.env` file in the repo root and populate API keys (see `docs/ENVIRONMENT.md`).
- Run a smoke test:

```bash
python test_pipeline_simple.py
```

- Launch the FastAPI server when ready:

```bash
python main.py
```
If the smoke test succeeds, hit `http://localhost:8000/health` or invoke `/execute` with the `futuerex` pipeline to verify an end-to-end run.
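For example, a quick local check might look like the following sketch (the `/execute` request body is defined by `api.py` and isn't reproduced here; the OpenAI-compatible route uses the standard chat-completions shape, and the model name below is a placeholder):

```python
# Quick local smoke test against the running FastAPI server.
# Assumes `python main.py` is serving on port 8000; the bearer token is only
# needed if SERVER_API_KEY is configured.
import os
import requests

BASE = "http://localhost:8000"
headers = {}
if os.getenv("SERVER_API_KEY"):
    headers["Authorization"] = f"Bearer {os.environ['SERVER_API_KEY']}"

print(requests.get(f"{BASE}/health", timeout=30).json())

# OpenAI-compatible route (standard chat-completions request shape).
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    headers=headers,
    json={
        "model": "gpt-5",  # model name is illustrative
        "messages": [{"role": "user", "content": "Will X happen by Friday?"}],
    },
    timeout=600,  # pipeline runs can take several minutes
)
print(resp.json())
```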
- `api.py` — FastAPI surface exposing pipeline execution endpoints and OpenAI-compatible routes.
- `config.py` — Centralized configuration loader (env vars, logging switches, validation helpers).
- `agent.py`, `agent_pipeline_declarations.py`, `pipeline.py` — Core orchestration layer for agents, stage sequencing, and the execution loop.
- `prompt_pipelines/` — Prompt templates for each stage; FutureX-tuned variants live under `prompt_pipelines/futurerx/`.
- `tools/` — Optional integrations (Serper, serp.dev stubs, Deribit, Polymarket, Odds APIs) automatically registered when available.
- `scripts/` — Convenience commands for smoke testing, import validation, and pipeline demos.
- `docs/` — Showcase, deployment, and environment references.
- `tests/` — Automated checks covering pipelines, API surfaces, and tool adapters.
- `logs/` (generated) — Pipeline trace artifacts written when file logging is enabled; ignored by git.
- `queries/` — Intentionally empty folder reserved for private or competition-specific user prompts; keep this out of version control when sharing the framework.
- `agent_framework/` — Minimal shim package offering stable imports (e.g., `from agent_framework.pipeline import Pipeline`).
- `apps/agent_runner/` — Runner app for batch processing FutureX datasets.
To keep the root tidy, tests and runner code are placed under tests/ and apps/agent_runner/. Existing top-level files remain for backward compatibility.
- Store any sensitive prompts or internal research questions inside the `queries/` directory so they remain separate from distributable prompt pipelines.
- `run_query_pipeline.py` and API clients can reference files from `queries/` by filename via a positional argument. Example CLI usage is shown below.
- We do not ship sample queries; add your own Markdown or JSON templates locally.
- Unfortunately, processing the pipeline via the API is slow; it can take 4 to 8 minutes per query.
Use the convenience script to run a local query file through a pipeline:
```bash
# Research pipeline (generalized research/synthesis)
python scripts/run_query_pipeline.py research queries/my_prompt.md

# FutureX pipeline (full production flow)
python scripts/run_query_pipeline.py prediction queries/my_prompt.md

# Stubbed FutureX pipeline (no external API calls)
python scripts/run_query_pipeline.py prediction_stub queries/my_prompt.md

# Optional: write to a specific output file
python scripts/run_query_pipeline.py research queries/my_prompt.md --output out.md
```

Notes:
- The first positional arg is the pipeline key; the second is the path or filename under `queries/`.
- Outputs are written to `responses/` with per-stage artifacts; traces go to `logs/` when `FILE_LOGGING_ENABLED=true`.
- To use live web search, set `SERPER_API_KEY` and ensure `SERPER_USE_STUB=false` (or unset). Back-compat: `SERP_DEV_USE_STUB` is also honored.
- Authentication failures (e.g., invalid `OPENAI_API_KEY`) cause the CLI to exit non-zero with a clear error message.
Monorepo note:
- New projects in this repo can import via the shim, e.g.:

```python
from agent_framework.agent_pipeline_declarations import create_prediction_pipeline
```

- This avoids changing existing files while giving a stable package-style path.
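A minimal usage sketch, assuming the factory takes no required arguments (the execution entry point isn't documented here, so it is left as a commented placeholder):

```python
# Build the FutureX prediction pipeline via the stable shim import path.
from agent_framework.agent_pipeline_declarations import create_prediction_pipeline

pipeline = create_prediction_pipeline()  # assumption: no required arguments

# The run/execute method name isn't documented in this README; check
# pipeline.py for the actual entry point. Hypothetical call:
# result = pipeline.run("Will BTC close above $100k by Friday?")
```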
- Install and authenticate the Google Cloud CLI: `gcloud auth login`.
- Set your project and preferred region (example `us-east1`):

```bash
gcloud config set project <PROJECT_ID>
gcloud config set run/region us-east1
```

- Deploy from the repo root with the runtime service account assigned:

```bash
gcloud run deploy agent-endpoint --source . --allow-unauthenticated --timeout 1200 --service-account <SERVICE_ACCOUNT_EMAIL>
```

- Alternatively, keep the service private by omitting `--allow-unauthenticated` and granting `roles/run.invoker` to specific identities via:

```bash
gcloud run services add-iam-policy-binding agent-endpoint ^ --member=serviceAccount:<INVOKER_SERVICE_ACCOUNT> ^ --role=roles/run.invoker
```
- Configure secrets in Cloud Run using `OPENAI_API_KEY`, plus optional tooling keys (`SERPER_API_KEY`, `SERP_DEV_API_KEY`, `ODDS_API_KEY`, etc.).
- Supply a `SERVER_API_KEY` when exposing the FastAPI `/execute` endpoint and require clients to send it via `Authorization: Bearer`.
- The deployed service exposes a URL; hit `/health` for a basic check and use authenticated requests for `/execute`.
- We previously attempted an Azure deployment, but outbound firewall timeouts blocked reliable execution.
Our global protocol runs as a three-stage loop, tuning reasoning effort and tool usage per phase.
- Multi-draft `context_generator` agent rephrases the task, enumerates unknowns, codifies first principles, and plans tool queries.
- Outputs: task restatement, [U1]–[U7] unknowns, entailments, prioritized search plan, and tool-specific brainstorming.
- Objective: ensure downstream agents pursue high-signal evidence only.
- `evidence_builder` primes the pipeline with targeted serp.dev or serper.dev packs (stubbed offline when required).
- `deep_researcher` runs iterative research using web search, financial markets, and probabilistic tools until confidence ≥95% or bounded by uncertainty ceilings.
- Confidence loop: after each tool call, the agent recalculates belief intervals, logs residual unknowns, and decides whether additional evidence is warranted.
- Benefits: measurable confidence, explicit gaps, and graceful handling of sparse or conflicting data.
- `formatter` agent converts research artifacts into the requested deliverable (executive summary, boxed forecasts, risk notes).
- Enforces tone, schema (JSON or Markdown), and compliance requirements demanded by the FutureX judging rubric.
- Produces reproducible outputs: methodology, evidence base, residual risks, and confidence rationale.
The deep research stage augments GPT reasoning with specialized tools:
- serper_search / serp_dev_search_stub — Bundled evidence builder supporting up to five curated queries per call, returning scraped snippets with metadata; stubbed mode enables offline competitions or audit trails.
- polymarket_gamma_get_odds — Real-time Polymarket odds (via Gamma API) for crowd-sourced forecast calibration; filters by query, market ID, or event ID.
- deribit_weekly_snapshot / ladder / inputs — Crypto options intelligence (BTC/ETH/SOL) delivering IV surfaces and strike ladders aligned with target resolution dates.
- odds_find — Sportsbook consensus probabilities with de-vigged pricing, used when FutureX tasks touched sports or entertainment outcomes.
- Built-in web_search — Recency-focused web instrumentation for authoritative sources; seamlessly blended with file search when internal corpora were available.
- Custom Function Hooks — Any Python callable can register via `ToolDefinition`, enabling domain-specific calculators, regressors, or policy databases.
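A sketch of what such a registration could look like; the field names passed to `ToolDefinition` are assumptions, so check the `tools/` package for the real signature:

```python
# Sketch of a custom Python function tool with a JSON-schema argument spec.
# The exact ToolDefinition constructor fields are assumptions -- consult the
# tools/ package for the real signature before wiring this in.
from datetime import date

def days_until(date_iso: str) -> int:
    """Toy domain-specific calculator: days from today until an ISO date."""
    return (date.fromisoformat(date_iso) - date.today()).days

days_until_schema = {
    "type": "object",
    "properties": {"date_iso": {"type": "string", "description": "YYYY-MM-DD"}},
    "required": ["date_iso"],
}

# Hypothetical registration -- field names (name/description/parameters/function)
# mirror common tool-definition conventions, not a confirmed API:
# ToolDefinition(name="days_until",
#                description="Days from today until an ISO date.",
#                parameters=days_until_schema,
#                function=days_until)
```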
Tool selection is governed by the Stage 1 TOOL_PLAN, ensuring every call maps to a tracked unknown and yields structured artifacts for the confidence loop.
- Stage 1 logs capture the original query, unknown enumeration, and planned tool actions. These details are written to `logs/stage_responses.jsonl` and `logs/pipeline_trace.md` when logging is enabled.
- Before sharing logs externally, redact personally identifiable information, secrets, and sensitive market positions mentioned in queries or tool payloads.
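A minimal redaction pass might look like this sketch (the patterns are examples only; extend them for your own secrets and PII before sharing anything):

```python
# Illustrative redaction pass over stage logs before sharing them externally.
import re
from pathlib import Path

SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_\-]{16,}"),          # OpenAI-style API keys
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),   # bearer tokens
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

src = Path("logs/stage_responses.jsonl")
dst = Path("logs/stage_responses.redacted.jsonl")
with src.open() as fin, dst.open("w") as fout:
    for line in fin:
        fout.write(redact(line))
```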
- Clone the repository and enter the workspace:

```bash
git clone <repository-url>
cd global_agent_framework
```

- Install Python 3.10+ dependencies:

```bash
pip install -r requirements.txt
```

- Copy `.env.example` and provide secrets:
  - `OPENAI_API_KEY` (required)
  - Optional: `SERPER_API_KEY`, `SERP_DEV_API_KEY`, `ODDS_API_KEY`
- Optional tooling flags:
  - `SERP_DEV_USE_STUB=true` for offline serp.dev evidence
  - `DEBUG=true` and `LOG_VERBOSITY=info` to increase observability
- Duplicate `.env.example` to `.env`.
- Populate required secrets: `OPENAI_API_KEY`.
- Populate optional secrets as needed for extended tooling:
  - `SERPER_API_KEY` or `SERP_DEV_API_KEY` for search providers
  - `ODDS_API_KEY` for sportsbook integrations
  - `POLYMARKET_API_KEY`, `DERIBIT_CLIENT_ID`, `DERIBIT_CLIENT_SECRET` if you enable those tools
- Set `SERVER_API_KEY` when exposing the FastAPI server to external clients.
- Avoid committing populated secrets files; rely on `.gitignore` defaults or your cloud platform's secret manager.
- OpenAI API (Required)
  - Purpose: Powers GPT-5 reasoning agents across all pipeline stages
  - Obtain key: https://platform.openai.com/api-keys
  - Cost: ~$0.30-0.45 per query for the full FutureX pipeline
- Serper.dev (Optional - Web Search)
  - Purpose: Multi-query web search with scraped content for evidence gathering
  - Obtain key: https://serper.dev (free tier: 2,500 searches/month)
  - Set: `SERPER_API_KEY` in `.env`
- The Odds API (Optional - Sports Betting Data)
  - Purpose: Sportsbook consensus probabilities with de-vigged pricing
  - Obtain key: https://the-odds-api.com (free tier: 500 requests/month)
  - Set: `ODDS_API_KEY` in `.env`
- Polymarket Gamma API (Optional - Prediction Markets)
  - Purpose: Real-time prediction market odds for crowd-sourced forecasts
  - No key required (public API)
- Deribit API (Optional - Crypto Options)
  - Purpose: BTC/ETH/SOL options data and implied volatility surfaces
  - No key required for public endpoints
Set `SERP_DEV_USE_STUB=true` to use offline search stubs instead of live APIs during development.
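Putting it together, a populated `.env` might look roughly like this (placeholder values; `docs/ENVIRONMENT.md` and `.env.example` remain the authoritative references):

```
# Required
OPENAI_API_KEY=sk-your-key-here

# Optional tool providers
SERPER_API_KEY=your-serper-key
ODDS_API_KEY=your-odds-api-key
SERP_DEV_USE_STUB=true

# Server and logging
SERVER_API_KEY=choose-a-strong-shared-secret
FILE_LOGGING_ENABLED=true
DEBUG=true
LOG_VERBOSITY=info
```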
- When `FILE_LOGGING_ENABLED=true` (default), the framework emits structured artifacts:
  - `logs/stage_responses.jsonl` — JSON entries for each stage request/response, including user queries and tool payloads.
  - `logs/pipeline_trace.md` — Human-readable Markdown trace summarizing stage progress.
  - `pipeline_test_results.log` and `test_results.json` — Aggregated outcomes from integration tests.
- Disable local artifact logging by setting `FILE_LOGGING_ENABLED=false` in `.env` or your deployment settings. This is recommended for sensitive workloads or regulated data.
- Hosted environments (FastAPI/Uvicorn, Cloud Run) may capture full request bodies in access logs. Always use HTTPS clients and avoid embedding secrets in prompt text.
- Protect `/execute` and related endpoints with `SERVER_API_KEY` (Bearer auth) and, for Cloud Run, restrict to approved identities (`roles/run.invoker`).
- Review and purge log directories routinely, or forward them to a managed storage solution with strict IAM controls.
- `futuerex` (production) — CLI key: `prediction` — Full FutureX pipeline: multi-draft context → evidence builder → deep researcher → formatter; uses live tools when keys are present.
- `futuerex_stub` (offline) — CLI key: `prediction_stub` — Mirrors the production flow but relies solely on baked serp.dev stubs and deterministic responses; ideal for demos, scoring sandboxes, or CI.
- `research_pipeline` (generalized) — CLI key: `research` — Context planning and synthesis pipeline for non-forecast research deliverables.
- API Server Mode — Launch `python main.py` to expose FastAPI endpoints (`/pipelines`, `/execute`, `/health`).
- Direct Test Mode — From the repo root, run `python -m pytest -q` or `python tests/test_pipeline_simple.py`.
- Monorepo Runner — Use `apps/agent_runner/` or create additional apps under `apps/` that import from `agent_framework.*` without modifying core files.
- Local — Start the FastAPI server (`python main.py`), then invoke `/v1/chat/completions` or `/execute` with `SERVER_API_KEY` set. Use PowerShell or shell environment variables per `DEPLOYMENT.md`.
- Cloud Run (Google Cloud) — Authenticate via `gcloud`, then deploy with `gcloud run deploy agent-endpoint --source . --region us-east1 --allow-unauthenticated --timeout 1200`. Ensure environment variables and timeouts cover research stages.
- Monitoring & Logs — Check `pipeline_test_results.log`, `test_results.json`, and HTTP logs for stage durations (Stage 1 ≈ 8s, Stage 2 ≈ 32s, Stage 3 ≈ 5s at FutureX settings).
With this protocol, the Agent Framework continues to deliver competition-grade predictions, combining structured planning, tool-augmented reasoning, and rigorous finalization for high-stakes forecasting.
A minimal, standalone runner is included under apps/agent_runner to batch‑process the public FutureX dataset end‑to‑end. It supports three modes:
- stub-local: no HTTP, deterministic boxed answers for testing
- stub-http: call a tiny local stub server that mimics OpenAI
- real: call a real OpenAI‑compatible endpoint (this repo’s FastAPI `/v1/chat/completions`)
Quick start:
```powershell
cd apps/agent_runner
python -m venv .venv; .\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

# Preview the dataset
python cli.py verify-data --head 3

# Run a short local stub batch
python cli.py run --mode stub-local --limit 3

# Option A: Run against the local API from this repo
# In another terminal (repo root):
# python main.py
# Then from agent_runner:
python cli.py run --mode stub-http --limit 3  # or real with endpoint+key

# Real endpoint (example)
$env:AGENT_RUNNER_ENDPOINT = "http://127.0.0.1:8000/v1/chat/completions"
$env:AGENT_RUNNER_API_KEY = "<your_server_api_key_if_enabled>"
python cli.py run --mode real --limit 3
```

Outputs are written incrementally to `apps/agent_runner/out/futurex_<commitsha>/`:
- `predictions.jsonl` — one `{id, prediction}` JSON object per row (prediction is the TeX `\boxed{...}` content)
- `manifest.json` — run metadata and file pointers
- `logs/` — per‑request HTTP logs (stub‑http/real modes)
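A small sketch for consuming the output (assuming the directory layout described above):

```python
# Read the runner's incremental output; each line holds an {"id", "prediction"}
# record where prediction is the TeX \boxed{...} content.
import json
from pathlib import Path

out_root = Path("apps/agent_runner/out")
run_dir = sorted(out_root.glob("futurex_*"))[-1]  # pick a run directory (sorted by name)

with (run_dir / "predictions.jsonl").open() as f:
    for line in f:
        row = json.loads(line)
        print(row["id"], "->", row["prediction"])
```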
Configuration can be passed via flags or environment variables; copy `apps/agent_runner/demo.env` to `.env` and set:

```
AGENT_RUNNER_ENDPOINT=
AGENT_RUNNER_API_KEY=
```
Note: I encourage and welcome additional testing, research, and adoption of my prompt engineering techniques. I received no funding, OpenAI credits, or other support in the development and testing of these protocols.
As such, I simply ask for attribution: when using the Global Agentic Protocol prompts, derivatives of them, or this pipeline in research, papers, demos, or production derivatives, please provide clear attribution to Curtis White and the Global Agent Framework FutureX project alongside any published results.
MIT License
Copyright (c) 2025 Curtis White
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. NOT RESPONSIBLE FOR ANY LOSSES.
When using the Global Agentic Protocol prompts or this pipeline in research, demos, or production derivatives, please provide clear attribution to Curtis White and the Global Agent Framework FutureX project alongside any published results.
