Conversation

@qanagattandyr (Contributor)

Summary

Fixes #100: pdd Setup should use llm_invoke and give access to all models

This PR refactors the pdd setup command to use llm_invoke for API key testing and removes the hardcoded limitation that supported only three providers (OpenAI, Google, Anthropic). The setup tool now dynamically discovers and supports all LLM providers configured in llm_model.csv.

Changes Made

  • pdd/setup_tool.py - refactored
  • tests/test_setup_tool.py - added test suite

Core Improvements

  • Replaced hardcoded API testing with llm_invoke

    • Removed test_openai_key(), test_google_key(), and test_anthropic_key()
    • Added test_api_key_with_llm_invoke(), which works with any provider (see the first sketch after this list)
    • Uses the same LLM invocation mechanism as the rest of PDD
  • Dynamic provider discovery

    • get_csv_variable_names() now reads all unique API key names from the CSV (not just the 3 hardcoded ones)
    • discover_api_keys() checks the environment for all providers dynamically (see the second sketch after this list)
    • No longer limited to OpenAI, Google, and Anthropic
  • Removed model filtering

    • save_configuration() no longer filters models by hardcoded prefixes (gpt-*, gemini/*, anthropic/*)
    • All models from llm_model.csv are now included if their API key is available
    • Users get access to Fireworks, Groq, Vertex AI, and all other configured providers
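
A minimal sketch of the unified key test, assuming llm_invoke accepts the prompt/input_json/strength parameters shown in the review excerpt further down and returns a dict with a 'result' key on success; the real function's signature may differ:

    from pdd.llm_invoke import llm_invoke  # assumed import path

    def test_api_key_with_llm_invoke() -> bool:
        """Validate the configured provider key with one minimal real call."""
        try:
            response = llm_invoke(
                prompt="Say hello",
                input_json={},
                strength=0.0,  # least capable (typically cheapest) model
            )
        except Exception:
            # Invalid or missing keys surface as exceptions from the client.
            return False
        # A 'result' key in the response dict means the call succeeded.
        return response is not None and 'result' in response

Because the check goes through the same invocation path as the rest of PDD, no per-provider HTTP code (and no requests dependency) is needed.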
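
And a sketch of the discovery side, assuming llm_model.csv has api_key and model columns (both column names are assumptions); models_with_available_keys() is a hypothetical helper standing in for the logic that lives in save_configuration():

    import csv
    import os

    def get_csv_variable_names(csv_path: str) -> list[str]:
        """Collect every unique API-key environment variable named in the CSV."""
        with open(csv_path, newline="") as f:
            names = {(row.get("api_key") or "").strip() for row in csv.DictReader(f)}
        return sorted(n for n in names if n)

    def discover_api_keys(csv_path: str) -> dict[str, bool]:
        """Map each key name from the CSV to whether it is set in the environment."""
        return {name: bool(os.environ.get(name))
                for name in get_csv_variable_names(csv_path)}

    def models_with_available_keys(csv_path: str) -> list[str]:
        """Keep every model whose key is present; no prefix-based filtering."""
        available = discover_api_keys(csv_path)
        with open(csv_path, newline="") as f:
            return [row["model"] for row in csv.DictReader(f)
                    if available.get((row.get("api_key") or "").strip())]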

Additional Changes

  • Updated help text to mention all supported providers (Fireworks, Groq, Vertex AI, etc.)
  • Removed requests dependency (no longer needed since we use llm_invoke)
  • Added comprehensive test suite (test_setup_tool.py) with 14 tests covering all changes

Testing

All tests pass (14/14):

  • Tests verify llm_invoke is used for API key validation (see the sketch after this list)
  • Tests confirm all providers from CSV are discovered (not just 3)
  • Tests ensure no hardcoded model filtering
  • Regression tests prevent the original issue from recurring
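
A representative test might look like this sketch; it uses pytest's monkeypatch to assert the key check routes through llm_invoke without any network traffic (the test name and fake response shape are assumptions):

    import pdd.setup_tool as setup_tool

    def test_api_key_validation_uses_llm_invoke(monkeypatch):
        calls = []

        def fake_llm_invoke(**kwargs):
            calls.append(kwargs)
            return {"result": "hello"}  # success shape the checker looks for

        # Patch the symbol setup_tool actually calls.
        monkeypatch.setattr(setup_tool, "llm_invoke", fake_llm_invoke)
        assert setup_tool.test_api_key_with_llm_invoke() is True
        assert calls, "expected the key check to go through llm_invoke"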

@qanagattandyr qanagattandyr marked this pull request as ready for review November 10, 2025 00:37
@gltanaka gltanaka requested a review from Copilot November 10, 2025 17:42
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors the pdd setup command to support all LLM providers dynamically rather than being limited to three hardcoded providers (OpenAI, Google, Anthropic). The changes enable the setup tool to discover all providers from the CSV configuration and use the existing llm_invoke function for API key validation, replacing custom HTTP-based testing functions.

Key changes:

  • Replaced provider-specific API testing functions with a unified test_api_key_with_llm_invoke() function
  • Made provider discovery dynamic by reading all unique API keys from llm_model.csv instead of hardcoding three providers
  • Removed model filtering logic that excluded non-OpenAI/Google/Anthropic models from the user's configuration

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • pdd/setup_tool.py: Refactored to use llm_invoke for API testing, removed hardcoded provider lists, and enabled dynamic provider discovery from CSV
  • tests/test_setup_tool.py: Added comprehensive test suite with 14 tests covering dynamic provider discovery, llm_invoke usage, and regression tests

    response = llm_invoke(
        prompt="Say hello",
        input_json={},
        strength=0.0, # Use cheapest model

Copilot AI Nov 10, 2025

The comment 'Use cheapest model' is misleading. The strength parameter represents model capability/power, not cost. A value of 0.0 selects the weakest/least capable model, which typically costs less but may not always be the absolute cheapest option. Consider clarifying: strength=0.0, # Use least capable model (typically cheapest)

Suggested change:

    - strength=0.0, # Use cheapest model
    + strength=0.0, # Use least capable model (typically cheapest)

    - return response.status_code != 401 and response.status_code != 403

    + # If we get here without exception and have a result, the key works
    + return response is not None and 'result' in response

Copilot AI Nov 10, 2025

The condition response is not None is redundant. If response were None, the 'result' in response check would raise a TypeError, but this is already caught by the exception handler. Simplify to return 'result' in response.

Suggested change:

    - return response is not None and 'result' in response
    + return 'result' in response
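
For context, the redundancy holds because the membership test sits inside the same try/except that wraps the llm_invoke call; the structure below is assumed from the comment, not copied from the PR. If response were None, 'result' in response would raise TypeError, which the handler converts to False anyway:

    def test_api_key_with_llm_invoke() -> bool:
        try:
            response = llm_invoke(prompt="Say hello", input_json={}, strength=0.0)
            return 'result' in response  # TypeError on None is caught below
        except Exception:
            return False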

@gltanaka (Contributor) commented Nov 10, 2025

make test: pass
regression tests: not run yet

@gltanaka gltanaka added the enhancement New feature or request label Nov 11, 2025
@gltanaka (Contributor)

@qanagattandyr can you please resolve the copilot issues?

@qanagattandyr (Contributor, Author)

will do tonight

@gltanaka (Contributor)

Test Results - FAIL

Pull Request: #123

Overall Summary:

  • Passed: 950
  • Failed: 2
  • Skipped: 3
  • Duration: 4667.2s

Regression Tests - FAIL

Results:

  • Passed: 2
  • Failed: 1
  • Duration: 306.4s

Errors:

  • Command failed with exit code 2.
  • Preprocess failed: Web tag not processed
  • Test 3 failed (see logs)

Unit Tests (pytest) - FAIL

Results:

  • Passed: 938
  • Failed: 1
  • Skipped: 3
  • Duration: 1520.0s

Sync Regression Tests - PASS

Results:

  • Passed: 10
  • Failed: 0
  • Duration: 2840.9s

Errors:

  • Command failed with exit code 1.

Test run completed at: 2025-11-25T01:22:16.195957

Regression Tests

[INFO] Running: /home/runner/work/pdd/pdd/pdd-local.sh --force --output-cost /home/runner/work/pdd/pdd/staging/regression_20251125_012216/regression_cost.csv --local --strength 0.3 --temperature 1.5 generate --output gen_high_temp.py /home/runner/work/pdd/pdd/prompts/simple_math_python.prompt
[INFO] Command completed successfully.
[INFO] 'generate' high temp output file exists and is not empty: gen_high_temp.py
[INFO] 1b. Testing 'generate' with environment variable output path
[INFO] Running: /home/runner/work/pdd/pdd/pdd-local.sh --force --output-cost /home/runner/work/pdd/pdd/staging/regression_20251125_012216/regression_cost.csv --strength 0.3 --temperature 0.0 --context envonly --local generate /home/runner/work/pdd/pdd/prompts/simple_math_python.prompt
[INFO] Command completed successfully.
[INFO] 'generate' output via env var file exists and is not empty: env_out_generate/simple_math.py
[INFO] 2. Testing 'example' command
[INFO] Running: /home/runner/work/pdd/pdd/pdd-local.sh --force --output-cost /home/runner/work/pdd/pdd/staging/regression_20251125_012216/regression_cost.csv --strength 0.3 --temperature 0.0 example --output simple_math_example.py /home/runner/work/pdd/pdd/prompts/simple_math_python.prompt simple_math.py
[INFO] Command completed successfully.
[INFO] 'example' output file exists and is not empty: simple_math_example.py
[INFO] 3. Testing 'preprocess' command
[INFO] Running: /home/runner/work/pdd/pdd/pdd-local.sh --force --output-cost /home/runner/work/pdd/pdd/staging/regression_20251125_012216/regression_cost.csv --strength 0.3 --temperature 0.0 preprocess --output preprocessed_simple_math_python.prompt /home/runner/work/pdd/pdd/prompts/simple_math_python.prompt
[INFO] Command completed successfully.
[INFO] 'preprocess' basic output file exists and is not empty: preprocessed_simple_math_python.prompt
[INFO] Running: /home/runner/work/pdd/pdd/pdd-local.sh --force --output-cost /home/runner/work/pdd/pdd/staging/regression_20251125_012216/regression_cost.csv --strength 0.3 --temperature 0.0 preprocess --xml --output simple_math_xml.prompt /home/runner/work/pdd/pdd/prompts/simple_math_python.prompt
[INFO] Command completed successfully.
[INFO] 'preprocess --xml' output file exists and is not empty: simple_math_xml.prompt
[INFO] 3a. Testing complex 'preprocess' features
[INFO] Running: /home/runner/work/pdd/pdd/pdd-local.sh --force --output-cost /home/runner/work/pdd/pdd/staging/regression_20251125_012216/regression_cost.csv --strength 0.3 --temperature 0.0 preprocess --output complex_features_python_preprocessed.prompt complex_features_python.prompt
[INFO] Command completed successfully.
[INFO] 'preprocess' complex output file exists and is not empty: complex_features_python_preprocessed.prompt
[ERROR] Preprocess failed: Web tag not processed
[INFO] Running cleanup...
[INFO] Skipping cleanup as CLEANUP_ON_EXIT is false. Files remain in: /home/runner/work/pdd/pdd/staging/regression_20251125_012216
[INFO] Cleanup finished.
make: *** [Makefile:410: regression] Error 1
failed to wait for command termination: exit status 2
[01:27:22] [regression_tests] finished (rc=1)

Unit Tests (pytest)

tests/test_cli.py::test_real_fix_command
  /usr/share/miniconda/envs/pdd/lib/python3.12/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 7: Expected `Message` - serialized value may not be as expected [input_value=Message(content='Looking ...mprehensive analysis:'}), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

tests/test_cli.py::test_real_fix_command
  /usr/share/miniconda/envs/pdd/lib/python3.12/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 7: Expected `Message` - serialized value may not be as expected [input_value=Message(content='Looking ...pparent import issue.'}), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

tests/test_cli.py::test_real_verify_command
  /usr/share/miniconda/envs/pdd/lib/python3.12/site-packages/pydantic/main.py:463: UserWarning: Pydantic serializer warnings:
    PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [input_value=Message(content='{"detail...er_specific_fields=None), input_type=Message])
    PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
    return self.__pydantic_serializer__.to_python(

tests/test_cli.py::test_cli_core_dump_flag_sets_ctx_true
tests/test_cli.py::test_cli_core_dump_does_not_propagate_exception
  /home/runner/work/pdd/pdd/pdd/cli.py:202: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
    timestamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
ERROR tests/test_setup_tool.py::test_api_key_with_llm_invoke
ERROR tests/test_setup_tool.py::test_api_keys
====== 938 passed, 3 skipped, 32 warnings, 2 errors in 1508.01s (0:25:08) ======
ERROR conda.cli.main_run:execute(127): `conda run PDD_RUN_REAL_LLM_TESTS=1 PDD_RUN_LLM_TESTS=1 PDD_PATH=. PYTHONPATH=./pdd: python -m pytest -vv -n auto ./tests` failed. (See above for error)
make: *** [Makefile:178: test] Error 1
failed to wait for command termination: exit status 2
[01:47:36] [unit_tests] finished (rc=1)

Sync Regression Tests

[01:22:16] [sync_regression_tests] starting…
2025-11-25T01:22:16Z INF Injecting 33 Infisical secrets into your application process
Running sync regression tests
Running sync regression suite in parallel
[sync-regression] Case 6 completed successfully
[sync-regression] Case 2 completed successfully
[sync-regression] Case 4 completed successfully
[sync-regression] Case 7 completed successfully
[sync-regression] Case 8 completed successfully
[sync-regression] Case 3 completed successfully
[sync-regression] Case 1 completed successfully
[sync-regression] Case 10 completed successfully
[sync-regression] Case 5 completed successfully
[sync-regression] Case 9 completed successfully
[02:09:37] [sync_regression_tests] finished (rc=0)
