Add automated deployment and benchmark infrastructure #62
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds a comprehensive automated deployment and benchmark infrastructure for NVIDIA Dynamo inference workloads on AWS, including fixes for critical ETCD validation issues in nixlbench.
Key Features
1. Non-Interactive Deployment Automation
./scripts/quick-start.sh2. Auto-Detection and Configuration
config/environment.conffor consistent settings across all scripts3. Comprehensive Script Suite (6,378 lines)
setup-all.sh,configure-environment.shvalidate-deployment.sh,validate-build.shdeploy-dynamo-vllm.shbenchmark-trtllm.sh,benchmark-vllm-native.sh,benchmark-genai-perf.shnixlbench-test.sh,test-dynamo-modules.sh,efa-test.shdebloat-container.sh,trtllm-helpers.sh,env-info.shFixed Issues
NIXL Benchmark ETCD Validation Error
ETCD_CPP_API_DISABLE_URI_VALIDATION=1environment variableTest Results:
How to Use
Quick Start (Recommended)
# Full automated setup with skip flags for non-blocking execution SKIP_ETCD=true SKIP_OPERATOR_CHECK=true ./scripts/quick-start.shStep-by-Step
Environment Configuration
All scripts now source a central config file at
config/environment.confcontaining:Testing Status
✓ Configuration Script Test - PASSED
✓ NIXL Benchmark ETCD Test - PASSED
⚠ Validation Script Test - HANGS
kubectl version --shortflag deprecated and causes hangkubectl version --output=jsoninstead⚠ Quick Start Test - PARTIAL
⚠ Docker Build Tests - FAILED
Files Changed
148 files changed, 28,279 insertions(+)
Major Additions:
Next Steps
After merge, team should:
Fix Remaining Issues
Testing
Documentation
Enhancements
Credits
This infrastructure builds upon work from:
Ready for Review and Testing
This PR provides a solid foundation for automated deployment and benchmarking. While some edge cases remain (kubectl deprecations, interactive prompts), the core functionality is working and the configuration detection is robust.