Resolve Security Vulnerability & Implement Cost Optimizations#86
Open
Pravalika-Batchu wants to merge 4 commits intogemma-facet:mainfrom
Open
Resolve Security Vulnerability & Implement Cost Optimizations#86Pravalika-Batchu wants to merge 4 commits intogemma-facet:mainfrom
Pravalika-Batchu wants to merge 4 commits intogemma-facet:mainfrom
Conversation
- Remove user-submitted API keys security vulnerability - Use server-side GOOGLE_API_KEY environment variable - Implement batch embedding API for 50% cost reduction - Add local testing support without GCP credentials - Update preprocessing service to handle optional API keys - Modify text similarity rewards to use batch_embed_contents
…ation Security and cost optimization improvements
- Remove user-submitted HF tokens from all request schemas - Use HF_TOKEN environment variable server-side for authentication - Make hf_token optional in InferenceRequest, BatchInferenceRequest, EvaluationRequest, TrainRequest - Update login_hf function to use HF_TOKEN env var as fallback - Update validation logic to allow server-side token usage - Maintain backward compatibility while improving security This eliminates another API key security vulnerability where users could submit Hugging Face tokens in requests.
…ation Extend security fixes to Hugging Face authentication
|
|
||
| if train_request.hf_token: | ||
| login(token=train_request.hf_token) | ||
| # Login to Hugging Face using provided token or HF_TOKEN env var |
Contributor
There was a problem hiding this comment.
Hi, thanks for your interest in contributing! However, the proper way to do this would be using OAuth which is described in #84. Using env var means users cannot connect their own accounts, which is needed to push and pull from HF hub!
| completion_texts = [_get_completion_text(comp) for comp in completions] | ||
| all_texts = completion_texts + reference_values | ||
| # Batch embed all texts | ||
| batch_response = _genai_client.models.batch_embed_contents( |
Contributor
There was a problem hiding this comment.
This does seem valid, we should batch all the calls to LLM as judge as well! They will also improve the efficiency since batched calls are cheaper (by almost 50% IIRC).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Security Issue Fixed
Problem: Users had to submit API keys (Gemini & Hugging Face) in requests, creating security vulnerability.
Solution:
Made gemini key and hf token optional in all requests
Services now use GOOGLE_API_KEY and HF_TOKEN environment variables
No more API key exposure in request payloads
Cost Optimization
Problem: Text similarity rewards made 2N API calls for N comparisons.
Solution: Switched to batch embedding API, reducing calls to 1 per batch = ~50% cost savings.
Local Development
Added support for running services locally without GCP credentials.
Key Changes
Preprocessing: Made Gemini API key optional, uses env var
Inference/Training: Made HF token optional, uses env var
Rewards: Implemented batch_embed_contents for cost savings
All Services: Added local testing mode
✅ Testing Verified
✅ Requests work without API keys
✅ Services use environment variables
✅ Batch API reduces costs
✅ Local development works
✅ Backward compatibility maintained
Environment Setup:
Production
GOOGLE_API_KEY=your-gemini-key
HF_TOKEN=your-huggingface-token
Local dev
STORAGE_TYPE=local
Impact
Security: Eliminated API key exposure
Cost: ~50% reduction in embedding costs
Dev Experience: Easy local testing
Closes: #84 - API key security vulnerability
Ready for review! 🚀
Test: STORAGE_TYPE=local python app.py then curl http://localhost:8080/health