18 changes: 12 additions & 6 deletions .github/workflows/evaluate-agent.yml
@@ -2,10 +2,10 @@ name: Evaluate Trail Guide Agent

on:
# Uncomment the lines below to enable automatic evaluation on pull requests
# pull_request:
# branches: [main]
# paths:
# - 'src/agents/trail_guide_agent/**'
pull_request:
branches: [main]
paths:
- 'src/agents/trail_guide_agent/**'
workflow_dispatch:

permissions:
@@ -44,9 +44,14 @@ jobs:
env:
AZURE_AI_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_PROJECT_ENDPOINT }}
MODEL_NAME: ${{ vars.MODEL_NAME || 'gpt-4.1' }}
AZURE_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
AZURE_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
AZURE_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
run: |
python src/evaluators/evaluate_agent.py > evaluation_results.txt
python src/evaluators/evaluate_agent.py > evaluation_results.txt 2>&1 || true
cat evaluation_results.txt
# Fail the step if the script wrote an error marker
grep -q "Evaluation FAILED" evaluation_results.txt && exit 1 || exit 0
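The run step above deliberately swallows the script's exit code (`|| true`) so the log is always surfaced, then fails only when the error marker appears. The same pattern can be exercised outside CI with a minimal shell sketch (file names and the log line are illustrative):

```shell
#!/bin/sh
# Simulate an evaluation script that writes results containing the marker.
printf 'Average score: 4.2\nEvaluation FAILED: groundedness below threshold\n' > results.txt

# Same logic as the workflow step: always show the log,
# then fail only if the error marker is present.
cat results.txt
if grep -q "Evaluation FAILED" results.txt; then
  echo "marker found, failing step"
  status=1
else
  status=0
fi
echo "exit status: $status"
```

Keying the failure off a marker string rather than the script's exit code keeps partial output visible in the PR comment even when the evaluation aborts.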

- name: Comment PR with results
if: github.event_name == 'pull_request'
@@ -55,6 +60,7 @@
script: |
const fs = require('fs');
const results = fs.readFileSync('evaluation_results.txt', 'utf8');
const reportUrl = '${{ steps.run.outputs.report_url }}' || 'Not available';

const body = `## 🎯 Agent Evaluation Results

@@ -69,7 +75,7 @@ jobs:

</details>

📊 [View full results in Azure AI Foundry Portal](${{ steps.run.outputs.report_url }})
📊 [View full results in Azure AI Foundry Portal](${reportUrl})

**Evaluation Criteria:**
- Intent Resolution (score ≥ 3)
16 changes: 16 additions & 0 deletions docs/02-prompt-management.md
@@ -74,6 +74,13 @@ Now you'll use the Azure Developer CLI to deploy all required Azure resources.

Sign in with your Azure credentials when prompted.

> ⚠️ **Important**
> In some environments, the VS Code integrated terminal may crash or close during the interactive login flow.
> If this happens, authenticate using explicit credentials instead:
> ```powershell
> az login --username <your-username> --password <your-password>
> ```

1. Provision resources:

```powershell
@@ -98,6 +105,15 @@ Now you'll use the Azure Developer CLI to deploy all required Azure resources.
azd env get-values > .env
```

> ⚠️ **Important – File Encoding**
>
> After generating the `.env` file, make sure it is saved using **UTF-8** encoding.
>
> In editors like **VS Code**, check the encoding indicator in the bottom-right corner.
> If it shows **UTF-16 LE** (or any encoding other than UTF-8), click it, choose **Save with Encoding**, and select **UTF-8**.
>
> Using the wrong encoding may cause environment variables to be read incorrectly.

This creates a `.env` file in your project root with all the provisioned resource information.
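If you prefer to check and repair the encoding from a script rather than the editor, a small Python sketch like the following detects a UTF-16 byte-order mark and rewrites the file as UTF-8. The `.env` path is an assumption; run it from the project root:

```python
import codecs
from pathlib import Path

def ensure_utf8(path: str) -> bool:
    """Rewrite a UTF-16 (LE or BE) file as UTF-8; return True if it was rewritten."""
    p = Path(path)
    raw = p.read_bytes()
    for bom, enc in ((codecs.BOM_UTF16_LE, "utf-16-le"),
                     (codecs.BOM_UTF16_BE, "utf-16-be")):
        if raw.startswith(bom):
            # Strip the BOM, decode, and write back as plain UTF-8.
            text = raw[len(bom):].decode(enc)
            p.write_bytes(text.encode("utf-8"))
            return True
    return False

if __name__ == "__main__":
    if Path(".env").exists():
        print("rewritten as UTF-8" if ensure_utf8(".env") else "already UTF-8 (or no BOM)")
```

UTF-16 LE output is what Windows PowerShell's `>` redirection historically produces, which is why the generated `.env` may need this fix.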

1. Add the agent configuration to your `.env` file:
16 changes: 16 additions & 0 deletions docs/03-design-optimize-prompts.md
@@ -74,6 +74,13 @@ Now you'll use the Azure Developer CLI to deploy all required Azure resources.

Sign in with your Azure credentials when prompted.

> ⚠️ **Important**
> In some environments, the VS Code integrated terminal may crash or close during the interactive login flow.
> If this happens, authenticate using explicit credentials instead:
> ```powershell
> az login --username <your-username> --password <your-password>
> ```

1. Provision resources:

```powershell
@@ -98,6 +105,15 @@ Now you'll use the Azure Developer CLI to deploy all required Azure resources.
azd env get-values > .env
```

> ⚠️ **Important – File Encoding**
>
> After generating the `.env` file, make sure it is saved using **UTF-8** encoding.
>
> In editors like **VS Code**, check the encoding indicator in the bottom-right corner.
> If it shows **UTF-16 LE** (or any encoding other than UTF-8), click it, choose **Save with Encoding**, and select **UTF-8**.
>
> Using the wrong encoding may cause environment variables to be read incorrectly.

This creates a `.env` file in your project root with all the provisioned resource information.

### Install Python dependencies
99 changes: 71 additions & 28 deletions docs/04-automated-evaluation.md
Expand Up @@ -14,9 +14,9 @@ This exercise takes approximately **40 minutes**.

## Introduction

In this exercise, you'll use Microsoft Foundry's cloud evaluators to automatically assess quality at scale for the Adventure Works Trail Guide Agent. You'll run evaluations against a large test dataset (200 query-response pairs) to validate quality metrics and establish an automated evaluation pipeline for future changes.
In this exercise, you'll use Microsoft Foundry's cloud evaluators to automatically assess quality at scale for the Adventure Works Trail Guide Agent. You'll run evaluations against a large test dataset (89 query-response pairs) to validate quality metrics and establish an automated evaluation pipeline for future changes.

**Scenario**: You're operating the Adventure Works Trail Guide Agent. You want to evaluate it against a large test dataset (200 query-response pairs) to validate quality metrics and establish an automated evaluation pipeline that can scale as your agent evolves.
**Scenario**: You're operating the Adventure Works Trail Guide Agent. You want to evaluate it against a large test dataset (89 query-response pairs) to validate quality metrics and establish an automated evaluation pipeline that can scale as your agent evolves.

You'll use the following evaluation criteria—automated at scale:

@@ -80,6 +80,13 @@ Now you'll use the Azure Developer CLI to deploy all required Azure resources.

Sign in with your Azure credentials when prompted.

> ⚠️ **Important**
> In some environments, the VS Code integrated terminal may crash or close during the interactive login flow.
> If this happens, authenticate using explicit credentials instead:
> ```powershell
> az login --username <your-username> --password <your-password>
> ```

1. Provision resources:

```powershell
@@ -104,6 +111,15 @@ Now you'll use the Azure Developer CLI to deploy all required Azure resources.
azd env get-values > .env
```

> ⚠️ **Important – File Encoding**
>
> After generating the `.env` file, make sure it is saved using **UTF-8** encoding.
>
> In editors like **VS Code**, check the encoding indicator in the bottom-right corner.
> If it shows **UTF-16 LE** (or any encoding other than UTF-8), click it, choose **Save with Encoding**, and select **UTF-8**.
>
> Using the wrong encoding may cause environment variables to be read incorrectly.

This creates a `.env` file in your project root with all the provisioned resource information.

### Install Python dependencies
@@ -159,7 +175,7 @@ Cloud evaluation follows a structured workflow:

### Dataset preparation

The repository includes `data/trail_guide_evaluation_dataset.jsonl` with 200 pre-generated query-response pairs covering diverse hiking scenarios. Each entry includes:
The repository includes `data/trail_guide_evaluation_dataset.jsonl` with 89 pre-generated query-response pairs covering diverse hiking scenarios. Each entry includes:

- `query`: User question
- `response`: Agent-generated answer
@@ -206,7 +222,7 @@ First, examine the prepared dataset structure.
(Get-Content data/trail_guide_evaluation_dataset.jsonl).Count
```

Expected: 200 entries
Expected: 89 entries
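Beyond a raw line count, you may want a quick structural check as well. A small Python sketch, using the `query` and `response` fields described above (the path is the repository's dataset file; extend `required` if your entries carry more fields):

```python
import json
from pathlib import Path

def check_jsonl(path: str, required=("query", "response")) -> int:
    """Count JSONL records and verify each carries the required fields."""
    count = 0
    for lineno, line in enumerate(Path(path).read_text(encoding="utf-8").splitlines(), 1):
        if not line.strip():
            continue  # tolerate trailing blank lines
        record = json.loads(line)  # raises on malformed JSON
        missing = [f for f in required if f not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        count += 1
    return count

if __name__ == "__main__":
    dataset = "data/trail_guide_evaluation_dataset.jsonl"
    if Path(dataset).exists():
        print(f"{check_jsonl(dataset)} valid entries")
```

Catching a malformed record locally is cheaper than discovering it after the dataset upload step fails in the cloud pipeline.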

### Understand the evaluation pipeline

@@ -219,7 +235,7 @@ The script performs all evaluation steps automatically:
1. **Upload Dataset** - Uploads the JSONL dataset to Microsoft Foundry
2. **Define Evaluation** - Creates evaluation definition with quality evaluators (Intent Resolution, Relevance, Groundedness)
3. **Run Evaluation** - Starts the cloud evaluation run
4. **Poll for Completion** - Waits for evaluation to complete (5-10 minutes for 200 items)
4. **Poll for Completion** - Waits for evaluation to complete (5-10 minutes for 89 items)
5. **Display Results** - Retrieves and shows scoring statistics

This single-script approach makes it easy to run evaluations both locally during development and automatically in CI/CD pipelines.
@@ -279,7 +295,7 @@ Execute the complete evaluation pipeline with one command.
Run ID: run-ghi789rst
Status: running

This may take 5-10 minutes for 200 items...
This may take 5-10 minutes for 89 items...

================================================================================
Step 4: Polling for completion
@@ -297,9 +313,9 @@ Execute the complete evaluation pipeline with one command.
Report URL: https://<account>.services.ai.azure.com/projects/<project>/evaluations/...

Average Scores (1-5 scale, threshold: 3)
Intent Resolution: 4.52 (n=200)
Relevance: 4.41 (n=200)
Groundedness: 4.18 (n=200)
Intent Resolution: 4.52 (n=89)
Relevance: 4.41 (n=89)
Groundedness: 4.18 (n=89)

Pass Rates (score >= 3)
Intent Resolution: 96.0%
@@ -316,7 +332,7 @@ Execute the complete evaluation pipeline with one command.
3. Document key findings and recommendations
```

> **Note**: Evaluation runtime varies based on dataset size and model capacity. 200 items typically takes 5-15 minutes.
> **Note**: Evaluation runtime varies based on dataset size and model capacity. 89 items typically takes 5-15 minutes.
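The average scores and pass rates in the summary above are simple aggregates over per-item scores. A sketch of that computation, with illustrative numbers rather than real dataset results:

```python
def summarize(scores, threshold=3):
    """Return (mean, pass-rate %) for a list of 1-5 evaluator scores."""
    if not scores:
        raise ValueError("no scores to summarize")
    mean = sum(scores) / len(scores)
    pass_rate = 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)
    return round(mean, 2), round(pass_rate, 1)

# Illustrative per-item scores for one evaluator:
mean, pass_rate = summarize([5, 4, 4, 3, 2, 5, 4])
print(f"mean={mean}, pass_rate={pass_rate}%")  # → mean=3.86, pass_rate=85.7%
```

Note the two metrics answer different questions: the mean tracks overall quality drift, while the pass rate against the threshold flags how many individual responses fall below the bar.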

1. **Commit the results file**

@@ -374,27 +390,54 @@ The evaluation script integrates seamlessly into GitHub Actions for automated PR

1. **Configure Azure authentication**

Create a service principal with Foundry project access:
Create a service principal for GitHub Actions:

```powershell
# Create service principal
az ad sp create-for-rbac --name "github-agent-evaluator" `
--role "Azure AI Developer" `
--scopes /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.MachineLearningServices/workspaces/<workspace> `
--sdk-auth
az ad sp create-for-rbac --name "github-agent-evaluator"
```

Configure federated identity for GitHub OIDC:
Note the `appId` value from the output — you will use it in the next steps.

Assign the **Azure AI User** role at the account scope. This role includes the `Microsoft.CognitiveServices/*` wildcard data actions, which cover the `AIServices/agents/write` action required by the Foundry project evaluation API:

```powershell
az role assignment create `
--assignee "<appId>" `
--role "Azure AI User" `
--scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<ai-account-name>"
```

> **Important**: Use the `AZURE_AI_ACCOUNT_NAME` value from your `.env` file as `<ai-account-name>`. The `Azure AI Developer` role is **not sufficient** — it only covers `OpenAI/*`, `SpeechServices/*`, `ContentSafety/*`, and `MaaS/*` data actions, but not `AIServices/agents/write` which the Foundry project API requires.

> **Tip**: If you set the optional `githubActionsPrincipalId` parameter when running `azd up`, the infrastructure deployment will create this role assignment automatically for future environments.

Configure federated identity for GitHub OIDC so the workflow can authenticate without a secret.

Create a file named `federated-credential.json` in your repository root:

```json
{
"name": "github-actions",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:<your-org>/<your-repo>:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}
```

> **Note**: Replace `<your-org>/<your-repo>` with your exact GitHub username and repository name. The subject is case-sensitive and must match exactly.

Register the federated credential using the file:

```powershell
az ad app federated-credential create `
--id <app-id> `
--parameters '{
"name": "github-actions",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:<your-org>/<your-repo>:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'
--id "<appId>" `
--parameters @federated-credential.json
```

Once the credential is created successfully, delete the file. It contains no secrets, but there is no reason to keep it in the repository:

```powershell
Remove-Item federated-credential.json
```

1. **Review the PR evaluation workflow**
@@ -483,7 +526,7 @@ Document your findings and create an analysis report.

## Evaluation Summary

Evaluated: 200 test cases
Evaluated: 89 test cases
Time: ~10 minutes
Scoring: GPT-4.1 as LLM judge (1-5 scale)

@@ -521,7 +564,7 @@ Document your findings and create an analysis report.

- **Scales** to hundreds/thousands of items efficiently
- **Consistent** scoring criteria across all evaluations
- **Fast** turnaround (10 minutes for 200 items)
- **Fast** turnaround (10 minutes for 89 items)
- **Repeatable** and trackable over time
- **CI/CD ready** for integration into deployment pipelines
- **Detailed reasoning** provided for each score
@@ -580,7 +623,7 @@ Compare evaluation results between GPT-4.1 and GPT-4.1-mini to understand qualit

### Run evaluation on GPT-4.1-mini responses

1. Generate 200 responses from GPT-4.1-mini for the same queries.
1. Generate 89 responses from GPT-4.1-mini for the same queries.

1. Run cloud evaluation on both sets.

@@ -610,7 +653,7 @@ Create `experiments/automated/model_comparison.md` with:

**Resolution**:
- Run `az login` to refresh Azure credentials
- Verify you have **Azure AI User** role on the Foundry project
- Verify the service principal has the **Azure AI User** role at the Cognitive Services account scope. This role carries the `Microsoft.CognitiveServices/*` wildcard data actions required for `AIServices/agents/write`; `Azure AI Developer` alone is **not sufficient**
- Check `AZURE_AI_PROJECT_ENDPOINT` in `.env` file is correct and includes `/api/projects/<project>`

### Evaluator scoring seems inconsistent