46 changes: 23 additions & 23 deletions notebooks/analyzer_training.ipynb
@@ -15,18 +15,18 @@
"\n",
"Labeled data consists of samples that have been tagged with one or more labels to add context or meaning. This additional information is used to improve the analyzer's performance.\n",
"\n",
"In your own projects, you can use [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/quickstart/use-ai-foundry) to annotate your data with the labeling tool.\n",
"For your own projects, you can use [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-services/content-understanding/quickstart/use-ai-foundry) to annotate your data with the labeling tool.\n",
"\n",
Collaborator Author

  • categories: [Grammar, Clarity]
    • change: Changed the phrase "In your own projects" to "For your own projects"
    • rationale: "For your own projects" sounds more natural and clear in this context, improving the flow of the sentence.
    • impact: Enhances readability and makes the instruction feel more approachable and user-friendly.

"This notebook demonstrates how to create an analyzer using your labeled data and how to analyze your files afterward.\n",
"\n",
"\n",
"## Prerequisites\n",
"1. Ensure your Azure AI service is configured by following the [configuration steps](../README.md#configure-azure-ai-service-resource).\n",
"2. Set environment variables related to training data by following the steps in [Set env for training data](../docs/set_env_for_training_data_and_reference_doc.md) and adding them to the [.env](./.env) file.\n",
" - You can either set `TRAINING_DATA_SAS_URL` directly with the SAS URL for your Azure Blob container,\n",
" - You can either set `TRAINING_DATA_SAS_URL` directly with the SAS URL for your Azure Blob container.\n",
" - Or set both `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and `TRAINING_DATA_CONTAINER_NAME` to generate the SAS URL automatically during later steps.\n",
" - Also set `TRAINING_DATA_PATH` to specify the folder path within the container where the training data will be uploaded.\n",
"3. Install the packages required to run the sample:\n"
"3. Please install the packages required to run the sample:\n"
]
Collaborator Author

  • categories: [Grammar]

    • change: Replaced a comma with a period at the end of the first list item.
    • rationale: The first item in the list was a complete sentence, so ending it with a period is grammatically correct.
    • impact: This change improves the grammatical correctness and professional tone of the documentation.
  • categories: [Clarity]

    • change: Changed "Install the packages required to run the sample:" to "Please install the packages required to run the sample:"
    • rationale: Adding "Please" makes the instruction more polite and reader-friendly.
    • impact: Enhances the readability and user engagement by making the instruction sound more courteous.
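A minimal Python sketch of the two SAS-URL options in the prerequisites list, assuming the settings are read from environment variables (as they would be after loading `.env`). `resolve_training_data_source` is a hypothetical helper, not part of the notebook, and treating `TRAINING_DATA_PATH` as mandatory is an assumption.

```python
import os
from typing import Mapping


def resolve_training_data_source(env: Mapping[str, str] = os.environ) -> dict:
    """Decide how the training-data SAS URL will be obtained (hypothetical helper)."""
    sas_url = env.get("TRAINING_DATA_SAS_URL", "").strip()
    account = env.get("TRAINING_DATA_STORAGE_ACCOUNT_NAME", "").strip()
    container = env.get("TRAINING_DATA_CONTAINER_NAME", "").strip()
    path = env.get("TRAINING_DATA_PATH", "").strip()

    # Assumption: the upload folder inside the container must always be given.
    if not path:
        raise ValueError("TRAINING_DATA_PATH must name a folder inside the container.")
    if sas_url:
        # Option 1: the SAS URL for the Blob container was set directly.
        return {"mode": "sas_url", "sas_url": sas_url, "path": path}
    if account and container:
        # Option 2: the SAS URL will be generated later from account + container.
        return {"mode": "generate", "account": account, "container": container, "path": path}
    raise ValueError(
        "Set TRAINING_DATA_SAS_URL, or both TRAINING_DATA_STORAGE_ACCOUNT_NAME "
        "and TRAINING_DATA_CONTAINER_NAME."
    )
```

A direct SAS URL wins when both options are present, which mirrors the order the later "Prepare Labeled Data" cell describes.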

},
{
@@ -67,11 +67,11 @@
"## Create Azure Content Understanding Client\n",
"> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that contains helper functions. Before the official release of the Content Understanding SDK, please consider it a lightweight SDK.\n",
">\n",
"> Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with the information from your Azure AI Service.\n",
"> Please fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with the information from your Azure AI Service.\n",
"\n",
Collaborator Author

  • categories: [Grammar, Clarity]
    • change: Modified the instruction from "> Fill in the constants ..." to "> Please fill in the constants ...".
    • rationale: Adding "Please" makes the sentence more polite and reader-friendly, improving the tone of the instruction.
    • impact: Enhances the readability and approachability of the documentation, encouraging better user engagement.
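For the **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** constants, a small sketch of loading and validating them. Reading them from environment variables is an assumption (the notebook asks you to fill in constants), and `ClientSettings` / `load_client_settings` are hypothetical names. A missing key is returned as `None`, signalling that a token provider (AAD) should be used instead, in line with the note that follows in this cell.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClientSettings:
    endpoint: str
    api_version: str
    api_key: Optional[str]  # None means: authenticate with a token provider instead


def load_client_settings(env: Mapping[str, str] = os.environ) -> ClientSettings:
    """Collect the three client constants, failing early if required ones are empty."""
    endpoint = env.get("AZURE_AI_ENDPOINT", "").strip()
    api_version = env.get("AZURE_AI_API_VERSION", "").strip()
    if not endpoint or not api_version:
        raise ValueError("AZURE_AI_ENDPOINT and AZURE_AI_API_VERSION are required.")
    api_key = env.get("AZURE_AI_API_KEY", "").strip() or None
    # Dropping a trailing slash is a convenience choice for later URL joins.
    return ClientSettings(endpoint.rstrip("/"), api_version, api_key)
```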

"> ⚠️ Important:\n",
"You must update the code below to match your Azure authentication method.\n",
"Look for the `# IMPORTANT` comments and modify those sections accordingly.\n",
"Look for the `# IMPORTANT` comments and please modify those sections accordingly.\n",
"If you skip this step, the sample may not run correctly.\n",
Collaborator Author

  • categories: [Grammar, Clarity]
    • change: Added the word "please" to the sentence "Look for the # IMPORTANT comments and modify those sections accordingly."
    • rationale: The inclusion of "please" makes the instruction more polite and clearer in tone.
    • impact: Enhances the readability and tone of the documentation, making the request sound more courteous and user-friendly.

"\n",
"> ⚠️ Note: While using a subscription key works, using a token provider with Azure Active Directory (AAD) is safer and highly recommended for production environments."
@@ -153,18 +153,18 @@
"\n",
"> **💡 Note:** This step is only required **once per Azure Content Understanding resource**, unless the GPT deployment has been changed. You can skip this section if:\n",
"> - This configuration has already been run once for your resource, or\n",
"> - Your administrator has already configured the model deployments for you\n",
"> - Your administrator has already configured the model deployments for you.\n",
"\n",
Collaborator Author

  • categories: [Grammar]
    • change: Added a period at the end of the sentence "Your administrator has already configured the model deployments for you."
    • rationale: Proper punctuation was missing, and adding a period completes the sentence correctly.
    • impact: Improves the professionalism and readability of the documentation by adhering to standard grammar rules.

"Before using prebuilt analyzers, you need to configure the default model deployment mappings. This tells Content Understanding which model deployments to use.\n",
"\n",
"**Model Requirements:**\n",
"- **GPT-4.1** - Required for most prebuilt analyzers (e.g., `prebuilt-invoice`, `prebuilt-receipt`, `prebuilt-idDocument`)\n",
"- **GPT-4.1-mini** - Required for RAG analyzers (e.g., `prebuilt-documentSearch`, `prebuilt-audioSearch`, `prebuilt-videoSearch`)\n",
"- **text-embedding-3-large** - Required for all prebuilt analyzers that use embeddings\n",
"- **GPT-4.1** - Required for most prebuilt analyzers (e.g., `prebuilt-invoice`, `prebuilt-receipt`, `prebuilt-idDocument`).\n",
"- **GPT-4.1-mini** - Required for RAG analyzers (e.g., `prebuilt-documentSearch`, `prebuilt-audioSearch`, `prebuilt-videoSearch`).\n",
"- **text-embedding-3-large** - Required for all prebuilt analyzers that use embeddings.\n",
"\n",
"**Prerequisites:**\n",
"1. Deploy **GPT-4.1**, **GPT-4.1-mini**, and **text-embedding-3-large** models in Azure AI Foundry\n",
"2. Set `GPT_4_1_DEPLOYMENT`, `GPT_4_1_MINI_DEPLOYMENT`, and `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` in your `.env` file with the deployment names"
"1. Deploy **GPT-4.1**, **GPT-4.1-mini**, and **text-embedding-3-large** models in Azure AI Foundry.\n",
"2. Set `GPT_4_1_DEPLOYMENT`, `GPT_4_1_MINI_DEPLOYMENT`, and `TEXT_EMBEDDING_3_LARGE_DEPLOYMENT` in your `.env` file with the deployment names."
]
Collaborator Author

  • categories: [Grammar, Consistency]

    • change: Added periods at the end of list items describing model requirements (e.g., after prebuilt-idDocument and prebuilt-videoSearch).
    • rationale: Ensured that all complete sentences in the list end with proper punctuation, maintaining grammatical correctness.
    • impact: Enhances readability and presents a professional, polished appearance in the documentation.
  • categories: [Grammar, Consistency]

    • change: Added periods at the end of numbered prerequisite steps.
    • rationale: Maintained uniformity by concluding each numbered item with appropriate punctuation.
    • impact: Improves clarity and consistency, making the instructions easier to follow and visually coherent.
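The model requirements and the three `.env` variables above can be checked before calling the service. This sketch only uses the variable names listed in the prerequisites; the model-name to deployment-name dictionary shape is an assumption for illustration, not the service's documented payload, and both function names are hypothetical.

```python
import os
from typing import Dict, List, Mapping

# Model name -> .env variable holding its deployment name (from the prerequisites list).
REQUIRED_DEPLOYMENT_VARS: Dict[str, str] = {
    "gpt-4.1": "GPT_4_1_DEPLOYMENT",
    "gpt-4.1-mini": "GPT_4_1_MINI_DEPLOYMENT",
    "text-embedding-3-large": "TEXT_EMBEDDING_3_LARGE_DEPLOYMENT",
}


def find_missing_deployments(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the .env variable names that are unset or empty, in declaration order."""
    return [var for var in REQUIRED_DEPLOYMENT_VARS.values() if not env.get(var, "").strip()]


def deployment_mapping(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build a model -> deployment-name dict, or fail listing every missing variable."""
    missing = find_missing_deployments(env)
    if missing:
        raise ValueError(f"Missing deployment settings: {', '.join(missing)}")
    return {model: env[var].strip() for model, var in REQUIRED_DEPLOYMENT_VARS.items()}
```

Reporting all missing variables at once matches the notebook's behaviour of printing the full list before asking the user to restart the kernel.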

},
{
@@ -193,12 +193,12 @@
" print(f\" - {deployment}\")\n",
" print(\"\\n Prebuilt analyzers require GPT-4.1, GPT-4.1-mini, and text-embedding-3-large deployments.\")\n",
" print(\" Please:\")\n",
" print(\" 1. Deploy all three models in Azure AI Foundry\")\n",
" print(\" 1. Deploy all three models in Azure AI Foundry.\")\n",
" print(\" 2. Add the following to notebooks/.env:\")\n",
Collaborator Author

  • categories: [Grammar]
    • change: Added a period at the end of the sentence "Deploy all three models in Azure AI Foundry"
    • rationale: To complete the sentence with proper punctuation according to standard grammatical rules.
    • impact: Improves the professionalism and readability of the printed instructions by adhering to grammatical conventions.

" print(\" GPT_4_1_DEPLOYMENT=<your-gpt-4.1-deployment-name>\")\n",
" print(\" GPT_4_1_MINI_DEPLOYMENT=<your-gpt-4.1-mini-deployment-name>\")\n",
" print(\" TEXT_EMBEDDING_3_LARGE_DEPLOYMENT=<your-text-embedding-3-large-deployment-name>\")\n",
" print(\" 3. Restart the kernel and run this cell again\")\n",
" print(\" 3. Restart the kernel and run this cell again.\")\n",
"else:\n",
Collaborator Author

  • categories: [Grammar]
    • change: Added a period at the end of the printed instruction string "Restart the kernel and run this cell again."
    • rationale: The period provides proper punctuation to complete the sentence, improving grammatical correctness.
    • impact: Enhances readability and professionalism of the output message by following standard sentence structure.

" print(f\"📋 Configuring default model deployments...\")\n",
" print(f\" GPT-4.1 deployment: {GPT_4_1_DEPLOYMENT}\")\n",
@@ -220,8 +220,8 @@
" except Exception as e:\n",
" print(f\"❌ Failed to configure defaults: {e}\")\n",
" print(f\" This may happen if:\")\n",
" print(f\" - One or more deployment names don't exist in your Azure AI Foundry project\")\n",
" print(f\" - You don't have permission to update defaults\")\n",
" print(f\" - One or more deployment names don't exist in your Azure AI Foundry project.\")\n",
" print(f\" - You don't have permission to update defaults.\")\n",
" raise\n"
Collaborator Author

  • categories: [Grammar, Consistency]
    • change: Added periods at the end of two printed error message lines.
    • rationale: Ensures proper sentence punctuation and maintains consistent formatting across error messages.
    • impact: Enhances the professionalism and readability of the output messages, making them clearer for users.

]
},
@@ -231,7 +231,7 @@
"source": [
"## Prepare Labeled Data\n",
"In this step, we will:\n",
"- Use the environment variables `TRAINING_DATA_PATH` and SAS URL related variables set in the Prerequisites step.\n",
"- Use the environment variables `TRAINING_DATA_PATH` and SAS URL related variables set in the Prerequisites section.\n",
"- Attempt to get the SAS URL from the environment variable `TRAINING_DATA_SAS_URL`.\n",
Collaborator Author

  • categories: [Consistency]
    • change: Replaced the word "step" with "section" when referring to the Prerequisites.
    • rationale: The term "section" is more appropriate and consistent within documentation contexts, especially when referring to document divisions rather than processes.
    • impact: Enhances consistency and clarity in documentation, making it easier for readers to locate and understand referenced content.
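The "Prepare Labeled Data" cell verifies that each document has matching `.labels.json` and `.result.json` files. A stdlib-only sketch of that check, assuming the companions are named `<document file name>.labels.json` and `<document file name>.result.json` in the same folder; `find_incomplete_documents` is a hypothetical helper, not the notebook's implementation.

```python
from pathlib import Path
from typing import Dict, List

LABEL_SUFFIX = ".labels.json"
RESULT_SUFFIX = ".result.json"


def find_incomplete_documents(folder: str) -> Dict[str, List[str]]:
    """Map each document file name to the companion suffixes it is missing."""
    missing: Dict[str, List[str]] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and the companion files themselves.
        if not path.is_file() or path.name.endswith((LABEL_SUFFIX, RESULT_SUFFIX)):
            continue
        absent = [
            suffix
            for suffix in (LABEL_SUFFIX, RESULT_SUFFIX)
            if not path.with_name(path.name + suffix).is_file()
        ]
        if absent:
            missing[path.name] = absent
    return missing
```

An empty dict means every document is ready to upload; anything else names exactly which file is incomplete.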

"- If `TRAINING_DATA_SAS_URL` is not set, try generating it automatically using `TRAINING_DATA_STORAGE_ACCOUNT_NAME` and `TRAINING_DATA_CONTAINER_NAME` environment variables.\n",
"- Verify that each document file in the local folder has corresponding `.labels.json` and `.result.json` files.\n",
@@ -311,7 +311,7 @@
"metadata": {},
"source": [
"## Create Analyzer with Defined Schema\n",
"Before creating the analyzer, fill in the constant `ANALYZER_ID` with a relevant name for your task. In this example, we generate a unique suffix so that this cell can be run multiple times to create different analyzers.\n",
"Before creating the analyzer, please fill in the constant `ANALYZER_ID` with a relevant name for your task. In this example, we generate a unique suffix so that this cell can be run multiple times to create different analyzers.\n",
"\n",
Collaborator Author

  • categories: [Grammar, Clarity]
    • change: Added the word "please" to the instruction "fill in the constant ANALYZER_ID".
    • rationale: Including "please" makes the instruction more polite and reader-friendly, improving the tone of the documentation.
    • impact: Enhances the clarity and approachability of the guidance, making it more likely to engage the reader positively.
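The unique suffix on `ANALYZER_ID` is what lets the cell be re-run without colliding with an existing analyzer. One way to sketch it with the standard library; `make_analyzer_id` is hypothetical, and the hyphen separator and 8-character suffix are arbitrary choices that may need adjusting to the service's ID rules.

```python
import uuid


def make_analyzer_id(base_name: str) -> str:
    """Append a random hex suffix so each run produces a distinct analyzer ID."""
    # Separator and suffix length are illustrative choices, not service requirements.
    return f"{base_name}-{uuid.uuid4().hex[:8]}"
```

For example, `make_analyzer_id("analyzer-training-sample")` yields something like `analyzer-training-sample-1a2b3c4d`, different on every call.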

"We use **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH** as set in the [.env](./.env) file and used in the previous step."
]
@@ -493,7 +493,7 @@
" elif val.get('type') == 'number':\n",
" print(f\" {key}: {val.get('valueNumber')}\")\n",
" else:\n",
" print(\"No fields extracted\")\n",
" print(\"No fields extracted.\")\n",
" \n",
Collaborator Author

  • categories: [Grammar, Consistency]
    • change: Added a period at the end of the sentence in the print statement ("No fields extracted." instead of "No fields extracted")
    • rationale: To complete the sentence with proper punctuation, ensuring consistent and grammatically correct output messages.
    • impact: Enhances the professionalism and readability of the output, providing a more polished user experience.
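The field-printing branch dispatches on `val.get('type')` and reads `valueNumber` for numbers. A compact sketch that returns the lines instead of printing them; only the `number` to `valueNumber` pairing is visible in this diff, so generalising it to a `value<Type>` key for every type is an assumption, and `format_fields` is a hypothetical name.

```python
from typing import Any, Dict, List


def format_fields(fields: Dict[str, Dict[str, Any]]) -> List[str]:
    """Render extracted fields as '  key: value' lines, one per field."""
    if not fields:
        return ["No fields extracted."]
    lines = []
    for key, val in fields.items():
        field_type = val.get("type", "")
        # Assumed convention: a field of type 'xyz' stores its value under 'valueXyz'.
        value_key = f"value{field_type[:1].upper()}{field_type[1:]}"
        lines.append(f"  {key}: {val.get(value_key)}")
    return lines
```

Returning strings rather than printing keeps the formatting testable; the notebook cell would just `print("\n".join(...))`.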

" # Display content metadata\n",
" print(f\"\\n📋 Content Metadata:\")\n",
@@ -527,24 +527,24 @@
" col_count = table.get(\"columnCount\", 0)\n",
" print(f\" Table {idx}: {row_count} rows x {col_count} columns\")\n",
" else:\n",
" print(\"\\n📚 Document Information: Not available for this content type\")\n",
" print(\"\\n📚 Document Information: Not available for this content type.\")\n",
" else:\n",
" print(\"No contents available in analysis result\")\n",
" print(\"No contents available in analysis result.\")\n",
" \n",
Collaborator Author

  • categories: [Grammar]

    • change: Added a period at the end of the sentence "Document Information: Not available for this content type."
    • rationale: To correct punctuation and complete the sentence properly.
    • impact: Improves readability and professionalism of the printed message.
  • categories: [Grammar]

    • change: Added a period at the end of the sentence "No contents available in analysis result."
    • rationale: To ensure proper punctuation at the end of the sentence.
    • impact: Enhances clarity and consistency in output messages.
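The document-information block reports `rowCount` x `columnCount` per table, or says the information is unavailable for other content types. A sketch of that logic, assuming document results carry `"kind": "document"` and a `"tables"` list (both assumptions about the result shape); numbering tables from 1 is also a choice made here, and `describe_document` is hypothetical.

```python
from typing import Any, Dict, List


def describe_document(content: Dict[str, Any]) -> List[str]:
    """Summarize table dimensions for one analyzed content item."""
    # Assumption: document results are marked kind == "document" and list their tables.
    if content.get("kind") != "document":
        return ["📚 Document Information: Not available for this content type."]
    lines = ["📚 Document Information:"]
    for idx, table in enumerate(content.get("tables") or [], start=1):
        row_count = table.get("rowCount", 0)
        col_count = table.get("columnCount", 0)
        lines.append(f"  Table {idx}: {row_count} rows x {col_count} columns")
    return lines
```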

" # Save the analysis result to a file\n",
" saved_file_path = save_json_to_file(analysis_result, filename_prefix=\"analyzer_training_result\")\n",
" # Print the full analysis result as a JSON string\n",
" print(json.dumps(analysis_result, indent=2))\n",
"else:\n",
" print(\"No analysis result available\")"
" print(\"No analysis result available.\")"
]
Collaborator Author

  • categories: [Grammar]
    • change: Added a period at the end of the print statement string ("No analysis result available." instead of "No analysis result available")
    • rationale: Adding proper punctuation improves the grammatical correctness of the output message.
    • impact: Enhances the professionalism and readability of the message displayed to users.
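The cell saves the result with `save_json_to_file(analysis_result, filename_prefix=...)`, whose implementation is not part of this diff. A hypothetical stand-in, `write_json_result`, showing one reasonable behaviour: a timestamped, pretty-printed UTF-8 JSON file in an output folder. The default folder name and timestamp format are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def write_json_result(data: Any, filename_prefix: str = "result", output_dir: str = "test_output") -> str:
    """Write `data` as indented JSON to '<prefix>_<UTC timestamp>.json' and return the path."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Microsecond precision keeps repeated runs from overwriting each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = out / f"{filename_prefix}_{stamp}.json"
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")
    return str(path)
```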

},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete Existing Analyzer in Content Understanding Service\n",
"This snippet is optional and is included to prevent test analyzers from remaining in your service. Without deletion, the analyzer will stay in your service and may be reused in subsequent operations."
"This snippet is optional and is included to help prevent test analyzers from remaining in your service. Without deletion, the analyzer will stay in your service and may be reused in subsequent operations."
]
Collaborator Author

  • categories: [Clarity]
    • change: Added the phrase "to help" before "prevent test analyzers"
    • rationale: The insertion clarifies that the snippet assists in preventing test analyzers from remaining, rather than asserting it as an absolute action.
    • impact: Improves the accuracy and readability of the documentation by making the statement less absolute and more precise.
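Since deletion only "helps prevent" leftover test analyzers, a `try`/`finally` wrapper makes cleanup run even when analysis fails partway. A sketch using `contextlib.contextmanager`; `temporary_analyzer` is hypothetical, and `delete_fn` stands for whatever delete call your client actually provides.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_analyzer(analyzer_id: str, delete_fn: Callable[[str], object]) -> Iterator[str]:
    """Yield `analyzer_id`, then always call `delete_fn(analyzer_id)`."""
    try:
        yield analyzer_id
    finally:
        # Runs on normal exit and when the body raises, so test analyzers don't linger.
        delete_fn(analyzer_id)
```

Usage: wrap the create/analyze steps in `with temporary_analyzer(ANALYZER_ID, your_delete_call):`; an exception inside the block still propagates after the delete runs.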

},
{
@@ -580,4 +580,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Collaborator Author

  • categories: [Formatting]
    • change: Added a newline after the closing brace }.
    • rationale: Ensures the file ends with a newline character, following standard formatting conventions.
    • impact: Improves compatibility with various tools and editors, prevents potential warnings, and adheres to common style guidelines.
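The trailing-newline fix can be automated for other notebooks in the repo. A small stdlib sketch, `ensure_trailing_newline` (hypothetical), that appends `\n` only when a non-empty file lacks one and reports whether it changed anything.

```python
from pathlib import Path


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the non-empty file does not end with one; return True if modified."""
    p = Path(path)
    data = p.read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    with p.open("ab") as f:
        f.write(b"\n")
    return True
```

Working on bytes avoids any re-encoding of the notebook JSON; the function is idempotent, so running it twice changes the file at most once.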
