docs: add reassignZinggId & model diff guides #1283

Open

padam-prakash wants to merge 2 commits into zinggAI:main from padam-prakash:Issue#1272DocumentationForDiffandReassign

@padam-prakash
Contributor

Expand and improve documentation for explainOutput, persistent ZINGG ID reassignments, and model diff. Changes include:

  • docs/explainoutput/README.md: rename incrementalRun->runIncremental and add an "Important Notes" section clarifying that explainOutput focuses on probabilistic matches. Deterministic matches are not included, so explainOutput may return empty results for some clusters.
  • docs/persistentZinggId.md: fix title typo and replace placeholder with a full guide for the reassignZinggId phase (motivation, how it works, CLI and Python usage examples, configuration wrappers, example JSON configs, and output/usage notes).
  • docs/stepbystep/createtrainingdata/modeldiff.md: replace placeholder with a comprehensive model diff guide (purpose, how it works, CLI and Python examples, configuration samples, output format, and use cases).

closes #1272


Copilot AI left a comment

Pull request overview

This PR expands Enterprise documentation for understanding model evolution and operational continuity by adding full guides for model diff, persistent ZINGG ID reassignment, and clarifying explainOutput behavior.

Changes:

  • Added a comprehensive guide for the diff phase, including CLI/Python usage and config examples.
  • Replaced the placeholder persistent ID page with a full reassignZinggId guide (motivation, workflow, usage examples, configs).
  • Updated explainOutput docs to use runIncremental and added notes clarifying its current focus on probabilistic matches.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Files and descriptions:

  • docs/stepbystep/createtrainingdata/modeldiff.md: replaces the placeholder with an end-to-end "diff" guide (usage, configs, output).
  • docs/persistentZinggId.md: fixes the page title and adds a detailed "reassignZinggId" guide (usage, configs, behavior).
  • docs/explainoutput/README.md: renames the incremental phase reference and documents explainOutput scope/limitations.

Comment on lines +366 to +379
```json
"deterministicMatching": [
    {
        "matchCondition": [
            {"fieldName": "fname"},
            {"fieldName": "stNo"},
            {"fieldName": "add1"}
        ]
    },
    {
        "matchCondition": [
            {"fieldName": "ssn"}
        ]
    }
],
```
Copilot AI Mar 5, 2026

The narrative/Python example says deterministic matching rules are added in the new configuration, but in the JSON samples the deterministicMatching block appears under the Original Configuration (configBaseline.json). Please align the examples (either move deterministicMatching to the updated config sample or adjust the explanation) to avoid misleading readers about what changed between the models.
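One way to check that the baseline and updated samples agree with the narrative is to diff the top-level keys of the two configs. The following is a minimal sketch under stated assumptions. `diff_top_level_keys` is a hypothetical helper, not a Zingg API. Apart from the deterministicMatching block quoted above, the keys and values (`modelId`, `numPartitions`) are placeholders.

```python
import json


# Hypothetical helper (not part of Zingg): compares two config dicts at the
# top level so a doc author can confirm which blocks changed between models.
def diff_top_level_keys(baseline: dict, updated: dict) -> dict:
    """Return keys added, removed, and changed going from baseline to updated."""
    base_keys = set(baseline)
    new_keys = set(updated)
    return {
        "added": sorted(new_keys - base_keys),
        "removed": sorted(base_keys - new_keys),
        # Keys present in both configs whose values differ.
        "changed": sorted(
            k for k in base_keys & new_keys if baseline[k] != updated[k]
        ),
    }


# Placeholder configs: only the deterministicMatching block comes from the PR.
baseline = {
    "modelId": "100",
    "numPartitions": 4,
}
updated = {
    "modelId": "101",
    "numPartitions": 4,
    "deterministicMatching": [
        {"matchCondition": [{"fieldName": "fname"}, {"fieldName": "stNo"}, {"fieldName": "add1"}]},
        {"matchCondition": [{"fieldName": "ssn"}]},
    ],
}

report = diff_top_level_keys(baseline, updated)
print(json.dumps(report))
# {"added": ["deterministicMatching"], "removed": [], "changed": ["modelId"]}
```

If the report lists deterministicMatching under "removed" instead of "added", the block sits in the baseline sample rather than the updated one.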

Suggested change: delete the deterministicMatching block above (lines +366 to +379) from the baseline configuration sample.

Contributor Author

resolved

Update documentation to clarify the required wrapper configuration for CLI operations and update example files/names. Key changes:

  • Use the sparkIncremental reassign example in persistentZinggId.md.
  • Explain that --conf must point to a wrapper config containing transformedOutputPath, and that the wrapper's name is an arbitrary internal identifier.
  • Change the transformedOutputPath names to reassignedOutput and diffOutput in the examples.
  • Add explanatory notes about wrapper usage.
  • Add deterministicMatching to the updated model example and remove the duplicate block from the baseline example.
  • Clarify the wording for ZINGG_ID_UPDATED/ORIGINAL in the diff output description.
padam-prakash requested a review from sonalgoyal on March 5, 2026 09:13.

Development

Successfully merging this pull request may close these issues.

Update documentation for new features: Cluster stats, ExplainOutput, outputDiff, outputReassign

2 participants