docs: add reassignZinggId & model diff guides #1283
padam-prakash wants to merge 2 commits into zinggAI:main
Expand and improve documentation for explainOutput, persistent ZINGG ID reassignments, and model diff. Changes include:
- docs/explainoutput/README.md: rename incrementalRun -> runIncremental and add an "Important Notes" section clarifying that explainOutput focuses on probabilistic matches (deterministic matches are not included and may produce empty results for some clusters).
- docs/persistentZinggId.md: fix title typo and replace placeholder with a full guide for the reassignZinggId phase (motivation, how it works, CLI and Python usage examples, configuration wrappers, example JSON configs, and output/usage notes).
- docs/stepbystep/createtrainingdata/modeldiff.md: replace placeholder with a comprehensive model diff guide (purpose, how it works, CLI and Python examples, configuration samples, output format, and use cases).
Pull request overview
This PR expands Enterprise documentation for understanding model evolution and operational continuity by adding full guides for model diff, persistent ZINGG ID reassignment, and clarifying explainOutput behavior.
Changes:
- Added a comprehensive guide for the `diff` phase, including CLI/Python usage and config examples.
- Replaced the placeholder persistent ID page with a full `reassignZinggId` guide (motivation, workflow, usage examples, configs).
- Updated `explainOutput` docs to use `runIncremental` and added notes clarifying its current focus on probabilistic matches.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/stepbystep/createtrainingdata/modeldiff.md | Replaces placeholder with an end-to-end “diff” guide (usage, configs, output). |
| docs/persistentZinggId.md | Fixes page title and adds a detailed “reassignZinggId” guide (usage, configs, behavior). |
| docs/explainoutput/README.md | Renames incremental phase reference and documents explainOutput scope/limitations. |
    "deterministicMatching": [
      {
        "matchCondition": [
          {"fieldName": "fname"},
          {"fieldName": "stNo"},
          {"fieldName": "add1"}
        ]
      },
      {
        "matchCondition": [
          {"fieldName": "ssn"}
        ]
      }
    ],
The narrative/Python example says deterministic matching rules are added in the new configuration, but in the JSON samples the deterministicMatching block appears under the Original Configuration (configBaseline.json). Please align the examples (either move deterministicMatching to the updated config sample or adjust the explanation) to avoid misleading readers about what changed between the models.
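One way to check that the two config samples tell the same story as the narrative is to compare their deterministic rules directly. The following is a minimal sketch, not part of Zingg: the helper names `deterministic_rules` and `added_rules` are hypothetical, and the sample dicts are trimmed down to the `deterministicMatching` block quoted in this review, with all other config keys omitted.

```python
import json

# Trimmed-down stand-ins for the two samples (assumption: every other
# config key is omitted). The baseline has no deterministic rules.
baseline = {}
updated = json.loads("""
{
  "deterministicMatching": [
    {"matchCondition": [{"fieldName": "fname"},
                        {"fieldName": "stNo"},
                        {"fieldName": "add1"}]},
    {"matchCondition": [{"fieldName": "ssn"}]}
  ]
}
""")


def deterministic_rules(config):
    """Return each rule as a tuple of field names, or [] if the key is absent."""
    return [
        tuple(cond["fieldName"] for cond in rule["matchCondition"])
        for rule in config.get("deterministicMatching", [])
    ]


def added_rules(old, new):
    """Return the rules present in `new` but not in `old`, in order."""
    old_rules = set(deterministic_rules(old))
    return [rule for rule in deterministic_rules(new) if rule not in old_rules]


print(added_rules(baseline, updated))
```

If the docs are aligned with the narrative, `added_rules(baseline, updated)` lists both rules and `added_rules(updated, baseline)` is empty.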
Update documentation to clarify the required wrapper configuration for CLI operations and update example files/names. Key changes:
- use the sparkIncremental reassign example in persistentZinggId.md;
- explain that --conf must point to a wrapper config containing transformedOutputPath and that its name is an arbitrary internal identifier;
- change transformedOutputPath names to reassignedOutput and diffOutput in examples;
- add explanatory notes about wrapper usage;
- add deterministicMatching to the updated model example and remove the duplicate block from the baseline example;
- clarify wording for ZINGG_ID_UPDATED/ORIGINAL in the diff output description.
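The wrapper requirement described in this commit can be sketched as a small pre-flight check run before invoking the CLI. This is an illustration only: `check_wrapper` is a hypothetical helper, and the assumption that `transformedOutputPath` is an object carrying a `name` key is inferred from the commit message rather than confirmed against the Zingg schema. Any other keys the real wrapper needs are left out.

```python
def check_wrapper(wrapper):
    """Return a list of problems found in a wrapper config dict (empty if none)."""
    problems = []
    out = wrapper.get("transformedOutputPath")
    if out is None:
        problems.append("missing transformedOutputPath")
    elif not isinstance(out, dict) or not out.get("name"):
        # Assumption: the output is an object identified by a "name" key.
        problems.append("transformedOutputPath has no name")
    return problems


# Names taken from the updated examples; per the commit message the name is
# an arbitrary internal identifier, so any non-empty string would pass.
reassign_wrapper = {"transformedOutputPath": {"name": "reassignedOutput"}}
diff_wrapper = {"transformedOutputPath": {"name": "diffOutput"}}

for wrapper in (reassign_wrapper, diff_wrapper, {}):
    print(check_wrapper(wrapper))
```

A check like this only catches a config passed to --conf without the wrapper key; it does not validate paths or formats.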
closes #1272