Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tree) Fix document corruption due to loss of change info for transient nodes in sequence field #16190

Merged
merged 24 commits into from
Jul 14, 2023

Conversation

yann-achard-MS
Copy link
Contributor

@yann-achard-MS yann-achard-MS commented Jun 29, 2023

Motivation

Fixes the fact that we were dropping all nested changes to a subtree when composing the (re)creation of that subtree, changes to it, and the deletion of the subtree. See added/updated tests for details. Note that this is for sequence field only.

Dropping the nested changes used to produce unexpected results as well as potential divergence in state between clients, which eventually leads to document corruption.

Changes

This is accomplished by adding information on Insert and Revive marks to convey whether the content has been deleted (in addition to inserted/revived), and a ChangeAtomId describing the deletion.

The ChangeAtomId info is used to properly order and match up (in terms of cell index) different marks. For example, by detecting that a later revive should be composed with the transient insert.

This PR also disables some composition tests. These tests check that left-associative and right-associative compositions produce the same results. The changes in this PR do not introduce the discrepancy in how those compositions are processed, but they make the existing discrepancy obvious by making the output of the composition different. This reduction in the correctness of compose is considered an acceptable price to pay for the benefit of handling transient node changes better, especially when considering that the prior compose logic, while consistent w.r.t. to associativity, was producing results that are consistently wrong (this is because repair data needs to be created for pairs of inserts and deletes).

@yann-achard-MS yann-achard-MS requested a review from a team as a code owner June 29, 2023 20:40
@github-actions github-actions bot added area: dds Issues related to distributed data structures area: dds: tree base: main PRs targeted against main branch labels Jun 29, 2023
@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Jun 29, 2023

@fluid-example/bundle-size-tests: +2.6 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 455.09 KB 455.09 KB No change
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 244.52 KB 244.52 KB No change
loader.js 151.41 KB 151.41 KB No change
map.js 47.19 KB 47.19 KB No change
matrix.js 147.86 KB 147.86 KB No change
odspDriver.js 93.68 KB 93.68 KB No change
odspPrefetchSnapshot.js 44.51 KB 44.51 KB No change
sharedString.js 164.77 KB 164.77 KB No change
sharedTree2.js 240.38 KB 242.98 KB +2.6 KB
Total Size 1.71 MB 1.71 MB +2.6 KB

Baseline commit: b04b832

Generated by 🚫 dangerJS against 02ba46c

/**
* @returns `true` iff the given delta has a visible impact on the document tree.
*/
export function isDeltaVisible(delta: Delta.MarkList): boolean {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it difficult to avoid creating "invisible" deltas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, yes, because it might require following a sequence of moves to see if they span the boundary between transient and non-transient stuff.

For the tests that use this function, we could avoid it in more cases if NodeChangeComposer had the ability to return undefined in order to signify that the outcome of the composition turned out to be a no-op.

Comment on lines 439 to 442
case Delta.MarkType.MoveOut:
case Delta.MarkType.MoveIn:
case Delta.MarkType.Delete:
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't these marks count towards visibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, they indeed should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my mistake. They do now.

Comment on lines 187 to 189
if (newMarkIsTransient) {
return withNodeChange(baseMark, nodeChange);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this could be moved before the outer if-block.

if (newMarkIsTransient) {
return withNodeChange(baseMark, nodeChange);
}
// TODO: Make `withNodeChange` preserve type information so we don't need to cast here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we just need to merge the Modify and Noop mark types.

Comment on lines 191 to 195
const nonTransient = withNodeChange(baseMark, nodeChange) as
| Insert<TNodeChange>
| Revive<TNodeChange>;
delete nonTransient.detachedBy;
return nonTransient;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to make assumptions about what kinds of marks newMark could be, and I think we should make those explicit. Our format allows composing a transient insert with a modify to the inserted node, which would behave incorrectly here.

// TODO: Create modify mark for transient node.
return { count: 0 };

assert(isDeleteMark(newMark), "Unexpected mark type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also assert isGenerativeMark(baseMark) instead of doing the cast below.

const nonTransient = withNodeChange(
baseMark,
nodeChange,
) as GenerativeMark<TNodeChange>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is safe because markIsTransient implies the mark is a GenerativeMark, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Comment on lines 212 to 215
// - MoveIn marks are invalid in an existing cell.
assert(newMark.type !== "MoveIn", "Invalid MoveIn after transient");
// - ReturnTo marks are invalid in for a cell whose node has not been moved out.
assert(newMark.type !== "ReturnTo", "Invalid ReturnTo after transient");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I completely understand these assumptions.

Copy link
Contributor Author

@yann-achard-MS yann-achard-MS Jul 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I now realize it's more complicated than that.

The existence of the rebase sandwich means we can have a MoveIn mark that targets an already existing cell. This would have to be in a scenario where local changes with a move get rebased over something else. The something else can't be targetting the cell that the MoveIn is targetting though, because no concurrent change can know about that cell.

For the ReturnTo, I think it's possible for it to occur after a transient, but only a muted ReturnTo.
Why possible: if the transient is a revive, then it's possible that the newMark comes from a client that knew about the node, and tried to move it out and return it.
Why muted: only a single specific node will ever occupy a given cell, and the presence of a transient mark tells us that node just got deleted. Return marks that attempt to move a deleted node end up being muted.

Let me know if you think that makes sense. If so, I'll update the code (to check that the ReturnTo mark is muted), and if not, we should probably just chat.

@@ -223,9 +223,22 @@ export interface HasRevisionTag {
}
export const HasRevisionTag = Type.Object({ revision: Type.Optional(RevisionTagSchema) });

export interface Transient {
/**
* When populated, the content is inserted/revived then deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems slightly confusing to say "when populated," as the field is not optional on this interface.

return [inverse];
if (mark.detachedBy !== undefined) {
assert(revision !== undefined, "Unable to revert to undefined revision");
return [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is saying that the inverse of a transient insert is an active revive. Are you sure this is right?

Comment on lines 92 to 103
const detachEvent = { revision: detachEventRev, index: computedDetachIndex };
const mark: Reattach<never> = {
type: "Revive",
content: reviver(detachedBy, computedDetachIndex, count),
content: reviver(detachEventRev, computedDetachIndex, count),
count,
detachEvent,
};
if (!isIntention) {
mark.inverseOf = detachedBy;
mark.inverseOf = detachEventRev;
}
if (detachedBy !== undefined) {
mark.detachedBy = detachedBy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this made me think we should consider renaming detachEvent and detachedBy to be more distinct somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I had trouble comming up with better names and then I forgot about it. delete may be clearer than detachedBy. Do you have suggestions?

Ultimately we're not going to care for very long if we switch to the anchor & effects format.

Comment on lines 34 to 35
this.advanceForRevision(markLength - inLength, mark.revision);
this.advanceForRevision(outLength - markLength, mark.detachedBy.revision);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are inLength and outLength always 0 for transient marks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@@ -50,9 +51,34 @@ describe("SequenceField - Compose", () => {
title.startsWith("((delete, insert), revive)") ||
title.startsWith("((move, insert), revive)") ||
!SF.areComposable([taggedA, taggedB, taggedC])
) {
// These changes do not form a valid sequence of composable changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems inconsistent with the more detailed one below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need help seeing how. I admittedly don't recall what's wrong with ((delete, insert), revive) and ((move, insert), revive), but the fact that they were grouped with !SF.areComposable([...]) makes me think they similarly didn't make sense together.

@yann-achard-MS yann-achard-MS merged commit 1d22ad9 into microsoft:main Jul 14, 2023
27 checks passed
@yann-achard-MS yann-achard-MS deleted the transient-in-sequence branch July 14, 2023 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: dds: tree area: dds Issues related to distributed data structures base: main PRs targeted against main branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants