Skip to content
This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

Latest commit

 

History

History
65 lines (37 loc) · 8.48 KB

2018_07_31.md

File metadata and controls

65 lines (37 loc) · 8.48 KB

Update for July 31, 2018

Contributions

@Aerijo encountered some confusion and took it upon himself to update our contributing guide to ensure others wouldn't suffer the same fate. We really appreciate these kinds of improvements.

Batched conflict resolution

As I mentioned in last week's update, having achieved convergence for replicated directory trees, last week we started down the path of mirroring changes from our internal CRDT-based representation to the underlying file system. After implementing file system reads and starting on randomized tests, we quickly realized that our previous mental model was incomplete.

In our previous tests of convergence, we applied operations to our in-memory representation one at a time, moving, inserting, and deleting each directory in serial. However, when scanning changes from the disk, this serial approach is impossible. We only see a snapshot of the file system's latest state, which could have been produced by a variety of different sequences of individual operations.

Consider the following directory structure, with two different directories that are both named "b". We'll label them b(1) and b(2) in our example to clearly identify them:

a/
a/b(1)
b(2)/

The next time we scan the file system, we observe that the directory structure has changed to the following:

a/
a/b(2)
b(1)/

The two directories named b have swapped their positions. If we naively apply the operations derived from this swap one at a time on a remote replica, we'll end up creating name conflicts. As I've discussed previously, we resolve name conflicts created by concurrent operations by appending a tilde character to one of the conflicting names. But in this case, appending a tilde would be incorrect, because the final state of the tree that we are trying to produce contains no actual conflicts.

To avoid spurious conflict resolutions, we moved from resolving conflicts after each operation to resolving conflicts after applying arbitrary batches of operations. It took a couple days to iron out all of the new issues and edge cases with this new approach in randomized testing, which took us until last Thursday. Finally, we managed to achieve convergence with the new approach to conflict resolution in a million randomized trials of 5 different peers applying 20 operations.

Batched writes to the file system

The batched nature of operation application presented a puzzle for file system writes as well. Previously, we had planned on applying the effects of each operation as it arrived, but now we realized that wouldn't work. We needed to apply a batch of operations to the tree, resolve conflicts, and only then write changes in the new state of tree to the file system. Unlike our internal representation, which can temporarily tolerate intermediate states containing conflicts and cycles, each operation applied to the file system must ensure that the tree remains acyclic and free of name conflicts.

We ended up converging on the following approach: We maintain a set of the internal identifiers of all files we have inserted, moved, or removed in the course of applying a batch of operations. We then sort insertions and moves by the depth of the inserted path in the new tree and sort deletions last. By performing shallower insertions first, we ensure that the parent of any directory we are trying to insert always exists.

By performing shallower moves first, we ensure that we don't accidentally create cycles while rearranging directories. We don't have a formal proof, and we may need more empirical verification to be completely confident, but the intuition is as follows: A cycle can only be created by moving a directory downward to become one of its own descendants. Because we've broken cycles in the new tree, we should never encounter a situation in which a directory has been moved to become its own descendant in the final state of the new tree. A combination of moves could end up creating a cycle momentarily, but this cycle could only be created by moving a directory deeper in the tree. If we perform upward moves first, by the time we would be attempting to move a directory into one of its own descendants, we should have already moved that descendant to an equal or shallower depth. At least that's our intuition, and evidence so far is that it works.

Finally, we need to deal with temporary name conflicts that can occur when directories are shuffled around. We've opted to take an extremely simple approach. When performing a move on the file system would create a name conflict, we append tildes until we find a free name and record the fact that we have done so. When all operations have been applied, we go back and clean up, renaming directories with appended tildes back to their desired names. At this point, all of the conflicts should be resolved, and so we can do this without risk of conflict.

Dealing with concurrency

The above approach worked in randomized trials at the end of last week, but we knew we were only solving part of the problem. Our initial implementation assumed that we were the only process writing to the file system. In reality, the file system can change out from under us at any time, meaning that we could be attempting to update the file system based on an outdated understanding of the file system's state.

To deal with this, before we integrate a batch of operations into our tree, we clone the tree's current state and as the old_tree. This represents our best guess as to the current state of the underlying file system. We then update the new tree, resolve conflicts, and start writing. For each file we need to update, we use the old_tree to determine the current location of the relevant directories on disk. Assuming a directory still exists at the path in question, we compare inode numbers to ensure it has the proper identity. Assuming our understanding of all the relevant paths is up to date, we can proceed with the file system write and update the old_tree accordingly.

If anything goes wrong, such as the path not existing, the path's inode not matching, or the write operation returning some kind of error, we need to pause the entire process and update our understanding of the old tree via a file system scan. As we integrate changes to the old tree, we produce operations which need to be applied to the new tree. Moves, deletions or conflict resolutions could end up changing the nature of operations we have still yet to write, requiring us to refresh and re-sort our pending writes after the old tree is updated.

At the time of writing, we have yet to achieve convergence in the presence of full concurrency with the underlying file system, but it seems like we are getting close. Hopefully we'll get there by the end of this week.

Q3 Demo

Our focus during Q2 has been figuring out how to achieve optimistic replication on the entire file system as well as persistence of all operations, and we've nearly done it. Once we achieve this abstraction, we plan to shift our focus to showcasing its capabilities in a new demo.

We're still not clear on the details, but the basic idea is that you should be able to open a repository in Xray, then open a "streams" panel to view the latest state of all other working copies of that repository from other developers working in Xray, whether or not they are currently online. If a stream is being actively edited, you'll be able to collaborate. If that stream's author is offline, you'll be able to pick up where they left off. You'll also be able to fork a stream, though we probably won't finish merging before the end of the quarter.

We feel confident we can achieve that basic experience, but if we have time, we'd like to restore the conversation panel now that we will be able to persist anchors over the lifetime of a repository. We'd also like to find other ways to show off our operation-level history, such as the ability to play back operations.

We still plan for Eon (or whatever we end up calling it; I'm not sure if I like the name) to be a standalone tool that can integrate with other editors. But we need to drive its development with a real product experience, and the best way to do that is by producing a working demo.

Vacation

Antonio is on vacation this week and next week, and I'll also be out next week to spend some quality time with my family. Due to this, expect a 2-week communication gap. We'll come back recharged to slash through randomized test failures and produce a demo of a whole new approach to collaboration. Thanks for reading!