When we embarked on the rewrite, one thing was clear: to have a robust testing strategy, the new system would have to be testable! Emphasizing testability early, even before implementing the associated testing frameworks, was critical to ensuring that our architecture was informed appropriately.

We can look to Sync Engine Classic, the legacy system, for insights. Why wasn’t the old system testable? What made it so hard to avoid regressions and maintain correctness in that system? And what did we learn that informed the architecture of our new system?

Protocol and data model

For one, the server-client protocol and data model of Sync Engine Classic were designed for a simpler time and a simpler product, before Dropbox had sharing, before Dropbox had comments and annotations, and before Dropbox was used by thousand-person enterprise teams. Dropbox has evolved quite a bit in the 12+ years since we designed the original system, and the requirements have changed greatly.

Sync Engine Classic’s client-server protocol often resulted in a set of possible sync states far too permissive for us to be able to test effectively. For example, a client could receive metadata from the server about a file at /baz/cat before receiving its parent directory at /baz. Correspondingly, the client’s local database (SQLite) needed to represent this orphaned state, and any component that processed filesystem metadata needed to support it. This in turn made it impossible to distinguish many types of serious inconsistencies (for example, orphaned files) from a client merely being in some acceptable transient state.

In Nucleus, the protocol prevents us from getting into this state to begin with! In fact, one of our core architectural principles is “Design away invalid system states.” (In a future post, we’ll discuss how we also leverage Rust’s type system to support this principle.) In the case of the stray cat, we report a critical error at the protocol level before the client can enter such a state. The persisted data model and higher-level components need not consider such a possibility. The additional strictness affords a new, testable invariant: in the database, no file or folder can exist (even transiently) without a parent directory.

Sync Engine Classic and Nucleus have fundamentally distinct data models. The legacy system persists the outstanding work required to sync each file or folder to disk. For example, it stores whether a given file needs to be created locally or if it needs to be uploaded to the server. By comparison, Nucleus persists observations. Instead of representing the outstanding sync activity directly, it maintains just three trees, each of which represents an individually-consistent filesystem state, from which the right sync behavior can be derived:

- The Remote Tree is the latest state of the user’s Dropbox in the cloud.
- The Local Tree is the last observed state of the user’s Dropbox on disk.
- The Synced Tree expresses the last known “fully synced” state between the Remote Tree and Local Tree.

The Synced Tree is the key innovation that lets us unambiguously derive the correct sync result. If you’re familiar with version control, you can think of each node in the Synced Tree as a merge base. A merge base allows us to derive the direction of a change, answering the question: “did the user edit the file locally or was it edited remotely?”

![]()

In the graphic above, we can derive that the file at path /foo/fum was added remotely because the Local Tree (on disk) matches the merge base expressed by the Synced Tree. Without the Synced Tree, we wouldn’t be able to distinguish this scenario from one in which the user had actually deleted /foo/fum locally.
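To make the merge-base idea concrete, here is a minimal sketch in Python of how the direction of a change could be derived from the three trees. It is not Nucleus’s actual implementation: the `Tree` type (a plain mapping from path to content hash), the `Action` names, and the `derive_action` function are all hypothetical, chosen only to illustrate the comparison against the Synced Tree.

```python
from enum import Enum
from typing import Dict

# Hypothetical model: each tree maps a path to a content hash.
# A missing key means the path does not exist in that tree.
Tree = Dict[str, str]


class Action(Enum):
    NOTHING = "nothing"              # nothing needs to be transferred
    DOWNLOAD = "download"            # remote changed; bring it to disk
    DELETE_LOCAL = "delete_local"    # remote deleted; remove from disk
    UPLOAD = "upload"                # local changed; send it to the server
    DELETE_REMOTE = "delete_remote"  # local deleted; remove from the server
    CONFLICT = "conflict"            # both sides diverged from the merge base


def derive_action(path: str, remote: Tree, local: Tree, synced: Tree) -> Action:
    """Compare one path across the three trees, using the Synced Tree
    entry as the merge base (an illustrative assumption, not the real API)."""
    r = remote.get(path)
    l = local.get(path)
    base = synced.get(path)

    if r == l:
        # Both sides already agree, so no data has to move.
        return Action.NOTHING
    if l == base:
        # Local still matches the merge base, so the change came from remote.
        return Action.DOWNLOAD if r is not None else Action.DELETE_LOCAL
    if r == base:
        # Remote still matches the merge base, so the change came from local.
        return Action.UPLOAD if l is not None else Action.DELETE_REMOTE
    # Neither side matches the merge base: both changed independently.
    return Action.CONFLICT


# The /foo/fum scenario: present remotely, absent on disk.
remote = {"/foo": "dir", "/foo/fum": "abc123"}
local = {"/foo": "dir"}

# Synced Tree never saw /foo/fum, so the file was added remotely.
print(derive_action("/foo/fum", remote, local, synced={"/foo": "dir"}))

# Synced Tree did contain /foo/fum, so the user deleted it locally.
print(derive_action("/foo/fum", remote, local,
                    synced={"/foo": "dir", "/foo/fum": "abc123"}))
```

The Remote Tree and Local Tree are identical in both calls, and only the Synced Tree differs. That is the ambiguity the merge base resolves: without it, "added remotely" and "deleted locally" look the same.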