A Prolly Tree is a hybrid data structure that combines the features of B-trees and Merkle trees to provide both efficient data access and verifiable integrity. It is especially useful in distributed systems and peer-to-peer networks, allowing fast diffs and structural sharing.
Put differently, a prolly tree is like Merkle Tree built by recursively applying Content-Defined Chunking, such as with a rolling hash function.
Key properties
- Searchable - They can store ordered data and perform lookups based on ordered keys or offsets.
- History independence - Any data set has a unique representation, regardless of the sequence of operations that led to the current state.
- Self-balancing - The tree is probabilistically balanced.
- Structural sharing - Storing multiple versions of the data only requires additional storage proportional to their diff: duplicate parts of the data are deduplicated.
- Efficient diffing - It’s possible to compare two prolly trees (or two snapshots of the same prolly tree) and identify only the parts of the data that are different between the two trees. The time complexity is logarithmic on the total data size, and is only linear on the size of the diff.
- Efficient mutation - Applying changes to the data stored in a prolly tree produces a new prolly tree. Like with diffing, the time complexity is logarithmic on the total data size, and is only linear on the size of the diff.
Implementations
Go
- https://github.com/dolthub/dolt
- https://github.com/attic-labs/noms
- https://github.com/RangerMauve/ipld-prolly-indexer/
- https://github.com/ABresting/Prolly-Tree-Waku-Message
JavaScript
Zig
TypeScript
- https://github.com/tabcat/prollipop
- https://github.com/canvasxyz/okra-js
- https://github.com/dialog-db/dialog-db/
Rust
Tools
- KenCloud-Tech/prolly-tree-oracle - Enable smart contracts to create/read/update/delete to prolly trees.
Learning Resources
- Prolly Trees - Introduction to prolly-trees.
- Merklizing the key/value store for fun and profit - Content-defined merkle trees:
- Peer-to-peer ordered search indexes - Peer-to-Peer Ordered Search Indexes and how Prolly Trees are used to implement them with comparison to Merkle Search Trees.
- Efficient Diff on Prolly-trees - Visual examples for prolly-tree diffs.
- Probabilistic Merkle B-Trees in Noms - One of the first design documents on prolly-trees from Noms.
- Probabilistic Merkle B-Trees in Noms (slides) - Slides for the prolly-trees from Noms.
- Prolly Tree PoC and technical writeup - Custom implementation of prolly-tree with right side backbone.
- Only Consider Keys - Only consider keys for chunk boundary, not keys + values.
- Range-Based Set Reconciliation - Introduction to Range-Based Set Reconciliation with a mention of how it relates to Prolly Trees (Negantrophy section uses
[number, hash]tuples as keys). - IPLD Prolly Tree Analysis - Analysis of IPLD Prolly Trees.
- Some notes and questions on Prolly Trees
- Prolly Trees - An overview of Prolly Trees and how they compare to Merkle Search Trees.
- Lab note #027 Breadcrumb to Prolly Trees - A look at how Prolly Trees are used in Breadcrumb.
- People Keep Inventing Prolly Trees
Credits
- Aaron Boodman - Inventor of the Prolly Tree.
- Mikeal Rogers - Prolly Tree implementation in JavaScript.
- tabcat - Creator of prollipop and originally compiled this list.
- Tim Sehn - Prolly Tree implementation in Dolt.
- Aaron Son - Prolly Tree implementation in Dolt.
- Mauve Signweaver - IPLD Prolly Tree Analysis.
- Joel Gustafson - Zig implementation of Prolly Trees and author of blog posts.
- Wil Chung - The Interjected Future Blog explaining many related concepts.
- ABresting - Go implementation of prolly-tree with right side backbone.
- Doug Hoyte - Author of blog post on range-based set reconciliation.
- jchris - Contributor to js prolly-tree implementation.