This commit is contained in:
Doug Hoyte
2023-03-08 12:43:47 -05:00
parent bb3ca7c709
commit 050e590c1a

View File

@ -9,9 +9,6 @@ If both sides of the sync have common events, then this protocol will use less b
We'll call the two sides engaged in the sync the client and the relay (even though the initiating party may be another relay, not a regular client).
1. Client selects the parameters of a reconcilliation query and sends it to the relay:
* A subscription ID that will be used by each side to identify which query a message refers to
* A nostr filter, as described in [NIP-01](https://github.com/nostr-protocol/nips/blob/master/01.md), or an event ID whose `content` contains the JSON-encoded filter
* The truncation byte size for IDs, called `idSize. An integer between 8 and 32, inclusive.
1. Both sides collect all the events they have stored/cached that match this filter.
1. Each side sorts the events by their `created_at` timestamp. If the timestamps are equivalent, events are sorted lexically by their `id` (the canonicalised hash of the event).
1. Client creates an initial reconcilliation message and sends it to relay.
@ -31,6 +28,11 @@ We'll call the two sides engaged in the sync the client and the relay (even thou
]
```
* The subscription ID is used by each side to identify which query a message refers to. It only needs to be long enough to distinguish it from an other concurrent XOR requests on this websocket connection (an integer that increases once per `XOR-OPEN` is fine)
* The nostr filter is as described in [NIP-01](https://github.com/nostr-protocol/nips/blob/master/01.md), or an event ID whose `content` contains the JSON-encoded filter, or an array of filters
* `idSize` indicates the truncation byte size for IDs, and should be an integer between 8 and 32, inclusive.
* `initialMessage` is described below.
### Error message (relay to client):
If a request cannot be serviced, an error is returned (relay to client):
@ -47,6 +49,7 @@ Current reason codes are:
* `RESULTS_TOO_BIG`
* `FILTER_NOT_FOUND`
* `EXPIRED`
### Subsequent messages (bi-drectional):
@ -55,12 +58,12 @@ Current reason codes are:
"XOR-MSG",
<subscription ID string>,
<message, lowercase hex-encoded>,
<have IDs, lowercase hex-encoded>,
<need IDs, lowercase hex-encoded>
<have IDs, lowercase hex-encoded - optional>,
<need IDs, lowercase hex-encoded - optional>
]
```
The have and need ID fields are event IDs truncated to `idSize`, concatenated together, then hex encoded. If there are `n` IDs, then the length after hex encoding is `n * idSize * 2`.
The have and need ID fields are event IDs truncated to `idSize`, concatenated together. If there are `n` IDs in a field, then its length after hex encoding is `n * idSize * 2`. If one or both of these fields are omitted, they are assumed to be the empty string.
### Close message (client to relay):
@ -88,9 +91,13 @@ Each bound is specified by a timestamp (see below), and a disambiguating prefix
Bound := <encodedTimestamp (Varint)> <length (Varint)> <id (Byte)>*
The timestamp is encoded specially. The value `0` is reserved for the maximum allowable timestamp (such that all valid events preceed it). All other values are `1 + offset`, where offset is the difference between this timestamp and the previously encoded timestamp. The initial offset starts at 0 and resets at the beginning of each message.
* The timestamp is encoded specially. The value `0` is reserved for the "infinity timestamp" (such that all valid events preceed it). All other values are encoded as `1 + offset`, where offset is the difference between this timestamp and the previously encoded timestamp. The initial offset starts at 0 and resets at the beginning of each message.
Offsets are always positive numbers since the upper bound's timestamp is always `>=` to the lower bound's, ranges in a message are always encoded in ascending order, and ranges never overlap.
Offsets are always non-negative since the upper bound's timestamp is always `>=` to the lower bound's timestamp, ranges in a message are always encoded in ascending order, and ranges never overlap.
* The `id` parameter's size is encoded in `length`, and can be between `0` and `idSize` bytes. Efficient implementations will use the shortest possible length to separate the first element of this range from the last element of the previous range. If these elements' timestamps differ, then the length will be 0, otherwise it will be the byte-length of their common prefix plus 1.
If the `id` length is less than `idSize` then the unspecified trailing bytes are filled with 0 bytes.
### Range
@ -135,9 +142,15 @@ Upon receipt of an XOR range, the receiver should compute and compare the XOR of
However, if a receiver's set of events within the range is large, it should instead "split" the range into multiple sub-ranges, each of which gets appended onto the next outgoing message. These sub-ranges must totally cover the original range, meaning that the start of the first sub-range must match the original range's start (*not* the first matching element in its set). Likewise for the end of the range. Also, each sub-range's lower bound must be equal to the previous range's upper bound. The overhead of this duplication is mostly erased by the timestamp offset encoding described above.
How to split the range is implementation-dependent. The simplest way is to divide the elements that fall within the range into N equal-sized buckets, and emit an XOR-mode sub-range for each bucket. However, an implementation could choose different grouping criteria. For example, events with similar timestamps could be grouped into a single bucket. If the implementation believes the other side is less likely to have recent events, it could make the most recent bucket a base-case-mode.
How to split the range is implementation-defined. The simplest way is to divide the elements that fall within the range into N equal-sized buckets, and emit an XOR-mode sub-range for each bucket. However, an implementation could choose different grouping criteria. For example, events with similar timestamps could be grouped into a single bucket. If the implementation believes the other side is less likely to have recent events, it could make the most recent bucket a base-case-mode.
Note that if alternate grouping strategies are used, an implementation should never reply to an XOR-range with another single XOR-range, otherwise the protocol may never terminate.
Note that if alternate grouping strategies are used, an implementation should never reply to a range with another single XOR-range, otherwise the protocol may never terminate (if the other side does the same).
## Initial Message
The initial message should have at least one range, and the first range's lower bound should have timestamp 0 and an empty (all zeros) `id`. The last range's upper bound should have the infinity timestamp (and the `id` doesn't matter, so should be empty also).
How many ranges are used depends on the implementation. The most obvious implementation is to use the same logic as described above, either using the base case or splitting, depending on set size. However, an implementation may choose to use fewer or more buckets in its initial message, and may use different grouping strategies.
### Termination
@ -145,10 +158,52 @@ If a receiver has looped over all incoming ranges and has added no new ranges to
## Analysis
If you are searching for a single element in an ordered array, binary search allows you to find the element with a logarithmic number of operations. This is because each operation cuts the search space in half. So, searching a list of 1 million items will take about 20 operations:
log(1e6)/log(2) = 19.9316
For effective performance, this protocol requires minimising the number of "round-trips" between the two sides. A sync that takes 20 back-and-forth communications to determine a single difference would take unacceptably long. Fortunately we can expend a small amount of extra bandwidth by splitting our ranges in more than 2 ranges. This has the effect of increasing the base of the logarithm. For example, if we split it into 16 pieces instead:
log(1e6)/log(16) = 4.98289
Additionally, each direction of the protocol's communication can result in a split, so since we are measuring round-trips, we divide this by two:
log(1e6)/log(16)/2 = 2.49145
This means that in general, three round-trip communications will be required to synchronise two sets of 1 million elements that differ by 1 element. With an `idSize` of 16, each communication will take `16*16 + overhead` bytes -- roughly 300. So total bandwidth in one direction would be about 900 bytes and the other direction about 600 bytes.
What if they differ by multiple elements? Because communication is batched, the splitting of multiple differing ranges can happen in parallel. So, the number of round-trips will not be affected (assuming that every message can be delivered in exactly one packet transmission, independent of size, which is of course not entirely true on real networks).
The amount of bandwidth consumed will grow linearly with the number of differences, but this would of course be true even assuming a perfect synchronisation method that had no overhead other than transmitting the differing elements.
## Extensions
Don't need to push all your ranges right away if message is getting too big
## Implementation Enhancements
Don't need to send need or have IDs if you're not interested in syncing in that direction
### Deferred Range Processing
If there are too many differences and/or they are too randomly distributed throughout the range, then message sizes may become unmanageably large. This may be undesirable because of the memory required for buffering, and also because limiting batch sizes allows work pipelining, where the synchronised elements can be processed while additional syncing is occurring.
Because of this, an implementation may choose to defer the processing of ranges. Rather than transmit all the ranges it has detected need syncing, it can transmit a smaller number and keep the remaining for subsequent message rounds. This will decrease the message size at the expense of increasing the number of messaging round-trips.
An implementation could target fixed size messages, or could dynamically tune the message sizes based on its throughput metrics.
### Unidirection Syncing
The protocol above describes bi-directional syncing. This means at at the end of a sync, each side knows exactly which elements it has that the other doesn't, and what elements the other has that it doesn't. Because it (obviously) knows what elements it has itself, this implies it knows the precise set of the other side's elements too.
In many cases this is not necessary. For example, in most cases a nostr client will be the one coordinating the sync. After it has determined which elements it has or needs, it will either download or upload the elements according to how it is configured.
FIXME: expand on this. need protocol addition?
## Internal Representation
While a general-purpose relay implementation must be able to run a query on its DB and store a temporary data-structure of matching events for as long as the `XOR-OPEN` subscription is active, some optimisations are possible in certain cases.
In particular, a relay might like to support full-DB syncing. This is where another relay is configured to replicate another relay, and so it should ensure it always has exactly the same data-set. For this, a match-all filter of `{}` would be used.
In this case, or in the case of filters that contain only `until` and/or `since`, a relay may choose to maintain a persistent data-structure that reflects all stored events. This data-structure can be stored as a pre-computed tree, meaning that getting the XOR of the full data-set or a range of the data-set could be done very efficiently.
A production version of this should implement the tree as a copy-on-write data-structure so that inexpensive historical snapshots can be taken and stored alongside the `XOR-OPEN` subscription. Otherwise, any writes to the DB would corrupt the currently running protocol.