From 5e375fb9df4de83c71bc873b2617a730aeb889d6 Mon Sep 17 00:00:00 2001 From: Doug Hoyte Date: Thu, 5 Sep 2024 15:55:27 -0400 Subject: [PATCH] readme --- README.md | 226 ++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 179 insertions(+), 47 deletions(-) diff --git a/README.md b/README.md index 0f50aa0..c8fce58 100644 --- a/README.md +++ b/README.md @@ -4,32 +4,67 @@ strfry is a relay for the [nostr protocol](https://github.com/nostr-protocol/nostr) -* Supports most applicable NIPs: 1, 2, 4, 9, 11, 12, 15, 16, 20, 22, 28, 33, 40 +* Supports most applicable NIPs: 1, 2, 4, 9, 11, 22, 28, 40, 70 * No external database required: All data is stored locally on the filesystem in LMDB * Hot reloading of config file: No server restart needed for many config param changes * Zero downtime restarts, for upgrading binary without impacting users -* Websocket compression: permessage-deflate with optional sliding window, when supported by clients +* Websocket compression using permessage-deflate with optional sliding window, when supported by clients. Optional on-disk compression using zstd dictionaries. * Built-in support for real-time streaming (up/down/both) events from remote relays, and bulk import/export of events from/to jsonl files -* [negentropy](https://github.com/hoytech/negentropy)-based set reconcilliation for efficient syncing with remote relays +* [negentropy](https://github.com/hoytech/negentropy)-based set reconciliation for efficient syncing with clients or between relays, accurate counting of events between relays, and more If you are using strfry, please [join our telegram chat](https://t.me/strfry_users). Hopefully soon we'll migrate this to nostr. +
-## Syncing + + + -The most original feature of strfry is a set reconcillation protocol based on [negentropy](https://github.com/hoytech/negentropy). This is implemented over a [nostr protocol extension](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) that allows two parties to synchronise their sets of stored messages with minimal bandwidth overhead. Although primarily designed for relay-to-relay communication, this can also be used by clients. +* [Setup](#setup) + * [Compile](#compile) + * [Linux](#linux) + * [FreeBSD](#freebsd) +* [Operating](#operating) + * [Running a relay](#running-a-relay) + * [Importing data](#importing-data) + * [Exporting data](#exporting-data) + * [Fried Exports](#fried-exports) + * [Stream](#stream) + * [Sync](#sync) +* [Advanced](#advanced) + * [DB Upgrade](#db-upgrade) + * [Zero Downtime Restarts](#zero-downtime-restarts) + * [Plugins](#plugins) + * [Router](#router) + * [Syncing](#syncing) + * [Compression Dictionaries](#compression-dictionaries) +* [Architecture](#architecture) + * [Database](#database) + * [Threads and Inboxes](#threads-and-inboxes) + * [Websocket](#websocket) + * [Compression](#compression) + * [Ingester](#ingester) + * [Writer](#writer) + * [ReqWorker](#reqworker) + * [Filters](#filters) + * [DBScan](#dbscan) + * [ReqMonitor](#reqmonitor) + * [ActiveMonitors](#activemonitors) + * [Negentropy](#negentropy) + * [Cron](#cron) +* [Testing](#testing) + * [Fuzz tests](#fuzz-tests) +* [Author and Copyright](#author-and-copyright) -Either the full set of messages in the DB can be synced, or the results of one or more nostr filter expressions. If the two parties to the sync share common subsets of identical events, then there will be significant bandwidth savings compared to downloading the full set. + - - -## Usage +## Setup ### Compile -A C++20 compiler is required, along with a few other common dependencies. 
On Debian/Ubuntu use these commands: +A C++20 compiler is required, along with a few other common dependencies. -#### Linux +On Debian/Ubuntu use these commands: sudo apt install -y git g++ make libssl-dev zlib1g-dev liblmdb-dev libflatbuffers-dev libsecp256k1-dev libzstd-dev git clone https://github.com/hoytech/strfry && cd strfry/ @@ -37,7 +72,7 @@ A C++20 compiler is required, along with a few other common dependencies. On Deb make setup-golpe make -j4 -#### FreeBSD +FreeBSD has slightly different commands (warning, possibly out of date): pkg install -y gcc gmake cmake git perl5 openssl lmdb flatbuffers libuv libinotify zstr secp256k1 zlib-ng git clone https://github.com/hoytech/strfry && cd strfry/ @@ -45,6 +80,14 @@ A C++20 compiler is required, along with a few other common dependencies. On Deb gmake setup-golpe gmake -j4 +To upgrade strfry, do the following: + git pull + make update-submodules + make -j4 + + +## Operating ### Running a relay @@ -54,7 +97,7 @@ Here is how to run the relay: For dev/testing, the config file `./strfry.conf` is used by default. It stores data in the `./strfry-db/` directory. -In production, you'll probably want a systemd unit file and a reverse proxy such as nginx (details coming soon). +By default, it listens on port 7777 and only accepts connections from localhost. In production, you'll probably want a systemd unit file and a reverse proxy such as nginx to support SSL and other features. ### Importing data @@ -70,41 +113,12 @@ The `strfry export` command will print events from the DB to standard output in Optionally, you can limit the time period exported with the `--since` and `--until` flags. +#### Fried Exports -### DB Upgrade +If you pass the `--fried` argument to `strfry export`, then the outputted JSON lines will include `fried` elements. This is precomputed data that strfry can use to re-import these events more quickly. 
-In the past, incompatible changes have been made to the DB format. If you try to use a `strfry` binary with an incompatible DB version, an error will be thrown. Only the `strfry export` command will work. +This can be especially useful for upgrading strfry to a new, incompatible database version. See the [fried exports](https://github.com/hoytech/strfry/blob/master/docs/fried.md) documentation for more details on the format. -In order to upgrade the DB, you should export and then import again: - - ./strfry export > dbdump.jsonl - mv strfry-db/data.mdb data.mdb.bak - ./strfry import < dbdump.jsonl - -After you have confirmed everything is working OK, the `dbdump.jsonl` and `data.mdb.bak` files can be deleted. - - -### Zero Downtime Restarts - -strfry can have multiple different running instances simultaneously listening on the same port, because it uses the `REUSE_PORT` linux socket option. One of the reasons you may want to do this is to restart the relay without impacting currently connected users. This allows you to upgrade the strfry binary, or perform major configuration changes (for the subset of config options that require a restart). - -If you send a `SIGUSR1` signal to a strfry process, it will initiate a "graceful shutdown". This means that it will no longer accept new websocket connections, and after its last existing websocket connection is closed, it will exit. - -So, the typical flow for a zero downtime restart is: - -* Record the PID of the currently running strfry instance. - -* Start a new relay process using the same configuration as the currently running instance: - - strfry relay - - At this point, both instances will be accepting new connections. - -* Initiate the graceful shutdown: - - kill -USR1 $OLD_PID - - Now only the new strfry instance will be accepting connections. The old one will exit once all its connections have been closed. 
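Since exports are plain JSONL (one event per line), they are easy to post-process with ordinary tooling. As an illustrative sketch (not part of strfry itself), here is a small Python helper that tallies events per kind from an export; the filename `dbdump.jsonl` is just an example:

```python
import json
from collections import Counter

def kind_histogram(lines):
    """Count events per kind in a JSONL export (one JSON event per line)."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["kind"]] += 1
    return counts

# Example usage against an export file:
#   with open("dbdump.jsonl") as f:
#       print(kind_histogram(f))
```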
### Stream @@ -125,6 +139,8 @@ Both of these operations can be concurrently multiplexed over the same websocket `strfry stream` will compress messages with permessage-deflate in both directions, if supported by the server. Sliding window compression is not supported for now. +If you want to open many concurrent streams, see the [strfry router](#router) command for an easier and more efficient approach. + ### Sync @@ -150,6 +166,122 @@ Instead of a "full DB" sync, you can also sync the result of a nostr filter (or Warning: Syncing can consume a lot of memory and bandwidth if the DBs are highly divergent (for example if your local DB is empty and your filter matches many events). +By default strfry keeps a precomputed BTree to speed up full-DB syncs. You can also cache BTrees for arbitrary filters; see the [syncing](#syncing) section for more details. + + + +## Advanced + +### DB Upgrade + +In the past, incompatible changes have been made to the DB format. If you try to use a `strfry` binary with an incompatible DB version, an error will be thrown. Only the `strfry export` command will work. + +In order to upgrade the DB, you should export and then import again using [fried exports](#fried-exports): + + ./strfry export --fried > dbdump.jsonl + mv strfry-db/data.mdb data.mdb.bak + ./strfry import --fried < dbdump.jsonl + +After you have confirmed everything is working OK, the `dbdump.jsonl` and `data.mdb.bak` files can be deleted. + +The `strfry compact` command creates a raw dump of the LMDB file (after compaction), so it cannot be used for DB upgrade purposes. It can however be useful for reclaiming space or for migrating a DB to a new server running the same version of strfry. + + +### Zero Downtime Restarts + +strfry can have multiple running instances simultaneously listening on the same port, because it uses the `SO_REUSEPORT` Linux socket option. One of the reasons you may want to do this is to restart the relay without impacting currently connected users. 
This allows you to upgrade the strfry binary, or perform major configuration changes (for the subset of config options that require a restart). + +If you send a `SIGUSR1` signal to a strfry process, it will initiate a "graceful shutdown". This means that it will no longer accept new websocket connections, and after its last existing websocket connection is closed, it will exit. + +So, the typical flow for a zero downtime restart is: + +* Record the PID of the currently running strfry instance. + +* Start a new relay process using the same configuration as the currently running instance: + + strfry relay + + At this point, both instances will be accepting new connections. + +* Initiate the graceful shutdown: + + kill -USR1 $OLD_PID + + Now only the new strfry instance will be accepting connections. The old one will exit once all its connections have been closed. + + +### Plugins + +When hosting a relay, you may not want to accept certain events. To avoid having to encode that logic into strfry itself, we have a plugin system. Any programming language can be used to build a plugin, using a simple line-based JSON interface. + +In addition to write-policy plugins, plugins can also be used inside [strfry router](#router) to determine which events to stream up/down to other relays. + +See the [plugin documentation](https://github.com/hoytech/strfry/blob/master/docs/plugins.md) for details and examples. + + + + +### Router + +If you are building a complicated "mesh" topology of relays, or mirroring events to neighbour relays (up and/or down), you can use [strfry stream](#stream) to stream the events as they come in. However, when handling multiple streams, the efficiency and convenience of this can be improved with `strfry router`. + +`strfry router` handles many streams in one process, supports pre-filtering events using nostr filters and/or [plugins](#plugins), and more. 
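Whether used for write policy or for router pre-filtering, a plugin is just a program that reads one JSON request per line on stdin and writes one JSON decision per line on stdout. The following Python sketch illustrates the shape of such a plugin; the field names (`event`, `id`, `action`, `msg`) are assumptions here, and the plugin documentation is authoritative:

```python
import json
import sys

def handle_line(line):
    """Decide on one incoming event; returns one JSON response line.

    NOTE: the field names used here (`event`, `id`, `action`, `msg`) are
    assumptions based on the line-based JSON interface; consult the plugin
    documentation for the exact schema.
    """
    req = json.loads(line)
    event = req["event"]
    if event.get("kind") == 4:
        # Example policy: reject kind-4 DMs, accept everything else.
        res = {"id": event["id"], "action": "reject", "msg": "blocked: no DMs here"}
    else:
        res = {"id": event["id"], "action": "accept"}
    return json.dumps(res)

if __name__ == "__main__":
    for line in sys.stdin:
        if line.strip():
            print(handle_line(line), flush=True)
```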
See the [router documentation](https://github.com/hoytech/strfry/blob/master/docs/router.md) for more details. + + + +### Syncing + +The most original feature of strfry is a set reconciliation protocol based on [negentropy](https://github.com/hoytech/negentropy). This is implemented over a [nostr protocol extension](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) that allows two parties to synchronise their sets of stored messages with minimal bandwidth overhead. Negentropy can be used by both clients and relays. + +The results of arbitrary nostr filter expressions can be synced. Relays can maintain BTree data-structures for pre-configured filters, improving the efficiency of commonly synced queries (such as the full DB). When the two parties to a sync share common subsets of identical events, there will be significant bandwidth savings compared to downloading the full set. In addition to syncing, negentropy can also be used to compute accurate event counts for a filter across multiple relays, without having to download the entire filter results from each relay. + +The `strfry negentropy` command can be used to manage the pre-configured queries to sync. + +`negentropy list` will list the current BTrees. Here we see we have one filter, `{}`, which matches the full DB: + + $ strfry negentropy list + tree 1 + filter: {} + size: 483057 + fingerprint: 9faaf0be1c25c1b4ee7e65f18cf4b352 + +This filter will be useful for full-DB syncs, and for syncs that use only `since`/`until`. + +To add a new filter, use `negentropy add`. For example: + + $ strfry negentropy add '{"kinds":[0]}' + created tree 2 + to populate, run: strfry negentropy build 2 + +Note that the tree starts empty. 
To populate it, use the `negentropy build` command with the newly created tree ID: + + $ strfry negentropy build 2 + $ strfry negentropy list + tree 1 + filter: {} + size: 483057 + fingerprint: 9faaf0be1c25c1b4ee7e65f18cf4b352 + tree 2 + filter: {"kinds":[0]} + size: 33245 + fingerprint: 37c005e6a1ded72df4b9d4aa688689db + +Now, negentropy queries for kind 0 (optionally including `since`/`until`) can be performed efficiently and statelessly. + + + +### Compression Dictionaries + +Although nostr events are compressed during transfer using websocket compression, they are stored uncompressed on disk by default. To help reduce the size of the strfry DB, the `strfry dict` command can be used to compress these events while still allowing them to be efficiently served by the relay. Only the raw event JSON itself is compressed: the indices needed for efficient retrieval are not. Since the indices are often quite large, the relative effectiveness of this compression depends on the type of nostr events stored. + +`strfry dict` uses [zstd dictionaries](https://facebook.github.io/zstd/#small-data) to compress events. First you must build one or more dictionaries with `strfry dict train`. You can provide this command a nostr filter and it will train on just the matching events. You may want to use custom dictionaries for certain kinds of events, or segment based on some other criteria. If desired, dictionary training can happen entirely offline without interfering with relay operation. + +After building dictionaries, selections of events can be compressed with `strfry dict compress` (events also selected with nostr filters). These events will be compressed with the indicated dictionary, but will still be served by the relay. Use the compress command again to re-compress with a different dictionary, or use `dict decompress` to return events to their uncompressed state. 
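To see why dictionaries help, note that individual nostr events are tiny, and generic compression performs poorly on tiny inputs. The following sketch illustrates the preset-dictionary idea using Python's stdlib `zlib` (`zdict` parameter); this is only an analogy — strfry itself uses zstd, and real zstd training is far smarter than this naive concatenation:

```python
import json
import zlib

# Small JSON events share most of their structure (keys, hex fields), so a
# shared dictionary of similar sample data gives the compressor material to
# reference when compressing each one individually.
events = [
    json.dumps({"kind": 1, "pubkey": "ab" * 32, "created_at": 1700000000 + i,
                "tags": [], "content": f"hello nostr {i}"}).encode()
    for i in range(4)
]
dictionary = b"".join(events[1:])  # crude "training": all events except the first

def deflate(data, zdict=None):
    c = zlib.compressobj(zdict=zdict) if zdict is not None else zlib.compressobj()
    return c.compress(data) + c.flush()

plain = len(deflate(events[0]))
with_dict = len(deflate(events[0], dictionary))
# with_dict should come out noticeably smaller than plain, since the
# compressor can back-reference the dictionary contents.
```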
+ +`strfry dict stats` can be used to print out stats for the various dictionaries, including size used by the dataset, compression ratios, etc. + + + ## Architecture @@ -291,7 +423,7 @@ After an event has been processed, all the matching connections and subscription These threads implements the provider-side of the [negentropy syncing protocol](https://github.com/hoytech/negentropy). -When [NEG-OPEN](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) requests are received, these threads perform DB queries in the same way as [ReqWorker](#ReqWorker) threads do. However, instead of sending the results back to the client, the IDs of the matching events are kept in memory, so they can be queried with future `NEG-MSG` queries. +When [NEG-OPEN](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) requests are received, these threads perform DB queries in the same way as [ReqWorker](#reqworker) threads do. However, instead of sending the results back to the client, the IDs of the matching events are kept in memory, so they can be queried with future `NEG-MSG` queries. Alternatively, if the query can be serviced with a [pre-computed negentropy BTree](#syncing), this is used instead and the query becomes stateless. @@ -320,6 +452,6 @@ Both of these tests have run for several hours with no observed failures. ## Author and Copyright -strfry © 2023 Doug Hoyte. +strfry © 2023-2024 Doug Hoyte. GPLv3 license. See the LICENSE file.