Doug Hoyte
2024-09-05 15:55:27 -04:00
parent 052c56e62f
commit 5e375fb9df

226
README.md

@ -4,32 +4,67 @@
strfry is a relay for the [nostr protocol](https://github.com/nostr-protocol/nostr)
* Supports most applicable NIPs: 1, 2, 4, 9, 11, 12, 15, 16, 20, 22, 28, 33, 40
* Supports most applicable NIPs: 1, 2, 4, 9, 11, 22, 28, 40, 70
* No external database required: All data is stored locally on the filesystem in LMDB
* Hot reloading of config file: No server restart needed for many config param changes
* Zero downtime restarts, for upgrading binary without impacting users
* Websocket compression: permessage-deflate with optional sliding window, when supported by clients
* Websocket compression using permessage-deflate with optional sliding window, when supported by clients. Optional on-disk compression using zstd dictionaries.
* Built-in support for real-time streaming of events (up/down/both) to and from remote relays, and bulk import/export of events from/to jsonl files
* [negentropy](https://github.com/hoytech/negentropy)-based set reconciliation for efficient syncing with remote relays
* [negentropy](https://github.com/hoytech/negentropy)-based set reconciliation for efficient syncing with clients or between relays, accurate counting of events between relays, and more
If you are using strfry, please [join our telegram chat](https://t.me/strfry_users). Hopefully soon we'll migrate this to nostr.
<hr>
## Syncing
<!-- TOC FOLLOWS -->
<!-- START OF TOC -->
<!-- DO NOT EDIT! Auto-generated by md-toc: https://github.com/hoytech/md-toc -->
The most original feature of strfry is a set reconciliation protocol based on [negentropy](https://github.com/hoytech/negentropy). This is implemented over a [nostr protocol extension](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) that allows two parties to synchronise their sets of stored messages with minimal bandwidth overhead. Although primarily designed for relay-to-relay communication, this can also be used by clients.
* [Setup](#setup)
* [Compile](#compile)
* [Linux](#linux)
* [FreeBSD](#freebsd)
* [Operating](#operating)
* [Running a relay](#running-a-relay)
* [Importing data](#importing-data)
* [Exporting data](#exporting-data)
* [Fried Exports](#fried-exports)
* [Stream](#stream)
* [Sync](#sync)
* [Advanced](#advanced)
* [DB Upgrade](#db-upgrade)
* [Zero Downtime Restarts](#zero-downtime-restarts)
* [Plugins](#plugins)
* [Router](#router)
* [Syncing](#syncing)
* [Compression Dictionaries](#compression-dictionaries)
* [Architecture](#architecture)
* [Database](#database)
* [Threads and Inboxes](#threads-and-inboxes)
* [Websocket](#websocket)
* [Compression](#compression)
* [Ingester](#ingester)
* [Writer](#writer)
* [ReqWorker](#reqworker)
* [Filters](#filters)
* [DBScan](#dbscan)
* [ReqMonitor](#reqmonitor)
* [ActiveMonitors](#activemonitors)
* [Negentropy](#negentropy)
* [Cron](#cron)
* [Testing](#testing)
* [Fuzz tests](#fuzz-tests)
* [Author and Copyright](#author-and-copyright)
Either the full set of messages in the DB can be synced, or the results of one or more nostr filter expressions. If the two parties to the sync share common subsets of identical events, then there will be significant bandwidth savings compared to downloading the full set.
<!-- END OF TOC -->
## Usage
## Setup
### Compile
A C++20 compiler is required, along with a few other common dependencies. On Debian/Ubuntu use these commands:
A C++20 compiler is required, along with a few other common dependencies.
#### Linux
On Debian/Ubuntu use these commands:
sudo apt install -y git g++ make libssl-dev zlib1g-dev liblmdb-dev libflatbuffers-dev libsecp256k1-dev libzstd-dev
git clone https://github.com/hoytech/strfry && cd strfry/
@ -37,7 +72,7 @@ A C++20 compiler is required, along with a few other common dependencies. On Deb
make setup-golpe
make -j4
#### FreeBSD
FreeBSD has slightly different commands (warning, possibly out of date):
pkg install -y gcc gmake cmake git perl5 openssl lmdb flatbuffers libuv libinotify zstr secp256k1 zlib-ng
git clone https://github.com/hoytech/strfry && cd strfry/
@ -45,6 +80,14 @@ A C++20 compiler is required, along with a few other common dependencies. On Deb
gmake setup-golpe
gmake -j4
To upgrade strfry, do the following:
git pull
make update-submodules
make -j4
## Operating
### Running a relay
@ -54,7 +97,7 @@ Here is how to run the relay:
For dev/testing, the config file `./strfry.conf` is used by default. It stores data in the `./strfry-db/` directory.
In production, you'll probably want a systemd unit file and a reverse proxy such as nginx (details coming soon).
By default, it listens on port 7777 and only accepts connections from localhost. In production, you'll probably want a systemd unit file and a reverse proxy such as nginx to support SSL and other features.
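Since NIP-11 is among the supported NIPs, one quick way to confirm that a local relay is up is to request its relay information document over plain HTTP. The following is a minimal sketch, not part of strfry: the `relay_info` helper name is made up for illustration, and it assumes `curl` is installed and the relay is listening on the default port on localhost.

```shell
# relay_info [URL]: fetch the NIP-11 relay information document.
# The helper name is hypothetical; the default URL assumes the relay is
# listening on localhost port 7777.
relay_info() {
    curl -s -H 'Accept: application/nostr+json' "${1:-http://localhost:7777}"
}

# Example: relay_info
# Example: relay_info http://localhost:7777
```

If the relay is running, this should print a JSON document describing it. If nothing is printed, the relay is probably not listening on that address.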
### Importing data
@ -70,41 +113,12 @@ The `strfry export` command will print events from the DB to standard output in
Optionally, you can limit the time period exported with the `--since` and `--until` flags.
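As an illustration, a date-bounded export might look like the sketch below. It assumes that `--since` and `--until` take Unix timestamps in seconds and that GNU `date` is available for the conversion. The `export_range` helper and the `STRFRY` variable are hypothetical names, not part of strfry.

```shell
# export_range SINCE_DATE UNTIL_DATE: export events from a date range.
# Hypothetical helper. Assumes GNU date (for -d) and that --since/--until
# accept Unix timestamps in seconds.
export_range() {
    since_ts=$(date -u -d "$1" +%s) || return 1
    until_ts=$(date -u -d "$2" +%s) || return 1
    # STRFRY may point at a different binary; defaults to ./strfry
    "${STRFRY:-./strfry}" export --since="$since_ts" --until="$until_ts"
}

# Example: export_range 2024-01-01 2024-02-01 > january.jsonl
```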
#### Fried Exports
### DB Upgrade
If you pass the `--fried` argument to `strfry export`, then the output JSON lines will include `fried` elements. This is precomputed data that strfry can use to re-import these events more quickly. To take advantage of this, use the `--fried` flag on import as well.
In the past, incompatible changes have been made to the DB format. If you try to use a `strfry` binary with an incompatible DB version, an error will be thrown. Only the `strfry export` command will work.
This can be especially useful for upgrading strfry to a new, incompatible database version. See the [fried exports](https://github.com/hoytech/strfry/blob/master/docs/fried.md) documentation for more details on the format.
In order to upgrade the DB, you should export and then import again:
./strfry export > dbdump.jsonl
mv strfry-db/data.mdb data.mdb.bak
./strfry import < dbdump.jsonl
After you have confirmed everything is working OK, the `dbdump.jsonl` and `data.mdb.bak` files can be deleted.
### Zero Downtime Restarts
strfry can have multiple different running instances simultaneously listening on the same port, because it uses the `SO_REUSEPORT` Linux socket option. One of the reasons you may want to do this is to restart the relay without impacting currently connected users. This allows you to upgrade the strfry binary, or perform major configuration changes (for the subset of config options that require a restart).
If you send a `SIGUSR1` signal to a strfry process, it will initiate a "graceful shutdown". This means that it will no longer accept new websocket connections, and after its last existing websocket connection is closed, it will exit.
So, the typical flow for a zero downtime restart is:
* Record the PID of the currently running strfry instance.
* Start a new relay process using the same configuration as the currently running instance:
strfry relay
At this point, both instances will be accepting new connections.
* Initiate the graceful shutdown:
kill -USR1 $OLD_PID
Now only the new strfry instance will be accepting connections. The old one will exit once all its connections have been closed.
### Stream
@ -125,6 +139,8 @@ Both of these operations can be concurrently multiplexed over the same websocket
`strfry stream` will compress messages with permessage-deflate in both directions, if supported by the server. Sliding window compression is not supported for now.
If you want to open many concurrent streams, see the [strfry router](#router) command for an easier and more efficient approach.
### Sync
@ -150,6 +166,122 @@ Instead of a "full DB" sync, you can also sync the result of a nostr filter (or
Warning: Syncing can consume a lot of memory and bandwidth if the DBs are highly divergent (for example if your local DB is empty and your filter matches many events).
By default strfry keeps a precomputed BTree to speed up full-DB syncs. You can also cache BTrees for arbitrary filters, see the [syncing](#syncing) section for more details.
## Advanced
### DB Upgrade
In the past, incompatible changes have been made to the DB format. If you try to use a `strfry` binary with an incompatible DB version, an error will be thrown. Only the `strfry export` command will work.
In order to upgrade the DB, you should export and then import again using [fried exports](#fried-exports):
./strfry export --fried > dbdump.jsonl
mv strfry-db/data.mdb data.mdb.bak
./strfry import --fried < dbdump.jsonl
After you have confirmed everything is working OK, the `dbdump.jsonl` and `data.mdb.bak` files can be deleted.
The `strfry compact` command creates a raw dump of the LMDB file (after compaction) so it cannot be used for DB upgrade purposes. It can however be useful for reclaiming space or for a migration of a DB to a new server running the same version of strfry.
### Zero Downtime Restarts
strfry can have multiple different running instances simultaneously listening on the same port, because it uses the `SO_REUSEPORT` Linux socket option. One of the reasons you may want to do this is to restart the relay without impacting currently connected users. This allows you to upgrade the strfry binary, or perform major configuration changes (for the subset of config options that require a restart).
If you send a `SIGUSR1` signal to a strfry process, it will initiate a "graceful shutdown". This means that it will no longer accept new websocket connections, and after its last existing websocket connection is closed, it will exit.
So, the typical flow for a zero downtime restart is:
* Record the PID of the currently running strfry instance.
* Start a new relay process using the same configuration as the currently running instance:
strfry relay
At this point, both instances will be accepting new connections.
* Initiate the graceful shutdown:
kill -USR1 $OLD_PID
Now only the new strfry instance will be accepting connections. The old one will exit once all its connections have been closed.
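The flow above could be wrapped in a small shell function, sketched below. This is only an illustration: the `zero_downtime_restart` name, the `STARTUP_WAIT` variable, and the fixed startup delay are assumptions, not something strfry provides.

```shell
# zero_downtime_restart OLD_PID COMMAND [ARGS...]
# Hypothetical helper: starts the replacement instance, then asks the old
# one to shut down gracefully. Leaves the new PID in $new_pid.
zero_downtime_restart() {
    old_pid=$1
    shift

    # Start the replacement; both instances now accept connections.
    "$@" &
    new_pid=$!

    # Give the new instance a moment to start (arbitrary delay).
    sleep "${STARTUP_WAIT:-2}"

    # Do not drain the old instance if the new one has already died.
    if ! kill -0 "$new_pid" 2>/dev/null; then
        echo "new instance failed to start; leaving $old_pid running" >&2
        return 1
    fi

    # Initiate the graceful shutdown of the old instance.
    kill -USR1 "$old_pid"
    echo "$new_pid"
}

# Example: zero_downtime_restart "$OLD_PID" strfry relay
```

Checking that the new process is still alive before signalling the old one avoids draining the only working instance when the new binary or configuration fails at startup.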
### Plugins
When hosting a relay, you may not want to accept certain events. To avoid having to encode that logic into strfry itself, we have a plugin system. Any programming language can be used to build a plugin, using a simple line-based JSON interface.
In addition to write-policy plugins, plugins can also be used inside [strfry router](#router) to determine which events to stream up/down to other relays.
See the [plugin documentation](https://github.com/hoytech/strfry/blob/master/docs/plugins.md) for details and examples.
### Router
If you are building a complicated "mesh" topology of routers, or mirroring events to neighbour relays (up and/or down), you can use [strfry stream](#stream) to stream the events as they come in. However, when handling multiple streams, the efficiency and convenience of this can be improved with `strfry router`.
`strfry router` handles many streams in one process, supports pre-filtering events using nostr filters and/or [plugins](#plugins), and more. See the [router documentation](https://github.com/hoytech/strfry/blob/master/docs/router.md) for more details.
### Syncing
The most original feature of strfry is a set reconciliation protocol based on [negentropy](https://github.com/hoytech/negentropy). This is implemented over a [nostr protocol extension](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) that allows two parties to synchronise their sets of stored messages with minimal bandwidth overhead. Negentropy can be used by both clients and relays.
The results of arbitrary nostr filter expressions can be synced. Relays can maintain BTree data-structures for pre-configured filters, improving the efficiency of commonly synced queries (such as the full DB). Whenever two parties to the sync share common subsets of identical events, then there will be significant bandwidth savings compared to downloading the full set. In addition to syncing, negentropy can also be used to compute accurate event counts for a filter across multiple relays, without having to download the entire filter results from each relay.
The `strfry negentropy` command can be used to manage the pre-configured queries to sync.
`negentropy list` will list the current BTrees. Here we see we have one filter, `{}` which matches the full DB:
$ strfry negentropy list
tree 1
filter: {}
size: 483057
fingerprint: 9faaf0be1c25c1b4ee7e65f18cf4b352
This filter will be useful for full-DB syncs, and for syncs that use only `since`/`until`.
To add a new filter, use `negentropy add`. For example:
$ strfry negentropy add '{"kinds":[0]}'
created tree 2
to populate, run: strfry negentropy build 2
Note that the tree starts empty. To populate it, use the `negentropy build` command with the newly created tree ID:
$ strfry negentropy build 2
$ strfry negentropy list
tree 1
filter: {}
size: 483057
fingerprint: 9faaf0be1c25c1b4ee7e65f18cf4b352
tree 2
filter: {"kinds":[0]}
size: 33245
fingerprint: 37c005e6a1ded72df4b9d4aa688689db
Now, negentropy queries for kind 0 (optionally including `since`/`until`) can be performed efficiently and statelessly.
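Since a new tree always starts empty, the add and build steps can be combined. The sketch below relies on the `created tree N` line shown in the output above to recover the tree ID. The `add_and_build` helper and the `STRFRY` variable are hypothetical, and the output format may change between versions.

```shell
# add_and_build FILTER: register a filter and immediately populate its tree.
# Hypothetical helper; parses the "created tree N" line printed by
# `strfry negentropy add`. Leaves the tree ID in $tree_id.
add_and_build() {
    tree_id=$("${STRFRY:-strfry}" negentropy add "$1" |
        sed -n 's/^ *created tree \([0-9][0-9]*\).*/\1/p')
    if [ -z "$tree_id" ]; then
        echo "could not determine tree ID" >&2
        return 1
    fi
    "${STRFRY:-strfry}" negentropy build "$tree_id"
}

# Example: add_and_build '{"kinds":[0]}'
```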
### Compression Dictionaries
Although nostr events are compressed during transfer using websocket compression, they are stored uncompressed on disk by default. In order to attempt to reduce the size of the strfry DB, the `strfry dict` command can be used to compress these events while still allowing them to be efficiently served via a relay. Only the raw relay JSON itself is compressed: The indices needed for efficient retrieval are not. Since the indices are often quite large, the relative effectiveness of this compression depends on the type of nostr events stored.
`strfry dict` uses [zstd dictionaries](https://facebook.github.io/zstd/#small-data) to compress events. First you must build one or more dictionaries with `strfry dict train`. You can provide this command a nostr filter and it will select just these events. You may want to use custom dictionaries for certain kinds of events, or segment based on some other criteria. If desired, dictionary training can happen entirely offline without interfering with relay operation.
After building dictionaries, selections of events can be compressed with `strfry dict compress` (events are also selected with nostr filters). These events will be compressed with the indicated dictionary, but will still be served by the relay. Use the compress command again to re-compress them with a different dictionary, or use `dict decompress` to return them to their uncompressed state.
`strfry dict stats` can be used to print out stats for the various dictionaries, including size used by the dataset, compression ratios, etc.
## Architecture
@ -291,7 +423,7 @@ After an event has been processed, all the matching connections and subscription
These threads implement the provider-side of the [negentropy syncing protocol](https://github.com/hoytech/negentropy).
When [NEG-OPEN](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) requests are received, these threads perform DB queries in the same way as [ReqWorker](#ReqWorker) threads do. However, instead of sending the results back to the client, the IDs of the matching events are kept in memory, so they can be queried with future `NEG-MSG` queries.
When [NEG-OPEN](https://github.com/hoytech/strfry/blob/master/docs/negentropy.md) requests are received, these threads perform DB queries in the same way as [ReqWorker](#reqworker) threads do. However, instead of sending the results back to the client, the IDs of the matching events are kept in memory, so they can be queried with future `NEG-MSG` queries. Alternatively, if the query can be serviced with a [pre-computed negentropy BTree](#syncing), this is used instead and the query becomes stateless.
@ -320,6 +452,6 @@ Both of these tests have run for several hours with no observed failures.
## Author and Copyright
strfry © 2023 Doug Hoyte.
strfry © 2023-2024 Doug Hoyte.
GPLv3 license. See the LICENSE file.