Adding more details on how to load things.

- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs` This is because
managing book specific dependencies is a pain https://github.com/rust-lang/mdBook/issues/706
- This causes a non aligned inclusion  https://github.com/rust-lang/mdBook/pull/1856 which we have
to ignore fmt to remove.

mdbook might need some more love :)
This commit is contained in:
Nicolas Patry
2023-08-01 16:36:53 +02:00
parent 45642a8530
commit a44471a305
4 changed files with 143 additions and 12 deletions

View File

@ -25,6 +25,8 @@ let weights = candle::safetensors::load(weights, &Device::Cpu);
We now have access to all the [tensors](https://huggingface.co/bert-base-uncased?show_tensors=true) within the file.
You can check all the names of the tensors [here](https://huggingface.co/bert-base-uncased?show_tensors=true)
## Using async
@ -35,17 +37,9 @@ cargo add hf-hub --features tokio
```
```rust,ignore
# extern crate candle;
# extern crate hf_hub;
use hf_hub::api::tokio::Api;
use candle::Device;
let api = Api::new().unwrap();
let repo = api.model("bert-base-uncased".to_string());
let weights = repo.get("model.safetensors").await.unwrap();
let weights = candle::safetensors::load(weights, &Device::Cpu);
# This is tested directly in examples crate because it needs external dependencies unfortunately:
# See [this](https://github.com/rust-lang/mdBook/issues/706)
{{#include ../../../candle-examples/src/lib.rs:book_hub_1}}
```
@ -78,3 +72,33 @@ let output = linear.forward(&input_ids);
```
For a full reference, you can check out the full [bert](https://github.com/LaurentMazare/candle/tree/main/candle-examples/examples/bert) example.
## Memory mapping
For more efficient loading, instead of reading the file, you could use [`memmap2`](https://docs.rs/memmap2/latest/memmap2/)
**Note**: Be careful about memory mapping it seems to cause issues on [Windows, WSL](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5893)
and will definitely be slower on network mounted disk, because it will issue more read calls.
```rust,ignore
{{#include ../../../candle-examples/src/lib.rs:book_hub_2}}
```
**Note**: This operation is **unsafe**. [See the safety notice](https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safety).
In practice model files should never be modified, and the mmaps should be mostly READONLY anyway, so the caveat most likely does not apply, but always keep it in mind.
## Tensor Parallel Sharding
When using multiple GPUs to use in Tensor Parallel in order to get good latency, you can load only the part of the Tensor you need.
For that you need to use [`safetensors`](https://crates.io/crates/safetensors) directly.
```bash
cargo add safetensors
```
```rust,ignore
{{#include ../../../candle-examples/src/lib.rs:book_hub_3}}
```