Mirror of https://github.com/huggingface/candle.git, synced 2025-06-15 18:28:24 +00:00
Adding more details on how to load things.
- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs`. This is because managing book-specific dependencies is a pain: https://github.com/rust-lang/mdBook/issues/706
- This causes a non-aligned inclusion (https://github.com/rust-lang/mdBook/pull/1856); to avoid it, we have to make rustfmt ignore those snippets.

mdbook might need some more love :)

@@ -25,6 +25,8 @@ let weights = candle::safetensors::load(weights, &Device::Cpu);

We now have access to all the [tensors](https://huggingface.co/bert-base-uncased?show_tensors=true) within the file.

You can browse the names of all the tensors [on the model page](https://huggingface.co/bert-base-uncased?show_tensors=true).

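Once unwrapped, the value returned by `candle::safetensors::load` is assumed here to be a `HashMap<String, Tensor>` keyed by tensor name. The following is a minimal std-only sketch for listing those names in a stable order. `sorted_tensor_names` and `names_with_prefix` are hypothetical helpers, not part of candle. They are generic over the value type, so the sketch does not depend on candle itself:

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns the tensor names of a loaded checkpoint in
/// sorted order. Generic over `V`, so it works with any value type, e.g. the
/// `Tensor` values we assume `candle::safetensors::load` returns.
pub fn sorted_tensor_names<V>(tensors: &HashMap<String, V>) -> Vec<&str> {
    // HashMap iteration order is unspecified, so sort for reproducible output.
    let mut names: Vec<&str> = tensors.keys().map(String::as_str).collect();
    names.sort_unstable();
    names
}

/// Hypothetical helper: keeps only the (sorted) names starting with `prefix`,
/// e.g. all the weights belonging to one layer.
pub fn names_with_prefix<'a, V>(tensors: &'a HashMap<String, V>, prefix: &str) -> Vec<&'a str> {
    sorted_tensor_names(tensors)
        .into_iter()
        .filter(|name| name.starts_with(prefix))
        .collect()
}
```

With candle, you would pass the unwrapped map, as in `sorted_tensor_names(&weights)`. This is useful for checking which layer names a checkpoint contains before wiring up a model.
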
## Using async

@@ -35,17 +37,9 @@ cargo add hf-hub --features tokio
```

```rust,ignore
# extern crate candle;
# extern crate hf_hub;
use hf_hub::api::tokio::Api;
use candle::Device;

let api = Api::new().unwrap();
let repo = api.model("bert-base-uncased".to_string());

let weights_filename = repo.get("model.safetensors").await.unwrap();

let weights = candle::safetensors::load(weights_filename, &Device::Cpu).unwrap();
# This is tested directly in the examples crate because it needs external dependencies;
# see [this issue](https://github.com/rust-lang/mdBook/issues/706).
{{#include ../../../candle-examples/src/lib.rs:book_hub_1}}
```

@@ -78,3 +72,33 @@ let output = linear.forward(&input_ids);
```

For a complete reference, check out the full [bert](https://github.com/LaurentMazare/candle/tree/main/candle-examples/examples/bert) example.

## Memory mapping

For more efficient loading, you can memory-map the file with [`memmap2`](https://docs.rs/memmap2/latest/memmap2/) instead of reading it into memory.

**Note**: Be careful with memory mapping. It seems to cause issues on [Windows and WSL](https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5893). It will also be slower on network-mounted disks, because pages are fetched on demand through many more read calls.

```rust,ignore
{{#include ../../../candle-examples/src/lib.rs:book_hub_2}}
```

**Note**: This operation is **unsafe**; [see the safety notice](https://docs.rs/memmap2/latest/memmap2/struct.Mmap.html#safety). Using the map is undefined behavior if the underlying file is modified while it is mapped, whether by this process or by another one. In practice, model files should never be modified while in use, and the map is only read from. The caveat is therefore unlikely to apply, but always keep it in mind.

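Memory mapping pays off because of how safetensors files are laid out. A file starts with an 8-byte little-endian `u64` header length N. Then come N bytes of UTF-8 JSON metadata (dtype, shape and byte offsets of each tensor), followed by the raw tensor data. A loader only has to parse that small header to locate each tensor's bytes in place. With a memory map, the OS only reads the pages that are actually touched.

The std-only sketch below splits a buffer into header and data without copying. `split_safetensors` is a hypothetical helper, not the API of candle or of the `safetensors` crate, and it does not parse the JSON:

```rust
/// Hypothetical helper: splits a safetensors buffer into (JSON header, data
/// section) without copying. Works on any `&[u8]`, including a memory map.
/// Returns `None` if the buffer is truncated or the header is not UTF-8.
pub fn split_safetensors(buf: &[u8]) -> Option<(&str, &[u8])> {
    // The first 8 bytes hold the header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(buf.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes);
    // Reject headers that claim to be longer than the rest of the buffer.
    if header_len > (buf.len() - 8) as u64 {
        return None;
    }
    let header_end = 8 + header_len as usize;
    // The JSON metadata follows immediately after the length prefix.
    let header = std::str::from_utf8(buf.get(8..header_end)?).ok()?;
    // Everything after the header is raw tensor data; the byte offsets stored
    // in the JSON metadata are relative to the start of this section.
    Some((header, &buf[header_end..]))
}
```

With `memmap2`, the buffer would be `&mmap[..]`, where `let mmap = unsafe { memmap2::Mmap::map(&file)? };`. That call carries the safety caveat described in the note.
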
## Tensor Parallel Sharding

When running a model with tensor parallelism across multiple GPUs to reduce latency, each GPU only needs its own part of every tensor. You can therefore load just that part.

For that, you need to use the [`safetensors`](https://crates.io/crates/safetensors) crate directly.

```bash
cargo add safetensors
```

```rust,ignore
{{#include ../../../candle-examples/src/lib.rs:book_hub_3}}
```

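The arithmetic behind such a sharded load can be sketched without external crates. Assume a row-major tensor is split evenly along dimension 0 across `world_size` ranks. Rank `r` then owns rows `r * block..(r + 1) * block`, where `block = rows / world_size`. Those rows are contiguous in memory, so they form a single contiguous byte range of the tensor data, and only that range has to be read from the (memory-mapped) file.

The helpers below are hypothetical and do not use the `safetensors` crate's own slicing API. They only compute the ranges and copy the bytes:

```rust
use std::ops::Range;

/// Hypothetical helper: rows owned by `rank` when `rows` is split evenly
/// across `world_size` ranks along dimension 0. Returns `None` when the
/// split is not even or the rank is out of range.
pub fn shard_rows(rows: usize, rank: usize, world_size: usize) -> Option<Range<usize>> {
    if world_size == 0 || rank >= world_size || rows % world_size != 0 {
        return None;
    }
    let block = rows / world_size;
    Some(rank * block..(rank + 1) * block)
}

/// Byte range of `rows` inside the raw data of a row-major tensor whose
/// remaining dimensions hold `row_elems` elements of `elem_size` bytes each.
/// Contiguous rows map to one contiguous byte range.
pub fn shard_bytes(rows: Range<usize>, row_elems: usize, elem_size: usize) -> Range<usize> {
    let row_bytes = row_elems * elem_size;
    rows.start * row_bytes..rows.end * row_bytes
}

/// Copies only this rank's shard out of the full tensor bytes `data` (for
/// example a sub-slice of a memory map). Returns the shard bytes and shape.
pub fn load_shard(
    data: &[u8],
    shape: &[usize],
    elem_size: usize,
    rank: usize,
    world_size: usize,
) -> Option<(Vec<u8>, Vec<usize>)> {
    let (&rows, rest) = shape.split_first()?;
    // Number of elements in one row (1 for a 1-D tensor).
    let row_elems: usize = rest.iter().product();
    let row_range = shard_rows(rows, rank, world_size)?;
    let mut shard_shape = shape.to_vec();
    shard_shape[0] = row_range.len();
    let bytes = shard_bytes(row_range, row_elems, elem_size);
    // `get` returns None instead of panicking if `data` is too short.
    Some((data.get(bytes)?.to_vec(), shard_shape))
}
```

For example, a `[768, 768]` weight split across 4 ranks gives each rank a `[192, 768]` shard. Splitting along dimension 0 is the easy case. Splitting along a later dimension would select one strided byte range per row instead of a single range.
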