# candle

ML framework for Rust

```rust
let a = Tensor::zeros((2, 3), DType::F32, &Device::Cpu)?;
let b = Tensor::zeros((3, 4), DType::F32, &Device::Cpu)?;
let c = a.matmul(&b)?;
```

## Features

- Simple syntax (looks and feels like PyTorch)
- CPU and CUDA backends (and M1 support)
- Enables serverless (CPU-only), small and fast deployments
- Model training
- Distributed computing (NCCL)
- Models out of the box (Llama, Whisper, Falcon, ...)
- Emphasis on enabling users to use custom ops/kernels

## Structure

- [candle-core](./candle-core): Core ops, devices, and the `Tensor` struct definition
- [candle-nn](./candle-nn/): Facilities to build real models
- [candle-examples](./candle-examples/): Examples of using the library in realistic settings
- [candle-kernels](./candle-kernels/): Custom CUDA kernels

## How to use?

Cheatsheet:

|            | Using PyTorch                           | Using Candle                                                     |
|------------|-----------------------------------------|------------------------------------------------------------------|
| Creation   | torch.zeros((2, 2))                     | Tensor::zeros((2, 2), DType::F32, &Device::Cpu)?                 |
| Creation   | torch.Tensor([2, 2])                    | Tensor::new(&[2.0f32, 2.0], &Device::Cpu)?                       |
| Creation   | torch.Tensor([2, 2, 2, 2]).view((2, 2)) | Tensor::from_slice(&[2.0, 2.0, 2.0, 2.0], (2, 2), &Device::Cpu)? |
| Indexing   | tensor[:, :4]                           | tensor.i((.., ..4))?                                             |
| Operations | a.matmul(b)                             | a.matmul(&b)?                                                    |
| Arithmetic | a + b                                   | &a + &b                                                          |
| Device     | tensor.to(device="cuda")                | tensor.to_device(&Device::Cuda(0))?                              |
| Dtype      | tensor.to(dtype=torch.float16)          | tensor.to_dtype(&DType::F16)?                                    |
| Saving     | torch.save({"A": A}, "model.bin")       | tensor.save_safetensors("A", "model.safetensors")?               |
| Loading    | weights = torch.load("model.bin")       | TODO (see the examples for now)                                  |
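To see how several of these calls fit together, here is a small end-to-end sketch. It assumes the `candle-core` crate is in scope (imported as `candle_core`) and that the `i(...)` indexing shown above comes from its `IndexOp` trait; treat it as an illustration rather than a verified program:

```rust
use candle_core::{DType, Device, IndexOp, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;

    // Creation: a 2x2 tensor from a flat slice, as in the cheatsheet.
    let a = Tensor::from_slice(&[1.0f32, 2.0, 3.0, 4.0], (2, 2), &device)?;
    let b = Tensor::zeros((2, 2), DType::F32, &device)?;

    // Arithmetic operators take references, so `a` and `b` stay usable,
    // and they return a Result like every other fallible op.
    let sum = (&a + &b)?;

    // Operations also take their arguments by reference.
    let prod = a.matmul(&b)?;

    // Indexing: the first column of `a`, i.e. tensor[:, :1] in PyTorch.
    let col = a.i((.., ..1))?;

    println!("{sum}\n{prod}\n{col}");
    Ok(())
}
```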
Check out our [examples](./candle-examples/examples/):

- [Whisper](./candle-examples/examples/whisper/)
- [Llama](./candle-examples/examples/llama/)
- [Bert](./candle-examples/examples/bert/) (useful for sentence embeddings)
- [Falcon](./candle-examples/examples/falcon/)

## FAQ

### Why Candle?

Candle stems from the need to reduce binary size in order to *make serverless inference possible*: the whole engine is much smaller than PyTorch's very large library volume, which makes it much faster to create runtimes on a cluster. A second goal is to simply *remove Python* from production workloads: Python can add real overhead in more complex workflows, and the [GIL](https://www.backblaze.com/blog/the-python-gil-past-present-and-future/) is a notorious source of headaches.

Rust is cool, and a lot of the HF ecosystem already has Rust crates, such as [safetensors](https://github.com/huggingface/safetensors) and [tokenizers](https://github.com/huggingface/tokenizers).

### Other ML frameworks

- [dfdx](https://github.com/coreylowman/dfdx) is a formidable crate, with shapes included in types, which prevents a lot of headaches by getting the compiler to complain about shape mismatches right off the bat. However, we found that some features still require nightly, and writing code can be a bit daunting for non-Rust experts. We're leveraging and contributing to other core crates for the runtime, so hopefully both crates can benefit from each other.
- [burn](https://github.com/burn-rs/burn) is a general crate that can leverage multiple backends so you can choose the best engine for your workload.
- [tch-rs](https://github.com/LaurentMazare/tch-rs.git): Bindings to the torch library in Rust. Extremely versatile, but they bring the entire torch library into the runtime. `tch-rs` was written by the same author as `candle`.

### Missing symbols when compiling with the mkl feature

If you get missing symbols when compiling binaries/tests that use the mkl feature, e.g.:

```
  = note: /usr/bin/ld: (....o): in function `blas::sgemm':
          .../blas-0.22.0/src/lib.rs:1944: undefined reference to `sgemm_'
          collect2: error: ld returned 1 exit status

  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
```

This is likely due to a missing linker flag that enables the mkl library. You can try adding the following at the top of your binary:

```rust
extern crate intel_mkl_src;
```
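For context, here is a minimal sketch of where that line would live, assuming a binary crate whose `Cargo.toml` already depends on `intel-mkl-src` and builds candle with the mkl feature enabled:

```rust
// src/main.rs
//
// Referencing intel_mkl_src from the binary forces Cargo to link the
// MKL libraries that the crate provides, which resolves the undefined
// `sgemm_`-style symbols shown above. Without this line, nothing in
// the binary uses the crate, so the linker flags may never be emitted.
extern crate intel_mkl_src;

fn main() {
    // ... your candle code as usual ...
}
```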