# candle-quantized-t5

Candle implementation for quantizing and running T5 translation models.

## Seq2Seq example

This example uses a quantized version of the T5 model.

```bash
$ cargo run --example quantized-t5 --release -- --prompt "translate to German: A beautiful candle."
...
 Eine schöne Kerze.
```
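
If you would rather embed the same pipeline in your own crate than go through the CLI, the sketch below shows roughly how the example loads the model. It is a minimal sketch assuming the `candle-core`, `candle-transformers`, `serde_json`, and `anyhow` crates; the `config.json`/`model.gguf` paths are placeholders, and module paths may shift between candle versions.

```rust
use candle_core::Device;
use candle_transformers::models::quantized_t5 as t5;

fn main() -> anyhow::Result<()> {
    let device = Device::Cpu;
    // Placeholder paths: a T5 JSON config and a quantized GGUF weight file,
    // e.g. one produced as shown in the next section.
    let config: t5::Config = serde_json::from_str(&std::fs::read_to_string("config.json")?)?;
    let vb = t5::VarBuilder::from_gguf("model.gguf", &device)?;
    let _model = t5::T5ForConditionalGeneration::load(vb, &config)?;
    // From here the example tokenizes the prompt and decodes autoregressively
    // through the model's encoder/decoder methods.
    Ok(())
}
```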

## Generating Quantized weight files

The weight file is automatically retrieved from the hub. It is also possible to generate quantized weight files from the original safetensors file using the `tensor-tools` command line utility via:

```bash
$ cargo run --bin tensor-tools --release -- quantize --quantization q6k PATH/TO/T5/model.safetensors /tmp/model.gguf
```
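
To check what the conversion produced, candle can read the GGUF header back. A small sketch, assuming `candle_core::quantized::gguf_file` behaves as in recent candle releases and that `/tmp/model.gguf` is the file written above:

```rust
use candle_core::quantized::gguf_file;

fn main() -> anyhow::Result<()> {
    let mut file = std::fs::File::open("/tmp/model.gguf")?;
    let content = gguf_file::Content::read(&mut file)?;
    // Every tensor entry records its shape and quantized dtype (Q6K here).
    for (name, info) in content.tensor_infos.iter() {
        println!("{name}: {:?} ({:?})", info.shape, info.ggml_dtype);
    }
    Ok(())
}
```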

## Using custom models

To use a different model, specify the `model-id`.

For example, for text editing, you can use quantized [CoEdit](https://huggingface.co/jbochi/candle-coedit-quantized) models.

```bash
$ cargo run --example quantized-t5 --release -- \
  --model-id "jbochi/candle-coedit-quantized" \
  --prompt "Make this text coherent: Their flight is weak. They run quickly through the tree canopy." \
  --temperature 0
...
 Although their flight is weak, they run quickly through the tree canopy.
```
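
`--temperature 0` makes the run deterministic: candle examples conventionally treat a non-positive temperature as greedy (argmax) decoding. A minimal sketch of that behaviour with candle's `LogitsProcessor`; the exact CLI-to-`None` mapping is an assumption based on the common pattern in candle examples:

```rust
use candle_core::{Device, Tensor};
use candle_transformers::generation::LogitsProcessor;

fn main() -> anyhow::Result<()> {
    // A temperature of None selects argmax sampling, so the seed is
    // irrelevant and repeated runs give identical output.
    let mut processor = LogitsProcessor::new(0, None, None);
    let logits = Tensor::new(&[0.1f32, 2.0, 0.5], &Device::Cpu)?;
    let token = processor.sample(&logits)?;
    assert_eq!(token, 1); // index of the largest logit
    Ok(())
}
```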

By default, the example looks for `model.gguf` and `config.json`, but you can specify custom local or remote weight and config files:

```bash
$ cargo run --example quantized-t5 --release -- \
  --model-id "jbochi/candle-coedit-quantized" \
  --weight-file "model-xl.gguf" \
  --config-file "config-xl.json" \
  --prompt "Rewrite to make this easier to understand: Note that a storm surge is what forecasters consider a hurricane's most treacherous aspect." \
  --temperature 0
...
 Note that a storm surge is what forecasters consider a hurricane's most dangerous part.
```
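
Remote weight and config files like these are resolved through the Hugging Face hub. If you need the same files outside the example, a minimal sketch using the `hf-hub` crate (the repo and file names mirror the command above):

```rust
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    let api = Api::new()?;
    let repo = api.model("jbochi/candle-coedit-quantized".to_string());
    // Downloads the files on first use, then serves them from the local cache.
    let weights = repo.get("model-xl.gguf")?;
    let config = repo.get("config-xl.json")?;
    println!("weights at {weights:?}, config at {config:?}");
    Ok(())
}
```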

## MADLAD-400

MADLAD-400 is a series of multilingual machine translation T5 models trained on 250 billion tokens covering over 450 languages using publicly available data. These models are competitive with significantly larger models.

```bash
$ cargo run --example quantized-t5 --release -- \
  --model-id "jbochi/madlad400-3b-mt" --weight-file "model-q4k.gguf" \
  --prompt "<2de> How are you, my friend?" \
  --temperature 0
...
 Wie geht es dir, mein Freund?
```
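
The `<2de>` prefix is MADLAD-400's target-language tag (German here); swapping the tag switches the output language. A trivial, hypothetical helper for building such prompts:

```rust
// Hypothetical helper: MADLAD-400 expects the prompt to start with a
// `<2xx>` tag naming the target language, as in the command above.
fn madlad_prompt(lang: &str, text: &str) -> String {
    format!("<2{lang}> {text}")
}

fn main() {
    assert_eq!(
        madlad_prompt("de", "How are you, my friend?"),
        "<2de> How are you, my friend?"
    );
}
```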