candle/examples at 1a6043af5123bf9e189063d3baf110b39cf47617

huggingface/candle

Mirror of https://github.com/huggingface/candle.git, synced 2025-06-16 18:48:51 +00:00

Laurent Mazare 2f22afd80e Cuda acceleration for quantized model. (#1754)

* Boilerplate for the quantized cuda support.

* More basic cuda support.

* More cuda quantization (quantize on cpu for now).

* Add the dequantization bit.

* Start adding some dedicated cuda kernels from llama.cpp.

* Move the kernel code.

* Start interfacing with the kernel.

* Tweak the kernel launch params.

* Bugfix for quantized metal.

* Fix some clippy lints.

* Tweak the launch parameters.

* Tweak cuda basics to perform a quantized matmul.

* Perform the dequantization on the cpu + use cublas for matmul.

* Add the dequantization kernel.

* Test the qmatmul.

* More kernels.

* Matmul-vec kernel.

* Add a couple kernels.

* More dequantization kernels.

2024-02-25 18:11:47 +01:00

Speed up bert with approx gelu (#1410)

2023-12-06 17:46:37 +01:00

Quantized GGUF style (#1523)

2024-01-17 10:27:58 +01:00

Add the custom tokenizer. (#1686)

2024-02-09 17:36:50 +01:00

Convmixer example (#1074)

2023-10-11 19:51:10 +01:00

Add ConvNeXt-V2 and smaller model variants. (#1709)

2024-02-14 10:53:07 +01:00

Cuda acceleration for quantized model. (#1754)

2024-02-25 18:11:47 +01:00

Distibert (#1366)

2023-11-24 15:09:14 +00:00

Use the new hub helper function. (#1484)

2023-12-26 09:44:30 +01:00

Fix the eos token for gemma. (#1753)

2024-02-24 11:07:02 +01:00

Use the hub model file when possible. (#1190)

2023-10-26 20:00:50 +01:00

Make the cache for the llama model explicit too. (#1745)

2024-02-22 12:04:33 +01:00

Explicit caching in llama2.c.

2024-02-22 10:22:03 +01:00

llama_multiprocess

Use the new hub helper function. (#1484)

2023-12-26 09:44:30 +01:00

Improved mamba model optimized for inference (#1694)

2024-02-11 17:04:57 +01:00

Improved mamba model optimized for inference (#1694)

2024-02-11 17:04:57 +01:00

Add a KV cache to marian decoding. (#1226)

2023-10-31 08:47:44 +00:00

Use the tokenizer-output-stream in the llama example. (#1715)

2024-02-15 16:47:33 +01:00

Use the tokenizer-output-stream in the llama example. (#1715)

2024-02-15 16:47:33 +01:00

Allow for different behavior between training and eval (#1213)

2023-10-29 07:53:09 +01:00

Add MobileOne model. (#1595)

2024-01-16 06:34:16 +01:00

Fix lints for clippy 1.75. (#1494)

2023-12-28 20:26:20 +01:00

Update docs to reflect current usage of example (#1610)

2024-02-04 11:59:47 +01:00

Quantized GGUF style (#1523)

2024-01-17 10:27:58 +01:00

Quantized GGUF style (#1523)

2024-01-17 10:27:58 +01:00

Quantized GGUF style (#1523)

2024-01-17 10:27:58 +01:00

Fixing the qwen tokenizer location. (#1693)

2024-02-11 08:52:36 +01:00

reinforcement-learning

Detach the tensors on batch-norm eval. (#1702)

2024-02-13 14:26:32 +01:00

Quantized GGUF style (#1523)

2024-01-17 10:27:58 +01:00

Mention VGG in the readme. (#1573)

2024-01-12 09:59:29 +01:00

Expose the larger resnets (50/101/152) in the example. (#1131)

2023-10-19 13:48:28 +01:00

Add a readme for rwkv. (#1712)

2024-02-14 15:31:33 +01:00

segment-anything

Add negative prompts to segment-anything. (#1000)

2023-09-30 06:17:42 +01:00

stable-diffusion

Fix typo in README (#1740)

2024-02-22 12:35:26 +01:00

Quantized support for stable-lm2. (#1654)

2024-02-04 11:57:05 +01:00

Helper function to load sharded safetensors files (#1481)

2023-12-25 21:49:21 +01:00

docs: add trocr examples (#1692)

2024-02-10 16:14:50 +01:00

Allow for different behavior between training and eval (#1213)

2023-10-29 07:53:09 +01:00

Readme updates. (#1134)

2023-10-20 09:08:39 +01:00

Supports more audio formats (#1628)

2024-02-03 14:26:04 +01:00

whisper-microphone

feat: support microphone whisper streaming (#1678)

2024-02-12 18:01:21 +01:00

Remove some unusued bits. (#1067)

2023-10-09 19:49:57 +01:00

Fix token generation in bilingual models (non-English outputs) (#1668)

2024-02-06 12:03:53 +01:00

Fix clippy lints for 1.76. (#1682)

2024-02-08 16:48:47 +01:00

Fix linspace implementation (#1358)

2023-11-23 07:35:13 +00:00

onnx_basics.rs

[ONNX] Support a couple more ops. (#1284)

2023-11-06 22:44:58 +01:00