candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 02:38:10 +00:00


Laurent Mazare 3071134788 Get the ggml based llama to generate some text. (#464)

* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.

2023-08-16 12:41:07 +01:00

bert

Add a cuda kernel for upsampling. (#441)

2023-08-14 13:12:17 +01:00

bigcode

Add a cuda kernel for upsampling. (#441)

2023-08-14 13:12:17 +01:00

custom-ops

Use bail rather than wrapping a string where possible. (#249)

2023-07-26 15:42:46 +01:00

falcon

Add a cuda kernel for upsampling. (#441)