3 Commits

SHA1 Message Date
50e49ecc5f Add a quantized version of recurrent-gemma. (#2054)
* Add a quantized version of recurrent-gemma.

* Share the rglru part.

* Get the quantized gemma model to work.
2024-04-13 20:07:01 +02:00
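The commit above (#2054) adds a quantized variant of the model, i.e. the f32 weights are stored in a lower-precision format and expanded back at use time. As a rough illustration of that idea only (not candle's actual block-wise quantized formats), a minimal symmetric int8 quantize/dequantize round-trip with a single per-tensor scale looks like this; the per-tensor scale is a simplifying assumption.

```rust
/// Sketch of symmetric int8 weight quantization with one scale per tensor.
/// Illustrative only; the real quantized weights use block-wise formats.
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Expand the int8 values back to f32 using the stored scale.
fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = vec![0.1f32, -0.7, 0.33, 1.25];
    let (q, scale) = quantize_i8(&w);
    println!("original: {w:?}\nrestored: {:?}", dequantize_i8(&q, scale));
}
```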
26cbbf8d84 Mandatory topk sampling for recurrent-gemma. (#2051) 2024-04-13 10:31:39 +02:00
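Commit #2051 makes top-k sampling mandatory for this model. As a sketch of what top-k sampling does (independent of candle's own sampling code), the idea is to keep only the k highest logits, renormalize them with a softmax, and draw from that restricted distribution. The std-only version below takes a pre-drawn uniform value `u` in [0, 1) as a parameter; that interface is an assumption made for brevity.

```rust
/// Minimal sketch of top-k sampling: keep the k largest logits,
/// softmax over them, then sample with a pre-drawn uniform `u` in [0, 1).
fn sample_top_k(logits: &[f32], k: usize, u: f32) -> usize {
    // Pair each logit with its token index and sort by logit, descending.
    let mut indexed: Vec<(usize, f32)> = logits.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k.max(1));

    // Softmax over the surviving logits (subtract the max for stability).
    let max = indexed[0].1;
    let exps: Vec<f32> = indexed.iter().map(|&(_, l)| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Walk the cumulative distribution until it passes `u`.
    let mut acc = 0f32;
    for (&(idx, _), e) in indexed.iter().zip(&exps) {
        acc += e / sum;
        if u < acc {
            return idx;
        }
    }
    indexed[indexed.len() - 1].0
}

fn main() {
    let logits = [1.0f32, 3.5, 0.2, 2.9, -1.0];
    // With k = 2, only tokens 1 and 3 can ever be drawn.
    println!("sampled token: {}", sample_top_k(&logits, 2, 0.4));
}
```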
2bf413caa3 Add the recurrent-gemma model. (#2039)
* Start adding the recurrent-gemma model.

* More griffin.

* Add the example + get the weights to load from the HF version.

* More inference code.

* Rope + kv-cache on the attention side.

* Add to the inference code.

* Add more to the recurrent gemma inference.

* Get some first inference to run.

* Add the softcap on logits.

* Fixes.

* Use partial rotary embeddings.

* Get inference to work.

* Add a comment.

* And add a readme.
2024-04-13 00:05:21 +02:00
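The initial model commit (#2039) mentions a softcap on the logits. The soft-capping used by the Gemma family squashes logits into a bounded range through a scaled tanh, `cap * tanh(x / cap)`; a minimal sketch follows, where the cap value of 30.0 is an illustrative assumption rather than the model's configured constant.

```rust
/// Sketch of logit soft-capping: values are squashed into (-cap, cap)
/// via cap * tanh(x / cap), so large magnitudes saturate while small
/// ones pass through almost unchanged.
fn softcap(logits: &mut [f32], cap: f32) {
    for l in logits.iter_mut() {
        *l = cap * (*l / cap).tanh();
    }
}

fn main() {
    let mut logits = vec![5.0f32, 80.0, -200.0];
    // The cap of 30.0 here is an assumed example value.
    softcap(&mut logits, 30.0);
    println!("{logits:?}");
}
```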