candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	9c532aef47	Also enable llama-v3 8b instruct. (#2088)	2024-04-19 08:50:06 +02:00
Thomas Santerre	f7a6468238	Add support for llama3 on the quantized example (#2086) * add support for l3b, new tokenizer * add todo * Add todo and use k_s model * Use the official tokenizers. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-18 22:52:00 +02:00
Laurent Mazare	2b93dffe64	Use faster rotary embeddings for llama like models. (#2087)	2024-04-18 22:34:29 +02:00
Laurent Mazare	e6ee7ba4d4	Llama v3. (#2085) * Llama v3. * Tweak the default params + handle special tokens. * Small tweak.	2024-04-18 22:19:54 +02:00
Laurent Mazare	1690ab45d2	Fix the silu gradient issue on 0. (#2083)	2024-04-18 14:31:41 +02:00
Laurent Mazare	8de0ce6cba	Add more QMMV cuda kernels. (#2077) * Add more QMMV cuda kernels. * Enable the new kernels. * Adapt the testing.	2024-04-18 08:36:43 +02:00
Laurent Mazare	ce6d08df94	Minor fix to the readme. (#2080) Co-authored-by: Jane Doe <jane.doe@example.org>	2024-04-17 22:43:00 +02:00
Laurent Mazare	2817643db9	Add the mmv kernels for small batch sizes. (#2075) * Add the mmv kernels for smaller sizes. * Support more mmv kernels. * Use the new kernels. * Fix the call. * Silly fix. * Improve the testing. * Fix for dmmv. * Add another dedicated test for the batching mmv.	2024-04-16 21:30:51 +02:00
NorilskMajor	4d14777673	Utilize batches in Stable Diffusion (#2071) * Utilize batches in Stable Diffusion that were already there, but unutilized. Also refactor out the `save_image` function. * Clippy + cosmetic fixes. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-16 06:49:04 +02:00
Laurent Mazare	f135b7963d	Fix for the batch dim in the quantized matmul example. (#2073) * Fix for the batch dim in the quantized matmul example. * Enable more tests on cuda. * Add a test for qmm with a batch. * Fix the zeros-dim test on metal.	2024-04-15 20:00:28 +02:00
Laurent Mazare	af955f260c	Make the falcon model cloneable. (#2067)	2024-04-15 09:39:03 +02:00
Laurent Mazare	8ad822a983	Add a function to clear the KV cache in falcon. (#2066) * Add a function to clear the KV cache in falcon. * Clippy.	2024-04-15 09:29:25 +02:00
Laurent Mazare	e198bb0816	Handle zero dims in some simple operations. (#2064) * Handle zero dims in some simple operations. * Handle zero-dims in matmul. * More testing.	2024-04-15 09:18:54 +02:00
Laurent Mazare	f7d5bf5b97	Faster kernels for quantized matmul on cuda (#2060) * Hook the quantized matmul cuda kernels. * Add a (currently broken) test. * Kernel fixes. * Fix by transposing the rhs matrix. * Add the q4-1 kernels. * Proper block sizes. * More details in the tests.	2024-04-15 08:32:47 +02:00
Harry Stern	c119600d6e	Move image tensor to device in trocr example (#2063) Signed-off-by: Harry Stern <harry@harrystern.net>	2024-04-15 06:50:32 +02:00
Laurent Mazare	c449f65b12	Expose the synchronize function on the generic device. (#2062)	2024-04-14 23:02:03 +02:00
ivarflakstad	db7dbf3071	Add missing bfloat unary strided kernels and fix typo (#2058)	2024-04-14 20:01:13 +02:00
Laurent Mazare	4ecedb1598	Add the full quantized matmul kernels for cuda. (#2057)	2024-04-14 17:52:08 +02:00
Laurent Mazare	53e5380bf6	Add a synchronize method to devices. (#2055) * Add a synchronize method to devices. * Metal version.	2024-04-14 16:32:55 +02:00
Laurent Mazare	50e49ecc5f	Add a quantized version of recurrent-gemma. (#2054) * Add a quantized version of recurrent-gemma. * Share the rglru part. * Get the quantized gemma model to work.	2024-04-13 20:07:01 +02:00
Thomas Santerre	4c88c3ce06	Add benchmarks for qmatmul operations (#2048) * Add qmatmul bench * add all dtypes	2024-04-13 12:30:14 +02:00
Laurent Mazare	8b8fb630df	Add a convenient way to rename tensors accessed through a varbuilder. (#2052)	2024-04-13 12:09:41 +02:00
Victor-Mihaila	fb805b8ca2	Avoid crashes when running T5 models with F16 tensors on CPU (#2047) * This change avoids crashes when running T5 models with F16 tensors on CPU. * This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. You could write more, like: This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point. * Revert "This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. You could write more, like: This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point." This reverts commit `d886d3ce5e`.	2024-04-13 11:07:28 +02:00
Victor-Mihaila	79e3bec789	Change for the encoder-only ProstT5 model (#2045) * This change avoids crashes when running T5 models with F16 tensors on CPU. * This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point.	2024-04-13 11:06:24 +02:00
Gabriel	e6d412b156	Add ReduceMean onnx operation (#2049) * Add ReduceMean onnx operation * Format code with rustfmt	2024-04-13 11:00:25 +02:00
Laurent Mazare	26cbbf8d84	Mandatory topk sampling for recurrent-gemma. (#2051)	2024-04-13 10:31:39 +02:00
Laurent Mazare	2bf413caa3	Add the recurrent-gemma model. (#2039) * Start adding the recurrent-gemma model. * More griffin. * Add the example + get the weights to load from the HF version. * More inference code. * Rope + kv-cache on the attention side. * Add to the inference code. * Add more to the recurrent gemma inference. * Get some first inference to run. * Add the softcap on logits. * Fixes. * Use partial rotary embeddings. * Get inference to work. * Add a comment. * And add a readme.	2024-04-13 00:05:21 +02:00
Laurent Mazare	3ad4770eb6	Use cat for faster MQA computation. (#2043) * Use cat for faster MQA computation. * Move the function to utils + use it in mistral. * Use the shared repeat-kv in a few more models. * Fix.	2024-04-12 09:15:10 +02:00
Laurent Mazare	a0460cd2b1	Add the code-gemma models. (#2038) * Add the code-gemma models. * Tweak to the gemma config.	2024-04-10 21:19:21 +02:00
Laurent Mazare	b81ecf712d	Support alternative dtypes for mamba (#2036) * Allow different dtypes in mamba. * Add a dtype flag.	2024-04-10 18:10:01 +02:00
Laurent Mazare	a4d5a414e3	Support gather on bf16 for metal. (#2035)	2024-04-10 12:49:25 +02:00
Gabriel	798e0335cd	Handle more tensor shapes in onnx "Gather" operation (#2026) * Handle more tensor shapes in onnx "Gather" operation * Add more tests * Add comment * Fix typo	2024-04-08 14:06:14 +02:00
Laurent Mazare	718671a0d5	Use BufferOffset in metal backend ops. (#2029) * Use BufferOffset in the metal backend. * More BufferOffset usage. * Use in where-cond.	2024-04-08 09:37:25 +02:00
Laurent Mazare	c5fe4a7f89	Rework the buffer offset logic for metal kernels (#2028) * Move the metal kernels utils in a separate module. * Use the BufferOffset for unary ops. * Fix clippy lints. * Use the new BufferOffset. * Adapt the binary ops. * Affine. * More ops (powf, elu, cast).	2024-04-07 22:37:53 +02:00
Laurent Mazare	7f354473cf	Optimize copy-2d for metal. (#2024) * Optimize copy-2d for metal. * Add a hacky stopping rule for moondream.	2024-04-07 12:34:16 +02:00
Laurent Mazare	33c9b66554	Add the new gemma models. (#2023) * Add the new gemma models. * Revert the lightning changes. * Support for the 1.1 models.	2024-04-06 21:25:38 +02:00
Laurent Mazare	9fd52b3b71	Handle the batch dimension in quantized MMV on metal. (#2022)	2024-04-06 20:02:24 +02:00
Laurent Mazare	e662431acf	Fix the final rmsnorm for quantized-metavoice. (#2021)	2024-04-06 19:35:01 +02:00
Jorge António	ab892274d1	first commit (#2018)	2024-04-05 15:20:28 +02:00
Laurent Mazare	b869a659ec	Faster mask implementation for mixformers. (#2017) * Faster mask implementation for mixformers. * Clippy.	2024-04-05 09:38:26 +02:00
Laurent Mazare	88f7793598	Moondream tracing. (#2016) * Moondream tracing. * A bit more tracing.	2024-04-05 09:11:08 +02:00
Laurent Mazare	2ac302a5d1	Add the rope THD kernel. (#2014) * Add the rope THD kernel. * Cuda kernel for rope-thd. * Add the metal kernels. * Add a dedicated test.	2024-04-05 08:32:58 +02:00
Santiago Medina	ace282e5c2	Add flag to run Moondream in f16 precision (#2015) * moondream implementation * add moondream example * change config default activation * Add assets and integrate phi mixformer with example * Make use of kv cache and fix seq_len bug; Clean up example code * Add README link to example * Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig * Delete image * Use apply instead of forward * Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2 * Add flag to use f16 * Avoid breaking the quantized version on cuda. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-05 07:03:33 +02:00
Laurent Mazare	c87381fc96	Use F16 for moondream on cuda. (#2013)	2024-04-04 23:30:10 +02:00
Thomas Santerre	c5626b8271	Add support for "sign" on tensors (#2012) * add the sign unary operator * remove unneeded import * remove unneeded import * undo formatting * undo formatting * remove unnecessary redefinition * allow gradient to flow through for sign and round * fix cpu ops to ensure that negzero and positive zero are handled properly * clippy fixes * Properly avoid gradient tracking. * Use a branchless version. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-04 22:32:47 +02:00
Laurent Mazare	e6a5b82ba6	Fix the matmul layout for accelerate & mkl. (#2011) * Fix the matmul layout for accelerate & mkl. * Reduce the required precision for pow (because of accelerate). * And fix the gelu f16 test.	2024-04-04 19:18:03 +02:00
Thomas Santerre	5aebe53dd2	update dtypes checks for several metal operations (#2010)	2024-04-04 18:39:06 +02:00
Laurent Mazare	f76bb7794a	Bumping the version number to 0.5.0. (#2009)	2024-04-04 17:48:45 +02:00
Laurent Mazare	30b145150f	Optimize the gelu f16 opt. (#2008) * Optimize the gelu f16 opt. * And add a test.	2024-04-04 16:28:23 +02:00
Laurent Mazare	f48c07e242	Include topk sampling in the quantized example. (#2005) * Include topk sampling in the quantized example. * Also sample with top-k on the mistral side.	2024-04-04 09:27:54 +02:00

2339 Commits