candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	3b429f3023	Make the dtype configurable for phi. (#2133)	2024-04-27 21:32:49 +02:00
Laurent Mazare	96a48e5cc4	Add argsort. (#2132) * Add the argsort cuda kernels. * CPU version of arg-sort. * Hook the cuda kernel + rework the cpu bits. * Add some dedicated test. * Working cuda kernel. * Metal kernel. * Metal adjustments. * Bugfix. * Use the fast rope in qwen. * Rework the expert selection in qwen.	2024-04-27 20:17:35 +02:00
Isotr0py	6cf82fd7a3	Add Olmo models (#2127) * add olmo support * add olmo readme * Fix fmt. * Fix clippy. * Get olmo to work on cuda. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-26 11:02:51 +02:00
Laurent Mazare	cfab6e7616	Mention phi-v3 in the readmes. (#2122)	2024-04-24 20:54:24 +02:00
Laurent Mazare	11d4a3c588	Add the phi-3 model. (#2120) * Add the phi-3 model. * Faster rope. * Bugfix. * Fix the detokenization.	2024-04-24 09:48:13 +02:00
Laurent Mazare	9d3f1c8af5	Add the phi-v3 quantized model. (#2118) * Add the phi-v3 quantized model. * Also include phi-3 in the main phi example.	2024-04-24 08:22:23 +02:00
Laurent Mazare	7211009179	Fix for rustfmt. (#2117)	2024-04-23 19:09:33 +02:00
B1rtek	6fadaf2eff	candle-onnx: add operators RandomUniform and Exp (#2116) * Add basic RandomUniform implementation * Use is_some to check if seed is present * Added Exp operator implementation --------- Co-authored-by: Mateusz Okulus <mmokulus@gmail.com>	2024-04-23 19:02:19 +02:00
Laurent Mazare	8a05743a21	Add StorageRef. (#2113) * Add the storage-ref bits. * Add the metal implementation.	2024-04-23 13:23:27 +02:00
Laurent Mazare	b2e816752b	Use the faster rms-norm kernel for llama. (#2107) * Use the faster rms-norm kernel for llama. * Use the fast variant by default.	2024-04-22 18:52:00 +02:00
Laurent Mazare	618ecf5e23	Better time measurement for the llama example. (#2106)	2024-04-22 17:54:27 +02:00
dependabot[bot]	267601eec1	Update tokenizers requirement from 0.15.0 to 0.19.1 (#2104) Updates the requirements on [tokenizers](https://github.com/huggingface/tokenizers) to permit the latest version. - [Release notes](https://github.com/huggingface/tokenizers/releases) - [Changelog](https://github.com/huggingface/tokenizers/blob/main/RELEASE.md) - [Commits](https://github.com/huggingface/tokenizers/compare/v0.15.0...v0.15.2) --- updated-dependencies: - dependency-name: tokenizers dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-22 17:10:46 +02:00
dependabot[bot]	08a15cb79e	Update zip requirement from 0.6.6 to 1.1.1 (#2103) * Update zip requirement from 0.6.6 to 1.1.1 --- updated-dependencies: - dependency-name: zip dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Fix for the zip crate update. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-22 16:23:27 +02:00
Laurent Mazare	c388be93e7	Updated quantized phi model (#2099) * Quantized phi in a separate file. * Add the quantized phi example + rework the model code. * Improve the phi model. * Get some generation out. * Use the appropriate rope shape. * Tweak the default prompt. --------- Co-authored-by: Jane Doe <jane.doe@example.org>	2024-04-21 07:37:07 +02:00
Santiago Medina	d22f1d4f4e	Derive clone and debug traits for Moondream model (#2100) * moondream implementation * add moondream example * change config default activation * Add assets and integrate phi mixformer with example * Make use of kv cache and fix seq_len bug; Clean up example code * Add README link to example * Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig * Delete image * Use apply instead of forward * Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2 * Derive debug and clone traits for Moondream model.	2024-04-21 07:08:28 +02:00
Thomas Santerre	0067fe00a8	Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) * add basic unary bench for sqrt * process unary commands in tiles of 4 * re-enable all benchmarks * rename helper to unary * modify approach to split up tiled and non-tiled operations * undo bench ignore for other tests * update tile size to 2 * only perform the optimization on the contiguous even numbered element case	2024-04-21 00:10:33 +02:00
Laurent Mazare	587ee3bb6f	Small cleanups to the llama multi-process example. (#2098)	2024-04-20 22:19:46 +02:00
Laurent Mazare	dd78422701	Handle multiple dimensions in metal QMM + two fixes. (#2097)	2024-04-20 18:55:45 +02:00
Gabriel	9215e9ce8c	Add missing onnx operations (#2096) * Add missing onnx operations * Add tests and fix errors * Run rustfmt	2024-04-20 18:44:22 +02:00
Laurent Mazare	52ae332910	Use llama v3 by default + add to readme. (#2094)	2024-04-20 16:11:24 +02:00
Laurent Mazare	8b390ddd29	Only download the weights in the main process (and not in the child processes). (#2093)	2024-04-20 13:01:23 +02:00
Laurent Mazare	c97d639fa0	Multiprocess/multi-GPU support for llama 3. (#2092) * Multiprocess/multi-GPU support for llama 3. * Modernize the mp example a bit.	2024-04-20 12:49:21 +02:00
Laurent Mazare	b45c710dbf	Fix for gemma MQA. (#2091)	2024-04-19 21:49:55 +02:00
Laurent Mazare	9c532aef47	Also enable llama-v3 8b instruct. (#2088)	2024-04-19 08:50:06 +02:00
Thomas Santerre	f7a6468238	Add support for llama3 on the quantized example (#2086) * add support for l3b, new tokenizer * add todo * Add todo and use k_s model * Use the official tokenizers. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-18 22:52:00 +02:00
Laurent Mazare	2b93dffe64	Use faster rotary embeddings for llama like models. (#2087)	2024-04-18 22:34:29 +02:00
Laurent Mazare	e6ee7ba4d4	Llama v3. (#2085) * Llama v3. * Tweak the default params + handle special tokens. * Small tweak.	2024-04-18 22:19:54 +02:00
Laurent Mazare	1690ab45d2	Fix the silu gradient issue on 0. (#2083)	2024-04-18 14:31:41 +02:00
Laurent Mazare	8de0ce6cba	Add more QMMV cuda kernels. (#2077) * Add more QMMV cuda kernels. * Enable the new kernels. * Adapt the testing.	2024-04-18 08:36:43 +02:00
Laurent Mazare	ce6d08df94	Minor fix to the readme. (#2080) Co-authored-by: Jane Doe <jane.doe@example.org>	2024-04-17 22:43:00 +02:00
Laurent Mazare	2817643db9	Add the mmv kernels for small batch sizes. (#2075) * Add the mmv kernels for smaller sizes. * Support more mmv kernels. * Use the new kernels. * Fix the call. * Silly fix. * Improve the testing. * Fix for dmmv. * Add another dedicated test for the batching mmv.	2024-04-16 21:30:51 +02:00
NorilskMajor	4d14777673	Utilize batches in Stable Diffusion (#2071) * Utilize batches in Stable Diffusion that were already there, but unutilized. Also refactor out the `save_image` function. * Clippy + cosmetic fixes. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-16 06:49:04 +02:00
Laurent Mazare	f135b7963d	Fix for the batch dim in the quantized matmul example. (#2073) * Fix for the batch dim in the quantized matmul example. * Enable more tests on cuda. * Add a test for qmm with a batch. * Fix the zeros-dim test on metal.	2024-04-15 20:00:28 +02:00
Laurent Mazare	af955f260c	Make the falcon model cloneable. (#2067)	2024-04-15 09:39:03 +02:00
Laurent Mazare	8ad822a983	Add a function to clear the KV cache in falcon. (#2066) * Add a function to clear the KV cache in falcon. * Clippy.	2024-04-15 09:29:25 +02:00
Laurent Mazare	e198bb0816	Handle zero dims in some simple operations. (#2064) * Handle zero dims in some simple operations. * Handle zero-dims in matmul. * More testing.	2024-04-15 09:18:54 +02:00
Laurent Mazare	f7d5bf5b97	Faster kernels for quantized matmul on cuda (#2060) * Hook the quantized matmul cuda kernels. * Add a (currently broken) test. * Kernel fixes. * Fix by transposing the rhs matrix. * Add the q4-1 kernels. * Proper block sizes. * More details in the tests.	2024-04-15 08:32:47 +02:00
Harry Stern	c119600d6e	Move image tensor to device in trocr example (#2063) Signed-off-by: Harry Stern <harry@harrystern.net>	2024-04-15 06:50:32 +02:00
Laurent Mazare	c449f65b12	Expose the synchronize function on the generic device. (#2062)	2024-04-14 23:02:03 +02:00
ivarflakstad	db7dbf3071	Add missing bfloat unary strided kernels and fix typo (#2058)	2024-04-14 20:01:13 +02:00
Laurent Mazare	4ecedb1598	Add the full quantized matmul kernels for cuda. (#2057)	2024-04-14 17:52:08 +02:00
Laurent Mazare	53e5380bf6	Add a synchronize method to devices. (#2055) * Add a synchronize method to devices. * Metal version.	2024-04-14 16:32:55 +02:00
Laurent Mazare	50e49ecc5f	Add a quantized version of recurrent-gemma. (#2054) * Add a quantized version of recurrent-gemma. * Share the rglru part. * Get the quantized gemma model to work.	2024-04-13 20:07:01 +02:00
Thomas Santerre	4c88c3ce06	Add benchmarks for qmatmul operations (#2048) * Add qmatmul bench * add all dtypes	2024-04-13 12:30:14 +02:00
Laurent Mazare	8b8fb630df	Add a convenient way to rename tensors accessed through a varbuilder. (#2052)	2024-04-13 12:09:41 +02:00
Victor-Mihaila	fb805b8ca2	Avoid crashes when running T5 models with F16 tensors on CPU (#2047) * This change avoids crashes when running T5 models with F16 tensors on CPU. * This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. You could write more, like: This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point. * Revert "This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. You could write more, like: This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point." This reverts commit `d886d3ce5e`.	2024-04-13 11:07:28 +02:00
Victor-Mihaila	79e3bec789	Change for the encoder-only ProstT5 model (#2045) * This change avoids crashes when running T5 models with F16 tensors on CPU. * This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point.	2024-04-13 11:06:24 +02:00
Gabriel	e6d412b156	Add ReduceMean onnx operation (#2049) * Add ReduceMean onnx operation * Format code with rustfmt	2024-04-13 11:00:25 +02:00
Laurent Mazare	26cbbf8d84	Mandatory topk sampling for recurrent-gemma. (#2051)	2024-04-13 10:31:39 +02:00
Laurent Mazare	2bf413caa3	Add the recurrent-gemma model. (#2039) * Start adding the recurrent-gemma model. * More griffin. * Add the example + get the weights to load from the HF version. * More inference code. * Rope + kv-cache on the attention side. * Add to the inference code. * Add more to the recurrent gemma inference. * Get some first inference to run. * Add the softcap on logits. * Fixes. * Use partial rotary embeddings. * Get inference to work. * Add a comment. * And add a readme.	2024-04-13 00:05:21 +02:00

2212 Commits