241 Commits

SHA1 Message Date
cd6b9e317c Add benchmarks for the candle-nn package (#1995)
* add benchmarks for the candle-nn package

* uncomment test

* format
2024-04-03 07:03:54 +02:00
5522bbc57c Add fn 'get_with_hints_dtype' in VarBuilder (#1877) (#1897)
* quantized models (AWQ/SqueezeLLM/...) have tensors with multiple data types; use 'get_with_hints_dtype' to load tensors with a given dtype
2024-04-01 12:10:08 +02:00
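For illustration, a rough sketch of how a quantized-model loader might use the new method; the tensor names, shapes, and packing factors here are hypothetical, not from any specific checkpoint:

```rust
use candle::{DType, Result};
use candle_nn::{Init, VarBuilder};

/// Hypothetical AWQ-style layer: packed integer weights plus f16 scales,
/// living side by side in the same checkpoint with different dtypes.
fn load_quantized_linear(vb: &VarBuilder, in_dim: usize, out_dim: usize) -> Result<()> {
    let _qweight =
        vb.get_with_hints_dtype((in_dim, out_dim / 8), "qweight", Init::Const(0.), DType::U32)?;
    let _scales =
        vb.get_with_hints_dtype((in_dim / 128, out_dim), "scales", Init::Const(0.), DType::F16)?;
    Ok(())
}
```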
60676780a9 Fix detail in new RoPE implementation (#1935) 2024-03-25 18:20:09 +01:00
e7f8e72588 Contiguous variant of the rope kernel. (#1929)
* Contiguous variant of the rope kernel.

* Add the cuda kernel.

* Metal kernel.
2024-03-25 09:11:20 +01:00
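As a point of reference, the contiguous (non-interleaved) variant rotates element i together with element i + d/2, rather than adjacent pairs as in the interleaved kernel. A minimal per-head CPU sketch, assuming the cos/sin tables were already gathered for the current position:

```rust
/// Rotate one head of size d: pair x[i] with x[i + d/2].
/// `cos` and `sin` each hold d/2 values for this position.
fn rope_contiguous(x: &[f32], cos: &[f32], sin: &[f32]) -> Vec<f32> {
    let half = x.len() / 2;
    let mut out = vec![0f32; x.len()];
    for i in 0..half {
        out[i] = x[i] * cos[i] - x[i + half] * sin[i];
        out[i + half] = x[i] * sin[i] + x[i + half] * cos[i];
    }
    out
}
```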
1b98f84a2b Fast kernels for rotary embeddings. (#1928)
* Fast kernels for rotary embeddings.

* Add a test for the fast CPU kernel.

* Rope cuda bindings.

* Cuda kernel.

* Metal kernel (part 1).

* Cuda kernels.

* Finish the metal kernel.

* Use the new kernels in the quantized example.

* Fix warning.
2024-03-24 22:48:52 +01:00
0fddec762e RmsNorm kernel for metal. (#1895)
* RmsNorm kernel for metal.

* Wrapper for the metal kernel.

* Get the ops to actually work.

* Fix, get the tests to pass.
2024-03-21 09:48:56 +01:00
af7f8b87d3 Custom op for RmsNorm (#1890)
* Trying out a custom RmsNorm cuda kernel.

* CPU implementation for rms-norm.

* Cuda wrappers.

* Add some validation.

* Add some testing.

* More testing.
2024-03-21 06:36:28 +01:00
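For reference, rms-norm normalizes each row by its root-mean-square and applies a learned scale: y = x / sqrt(mean(x²) + eps) · alpha. A plain-Rust CPU sketch of that definition (illustration only, not the kernel code from this PR):

```rust
/// Rms-norm over the last dimension: each row of length `alpha.len()`
/// is scaled by the reciprocal of its root-mean-square, then by `alpha`.
fn rms_norm(xs: &[f32], alpha: &[f32], eps: f32) -> Vec<f32> {
    let dim = alpha.len();
    let mut out = Vec::with_capacity(xs.len());
    for row in xs.chunks(dim) {
        let mean_sq = row.iter().map(|&x| x * x).sum::<f32>() / dim as f32;
        let scale = 1.0 / (mean_sq + eps).sqrt();
        out.extend(row.iter().zip(alpha).map(|(&x, &a)| x * scale * a));
    }
    out
}
```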
ce9fbc3682 Optimize the cat operation on contiguous tensors (#1855)
* Add a specialized kernel for copy2d.

* Move the cat operations.

* Avoid transpositions in cat.

* Bugfix.

* Bugfix for the cuda kernel.

* Add a benchmark.

* Add more testing.

* Test fix.

* Faster kernel.

* Add the missing kernel.

* Tweak the test.

* Add a metal kernel.

* Fix for the metal kernel.

* Get the tests to pass on metal.

* Also use this opportunity to fix the metal kernel for ELU.

* Add some bf16 kernels.

* Clippy fixes.
2024-03-17 10:49:13 +01:00
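The copy2d primitive mentioned above boils down to a strided row copy; concatenating contiguous tensors along a non-final dimension then reduces to one such copy per input instead of a transpose round-trip. A simplified sketch (the actual kernel signature may differ):

```rust
/// Copy `rows` rows of `row_len` elements each, where source and
/// destination rows are laid out with independent strides.
fn copy2d<T: Copy>(
    dst: &mut [T],
    src: &[T],
    rows: usize,
    row_len: usize,
    dst_stride: usize,
    src_stride: usize,
) {
    for r in 0..rows {
        let (d, s) = (r * dst_stride, r * src_stride);
        dst[d..d + row_len].copy_from_slice(&src[s..s + row_len]);
    }
}
```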
758366160e add clone to candle dropout (#1814) 2024-03-08 08:18:01 +01:00
0c09d10f32 Improve metal buffer usage (#1807)
* Improve metal buffer usage

* Clone cpu storage when loading to reduce wait_until_complete calls
* Use powers of two for buffer sizes so reuse is more likely.
* Select best available buffer by size.
* Add a count to MetalStorage -> allows using a buffer of a different size

Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>

* Simplify new buffer creation without blit copy. Revert &[] -> Vec

* Add documentation on newBufferWithBytes safety / synchronization

* Drop unused buffers after command buffer is done syncing.

---------

Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>
2024-03-07 09:42:34 +01:00
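A toy sketch of the size-bucketing strategy described in the bullets above: request sizes are rounded up to a power of two so freed buffers are more likely to be reusable, and the best (smallest adequate) available buffer is picked per request. The real MetalDevice code manages metal::Buffer objects, command-buffer synchronization, and reference counts, all of which this omits:

```rust
use std::collections::BTreeMap;

/// Free buffers, bucketed by their (power-of-two) size.
#[derive(Default)]
struct BufferPool {
    free: BTreeMap<usize, Vec<Vec<u8>>>,
}

impl BufferPool {
    /// Round the request up to a power of two, then pick the smallest
    /// available buffer that is large enough; allocate only on a miss.
    fn alloc(&mut self, size: usize) -> Vec<u8> {
        let rounded = size.next_power_of_two();
        if let Some((_, bufs)) = self.free.range_mut(rounded..).next() {
            if let Some(buf) = bufs.pop() {
                return buf;
            }
        }
        vec![0u8; rounded]
    }

    /// Return a buffer to its size bucket for later reuse.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```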
4fd00b8900 Add the StarCoder2 model. (#1779)
* Add the StarCoder2 model.

* Add the example code and get things to work.

* And also tweak the readme.
2024-02-28 21:02:41 +01:00
0c49e95dfb Encodec model. (#1771)
* Encodec model.

* Fixes.

* Add the padding functions.

* Get the LSTM bit to work.

* Get the encodec model to generate some tokens (decoder only for now).

* Minor tweak.

* Minor tweak.
2024-02-27 22:59:40 +01:00
1a6043af51 Tweak the VarMap set type. (#1758) 2024-02-25 20:50:08 +01:00
c753f72c85 Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.

* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
3ba37443e5 Bugfix for applying the bias in conv1d-transpose. (#1732) 2024-02-18 22:51:20 +01:00
1fb728772d Support for groups in conv-transpose1d. (#1731)
* Groups support in conv-transpose-1d.

* Remove dangling file.
2024-02-18 21:28:07 +01:00
678d44a7f6 Expose the weights and biases in transposed convolutions. (#1727) 2024-02-18 10:35:01 +01:00
41416d2376 Expose more conv1d functions/structs. (#1726) 2024-02-17 18:50:55 +01:00
b60064780d feat: add silu activation function (#1706)
* feat: add silu activation function

* use silu/arg in grad

* update candle-nn

* use node
2024-02-14 10:27:22 +01:00
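SiLU is x · sigmoid(x); the "use silu/arg in grad" bullet refers to writing the derivative in terms of the already-computed forward output so the backward pass can reuse that node instead of recomputing the sigmoid. A scalar sketch of the math (the actual implementation operates on tensors):

```rust
/// SiLU (a.k.a. swish): silu(x) = x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Gradient written in terms of the forward output y = silu(x):
/// with s = sigmoid(x) = y / x, d/dx silu(x) = s + y * (1 - s).
fn silu_grad(x: f32, y: f32) -> f32 {
    let s = y / x; // sigmoid(x), recovered from the output (x != 0)
    s + y * (1.0 - s)
}
```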
ad73e93da2 Detach the tensors on batch-norm eval. (#1702)
* Detach the tensors on batch-norm eval.

* Fix pyo3 bindings.

* Black tweak.

* Formatting.

* Also update the pyo3-onnx formatting.

* Apply black.
2024-02-13 14:26:32 +01:00
020a979de2 Fix clippy lints for 1.76. (#1682) 2024-02-08 16:48:47 +01:00
b75e8945bc Enhance pickle to retrieve state_dict with a given key (#1671) 2024-02-06 21:17:33 +01:00
a90fc5ca5a Add VarBuilder::from_backend (#1670)
`candle-nn` already exposes a trait to define custom backends. However,
it's not possible to actually construct a `VarBuilder` with a custom
backend because the constructor is not exposed.

This change makes the constructor public and renames it from `new` to
`from_backend`, so that it is not mistaken for the primary constructor
(which could confuse users).
2024-02-06 15:26:11 +01:00
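A minimal sketch of what this enables, assuming the `SimpleBackend` trait surface from around this release (method names and signatures here are from memory and may differ slightly):

```rust
use candle::{DType, Device, Result, Shape, Tensor};
use candle_nn::var_builder::{SimpleBackend, VarBuilder};
use candle_nn::Init;

/// A toy backend that materializes every requested tensor as zeros.
struct Zeros;

impl SimpleBackend for Zeros {
    fn get(&self, s: Shape, _name: &str, _h: Init, dtype: DType, dev: &Device) -> Result<Tensor> {
        Tensor::zeros(s, dtype, dev)
    }
    fn contains_tensor(&self, _name: &str) -> bool {
        true
    }
}

fn main() -> Result<()> {
    // The newly public constructor: wrap the custom backend in a VarBuilder.
    let vb = VarBuilder::from_backend(Box::new(Zeros), DType::F32, Device::Cpu);
    let w = vb.get((2, 3), "weight")?;
    println!("{w}");
    Ok(())
}
```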
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
539ead927a Update the Phi model to use the updated architecture. (#1580)
* Update the Phi model to use the updated architecture.

* Add more of the phi model.

* Repeat KV + caching.

* Apply the rotary embeddings.

* Add support for the new phi model in the phi example.

* Fix a couple glitches.

* Fix a couple more glitches.
2024-01-13 17:38:27 +01:00
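The "Repeat KV" bullet refers to grouped-query attention: each key/value head is shared by several query heads and must be expanded before the attention matmul. A sketch using candle tensor ops, in a common formulation rather than necessarily the exact code from this PR:

```rust
use candle::{Result, Tensor};

/// Expand (b, n_kv_heads, seq, head_dim) to (b, n_kv_heads * n_rep, seq, head_dim)
/// so that query head q maps to kv head q / n_rep.
fn repeat_kv(x: Tensor, n_rep: usize) -> Result<Tensor> {
    if n_rep == 1 {
        return Ok(x);
    }
    let (b, n_kv, seq, hd) = x.dims4()?;
    // Insert an expansion axis, broadcast it n_rep times, then fold it
    // into the head dimension.
    x.unsqueeze(2)?
        .expand((b, n_kv, n_rep, seq, hd))?
        .contiguous()?
        .reshape((b, n_kv * n_rep, seq, hd))
}
```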
b4cb982e49 Simplifying our internal cargo dependencies. (#1529) 2024-01-07 12:04:14 +01:00
135ae5f3eb Simplify the one-hot implementation, support arbitrary rank. (#1514)
* Simplify the one-hot implementation, support arbitrary rank.

* More cleanup.
2024-01-01 11:40:17 +01:00
41614b4a9b Add one-hot/cold encoding (#1489)
* add one-hot encoding

* one_hot: improve error handling, use generic to_vecN::<D>

Bails if the index value is equal to or greater than the depth value,
which would result in an out-of-bounds error.

A redundant check is added to ensure the index value does not exceed
the size of the one-hot matrix, which would also result in an
out-of-bounds error.

Bails if the index value is less than -1. If the index value is
exactly -1, the on_value is simply not set for that position; only
values below -1 are treated as errors.

* one-hot: use two generics, one_hot::<I, O>, for input and output data types

Separating the input and output data types allows the input tensor
indices to be a different data type than the output encoded tensor data type.

For example, one_hot::<i64, u8>(...) will take an input tensor of i64 values
and encode the output tensor using u8 values.

The generic I::DTYPE must match the data type of the input indices, otherwise
the method will bail.

Additionally, an `allow_f64` option is added to allow the input indices
to be f64 values; f64 is disabled by default.

TODO: indices data type and the generic I data type are currently not compile-time
checked.

* one_hot: remove input generic, use indices dtype matching

This commit removes the to_f64() type cast and explicitly
matches the DType from the input tensor. Currently, only U8,
U32 and I64 are supported for input tensors.

The match arms on the dtype are verbose. It would be nice
to use a generic type with the WithDType trait bound to
pass to the to_vecN method and then return an inner value.

Open to suggestions for better approaches here to reduce
the match arm verbosity.

* one_hot: use flat_map iterator over dims instead of nested for loop

This commit replaces the nested for loops with a flat_map iterator over
the dimensions of the input tensor.

This commit also adds a test for a rank 3 input tensor.

* one_hot: use mandatory on/off-values, remove const msgs

This commit also updates doc tests, comments and test cases.

* Small cleanups.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-01-01 11:18:40 +01:00
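With the shape the API settles on in this PR (a single output-type generic, mandatory on/off values, and the index dtype matched at runtime), usage looks roughly like this:

```rust
use candle::{Device, Result, Tensor};
use candle_nn::encoding::one_hot;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Indices must be U8, U32 or I64; a -1 leaves that row entirely "off".
    let indices = Tensor::new(&[0i64, 2, 1, -1], &dev)?;
    // The output dtype follows the on/off values: here a u8 tensor of
    // shape (4, 4), i.e. rank(input) + 1 with the last dim equal to depth.
    let encoded = one_hot(indices, 4, 1u8, 0u8)?;
    println!("{encoded}");
    Ok(())
}
```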
b0fe5e4453 Do not implement Module for BatchNorm. (#1513) 2024-01-01 10:13:13 +01:00
a0facd0e67 Small tweaks to batch-norm. (#1505) 2023-12-30 17:06:07 +01:00
4290b81244 [Breaking] Add training to batchnorm with exponential moving average (#1504)
* Add training to batchnorm with exponential moving average

* Add more checks to batch norm

* Resolve some review comments

* Add with_momentum variants of `new` methods

* Add check for range of momentum variable; update batch norm test

* Run cargo fmt

* Add back num_features parameter

* Format; tiny simplification
2023-12-30 16:42:08 +01:00
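The running statistics follow the standard exponential-moving-average update, with `momentum` weighting the new batch statistic (PyTorch's convention); the range check mentioned above presumably enforces 0 <= momentum <= 1. A scalar sketch of the update rule:

```rust
/// EMA update for a running batch-norm statistic (mean or variance).
/// With momentum = 0.1, the running value moves 10% toward each new batch.
fn ema_update(running: f64, batch_stat: f64, momentum: f64) -> f64 {
    (1.0 - momentum) * running + momentum * batch_stat
}
```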
d35f0a1376 Bump the crate version to 0.3.3. (#1490) 2023-12-28 13:38:30 +01:00
9fc210fae8 Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
03641293ee Clippy pass. 2023-12-18 15:22:43 +01:00
94817dac56 Bump the crate version to 0.3.2. (#1452) 2023-12-17 05:34:53 -06:00
1e86717bf2 Fix a couple typos (#1451)
* Mixtral quantized instruct.

* Fix a couple typos.
2023-12-17 05:20:05 -06:00
c630622a07 Expose AdamW parameters (#1449)
* Expose AdamW parameters

* Use reference
2023-12-16 18:41:56 -06:00
6bc92e63cb Addressing a lot of comments. 2023-12-15 13:06:04 +01:00
aa04015098 Remove unwrap(). 2023-12-15 12:23:28 +01:00
26540641c1 Renamed all kernel names. 2023-12-15 11:24:47 +01:00
ece4c69a68 Fixing softmax. 2023-12-15 01:35:08 +01:00
361f2ad2af Working with merging encoders and using fences. 2023-12-14 16:05:33 +01:00
e60f9b5dfc Speedup ShardedSafeTensors to load Tensors with default hints (#1384)
* Speedup ShardedSafeTensors to load Tensors with default hints

* Tweaks.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-14 08:08:56 -06:00
87dc559817 Lots of updates including some stack of command buffers. 2023-12-12 17:41:56 +01:00
236b820e28 Another prelu bugfix. (#1407) 2023-12-06 09:54:41 +01:00
2648e797c2 Use the proper broadcasting for prelu. (#1406) 2023-12-05 07:09:31 +01:00
b5c283e86f Add the prelu layer. (#1402) 2023-12-03 16:06:09 +00:00
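For reference, PReLU is a leaky-ReLU with a learned slope, where the weight is either a single scalar or a per-channel tensor broadcast against the channel dimension (the broadcasting that #1406 fixes). A scalar sketch:

```rust
/// prelu(x) = x for x >= 0, w * x otherwise; `w` is learned.
fn prelu_scalar(x: f32, w: f32) -> f32 {
    if x >= 0.0 { x } else { w * x }
}
```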
4349ff1fc2 Starting to fix some tests.
Few fixes.

Going back on remote metal-rs.

Reusing a single buffer (for now) to speed things up.

Adding some half kernels.

All tests now panic instead of failing randomly.

Putting back f16 index select.

Add erf.

Working version for llama2-c.

Fixes + cache compute_pipeline_state.

BF16 metal fix.

Remove some prints.

new_owned -> new()..to_owned().

Better batched matmul.

Metal operational.

Reuse buffers on our own reference counts.

Tmp gemm.

Revert "Tmp gemm."

This reverts commit c65f68e988.

Interleave committing.

Speeding up copies using blit.

Fmt.

Fmt.

Remove the assert!

Fmt all.

Fixes after big rebase.

Add softmax for half and bfloat + tests

Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
bfa7c8fc01 Implement the module trait directly for QMatMul. (#1372) 2023-11-25 10:09:45 +00:00
a209ce8ceb Update for 0.3.1. (#1324) 2023-11-11 18:48:52 +00:00