Commit Graph

619 Commits

SHA1 Message Date
6400e1b0a0 Fix the block size for some cuda kernels. (#1767) 2024-02-27 14:08:33 +01:00
badf886583 Cuda kernel for dequantizing q8k. (#1760)
* Cuda kernel for dequantizing q8k.

* Clippy lints.
2024-02-26 08:42:44 +01:00
2f22afd80e Cuda acceleration for quantized model. (#1754)
* Boilerplate for the quantized cuda support.

* More basic cuda support.

* More cuda quantization (quantize on cpu for now).

* Add the dequantization bit.

* Start adding some dedicated cuda kernels from llama.cpp.

* Move the kernel code.

* Start interfacing with the kernel.

* Tweak the kernel launch params.

* Bugfix for quantized metal.

* Fix some clippy lints.

* Tweak the launch parameters.

* Tweak cuda basics to perform a quantized matmul.

* Perform the dequantization on the cpu + use cublas for matmul.

* Add the dequantization kernel.

* Test the qmatmul.

* More kernels.

* Matmul-vec kernel.

* Add a couple kernels.

* More dequantization kernels.
2024-02-25 18:11:47 +01:00
c753f72c85 Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.

* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
8013b50829 Add grads for interpolate1d (#1742)
* add backprop for interpolate1d

* fix clippy lint

* correctly fix clippy lint
2024-02-22 08:44:01 +01:00
a2cb2edead Add a couple backtraces on cpu errors. (#1738) 2024-02-20 19:54:13 +01:00
fc67d878bb Bugfix for conv-transpose1d (#1734)
* Add a currently broken test.

* Bugfix + fix test.
2024-02-19 09:04:49 +01:00
1fb728772d Support for groups in conv-transpose1d. (#1731)
* Groups support in conv-transpose-1d.

* Remove dangling file.
2024-02-18 21:28:07 +01:00
cb86b0c82c Fix float unpickling. (#1730) 2024-02-18 19:33:55 +01:00
6284ad784c Module implementation for options. (#1728) 2024-02-18 14:12:55 +01:00
b60064780d feat: add silu activation function (#1706)
* feat: add silu activation function

* use silu/arg in grad

* update candle-nn

* use node
2024-02-14 10:27:22 +01:00
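For context on the commit above: SiLU (a.k.a. swish) is silu(x) = x * sigmoid(x), with derivative sigmoid(x) * (1 + x * (1 - sigmoid(x))), which is what the gradient pass needs. A minimal sketch composed from candle's existing unary ops (an illustration only, not the dedicated activation the commit adds):

```rust
use candle_core::{Result, Tensor};

// silu(x) = x * sigmoid(x), built from existing unary ops.
fn silu(x: &Tensor) -> Result<Tensor> {
    // sigmoid(x) = 1 / (1 + exp(-x))
    let sigmoid = (x.neg()?.exp()? + 1.0)?.recip()?;
    x * &sigmoid
}
```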
0de0795220 Qmetal tweaks (#1704)
* Add the dummy qmetal backend.

* Fix the metal compilation.
2024-02-13 18:11:17 +01:00
c1b418586c Fixing quantized llama demo on metal. (#1703) 2024-02-13 16:28:56 +01:00
ad73e93da2 Detach the tensors on batch-norm eval. (#1702)
* Detach the tensors on batch-norm eval.

* Fix pyo3 bindings.

* Black tweak.

* Formatting.

* Also update the pyo3-onnx formatting.

* Apply black.
2024-02-13 14:26:32 +01:00
d0aa197b07 ConvTranspose1d cuda support. (#1697)
* ConvTranspose1d cuda support.

* Add the conv-transpose1d kernel.

* Remove some unused variables.
2024-02-12 15:03:18 +01:00
274bf11633 Support defaultdict in PyTorch checkpoints. (#1696)
* Support defaultdict in PyTorch checkpoints.

* Fix clippy lint.
2024-02-12 10:26:56 +01:00
cdc3823d8f Pickle support: dig within the _rebuild_parameter calls. (#1681) 2024-02-08 13:09:49 +01:00
e5eb9602d0 Add support for loading Fortran contiguous tensors (#1672)
* Add support for loading Fortran contiguous tensors

This commit adds support for Fortran contiguous tensors in the tensor loading process. Previously, only C contiguous tensors could be loaded; anything else failed with an error. Tensors identified as Fortran contiguous (column-major order) are now handled by reversing their dimensions after loading, which correctly represents their layout in memory and broadens compatibility with different tensor layouts.

- Check if a tensor is Fortran contiguous using the `is_fortran_contiguous` flag.
- For Fortran contiguous tensors, reverse the dimensions after loading to correctly represent their layout in memory.
- Continue to bail out with an error for tensors that are neither C contiguous nor Fortran contiguous, maintaining the previous behavior for non-contiguous tensors without explicit support.

This extends the tensor loading mechanism to accommodate a wider variety of tensor layouts.

* Add reshape step to handle fortran contiguous case

* Skip fortran contiguous fix if rank is < 2

* Fail on rank 0, 1 if contiguous
2024-02-07 21:49:59 +01:00
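A hedged sketch of the loading strategy described in the commit above (`load_fortran_contiguous` is a hypothetical helper, not the actual loader): view the column-major buffer with reversed dimensions, which makes it C contiguous, then permute the axes back.

```rust
use candle_core::{Device, Result, Tensor};

fn load_fortran_contiguous(data: &[f32], dims: &[usize]) -> Result<Tensor> {
    if dims.len() < 2 {
        // Rank 0/1 tensors are identical in C and Fortran order.
        return Tensor::from_slice(data, dims, &Device::Cpu);
    }
    // The column-major buffer, viewed with reversed dims, is C contiguous.
    let reversed: Vec<usize> = dims.iter().rev().copied().collect();
    let t = Tensor::from_slice(data, reversed, &Device::Cpu)?;
    // Permute the axes back so the logical shape matches `dims`.
    let perm: Vec<usize> = (0..dims.len()).rev().collect();
    t.permute(perm)
}
```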
b75e8945bc Enhance pickle to retrieve state_dict with a given key (#1671) 2024-02-06 21:17:33 +01:00
adfae2460a Fix rustfmt. (#1669) 2024-02-06 12:06:06 +01:00
b545f54a19 Fix clippy lints. (#1667) 2024-02-06 09:03:36 +01:00
1ba11f22d6 Fix: pth files don't load on Windows (#1661)
* Don't treat zip path as OS path

* Add a test case

* Add code to generate test pth data
2024-02-06 08:50:55 +01:00
982722019b add roll function to tensor (#1666) 2024-02-06 08:49:45 +01:00
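For reference, roll here follows the usual PyTorch semantics: elements shifted along a dimension wrap around to the other end. An equivalent sketch built from `narrow` and `cat` (for illustration; not necessarily how #1666 implements it):

```rust
use candle_core::{Result, Tensor};

fn roll(t: &Tensor, shift: i64, dim: usize) -> Result<Tensor> {
    let size = t.dim(dim)?;
    // Normalize the shift into [0, size).
    let shift = shift.rem_euclid(size as i64) as usize;
    if shift == 0 {
        return Ok(t.clone());
    }
    // Split at the wrap point and swap the two pieces.
    let tail = t.narrow(dim, size - shift, shift)?;
    let head = t.narrow(dim, 0, size - shift)?;
    Tensor::cat(&[&tail, &head], dim)
}
```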
db923517b3 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-17 18:03:57 +01:00
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param wherever needed.
- Create a new QMetal storage type that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implementations.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
86a8e58897 Update metal random kernel and set_seed method
* set_seed via buffer content pointer copy + did_modify_range

* ensure random.metal kernel does not write outside of buffer range when tid==0
2024-01-17 09:12:44 +01:00
79478ff5a1 Seed should be updated by random kernel result. 2024-01-15 11:58:25 +01:00
bdd8107fda Expose the ndarray trait. (#1586) 2024-01-14 20:09:49 +01:00
ecf88a6d38 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-14 17:10:54 +01:00
e6d86b0819 Add the pow operator. (#1583)
* Add the pow operator.

* Support the pow operation in onnx.
2024-01-13 20:24:06 +01:00
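For strictly positive bases, one standard way to express an elementwise pow from existing primitives is a^b = exp(b * ln(a)). A sketch under that assumption (not necessarily what the commit above does internally):

```rust
use candle_core::{Result, Tensor};

// a^b = exp(b * ln(a)), valid for strictly positive `a`.
fn pow(a: &Tensor, b: &Tensor) -> Result<Tensor> {
    a.log()?.broadcast_mul(b)?.exp()
}
```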
bafe95b660 Fix format. (#1576) 2024-01-12 14:23:17 +01:00
a3d92ab226 Metal: Activate bfloat affine and add benchmark (#1543)
* Use cfg to separate benchmark results based on features

* Add bfloat affine and benchmarks

* Fix flops calculation

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
e90bcdcc7c Metal: f16 and bf16 where_cond + benchmark (#1545)
* Use cfg to separate benchmark results based on features

* Add metal where_cond for f16 and bf16. Add benchmark

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Updated feature separated benchmarks

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
e63bb8661b Merge branch 'main' into ivarflakstad/metal-prng 2024-01-12 07:19:58 +01:00
41915184bb Bugfix for dequantizing q5k layers. (#1569) 2024-01-11 23:15:11 +01:00
402349d120 feat(bf16): add cast support + tests for cast + bin ops (#1524) 2024-01-11 15:49:13 +01:00
9f0c99f0c1 Separate benchmarks by enabled features (#1538)
* Use cfg to separate benchmark results based on features

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Derive bench_name from actual device

* Run CPU benchmarks even when GPU feature is enabled

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
0fc95c9f0c Add a dequantize command to tensor-tools. (#1565)
* Add a dequantize command to tensor-tools.

* Clippy fixes.
2024-01-11 11:21:01 +01:00
ae06cb74bb Add relu kernel for metal (#1488)
* Add relu kernel for metal

* Copy error messages proposed in #1491

* Revert non relu changes

* Fix name changes

* Fix the last of us (:

* Fix copy and paste mistakes

* Fix typo

* Revert order changes

* Revert order change

* Add deleted functions back

* Run rustfmt
2024-01-10 18:27:17 +01:00
87efb5d8eb Updated feature separated benchmarks 2024-01-09 19:04:31 +01:00
ad181f9cdc Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng 2024-01-09 18:55:40 +01:00
88945f2c22 Improve benchmarks layout 2024-01-09 18:31:28 +01:00
12b2a337f3 Handle start-offset when loading a tensor from a pickle file. (#1546) 2024-01-08 09:20:48 +01:00
fb05af4c42 Avoid some unnecessary returns. 2024-01-08 07:19:59 +01:00
ad075a5f7e Remove allow pragma 2024-01-08 06:48:33 +01:00
0eb90ed783 Simpler repro for the neon optimization issue + bugfix (#1544)
* Simpler repro for the neon optimization issue.

* Bugfix for q4k.

* Improve the fix, share the dot-prod bit.

* Clippy fixes.

* Fix for q6k.

* Also fix for q2k.

* Use the new shared dotprod.

* Add more testing.
2024-01-07 20:21:49 +01:00
3f04a79ada Use cfg to separate benchmark results based on features 2024-01-07 14:40:15 +01:00
b4cb982e49 Simplifying our internal cargo dependencies. (#1529) 2024-01-07 12:04:14 +01:00
6ebe043273 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-07 11:52:03 +01:00
6bf52b9fdf Gaussian (normal) distribution for the PRNG via the Box-Muller transform 2024-01-07 11:39:46 +01:00
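The Box-Muller transform maps two independent uniforms u1, u2 in (0, 1] to two independent standard normals: z0 = sqrt(-2 ln u1) * cos(2*pi*u2) and z1 = sqrt(-2 ln u1) * sin(2*pi*u2). The kernel itself lives in Metal; this plain-Rust sketch only illustrates the math:

```rust
// Box-Muller: two uniforms in (0, 1] -> two standard normal samples.
fn box_muller(u1: f32, u2: f32) -> (f32, f32) {
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * std::f32::consts::PI * u2;
    (r * theta.cos(), r * theta.sin())
}
```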