77db8396d0
Explicit error when slice-set is called with the same src and dst. ( #2733 )
2025-01-22 21:31:49 +01:00
6fd2f63a15
Bump the ug dependency. ( #2720 )
...
* Bump the ug dependency.
* Fix some test.
* Fix the ug test.
2025-01-16 09:39:16 +01:00
2344c4e4b8
Clippy fixes for 1.84. ( #2710 )
2025-01-10 10:15:15 +01:00
e38e2a85dd
Fix a cuda warning. ( #2693 )
2024-12-31 09:06:10 +01:00
62ced44ea9
Add a Context trait similar to anyhow::Context. ( #2676 )
...
* Add a Context trait similar to anyhow::Context.
* Switch two unwrap to context.
2024-12-22 09:18:13 +01:00
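For readers unfamiliar with the pattern named in this entry, here is a minimal sketch of a `Context`-style extension trait in the spirit of `anyhow::Context`. It is not the code from this commit: `CtxError`, its fields, and `first_elem` are hypothetical names chosen for illustration, and the real trait in the repository may differ in shape.

```rust
use std::fmt;

/// Hypothetical error type for this sketch: a context message plus the
/// textual form of the underlying error, if there was one.
#[derive(Debug)]
struct CtxError {
    context: String,
    source: Option<String>,
}

/// Extension trait that attaches a context message when a value is
/// missing or an operation failed, replacing a bare `unwrap`.
trait Context<T> {
    fn context<C: fmt::Display>(self, c: C) -> Result<T, CtxError>;
}

impl<T> Context<T> for Option<T> {
    fn context<C: fmt::Display>(self, c: C) -> Result<T, CtxError> {
        // `None` becomes an error that carries only the context message.
        self.ok_or_else(|| CtxError { context: c.to_string(), source: None })
    }
}

impl<T, E: fmt::Display> Context<T> for Result<T, E> {
    fn context<C: fmt::Display>(self, c: C) -> Result<T, CtxError> {
        // Keep the original error text alongside the new context message.
        self.map_err(|e| CtxError {
            context: c.to_string(),
            source: Some(e.to_string()),
        })
    }
}

/// Example of switching an `unwrap` to `context`: returns the first
/// element or a descriptive error instead of panicking.
fn first_elem(xs: &[i32]) -> Result<i32, CtxError> {
    xs.first().copied().context("empty slice")
}
```

With this shape, a call such as `"x".parse::<i32>().context("bad int")` yields an error that records both the message and the parse failure, rather than panicking.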
6f715f9256
add scatter add ( #2656 )
2024-12-01 18:39:38 +01:00
dba7a9c93e
add u32 - U32 gather ( #2653 )
2024-11-30 23:18:07 +01:00
b52c2c6050
Clippy fixes for the cuda feature. ( #2650 )
2024-11-29 09:01:34 +01:00
54e7fc3c97
Lint fixes introduced with Rust 1.83 ( #2646 )
...
* Fixes for lint errors introduced with Rust 1.83
* rustfmt
* Fix more lints.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-11-28 23:00:21 +01:00
c12db594e3
fix typo ( #2606 )
2024-11-23 08:40:00 +01:00
3159f91b90
20241118 docs ( #2629 )
...
* module docs
* varbuilder gguf docs
* add a link to gguf files
* small additional mod doc titles
* safetensor docs
* more core docs
* more module docs in candle_core
* 2 more link fixes
2024-11-19 04:07:07 +01:00
0ed24b9852
Add max-all/min-all. ( #2616 )
2024-11-14 21:08:04 +01:00
06350c31c7
Add some missing index-select metal kernels. ( #2613 )
...
* Add some missing index-select metal kernels.
* Make some matrix contiguous pre-matmul.
2024-11-12 17:10:12 +01:00
3769206583
Update docs ( #2553 )
...
* add module docs for candle-core
* doc each of the candle-nn modules and add the links to the doc page
2024-11-11 22:13:52 +01:00
e2b6b367fa
Add some fast Metal MLX SDPA kernels ( #2584 )
...
* Add some fast Metal MLX SDPA kernels (#32 )
* Sketch the sdpa kernel
* Add full sdpa kernel,
* Add test
* Add vectorized kernel for decoding
* Update tests
* Add some docs
* Fix sdpa_vector names
* Add softcapping for vectorized sdpa
* Add softcapping for full sdpa
* Add support for head dim 32, 96, 256
* Add support for head dim 32, 96, 256
* Update docs
* Add update notice
* Clippy and format
* Conditional compilation for bf16
* Use it in quantized llama
* Some review comments
* Use set_params!
* Remove unused
* Remove feature
* Fix metal sdpa for v stride
* Remove comma
* Add the dim method to layout and shape.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-11-05 09:28:00 +01:00
0e2c8c17fb
UG metal integration. ( #2580 )
2024-10-27 15:20:37 +01:00
594d984f9c
Support for UG kernels. ( #2579 )
...
* Support for UG kernels.
* Add a dedicated test.
2024-10-27 13:37:19 +01:00
dcd83336b6
Testcases ( #2567 )
2024-10-17 13:00:45 +02:00
e4a96f9e7c
Switch to using the MLX matmul by default. ( #2547 )
2024-10-06 23:24:55 +02:00
6faecaa616
Fix for cudnn bf16 conv2d. ( #2535 )
2024-10-02 23:18:55 +02:00
7b60bda4ed
Add support for cuda streams. ( #2532 )
2024-10-02 21:30:58 +02:00
a2bcc227df
Efficient implementation of `Tensor::ones()` for metal ( #2512 )
...
* WIP: hopefully better const impl
* with GPU
* More tests on
* Reverting primitive for
* Incorporating review changes - added check elem count check in kernel, using for call strategy
* rustfmt ran
2024-10-01 19:11:59 +02:00
def4c6cdee
Cuda quantized mmv bugfix. ( #2526 )
2024-10-01 12:57:55 +02:00
724650446c
Yet another cuda qmm padding fix. ( #2509 )
2024-09-30 21:53:30 +02:00
844d45cde4
Bugfix for the metal elu kernel. ( #2490 )
...
* Bugfix for the metal elu kernel.
* Add a test.
2024-09-21 15:03:19 +02:00
af2104078f
Metal commands refactoring ( #2489 )
...
* Split out the commands part of the metal device.
* Make most fields private.
* Move the allocator back.
* Rework the encoder provider type.
2024-09-21 13:18:42 +02:00
382c6b51af
Improve error message ( #2485 )
2024-09-20 07:11:41 -06:00
6eea45a761
Add a couple cast metal kernels. ( #2479 )
2024-09-15 22:27:46 +02:00
ebf722b446
Export TensorIndexer public to candle users ( #2477 )
2024-09-13 22:21:57 +02:00
b60faebea4
Missing metal kernels. ( #2474 )
2024-09-12 13:58:50 +02:00
72d649058b
Hook the MLX matmul kernels in candle-core. ( #2473 )
2024-09-12 13:52:59 +02:00
afb6575835
Use the new MLX kernels to handle the BF16 matmul. ( #2470 )
2024-09-11 17:34:05 +02:00
13b2a8a4a0
Complete the missing backticks in the comments ( #2469 )
2024-09-11 16:37:05 +02:00
aafa24ed93
Update cudarc to 0.12. ( #2451 )
...
* Update cudarc to 0.12.
* Some cudnn tweaks.
2024-08-27 10:10:30 +02:00
736d8eb752
Stream tensor ( #2429 )
...
* Support Minus(u) for arbitrary values of u, e.g. Minus(3).
* Forces u to be strictly positive.
* Add StreamTensor.
2024-08-17 21:54:28 +02:00
7cff5898ec
Support Minus(u) for arbitrary values of u, e.g. Minus(3). ( #2428 )
...
* Support Minus(u) for arbitrary values of u, e.g. Minus(3).
* Forces u to be strictly positive.
2024-08-17 21:29:01 +02:00
d3fe989d08
Add documentation examples for `Tensor::i` and `Tensor::narrow` methods ( #2308 )
...
* Add documentation examples for `Tensor` methods
* Apply fmt.
* Cosmetic tweaks.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-08-10 08:11:09 +02:00
c0a559d427
optimize gradient for silu a bit ( #2393 )
2024-08-04 11:24:17 +02:00
0fcb40b229
Revert the bf16 gemm metal changes for now. ( #2386 )
2024-08-01 23:08:47 +02:00
d4b6f6eef6
Add a minimal test for the metal bf16 matmul. ( #2381 )
2024-08-01 11:22:46 +02:00
957d604a78
Enable BF16 on metal. ( #2380 )
2024-08-01 11:05:07 +02:00
ce90287f45
Add get_ids to GradStore ( #2379 )
2024-08-01 10:56:13 +02:00
1ba87a9450
Use BF16 on metal when possible. ( #2378 )
2024-08-01 10:48:58 +02:00
bd80078acf
Fix log_sum_exp to handle large positive/negative inputs ( #2367 )
2024-08-01 10:37:02 +02:00
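As background for this entry, the usual way to keep log-sum-exp finite for large positive or negative inputs is to subtract the maximum before exponentiating. The sketch below works on a plain `f64` slice rather than a tensor and is not taken from the commit; the function names are illustrative assumptions.

```rust
/// Numerically stable log-sum-exp: log(sum_i exp(x_i)) computed as
/// max + log(sum_i exp(x_i - max)), so every exponent is <= 0.
fn log_sum_exp(xs: &[f64]) -> f64 {
    let max = xs.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if !max.is_finite() {
        // Empty input or all -inf gives -inf; any +inf gives +inf.
        return max;
    }
    let sum: f64 = xs.iter().map(|&x| (x - max).exp()).sum();
    max + sum.ln()
}

/// Naive version for comparison: overflows to +inf for large positive
/// inputs and underflows to -inf for large negative ones.
fn naive_log_sum_exp(xs: &[f64]) -> f64 {
    xs.iter().map(|&x| x.exp()).sum::<f64>().ln()
}
```

For an input such as `[1000.0, 1000.0]` the naive version returns infinity, while the shifted version returns `1000 + ln 2`.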
8696cf6494
Enable the affine kernel for u8/u32. ( #2376 )
2024-08-01 10:03:11 +02:00
0f5cbb08b3
Add support for Llama 3.1 ( #2359 )
...
* Add Llama 3.1 rope
* Clippy
* Format
* Clippy
* Add support for multiple eos tokens:
* Untagged either
* Remove either dep and fix settings.json
* Make the max positional embeddings configurable
2024-07-26 21:32:26 +02:00
f25173d68b
Fix for backprop in ConvTranspose2D with stride of 2 ( #2337 )
...
* Add gradient test for conv_transpose2d with stride of 2.
* Swap dilation and stride in ConvTranspose2D backpropagation.
Without this, a shape mismatch occurs with a stride of 2 and dilation of 1.
* Add further tests of the ConvTranspose2D gradient.
Values calculated with torch, minor numerical errors adjusted and commented.
2024-07-17 19:22:23 +02:00
6a4741bbf9
Fix Elu gradient NaN on large input ( #2328 )
...
* Fix Elu gradient NaN on large input
* Reuse previously computed exp in Elu
2024-07-16 14:41:16 +02:00
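One common way an ELU gradient can turn into NaN on large inputs is a mask-and-multiply formulation, where `exp(x)` overflows to infinity and is then multiplied by a zero mask. The scalar sketch below illustrates that failure mode and a select-based alternative; it is an assumption about the general mechanism, not a reproduction of the code changed in this commit, and the function names are hypothetical.

```rust
/// Mask-based ELU gradient: for large positive x, exp(x) overflows to
/// +inf and 0 * inf evaluates to NaN, which then poisons the sum.
fn elu_grad_masked(x: f32, alpha: f32) -> f32 {
    let pos = if x > 0.0 { 1.0 } else { 0.0 };
    let neg = 1.0 - pos;
    pos + neg * alpha * x.exp()
}

/// Select-based ELU gradient: exp is only evaluated on the branch where
/// x <= 0, so it can never overflow.
fn elu_grad(x: f32, alpha: f32) -> f32 {
    if x > 0.0 {
        1.0
    } else {
        alpha * x.exp()
    }
}
```

In a vectorized setting, the same effect can be had by clamping the input to the exponential, for example `alpha * x.min(0.0).exp()` on the negative branch.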
25960676ca
Add a basic metal example with capture ( #2324 )
...
* Add some tracing.
* Get the trace to work.
2024-07-09 12:38:11 +02:00
6baa1d486b
Fix a bug in the metal implementation of col2im1d. ( #2284 )
2024-06-22 23:21:20 +02:00