fb660b8d43
Add a cudnn feature to candle-nn/candle-transformers. ( #2890 )
2025-04-13 17:43:41 +02:00
19fb6dac1f
Bump the crate version. ( #2881 )
2025-04-11 22:28:21 +02:00
acc5bd335f
Cuda cleanup. ( #2880 )
...
* Cuda cleanup.
* More fixes.
2025-04-11 21:43:35 +02:00
d9904a3baf
Update to cudarc 0.14 (breaking change). ( #2858 )
...
* Start updating to cudarc 0.14.
* Adapt a couple more things.
* And a couple more fixes.
* More tweaks.
* And a couple more fixes.
* Bump the major version number.
* Proper module system for the cuda kernels.
* Proper ptx loading.
* Launch the sort kernel.
* Custom op.
* Start using the builder pattern.
* More builder.
* More builder.
* Get candle-core to compile.
* Get the tests to pass.
* Get candle-nn to work too.
* Support for custom cuda functions.
* cudnn fixes.
* Get flash attn to run.
* Switch the crate versions to be alpha.
* Bump the ug dependency.
2025-04-03 09:12:19 +02:00
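The cudarc 0.14 migration above mentions switching kernel launches to a builder pattern. As a rough illustration of that style (the names `LaunchBuilder`, `grid`, `block`, and `shared_mem_bytes` here are hypothetical, not cudarc's actual API):

```rust
// Illustrative launch-configuration builder in the spirit of the
// cudarc 0.14 migration. All names are made up for this sketch.
#[derive(Debug, PartialEq)]
struct LaunchConfig {
    grid: (u32, u32, u32),
    block: (u32, u32, u32),
    shared_mem_bytes: u32,
}

struct LaunchBuilder {
    grid: (u32, u32, u32),
    block: (u32, u32, u32),
    shared_mem_bytes: u32,
}

impl LaunchBuilder {
    fn new() -> Self {
        // Default to a single-thread launch with no dynamic shared memory.
        Self { grid: (1, 1, 1), block: (1, 1, 1), shared_mem_bytes: 0 }
    }
    fn grid(mut self, g: (u32, u32, u32)) -> Self {
        self.grid = g;
        self
    }
    fn block(mut self, b: (u32, u32, u32)) -> Self {
        self.block = b;
        self
    }
    fn build(self) -> LaunchConfig {
        LaunchConfig {
            grid: self.grid,
            block: self.block,
            shared_mem_bytes: self.shared_mem_bytes,
        }
    }
}
```

Callers chain the setters they care about and fall back to the defaults for the rest, which is the main appeal of the builder style for launch parameters.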
f3d472952f
fix: candle-flash-attn linux and msvc build ( #2829 )
...
* fix: candle-flash-attn linux and msvc build
* Missing newline at eof.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2025-03-25 08:45:12 +01:00
468d1d525f
Bump the crate version to 0.8.4. ( #2808 )
2025-03-15 07:42:24 +01:00
fd7f7242a1
Bump the crate version to 0.8.3 ( #2772 )
...
* update to cudarc to v0.13.5 to support cuda 12.8
* Bump the crate version.
---------
Co-authored-by: Michael McCulloch <michael.james.mcculloch@fastmail.com>
2025-02-15 15:54:48 +01:00
2a2852d1c1
Fix flash-attn build. ( #2754 )
2025-01-28 18:49:46 +01:00
e6cd499e98
Fix candle-flash-attn build on Windows (msvc) ( #2734 )
2025-01-22 22:19:48 +01:00
236c35e578
Bump the crate version to 0.8.2. ( #2703 )
2025-01-07 15:50:16 +01:00
2a705e6f37
Flash-Attn upgrade / SoftCap Candle-FlashAttn [3/n] ( #2690 )
...
* update flash-attn v1
* restore: hdim224
* add 224 flash_fwd_template
* remove whitespace
* softcap is working, including test and api.
* make softcap test case better
* unpadded lse added
2024-12-31 10:04:47 +01:00
a594ef669c
Flash-Attn upgrade / SoftCap Candle-FlashAttn [2/n] ( #2689 )
...
* update flash-attn v1
* restore: hdim224
* add 224 flash_fwd_template
* remove whitespace
* softcap is working, including test and api.
* make softcap test case better
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-12-31 09:41:23 +01:00
71cd6d5533
Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n] ( #2688 )
...
* update flash-attn v1
* restore: hdim224
* add 224 flash_fwd_template
* remove whitespace
2024-12-31 09:32:22 +01:00
67cab7d6b8
Bump the crate version to 0.8.1. ( #2662 )
2024-12-07 17:03:53 +01:00
9453cc3095
Bump the crate version to 0.8.0. ( #2612 )
2024-11-12 14:11:46 +01:00
3a3c48b14b
Bump the crate version to 0.7.2. ( #2517 )
2024-09-29 10:56:50 +02:00
8097559c1a
Move the candle version to 0.7.1. ( #2495 )
2024-09-22 20:44:39 +02:00
c2fca0ca11
Bump the crate version. ( #2491 )
2024-09-21 15:13:12 +02:00
6070278a31
Bump the version to 0.6.1. ( #2438 )
2024-08-22 09:23:52 +02:00
30cdd769f9
Update the flash attn kernels. ( #2333 )
2024-07-15 20:37:36 +02:00
f65e90e7ef
Bump the crate version. ( #2248 )
2024-06-05 15:49:15 +02:00
7ebc3548e1
Use flash-attn in gemma. ( #2195 )
...
* Use flash-attn in gemma.
* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
89f53b9d7b
Bump the version number to 0.5.1. ( #2155 )
...
* Bump the version number to 0.5.1.
* Fix clippy lints for 1.78.
* More clippy fixes.
2024-05-03 11:17:05 +02:00
f76bb7794a
Bumping the version number to 0.5.0. ( #2009 )
2024-04-04 17:48:45 +02:00
e7fc1daa21
Bump the crate versions to 0.4.2. ( #1821 )
2024-03-08 22:01:51 +01:00
5e526abc8c
Bump the version number to 0.4.1. ( #1768 )
...
* Fix the block size for some cuda kernels.
* Bump the version number to 0.4.1.
2024-02-27 14:19:59 +01:00
a83ca2ece0
Bump the crate version to 0.4.0. ( #1658 )
2024-02-04 19:08:01 +01:00
9e824ec810
Explicit version for packages that are not in the workspace. ( #1642 )
2024-01-31 18:57:38 +01:00
30313c3081
Moving to a proper build crate bindgen_cuda. ( #1531 )
...
* Moving to a proper build crate `bindgen_cuda`.
* Fmt.
2024-01-07 12:29:24 +01:00
e72d52b1a2
Unpin more of the workspace relative dependencies. ( #1535 )
2024-01-07 12:26:20 +01:00
8d1a57c9a0
chore: update flash attention kernels ( #1518 )
...
* chore: update flash attention kernels
* fmt
* remove unused kernels
* force f32
* correct stride
2024-01-05 18:28:55 +01:00
d35f0a1376
Bump the crate version to 0.3.3. ( #1490 )
2023-12-28 13:38:30 +01:00
94817dac56
Bump the crate version to 0.3.2. ( #1452 )
2023-12-17 05:34:53 -06:00
a209ce8ceb
Update for 0.3.1. ( #1324 )
2023-11-11 18:48:52 +00:00
d2c3f14773
Fix for flash-attn. ( #1310 )
...
Co-authored-by: laurent <laurent@par2dc5-ai-prd-cl01dgx02.cm.cluster>
2023-11-10 10:27:27 +01:00
75629981bc
feat: parse Cuda compute cap from env ( #1066 )
...
* feat: add support for multiple compute caps
* Revert to one compute cap
* fmt
* fix
2023-10-16 15:37:38 +01:00
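The "parse Cuda compute cap from env" change reads the target compute capability from an environment variable instead of relying only on device probing. A minimal sketch of that idea, assuming a variable named `CUDA_COMPUTE_CAP` and an sm_80 fallback (both assumptions, not taken from the commit):

```rust
// Sketch: read a CUDA compute capability from the environment,
// falling back to a default when the variable is unset or invalid.
// The variable name and the 80 (sm_80) fallback are assumptions.
fn compute_cap() -> usize {
    std::env::var("CUDA_COMPUTE_CAP")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or(80)
}
```

This lets build machines without a visible GPU still compile kernels for a chosen architecture.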
096dee7073
Bump the version to 0.3.0. ( #1014 )
...
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
7dd8e12472
Bump the crate versions to v0.2.3. ( #886 )
...
* Bump the crate version.
* Also update the python bindings.
2023-09-18 12:14:03 +01:00
2257f4d475
Bump the crate version + update the changelog. ( #822 )
2023-09-12 06:39:24 +01:00
0e250aee4f
Shape with holes ( #770 )
...
* Shape with holes.
* rustfmt.
2023-09-08 08:38:13 +01:00
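"Shape with holes" refers to leaving one dimension of a reshape unspecified and inferring it from the total element count. A standalone helper sketching the inference rule (illustrative only, not candle's actual API):

```rust
// Sketch of shape-with-holes inference: at most one dimension may be
// a "hole" (None); it is filled in so the product matches `total`.
fn infer_shape(total: usize, dims: &[Option<usize>]) -> Option<Vec<usize>> {
    let holes = dims.iter().filter(|d| d.is_none()).count();
    // Product of the known dimensions.
    let known: usize = dims.iter().flatten().product();
    match holes {
        // No hole: the known dims must already account for every element.
        0 if known == total => Some(dims.iter().map(|d| d.unwrap()).collect()),
        // One hole: it must divide evenly into the remaining elements.
        1 if known != 0 && total % known == 0 => {
            Some(dims.iter().map(|d| d.unwrap_or(total / known)).collect())
        }
        // Multiple holes, or a non-dividing total, is ambiguous.
        _ => None,
    }
}
```

So reshaping 24 elements with dims `(2, hole, 4)` infers the hole as 3, while two holes or a non-dividing total are rejected.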
cfcbec9fc7
Add small customization to the build ( #768 )
...
* Add ability to override the compiler used by NVCC from an environment variable
* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR
* Add the compilation failure to the readme, with a possible solution
* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
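The build-customization commit above names `CANDLE_FLASH_ATTN_BUILD_DIR` as an override for where the compiled kernels land. A hedged sketch of how a build script might honor it (the helper name and the fallback behavior are assumptions for illustration):

```rust
use std::path::PathBuf;

// Sketch: prefer CANDLE_FLASH_ATTN_BUILD_DIR (named in the commit)
// when set, otherwise fall back to the caller-provided output dir.
fn kernel_build_dir(default_out_dir: &str) -> PathBuf {
    std::env::var("CANDLE_FLASH_ATTN_BUILD_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from(default_out_dir))
}
```

Reusing a fixed build directory across runs avoids recompiling the (slow) flash-attn kernels from scratch on every clean build.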
ab0d9fbdd1
Properly set the is_bf16 flag. ( #738 )
2023-09-04 16:45:26 +01:00
f80fd44201
BF16 support for flash-attn. ( #737 )
2023-09-04 16:35:43 +01:00
d0cdea95a5
Add back the bf16 flash-attn kernels. ( #730 )
2023-09-04 07:50:52 +01:00
618f4e4c78
Add some documentation. ( #673 )
...
* Add some documentation.
* Bump the crate version.
2023-08-30 11:54:00 +01:00
a3f97c143d
Bump the crate version + update CHANGELOG. ( #628 )
2023-08-27 18:17:11 +01:00
aba1e90797
Add some group parameter to convolutions. ( #566 )
...
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
a8f61e66cc
Bump the crates version to 0.1.2. ( #522 )
2023-08-20 08:07:07 +01:00
03be33eea4
Relax the requirements on CustomOp. ( #486 )
...
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
ebcfd96d94
add c++17 flags ( #452 )
2023-08-15 15:29:34 +01:00