0af3e428ec
fix: place ug dep behind not wasm32 flag ( #2760 )
...
* place `ug` behind not wasm32 attr so that wasm32 can compile
* mv `ug` to conditional target dep assuming every non-wasm32 user wants this
2025-02-01 23:05:52 +01:00
43017539ab
Adds DebertaV2/V3 ( #2743 )
...
* Adds DebertaV2/V3
* Fixes all clippy warnings
* Typos.
* Addresses PR review findings. Some refactorings
* Avoid some unwrap/unwrap_or.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2025-01-29 08:59:28 +01:00
e142bf9530
use moondream1 model/revision for moondream example ( #2748 )
2025-01-28 22:19:54 +01:00
d2c53f4f2f
Remove the MFA gemm library. ( #2755 )
2025-01-28 21:48:17 +01:00
2a2852d1c1
Fix flash-attn build. ( #2754 )
2025-01-28 18:49:46 +01:00
8f20f2a722
Add the MLX merge sort kernels ( #2751 )
...
* Add some metal sort kernels imported from MLX.
* Add another test.
* Start adding the multiblock version.
* Proper kernel names.
* Split out the main metal file.
* Multi-block sort.
* More sorting.
* DType parametrization.
* Add a larger test.
2025-01-28 14:09:43 +01:00
ab9019425a
Make the metal sdpa tests deterministic. ( #2750 )
2025-01-28 09:05:24 +01:00
da02b59516
Allow using composed strings as metal kernel names. ( #2747 )
2025-01-27 22:40:12 +01:00
27996a1a9e
Remove the old MFA gemm kernels. ( #2742 )
...
* Remove the old MFA gemm kernels.
* Use bf16 in helium on metal.
2025-01-26 20:36:31 +01:00
1a32107fab
Add a few metal gather ops. ( #2740 )
...
* Add a few metal gather ops.
* Fix some compilation issues.
* Adjust the tolerance.
2025-01-25 23:31:03 +01:00
333d94a19a
fix: fix the codegeex4 model examples and transformers model ( #2738 )
...
* Update main.rs
* Update codegeex4_9b.rs
* Get things to compile.
* Add some default for when rope_ratio is missing.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2025-01-25 17:41:12 +01:00
3164a19a5d
Add inpainting to the stable diffusion example ( #2735 )
...
* Update the stable diffusion example with inpainting support for 1.5, 2 and XL.
* Apply cargo fmt.
* Clippy fixes.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com >
2025-01-23 10:08:38 +01:00
e6cd499e98
Fix candle-flash-attn build on Windows (msvc) ( #2734 )
2025-01-22 22:19:48 +01:00
77db8396d0
Explicit error when slice-set is called with the same src and dst. ( #2733 )
2025-01-22 21:31:49 +01:00
85f0aaefe5
Add serde::serialize to activations. ( #2732 )
2025-01-22 10:23:34 +01:00
e4c3a71f11
Fix GLM4 alignment issue ( #2723 )
...
* Fix GLM4 alignment issue
* Cleanups.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2025-01-20 22:51:46 +01:00
17cbbe4286
Sync upstream MLX sdpa vector kernels with mask ( #2718 )
...
* Sync upstream mlx sdpa vector kernels with mask
* Dispatch to the 2pass kernel
* Format
2025-01-16 11:30:10 +01:00
6fd2f63a15
Bump the ug dependency. ( #2720 )
...
* Bump the ug dependency.
* Fix some test.
* Fix the ug test.
2025-01-16 09:39:16 +01:00
efd0e6822f
Fix the helium weights download. ( #2717 )
2025-01-13 18:21:37 +01:00
158817f230
Helium repo update. ( #2716 )
2025-01-13 18:04:14 +01:00
309cd0f7c7
Add the helium model. ( #2715 )
2025-01-13 17:39:49 +01:00
ab7ff7081e
Fixes for running Phi-4 quantized. ( #2714 )
2025-01-13 14:35:33 +01:00
461e8c1685
ModernBERT model ( #2713 )
...
* layer_norm_no_bias
* Modernbert model.
* Format + cleanup error.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com >
2025-01-13 08:39:27 +01:00
2344c4e4b8
Clippy fixes for 1.84. ( #2710 )
2025-01-10 10:15:15 +01:00
32defdb7d5
Update cudarc. ( #2708 )
2025-01-08 15:10:23 +01:00
236c35e578
Bump the crate version to 0.8.2. ( #2703 )
0.8.2
2025-01-07 15:50:16 +01:00
6f8351dfda
add link to README ( #2701 )
2025-01-04 23:07:30 +01:00
57f41da13b
Fix mistral attention on Metal ( #2699 )
...
Co-authored-by: Luka Zakrajsek <luka.zakrajsek@soniox.com >
2025-01-04 16:11:20 +01:00
cbaa0ad46f
UniPC for diffusion sampling ( #2684 )
...
* feat: Add unipc multistep scheduler
* chore: Clippy and formatting
* chore: Update comments
* chore: Avoid unsafety in float ordering
* refactor: Update Scheduler::step mutability requirements
* fix: Corrector img2img
* chore: Update unipc ref link to latest diffusers release
* chore: Deduplicate float ordering
* fix: Panic when running with dev profile
2025-01-01 21:34:17 +01:00
b12c7c2888
Update the hf-hub dependency to 0.4.0. ( #2691 )
...
* Update the hf-hub dependency to 0.4.0.
* Fix the book.
* Use 0.4.1.
2024-12-31 19:07:47 +01:00
94ffc2ec6f
Actually remove the default hf-hub cache path for glm. ( #2696 )
2024-12-31 11:00:44 +01:00
7354afc673
Use the default hf-hub cache for glm. ( #2695 )
2024-12-31 10:55:45 +01:00
2a705e6f37
Flash-Attn upgrade / SoftCap Candle-FlashAttn [3/n] ( #2690 )
...
* update flash-attn v1
* restore: hdim224
* add 224 flash_fwd_template
* remove whitespace
* softcap is working, including test and api.
* make softcap test case better
* unpadded lse added
2024-12-31 10:04:47 +01:00
a594ef669c
Flash-Attn upgrade / SoftCap Candle-FlashAttn [2/n] ( #2689 )
...
* update flash-attn v1
* restore: hdim224
* add 224 flash_fwd_template
* remove whitespace
* softcap is working, including test and api.
* make softcap test case better
---------
Co-authored-by: laurent <laurent.mazare@gmail.com >
2024-12-31 09:41:23 +01:00
71cd6d5533
Flash-Attn upgrade / SoftCap Candle-FlashAttn [1/n] ( #2688 )
...
* update flash-attn v1
* restore: hdim224
* add 224 flash_fwd_template
* remove whitespace
2024-12-31 09:32:22 +01:00
d60eba1408
Streamline the glm4 example. ( #2694 )
2024-12-31 09:21:41 +01:00
e38e2a85dd
Fix a cuda warning. ( #2693 )
2024-12-31 09:06:10 +01:00
460616fc84
Update README.org ( #2670 )
...
Fix the command-line error in the CPU section of the documentation.
2024-12-30 11:32:02 +01:00
91f1f019b1
Added XLMRobertaModel for Reranking ( #2686 )
...
* add xlm-roberta-base
* Add task enum for fill-mask and reranker in xlm-roberta example; update README and fix attention mask dimensions
- Introduced a new `Task` enum to replace string task identifiers in the xlm-roberta example.
- Updated the logic in `main.rs` to handle tasks using the new enum.
- Enhanced README with example output for fill-mask task.
- Fixed dimension retrieval in `prepare_4d_attention_mask` function for better clarity and safety.
* Clippy fix.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com >
2024-12-30 11:16:57 +01:00
cd639131f0
Fix bug in whisper transformer ( #2681 )
...
* Fix bug in whisper transformer
- due to num_threads going to zero in the single-threaded case
* Apply rustfmt.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2024-12-24 13:58:21 +01:00
11aa30be10
Fix Batcher iterator break when return_last_incomplete_batch and items.is_empty ( #2654 ) ( #2655 )
2024-12-24 08:41:26 +01:00
1be6b090c7
Fix position encodings for Pixtral ( #2678 )
...
* init commit: add position id in meshgrid
* pass in subsampled positions
* clippy fix
* clippy fix
2024-12-23 13:22:35 +01:00
62ced44ea9
Add a Context trait similar to anyhow::Context. ( #2676 )
...
* Add a Context trait similar to anyhow::Context.
* Switch two unwrap to context.
2024-12-22 09:18:13 +01:00
5c2f893e5a
make DepthAnythingV2 more reusable ( #2675 )
...
* make DepthAnythingV2 more reusable
* Fix clippy lints.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com >
2024-12-21 12:06:03 +01:00
67cab7d6b8
Bump the crate version to 0.8.1. ( #2662 )
0.8.1
2024-12-07 17:03:53 +01:00
1807be84f4
Change/bert encoder public ( #2658 )
...
* change: BertEncoder struct to public
* change: make certain fields in Config struct public
* change: all fields in bert config struct to be public
* change: add clone to bert encoder and others
* Clippy fix.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2024-12-04 21:22:30 +01:00
145aa7193c
Add Nvembed v2 model ( #2649 )
...
* Update mod.rs
* Create mod.rs
* Create decoder.rs
* Create model.rs
* Create main.rs
* Create README.md
* Update README.md
* Update main.rs
* Update and rename decoder.rs to embedding.rs
* Update mod.rs
* Update model.rs
2024-12-03 10:56:01 +01:00
6f715f9256
add scatter add ( #2656 )
2024-12-01 18:39:38 +01:00
dba7a9c93e
add u32 - U32 gather ( #2653 )
2024-11-30 23:18:07 +01:00
b52c2c6050
Clippy fixes for the cuda feature. ( #2650 )
2024-11-29 09:01:34 +01:00