Commit Graph

  • aaa44a1948 Improved launch config for layer-norm/rms-norm. Laurent 2024-11-01 17:59:22 +01:00
  • 7ac0de15a9 Lazy upcasting for t5. (#2589) Laurent Mazare 2024-10-30 18:08:51 +01:00
  • d232e132f6 Support sd3.5 medium and MMDiT-X (#2587) Czxck001 2024-10-29 22:19:07 -07:00
  • 139ff56aeb Reduce memory usage for sd 3.5. (#2582) Laurent Mazare 2024-10-28 22:45:02 +01:00
  • 498bc2cdc9 Release the mmdit model earlier to reduce memory usage. (#2581) Laurent Mazare 2024-10-28 16:06:53 +01:00
  • 0e2c8c17fb UG metal integration. (#2580) Laurent Mazare 2024-10-27 15:20:37 +01:00
  • 594d984f9c Support for UG kernels. (#2579) Laurent Mazare 2024-10-27 13:37:19 +01:00
  • 37e0ab8c64 Stable diffusion 3.5 support. (#2578) Laurent Mazare 2024-10-27 10:01:04 +01:00
  • 07849aa595 Update README.md (#2577) sashaphmn 2024-10-26 19:23:52 +03:00
  • 3699c1a053 Fix the repo name for llama 3.1. (#2576) Laurent Mazare 2024-10-26 11:25:04 +02:00
  • a2e9d41b20 use softmax_last_dim (metal and cuda kernel) in llama attention layer (#2572) Zack Angelo 2024-10-23 11:07:09 -07:00
  • 7c09215ef4 ONNX: GatherElements, Xor (#2568) Anubhab Bandyopadhyay 2024-10-17 23:52:35 +05:30
  • dcd83336b6 Testcases (#2567) Anubhab Bandyopadhyay 2024-10-17 16:30:45 +05:30
  • a01aa89799 onnx: ReduceMin/Max Ops (#2563) Anubhab Bandyopadhyay 2024-10-15 14:04:07 +05:30
  • 3d1dc06cdb Enable stable-diffusion 3 on metal. (#2560) Laurent Mazare 2024-10-14 08:59:12 +02:00
  • f553ab5eb4 Adds support for Stella_en_v5 embedding model - 1.5B variant (#2551) Anubhab Bandyopadhyay 2024-10-14 02:39:12 +05:30
  • 41ade774e8 fix: Allow marian configs to deserialize from json. (#2556) Mikarific 2024-10-13 15:05:50 -06:00
  • 6eab6b57f5 Fix the guide to gain access to Stable Diffusion 3 Medium (#2559) Czxck001 2024-10-13 13:55:26 -07:00
  • ca7cf5cb3b Add Stable Diffusion 3 Example (#2558) Czxck001 2024-10-13 13:08:40 -07:00
  • 0d96ec31e8 feat: intergrate chinese clip and add example (#2555) SethWen 2024-10-10 21:18:55 +08:00
  • 937e8eda74 Add BertForMaskedLM to support SPLADE Models (#2550) Akshay Ballal 2024-10-07 23:28:21 +02:00
  • edf7668291 improve (#2548) Jorge António 2024-10-07 16:30:56 +01:00
  • e4a96f9e7c Switch to using the MLX matmul by default. (#2547) Laurent Mazare 2024-10-06 23:24:55 +02:00
  • f856b5c3a7 pyo3 update. (#2545) Laurent Mazare 2024-10-06 10:09:38 +02:00
  • d2e432914e Tensor tools print all (#2543) Laurent Mazare 2024-10-05 10:05:14 +02:00
  • 410c89f72a Add required feature for whisper example in Readme (#2539) dengelt 2024-10-04 14:29:55 +02:00
  • 56aacb05da Make the RNN configs accessible from the models. (#2541) Laurent Mazare 2024-10-04 14:22:23 +02:00
  • 1bb68854d3 Tweaks to the graph experiment. laurent 2024-10-03 17:12:52 +02:00
  • b2956857ef More cuda graph attempts. laurent 2024-10-03 12:43:08 +02:00
  • 9076dee432 Cuda graph experiments. laurent 2024-10-03 08:43:00 +02:00
  • 6faecaa616 Fix for cudnn bf16 conv2d. (#2535) Laurent Mazare 2024-10-02 23:18:55 +02:00
  • 90d04ff622 Support whisper large-v3 turbo in the whisper-microphone example. (#2533) Laurent Mazare 2024-10-02 22:09:14 +02:00
  • 7b60bda4ed Add support for cuda streams. (#2532) Laurent Mazare 2024-10-02 21:30:58 +02:00
  • 936300678d Add whisper large-v3 turbo to the example. (#2531) Laurent Mazare 2024-10-02 21:07:08 +02:00
  • f479840ce6 Add a seed to the flux example. (#2529) Laurent Mazare 2024-10-02 10:52:02 +02:00
  • fd08d3d0a4 Tweak some metal tests. (#2528) Laurent Mazare 2024-10-02 10:22:31 +02:00
  • a2bcc227df Efficient implementation of Tensor::ones() for metal (#2512) Anubhab Bandyopadhyay 2024-10-01 22:41:59 +05:30
  • def4c6cdee Cuda quantized mmv bugfix. (#2526) Laurent Mazare 2024-10-01 12:57:55 +02:00
  • 888d886dd8 Add ColPali (#2524) Akshay Ballal 2024-10-01 11:48:39 +02:00
  • 6110ad8d4f Refactor the whisper microphone example. (#2523) Laurent Mazare 2024-10-01 00:24:17 +02:00
  • aa35bf2ff5 Add/lstm direction (#2455) Justin Sing 2024-09-30 16:44:07 -04:00
  • 724650446c Yet another cuda qmm padding fix. (#2509) Laurent Mazare 2024-09-30 21:53:30 +02:00
  • dfe9a00683 Pixtral polishing. (#2522) Laurent Mazare 2024-09-30 21:23:54 +02:00
  • 683ab698de Add Pixtral. (#2521) Laurent Mazare 2024-09-30 19:31:14 +02:00
  • 2f49e1b534 Add PaliGemma. (#2519) Laurent Mazare 2024-09-29 19:56:56 +02:00
  • 0ebb38813b Paligemma siglip vision config (#2518) Laurent Mazare 2024-09-29 17:53:52 +02:00
  • 3a3c48b14b Bump the crate version to 0.7.2. (#2517) [0.7.2] Laurent Mazare 2024-09-29 10:56:50 +02:00
  • 261ed65f36 Add the SigLIP model. (#2515) Laurent Mazare 2024-09-28 23:48:00 +02:00
  • 62525e8352 Remove some extra whitelines. (#2513) Laurent Mazare 2024-09-28 14:41:28 +02:00
  • 2c25754281 Clippy fixes for onnx + fix a broken test. (#2510) Laurent Mazare 2024-09-26 23:37:59 +02:00
  • ed48f54b54 Expand split ops (#2505) Steven Lovegrove 2024-09-26 13:57:55 -07:00
  • ad8a4c5e5a Add some llama-3.2 examples. (#2508) Laurent Mazare 2024-09-26 21:00:18 +02:00
  • c3c392f45c Merge pull request #2507 from huggingface/ci-move Guillaume LEGENDRE 2024-09-26 18:48:52 +02:00
  • a0184a4fe4 move CI/Cuda runner Guillaume LEGENDRE 2024-09-26 17:09:26 +02:00
  • 10d47183c0 Quantized version of flux. (#2500) Laurent Mazare 2024-09-26 10:23:43 +02:00
  • ab12425bff Another tweak. [qmm-fix2] laurent 2024-09-26 10:14:53 +02:00
  • 43a8cbe244 Tweaks. laurent 2024-09-26 00:05:17 +02:00
  • 46acac5a64 Cuda quantization padding fix. laurent 2024-09-25 23:40:14 +02:00
  • 5221146cfa Cuda quantization padding fix. [qmm-pad-fix] laurent 2024-09-25 23:35:16 +02:00
  • fd3b53f48b Fix for the quantized model. laurent 2024-09-25 12:34:46 +02:00
  • c6019e9635 Use the newly minted gguf file. laurent 2024-09-25 12:08:20 +02:00
  • 8cc560bb8c Hook the quantized model. Laurent 2024-09-25 11:24:50 +02:00
  • 0bd61bae29 More generic sampling. Laurent 2024-09-25 11:15:37 +02:00
  • fa1e0e438e Quantized version of flux. Laurent 2024-09-25 11:07:49 +02:00
  • d01207dbf3 Add a RotatingKVCache. (#2493) [0.7.1] Laurent Mazare 2024-09-23 13:14:32 +02:00
  • 8097559c1a Move the candle version to 0.7.1. (#2495) Laurent Mazare 2024-09-22 20:44:39 +02:00
  • 829dcfa8dc Update cudarc to 0.12.1. (#2494) Laurent Mazare 2024-09-22 20:32:29 +02:00
  • 42c702a023 Update cudarc to 0.12.1. [cudarc-12-6] Laurent 2024-09-22 20:16:57 +02:00
  • d6f01f625d More rotating kv-cache. Laurent 2024-09-22 14:53:41 +02:00
  • 3277844fd9 Mimi streaming fixes. Laurent 2024-09-22 14:41:17 +02:00
  • c79bf421c7 Add a way to test the mimi streaming mode. Laurent 2024-09-22 14:14:05 +02:00
  • 58c1e909d3 Handle contiguity + bugfix + use in mimi. Laurent 2024-09-22 13:43:46 +02:00
  • 9964c6d86c Improve the api for the rotating cache so that the whole src tensor gets returned when it's overlarge. Laurent 2024-09-22 13:23:23 +02:00
  • fc877920ce More tests for the rotating kv-cache. Laurent 2024-09-22 12:57:46 +02:00
  • 6547c4bfc3 More kv-cache testing. Laurent 2024-09-22 12:51:07 +02:00
  • f9579f80be Test the reset too. Laurent 2024-09-22 12:46:18 +02:00
  • 1bddd44cb8 Add some KvCache tests. Laurent 2024-09-22 12:44:09 +02:00
  • 9cfe3c7141 Add a RotatingKVCache. Laurent 2024-09-22 12:31:25 +02:00
  • c2fca0ca11 Bump the crate version. (#2491) [0.7.0] Laurent Mazare 2024-09-21 15:13:12 +02:00
  • 844d45cde4 Bugfix for the metal elu kernel. (#2490) Laurent Mazare 2024-09-21 15:03:19 +02:00
  • af2104078f Metal commands refactoring (#2489) Laurent Mazare 2024-09-21 13:18:42 +02:00
  • 5fc4f17727 Adding Granite 7b Instruct model example (#2487) Juan Gomez 2024-09-21 05:52:01 -04:00
  • c58c5d5b01 Add the mimi audio-tokenizer. (#2488) Laurent Mazare 2024-09-20 14:31:20 -06:00
  • 382c6b51af Improve error message (#2485) ivnsch 2024-09-20 15:11:41 +02:00
  • 6eea45a761 Add a couple cast metal kernels. (#2479) Laurent Mazare 2024-09-15 21:27:46 +01:00
  • ebf722b446 Export TensorIndexer public to candle users (#2477) Shengtuo Hu 2024-09-13 13:21:57 -07:00
  • c09afc211c Fix for metal tanh. (#2475) Laurent Mazare 2024-09-13 06:08:36 +01:00
  • b60faebea4 Missing metal kernels. (#2474) Laurent Mazare 2024-09-12 12:58:50 +01:00
  • 72d649058b Hook the MLX matmul kernels in candle-core. (#2473) Laurent Mazare 2024-09-12 12:52:59 +01:00
  • 0cb0bd1dfa Add some metal gemm benchark. (#2471) Laurent Mazare 2024-09-11 21:52:37 +01:00
  • afb6575835 Use the new MLX kernels to handle the BF16 matmul. (#2470) Laurent Mazare 2024-09-11 16:34:05 +01:00
  • 5635650d38 Integrate the MLX gemm kernels (#2468) Laurent Mazare 2024-09-11 15:56:48 +01:00
  • 13b2a8a4a0 Complete the missing backticks in the comments (#2469) hongmengning 2024-09-11 22:37:05 +08:00
  • 7ec4f64d38 Attempt at fixing M1/M2 metal async copy bug [metal-gemm-testing] Ivar Flakstad 2024-09-06 15:59:35 +02:00
  • e3261216b1 Clippy fixes for 1.81.0. (#2461) Laurent Mazare 2024-09-05 22:46:55 +01:00
  • 8712ceb84f Revert "slight changes to async ops" Ivar Flakstad 2024-09-02 12:34:11 +02:00
  • f9b2bb4d46 Revert "Alter metal simdgroup matrix load/store ops" Ivar Flakstad 2024-09-02 12:34:09 +02:00
  • aefca7f8e6 Revert "Testing ushort intermediate in case combo of async and f16/bf16 is the issue" Ivar Flakstad 2024-09-02 12:33:58 +02:00
  • c02b7c3272 Fix FLUX.1 weights (#2457) Eugene Hauptmann 2024-08-29 17:10:28 +02:00
  • 86613c00e2 MobileCLIP models S1 and S2 (#2454) Jani Monoses 2024-08-29 16:38:58 +03:00