Commit Graph

  • cd4d941ed1 Add LLaVA support (#2234) chenwanqq 2024-06-03 17:54:09 +08:00
  • 03344d3c19 ONNX: Add Floor and Ceil (#2235) mokulus 2024-06-02 21:45:20 +02:00
  • 1ec3b2cc18 add where_cond f32 for metal (#2236) Lionel Touati 2024-06-02 14:30:06 +02:00
  • f7773d498a Deactivate some book test that breaks the CI. (#2233) Laurent Mazare 2024-06-01 09:44:22 +02:00
  • 84cd5158ad Update gemm requirement from 0.17.0 to 0.18.0 dependabot/cargo/gemm-0.18.0 dependabot[bot] 2024-06-01 06:19:34 +00:00
  • 7abc3b8cd7 Bump cudarc version to 0.11.4 (#2230) Eric Buehler 2024-06-01 02:18:35 -04:00
  • 46012ed31f Another cudarc update. (#2229) Laurent Mazare 2024-05-30 22:27:06 +02:00
  • f3fade3b03 Update cudarc to 0.11.2. (#2227) Laurent Mazare 2024-05-29 18:50:52 +02:00
  • ea260aeffd Add Debug, Clone, Deserialize to moondream config (#2222) Dave Lage 2024-05-28 00:08:00 -04:00
  • 0814dfd148 Add a metal kernel for col2im1d. (#2214) Laurent Mazare 2024-05-25 11:03:23 +02:00
  • 3ceca9901a Enable the new layer-norm. (#2213) Laurent Mazare 2024-05-24 16:48:21 +02:00
  • 1df2bddccf Add the layernorm specialized op. (#2212) Laurent Mazare 2024-05-24 15:58:01 +02:00
  • 6f0b807ffd More efficient cuda implementation for ConvTranspose1d. (#2211) Laurent Mazare 2024-05-24 11:05:43 +02:00
  • d54e02d73d Avoid a contiguous call in the quantized phi 3 model. (#2209) Laurent Mazare 2024-05-23 21:24:55 +02:00
  • 45e235a747 Simplify the KvCache api. (#2207) Laurent Mazare 2024-05-23 17:07:21 +02:00
  • 31cf64147b Add a couple kv-cache helper functions. (#2206) Laurent Mazare 2024-05-23 16:21:47 +02:00
  • 77ea479a18 Add Phi-3 Medium (#2205) Jani Monoses 2024-05-23 14:33:17 +03:00
  • 72e7ca529a Add some missing where-cond kernels for metal. (#2203) Laurent Mazare 2024-05-22 09:44:52 +02:00
  • a394dfe4c1 Update imageproc requirement from 0.24.0 to 0.25.0 dependabot/cargo/imageproc-0.25.0 dependabot[bot] 2024-05-21 19:49:19 +00:00
  • 7ff921c538 Add RandomNormal ONNX operator (#2200) 0.5.1 mokulus 2024-05-21 21:47:32 +02:00
  • 567247fdcf Update metal requirement from 0.27.0 to 0.28.0 dependabot/cargo/metal-0.28.0 dependabot[bot] 2024-05-21 19:45:53 +00:00
  • 9b8537a62f Remove the deprecated wav crate in favor of hound. (#2202) Laurent Mazare 2024-05-21 21:43:35 +02:00
  • 7ebc3548e1 Use flash-attn in gemma. (#2195) Laurent Mazare 2024-05-18 19:18:59 +02:00
  • eefc1c77ef Support flash-attn in quantized phi3. (#2194) Laurent Mazare 2024-05-18 17:12:56 +02:00
  • 01545f7303 Add a slice_set op. (#2193) Laurent Mazare 2024-05-18 15:58:18 +02:00
  • 349c3e806a Support embedding model gte-Qwen1.5-7B-instruct (#2190) Yin Guobing 2024-05-17 03:34:10 +08:00
  • bdaa34216a chore: add fix for windows cudarc into the readme (#2189) Martin Stefcek 2024-05-16 14:32:50 +02:00
  • cc80e065e5 Allow the threshold argument to be negative in the segment-anything example (#2187) Daniel Varga 2024-05-15 13:17:20 +02:00
  • 13c64f6828 Fix VarBuilder::from_slice_safetensors (#2180) Harry Stern 2024-05-12 01:26:06 -04:00
  • 21f82a5155 Add SliceSafetensors. (#2179) Laurent Mazare 2024-05-11 13:15:42 +02:00
  • 9cff7bc3f4 Make it possible to use TF32 accumulation in F32 matmuls. (#2178) Laurent Mazare 2024-05-11 12:28:39 +02:00
  • 08fd7f7119 Typo fix b1rtek 2024-05-10 00:51:01 +02:00
  • 2ced31b530 Added a test for LeakyRelu b1rtek 2024-05-10 00:50:05 +02:00
  • 91b0d526ee Added LeakyRelu implementation b1rtek 2024-05-10 00:49:54 +02:00
  • 4de76b89a2 Added tests for ArgMax b1rtek 2024-05-09 20:45:53 +02:00
  • 8f1119b3e0 Added ArgMax operator implementation b1rtek 2024-05-09 20:45:41 +02:00
  • c4743aa570 Added tests from pytorch examples b1rtek 2024-05-09 20:22:34 +02:00
  • 9a273196b7 ArgMin now returns a tensor with i64 b1rtek 2024-05-09 20:22:22 +02:00
  • d9bc5ec151 Switch cudarc back to dynamic linking. (#2176) Laurent Mazare 2024-05-09 10:35:44 +02:00
  • 13b88547f7 Added tests for ArgMin b1rtek 2024-05-09 03:00:22 +02:00
  • 1caf62e4a6 Added ArgMin operator implementation b1rtek 2024-05-09 03:00:15 +02:00
  • 84328e2b60 Update cudarc requirement from 0.11.0 to 0.11.1 (#2174) Sidharth Rajaram 2024-05-08 11:40:36 -07:00
  • 82b641fd27 Update cudarc requirement from 0.10.0 to 0.11.0 (#2165) dependabot[bot] 2024-05-06 17:12:14 +02:00
  • 01794dc16e Use write rather than try-write on the metal rw-locks. (#2162) Laurent Mazare 2024-05-05 07:22:46 +02:00
  • f7980abbcd Improve the sampling methods. improve-sampling laurent 2024-05-04 10:53:30 +02:00
  • a75cd8164f Force the revision for the phi3-llama quantized models. (#2159) Laurent Mazare 2024-05-04 10:41:18 +02:00
  • b13a82a438 Separate quantized phi-3 implementation. (#2157) Laurent Mazare 2024-05-04 10:14:57 +02:00
  • 59b18d974e Pin the version used for the quantized phi 3 gguf file. (#2156) Laurent Mazare 2024-05-03 15:03:22 +02:00
  • 89f53b9d7b Bump the version number to 0.5.1. (#2155) Laurent Mazare 2024-05-03 11:17:05 +02:00
  • a09d451d11 Support top-k in the llama example. (#2150) Laurent Mazare 2024-05-01 22:25:47 +02:00
  • fa06f5f5f9 F16/BF16 bugfix (bis). (#2143) Laurent Mazare 2024-04-29 14:08:44 +02:00
  • 09d4845aa8 Bugfix the recent f16/bf16 changes. (#2142) Laurent Mazare 2024-04-29 13:30:11 +02:00
  • a0d03aded1 Bug Fix: When converting a tensor to a variable, clone if the tensor is already a variable. (#2124) Jeffrey Dallatezza 2024-04-29 02:21:53 -07:00
  • 3bbb88fcb4 Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) MilkFather 2024-04-29 17:04:43 +08:00
  • ed7b99f525 Add a toggle for F16/BF16 accumulation in gemm. (#2141) Laurent Mazare 2024-04-29 09:21:07 +02:00
  • 287013ef28 Add a forward_via_f16 method to the qmatmul op. (#2138) Laurent Mazare 2024-04-28 20:35:01 +02:00
  • eb26e2467e Add the cuda dequantize f16 kernels. (#2137) Laurent Mazare 2024-04-28 20:05:05 +02:00
  • c68ed8963f chore: fix some typos in comments (#2121) hardlydearly 2024-04-28 14:34:32 +08:00
  • e5c8b88f90 Apply the cast before the scaling. (#2135) Laurent Mazare 2024-04-28 08:30:35 +02:00
  • 805f3be8e1 Add a sort function. (#2134) Laurent Mazare 2024-04-28 08:18:04 +02:00
  • 3b429f3023 Make the dtype configurable for phi. (#2133) Laurent Mazare 2024-04-27 21:32:49 +02:00
  • 96a48e5cc4 Add argsort. (#2132) Laurent Mazare 2024-04-27 20:17:35 +02:00
  • 6cf82fd7a3 Add Olmo models (#2127) Isotr0py 2024-04-26 17:02:51 +08:00
  • cfab6e7616 Mention phi-v3 in the readmes. (#2122) Laurent Mazare 2024-04-24 20:54:24 +02:00
  • 11d4a3c588 Add the phi-3 model. (#2120) Laurent Mazare 2024-04-24 09:48:13 +02:00
  • 9d3f1c8af5 Add the phi-v3 quantized model. (#2118) Laurent Mazare 2024-04-24 08:22:23 +02:00
  • 7211009179 Fix for rustfmt. (#2117) Laurent Mazare 2024-04-23 19:09:33 +02:00
  • 6fadaf2eff candle-onnx: add operators RandomUniform and Exp (#2116) B1rtek 2024-04-23 19:02:19 +02:00
  • a06b2ded28 Merge branch 'refs/heads/random' into operators-random-exp B1rtek 2024-04-23 17:36:33 +02:00
  • a867d652d3 Merge branch 'refs/heads/exp' into operators-random-exp B1rtek 2024-04-23 17:33:05 +02:00
  • 8a05743a21 Add StorageRef. (#2113) Laurent Mazare 2024-04-23 13:23:27 +02:00
  • b2e816752b Use the faster rms-norm kernel for llama. (#2107) Laurent Mazare 2024-04-22 18:52:00 +02:00
  • 618ecf5e23 Better time measurement for the llama example. (#2106) Laurent Mazare 2024-04-22 17:54:27 +02:00
  • 267601eec1 Update tokenizers requirement from 0.15.0 to 0.19.1 (#2104) dependabot[bot] 2024-04-22 17:10:46 +02:00
  • 08a15cb79e Update zip requirement from 0.6.6 to 1.1.1 (#2103) dependabot[bot] 2024-04-22 16:23:27 +02:00
  • c388be93e7 Updated quantized phi model (#2099) Laurent Mazare 2024-04-21 07:37:07 +02:00
  • d22f1d4f4e Derive clone and debug traits for Moondream model (#2100) Santiago Medina 2024-04-20 22:08:28 -07:00
  • 0067fe00a8 Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) Thomas Santerre 2024-04-20 18:10:33 -04:00
  • 587ee3bb6f Small cleanups to the llama multi-process example. (#2098) Laurent Mazare 2024-04-20 22:19:46 +02:00
  • dd78422701 Handle multiple dimensions in metal QMM + two fixes. (#2097) Laurent Mazare 2024-04-20 18:55:45 +02:00
  • 9215e9ce8c Add missing onnx operations (#2096) Gabriel 2024-04-20 18:44:22 +02:00
  • 52ae332910 Use llama v3 by default + add to readme. (#2094) Laurent Mazare 2024-04-20 16:11:24 +02:00
  • 8b390ddd29 Only download the weights in the main process (and not in the child processes). (#2093) Laurent Mazare 2024-04-20 13:01:23 +02:00
  • c97d639fa0 Multiprocess/multi-GPU support for llama 3. (#2092) Laurent Mazare 2024-04-20 12:49:21 +02:00
  • 70388c27b6 Added Exp operator implementation b1rtek 2024-04-19 22:48:05 +02:00
  • b45c710dbf Fix for gemma MQA. (#2091) Laurent Mazare 2024-04-19 21:49:55 +02:00
  • 0fa41a791f Use is_some to check if seed is present Mateusz Okulus 2024-04-19 16:09:45 +02:00
  • 46073c5f73 Add basic RandomUniform implementation Mateusz Okulus 2024-04-19 16:06:43 +02:00
  • 6d6d87f8b3 Use BF16 for llama v3 by default. llama-v3-mp laurent 2024-04-19 14:22:01 +02:00
  • 9c532aef47 Also enable llama-v3 8b instruct. (#2088) Laurent Mazare 2024-04-19 08:50:06 +02:00
  • f7a6468238 Add support for llama3 on the quantized example (#2086) Thomas Santerre 2024-04-18 16:52:00 -04:00
  • 2b93dffe64 Use faster rotary embeddings for llama like models. (#2087) Laurent Mazare 2024-04-18 22:34:29 +02:00
  • e6ee7ba4d4 Llama v3. (#2085) Laurent Mazare 2024-04-18 22:19:54 +02:00
  • 1690ab45d2 Fix the silu gradient issue on 0. (#2083) Laurent Mazare 2024-04-18 14:31:41 +02:00
  • 8de0ce6cba Add more QMMV cuda kernels. (#2077) Laurent Mazare 2024-04-18 08:36:43 +02:00
  • ce6d08df94 Minor fix to the readme. (#2080) Laurent Mazare 2024-04-17 22:43:00 +02:00
  • 3754b834f4 More prep work for phi. phi2-gguf laurent 2024-04-17 10:23:15 +02:00
  • d79041d94d Rework the MLP bit. laurent 2024-04-17 09:28:50 +02:00
  • af11b2d461 Prepare for supporting phi-2 properly in the quantized model. laurent 2024-04-17 09:14:38 +02:00
  • 2817643db9 Add the mmv kernels for small batch sizes. (#2075) Laurent Mazare 2024-04-16 21:30:51 +02:00