Commit Graph

  • fdb1acd2ff Move llama in a cargo-examples directory. laurent 2023-07-03 11:30:58 +01:00
  • d0d530dfdc Merge pull request #59 from LaurentMazare/safety Nicolas Patry 2023-07-03 12:00:35 +02:00
  • 81cec86e75 Adding a bit more docs around safety. Nicolas Patry 2023-07-03 11:55:54 +02:00
  • 48089005f6 Merge pull request #58 from LaurentMazare/use-patched-gemm Laurent Mazare 2023-07-03 10:36:09 +01:00
  • 639270b796 Use the patched gemm for the time being. laurent 2023-07-03 10:29:15 +01:00
  • ec4871b8a4 Merge pull request #57 from LaurentMazare/safetensor-module2 Laurent Mazare 2023-07-03 10:19:57 +01:00
  • 899c76de75 Handle more types in safetensors. laurent 2023-07-03 10:09:46 +01:00
  • 783b7054ee Move more safetensors bits to the shared module. laurent 2023-07-03 09:34:08 +01:00
  • b036faf6a0 Merge pull request #56 from LaurentMazare/safetensor-module Laurent Mazare 2023-07-03 08:47:25 +01:00
  • fe2c07e368 Add the ST error. laurent 2023-07-03 08:44:00 +01:00
  • cf2789fb81 Move some safetensors bits in the candle-core crate. laurent 2023-07-03 08:37:46 +01:00
  • 9e419641fb Merge pull request #55 from LaurentMazare/pyo3-device Laurent Mazare 2023-07-02 21:04:58 +01:00
  • 5b0ee2e0ba Get cuda to work on pyo3. laurent 2023-07-02 21:04:11 +01:00
  • fbfe74caab Preliminary pyo3 support for device. laurent 2023-07-02 20:42:55 +01:00
  • eb6f7d30b6 Merge pull request #54 from LaurentMazare/more-pyo3-2 Laurent Mazare 2023-07-02 20:21:37 +01:00
  • bdb257ceab Add the tensor function. laurent 2023-07-02 20:15:50 +01:00
  • 78871ffe38 Add dtype support. laurent 2023-07-02 20:12:26 +01:00
  • 65e069384c Merge pull request #53 from LaurentMazare/more-pyo3 Laurent Mazare 2023-07-02 07:50:49 +01:00
  • d38897461b Add to the example. laurent 2023-07-02 07:37:17 +01:00
  • 5b8c6764b0 Add matmul/where_cond. laurent 2023-07-02 07:34:14 +01:00
  • 9a9858bbe0 Expose a couple more ops. laurent 2023-07-02 07:30:00 +01:00
  • dfe197f791 Handle more input types to create tensors. laurent 2023-07-02 07:19:46 +01:00
  • 4a28dcf828 Rename the method. laurent 2023-07-02 07:08:11 +01:00
  • c62cb73a7f Support higher order shapes for conversions. laurent 2023-07-02 07:07:22 +01:00
  • fa58c7643d Add a trait to avoid repeating the dtype matching. laurent 2023-07-02 06:58:10 +01:00
  • 2596821a08 Merge pull request #52 from LaurentMazare/pyo3 Laurent Mazare 2023-07-02 06:35:31 +01:00
  • 2370b1675d More pyo3. laurent 2023-07-01 22:15:58 +01:00
  • 86df4ad79c Get shape to return a tuple. laurent 2023-07-01 21:34:38 +01:00
  • fbbde5b02c Add some binary operators. laurent 2023-07-01 21:27:35 +01:00
  • 42d1a52d01 Add two methods. laurent 2023-07-01 20:55:15 +01:00
  • 52db2a6849 Apply rustfmt. laurent 2023-07-01 20:37:28 +01:00
  • ebb0fedf14 Very simple pyo3 bindings for candle. laurent 2023-07-01 20:36:44 +01:00
  • dd879f5b67 Merge pull request #51 from LaurentMazare/custom-prompt Laurent Mazare 2023-07-01 06:40:36 +01:00
  • 7c65e2d187 Add a flag for custom prompt. laurent 2023-07-01 06:36:22 +01:00
  • 2c04bff12f Merge pull request #50 from LaurentMazare/rayon1 Laurent Mazare 2023-06-30 18:56:26 +01:00
  • bbe0c5fbaa Do not use rayon for a single thread (bis). laurent 2023-06-30 18:47:22 +01:00
  • 6b67d25d9f Do not use rayon for a single thread. laurent 2023-06-30 18:46:32 +01:00
  • b8b175c01e Merge pull request #49 from LaurentMazare/llama-dtype Laurent Mazare 2023-06-30 16:43:56 +01:00
  • 679b6987b6 Early conversion for the llama weights. laurent 2023-06-30 16:42:53 +01:00
  • dbd7d5b3fd Merge pull request #47 from LaurentMazare/llama-f32 Laurent Mazare 2023-06-30 15:04:33 +01:00
  • ed4d0959d3 Add a const to easily tweak the dtype used for llama internal computations. laurent 2023-06-30 15:01:39 +01:00
  • a243504f53 Merge pull request #46 from LaurentMazare/bugfix-cuda-u8-bf16 Laurent Mazare 2023-06-30 10:45:48 +01:00
  • 313fa022a5 Bugfix: remove the u8/bf16 conversion kernel as it is ambiguous. laurent 2023-06-30 10:43:32 +01:00
  • d2ab4f86bf Merge pull request #45 from LaurentMazare/u8 Laurent Mazare 2023-06-30 10:35:51 +01:00
  • fbc329ed85 Add the verbose cpu cast operations. laurent 2023-06-30 10:33:29 +01:00
  • 8ad47907f3 Add the kernels. laurent 2023-06-30 10:26:56 +01:00
  • a7b16cbb98 Merge pull request #44 from LaurentMazare/check-dim Laurent Mazare 2023-06-30 09:14:45 +01:00
  • 19cbbc5212 Improve how we check that the dims are in bounds. laurent 2023-06-30 09:11:00 +01:00
  • 00476d37f8 Merge pull request #43 from LaurentMazare/bf16 Laurent Mazare 2023-06-30 05:48:58 +01:00
  • 6486a6d7b2 Avoid some cast kernels. laurent 2023-06-29 23:23:44 +01:00
  • ec79fc43f2 Add the bf16 cuda kernels. laurent 2023-06-29 23:12:02 +01:00
  • 018e017e7e Merge pull request #42 from LaurentMazare/kv-cache-enable Laurent Mazare 2023-06-29 22:22:11 +01:00
  • f6152e74b6 Tweak the kv-cache flag. laurent 2023-06-29 22:16:40 +01:00
  • ae3f202f3b Add a flag. laurent 2023-06-29 22:12:15 +01:00
  • 23389b1bd7 Enable the KV cache after fixing the caching length and the rope bits. laurent 2023-06-29 22:00:57 +01:00
  • e87a99d16e Merge pull request #41 from LaurentMazare/kv-cache Laurent Mazare 2023-06-29 19:11:52 +01:00
  • af66f0829e Revert the new profile. laurent 2023-06-29 19:08:50 +01:00
  • b50bd880ce Only narrow when needed + deactivate the kv cache. laurent 2023-06-29 19:07:52 +01:00
  • 4b148b5414 Merge pull request #40 from LaurentMazare/fix_kernel_cache Nicolas Patry 2023-06-29 18:02:06 +02:00
  • 1ea08a19cb Rerun on new files. Nicolas Patry 2023-06-29 15:59:58 +00:00
  • b5bdbef53a Fixing kernel cache (a bit brutal for now, but if build triggers, rebuild ALL kernels). Nicolas Patry 2023-06-29 15:51:08 +00:00
  • 3232df9458 Add some KV cache to llama. laurent 2023-06-29 15:29:40 +01:00
  • 889f7e0971 Merge pull request #39 from LaurentMazare/anyhow-backtrace Laurent Mazare 2023-06-29 13:17:53 +01:00
  • e27ee98d3f Add backtraces. laurent 2023-06-29 13:17:20 +01:00
  • e90f4aad26 Merge pull request #38 from LaurentMazare/llama_f16 Nicolas Patry 2023-06-29 14:12:31 +02:00
  • 78ec40b077 Typo. Nicolas Patry 2023-06-29 12:09:53 +00:00
  • de48e6fd59 Putting back main. Nicolas Patry 2023-06-29 12:08:35 +00:00
  • 0958c588f6 Putting back seed. Nicolas Patry 2023-06-29 12:07:21 +00:00
  • c5e8f788be Revert some changes. Nicolas Patry 2023-06-29 12:05:53 +00:00
  • e63ed6aaa3 Remove unwrap. Nicolas Patry 2023-06-29 12:04:25 +00:00
  • 2fe1d3e36d Moving llama to f16. Nicolas Patry 2023-06-29 11:56:49 +00:00
  • 31396a3b9f Merge pull request #37 from LaurentMazare/llama-seed Laurent Mazare 2023-06-29 12:51:45 +01:00
  • b4dc9f6108 Add a seed parameter to llama. laurent 2023-06-29 12:47:19 +01:00
  • 53628db3a9 Merge pull request #36 from LaurentMazare/fix_example Nicolas Patry 2023-06-29 13:36:05 +02:00
  • 1913512f42 Simple example fix. Ubuntu 2023-06-29 11:10:57 +00:00
  • c0719b7781 Merge pull request #35 from LaurentMazare/const-scalar Laurent Mazare 2023-06-29 12:10:19 +01:00
  • 2741b39ad3 Use broadcasted scalars for const tensors. laurent 2023-06-29 11:56:40 +01:00
  • 3872dc4751 Merge pull request #19 from LaurentMazare/llama_safetensors Nicolas Patry 2023-06-29 12:49:26 +02:00
  • 5930168457 Merge pull request #34 from LaurentMazare/simpler-dtype-trait Laurent Mazare 2023-06-29 11:41:17 +01:00
  • b4aab7b95f Put more requirements on the withdtype trait. laurent 2023-06-29 11:37:42 +01:00
  • c8fc9da737 Merge pull request #33 from LaurentMazare/cuda-map Laurent Mazare 2023-06-29 10:14:12 +01:00
  • c9c468e1aa Use Map2 for binary ops. laurent 2023-06-29 10:09:15 +01:00
  • 83c7d660ca Add Map2. laurent 2023-06-29 10:05:06 +01:00
  • 367170da45 Also use Map1 for embedding. laurent 2023-06-29 09:45:27 +01:00
  • 8ad03a5fb6 Use Map1 on unary ops. laurent 2023-06-29 09:37:38 +01:00
  • fff13dbb4e Factorize the kernel naming scheme. laurent 2023-06-29 09:29:59 +01:00
  • d3c7b0d168 Use Map1 for sum. laurent 2023-06-29 09:27:07 +01:00
  • 122e334d0c Simplify the pattern matching logic in the cuda backend. laurent 2023-06-29 09:21:11 +01:00
  • eda46d2df2 Merge pull request #32 from LaurentMazare/running_less_ci Nicolas Patry 2023-06-29 08:24:31 +02:00
  • 5f65d46c32 Merge pull request #29 from LaurentMazare/cpu-map Laurent Mazare 2023-06-29 05:27:59 +01:00
  • f08f146348 Merge pull request #31 from LaurentMazare/fix_hub Nicolas Patry 2023-06-29 00:22:17 +02:00
  • 0862e7d9e9 Windows 2019 is slower to load (I guess less availability). Ubuntu 2023-06-28 22:21:38 +00:00
  • d3000ac9eb Running CI only when pushing on main and on pull request. Ubuntu 2023-06-28 22:20:31 +00:00
  • beccf673f4 Fixing hub test. Ubuntu 2023-06-28 22:16:46 +00:00
  • eaa3ce359e Cosmetic change. laurent 2023-06-28 22:02:23 +01:00
  • 1328b5cb20 Factor some code out. laurent 2023-06-28 21:56:44 +01:00
  • c583ee0f2c Add map2. laurent 2023-06-28 21:38:01 +01:00
  • 46c07b924c Tweak some comment. laurent 2023-06-28 21:10:54 +01:00
  • 2ae368e98e Switch from a macro to a trait to make things more generic. laurent 2023-06-28 21:06:56 +01:00
  • 0cfa21f26a Merge pull request #27 from LaurentMazare/layout-refactor Laurent Mazare 2023-06-28 15:59:53 +01:00