Commit Graph

  • fdfe8fd129 Preliminary support for inplace ops. (#1921) Laurent Mazare 2024-03-23 14:16:19 +01:00
  • 790037390c Add cast_bf16_x/cast_x_bf16 when CUDA_ARCH<800 but CUDA_VERSION >= 11000 (#1919) yinqiwen 2024-03-23 20:44:10 +08:00
  • 6f877592a7 Avoid broadcasting on the batch dimension for the attention mask. (#1920) Laurent Mazare 2024-03-23 13:08:53 +01:00
  • cc856db9ce Backwards for ConvTranspose2D (#1910) Kirpal Grewal 2024-03-23 06:05:55 +00:00
  • fc1fe5e45b Support scatter/index_add with i64 indices for f16 (#1915) Daniël de Kok 2024-03-22 11:51:41 +01:00
  • 32f567bac4 Fix loading the gguf files. (#1913) Laurent Mazare 2024-03-22 10:28:38 +01:00
  • fee33b45c2 Add support for strided index-select on Metal (#1909) Thomas Santerre 2024-03-22 02:30:02 -04:00
  • 6708870e63 Add the alloc_uninit function. (#1901) Laurent Mazare 2024-03-22 07:25:23 +01:00
  • a00e24d752 Improve the error message on overlong prompts. (#1908) Laurent Mazare 2024-03-21 21:08:07 +01:00
  • c07e4057ab Fix for the llama model. (#1906) Laurent Mazare 2024-03-21 19:36:10 +01:00
  • c0bdd9c7a6 Use the fast RmsNorm in the quantized model. (#1904) Laurent Mazare 2024-03-21 18:49:35 +01:00
  • 9563a5fee4 Add support for conv_transpose2d on Metal backend (#1903) Thomas Santerre 2024-03-21 13:08:45 -04:00
  • ec97c98e81 Async tensor copying. (#1900) Laurent Mazare 2024-03-21 13:09:42 +01:00
  • bb3ee48039 whisper readme (#1899) Sanchit Gandhi 2024-03-21 17:24:09 +05:30
  • 0c11e055be support distil-large-v3 (#1898) Sanchit Gandhi 2024-03-21 16:16:49 +05:30
  • 18036c6ccb Update the image crate + use the re-exported version. (#1893) Laurent Mazare 2024-03-21 10:56:41 +01:00
  • 0fddec762e RmsNorm kernel for metal. (#1895) Laurent Mazare 2024-03-21 09:48:56 +01:00
  • 74b7f59261 Prepare for the custom-op extension. (#1892) Laurent Mazare 2024-03-21 07:02:20 +01:00
  • af7f8b87d3 Custom op for RmsNorm (#1890) Laurent Mazare 2024-03-21 06:36:28 +01:00
  • b219903d0f Cuda backend optimization (#1886) Laurent Mazare 2024-03-20 18:32:55 +01:00
  • 469635a3eb Minor cleanup. (#1885) Laurent Mazare 2024-03-20 14:38:27 +01:00
  • 455c42aa72 Avoid copying the data on squeeze and unsqueeze. (#1884) Laurent Mazare 2024-03-20 13:04:36 +01:00
  • 2a8679509e Add support for conv_transpose1d for metal backend (#1874) Thomas Santerre 2024-03-19 03:46:58 -04:00
  • 143c481c20 Expose candle gather op in pyo3. (#1870) Laurent Mazare 2024-03-18 21:54:15 +01:00
  • f115895b9e Apply rustfmt. (#1873) Laurent Mazare 2024-03-18 21:43:31 +01:00
  • 90fc82211f Use a common with_tracing::RmsNorm in a few models. (#1871) Jani Monoses 2024-03-18 22:40:06 +02:00
  • 6a966cf9e0 Add a DQN example to the reinforcement-learning section (#1872) Gabriel 2024-03-18 21:22:53 +01:00
  • 04a61a9c72 Add avg_pool2d metal implementation for the metal backend (#1869) Thomas Santerre 2024-03-18 13:50:14 -04:00
  • 5ac3302fac Prebuild all our kernels. [precompile_metal] Nicolas Patry 2024-03-18 16:39:38 +01:00
  • c974dee369 working bfloat matmul Ivar Flakstad 2024-03-18 14:38:40 +01:00
  • 58605252e8 Microphone support for the encodec example. (#1866) Laurent Mazare 2024-03-18 11:19:46 +01:00
  • d365ef32d9 Improve the encodec example: handle resampling. (#1865) Laurent Mazare 2024-03-18 10:09:40 +01:00
  • 754fa1e813 Add support for max_pool2d for Metal backend (#1863) Thomas Santerre 2024-03-18 03:33:30 -04:00
  • 184105792f add test for index add and add missing match statements (#1862) Thomas Santerre 2024-03-17 17:19:12 -04:00
  • 53f951f6e2 Merge remote-tracking branch 'origin/main' into cuda-conv-tr1d [cuda-conv-tr1d] laurent 2024-03-17 21:17:56 +01:00
  • a15f859ab4 Fix for the encodec example. (#1861) Laurent Mazare 2024-03-17 21:15:12 +01:00
  • e316cb6997 add support for casting between all datatypes (#1860) Thomas Santerre 2024-03-17 15:55:11 -04:00
  • 52e70856ea Tweaks. laurent 2024-03-17 20:48:21 +01:00
  • 3cae6f5e9a Zero padding. laurent 2024-03-17 20:24:34 +01:00
  • dffafd1049 Small optimization. laurent 2024-03-17 20:15:51 +01:00
  • 75f2aea5fd Fix the kernel. laurent 2024-03-17 19:55:54 +01:00
  • 42ae70c458 Optimize the cuda conv transpose1d kernel. laurent 2024-03-17 19:28:37 +01:00
  • 101a4c8389 Moondream first bits. [moondream] laurent 2024-03-17 17:49:56 +01:00
  • ce9fbc3682 Optimize the cat operation on contiguous tensors (#1855) Laurent Mazare 2024-03-17 10:49:13 +01:00
  • db8b24ae92 Add support for index u8/i64 and input f16/bf16 scatter-add on metal (#1849) Thomas Santerre 2024-03-17 03:09:43 -04:00
  • 74bf6994b1 Move the image tensor to the appropriate device. (#1856) Laurent Mazare 2024-03-16 22:25:46 +01:00
  • cdc4c172c4 Implement the error trait for DTypeParseError. (#1852) Laurent Mazare 2024-03-15 08:37:27 +01:00
  • e1f9c3776d StableLM-2 models were updated to use GPT-2 tokenization. (#1847) Jani Monoses 2024-03-14 22:01:36 +02:00
  • 3318fe30fb Update gemma README (#1843) Tyler Rockwood 2024-03-13 15:41:36 -05:00
  • 2bb9c683b9 Update README.md (#1840) Thomas Santerre 2024-03-13 09:36:25 -04:00
  • ff03fd3fb3 Expose some helper functions to create quantized models. (#1837) Laurent Mazare 2024-03-12 11:30:24 +01:00
  • df5f69444e Properly handle the batch dimension in cuda quantized matmul. (#1832) Laurent Mazare 2024-03-10 20:23:43 +01:00
  • 0c5eecbc0f Add some tracing to metavoice. (#1826) Laurent Mazare 2024-03-09 12:24:11 +01:00
  • 56c9d3ee7b Fix the model path for rwkv. (#1825) Laurent Mazare 2024-03-09 11:21:48 +01:00
  • dd00482ea3 Quantized version of the metavoice model. (#1824) Laurent Mazare 2024-03-09 11:06:04 +01:00
  • 936f6a4840 Fix dequantization. (#1823) Laurent Mazare 2024-03-08 23:12:13 +01:00
  • 3440cec3a0 Fast CPU kernel for transposed 1d convolutions. (#1822) Laurent Mazare 2024-03-08 22:43:07 +01:00
  • e7fc1daa21 Bump the crate versions to 0.4.2. (#1821) Laurent Mazare 2024-03-08 22:01:51 +01:00
  • be5b68cd0b Metal random-generation bug fixes (#1811) Niklas Hallqvist 2024-03-08 16:11:50 +01:00
  • ea984d0421 Expose more printer options. (#1817) Laurent Mazare 2024-03-08 15:04:18 +01:00
  • 9634583781 Expose a couple layout methods. (#1816) Laurent Mazare 2024-03-08 10:52:22 +01:00
  • 758366160e add clone to candle dropout (#1814) Kirpal Grewal 2024-03-08 07:18:01 +00:00
  • 0a3487a776 Add a --seed argument to the stable-diffusion example. (#1812) Niklas Hallqvist 2024-03-08 08:17:36 +01:00
  • 0c09d10f32 Improve metal buffer usage (#1807) ivarflakstad 2024-03-07 09:42:34 +01:00
  • 9dc53ec8ad Last push. [spkemb] laurent 2024-03-05 23:18:30 +01:00
  • 577316bc4e And another fix. laurent 2024-03-05 22:41:16 +01:00
  • b5ee026cea A few more tweaks. laurent 2024-03-05 22:39:06 +01:00
  • 52ed77c16f Add an argument for the speaker encoder weights. laurent 2024-03-05 22:33:41 +01:00
  • dae32d13d6 Speaker embeddings for metavoice. laurent 2024-03-05 22:19:30 +01:00
  • 8a99cf7dd2 Add a flag to select the dtype used in metavoice. (#1805) Laurent Mazare 2024-03-05 12:16:00 +01:00
  • bd9ab9bc04 Add a cuda kernel for dequantizing q8_0. (#1804) Laurent Mazare 2024-03-05 09:50:37 +01:00
  • 8cc0a183ba Speaker embeddings computation for metavoice. (#1800) Laurent Mazare 2024-03-04 14:13:01 +01:00
  • 6530932285 Add the new models to the main readme. (#1797) Laurent Mazare 2024-03-03 16:25:14 +01:00
  • 924ccae30c Add an initial Segformer implementation (#1617) Jiayu Liu 2024-03-03 23:01:46 +08:00
  • 60dc72b96b More metavoice tweaks. (#1796) Laurent Mazare 2024-03-03 15:05:25 +01:00
  • 20abb72fec Normalize loudness of the generated audio (#1795) Laurent Mazare 2024-03-03 14:00:42 +01:00
  • ca5d727ba2 Use the same padding in metavoice as in the python version. (#1794) Laurent Mazare 2024-03-03 12:04:48 +01:00
  • 09e0148cce Tweaks to run metavoice on metal (#1792) Laurent Mazare 2024-03-03 07:46:44 +01:00
  • de11623752 Metavoice position fix (#1791) Laurent Mazare 2024-03-02 21:00:35 +01:00
  • 21f1d04976 Add the instruction finetuned gemma variants. (#1790) Laurent Mazare 2024-03-02 18:56:59 +01:00
  • 4fff5b51f5 Metavoice - first cut (#1717) Laurent Mazare 2024-03-02 18:50:01 +01:00
  • 314630638d Rustfmt fix. (#1788) Laurent Mazare 2024-03-02 10:35:07 +01:00
  • 3e3def4134 Update StableLM config (#1787) Frkri 2024-03-02 09:56:57 +01:00
  • 6980774a91 fix rwkv example eos token (#1785) Jack Shih 2024-03-01 17:22:28 +08:00
  • 64d4038e4f Mention rwkv v6 in the readmes. (#1784) Laurent Mazare 2024-03-01 08:58:30 +01:00
  • 979deaca07 EfficientVit (MSRA) model (#1783) Jani Monoses 2024-03-01 09:53:52 +02:00
  • b485e4b6ee add models of rwkv v6 and quantized rwkv v6 (#1781) Jack Shih 2024-03-01 15:37:56 +08:00
  • 2c95b7394a Handle Q5_0 and Q5_1 quants in cuda. laurent 2024-02-29 10:54:01 +01:00
  • 4fd00b8900 Add the StarCoder2 model. (#1779) Laurent Mazare 2024-02-28 21:02:41 +01:00
  • 57267cd536 Add a flag to force running the quantized model on CPUs. (#1778) Laurent Mazare 2024-02-28 14:58:42 +01:00
  • 60ee5cfd4d Support more modes in the encodec example. (#1777) Laurent Mazare 2024-02-28 09:22:33 +01:00
  • 56e44aabe3 Make some dependencies optional in the examples. (#1776) Laurent Mazare 2024-02-28 07:17:03 +01:00
  • d0aca6c3c6 Encodec encoding demo. (#1775) Laurent Mazare 2024-02-28 06:49:03 +01:00
  • 15e8644149 Apply dilations in the encodec model. (#1772) Laurent Mazare 2024-02-27 23:26:35 +01:00
  • 0c49e95dfb Encodec model. (#1771) Laurent Mazare 2024-02-27 22:59:40 +01:00
  • 205767f9de Avoid tensor copying in the quantized example. (#1770) Laurent Mazare 2024-02-27 20:32:30 +01:00
  • 5e526abc8c Bump the version number to 0.4.1. (#1768) Laurent Mazare 2024-02-27 14:19:59 +01:00
  • 6400e1b0a0 Fix the block size for some cuda kernels. (#1767) Laurent Mazare 2024-02-27 14:08:33 +01:00
  • 32544a2ad6 Add an option to split the prompt. (#1766) Laurent Mazare 2024-02-27 11:24:11 +01:00
  • badf886583 Cuda kernel for dequantizing q8k. (#1760) Laurent Mazare 2024-02-26 08:42:44 +01:00