Commit Graph

  • 4d14777673 Utilize batches in Stable Diffusion (#2071) NorilskMajor 2024-04-16 13:49:04 +09:00
  • f135b7963d Fix for the batch dim in the quantized matmul example. (#2073) Laurent Mazare 2024-04-15 20:00:28 +02:00
  • af955f260c Make the falcon model cloneable. (#2067) Laurent Mazare 2024-04-15 09:39:03 +02:00
  • 8ad822a983 Add a function to clear the KV cache in falcon. (#2066) Laurent Mazare 2024-04-15 09:29:25 +02:00
  • e198bb0816 Handle zero dims in some simple operations. (#2064) Laurent Mazare 2024-04-15 09:18:54 +02:00
  • f7d5bf5b97 Faster kernels for quantized matmul on cuda (#2060) Laurent Mazare 2024-04-15 08:32:47 +02:00
  • c119600d6e Move image tensor to device in trocr example (#2063) Harry Stern 2024-04-15 00:50:32 -04:00
  • c449f65b12 Expose the synchronize function on the generic device. (#2062) Laurent Mazare 2024-04-14 23:02:03 +02:00
  • db7dbf3071 Add missing bfloat unary strided kernels and fix typo (#2058) ivarflakstad 2024-04-14 20:01:13 +02:00
  • 4ecedb1598 Add the full quantized matmul kernels for cuda. (#2057) Laurent Mazare 2024-04-14 17:52:08 +02:00
  • 53e5380bf6 Add a synchronize method to devices. (#2055) Laurent Mazare 2024-04-14 16:32:55 +02:00
  • 50e49ecc5f Add a quantized version of recurrent-gemma. (#2054) Laurent Mazare 2024-04-13 20:07:01 +02:00
  • 4c88c3ce06 Add benchmarks for qmatmul operations (#2048) Thomas Santerre 2024-04-13 06:30:14 -04:00
  • 8b8fb630df Add a convenient way to rename tensors accessed through a varbuilder. (#2052) Laurent Mazare 2024-04-13 12:09:41 +02:00
  • fb805b8ca2 Avoid crashes when running T5 models with F16 tensors on CPU (#2047) Victor-Mihaila 2024-04-13 11:07:28 +02:00
  • 79e3bec789 Change for the encoder-only ProstT5 model (#2045) Victor-Mihaila 2024-04-13 11:06:24 +02:00
  • e6d412b156 Add ReduceMean onnx operation (#2049) Gabriel 2024-04-13 11:00:25 +02:00
  • 26cbbf8d84 Mandatory topk sampling for recurrent-gemma. (#2051) Laurent Mazare 2024-04-13 10:31:39 +02:00
  • 2bf413caa3 Add the recurrent-gemma model. (#2039) Laurent Mazare 2024-04-13 00:05:21 +02:00
  • 3ad4770eb6 Use cat for faster MQA computation. (#2043) Laurent Mazare 2024-04-12 09:15:10 +02:00
  • 6e92129f54 Add missing bfloat unary strided kernels (ref: metal-mfa-bfloat) Ivar Flakstad 2024-04-11 16:20:45 +02:00
  • be2dcbd55d Merge branch 'main' into metal-mfa-bfloat Ivar Flakstad 2024-04-11 14:39:46 +02:00
  • a0460cd2b1 Add the code-gemma models. (#2038) Laurent Mazare 2024-04-10 21:19:21 +02:00
  • b81ecf712d Support alternative dtypes for mamba (#2036) Laurent Mazare 2024-04-10 18:10:01 +02:00
  • a4d5a414e3 Support gather on bf16 for metal. (#2035) Laurent Mazare 2024-04-10 12:49:25 +02:00
  • 798e0335cd Handle more tensor shapes in onnx "Gather" operation (#2026) Gabriel 2024-04-08 14:06:14 +02:00
  • 718671a0d5 Use BufferOffset in metal backend ops. (#2029) Laurent Mazare 2024-04-08 09:37:25 +02:00
  • c5fe4a7f89 Rework the buffer offset logic for metal kernels (#2028) Laurent Mazare 2024-04-07 22:37:53 +02:00
  • 7f354473cf Optimize copy-2d for metal. (#2024) Laurent Mazare 2024-04-07 12:34:16 +02:00
  • 33c9b66554 Add the new gemma models. (#2023) (ref: copy2d-metal) Laurent Mazare 2024-04-06 21:25:38 +02:00
  • 9fd52b3b71 Handle the batch dimension in quantized MMV on metal. (#2022) Laurent Mazare 2024-04-06 20:02:24 +02:00
  • e662431acf Fix the final rmsnorm for quantized-metavoice. (#2021) Laurent Mazare 2024-04-06 19:35:01 +02:00
  • 09fafcfa99 Copy multi metal [do not merge] (ref: copy-multi-metal) Laurent 2024-04-06 10:11:16 +02:00
  • ab892274d1 first commit (#2018) Jorge António 2024-04-05 14:20:28 +01:00
  • b869a659ec Faster mask implementation for mixformers. (#2017) Laurent Mazare 2024-04-05 09:38:26 +02:00
  • 88f7793598 Moondream tracing. (#2016) Laurent Mazare 2024-04-05 09:11:08 +02:00
  • 2ac302a5d1 Add the rope THD kernel. (#2014) Laurent Mazare 2024-04-05 08:32:58 +02:00
  • ace282e5c2 Add flag to run Moondream in f16 precision (#2015) Santiago Medina 2024-04-04 22:03:33 -07:00
  • c87381fc96 Use F16 for moondream on cuda. (#2013) Laurent Mazare 2024-04-04 23:30:10 +02:00
  • c5626b8271 Add support for "sign" on tensors (#2012) Thomas Santerre 2024-04-04 16:32:47 -04:00
  • e6a5b82ba6 Fix the matmul layout for accelerate & mkl. (#2011) Laurent Mazare 2024-04-04 19:18:03 +02:00
  • 5aebe53dd2 update dtypes checks for several metal operations (#2010) Thomas Santerre 2024-04-04 12:39:06 -04:00
  • f76bb7794a Bumping the version number to 0.5.0. (#2009) Laurent Mazare 2024-04-04 17:48:45 +02:00
  • 30b145150f Optimize the gelu f16 opt. (#2008) Laurent Mazare 2024-04-04 16:28:23 +02:00
  • f48c07e242 Include topk sampling in the quantized example. (#2005) Laurent Mazare 2024-04-04 09:27:54 +02:00
  • 8967c46563 Split the cuda error file. (#2003) Laurent Mazare 2024-04-04 08:27:23 +02:00
  • 1e46cf8b19 Minor cleanups in reduce.metal. (#2004) Laurent Mazare 2024-04-04 08:26:02 +02:00
  • bd8db2a771 refactor to reduce the amount of code wrapped in template syntax (#2002) Thomas Santerre 2024-04-04 02:13:12 -04:00
  • 318d143224 Relax the contiguous check for cuda kernels. (#2000) Laurent Mazare 2024-04-03 09:02:38 +02:00
  • 2be1a35710 Added link to the Coursera ML algorithm implementations (#1989) Vishal Patil 2024-04-03 01:16:32 -04:00
  • 26226068a4 Moondream WASM (#1999) Radamés Ajna 2024-04-02 22:11:50 -07:00
  • cd6b9e317c Add benchmarks for the candle-nn package (#1995) Thomas Santerre 2024-04-03 01:03:54 -04:00
  • 08c049def3 Improve the handling of matmul with squeezed layouts. (#1998) Laurent Mazare 2024-04-02 23:17:05 +02:00
  • d17b2cdad9 Match Moondream's latest release (#1997) Santiago Medina 2024-04-02 12:37:09 -07:00
  • fb918a23c8 first commit (#1994) Jorge António 2024-04-02 15:31:05 +01:00
  • b23436bf90 Stable diffusion fix. (#1993) Laurent Mazare 2024-04-02 14:36:28 +02:00
  • be9c200cbb Expose the t5 config fields + allow t5-large. (#1987) Laurent Mazare 2024-04-01 20:58:34 +02:00
  • ea0d8d3753 Quantized moondream implementation and BOS token (#1980) Santiago Medina 2024-04-01 10:37:54 -07:00
  • 308ea070ed modify access for conv and op to be pub to allow external packages to have custom backends (#1986) Thomas Santerre 2024-04-01 11:44:49 -04:00
  • b20acd622c Update for pyo3 0.21. (#1985) Laurent Mazare 2024-04-01 17:07:02 +02:00
  • 5522bbc57c Add fn 'get_with_hints_dtype' in VarBuilder (#1877) (#1897) yinqiwen 2024-04-01 18:10:08 +08:00
  • 888c09a3db add identity op (#1976) Mauro Sciancalepore 2024-04-01 12:08:25 +02:00
  • 318cb82f16 Quantized cuda tweaks. (#1981) Laurent Mazare 2024-04-01 11:06:42 +02:00
  • c7557b65dc Switch the default to using the faster kernels. (#1978) Laurent Mazare 2024-04-01 10:00:11 +02:00
  • cd29c7ccd4 More ggml cuda kernels (#1977) Laurent Mazare 2024-04-01 00:15:48 +02:00
  • f9954b73ba Add options to use local files + specify a custom repo or branch. (#1973) Laurent Mazare 2024-03-31 09:32:50 +02:00
  • eead1dcead Clippy fix. (#1972) Laurent Mazare 2024-03-31 08:57:40 +02:00
  • 92f81d2fcb Add Moondream transformer implementation and example (#1970) Santiago Medina 2024-03-30 23:54:56 -07:00
  • 3144150b8d Move the tensor-tools binary in a separate crate. (#1969) Laurent Mazare 2024-03-30 15:49:37 +01:00
  • b190fd8592 Remove some unnecessary calls to contiguous. (#1968) Laurent Mazare 2024-03-30 13:22:00 +01:00
  • efe4a0c84b Add a print command to tensor-tools. (#1967) Laurent Mazare 2024-03-30 11:34:33 +01:00
  • 665da30487 Backend refactoring. (#1966) Laurent Mazare 2024-03-29 23:02:11 +01:00
  • 356a170ae9 Update parquet requirement from 50.0.0 to 51.0.0 (#1867) dependabot[bot] 2024-03-29 21:58:15 +01:00
  • 7ecbc6d50b fix minor typo (#1924) Marco Inacio 2024-03-29 17:09:57 +00:00
  • 8ad12a0e81 Add some examples using the MT5 variants. (#1963) Laurent Mazare 2024-03-29 18:09:29 +01:00
  • eb1b27abcd Readme fix. (#1961) Laurent Mazare 2024-03-28 23:24:46 +01:00
  • 708e422456 Qwen MoE model. (#1960) Laurent Mazare 2024-03-28 23:10:57 +01:00
  • c5092f2c29 Add a couple t5 models. (#1958) Laurent Mazare 2024-03-28 17:58:06 +01:00
  • cdc8b57b5c Fix clippy lints + minor cleanups. (#1957) Laurent Mazare 2024-03-28 14:17:46 +01:00
  • b0340d72ec CLIP model implementation with example (#1950) Tigran Zhampeissov 2024-03-28 17:44:12 +05:00
  • b3484e7a5e Fix for the RWKV models. (#1955) Laurent Mazare 2024-03-28 10:17:38 +01:00
  • ada5d7c096 add send and sync trait bounds for scheduler config in stable diffusion models (#1952) Jorge António 2024-03-28 09:03:00 +00:00
  • 13ae5a34c7 Ensure that the kernels get rebuilt on cuh changes. (#1954) Laurent Mazare 2024-03-28 06:56:48 +01:00
  • ab86cd37c8 Support i64 in index-select on metal. (#1951) Laurent Mazare 2024-03-27 16:30:07 +01:00
  • a9abde5f93 More flexible matmul contiguity checks. (#1949) Laurent Mazare 2024-03-27 10:59:05 +01:00
  • 75b6d4b0da add config for mamba 2.8b model parameter (#1946) Jorge António 2024-03-27 06:47:23 +00:00
  • 66f0a4eeea Another fix for squeezing. (#1943) Laurent Mazare 2024-03-26 17:05:26 +01:00
  • 4523ecfb2a Faster repeat penalty (#1940) Laurent Mazare 2024-03-26 11:31:20 +01:00
  • f5dfe883d7 Extend supported dtypes for metal (im2col & upsample_2d) (#1938) Thomas Santerre 2024-03-26 01:48:56 -04:00
  • 196765e995 Use the new rope kernel in mistral. (#1937) Laurent Mazare 2024-03-25 23:26:05 +01:00
  • 60676780a9 Fix detail in new RoPE implementation (#1935) Hugo Abonizio 2024-03-25 14:20:09 -03:00
  • d3a8d291d5 Avoid the attention mask where possible. (#1933) Laurent Mazare 2024-03-25 15:31:04 +01:00
  • cd254074f3 Really unique identifier for metal device ids. (#1932) Laurent Mazare 2024-03-25 11:48:16 +01:00
  • e7f8e72588 Contiguous variant of the rope kernel. (#1929) Laurent Mazare 2024-03-25 09:11:20 +01:00
  • 1b98f84a2b Fast kernels for rotary embeddings. (#1928) Laurent Mazare 2024-03-24 22:48:52 +01:00
  • cf7d7fcf2f Also avoid the mask in the llama example. laurent 2024-03-24 19:04:32 +01:00
  • 8c0db87992 Avoid using the attn mask when not necessary. (ref: opt-attn-mask) laurent 2024-03-24 18:55:56 +01:00
  • e2b4829531 Support more mistral models. (#1927) Laurent Mazare 2024-03-24 08:04:04 +01:00
  • 5e70821dd0 Allow for arbitrary temperature modifications. laurent 2024-03-23 15:47:39 +01:00
  • a62a97340c Add topk sampling. (#1923) Laurent Mazare 2024-03-23 15:26:09 +01:00