b23436bf90
Stable diffusion fix. ( #1993 )
...
* Stable diffusion fix.
* And add a comment.
2024-04-02 14:36:28 +02:00
be9c200cbb
Expose the t5 config fields + allow t5-large. ( #1987 )
2024-04-01 20:58:34 +02:00
ea0d8d3753
Quantized moondream implementation and BOS token ( #1980 )
...
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Pass bos token at the beginning of tensor.
* Quantize moondream.
* Forward with image bos token.
* Clippy.
* Use q4_0 quantization.
* Add pointers for sequence and tokens; Remove seq_len conditional
2024-04-01 19:37:54 +02:00
308ea070ed
Modify access for conv and op to be pub, to allow external packages to have custom backends ( #1986 )
2024-04-01 17:44:49 +02:00
b20acd622c
Update for pyo3 0.21. ( #1985 )
...
* Update for pyo3 0.21.
* Also adapt the RL example.
* Fix for the pyo3-onnx bindings...
* Print details on failures.
* Revert pyi.
2024-04-01 17:07:02 +02:00
5522bbc57c
Add fn 'get_with_hints_dtype' in VarBuilder ( #1877 ) ( #1897 )
...
* Quantized models (AWQ/SqueezeLLM/...) contain tensors of multiple data types; use 'get_with_hints_dtype' to load tensors with a given dtype.
2024-04-01 12:10:08 +02:00
888c09a3db
add identity op ( #1976 )
2024-04-01 12:08:25 +02:00
318cb82f16
Quantized cuda tweaks. ( #1981 )
...
* Quantized cuda tweaks.
* Add some safety checks.
* Factorize the dequantization bits.
2024-04-01 11:06:42 +02:00
c7557b65dc
Switch the default to using the faster kernels. ( #1978 )
...
* Switch the default to using the faster kernels.
* Add the force-dmmv flag.
2024-04-01 10:00:11 +02:00
cd29c7ccd4
More ggml cuda kernels ( #1977 )
...
* Add more cuda kernels for quantized matmul.
* Add the vec-dot bits.
* Expose the quantized matmul-vec kernels.
* Also include the quantize-q8-1 kernel.
* Glue code for the q8-1 quantization.
* mm-vec product via q8-1 quantization.
* Add a test.
* Add a mm test.
* Get the test to return some sensible results.
* Also test dmmv.
* Fix the launch params.
* Allow for tweaking the force_dmmv parameter while it's experimental.
2024-04-01 00:15:48 +02:00
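As a rough illustration of the blockwise 8-bit quantization these kernels build on (a hedged sketch only: the block size, rounding mode, and omission of the stored per-block sum mean this is not ggml's exact `q8_1` layout), each block keeps one f32 scale plus 32 signed 8-bit quants:

```rust
// Hedged sketch of q8-style blockwise quantization, in the spirit of the
// ggml q8 formats used by the kernels above (not their exact layout):
// each block of 32 values stores one f32 scale plus 32 i8 quants.
fn quantize_q8_block(xs: &[f32; 32]) -> (f32, [i8; 32]) {
    // Scale so that the largest magnitude maps to +/-127.
    let amax = xs.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut qs = [0i8; 32];
    for (q, &v) in qs.iter_mut().zip(xs.iter()) {
        // Round to the nearest representable step; dequantize as q * scale.
        *q = (v * inv).round() as i8;
    }
    (scale, qs)
}
```

A matmul over such blocks then reduces to integer dot products scaled by the per-block factors, which is what the dmmv/mm-vec CUDA kernels exploit.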
f9954b73ba
Add options to use local files + specify a custom repo or branch. ( #1973 )
2024-03-31 09:32:50 +02:00
eead1dcead
Clippy fix. ( #1972 )
2024-03-31 08:57:40 +02:00
92f81d2fcb
Add Moondream transformer implementation and example ( #1970 )
...
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
2024-03-31 08:54:56 +02:00
3144150b8d
Move the tensor-tools binary in a separate crate. ( #1969 )
2024-03-30 15:49:37 +01:00
b190fd8592
Remove some unnecessary calls to contiguous. ( #1968 )
...
* Remove some unnecessary calls to contiguous.
* Slightly improved kv cache concatenation.
2024-03-30 13:22:00 +01:00
efe4a0c84b
Add a print command to tensor-tools. ( #1967 )
...
* Add a print command to tensor-tools.
* Add some flags to tweak the formatting.
2024-03-30 11:34:33 +01:00
665da30487
Backend refactoring. ( #1966 )
...
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.
2024-03-29 23:02:11 +01:00
356a170ae9
Update parquet requirement from 50.0.0 to 51.0.0 ( #1867 )
...
Updates the requirements on [parquet](https://github.com/apache/arrow-rs) to permit the latest version.
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/50.0.0...50.0.0)
---
updated-dependencies:
- dependency-name: parquet
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-29 21:58:15 +01:00
7ecbc6d50b
fix minor typo ( #1924 )
2024-03-29 18:09:57 +01:00
8ad12a0e81
Add some examples using the MT5 variants. ( #1963 )
2024-03-29 18:09:29 +01:00
eb1b27abcd
Readme fix. ( #1961 )
2024-03-28 23:24:46 +01:00
708e422456
Qwen MoE model. ( #1960 )
...
* Qwen MoE model.
* Add the MoE model to the example.
* Fix the scaling.
* Readme updates.
* Readme tweaks.
2024-03-28 23:10:57 +01:00
c5092f2c29
Add a couple t5 models. ( #1958 )
2024-03-28 17:58:06 +01:00
cdc8b57b5c
Fix clippy lints + minor cleanups. ( #1957 )
...
* Fix clippy lints + minor cleanups.
* fmt.
* Derive clone.
2024-03-28 14:17:46 +01:00
b0340d72ec
CLIP model implementation with example ( #1950 )
...
* CLIP model implementation with example
* CLIP Implementation fixes, batch images
* CLIP model remove images from git
* CLIP model remove unnecessary use of batch_indices
2024-03-28 13:44:12 +01:00
b3484e7a5e
Fix for the RWKV models. ( #1955 )
...
* Fix for the RWKV models.
* More general fix + revert the rwkv hack.
* Remove the old hack.
2024-03-28 10:17:38 +01:00
ada5d7c096
Add Send and Sync trait bounds for the scheduler config in stable diffusion models ( #1952 )
...
* first commit
* add Sync deriving
* static
* remove static
2024-03-28 10:03:00 +01:00
13ae5a34c7
Ensure that the kernels get rebuilt on cuh changes. ( #1954 )
2024-03-28 06:56:48 +01:00
ab86cd37c8
Support i64 in index-select on metal. ( #1951 )
...
* Support i64 in index-select on metal.
* Add some testing of index-select for all dtypes.
2024-03-27 16:30:07 +01:00
a9abde5f93
More flexible matmul contiguity checks. ( #1949 )
...
* More flexible matmul contiguity checks.
* Also relax the checks on the metal side.
2024-03-27 10:59:05 +01:00
75b6d4b0da
Add a config for the mamba 2.8b model ( #1946 )
...
* first commit
* Make the mamba config public.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-03-27 07:47:23 +01:00
66f0a4eeea
Another fix for squeezing. ( #1943 )
2024-03-26 17:05:26 +01:00
4523ecfb2a
Faster repeat penalty ( #1940 )
...
* Avoid the attention mask where possible.
* Faster repeat penalty.
2024-03-26 11:31:20 +01:00
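The repeat-penalty pass sped up above can be sketched on the CPU as follows (a minimal sketch of the usual CTRL-style semantics, assumed rather than taken from candle's code): positive logits of previously generated tokens are divided by the penalty and negative ones multiplied, making repeats less likely.

```rust
// Hedged sketch of a repetition-penalty pass (assumed semantics, not
// candle's exact implementation).
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[usize]) {
    // Deduplicate with a boolean mask so each logit is penalized once,
    // even when a token appears many times in the context.
    let mut seen = vec![false; logits.len()];
    for &tok in context {
        if tok < logits.len() && !seen[tok] {
            seen[tok] = true;
            let l = logits[tok];
            logits[tok] = if l >= 0.0 { l / penalty } else { l * penalty };
        }
    }
}
```

Deduplicating first is also where the speedup tends to come from: the work becomes proportional to the number of distinct context tokens rather than the full context length.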
f5dfe883d7
Extend supported dtypes for metal (im2col & upsample_2d) ( #1938 )
...
* update im2col dtype implementations
* update dtypes for upsample
2024-03-26 06:48:56 +01:00
196765e995
Use the new rope kernel in mistral. ( #1937 )
...
* Use the new rope kernel in mistral.
* Compute the cos and sin with full precision.
* Bugfix.
2024-03-25 23:26:05 +01:00
60676780a9
Fix detail in new RoPE implementation ( #1935 )
2024-03-25 18:20:09 +01:00
d3a8d291d5
Avoid the attention mask where possible. ( #1933 )
2024-03-25 15:31:04 +01:00
cd254074f3
Really unique identifier for metal device ids. ( #1932 )
...
* Really unique identifier for metal device ids.
* Same device.
2024-03-25 11:48:16 +01:00
e7f8e72588
Contiguous variant of the rope kernel. ( #1929 )
...
* Contiguous variant of the rope kernel.
* Add the cuda kernel.
* Metal kernel.
2024-03-25 09:11:20 +01:00
1b98f84a2b
Fast kernels for rotary embeddings. ( #1928 )
...
* Fast kernels for rotary embeddings.
* Add a test for the fast CPU kernel.
* Rope cuda bindings.
* Cuda kernel.
* Metal kernel (part 1).
* Cuda kernels.
* Finish the metal kernel.
* Use the new kernels in the quantized example.
* Fix warning.
2024-03-24 22:48:52 +01:00
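For reference, the math these rotary-embedding kernels fuse can be written out for a single head on the CPU (a hedged sketch assuming the non-interleaved pairing of element `i` with `i + d/2`; the actual kernels operate on whole batched tensors):

```rust
// Minimal CPU sketch of rotary position embedding (RoPE) on one head:
// each pair (x[i], x[i + d/2]) is rotated by an angle that grows with
// the position and shrinks with the pair index.
fn rope(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let d = x.len();
    let half = d / 2;
    let mut out = x.to_vec();
    for i in 0..half {
        // Per-pair frequency, as in the original RoPE formulation: theta^(-2i/d).
        let freq = theta.powf(-2.0 * i as f32 / d as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        let (a, b) = (x[i], x[i + half]);
        out[i] = a * cos - b * sin;
        out[i + half] = a * sin + b * cos;
    }
    out
}
```

Since each step is a plane rotation, position 0 is the identity and the vector norm is preserved for any position.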
cf7d7fcf2f
Also avoid the mask in the llama example.
2024-03-24 19:04:32 +01:00
8c0db87992
Avoid using the attn mask when not necessary.
2024-03-24 18:55:56 +01:00
e2b4829531
Support more mistral models. ( #1927 )
...
* Support more mistral models.
* Use the appropriate rope parameter.
2024-03-24 08:04:04 +01:00
5e70821dd0
Allow for arbitrary temperature modifications.
2024-03-23 15:47:39 +01:00
a62a97340c
Add topk sampling. ( #1923 )
2024-03-23 15:26:09 +01:00
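Top-k sampling as added here can be sketched as follows (a hedged sketch under assumed semantics: restrict the softmax to the k largest logits and renormalize; the uniform draw `u` is passed in for determinism here, whereas a real sampler would draw it from an RNG):

```rust
// Hedged sketch of top-k sampling: softmax over the k largest logits,
// then inverse-CDF sampling with a caller-supplied uniform u in [0, 1).
fn sample_topk(logits: &[f32], k: usize, u: f32) -> usize {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    // Sort token ids by logit, descending, and keep the k best.
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k.max(1));
    // Softmax over the surviving logits (max-subtracted for stability).
    let m = logits[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| (logits[i] - m).exp()).collect();
    let total: f32 = exps.iter().sum();
    // Walk the truncated CDF until it exceeds u.
    let mut acc = 0.0;
    for (j, e) in exps.iter().enumerate() {
        acc += e / total;
        if u < acc {
            return idx[j];
        }
    }
    *idx.last().unwrap()
}
```

With `u = 0.0` this always returns the argmax; larger `u` walks down the truncated distribution.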
fdfe8fd129
Preliminary support for inplace ops. ( #1921 )
...
* Preliminary support for inplace ops.
* Add a test.
2024-03-23 14:16:19 +01:00
790037390c
Add cast_bf16_x/cast_x_bf16 when CUDA_ARCH<800 but CUDA_VERSION >= 11000 ( #1919 )
...
- it makes it possible to load bf16 models on a T4 (sm75)
2024-03-23 13:44:10 +01:00
6f877592a7
Avoid broadcasting on the batch dimension for the attention mask. ( #1920 )
2024-03-23 13:08:53 +01:00
cc856db9ce
Backwards for ConvTranspose2D ( #1910 )
...
* add documentation for backprop
* add backwards for ConvTranspose2D
* add Python test code
2024-03-23 07:05:55 +01:00
fc1fe5e45b
Support scatter/index_add with i64 indices for f16 ( #1915 )
2024-03-22 11:51:41 +01:00