392fe02fba
Move the common quantized-nn code to a shared module. ( #1063 )
2023-10-09 06:22:22 +01:00
59ab6d7832
Quantized version of StableLM. ( #1058 )
...
* Quantized version of StableLM.
* Adapt the stable-lm example to support quantized.
* Use some separate hub repo.
* Another repo name tweak.
2023-10-08 15:42:38 +01:00
783735cf22
Use softmax-last-dim where possible. ( #1057 )
2023-10-08 13:16:42 +01:00
2e5fb0b251
Do not use the kv-cache on external key-value states. ( #1054 )
2023-10-07 22:37:19 +01:00
823fe23f9b
Add flash-attn support for stable-lm. ( #1052 )
2023-10-07 21:12:54 +01:00
aa53368aeb
Better control on the optional dequantization in QMatMul ( #1049 )
...
* Cosmetic change to the quantized whisper model.
* Fix the dequantization.
* Add the dequantize all variable.
2023-10-07 10:16:18 +01:00
d5f7267087
Add the stable-lm example. ( #1046 )
...
* Add the stable-lm example.
* Get stable-lm to generate some proper text.
2023-10-06 19:20:35 +01:00
b0442eff8a
Sketch the stable-lm model. ( #1045 )
2023-10-06 18:19:06 +01:00
4631c48273
Remove some todos. ( #1042 )
2023-10-05 22:42:20 +01:00
f47bd9bab5
Delete invalid comment ( #1038 )
2023-10-05 19:28:08 +01:00
089fc3b584
Improve the quantized whisper setup. ( #1018 )
...
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
e04c789230
Add a quantized variant of whisper ( #1017 )
...
* Add the quantized-whisper model.
* Quantize the whisper model.
* Adapt the whisper example to handle quantization.
* Add the quantized flag.
* Load the proper weights.
2023-10-02 14:59:53 +01:00
096dee7073
Bump the version to 0.3.0. ( #1014 )
...
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
deee7612da
Quantized version of mistral. ( #1009 )
...
* Quantized version of mistral.
* Integrate the quantized mistral variant.
* Use the quantized weight files.
* Tweak the quantization command.
* Fix the dtype when computing the rotary embeddings.
* Update the readme with the quantized version.
* Fix the decoding of the remaining tokens.
2023-09-30 18:25:47 +01:00
4021272875
Use flash-attn for mistral. ( #1004 )
2023-09-30 12:15:10 +01:00
6203ced495
Add negative prompts to segment-anything. ( #1000 )
2023-09-30 06:17:42 +01:00
d188d6a764
Fix the multiple points case for sam. ( #998 )
2023-09-29 22:39:43 +02:00
53510ce427
Use a silu activation in mistral. ( #991 )
2023-09-29 07:06:54 +01:00
23b3576c47
Add the sliding window. ( #986 )
2023-09-28 17:26:33 +01:00
716ab2ccdc
Mistral gpu fix ( #985 )
...
* Add the mistral example.
* Use the two model files.
* Adjust the dtype.
* Tweak the weight paths.
* Remove the end of text token.
* Get the mistral model to generate some text.
* Fix when running on the gpu.
* More gpu fixes.
2023-09-28 16:38:13 +01:00
ada8851a23
Add the mistral example. ( #984 )
...
* Add the mistral example.
* Use the two model files.
* Adjust the dtype.
* Tweak the weight paths.
* Remove the end of text token.
* Get the mistral model to generate some text.
2023-09-28 16:19:18 +01:00
c05a348e36
Add the Mistral 7b model ( #983 )
...
* Start sketching the mistral 7b model.
* Add the kv cache.
* Add the decoder layer.
* Add the mistral model.
* Rotary embeddings.
* Add the attention mask.
2023-09-28 14:29:41 +01:00
ce0a4e3a85
Use the gelu-erf activation. ( #969 )
2023-09-26 22:30:21 +01:00
1fcac4afed
Expose a function to clear the KV cache on mixformers. ( #964 )
2023-09-26 05:41:07 +01:00
a36d883254
Use a single flag for the point argument. ( #958 )
2023-09-25 12:53:24 +01:00
7f2bbcf746
[segment-anything] Support multi-point as the prompt input ( #945 )
...
* [sam] Support multi-point prompts
* [segment-anything] Pass points by reference
* [segment-anything] Update example code and image
* Fix clippy lint.
---------
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-09-25 12:14:10 +01:00
0007ae9c11
Add the quantized mixformer model. ( #953 )
...
* Add the quantized mixformer model.
* Add the quantized option in the phi example.
2023-09-24 15:03:48 +01:00
e15862cfdb
Shared the quantized var-builder code. ( #952 )
...
* Shared the quantized var-builder code.
* Fix compilation.
2023-09-24 12:55:07 +01:00
bb3471ea31
Adapt more examples to the updated safetensor api. ( #947 )
...
* Simplify the safetensor usage.
* Convert more examples.
* Move more examples.
* Adapt stable-diffusion.
2023-09-23 21:26:03 +01:00
7582937a32
Add the causal mask in mixformer. ( #937 )
2023-09-23 09:50:26 +01:00
b54acfa3d0
Tracing for the phi model ( #936 )
...
* Add some tracing bits to mixformers.
* Add the missing file.
* Add the conv2d layer to with-tracing.
* Improve the tracing usage.
2023-09-23 09:19:34 +01:00
df6f5240ba
Complete the mixformer implementation. ( #930 )
...
* Complete the mixformers implementation.
* Tweak the attention.
* Add the phi-1.5 example.
* Improve the phi example.
* Bugfix.
* Get the phi example to work.
2023-09-22 20:03:16 +01:00
a46b1b4657
Mixformer ( #929 )
...
* Sketch the mixformer model.
* More modeling code.
* More mixformers.
* MixFormer creation.
* More mixformers.
2023-09-22 16:17:14 +01:00
19e52e5007
T5 Wasm ( #918 )
...
* init t5 wasm model
* split workers for each model
* clean up
* add some ui
* readme
* index
* typo
* remove cache param, clear_kv_cache
* add max_length as param
* add model tasks option to ui
* add method to load quantized gguf from buffer
* Add quantized wasm module
* add quantized models to UI, dynamic import wasms
* link to quantized
* fix copy
* fix ModelEncoder
* fix README.md
2023-09-22 15:31:10 +01:00
3b557765e8
T5 quantized example ( #922 )
...
* Load gguf files for the quantized t5.
* Add the quantized t5 example.
* Allow for loading local files.
* Add some support for quantizing safetensor files.
* Transpose before quantizing.
* Quantized t5.
* Retrieve the weights from the hub.
2023-09-21 12:33:15 +01:00
2619c4307f
Add a quantized version of the t5 model. ( #921 )
2023-09-21 11:13:39 +01:00
c89b82b2d4
Add a clear cache function to the t5 model. ( #919 )
2023-09-21 09:01:06 +01:00
ab1d40ea97
Add more t5 tracing. ( #915 )
2023-09-20 20:20:54 +01:00
3a0d3e05df
Add more t5 tracing. ( #914 )
...
* Add more t5 tracing.
* Revert the sm change.
2023-09-20 16:37:51 +01:00
9b24d89d2d
Tracing mode for T5. ( #913 )
...
* Tracing mode for T5.
* Tracing for the linear layer.
2023-09-20 15:03:35 +01:00
fb1c2ac535
Add flash-attn support. ( #912 )
...
* Add flash-attn support.
* Add the use-flash-attn flag.
* Re-enable flash-attn.
2023-09-20 14:07:55 +01:00
f685b2231c
Add some missing biases. ( #908 )
2023-09-20 10:14:51 +01:00
05626ef492
Flan T5: Read lm_head when word embeddings are not tied ( #903 )
...
* Read lm_head when word embeddings are not tied
* Fix formatting
* Address comments
2023-09-19 22:36:47 +01:00
67a486d18d
Line-up the wuerstchen model with the python implementation. ( #901 )
...
* Line-up the wuerstchen model with the python implementation.
* Missing cos.
* Fix the picture denormalization.
2023-09-19 21:59:44 +01:00
8696f64bae
Fix T5 kv cache ( #899 )
...
* Fix T5 kv cache
* Add argument for decoder prompt
* Fix range
2023-09-19 20:36:15 +01:00
4f91c8e109
Improve the error message on shape mismatch for cat. ( #897 )
...
* Improve the error message on shape mismatch for cat.
* Cosmetic tweak.
2023-09-19 15:09:47 +01:00
06e46d7c3b
Only use classifier free guidance for the prior. ( #896 )
...
* Only use classifier free guidance for the prior.
* Add another specific layer-norm structure.
* Tweaks.
* Fix the latent shape.
* Print the prior shape.
* More shape fixes.
* Remove some debugging continue.
2023-09-19 14:13:05 +01:00
92db8cecd3
Specialized attention module for Wuerstchen. ( #890 )
...
* Specialized attention module for Wuerstchen.
* Reshaping ops.
* Attention processor.
* Finish the forward pass.
* Hook the new attention processor.
* Get the prior forward pass to work.
* Make it contiguous.
2023-09-18 21:16:09 +01:00
82a98f6da0
Prior denoising. ( #889 )
2023-09-18 16:51:38 +01:00
5082954c52
Fix the W clip embeddings. ( #887 )
...
* Fix the W clip embeddings.
* Add the specialized ddpm scheduler.
2023-09-18 14:50:14 +01:00