Commit Graph

1349 Commits

Author SHA1 Message Date
37dbbff261 Use full tensors for zeros and ones (#1071)
* Only optimize float tensors.

* Use full tensors for zeros and ones.
2023-10-11 08:16:04 +01:00
9fea56d28e Only optimize float tensors. (#1069) 2023-10-10 09:05:41 +01:00
bc3351bce4 Tracing for StableLM and quantized StableLM. (#1068) 2023-10-10 08:09:25 +02:00
b34d7f0248 Remove some unusued bits. (#1067) 2023-10-09 19:49:57 +01:00
4d04ac83c7 Override the repo for SDXL f16 vae weights. (#1064)
* Override the repo for SDXL f16 vae weights.

* Slightly simpler change.
2023-10-09 06:52:28 +01:00
392fe02fba Move the common quantized-nn code to a shared module. (#1063) 2023-10-09 06:22:22 +01:00
59ab6d7832 Quantized version of StableLM. (#1058)
* Quantized version of StableLM.

* Adapt the stable-lm example to support quantizsed.

* Use some separate hub repo.

* Another repo name tweak.
2023-10-08 15:42:38 +01:00
783735cf22 Use softmax-last-dim where possible. (#1057) 2023-10-08 13:16:42 +01:00
9abeddd750 Make the cuda rng seedable. (#1056) 2023-10-08 09:32:36 +01:00
2e5fb0b251 Do not use the kv-cache on external key-value states. (#1054) 2023-10-07 22:37:19 +01:00
823fe23f9b Add flash-attn support for stable-lm. (#1052) 2023-10-07 21:12:54 +01:00
d833527fda Use candle_nn::LSTM in encodec. (#1051)
* Use candle_nn::LSTM in encodec.

* More Encodec implementation.

* Decoder implementation.
2023-10-07 19:43:06 +01:00
a4967600d0 More general seq forward functions for RNNs. (#1050) 2023-10-07 15:08:01 +01:00
aa53368aeb Better control on the optional dequantization in QMatMul (#1049)
* Cosmetic change to the quantized whisper model.

* Fix the dequantization.

* Add the dequantize all variable.
2023-10-07 10:16:18 +01:00
955e00b2e8 Add to the readmes for stable-lm. (#1047) 2023-10-06 21:26:04 +01:00
d5f7267087 Add the stable-lm example. (#1046)
* Add the stable-lm example.

* Get stable-lm to generate some proper text.
2023-10-06 19:20:35 +01:00
904bbdae65 Make the Python Wrapper more Hackable and simplify Quantization (#1010)
* Some first `Module` implementations

* Add `state_dict` and `load_state_dict` functionality

* Move modules around and create `candle.nn.Linear`

* Add `nn.Embedding` and `nn.LayerNorm`

* Add BERT implementation

* Batch q-matmul

* Automatically dequantize `QTensors` if a `Tensor` is expected

* Add Module `.to()`, `.cuda()`, `cpu()` and `.type()` functionality

* Unittests for `Module`, `Tensor` and `candle.utils`

* Add `pytorch` like slicing to `Tensor`

* Cleanup and BERT fixes

* `black` formatting + unit-test for `nn.Linear`

* Refactor slicing implementation
2023-10-06 19:01:07 +01:00
b0442eff8a Sketch the stable-lm model. (#1045) 2023-10-06 18:19:06 +01:00
4631c48273 Remove some todos. (#1042) 2023-10-05 22:42:20 +01:00
716883e9b0 Add the clamping for stable-diffusion. (#1041) 2023-10-05 22:20:39 +01:00
47c25a567b feat: [SAM] able to download the result as png (#1035)
* feat: able to download the result as png

* feat: update function and wording
2023-10-05 22:14:47 +01:00
7f7d95e2c3 Add the round-to function. (#1039) 2023-10-05 20:28:09 +01:00
f47bd9bab5 Delete invalid comment (#1038) 2023-10-05 19:28:08 +01:00
8f7973958c fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 (#1037)
* fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0

* cargo fmt
2023-10-05 18:46:13 +01:00
f0c619a4af Use AsRef<str> for set_one. (#1033) 2023-10-05 06:05:44 +01:00
b86ac0c507 Quant t5: Add coedit model to wasm demo and readme (#1031) 2023-10-04 20:57:33 +01:00
27e70a5093 Whisper quantized wasm (#1028)
* [Whisper] Update to use quantized model

* [whisper] add language detection

* [whisper] change assets location

* [whisper] adapt js example with quantized models

* [whisper] better task parsing

* [whisper] minor fixes
2023-10-04 20:22:57 +01:00
c18a856e76 Add the rounding operators. (#1030)
* Add the rounding operators.

* Avoid tracking gradients for the rounding operations.

* Add some rounding tests.
2023-10-04 17:58:44 +01:00
3349c89252 Add quantized t5 args for weight and config (#1029) 2023-10-04 17:02:49 +01:00
11d3687cc6 Simd128 optimized q8k vecdot. (#1026) 2023-10-03 15:29:48 +01:00
dac73edb34 AVX optimized q8k vecdot. (#1024) 2023-10-03 12:10:58 +01:00
b4da19d1be Merge pull request #1023 from evgenyigumnov/simlified-book-polish
small misspeling and polish fix
2023-10-03 12:29:41 +02:00
ff513314fc small misspeling and polish fix 2023-10-03 15:47:04 +06:00
043cc25766 Fix for the index-select cuda setup. (#1022)
* Fix for index-select.

* Better fix + add some testing.
2023-10-03 10:21:46 +01:00
7b06872f90 Merge pull request #926 from evgenyigumnov/book-trainin-simplified
Book train simlified example
2023-10-03 10:41:30 +02:00
65825e7240 [SAM] Add undo button and background point mode (#1020)
* [SAM] Add undo button and background point mode

* [SAM] remove pts on near clicks

* [SAM] check shiftKey toggle point mode

* [SAM] clear points when clearing image
2023-10-02 23:33:46 +01:00
7670fe7d1f neon optimized q8k multiplication. (#1021)
* neon optimized q8k multiplication.

* Bugfixes.

* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c Add the q8k vec-dot multiplication. (#1019) 2023-10-02 21:53:34 +01:00
089fc3b584 Improve the quantized whisper setup. (#1018)
* Improve the quantized whisper setup.

* Fix the config file paths.

* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
e04c789230 Add a quantized variant of whisper (#1017)
* Add the quantized-whisper model.

* Quantized the whisper model.

* Adapt the whisper example to handle quantization.

* Add the quantized flag.

* Load the proper weights.
2023-10-02 14:59:53 +01:00
263a172202 Improve the testing of the optimized quantized vec-dot ops (#1016)
* Expose the unopt functions for testing.

* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
638ccf9f46 Fix include code. 2023-10-02 10:22:44 +02:00
0baf5a1e19 Fixed PR warnings. 2023-10-02 10:15:10 +02:00
5130a7da32 Simd128 version of q6k vec-dot. (#1015)
* Add a specific function for the simd128 q6k vec-dot.

* Simdification.

* More simdification.
2023-10-01 19:44:12 +01:00
41143db1af [segment-anything] add multi point logic for demo site (#1002)
* [segment-anything] add multi point logic for demo site

* [segment-anything] remove libs and update functions
2023-10-01 18:25:22 +01:00
096dee7073 Bump the version to 0.3.0. (#1014)
* Bump the version to 0.3.0.

* Changelog update.
2023-10-01 13:51:57 +01:00
f6054e9d60 Fix the prompt for mistral when using instruct/interactive mode. (#1013) 2023-10-01 06:44:30 +01:00
328167ec04 Integrate TheBloke quantized mistral weights. (#1012) 2023-09-30 22:39:42 +01:00
4e55aaa51f Simd128 version of the q2k-q8k vecdot product. (#1011)
* Sketch the simd128 version of q2k vecdot.

* Use a single accumulator.

* Simdify the q2k-q8k vecdot product.

* Cosmetic change.
2023-09-30 20:12:41 +01:00
deee7612da Quantized version of mistral. (#1009)
* Quantized version of mistral.

* Integrate the quantized mistral variant.

* Use the quantized weight files.

* Tweak the quantization command.

* Fix the dtype when computing the rotary embeddings.

* Update the readme with the quantized version.

* Fix the decoding of the remaining tokens.
2023-09-30 18:25:47 +01:00