75629981bc
feat: parse CUDA compute cap from env ( #1066 )
...
* feat: add support for multiple compute caps
* Revert to one compute cap
* fmt
* fix
2023-10-16 15:37:38 +01:00
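For #1066 above: the build can now take the compute capability from the environment instead of probing for it. A minimal sketch of the idea, assuming the variable is named CUDA_COMPUTE_CAP (e.g. "75" for sm_75); the real build-script logic is more involved:

```rust
use std::env;

/// Prefer an explicit CUDA_COMPUTE_CAP over auto-detection; returns None
/// when the variable is unset or does not parse as a number.
fn compute_cap_from_env() -> Option<usize> {
    env::var("CUDA_COMPUTE_CAP").ok()?.trim().parse().ok()
}
```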
0106b0b04c
Read all the tensors in a PyTorch pth file. ( #1106 )
2023-10-16 13:50:07 +01:00
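A quick sketch of what #1106 enables, assuming the pickle::read_all helper it adds to candle_core; the checkpoint path is a placeholder:

```rust
use candle_core::pickle;

fn main() -> candle_core::Result<()> {
    // Enumerate every (name, tensor) pair stored in a PyTorch checkpoint.
    for (name, tensor) in pickle::read_all("model.pth")? {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```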
588ad4835a
Fix the verbose prompt for phi. ( #1097 )
2023-10-15 10:53:25 +01:00
b73c35cc57
Improve the reshape error messages. ( #1096 )
...
* Improve the reshape error messages.
* Add the verbose-prompt flag to the phi example.
2023-10-15 10:43:10 +01:00
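A small repro for the improved reshape errors from #1096; the message now spells out the offending shapes instead of a bare failure:

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    let t = Tensor::zeros((2, 3), DType::F32, &Device::Cpu)?;
    // 6 elements cannot be viewed as (4, 2): the error reports both the
    // source shape and the requested target shape.
    if let Err(e) = t.reshape((4, 2)) {
        println!("{e}");
    }
    Ok(())
}
```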
8f310cc666
Avoid trying to backprop through non-differentiable layers. ( #1094 )
2023-10-14 22:03:41 +01:00
8921d5027c
Add support for phi-1.0 ( #1093 )
...
* Add support for phi-1.0
* Update the readme.
2023-10-14 20:15:43 +01:00
29c7f2565d
Add a reinforcement learning example. ( #1090 )
...
* Add a reinforcement learning example.
* Python initialization.
* Get the example to run.
* Vectorized gym envs for the Atari wrappers.
* Get some simulation loop to run.
2023-10-14 16:46:43 +01:00
9309cfc47d
Create a new curand instead of reseeding. ( #1089 )
2023-10-14 10:03:59 +01:00
a193bf5f60
Another gemm update. ( #1088 )
2023-10-14 09:36:52 +01:00
2c110ac7d9
Add the pooling operators to the pyo3 layer. ( #1086 )
2023-10-13 20:18:10 +01:00
75989fc3b7
Use an attention mask in the e5 padding case. ( #1085 )
2023-10-13 18:53:40 +01:00
07af87a1d8
Typos. ( #1084 )
2023-10-13 16:21:20 +01:00
eefad2b95f
Update to gemm 0.16.1 ( #1083 )
2023-10-13 06:40:20 +01:00
5e6df4a3f7
Update to gemm-0.16. ( #1082 )
...
* Update to gemm-0.16.
* Enable wasm-simd128.
2023-10-12 21:56:59 +01:00
7473c4ceca
Fix the npy read function and add some testing. ( #1080 )
2023-10-12 15:25:05 +02:00
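A sketch of the round-trip that the npy fix and tests in #1080 cover, assuming the write_npy/read_npy helpers on Tensor:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let t = Tensor::arange(0f32, 6f32, &Device::Cpu)?.reshape((2, 3))?;
    t.write_npy("test.npy")?;
    // The reader fixed in #1080 should restore the same shape and data.
    let u = Tensor::read_npy("test.npy")?;
    assert_eq!(u.to_vec2::<f32>()?, t.to_vec2::<f32>()?);
    Ok(())
}
```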
c096f02411
Add a matvec CPU benchmark. ( #1076 )
2023-10-12 09:29:18 +01:00
e7560443e4
Convmixer example ( #1074 )
...
* Add a convmixer based example.
* Mention the model in the readme.
2023-10-11 19:51:10 +01:00
89b525b5e7
Convmixer ( #1073 )
...
* Only optimize float tensors.
* Use full tensors for zeros and ones.
* Add a benchmark for the matmul slowness.
* Add the convmixer model.
* Proper adaptive pooling.
2023-10-11 18:24:32 +01:00
37dbbff261
Use full tensors for zeros and ones ( #1071 )
...
* Only optimize float tensors.
* Use full tensors for zeros and ones.
2023-10-11 08:16:04 +01:00
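Per the title of #1071, zeros and ones now go through the same constant-fill path as Tensor::full. The user-facing equivalence, sketched:

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    let dev = Device::Cpu;
    let ones = Tensor::ones((2, 3), DType::F32, &dev)?;
    let full = Tensor::full(1f32, (2, 3), &dev)?;
    // Identical values either way; the commit aligns the implementations.
    assert_eq!(ones.to_vec2::<f32>()?, full.to_vec2::<f32>()?);
    Ok(())
}
```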
9fea56d28e
Only optimize float tensors. ( #1069 )
2023-10-10 09:05:41 +01:00
bc3351bce4
Tracing for StableLM and quantized StableLM. ( #1068 )
2023-10-10 08:09:25 +02:00
b34d7f0248
Remove some unused bits. ( #1067 )
2023-10-09 19:49:57 +01:00
4d04ac83c7
Override the repo for SDXL f16 VAE weights. ( #1064 )
...
* Override the repo for SDXL f16 VAE weights.
* Slightly simpler change.
2023-10-09 06:52:28 +01:00
392fe02fba
Move the common quantized-nn code to a shared module. ( #1063 )
2023-10-09 06:22:22 +01:00
59ab6d7832
Quantized version of StableLM. ( #1058 )
...
* Quantized version of StableLM.
* Adapt the stable-lm example to support the quantized version.
* Use a separate hub repo.
* Another repo name tweak.
2023-10-08 15:42:38 +01:00
783735cf22
Use softmax-last-dim where possible. ( #1057 )
2023-10-08 13:16:42 +01:00
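#1057 switches eligible call sites to the fused last-dim softmax in candle_nn::ops. The two equivalent forms:

```rust
use candle_core::{Device, Tensor, D};
use candle_nn::ops;

fn main() -> candle_core::Result<()> {
    let xs = Tensor::randn(0f32, 1f32, (2, 4, 8), &Device::Cpu)?;
    let generic = ops::softmax(&xs, D::Minus1)?;
    let fused = ops::softmax_last_dim(&xs)?; // the fast path
    assert_eq!(generic.dims(), fused.dims());
    Ok(())
}
```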
9abeddd750
Make the CUDA RNG seedable. ( #1056 )
2023-10-08 09:32:36 +01:00
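With #1056 the CUDA RNG state becomes seedable via Device::set_seed. A sketch assuming a CUDA-enabled build and device 0:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let dev = Device::new_cuda(0)?;
    dev.set_seed(42)?;
    let a = Tensor::randn(0f32, 1f32, (2, 2), &dev)?;
    dev.set_seed(42)?;
    let b = Tensor::randn(0f32, 1f32, (2, 2), &dev)?;
    // Reseeding with the same value reproduces the same draws.
    assert_eq!(a.to_vec2::<f32>()?, b.to_vec2::<f32>()?);
    Ok(())
}
```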
2e5fb0b251
Do not use the kv-cache on external key-value states. ( #1054 )
2023-10-07 22:37:19 +01:00
823fe23f9b
Add flash-attn support for stable-lm. ( #1052 )
2023-10-07 21:12:54 +01:00
d833527fda
Use candle_nn::LSTM in encodec. ( #1051 )
...
* Use candle_nn::LSTM in encodec.
* More Encodec implementation.
* Decoder implementation.
2023-10-07 19:43:06 +01:00
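#1051 moves encodec's recurrence onto candle_nn::LSTM; combined with the seq helpers from #1050 (the next entry), a whole-sequence forward looks roughly like this (the 16/32 dimensions are illustrative, not encodec's):

```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::{lstm, LSTMConfig, VarBuilder, VarMap, RNN};

fn main() -> candle_core::Result<()> {
    let dev = Device::Cpu;
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &dev);
    let rnn = lstm(16, 32, LSTMConfig::default(), vb)?;
    // Input is (batch, seq_len, features); seq runs the full sequence and
    // states_to_tensor stacks the per-step hidden states into one tensor.
    let xs = Tensor::zeros((1, 5, 16), DType::F32, &dev)?;
    let states = rnn.seq(&xs)?;
    let out = rnn.states_to_tensor(&states)?;
    println!("{:?}", out.shape());
    Ok(())
}
```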
a4967600d0
More general seq forward functions for RNNs. ( #1050 )
2023-10-07 15:08:01 +01:00
aa53368aeb
Better control over the optional dequantization in QMatMul ( #1049 )
...
* Cosmetic change to the quantized whisper model.
* Fix the dequantization.
* Add the dequantize-all variable.
2023-10-07 10:16:18 +01:00
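For #1049: QMatMul can either run the quantized matmul directly or dequantize the weights up front and fall back to a regular matmul, and the commit puts that choice behind a variable (its exact name lives in the PR and is not reproduced here). The call site is the same either way; a hedged sketch assuming the current candle_core::quantized API:

```rust
use candle_core::quantized::{QMatMul, QTensor};
use candle_core::{Module, Tensor};

// The quantized-vs-dequantized decision is taken inside QMatMul; callers
// just build it from a QTensor and call forward.
fn qlinear(w: QTensor, xs: &Tensor) -> candle_core::Result<Tensor> {
    let mm = QMatMul::from_qtensor(w)?;
    mm.forward(xs)
}
```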
955e00b2e8
Add to the readmes for stable-lm. ( #1047 )
2023-10-06 21:26:04 +01:00
d5f7267087
Add the stable-lm example. ( #1046 )
...
* Add the stable-lm example.
* Get stable-lm to generate some proper text.
2023-10-06 19:20:35 +01:00
904bbdae65
Make the Python wrapper more hackable and simplify quantization ( #1010 )
...
* Some first `Module` implementations
* Add `state_dict` and `load_state_dict` functionality
* Move modules around and create `candle.nn.Linear`
* Add `nn.Embedding` and `nn.LayerNorm`
* Add BERT implementation
* Batch q-matmul
* Automatically dequantize `QTensors` if a `Tensor` is expected
* Add Module `.to()`, `.cuda()`, `.cpu()` and `.type()` functionality
* Unit tests for `Module`, `Tensor` and `candle.utils`
* Add `pytorch` like slicing to `Tensor`
* Cleanup and BERT fixes
* `black` formatting + unit-test for `nn.Linear`
* Refactor slicing implementation
2023-10-06 19:01:07 +01:00
b0442eff8a
Sketch the stable-lm model. ( #1045 )
2023-10-06 18:19:06 +01:00
4631c48273
Remove some todos. ( #1042 )
2023-10-05 22:42:20 +01:00
716883e9b0
Add the clamping for stable-diffusion. ( #1041 )
2023-10-05 22:20:39 +01:00
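The clamping from #1041 keeps decoded stable-diffusion images inside the valid pixel range before the u8 conversion. In tensor terms, using Tensor::clamp:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let xs = Tensor::new(&[-0.5f32, 0.3, 1.7], &Device::Cpu)?;
    // Out-of-range values are clipped to [0, 1].
    let ys = xs.clamp(0f32, 1f32)?;
    assert_eq!(ys.to_vec1::<f32>()?, [0.0, 0.3, 1.0]);
    Ok(())
}
```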
47c25a567b
feat: [SAM] allow downloading the result as png ( #1035 )
...
* feat: allow downloading the result as png
* feat: update function and wording
2023-10-05 22:14:47 +01:00
7f7d95e2c3
Add the round-to function. ( #1039 )
2023-10-05 20:28:09 +01:00
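The round-to function from #1039 rounds to a given number of decimals, assuming the Tensor::round_to signature this commit introduces:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let xs = Tensor::new(&[3.14159f32, 2.71828], &Device::Cpu)?;
    // Round to two decimal places (subject to f32 representability).
    let ys = xs.round_to(2)?;
    println!("{ys}");
    Ok(())
}
```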
f47bd9bab5
Delete invalid comment ( #1038 )
2023-10-05 19:28:08 +01:00
8f7973958c
fix: index_select CUDA kernel when the src/target dim differs from the ids dim and the selected dim is > 0 ( #1037 )
...
* fix: index_select CUDA kernel when the src/target dim differs from the ids dim and the selected dim is > 0
* cargo fmt
2023-10-05 18:46:13 +01:00
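A repro sketch for the case #1037 fixes: index_select along a dim > 0 with an ids length that differs from the source dimension:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let dev = Device::Cpu;
    let src = Tensor::arange(0f32, 12f32, &dev)?.reshape((3, 4))?;
    // Pick 2 of the 4 columns: ids length != source dim and the selected
    // dim is > 0, the combination the CUDA kernel previously mishandled.
    let ids = Tensor::new(&[0u32, 2u32], &dev)?;
    let out = src.index_select(&ids, 1)?;
    assert_eq!(out.dims(), &[3, 2]);
    Ok(())
}
```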
f0c619a4af
Use AsRef<str> for set_one. ( #1033 )
2023-10-05 06:05:44 +01:00
b86ac0c507
Quantized T5: add the CoEdIT model to the wasm demo and readme ( #1031 )
2023-10-04 20:57:33 +01:00
27e70a5093
Whisper quantized wasm ( #1028 )
...
* [Whisper] Update to use quantized model
* [whisper] add language detection
* [whisper] change assets location
* [whisper] adapt js example with quantized models
* [whisper] better task parsing
* [whisper] minor fixes
2023-10-04 20:22:57 +01:00
c18a856e76
Add the rounding operators. ( #1030 )
...
* Add the rounding operators.
* Avoid tracking gradients for the rounding operations.
* Add some rounding tests.
2023-10-04 17:58:44 +01:00
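The unary ops added in #1030, with gradient tracking deliberately skipped as the second bullet notes:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let xs = Tensor::new(&[-1.5f32, 0.4, 2.6], &Device::Cpu)?;
    println!("floor: {}", xs.floor()?);
    println!("ceil:  {}", xs.ceil()?);
    println!("round: {}", xs.round()?);
    Ok(())
}
```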
3349c89252
Add quantized T5 args for weight and config ( #1029 )
2023-10-04 17:02:49 +01:00
11d3687cc6
Simd128 optimized q8k vecdot. ( #1026 )
2023-10-03 15:29:48 +01:00
dac73edb34
AVX optimized q8k vecdot. ( #1024 )
2023-10-03 12:10:58 +01:00
b4da19d1be
Merge pull request #1023 from evgenyigumnov/simlified-book-polish
...
Small misspelling and polish fixes
2023-10-03 12:29:41 +02:00