5b1690fffa
Tweak the llama example. ( #450 )
2023-08-15 12:18:20 +01:00
3cc87058b7
Support local weights & dynamic outputs ( #447 )
* Support local weights & dynamic outputs
* Revise as suggested
* Cargo code format
2023-08-15 11:51:57 +01:00
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
495e0b7580
Simd support ( #448 )
* Import the simd intrinsics in candle-core.
* simd version of reduce-sum.
* Bugfix.
* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
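The "simd version of reduce-sum" above can be sketched in portable terms: keep several independent accumulators so the compiler can hold the loop in SIMD registers, then fold them at the end. This is an illustrative stand-in (hypothetical `vec_sum`), not candle's actual intrinsics-based kernel.

```rust
// Sketch of a vectorizable reduce-sum: 8 independent accumulator lanes
// let the compiler auto-vectorize the hot loop; the tail is handled
// separately with a scalar sum.
fn vec_sum(xs: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0f32; LANES];
    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let xs: Vec<f32> = (1..=100).map(|i| i as f32).collect();
    println!("{}", vec_sum(&xs)); // 5050
}
```

Splitting the accumulator also changes the floating-point summation order, which is why such a change usually lands together with a benchmark and tests.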
90374097dc
Cudnn support ( #445 )
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
2023-08-14 21:30:41 +01:00
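The "single cudnn handle per cuda device" bullet is a caching pattern: handle creation is expensive, so one handle is created lazily per device ordinal and reused. A minimal std-only sketch of that pattern, with a dummy `Handle` type standing in for the real cuDNN handle:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Illustrative stand-in for a cuDNN handle (the real one wraps FFI state).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

// Return the cached handle for a device ordinal, creating it on first use.
fn handle_for_device(ordinal: usize) -> Handle {
    static HANDLES: OnceLock<Mutex<HashMap<usize, Handle>>> = OnceLock::new();
    let map = HANDLES.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = map.lock().unwrap();
    *map.entry(ordinal).or_insert_with(|| Handle(ordinal))
}

fn main() {
    let h1 = handle_for_device(0);
    let h2 = handle_for_device(0);
    assert_eq!(h1, h2); // second lookup hits the cache
    println!("{:?}", h1);
}
```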
c84883ecf2
Add a cuda kernel for upsampling. ( #441 )
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
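For reference, what a nearest-neighbour 2x upsample computes can be sketched on the CPU: each output pixel simply reads `input[y / 2][x / 2]`. This is an illustrative helper, not the CUDA kernel added in the commit above.

```rust
// Nearest-neighbour 2x upsample for a single-channel h x w matrix
// stored row-major in a flat slice.
fn upsample_nearest2x(xs: &[f32], h: usize, w: usize) -> Vec<f32> {
    let (oh, ow) = (h * 2, w * 2);
    (0..oh * ow)
        .map(|i| {
            let (y, x) = (i / ow, i % ow);
            xs[(y / 2) * w + x / 2] // each input pixel covers a 2x2 block
        })
        .collect()
}

fn main() {
    // A 1x2 input becomes a 2x4 output.
    println!("{:?}", upsample_nearest2x(&[1., 2.], 1, 2));
}
```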
a094dc503d
Add a cuda kernel for avg-pool2d. ( #440 )
* Add a cuda kernel for avg-pool2d.
* Avoid running out of bounds.
* Finish wiring the avg pool kernel + add some testing.
* Support for max-pool + testing.
2023-08-14 12:32:05 +01:00
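The avg-pool2d operation (and the "avoid running out of bounds" fix) can be illustrated with a small CPU version: non-overlapping k x k windows, with trailing rows and columns that do not fill a full window dropped, which keeps every index in bounds. A hypothetical helper, not candle's kernel:

```rust
// Average pooling over a single-channel h x w matrix with window size k
// and stride k (the common pooling configuration).
fn avg_pool2d(xs: &[f32], h: usize, w: usize, k: usize) -> Vec<f32> {
    let (oh, ow) = (h / k, w / k); // floor division drops partial windows
    let mut out = vec![0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let mut sum = 0f32;
            for dy in 0..k {
                for dx in 0..k {
                    sum += xs[(oy * k + dy) * w + ox * k + dx];
                }
            }
            out[oy * ow + ox] = sum / (k * k) as f32;
        }
    }
    out
}

fn main() {
    let xs = [1., 2., 3., 4.]; // 2x2 input
    println!("{:?}", avg_pool2d(&xs, 2, 2, 2)); // [2.5]
}
```

Max-pool, mentioned in the last bullet, is the same loop with `max` in place of the sum and average.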
34f4b3187e
Add a naive conv2d cuda kernel. ( #438 )
* Add a naive conv2d cuda kernel.
* Proper conv2d support on the rust side.
* Conv1d testing on gpu.
* Also run the tests on GPUs.
* Fix the clean-ptx target.
2023-08-14 10:34:42 +01:00
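A "naive" conv2d kernel as mentioned above is just the direct four-level loop: for each output pixel, accumulate input times kernel over the k x k window. A single-channel, valid-padding CPU sketch (illustrative, not the CUDA code):

```rust
// Valid-padding 2d convolution (cross-correlation, PyTorch-style) for one
// input and one output channel; output size is (h-k+1) x (w-k+1).
fn conv2d_naive(xs: &[f32], h: usize, w: usize, ker: &[f32], k: usize) -> Vec<f32> {
    let (oh, ow) = (h - k + 1, w - k + 1);
    let mut out = vec![0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let mut acc = 0f32;
            for dy in 0..k {
                for dx in 0..k {
                    acc += xs[(oy + dy) * w + ox + dx] * ker[dy * k + dx];
                }
            }
            out[oy * ow + ox] = acc;
        }
    }
    out
}

fn main() {
    // 3x3 input, 2x2 averaging kernel -> 2x2 output.
    let xs = [0., 1., 2., 3., 4., 5., 6., 7., 8.];
    let ker = [0.25; 4];
    println!("{:?}", conv2d_naive(&xs, 3, 3, &ker, 2)); // [2.0, 3.0, 5.0, 6.0]
}
```

On the GPU the same computation is mapped one output element per thread, which is why a naive kernel is a natural first version before any tiling or cudnn support.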
eab54e4490
Fix the tests for mkl. ( #437 )
2023-08-14 08:09:27 +01:00
9e7e6e0288
Add dequantization for ggml's q4_0, q4_1, q5_0, q5_1 and q8_0 ( #407 )
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`
* expose `tensor_from_ggml` for external usage
* bugfixes & example
2023-08-13 23:22:57 +01:00
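For q4_0, a ggml block packs 32 weights as 4-bit values (two per byte) plus one scale, and each weight dequantizes as `(nibble - 8) * scale`, with the low nibbles filling the first 16 slots and the high nibbles the last 16. The real format stores the scale as f16; this std-only sketch takes it as f32:

```rust
// Dequantize one q4_0 block: 16 bytes of packed nibbles -> 32 f32 weights.
fn dequant_q4_0(scale: f32, qs: &[u8; 16]) -> [f32; 32] {
    let mut ys = [0f32; 32];
    for (j, &b) in qs.iter().enumerate() {
        ys[j] = ((b & 0x0f) as i32 - 8) as f32 * scale; // low nibble
        ys[j + 16] = ((b >> 4) as i32 - 8) as f32 * scale; // high nibble
    }
    ys
}

fn main() {
    // 0x98: low nibble 8 -> 0, high nibble 9 -> +1 (before scaling).
    let ys = dequant_q4_0(0.5, &[0x98; 16]);
    println!("{} {}", ys[0], ys[16]); // 0 0.5
}
```

The other formats listed (q4_1, q5_0, q5_1, q8_0) vary the bit width, add a per-block minimum, or extend the values with a fifth bit, but follow the same block-plus-scale scheme.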
8bd2b22b33
Optimize the logit computations in the whisper example. ( #434 )
2023-08-13 22:00:13 +01:00
d379a76a9e
Add a softmax bench. ( #433 )
* Add a softmax bench.
* Add the vectorized sum reduce.
2023-08-13 20:09:18 +01:00
9af438ac1b
Track the conv2d operations in stable-diffusion. ( #431 )
* Track the conv2d operations in stable-diffusion.
* Add more tracing to stable-diffusion.
* Also trace the resnet bits.
* Trace the attention blocks.
* Also trace the attention inner part.
* Small tweak.
2023-08-13 15:58:26 +01:00
b1ff78f762
Allow using accelerate with stable-diffusion. ( #430 )
2023-08-13 14:14:20 +01:00
5a63b51f14
Add a matmul benchmark. ( #429 )
2023-08-13 13:41:03 +01:00
6d694554b8
Support longer sequences in language detection. ( #428 )
2023-08-13 13:16:15 +01:00
9aca398a4f
More accelerate optimizations ( #427 )
* Add more tracing to the whisper example.
* Support accelerate in more examples.
* Use accelerate for pointwise functions.
* Use accelerate for binary operations too.
* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
60cd1551ca
Add a KV cache to whisper. ( #426 )
2023-08-12 21:17:08 +01:00
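The idea behind a KV cache in an autoregressive decoder like whisper's: rather than recomputing attention keys and values for every previous position at each step, the new key/value is appended to a growing buffer and the full history is reused. A conceptual sketch with illustrative names (the real cache holds tensors per layer):

```rust
// Minimal key cache: one vector per generated position.
struct KvCache {
    keys: Vec<Vec<f32>>,
}

impl KvCache {
    fn new() -> Self {
        Self { keys: Vec::new() }
    }
    // Append this step's key and return the full history for attention.
    fn append(&mut self, k: Vec<f32>) -> &[Vec<f32>] {
        self.keys.push(k);
        &self.keys
    }
}

fn main() {
    let mut cache = KvCache::new();
    cache.append(vec![1.0]);
    let hist = cache.append(vec![2.0]);
    println!("{}", hist.len()); // 2 cached positions
}
```

This turns each decoding step's attention cost from quadratic in the sequence length into linear, which is the main speedup the commit targets.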
a0908d212c
Add a -language argument. ( #425 )
2023-08-12 17:08:40 +01:00
972078e1ae
Update the readme with the discord server and common errors. ( #423 )
2023-08-12 16:45:58 +01:00
16b89f5b83
fix: allow saving the loaded weights directly ( #421 )
2023-08-12 16:33:29 +01:00
0741ebbd51
More multilingual support for whisper. ( #419 )
* More multilingual support for whisper.
* Use the language token appropriately.
2023-08-12 15:32:52 +01:00
0c3f109faa
Basic multilingual support for whisper ( #417 )
* Multi-lingual support for whisper.
* Avoid hardcoding the token names.
* More multi-lingual support.
* Remove the todo.
2023-08-12 11:23:04 +01:00
2ba6b2826f
Fix the readme instructions for stable-diffusion. ( #415 )
2023-08-11 18:59:04 +01:00
1d0157bbc4
Stable diffusion: retrieve the model files from the HF hub. ( #414 )
* Retrieve the model files from the HF hub in the stable diffusion example.
* Add to the readme.
2023-08-11 18:57:06 +01:00
91dbf907d3
Add more whisper variants. ( #413 )
2023-08-11 17:33:55 +01:00
e12372021b
Expose the tensor write-bytes function. ( #412 )
2023-08-11 17:13:42 +01:00
55e428c8ae
Expose the varmap inner data. ( #411 )
2023-08-11 16:58:56 +01:00
01ea57da8c
Fix the conv tests. ( #409 )
2023-08-11 14:59:54 +01:00
662db45fc3
Use zero padding in conv1d and conv2d (same as pytorch). ( #408 )
2023-08-11 14:53:05 +01:00
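Zero padding in the PyTorch convention means taps that fall outside the input read as 0, and for stride 1 the output length is `l_in + 2*pad - k + 1`. A 1d sketch with a hypothetical helper (not candle's API):

```rust
// conv1d with zero padding: indices outside the input contribute 0.
fn conv1d_zero_pad(xs: &[f32], ker: &[f32], pad: usize) -> Vec<f32> {
    let (l, k) = (xs.len(), ker.len());
    let out_len = l + 2 * pad + 1 - k;
    (0..out_len)
        .map(|o| {
            ker.iter().enumerate().fold(0f32, |acc, (j, &kv)| {
                // Index into the conceptually padded input.
                let idx = (o + j) as isize - pad as isize;
                let x = if idx < 0 || idx as usize >= l {
                    0.0 // zero padding
                } else {
                    xs[idx as usize]
                };
                acc + x * kv
            })
        })
        .collect()
}

fn main() {
    // pad=1 with k=3 gives "same" length: 3 inputs -> 3 outputs.
    println!("{:?}", conv1d_zero_pad(&[1., 2., 3.], &[1., 1., 1.], 1));
}
```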
906c0f3eb5
Remove the checkpoint conversion script. ( #405 )
* Remove the checkpoint conversion script.
* Remove references to the script.
2023-08-11 05:59:48 +01:00
e29c7809ec
Parallelise the CPU kernels for the conv ops. ( #401 )
* Parallelise the conv2d op.
* Tighter control on threading.
* Also parallelise conv1d.
* Add some safety comment.
2023-08-11 05:51:58 +01:00
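Conv kernels parallelise naturally over independent output rows. A std-only sketch of that shape using scoped threads (the real code keeps "tighter control on threading", e.g. bounding the thread count; here one thread per row keeps the example short):

```rust
use std::thread;

// Fill each output row on its own scoped thread; `f` computes one element
// from its (row, col) coordinates and must be Sync to be shared.
fn par_rows(out: &mut [f32], w: usize, f: impl Fn(usize, usize) -> f32 + Sync) {
    thread::scope(|s| {
        for (y, row) in out.chunks_mut(w).enumerate() {
            let f = &f;
            s.spawn(move || {
                for (x, v) in row.iter_mut().enumerate() {
                    *v = f(y, x);
                }
            });
        }
    });
}

fn main() {
    let mut out = vec![0f32; 4];
    par_rows(&mut out, 2, |y, x| (y * 2 + x) as f32);
    println!("{:?}", out); // [0.0, 1.0, 2.0, 3.0]
}
```

Because `chunks_mut` hands out disjoint `&mut` row slices, the threads never alias, which is the safety argument the "Add some safety comment" bullet alludes to in spirit.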
a325c1aa50
Upsample test + bugfix. ( #399 )
2023-08-10 21:02:35 +02:00
b6cf26e48e
Merge pull request #393 from huggingface/older_gpus
Working on older GPUs (still not compute capability 5.2 it seems, but > 6.0 should be OK)
2023-08-10 20:49:23 +02:00
379eadc68e
Working now.
2023-08-10 19:43:25 +02:00
7e4fbc1e17
[DO NOT MERGE] temporary PR so users can try out on older GPUs.
2023-08-10 19:36:31 +02:00
80f0482f26
Fix the stable-diffusion vae. ( #398 )
* Fix the stable-diffusion vae.
* Fix for saving images.
2023-08-10 18:24:31 +01:00
94eff56aee
Optimize the cpu conv2d kernel ( #396 )
* Conv2d simd optimization.
* Fix the contiguous copying.
* Small tweak.
2023-08-10 17:40:09 +01:00
a55133effd
Merge pull request #395 from huggingface/fix_compat_windows
Windows compatibility.
2023-08-10 18:05:12 +02:00
ff53f38467
Small example for benchmarking some cpu ops ( #394 )
* Refactor the benchmark example.
* Rename the example.
* Add some comments.
2023-08-10 17:00:17 +01:00
4a95d34c83
Windows compatibility.
2023-08-10 17:46:47 +02:00
7f710a573d
Merge pull request #374 from Rocketknight1/readme_fixes
README.md typos and grammar fixes
2023-08-10 16:34:19 +02:00
c8039579a5
Conv1d optimize ( #392 )
* Reorder the conv1d loops in the cpu backend.
* Optimize the 1d convolution.
* Conv1D optimize.
* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
0b0fa56978
Merge pull request #386 from huggingface/enabling_61_maybe
This is duplicated code on Cuda 12.2.
2023-08-10 16:23:17 +02:00
385f0d261c
Normalize embeddings in the bert example. ( #390 )
2023-08-10 13:05:55 +01:00
b765f2c37f
Update the wasm build instructions. ( #389 )
2023-08-10 11:29:43 +01:00
66d1c093e0
This is duplicated code on Cuda 12.2.
Without it we can compile for compute capability 5.2 (but I get an "operation not supported" error when actually trying to use those kernels).
2023-08-10 09:20:18 +02:00
de7c31bfe9
Merge pull request #368 from huggingface/add_cuda_ci
Adding cuda CI
2023-08-10 08:49:39 +02:00
8e7ef96588
Fix CI cuda.
2023-08-10 08:47:15 +02:00
f3fe730a30
Npy tweaks & error with path ( #384 )
* Simplify the npy writing.
* Wrap the file path so as to provide better errors.
2023-08-10 06:21:58 +01:00