1142 Commits

Author SHA1 Message Date
eab54e4490 Fix the tests for mkl. (#437) 2023-08-14 08:09:27 +01:00
9e7e6e0288 Add dequantization for ggmls q4_0, q4_1, q5_0, q5_1 and q8_0 (#407)
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`

* expose `tensor_from_ggml` for external usage

* bugfixes & example
2023-08-13 23:22:57 +01:00
8bd2b22b33 Optimize the logit computations in the whisper example. (#434) 2023-08-13 22:00:13 +01:00
d379a76a9e Add a softmax bench. (#433)
* Add a softmax bench.

* Add the vectorized sum reduce.
2023-08-13 20:09:18 +01:00
9af438ac1b Track the conv2d operations in stable-diffusion. (#431)
* Track the conv2d operations in stable-diffusion.

* Add more tracing to stable-diffusion.

* Also trace the resnet bits.

* Trace the attention blocks.

* Also trace the attention inner part.

* Small tweak.
2023-08-13 15:58:26 +01:00
b1ff78f762 Allow using accelerate with stable-diffusion. (#430) 2023-08-13 14:14:20 +01:00
5a63b51f14 Add a matmul benchmark. (#429) 2023-08-13 13:41:03 +01:00
6d694554b8 Support longer sequences in language detection. (#428) 2023-08-13 13:16:15 +01:00
9aca398a4f More accelerate optimizations (#427)
* Add more tracing to the whisper example.

* Support accelerate in more examples.

* Use accelerate for pointwise functions.

* Use accelerate for binary operations too.

* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
60cd1551ca Add a KV cache to whisper. (#426) 2023-08-12 21:17:08 +01:00
a0908d212c Add a -language argument. (#425) 2023-08-12 17:08:40 +01:00
972078e1ae Update the readme with the discord server and common errors. (#423) 2023-08-12 16:45:58 +01:00
16b89f5b83 fix: can directly save the loaded weights (#421) 2023-08-12 16:33:29 +01:00
0741ebbd51 More multilingual support for whisper. (#419)
* More multilingual support for whisper.

* Use the language token appropriately.
2023-08-12 15:32:52 +01:00
0c3f109faa Basic multilingual support for whisper (#417)
* Multi-lingual support for whisper.

* Avoid hardcoding the token names.

* More multi-lingual support.

* Remove the todo.
2023-08-12 11:23:04 +01:00
2ba6b2826f Fix the readme instructions for stable-diffusion. (#415) 2023-08-11 18:59:04 +01:00
1d0157bbc4 Stable diffusion: retrieve the model files from the HF hub. (#414)
* Retrieve the model files from the HF hub in the stable diffusion example.

* Add to the readme.
2023-08-11 18:57:06 +01:00
91dbf907d3 Add more whisper variants. (#413) 2023-08-11 17:33:55 +01:00
e12372021b Expose the tensor write-bytes function. (#412) 2023-08-11 17:13:42 +01:00
55e428c8ae Expose the varmap inner data. (#411) 2023-08-11 16:58:56 +01:00
01ea57da8c Fix the conv tests. (#409) 2023-08-11 14:59:54 +01:00
662db45fc3 Use zero padding in conv1d and conv2d (same as pytorch). (#408) 2023-08-11 14:53:05 +01:00
906c0f3eb5 Remove the checkpoint conversion script. (#405)
* Remove the checkpoint conversion script.

* Remove references to the script.
2023-08-11 05:59:48 +01:00
e29c7809ec Parallelise the CPU kernels for the conv ops. (#401)
* Parallelise the conv2d op.

* Tighter control on threading.

* Also parallelise conv1d.

* Add some safety comment.
2023-08-11 05:51:58 +01:00
a325c1aa50 Upsample test + bugfix. (#399) 2023-08-10 21:02:35 +02:00
b6cf26e48e Merge pull request #393 from huggingface/older_gpus
Working on older GPUs (still not compute 52 it seems but > 6 could be OK)
2023-08-10 20:49:23 +02:00
379eadc68e Working now. 2023-08-10 19:43:25 +02:00
7e4fbc1e17 [DO NOT MERGE] temporary PR so users can try out on older GPUs. 2023-08-10 19:36:31 +02:00
80f0482f26 Fix the stable-diffusion vae. (#398)
* Fix the stable-diffusion vae.

* Fix for saving images.
2023-08-10 18:24:31 +01:00
94eff56aee Optimize the cpu conv2d kernel (#396)
* Conv2d simd optimization.

* Fix the contiguous copying.

* Small tweak.
2023-08-10 17:40:09 +01:00
a55133effd Merge pull request #395 from huggingface/fix_compat_windows
Compat windows.
2023-08-10 18:05:12 +02:00
ff53f38467 Small example for benchmarking some cpu ops (#394)
* Refactor the benchmark example.

* Rename the example.

* Add some comments.
2023-08-10 17:00:17 +01:00
4a95d34c83 Compat windows. 2023-08-10 17:46:47 +02:00
7f710a573d Merge pull request #374 from Rocketknight1/readme_fixes
README.md typos and grammar fixes
2023-08-10 16:34:19 +02:00
c8039579a5 Conv1d optimize (#392)
* Reorder the conv1d loops in the cpu backend.

* Optimize the 1d convolution.

* Conv1D optimize.

* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
0b0fa56978 Merge pull request #386 from huggingface/enabling_61_maybe
This is duplicated code on Cuda 12.2.
2023-08-10 16:23:17 +02:00
385f0d261c Normalize embeddings in the bert example. (#390) 2023-08-10 13:05:55 +01:00
b765f2c37f Update the wasm build instructions. (#389) 2023-08-10 11:29:43 +01:00
66d1c093e0 This is duplicated code on Cuda 12.2.
Without it we can compile for 52 (but I get Operation Not supported
when actually trying to use those kernels).
2023-08-10 09:20:18 +02:00
de7c31bfe9 Merge pull request #368 from huggingface/add_cuda_ci
Adding cuda CI
2023-08-10 08:49:39 +02:00
8e7ef96588 Fix CI cuda. 2023-08-10 08:47:15 +02:00
f3fe730a30 Npy tweaks & error with path (#384)
* Simplify the npy writing.

* Wrap the file path so as to provide better errors.
2023-08-10 06:21:58 +01:00
c7f92f985e Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. (#383) 2023-08-10 05:48:19 +01:00
Lei
3bbc08a8df Fix randn cpu (#382)
* Change distributions

Standard generates in [0, 1), Normal is correct.

* Add test

Not sure if this is the best place to put the test

* Remove unnecessary use
2023-08-10 05:33:44 +01:00
6a2137af4f Update README.md 2023-08-10 00:19:58 +01:00
0dc1e5f387 Merge branch 'main' into readme_fixes 2023-08-10 00:19:20 +01:00
bd2fb6216b Testing in release mode because debug is too slow. 2023-08-09 23:19:55 +02:00
3542b26143 ssl update. 2023-08-09 23:11:45 +02:00
a690f14a77 Fix by hardcoding paths 2023-08-09 23:08:50 +02:00
90d778c059 ? 2023-08-09 23:02:11 +02:00