55e428c8ae
Expose the varmap inner data. ( #411 )
2023-08-11 16:58:56 +01:00
01ea57da8c
Fix the conv tests. ( #409 )
2023-08-11 14:59:54 +01:00
662db45fc3
Use zero padding in conv1d and conv2d (same as pytorch). ( #408 )
2023-08-11 14:53:05 +01:00
906c0f3eb5
Remove the checkpoint conversion script. ( #405 )
...
* Remove the checkpoint conversion script.
* Remove references to the script.
2023-08-11 05:59:48 +01:00
e29c7809ec
Parallelise the CPU kernels for the conv ops. ( #401 )
...
* Parallelise the conv2d op.
* Tighter control on threading.
* Also parallelise conv1d.
* Add a safety comment.
2023-08-11 05:51:58 +01:00
a325c1aa50
Upsample test + bugfix. ( #399 )
2023-08-10 21:02:35 +02:00
b6cf26e48e
Merge pull request #393 from huggingface/older_gpus
...
Working on older GPUs (still not compute 52 it seems but > 6 could be OK)
2023-08-10 20:49:23 +02:00
379eadc68e
Working now.
2023-08-10 19:43:25 +02:00
7e4fbc1e17
[DO NOT MERGE] temporary PR so users can try out on older GPUs.
2023-08-10 19:36:31 +02:00
80f0482f26
Fix the stable-diffusion vae. ( #398 )
...
* Fix the stable-diffusion vae.
* Fix for saving images.
2023-08-10 18:24:31 +01:00
94eff56aee
Optimize the cpu conv2d kernel ( #396 )
...
* Conv2d simd optimization.
* Fix the contiguous copying.
* Small tweak.
2023-08-10 17:40:09 +01:00
a55133effd
Merge pull request #395 from huggingface/fix_compat_windows
...
Compat windows.
2023-08-10 18:05:12 +02:00
ff53f38467
Small example for benchmarking some cpu ops ( #394 )
...
* Refactor the benchmark example.
* Rename the example.
* Add some comments.
2023-08-10 17:00:17 +01:00
4a95d34c83
Compat windows.
2023-08-10 17:46:47 +02:00
7f710a573d
Merge pull request #374 from Rocketknight1/readme_fixes
...
README.md typos and grammar fixes
2023-08-10 16:34:19 +02:00
c8039579a5
Conv1d optimize ( #392 )
...
* Reorder the conv1d loops in the cpu backend.
* Optimize the 1d convolution.
* Conv1D optimize.
* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
0b0fa56978
Merge pull request #386 from huggingface/enabling_61_maybe
...
This is duplicated code on Cuda 12.2.
2023-08-10 16:23:17 +02:00
385f0d261c
Normalize embeddings in the bert example. ( #390 )
2023-08-10 13:05:55 +01:00
b765f2c37f
Update the wasm build instructions. ( #389 )
2023-08-10 11:29:43 +01:00
66d1c093e0
This is duplicated code on Cuda 12.2.
...
Without it we can compile for 52 (but I get Operation Not supported
when actually trying to use those kernels).
2023-08-10 09:20:18 +02:00
de7c31bfe9
Merge pull request #368 from huggingface/add_cuda_ci
...
Adding cuda CI
2023-08-10 08:49:39 +02:00
8e7ef96588
Fix CI cuda.
2023-08-10 08:47:15 +02:00
f3fe730a30
Npy tweaks & error with path ( #384 )
...
* Simplify the npy writing.
* Wrap the file path so as to provide better errors.
2023-08-10 06:21:58 +01:00
c7f92f985e
Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. ( #383 )
2023-08-10 05:48:19 +01:00
3bbc08a8df
Fix randn cpu ( #382 )
...
* Change distributions
Standard generates in [0, 1), Normal is correct.
* Add test
Not sure if this is the best place to put the test
* Remove unnecessary use
2023-08-10 05:33:44 +01:00
6a2137af4f
Update README.md
2023-08-10 00:19:58 +01:00
0dc1e5f387
Merge branch 'main' into readme_fixes
2023-08-10 00:19:20 +01:00
bd2fb6216b
Testing in release mode because debug is too slow.
2023-08-09 23:19:55 +02:00
3542b26143
ssl update.
2023-08-09 23:11:45 +02:00
a690f14a77
Fix by hardcoding paths
2023-08-09 23:08:50 +02:00
90d778c059
?
2023-08-09 23:02:11 +02:00
171fcbe539
CI ssh in the meantime.
2023-08-09 22:58:47 +02:00
07e83c55c0
Attempt nb2
2023-08-09 22:47:01 +02:00
25ec2d9f6b
fix: remove incorrect unwrap ( #379 )
2023-08-09 21:45:24 +01:00
da26e2832c
Update gemm to 0.15.6. ( #378 )
2023-08-09 21:04:28 +01:00
fcfdcbd337
Add a conv1d benchmark based on the whisper sizes. ( #377 )
...
* Add a conv1d benchmark based on the whisper sizes.
* Enforce the batch-dim in conv1d.
2023-08-09 20:27:03 +01:00
653ec5abc1
Update README.md ( #376 )
...
add missing word
2023-08-09 20:09:21 +01:00
c3a0761e62
Add some tracing to the whisper example. ( #375 )
2023-08-09 19:58:36 +01:00
0cef3998fd
README.md typos and grammar fixes
2023-08-09 19:36:03 +01:00
e5f510d209
SSH to debug.
2023-08-09 19:54:40 +02:00
0dd94eff4c
Merge pull request #367 from eltociear/eltociear-patch-1
...
Update README.md
2023-08-09 19:48:31 +02:00
a3b1699409
Embed the mel filters in the whisper binary. ( #373 )
2023-08-09 18:27:26 +01:00
5b79b38bc7
Remove extra square bracket ( #372 )
2023-08-09 18:24:28 +01:00
a5c5a893aa
add max_pool2d ( #371 )
...
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
2023-08-09 18:05:26 +01:00
e6ce47f9e0
?
2023-08-09 19:00:25 +02:00
1892bd139c
Extract the strides in the conv ops. ( #370 )
2023-08-09 17:57:05 +01:00
749c8c7f51
Better rust GH action.
2023-08-09 18:42:53 +02:00
d9b4fef189
Change name
2023-08-09 18:14:29 +02:00
8fa329aca2
Adding cuda CI
2023-08-09 18:13:27 +02:00
cd225bd3b1
More testing for avg-pool2d. ( #366 )
...
* More testing for avg-pool2d.
* Another fix.
* Add a max-pool test with non-divisible kernel sizes.
2023-08-09 16:12:23 +01:00