03be33eea4
Relax the requirements on CustomOp. ( #486 )
...
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
d99cac3ec3
Move the avx specific bits to a separate file. ( #481 )
2023-08-17 09:01:06 +01:00
306c8eee7a
AVX version of the vecdot for q4_0. ( #474 )
...
* AVX version of the vecdot for q4_0.
* Tweak the avx bits.
* Add a qmatmul benchmark.
* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40
Add vecdot for q6k-q8k. ( #476 )
...
* Add vecdot for q6k-q8k.
* Add some testing for q8k.
* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
3bedba1fce
Use a zipped iterator. ( #475 )
...
* Use a zipped iterator.
* Add to/from float for q8k.
2023-08-16 20:15:11 +01:00
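The zipped-iterator change above can be illustrated with a minimal sketch: an elementwise op written over `zip`ped iterators instead of indexed loops, which avoids bounds checks and reads more idiomatically. This is illustrative only, not candle's actual kernel code.

```rust
// Elementwise add over two slices using zipped iterators rather than
// index arithmetic; the iterator chain lets the compiler elide bounds checks.
fn add_slices(lhs: &[f32], rhs: &[f32], out: &mut [f32]) {
    for ((o, &l), &r) in out.iter_mut().zip(lhs.iter()).zip(rhs.iter()) {
        *o = l + r;
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [10.0f32, 20.0, 30.0];
    let mut c = [0.0f32; 3];
    add_slices(&a, &b, &mut c);
    assert_eq!(c, [11.0, 22.0, 33.0]);
    println!("{:?}", c);
}
```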
575e88a999
Add a quantized test that uses negative values. ( #470 )
...
* Add a quantized test that uses negative values.
* Add a default tokenizer.
2023-08-16 16:32:58 +01:00
a9101700b6
Add a kv-cache to the quantized llama example. ( #466 )
...
* Add a kv-cache to the quantized llama example.
* Also print the prompt.
* Bugfix in q6k dequantizing.
* Another bugfix.
2023-08-16 14:28:42 +01:00
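The kv-cache added above follows the usual autoregressive-decoding pattern: rather than recomputing keys and values for the whole sequence at every step, each step's keys/values are appended to a cache and attention runs over the accumulated history. The sketch below shows only the shape of that idea; `KvCache` and its methods are hypothetical names, not candle's API.

```rust
// Hypothetical per-layer kv-cache: one key vector and one value vector
// are appended per generated token, and attention reads the full history.
struct KvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl KvCache {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    // Append this step's key/value and return views over the whole history.
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) -> (&[Vec<f32>], &[Vec<f32>]) {
        self.keys.push(k);
        self.values.push(v);
        (&self.keys, &self.values)
    }
}

fn main() {
    let mut cache = KvCache::new();
    for step in 0..3 {
        let k = vec![step as f32; 4];
        let v = vec![-(step as f32); 4];
        let (ks, vs) = cache.append(k, v);
        // After step t, the cache holds t + 1 entries.
        assert_eq!(ks.len(), step + 1);
        assert_eq!(vs.len(), step + 1);
    }
    println!("cached {} steps", cache.keys.len());
}
```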
3071134788
Get the ggml based llama to generate some text. ( #464 )
...
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval in the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
2023-08-16 12:41:07 +01:00
965597a873
Add a test for qmatmul. ( #459 )
2023-08-16 06:36:27 +01:00
ca449f9ee1
Add quantized tensors. ( #458 )
...
* Add quantized tensors.
* Implement the debug trait for QTensor.
* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
b8263aa15c
Quantized support for f16 and f32 ( #457 )
...
* Add f32 as a quantized type.
* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4
Split out the quantized file. ( #456 )
2023-08-15 20:26:27 +01:00
08effe3762
More quantization support ( #455 )
...
* Properly initialize wdata.
* Simplify the matmul bits.
* Add from_float for q4_0.
* Fix a couple bugs.
* Get the test to work.
* Get clippy to be happy.
2023-08-15 18:58:04 +01:00
5e49922be2
Basic quantization support ( #453 )
...
* Add a vecdot trait.
* Start implementing mul_mat.
* Add to the mul mat implementation.
* Add q8_0 quantization.
* Implement the GgmlType trait for all types.
* Add the missing block.
* Add a TODO.
2023-08-15 15:53:19 +01:00
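The q8_0 quantization mentioned above can be sketched as block quantization: each block of 32 f32 values is stored as one f32 scale plus 32 i8 values, with `x ≈ q * scale` and the scale chosen so the largest magnitude maps to 127. This is a minimal sketch of the scheme, not candle's or ggml's actual layout or code.

```rust
// Minimal q8_0-style block quantization sketch: per-block scale + i8 values.
const BLOCK: usize = 32;

fn quantize_block(xs: &[f32; BLOCK]) -> (f32, [i8; BLOCK]) {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    let mut qs = [0i8; BLOCK];
    for (q, &x) in qs.iter_mut().zip(xs.iter()) {
        *q = (x / scale).round() as i8;
    }
    (scale, qs)
}

fn dequantize_block(scale: f32, qs: &[i8; BLOCK]) -> [f32; BLOCK] {
    let mut xs = [0f32; BLOCK];
    for (x, &q) in xs.iter_mut().zip(qs.iter()) {
        *x = q as f32 * scale;
    }
    xs
}

fn main() {
    let mut xs = [0f32; BLOCK];
    for (i, x) in xs.iter_mut().enumerate() {
        *x = (i as f32) - 16.0; // includes negative values
    }
    let (scale, qs) = quantize_block(&xs);
    let ys = dequantize_block(scale, &qs);
    for (x, y) in xs.iter().zip(ys.iter()) {
        // Round-trip error is bounded by half the scale; `scale` is a safe bound.
        assert!((x - y).abs() <= scale);
    }
    println!("scale: {scale}");
}
```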
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
...
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
495e0b7580
Simd support ( #448 )
...
* Import the simd intrinsics in candle-core.
* simd version of reduce-sum.
* Bugfix.
* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
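The reduce-sum change above has the classic SIMD shape: accumulate into several independent lanes so iterations don't serialize on one accumulator, then fold the lanes at the end. The sketch below mirrors that structure in portable Rust (the compiler can auto-vectorize it) rather than using the avx intrinsics the commit actually adds.

```rust
// Portable reduce-sum with 4 independent accumulator lanes, folded at the end.
fn reduce_sum(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 4];
    let chunks = xs.chunks_exact(4);
    let rem = chunks.remainder();
    for c in chunks {
        for i in 0..4 {
            acc[i] += c[i];
        }
    }
    // Fold the lanes, then add the leftover tail elements.
    acc.iter().sum::<f32>() + rem.iter().sum::<f32>()
}

fn main() {
    let xs: Vec<f32> = (1..=10).map(|i| i as f32).collect();
    assert_eq!(reduce_sum(&xs), 55.0);
    println!("{}", reduce_sum(&xs));
}
```

Note that multi-lane accumulation changes the order of floating-point additions, so results can differ in the last bits from a strictly sequential sum.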
90374097dc
Cudnn support ( #445 )
...
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
2023-08-14 21:30:41 +01:00
c84883ecf2
Add a cuda kernel for upsampling. ( #441 )
...
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
a094dc503d
Add a cuda kernel for avg-pool2d. ( #440 )
...
* Add a cuda kernel for avg-pool2d.
* Avoid running out of bounds.
* Finish wiring the avg pool kernel + add some testing.
* Support for max-pool + testing.
2023-08-14 12:32:05 +01:00
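The avg-pool2d operation wired up above reduces each kernel window to its mean. A simplified CPU sketch over a row-major matrix, assuming stride equal to the kernel size and no padding (illustrative only, not the cuda kernel or candle's signature):

```rust
// 2d average pooling over a row-major h x w matrix with a k x k window,
// stride k, no padding: each output cell is the mean of its window.
fn avg_pool2d(input: &[f32], h: usize, w: usize, k: usize) -> Vec<f32> {
    let (oh, ow) = (h / k, w / k);
    let mut out = vec![0.0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let mut sum = 0.0f32;
            for dy in 0..k {
                for dx in 0..k {
                    sum += input[(oy * k + dy) * w + ox * k + dx];
                }
            }
            out[oy * ow + ox] = sum / (k * k) as f32;
        }
    }
    out
}

fn main() {
    // 2x2 pooling over a 2x4 input gives a 1x2 output.
    let input = [1.0f32, 2.0, 3.0, 4.0,
                 5.0, 6.0, 7.0, 8.0];
    let out = avg_pool2d(&input, 2, 4, 2);
    assert_eq!(out, vec![3.5, 5.5]);
    println!("{:?}", out);
}
```

Swapping the running sum for a running max turns the same loop nest into max-pool2d.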
34f4b3187e
Add a naive conv2d cuda kernel. ( #438 )
...
* Add a naive conv2d cuda kernel.
* Proper conv2d support on the rust side.
* Conv1d testing on gpu.
* Also use the test on gpus.
* Fix the clean-ptx target.
2023-08-14 10:34:42 +01:00
9e7e6e0288
Add dequantization for ggml's q4_0, q4_1, q5_0, q5_1 and q8_0 ( #407 )
...
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`
* expose `tensor_from_ggml` for external usage
* bugfixes & example
2023-08-13 23:22:57 +01:00
d379a76a9e
Add a softmax bench. ( #433 )
...
* Add a softmax bench.
* Add the vectorized sum reduce.
2023-08-13 20:09:18 +01:00
9af438ac1b
Track the conv2d operations in stable-diffusion. ( #431 )
...
* Track the conv2d operations in stable-diffusion.
* Add more tracing to stable-diffusion.
* Also trace the resnet bits.
* Trace the attention blocks.
* Also trace the attention inner part.
* Small tweak.
2023-08-13 15:58:26 +01:00
5a63b51f14
Add a matmul benchmark. ( #429 )
2023-08-13 13:41:03 +01:00
9aca398a4f
More accelerate optimizations ( #427 )
...
* Add more tracing to the whisper example.
* Support accelerate in more examples.
* Use accelerate for pointwise functions.
* Use accelerate for binary operations too.
* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
16b89f5b83
fix: can directly save the loaded weights ( #421 )
2023-08-12 16:33:29 +01:00
e12372021b
Expose the tensor write-bytes function. ( #412 )
2023-08-11 17:13:42 +01:00
01ea57da8c
Fix the conv tests. ( #409 )
2023-08-11 14:59:54 +01:00
662db45fc3
Use zero padding in conv1d and conv2d (same as pytorch). ( #408 )
2023-08-11 14:53:05 +01:00
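Zero padding as aligned on above means out-of-range taps read as 0.0 rather than wrapping or clamping, with output length `(n + 2*p - k) / s + 1`. A minimal conv1d sketch under those assumptions (single channel, no dilation; illustrative, not candle's implementation):

```rust
// conv1d with zero padding: positions outside the input contribute 0.0.
fn conv1d_zero_pad(input: &[f32], kernel: &[f32], padding: usize, stride: usize) -> Vec<f32> {
    let n = input.len();
    let k = kernel.len();
    let out_len = (n + 2 * padding - k) / stride + 1;
    let mut out = vec![0.0f32; out_len];
    for (o_idx, o) in out.iter_mut().enumerate() {
        for (k_idx, &w) in kernel.iter().enumerate() {
            // Position in the (virtually) zero-padded input.
            let pos = o_idx * stride + k_idx;
            if pos >= padding && pos - padding < n {
                *o += w * input[pos - padding];
            } // else: the zero padding contributes nothing
        }
    }
    out
}

fn main() {
    let input = [1.0f32, 2.0, 3.0];
    let kernel = [1.0f32, 1.0, 1.0];
    // padding=1, stride=1 keeps the output the same length as the input.
    let out = conv1d_zero_pad(&input, &kernel, 1, 1);
    assert_eq!(out, vec![3.0, 6.0, 5.0]);
    println!("{:?}", out);
}
```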
e29c7809ec
Parallelise the CPU kernels for the conv ops. ( #401 )
...
* Parallelise the conv2d op.
* Tighter control on threading.
* Also parallelise conv1d.
* Add some safety comment.
2023-08-11 05:51:58 +01:00
a325c1aa50
Upsample test + bugfix. ( #399 )
2023-08-10 21:02:35 +02:00
94eff56aee
Optimize the cpu conv2d kernel ( #396 )
...
* Conv2d simd optimization.
* Fix the contiguous copying.
* Small tweak.
2023-08-10 17:40:09 +01:00
ff53f38467
Small example for benchmarking some cpu ops ( #394 )
...
* Refactor the benchmark example.
* Rename the example.
* Add some comments.
2023-08-10 17:00:17 +01:00
c8039579a5
Conv1d optimize ( #392 )
...
* Reorder the conv1d loops in the cpu backend.
* Optimize the 1d convolution.
* Conv1D optimize.
* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
f3fe730a30
Npy tweaks & error with path ( #384 )
...
* Simplify the npy writing.
* Wrap the file path so as to provide better errors.
2023-08-10 06:21:58 +01:00
c7f92f985e
Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. ( #383 )
2023-08-10 05:48:19 +01:00
3bbc08a8df
Fix randn cpu ( #382 )
...
* Change distributions: Standard generates in [0, 1), Normal is correct.
* Add test (not sure if this is the best place to put it).
* Remove unnecessary use.
2023-08-10 05:33:44 +01:00
25ec2d9f6b
fix: remove incorrect unwrap ( #379 )
2023-08-09 21:45:24 +01:00
fcfdcbd337
Add a conv1d benchmark based on the whisper sizes. ( #377 )
...
* Add a conv1d benchmark based on the whisper sizes.
* Enforce the batch-dim in conv1d.
2023-08-09 20:27:03 +01:00
a5c5a893aa
add max_pool2d ( #371 )
...
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
2023-08-09 18:05:26 +01:00
1892bd139c
Extract the strides in the conv ops. ( #370 )
2023-08-09 17:57:05 +01:00
cd225bd3b1
More testing for avg-pool2d. ( #366 )
...
* More testing for avg-pool2d.
* Another fix.
* Add a max-pool test with non-divisible kernel sizes.
2023-08-09 16:12:23 +01:00
dece0b8a76
Merge pull request #263 from huggingface/book_3
...
Book 3 (advanced loading + hub)
2023-08-09 16:50:11 +02:00
b80348d22f
Bugfix for avg-pool + add some test. ( #365 )
2023-08-09 15:44:16 +01:00
dbc6f281c9
Conv1d test with padding. ( #356 )
2023-08-09 05:45:38 +01:00
cf965ecaa8
Simplify the conv1d and conv2d code. ( #352 )
2023-08-08 22:10:59 +01:00
b9864e1357
Fix size-in-bytes for u8. ( #351 )
2023-08-08 21:15:18 +01:00
608b2358c6
Add some conv1d test + bugfix using padding. ( #349 )
2023-08-08 20:50:20 +01:00
1e6dbeac01
Add some conv2d tests. ( #347 )
...
* Add some conv2d tests.
* Add a simpler conv2d test.
* More conv2d testing + bugfix.
* Add a todo.
2023-08-08 19:02:42 +01:00
13ce68ff9b
Bugfix for conv2d. ( #343 )
2023-08-08 15:20:00 +01:00