Commit Graph

862 Commits

SHA1 Message Date
3071134788 Get the ggml based llama to generate some text. (#464)
* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
2023-08-16 12:41:07 +01:00
fec87e86f5 Merge pull request #465 from huggingface/llama_hub_config
Using the real config from the hub when available.
2023-08-16 13:28:59 +02:00
33c882ea74 Clippy. 2023-08-16 10:41:00 +02:00
76804730c6 Using the real config from the hub when available. 2023-08-16 10:36:01 +02:00
965597a873 Add a test for qmatmul. (#459) 2023-08-16 06:36:27 +01:00
ca449f9ee1 Add quantized tensors. (#458)
* Add quantized tensors.

* Implement the debug trait for QTensor.

* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
b8263aa15c Quantized support for f16 and f32 (#457)
* Add f32 as a quantized type.

* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4 Split out the quantized file. (#456) 2023-08-15 20:26:27 +01:00
08effe3762 More quantization support (#455)
* Properly initialize wdata.

* Simplify the matmul bits.

* Add from_float for q4_0.

* Fix a couple bugs.

* Get the test to work.

* Get clippy to be happy.
2023-08-15 18:58:04 +01:00
8ad4a21ffc Add a basic optimizer example. (#454) 2023-08-15 17:19:18 +01:00
5e49922be2 Basic quantization support (#453)
* Add a vecdot trait.

* Start implementing mul_mat.

* Add to the mul mat implementation.

* Add q8_0 quantization.

* Implement the GgmlType trait for all types.

* Add the missing block.

* Add a TODO.
2023-08-15 15:53:19 +01:00
ebcfd96d94 add c++17 flags (#452) 2023-08-15 15:29:34 +01:00
5b1690fffa Tweak the llama example. (#450) 2023-08-15 12:18:20 +01:00
3cc87058b7 Support local weights & dynamic outputs (#447)
* Support local weights & dynamic outputs

* Revise as suggested

* Cargo code format
2023-08-15 11:51:57 +01:00
531f23b4d0 Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
495e0b7580 Simd support (#448)
* Import the simd intrinsics in candle-core.

* simd version of reduce-sum.

* Bugfix.

* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
90374097dc Cudnn support (#445)
* Add a cudnn feature to be used for conv2d.

* Allocate the proper workspace.

* Only create a single cudnn handle per cuda device.

* Proper cudnn usage.

* Bugfix.
2023-08-14 21:30:41 +01:00
c84883ecf2 Add a cuda kernel for upsampling. (#441)
* Add a cuda kernel for upsampling.

* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
a094dc503d Add a cuda kernel for avg-pool2d. (#440)
* Add a cuda kernel for avg-pool2d.

* Avoid running out of bounds.

* Finish wiring the avg pool kernel + add some testing.

* Support for max-pool + testing.
2023-08-14 12:32:05 +01:00
34f4b3187e Add a naive conv2d cuda kernel. (#438)
* Add a naive conv2d cuda kernel.

* Proper conv2d support on the rust side.

* Conv1d testing on gpu.

* Also use the test on gpus.

* Fix the clean-ptx target.
2023-08-14 10:34:42 +01:00
eab54e4490 Fix the tests for mkl. (#437) 2023-08-14 08:09:27 +01:00
9e7e6e0288 Add dequantization for ggmls q4_0, q4_1, q5_0, q5_1 and q8_0 (#407)
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`

* expose `tensor_from_ggml` for external usage

* bugfixes & example
2023-08-13 23:22:57 +01:00
8bd2b22b33 Optimize the logit computations in the whisper example. (#434) 2023-08-13 22:00:13 +01:00
d379a76a9e Add a softmax bench. (#433)
* Add a softmax bench.

* Add the vectorized sum reduce.
2023-08-13 20:09:18 +01:00
9af438ac1b Track the conv2d operations in stable-diffusion. (#431)
* Track the conv2d operations in stable-diffusion.

* Add more tracing to stable-diffusion.

* Also trace the resnet bits.

* Trace the attention blocks.

* Also trace the attention inner part.

* Small tweak.
2023-08-13 15:58:26 +01:00
b1ff78f762 Allow using accelerate with stable-diffusion. (#430) 2023-08-13 14:14:20 +01:00
5a63b51f14 Add a matmul benchmark. (#429) 2023-08-13 13:41:03 +01:00
6d694554b8 Support longer sequences in language detection. (#428) 2023-08-13 13:16:15 +01:00
9aca398a4f More accelerate optimizations (#427)
* Add more tracing to the whisper example.

* Support accelerate in more examples.

* Use accelerate for pointwise functions.

* Use accelerate for binary operations too.

* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
60cd1551ca Add a KV cache to whisper. (#426) 2023-08-12 21:17:08 +01:00
a0908d212c Add a -language argument. (#425) 2023-08-12 17:08:40 +01:00
972078e1ae Update the readme with the discord server and common errors. (#423) 2023-08-12 16:45:58 +01:00
16b89f5b83 fix: can directly save the loaded weights (#421) 2023-08-12 16:33:29 +01:00
0741ebbd51 More multilingual support for whisper. (#419)
* More multilingual support for whisper.

* Use the language token appropriately.
2023-08-12 15:32:52 +01:00
0c3f109faa Basic multilingual support for whisper (#417)
* Multi-lingual support for whisper.

* Avoid hardcoding the token names.

* More multi-lingual support.

* Remove the todo.
2023-08-12 11:23:04 +01:00
2ba6b2826f Fix the readme instructions for stable-diffusion. (#415) 2023-08-11 18:59:04 +01:00
1d0157bbc4 Stable diffusion: retrieve the model files from the HF hub. (#414)
* Retrieve the model files from the HF hub in the stable diffusion example.

* Add to the readme.
2023-08-11 18:57:06 +01:00
91dbf907d3 Add more whisper variants. (#413) 2023-08-11 17:33:55 +01:00
e12372021b Expose the tensor write-bytes function. (#412) 2023-08-11 17:13:42 +01:00
55e428c8ae Expose the varmap inner data. (#411) 2023-08-11 16:58:56 +01:00
01ea57da8c Fix the conv tests. (#409) 2023-08-11 14:59:54 +01:00
662db45fc3 Use zero padding in conv1d and conv2d (same as pytorch). (#408) 2023-08-11 14:53:05 +01:00
906c0f3eb5 Remove the checkpoint conversion script. (#405)
* Remove the checkpoint conversion script.

* Remove references to the script.
2023-08-11 05:59:48 +01:00
e29c7809ec Parallelise the CPU kernels for the conv ops. (#401)
* Parallelise the conv2d op.

* Tighter control on threading.

* Also parallelise conv1d.

* Add some safety comment.
2023-08-11 05:51:58 +01:00
a325c1aa50 Upsample test + bugfix. (#399) 2023-08-10 21:02:35 +02:00
b6cf26e48e Merge pull request #393 from huggingface/older_gpus
Working on older GPUs (still not compute 52 it seems but > 6 could be OK)
2023-08-10 20:49:23 +02:00
379eadc68e Working now. 2023-08-10 19:43:25 +02:00
7e4fbc1e17 [DO NOT MERGE] temporary PR so users can try out on older GPUs. 2023-08-10 19:36:31 +02:00
80f0482f26 Fix the stable-diffusion vae. (#398)
* Fix the stable-diffusion vae.

* Fix for saving images.
2023-08-10 18:24:31 +01:00
94eff56aee Optimize the cpu conv2d kernel (#396)
* Conv2d simd optimization.

* Fix the contiguous copying.

* Small tweak.
2023-08-10 17:40:09 +01:00