Commit Graph

1423 Commits

Author SHA1 Message Date
4826a4212e Adding support for codellama in examples.
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.
2023-08-25 09:56:11 +00:00
afc10a3232 AVX version for the q8-0 multiplications. (#598) 2023-08-25 10:14:49 +01:00
d728e646c2 Use resolver 2 explicitely. (#597) 2023-08-25 09:35:40 +01:00
c093b03d51 Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
d8ba0452dc Fail on bf16. (#594) 2023-08-25 06:10:38 +01:00
189442a0fa Add the pose estimation head for yolo. (#589)
* Add the pose estimation head for yolo.

* Properly handle the added position dimensions.

* Integrate the pose estimation head in the forward pass.

* Renaming.

* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
2cde0cb74b More pickle support. (#588)
* More pickle support.

* Be more verbose.
2023-08-24 18:45:10 +01:00
e21c686cdc Fixes for clippy 1.72. (#587) 2023-08-24 17:46:17 +01:00
c265ac50fa Add a function to write gguf files. (#585)
* Add a function to write gguf files.

* More GGUF file writing.

* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
a87c6f7652 Merge pull request #561 from patrickvonplaten/add_installation
Improve installation section and "get started"
2023-08-24 16:25:52 +02:00
afd965f77c More non square testing (#582)
* Add more non square testing.

* More testing.
2023-08-24 13:01:04 +01:00
d2f42ab086 Referenze implementations of q2k and q3k vec-dot functions (#580)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
ca318a6ec7 Add to the cuda example a reproduction of the issue. (#579)
* Add to the cuda example a reproduction of the issue.

* Tweak.

* Add a test using non-square matrixes.

* Fix the conv2d kernel.

* Display the error.

* And tweak the comment.
2023-08-24 12:07:31 +01:00
dd64465899 Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578)
* Add a test for conv2d with padding.

* Cosmetic changes.

* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
79916c2edb Use the hub weights for efficientnet. (#573) 2023-08-23 18:20:21 +01:00
431051cc32 Add Efficientnet (#572)
* EfficientNet.

* Complete the efficientnet implementation.

* Improve group handling.

* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
eedd85ffa7 Move the imagenet specific bits to a separate file. (#571) 2023-08-23 16:42:09 +01:00
7478dda255 Cosmetic tweaks. (#570) 2023-08-23 15:45:40 +01:00
329f661d9b Trace softmax (#568)
* Trace the softmax op.

* Inline the sum.

* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
075b505480 Mirror GGML's unit tests (#569)
* Add ggml unit tests

* simplify random matmul test for other test cases
2023-08-23 15:25:17 +01:00
aba1e90797 Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Properh handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
1f58bdbb1d Apply suggestions from code review 2023-08-23 13:33:45 +02:00
c98d3cfd8b Update candle-book/src/guide/installation.md 2023-08-23 13:31:54 +02:00
c5e43ad0ab Apply suggestions from code review 2023-08-23 13:27:29 +02:00
2c280007e8 Apply suggestions from code review 2023-08-23 13:26:21 +02:00
4ee1cf038a Get the rms epsilon from GGUF. (#565) 2023-08-23 11:40:20 +01:00
0f4ff8a739 Fix the quantized example. (#564) 2023-08-23 11:09:55 +01:00
89a00b56cc add chat models in quantized example (#551)
* add chat models in quantized example

* cargo fmt
2023-08-23 11:05:33 +01:00
9a5c7db91a Add support for i64 (#563)
* Add the i64 dtype.

* Adapt the cuda kernels.
2023-08-23 10:42:19 +01:00
649202024c fix code snippets 2023-08-23 09:05:07 +00:00
283f6c048d fix code snippets 2023-08-23 09:04:36 +00:00
c8211fc474 fix code snippets 2023-08-23 09:04:08 +00:00
7732bf6238 correct 2023-08-23 08:54:48 +00:00
7c0ca80d3a move installation to book 2023-08-23 08:52:53 +00:00
b558d08b85 improve 2023-08-23 08:42:47 +00:00
34cb9f924f improve 2023-08-23 08:40:23 +00:00
d4968295a0 improve 2023-08-23 08:37:08 +00:00
65e146c72d Add installation section 2023-08-23 08:32:59 +00:00
3743bed2d7 Fix the ? operator cannot be applied to type Device of example (#560)
According to the API:

```rust
inp = inp.to_device(&Device::Cuda(0)?)?;
```

cannot work as `Cuda(...)` expects a type `Device` not an integer.

I'd recommend to instead use `new_cuda(...)`
2023-08-23 09:29:50 +01:00
508d34daf2 GGUF support in the quantized model. (#559)
* GGUF support in the quantized model.

* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
0764741cc4 Handle GGUF files in tensor-tools. (#558) 2023-08-23 06:32:07 +01:00
6a30ecefad Preliminary GGUF support. (#557)
* Preliminary GGUF support.

* Tensor reading.
2023-08-23 00:14:10 +01:00
7687a0f453 Also fix the aspect ratio in the wasm example. (#556)
* Also fix the aspect ratio in the wasm example.

* Add the yolo lib.

* Update the build script.
2023-08-22 22:20:08 +01:00
f9ecc84477 GQA support in the quantized model. (#555)
* GQA support in the quantized model.

* Fix the reshaping.

* Fix the main llama model.

* Infer the proper gqa from the model kind.
2023-08-22 19:41:10 +01:00
07067b01dc Avoid some mutable variables (take 2). (#554)
* Avoid some mutable variables (take 2).

* Fix.
2023-08-22 18:51:20 +01:00
cc22d4db20 Put the transcribe token before the language one. (#553) 2023-08-22 16:46:34 +01:00
ec665acad7 Revert "Avoid some mut in quantized functions. (#550)" (#552)
This reverts commit cf27b9b636.
2023-08-22 15:57:46 +01:00
cf27b9b636 Avoid some mut in quantized functions. (#550)
* Avoid a couple more 'let mut'.

* Tweaks.
2023-08-22 15:44:26 +01:00
352383cbc3 Add quantization support for q2k, q3k, q4k and q5k (#524)
* first q2 implementation

* First Q4K and Q5K implementations

* fix `q2k` and `q5k`

* Some first cleanups

* run `clippy` on tests

* finally implement `q3k`

* deactivate `q3k` test on macos

* also disable the test on linux

* Fix floating bits in `q3k` dequantization

* Refactoring pass + reorder quants in file

* `fmt`

* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
9bc811a247 Improve the aspect ratio handling on yolo-v8. (#549)
* Fix the aspect ratio handling in yolo-v8.

* Typo.
2023-08-22 14:55:33 +01:00