26e1b40992
Repeat-penalty in the falcon example. ( #634 )
2023-08-28 08:13:40 +01:00
1da71a5da1
Neon optimized version of the q4k vecdot product. ( #632 )
2023-08-27 21:30:47 +01:00
24dda44c27
Add wasm support for yolo-v8 pose detection. ( #630 )
...
* Add wasm support for yolo-v8 pose detection.
* Better bbox handling.
* Add the pose model in the wasm example lib.
2023-08-27 19:49:24 +01:00
72ebb12bca
Remove some dead-code annotations. ( #629 )
...
* Remove some dead-code annotations.
* More dead code removal.
* One more.
* CI fix.
2023-08-27 18:52:33 +01:00
a3f97c143d
Bump the crate version + update CHANGELOG. ( #628 )
2023-08-27 18:17:11 +01:00
4c338b0cd9
VarBuilder cleanup ( #627 )
...
* VarBuilder cleanup.
* Implement the basic varbuilders.
* Add the sharded code.
* Proper support for tensor sharding.
2023-08-27 18:03:26 +01:00
be471d50ab
Llama quantization. ( #625 )
2023-08-27 14:08:15 +01:00
7151f2cf63
Add the quantize command. ( #624 )
...
* Add the quantize command.
* Bugfix for writing gguf files.
* And add a comment.
2023-08-27 11:35:19 +01:00
6e485f2deb
Add some optional repeat penalty. ( #623 )
...
* Add some optional repeat penalty.
* Add the missing files.
2023-08-27 10:48:45 +01:00
5320aa6b7d
Move the test-utils bits to a shared place. ( #619 )
2023-08-27 09:42:22 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
28658054ff
More missing quantized bits. ( #615 )
...
* Q4_1 support.
* Add Q5_1 quantization.
* Tweak.
2023-08-27 07:52:26 +01:00
ab36a7f3e3
Fix for when f16c is not available. ( #614 )
2023-08-27 07:19:52 +01:00
f704e39761
Missing quants ops ( #611 )
...
* Another transmute tweak.
* Changelog tweak.
* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05
Another transmute tweak. ( #610 )
...
* Another transmute tweak.
* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad
Avoid using tmp values. ( #609 )
2023-08-26 12:28:28 +01:00
c72eb3d75b
Add reference implementation for q4k
and q5k
( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
864227edbf
[WIP] Improve Yolo WASM UI example ( #591 )
...
* return detections with classes names
* ignore .DS_Store
* example how to load wasm module
* add param to set model size
* add param for model size
* accept iou and confidence threshold on run
* conf and iou thresholds
* clamp only
* remove images from branch
* a couple of renamings, add readme with instructions
* final design
* minor font + border update
2023-08-26 11:40:41 +01:00
b23b347b35
Merge pull request #601 from huggingface/repair_bf16_f16_cast
...
Repairing cast bf16/f16
2023-08-26 12:34:41 +02:00
71518caeee
Align tensor device print more with PyTorch ( #590 )
...
* Improve tensor print
* Use CudaDevice only if enabled with cuda feature
* run rust fmt
* up
* improve
* rustfmt
2023-08-26 11:20:22 +01:00
6559eae72c
Avoid some transmutes. ( #607 )
2023-08-25 18:21:37 +01:00
46eb225ba5
Add some missing entries to the changelog. ( #606 )
2023-08-25 18:01:38 +01:00
aa67e5107d
Merge pull request #600 from huggingface/codellama_gpu_support
...
Adding support for codellama in examples.
2023-08-25 18:25:26 +02:00
c105550405
s/panic/bail/
2023-08-25 18:05:07 +02:00
ca6c050b04
Cleanup the pose reporting code. ( #605 )
2023-08-25 16:49:21 +01:00
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
0afbc435df
Add some configurable legend for yolo detection. ( #603 )
...
* Add some configurable legend for yolo detection.
* Clippyness.
2023-08-25 13:50:31 +01:00
d4e75d5825
Let's keep the dirty code on its own.
2023-08-25 12:01:58 +00:00
be371e827c
Intermediary float cast is necessary for cuda 11.8
2023-08-25 11:54:30 +00:00
97909e5068
Move the yolo model bits in a separate file. ( #602 )
...
* Move the yolo model bits in a separate file.
* Improve the drawing.
* Bugfix.
2023-08-25 12:47:55 +01:00
1c1e34735e
static_cast
?
2023-08-25 11:40:36 +00:00
db8bab8b7a
Different casting ?
2023-08-25 10:49:22 +00:00
bc131b402b
Repairing cast bf16/f16
2023-08-25 10:38:19 +00:00
8bc5fffa45
More support for pose estimation in yolo-v8. ( #599 )
...
* More support for pose estimation in yolo-v8.
* Support both object detection and pose-estimation in the yolo-v8 example.
2023-08-25 11:21:11 +01:00
4826a4212e
Adding support for codellama in examples.
...
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.
2023-08-25 09:56:11 +00:00
afc10a3232
AVX version for the q8-0 multiplications. ( #598 )
2023-08-25 10:14:49 +01:00
d728e646c2
Use resolver 2 explicitely. ( #597 )
2023-08-25 09:35:40 +01:00
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
d8ba0452dc
Fail on bf16. ( #594 )
2023-08-25 06:10:38 +01:00
189442a0fa
Add the pose estimation head for yolo. ( #589 )
...
* Add the pose estimation head for yolo.
* Properly handle the added position dimensions.
* Integrate the pose estimation head in the forward pass.
* Renaming.
* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
2cde0cb74b
More pickle support. ( #588 )
...
* More pickle support.
* Be more verbose.
2023-08-24 18:45:10 +01:00
e21c686cdc
Fixes for clippy 1.72. ( #587 )
2023-08-24 17:46:17 +01:00
c265ac50fa
Add a function to write gguf files. ( #585 )
...
* Add a function to write gguf files.
* More GGUF file writing.
* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
a87c6f7652
Merge pull request #561 from patrickvonplaten/add_installation
...
Improve installation section and "get started"
2023-08-24 16:25:52 +02:00
afd965f77c
More non square testing ( #582 )
...
* Add more non square testing.
* More testing.
2023-08-24 13:01:04 +01:00
d2f42ab086
Referenze implementations of q2k
and q3k
vec-dot functions ( #580 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
ca318a6ec7
Add to the cuda example a reproduction of the issue. ( #579 )
...
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrixes.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
2023-08-24 12:07:31 +01:00
dd64465899
Add a test for conv2d with padding + bugfix the random number generation on cuda. ( #578 )
...
* Add a test for conv2d with padding.
* Cosmetic changes.
* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
79916c2edb
Use the hub weights for efficientnet. ( #573 )
2023-08-23 18:20:21 +01:00