26e1b40992
Repeat-penalty in the falcon example. ( #634 )
2023-08-28 08:13:40 +01:00
72ebb12bca
Remove some dead-code annotations. ( #629 )
...
* Remove some dead-code annotations.
* More dead code removal.
* One more.
* CI fix.
2023-08-27 18:52:33 +01:00
4c338b0cd9
VarBuilder cleanup ( #627 )
...
* VarBuilder cleanup.
* Implement the basic varbuilders.
* Add the sharded code.
* Proper support for tensor sharding.
2023-08-27 18:03:26 +01:00
6e485f2deb
Add some optional repeat penalty. ( #623 )
...
* Add some optional repeat penalty.
* Add the missing files.
2023-08-27 10:48:45 +01:00
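The repeat penalty referenced in the commits above is a standard sampling tweak: before picking the next token, the logits of recently generated tokens are scaled down so the model is less likely to loop. A minimal sketch (names hypothetical, not candle's actual code), following the llama.cpp-style convention of dividing positive logits and multiplying negative ones:

```rust
// Apply a repeat penalty in place: tokens that already appeared get their
// logit divided (if positive) or multiplied (if negative) by `penalty`.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, recent_tokens: &[usize]) {
    for &tok in recent_tokens {
        if let Some(l) = logits.get_mut(tok) {
            if *l >= 0.0 {
                *l /= penalty;
            } else {
                *l *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -2.0, 1.0];
    // Penalize tokens 0 and 1, which were recently generated.
    apply_repeat_penalty(&mut logits, 2.0, &[0, 1]);
    println!("{:?}", logits); // [1.0, -4.0, 1.0]
}
```

Making the penalty optional, as in the commit, amounts to wrapping this step in an `Option<f32>` check before sampling.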
aa67e5107d
Merge pull request #600 from huggingface/codellama_gpu_support
...
Adding support for codellama in examples.
2023-08-25 18:25:26 +02:00
c105550405
s/panic/bail/
2023-08-25 18:05:07 +02:00
ca6c050b04
Cleanup the pose reporting code. ( #605 )
2023-08-25 16:49:21 +01:00
0afbc435df
Add some configurable legend for yolo detection. ( #603 )
...
* Add some configurable legend for yolo detection.
* Clippyness.
2023-08-25 13:50:31 +01:00
97909e5068
Move the yolo model bits in a separate file. ( #602 )
...
* Move the yolo model bits in a separate file.
* Improve the drawing.
* Bugfix.
2023-08-25 12:47:55 +01:00
8bc5fffa45
More support for pose estimation in yolo-v8. ( #599 )
...
* More support for pose estimation in yolo-v8.
* Support both object detection and pose-estimation in the yolo-v8 example.
2023-08-25 11:21:11 +01:00
4826a4212e
Adding support for codellama in examples.
...
Codellama requires bf16 for now (converting from bf16 to f16 raises an error).
The multiprocess demo is not functional for it because flash-attn only
supports f16 for now.
2023-08-25 09:56:11 +00:00
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
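For context on the q8_0 vecdot: in the q8_0 format, weights are stored as blocks of 32 int8 quants sharing one f32 scale, so a dot product accumulates integer products inside each block and applies the two scales once per block pair. A generic (non-SIMD) sketch of the idea, with hypothetical names:

```rust
// A block of 32 values quantized to int8 with a single f32 scale (q8_0-style).
struct BlockQ80 {
    scale: f32,
    quants: [i8; 32],
}

// Generic dot product over q8_0 blocks: sum the integer products within each
// block, then scale the block sum by both block scales.
fn vec_dot_q80(a: &[BlockQ80], b: &[BlockQ80]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let isum: i32 = x
                .quants
                .iter()
                .zip(y.quants.iter())
                .map(|(&p, &q)| p as i32 * q as i32)
                .sum();
            x.scale * y.scale * isum as f32
        })
        .sum()
}

fn main() {
    let a = BlockQ80 { scale: 0.5, quants: [1; 32] };
    let b = BlockQ80 { scale: 2.0, quants: [3; 32] };
    println!("{}", vec_dot_q80(&[a], &[b])); // 0.5 * 2.0 * (32 * 3) = 96
}
```

The SIMD versions mentioned elsewhere in this log specialize the inner integer sum; the block-then-scale structure stays the same.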
189442a0fa
Add the pose estimation head for yolo. ( #589 )
...
* Add the pose estimation head for yolo.
* Properly handle the added position dimensions.
* Integrate the pose estimation head in the forward pass.
* Renaming.
* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
79916c2edb
Use the hub weights for efficientnet. ( #573 )
2023-08-23 18:20:21 +01:00
431051cc32
Add Efficientnet ( #572 )
...
* EfficientNet.
* Complete the efficientnet implementation.
* Improve group handling.
* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
eedd85ffa7
Move the imagenet specific bits to a separate file. ( #571 )
2023-08-23 16:42:09 +01:00
329f661d9b
Trace softmax ( #568 )
...
* Trace the softmax op.
* Inline the sum.
* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
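The softmax being traced above is the usual numerically stable form: shift by the maximum before exponentiating so `exp` never overflows, then normalize by the sum. A reference sketch (not the traced candle op itself):

```rust
// Numerically stable softmax: subtract the max, exponentiate, normalize.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    println!("{:?}", softmax(&[1.0, 2.0, 3.0]));
}
```

The "inline the sum" and min/max vec operations in the commit body map onto exactly these reduction steps.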
aba1e90797
Add some group parameter to convolutions. ( #566 )
...
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
4ee1cf038a
Get the rms epsilon from GGUF. ( #565 )
2023-08-23 11:40:20 +01:00
0f4ff8a739
Fix the quantized example. ( #564 )
2023-08-23 11:09:55 +01:00
89a00b56cc
add chat models in quantized example ( #551 )
...
* add chat models in quantized example
* cargo fmt
2023-08-23 11:05:33 +01:00
508d34daf2
GGUF support in the quantized model. ( #559 )
...
* GGUF support in the quantized model.
* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
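A GGUF file opens with the 4-byte magic `GGUF` followed by a little-endian u32 format version; validating that header is the first step of the loading the commit above adds. A minimal, hypothetical sketch of that check (the real loader goes on to parse metadata and tensor info):

```rust
// Check the GGUF magic bytes and return the little-endian u32 version that
// follows them, or None if this is not a GGUF file.
fn check_gguf_header(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 8 || &bytes[0..4] != b"GGUF" {
        return None;
    }
    Some(u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]))
}

fn main() {
    let header = [b'G', b'G', b'U', b'F', 2, 0, 0, 0];
    println!("{:?}", check_gguf_header(&header)); // Some(2)
}
```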
f9ecc84477
GQA support in the quantized model. ( #555 )
...
* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper gqa from the model kind.
2023-08-22 19:41:10 +01:00
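Grouped-query attention (GQA) shares each key/value head across several consecutive query heads, which is what "infer the proper gqa from the model kind" has to account for when reshaping. A minimal sketch of the head mapping, assuming the query-head count is a multiple of the kv-head count (function name hypothetical):

```rust
// Map a query head to the key/value head it attends with under GQA:
// consecutive groups of query heads share one kv head.
fn kv_head_for(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    assert!(n_q_heads % n_kv_heads == 0);
    let group_size = n_q_heads / n_kv_heads;
    q_head / group_size
}

fn main() {
    // llama-2-70b style config: 64 query heads sharing 8 kv heads.
    for q in [0, 7, 8, 63] {
        println!("q head {q} -> kv head {}", kv_head_for(q, 64, 8));
    }
}
```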
cc22d4db20
Put the transcribe token before the language one. ( #553 )
2023-08-22 16:46:34 +01:00
9bc811a247
Improve the aspect ratio handling on yolo-v8. ( #549 )
...
* Fix the aspect ratio handling in yolo-v8.
* Typo.
2023-08-22 14:55:33 +01:00
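Aspect-ratio handling in a detector like yolo-v8 typically means letterboxing: scale the image uniformly by the smaller of the two target/source ratios so it fits the square model input, and pad the remainder. A hypothetical sketch of just the fit computation:

```rust
// Letterbox fit: scale (w, h) uniformly so the image fits a target x target
// square, returning the scaled size; padding fills the leftover area.
fn letterbox(w: u32, h: u32, target: u32) -> (u32, u32) {
    let scale = (target as f32 / w as f32).min(target as f32 / h as f32);
    (
        (w as f32 * scale).round() as u32,
        (h as f32 * scale).round() as u32,
    )
}

fn main() {
    // A 1280x720 frame scaled into a 640x640 input.
    println!("{:?}", letterbox(1280, 720, 640)); // (640, 360)
}
```

Boxes predicted on the padded input then have to be mapped back through the same scale and offsets.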
bb69d89e28
Move the yolo shared bits to a common place. ( #548 )
...
* Move the yolo shared bits to a common place.
* Share more code.
* Configurable thresholds.
2023-08-22 13:03:07 +01:00
20ce3e9f39
Sketch the yolo wasm example. ( #546 )
...
* Sketch the yolo wasm example.
* Web ui.
* Get the web ui to work.
* UI tweaks.
* More UI tweaks.
* Use the natural width/height.
* Add a link to the hf space in the readme.
2023-08-22 11:56:43 +01:00
44420d8ae1
Add some llama-v2 variants. ( #545 )
2023-08-22 08:35:15 +01:00
f16bb97401
Use the yolo-v8 weights from the hub. ( #544 )
...
* Use the weights from the hub.
* Add to the readme.
2023-08-21 22:07:36 +01:00
3507e14c0c
Yolo v8 fixes ( #542 )
...
* Fixes for the yolo-v8 layout.
* Bugfixes.
* Another silly bugfix.
* Remove the hf-hub dependency.
* Remove the transformers dependency.
2023-08-21 21:05:40 +01:00
de50e66af1
Add yolo v8 as an example ( #541 )
...
* Sketching yolo-v8.
* Get the model to load.
* yolo-v8 forward pass.
* Complete(?) the forward pass.
* Fix some shape issues.
* Add the missing padding.
* Process the predictions.
2023-08-21 18:40:09 +01:00
cc2d6cf2e0
Improve the timestamps support in whisper ( #539 )
...
* Timestamp support for whisper.
* Properly display the timestamps.
* Bugfix for the timestamp units.
2023-08-21 12:26:59 +01:00
e3b71851e6
Retrieve the yolo-v3 weights from the hub. ( #537 )
2023-08-21 10:55:09 +01:00
4300864ce9
Add some optional repeat penalty. ( #535 )
2023-08-21 09:59:13 +01:00
11c7e7bd67
Some fixes for yolo-v3. ( #529 )
...
* Some fixes for yolo-v3.
* Use the running stats for inference in the batch-norm layer.
* Get some proper predictions for yolo.
* Avoid the quadratic insertion.
2023-08-20 23:19:15 +01:00
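"Use the running stats for inference in the batch-norm layer" refers to the standard eval-mode behavior: normalize with the stored running mean/variance collected during training rather than statistics of the current batch. A per-channel reference sketch (names hypothetical):

```rust
// Inference-mode batch norm for one channel: normalize with the stored
// running statistics, then apply the learned affine transform.
fn batch_norm_infer(
    x: &[f32],
    running_mean: f32,
    running_var: f32,
    gamma: f32,
    beta: f32,
    eps: f32,
) -> Vec<f32> {
    let scale = gamma / (running_var + eps).sqrt();
    x.iter().map(|v| (v - running_mean) * scale + beta).collect()
}

fn main() {
    println!("{:?}", batch_norm_infer(&[1.0, 3.0], 2.0, 1.0, 1.0, 0.0, 0.0));
}
```

Using batch statistics at inference (the bug being fixed) makes outputs depend on whatever else is in the batch, which breaks single-image prediction.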
a1812f934f
Add a yolo-v3 example. ( #528 )
...
* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
2023-08-20 18:19:37 +01:00
aa207f2dd9
Print some per-step timings in stable-diffusion. ( #520 )
...
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
* Add some timings to stable-diffusion.
2023-08-20 05:45:12 +01:00
d73ca3d28e
Line up the llama.cpp implementation with the candle one. ( #518 )
...
* Separate the prompt stats from the post-prompt ones in the quantized example.
* Slightly nicer output printing.
* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
b64e782c2d
Use the hub to retrieve dinov2 model weights. ( #507 )
2023-08-18 18:27:31 +01:00
e5dd5fd1b3
Print the recognized categories in dino-v2. ( #506 )
2023-08-18 17:32:58 +01:00
cb069d6063
Add the permute op (similar to pytorch). ( #504 )
...
* Add the permute op (similar to pytorch).
* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
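Permute, as in PyTorch, reorders a tensor's dimensions according to a permutation of the axis indices; the output shape is just the input shape read through that permutation. A shape-level sketch of the semantics (not candle's implementation, which also permutes the strides):

```rust
// Permuted shape, PyTorch-style: out_dims[i] = dims[perm[i]], where `perm`
// is a permutation of 0..rank.
fn permute_shape(dims: &[usize], perm: &[usize]) -> Vec<usize> {
    assert_eq!(dims.len(), perm.len());
    perm.iter().map(|&i| dims[i]).collect()
}

fn main() {
    // (batch, channels, height, width) -> (batch, height, width, channels)
    println!("{:?}", permute_shape(&[2, 3, 4, 5], &[0, 2, 3, 1])); // [2, 4, 5, 3]
}
```

The backprop mentioned in the commit applies the inverse permutation to the incoming gradient.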
4f1541526c
dinov2 - read images from disk and compute the class probabilities ( #503 )
...
* Load the image from disk and convert it to a tensor.
* Tweak the function name.
2023-08-18 15:50:33 +01:00
95462c6a2e
Add a vision transformer example (dino-v2). ( #502 )
...
* Add a vision transformer example (dino-v2).
* Add some documentation + test.
* CI fix.
* Another fix (still unable to replicate the errors locally :( )
2023-08-18 11:58:06 +01:00
c78ce76501
Add a simple Module trait and implement it for the various nn layers ( #500 )
...
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
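The point of the Module trait is that every layer exposes the same `forward`, so layers compose generically. A standalone sketch over plain `Vec<f32>` "tensors" (candle's real trait works on `Tensor` and returns a `Result`; names here are illustrative):

```rust
// A minimal Module trait: every layer is just "something with a forward".
trait Module {
    fn forward(&self, xs: &[f32]) -> Vec<f32>;
}

struct Scale(f32);
struct Relu;

impl Module for Scale {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x * self.0).collect()
    }
}

impl Module for Relu {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x.max(0.0)).collect()
    }
}

// Because every layer shares the trait, a network is a list of modules.
fn run(layers: &[&dyn Module], xs: &[f32]) -> Vec<f32> {
    layers.iter().fold(xs.to_vec(), |acc, l| l.forward(&acc))
}

fn main() {
    let scale = Scale(2.0);
    let relu = Relu;
    let layers: Vec<&dyn Module> = vec![&scale, &relu];
    println!("{:?}", run(&layers, &[-1.0, 3.0])); // [0.0, 6.0]
}
```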
13401df4d1
Add an abstract type for RmsNorm. ( #499 )
2023-08-18 08:52:14 +01:00
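RmsNorm, which several commits here touch (the abstract type above, the epsilon read from GGUF below), normalizes by the root-mean-square of the activations instead of mean and variance. A reference sketch of the formula `y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]`:

```rust
// Reference RmsNorm over a single vector of activations.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

fn main() {
    println!("{:?}", rms_norm(&[1.0, 2.0, 3.0, 4.0], &[1.0; 4], 1e-5));
}
```

The epsilon is a model hyperparameter, which is why it has to come from the GGUF metadata rather than being hard-coded.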
26fd37b348
Use the main branch of the HF repo where possible. ( #498 )
...
* Use the main branch of the HF repo where possible.
* And add the large model.
2023-08-18 08:18:30 +01:00
f056dcab21
Add medium model ( #497 )
2023-08-18 08:08:59 +01:00
557b2c28dd
Q6K quantization ( #495 )
...
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
3164cd24fa
Replicate the sot-token logic from the Python implementation more acc… ( #491 )
...
* Replicate the sot-token logic from the Python implementation more accurately.
* Add a flag to control the timestamp mode.
2023-08-17 16:59:36 +01:00
5f30c1e1e0
Add the whisper small model. ( #490 )
2023-08-17 15:48:34 +01:00