795 Commits

SHA1 Message Date
6e485f2deb Add some optional repeat penalty. (#623)
* Add some optional repeat penalty.

* Add the missing files.
2023-08-27 10:48:45 +01:00
aa67e5107d Merge pull request #600 from huggingface/codellama_gpu_support
Adding support for codellama in examples.
2023-08-25 18:25:26 +02:00
c105550405 s/panic/bail/ 2023-08-25 18:05:07 +02:00
ca6c050b04 Cleanup the pose reporting code. (#605) 2023-08-25 16:49:21 +01:00
0afbc435df Add some configurable legend for yolo detection. (#603)
* Add some configurable legend for yolo detection.

* Clippyness.
2023-08-25 13:50:31 +01:00
97909e5068 Move the yolo model bits in a separate file. (#602)
* Move the yolo model bits in a separate file.

* Improve the drawing.

* Bugfix.
2023-08-25 12:47:55 +01:00
8bc5fffa45 More support for pose estimation in yolo-v8. (#599)
* More support for pose estimation in yolo-v8.

* Support both object detection and pose-estimation in the yolo-v8 example.
2023-08-25 11:21:11 +01:00
4826a4212e Adding support for codellama in examples.
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.
2023-08-25 09:56:11 +00:00
c093b03d51 Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
189442a0fa Add the pose estimation head for yolo. (#589)
* Add the pose estimation head for yolo.

* Properly handle the added position dimensions.

* Integrate the pose estimation head in the forward pass.

* Renaming.

* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
79916c2edb Use the hub weights for efficientnet. (#573) 2023-08-23 18:20:21 +01:00
431051cc32 Add Efficientnet (#572)
* EfficientNet.

* Complete the efficientnet implementation.

* Improve group handling.

* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
eedd85ffa7 Move the imagenet specific bits to a separate file. (#571) 2023-08-23 16:42:09 +01:00
329f661d9b Trace softmax (#568)
* Trace the softmax op.

* Inline the sum.

* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
aba1e90797 Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
4ee1cf038a Get the rms epsilon from GGUF. (#565) 2023-08-23 11:40:20 +01:00
0f4ff8a739 Fix the quantized example. (#564) 2023-08-23 11:09:55 +01:00
89a00b56cc add chat models in quantized example (#551)
* add chat models in quantized example

* cargo fmt
2023-08-23 11:05:33 +01:00
508d34daf2 GGUF support in the quantized model. (#559)
* GGUF support in the quantized model.

* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
f9ecc84477 GQA support in the quantized model. (#555)
* GQA support in the quantized model.

* Fix the reshaping.

* Fix the main llama model.

* Infer the proper gqa from the model kind.
2023-08-22 19:41:10 +01:00
cc22d4db20 Put the transcribe token before the language one. (#553) 2023-08-22 16:46:34 +01:00
9bc811a247 Improve the aspect ratio handling on yolo-v8. (#549)
* Fix the aspect ratio handling in yolo-v8.

* Typo.
2023-08-22 14:55:33 +01:00
bb69d89e28 Move the yolo shared bits to a common place. (#548)
* Move the yolo shared bits to a common place.

* Share more code.

* Configurable thresholds.
2023-08-22 13:03:07 +01:00
20ce3e9f39 Sketch the yolo wasm example. (#546)
* Sketch the yolo wasm example.

* Web ui.

* Get the web ui to work.

* UI tweaks.

* More UI tweaks.

* Use the natural width/height.

* Add a link to the hf space in the readme.
2023-08-22 11:56:43 +01:00
44420d8ae1 Add some llama-v2 variants. (#545) 2023-08-22 08:35:15 +01:00
f16bb97401 Use the yolo-v8 weights from the hub. (#544)
* Use the weights from the hub.

* Add to the readme.
2023-08-21 22:07:36 +01:00
3507e14c0c Yolo v8 fixes (#542)
* Fixes for the yolo-v8 layout.

* Bugfixes.

* Another silly bugfix.

* Remove the hf-hub dependency.

* Remove the transformers dependency.
2023-08-21 21:05:40 +01:00
de50e66af1 Add yolo v8 as an example (#541)
* Sketching yolo-v8.

* Get the model to load.

* yolo-v8 forward pass.

* Complete(?) the forward pass.

* Fix some shape issues.

* Add the missing padding.

* Process the predictions.
2023-08-21 18:40:09 +01:00
cc2d6cf2e0 Improve the timestamps support in whisper (#539)
* Timestamp support for whisper.

* Properly display the timestamps.

* Bugfix for the timestamp units.
2023-08-21 12:26:59 +01:00
e3b71851e6 Retrieve the yolo-v3 weights from the hub. (#537) 2023-08-21 10:55:09 +01:00
4300864ce9 Add some optional repeat penalty. (#535) 2023-08-21 09:59:13 +01:00
11c7e7bd67 Some fixes for yolo-v3. (#529)
* Some fixes for yolo-v3.

* Use the running stats for inference in the batch-norm layer.

* Get some proper predictions for yolo.

* Avoid the quadratic insertion.
2023-08-20 23:19:15 +01:00
a1812f934f Add a yolo-v3 example. (#528)
* Add a couple functions required for yolo.

* Add the yolo-v3 example.

* Add minimum and maximum.

* Use the newly introduced maximum.

* Cuda support for min/max + add some testing.

* Allow for more tests to work with accelerate.

* Fix a typo.
2023-08-20 18:19:37 +01:00
a8f61e66cc Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
aa207f2dd9 Print some per-step timings in stable-diffusion. (#520)
* Skeleton files for neon support of quantization.

* SIMD version for q4 vecdot.

* Also simdify the q6k multiplication.

* Add some timings to stable-diffusion.
2023-08-20 05:45:12 +01:00
d73ca3d28e Line up the llama.cpp implementation with the candle one. (#518)
* Separate the prompt stats from the post-prompt ones in the quantized example.

* Slightly nicer output printing.

* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
b64e782c2d Use the hub to retrieve dinov2 model weights. (#507) 2023-08-18 18:27:31 +01:00
e5dd5fd1b3 Print the recognized categories in dino-v2. (#506) 2023-08-18 17:32:58 +01:00
cb069d6063 Add the permute op (similar to pytorch). (#504)
* Add the permute op (similar to pytorch).

* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
4f1541526c dinov2 - read images from disk and compute the class probabilities (#503)
* Load the image from disk and convert it to a tensor.

* Tweak the function name.
2023-08-18 15:50:33 +01:00
95462c6a2e Add a vision transformer example (dino-v2). (#502)
* Add a vision transformer example (dino-v2).

* Add some documentation + test.

* CI fix.

* Another fix (still unable to replicate the errors locally :( )
2023-08-18 11:58:06 +01:00
b9661a1c25 Enable the image crate by default in examples (#501)
* Enable the image crate by default so that it's easier to compile the stable diffusion example.

* Also update the readme.
2023-08-18 10:00:05 +01:00
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
13401df4d1 Add an abstract type for RmsNorm. (#499) 2023-08-18 08:52:14 +01:00
26fd37b348 Use the main branch of the HF repo where possible. (#498)
* Use the main branch of the HF repo where possible.

* And add the large model.
2023-08-18 08:18:30 +01:00
f056dcab21 Add medium model (#497) 2023-08-18 08:08:59 +01:00
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
3164cd24fa Replicate the sot-token logic from the Python implementation more acc… (#491)
* Replicate the sot-token logic from the Python implementation more accurately.

* Add a flag to control the timestamp mode.
2023-08-17 16:59:36 +01:00
5f30c1e1e0 Add the whisper small model. (#490) 2023-08-17 15:48:34 +01:00
ad7c53953b Add a verbose-prompt mode, similar to llama.cpp. (#489) 2023-08-17 15:26:44 +01:00