f9ecc84477
GQA support in the quantized model. (#555)
* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper GQA from the model kind.
2023-08-22 19:41:10 +01:00
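
For context on the GQA commit above (#555): grouped-query attention shares each key/value head across several query heads, so before the attention product the kv heads have to be repeated to line up with the query heads. A minimal candle-style sketch of that step, assuming recent candle-core APIs (function and variable names are illustrative, not the code from the PR):

```rust
use candle_core::{Result, Tensor};

/// Expand key/value heads so they match the query heads. With n_head
/// query heads and n_kv_head kv heads, each kv head is repeated
/// n_head / n_kv_head times along the head dimension.
fn repeat_kv(x: Tensor, n_head: usize, n_kv_head: usize) -> Result<Tensor> {
    let n_rep = n_head / n_kv_head;
    if n_rep == 1 {
        return Ok(x); // plain multi-head attention, nothing to do
    }
    let (b, kv_heads, seq_len, head_dim) = x.dims4()?;
    x.unsqueeze(2)? // (b, kv_heads, 1, seq_len, head_dim)
        .expand((b, kv_heads, n_rep, seq_len, head_dim))?
        .reshape((b, kv_heads * n_rep, seq_len, head_dim))
}
```
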
cc22d4db20
Put the transcribe token before the language one. (#553)
2023-08-22 16:46:34 +01:00
9bc811a247
Improve the aspect ratio handling on yolo-v8. (#549)
* Fix the aspect ratio handling in yolo-v8.
* Typo.
2023-08-22 14:55:33 +01:00
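
Aspect-ratio handling here typically means scaling the longer side to the target size and rounding both sides up to a multiple of 32 so the yolo-v8 strides divide them evenly. A rough sketch of that computation (not the code from #549):

```rust
/// Compute a network input size that preserves the image's aspect
/// ratio, rounded up to a multiple of 32 so every stride in the
/// yolo-v8 backbone divides it evenly. Illustrative only.
fn scaled_size(width: usize, height: usize, target: usize) -> (usize, usize) {
    let (w, h) = if width >= height {
        (target, height * target / width)
    } else {
        (width * target / height, target)
    };
    // Round up to the next multiple of 32.
    let round = |x: usize| (x + 31) / 32 * 32;
    (round(w), round(h))
}
```
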
bb69d89e28
Move the yolo shared bits to a common place. (#548)
* Move the yolo shared bits to a common place.
* Share more code.
* Configurable thresholds.
2023-08-22 13:03:07 +01:00
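
The configurable thresholds mentioned in #548 are the usual detection knobs: a confidence cut-off and a non-maximum-suppression IoU limit. A sketch of how shared, configurable thresholds might look (names and defaults are hypothetical, not taken from the PR):

```rust
/// Detection post-processing knobs shared between the yolo examples.
/// Default values mirror common yolo settings; the examples may use
/// different ones.
struct DetectionConfig {
    confidence_threshold: f32, // drop boxes scored below this
    nms_threshold: f32,        // IoU above which overlapping boxes are merged
}

impl Default for DetectionConfig {
    fn default() -> Self {
        Self { confidence_threshold: 0.25, nms_threshold: 0.45 }
    }
}
```
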
20ce3e9f39
Sketch the yolo wasm example. (#546)
* Sketch the yolo wasm example.
* Web UI.
* Get the web UI to work.
* UI tweaks.
* More UI tweaks.
* Use the natural width/height.
* Add a link to the hf space in the readme.
2023-08-22 11:56:43 +01:00
44420d8ae1
Add some llama-v2 variants. (#545)
2023-08-22 08:35:15 +01:00
f16bb97401
Use the yolo-v8 weights from the hub. (#544)
* Use the weights from the hub.
* Add to the readme.
2023-08-21 22:07:36 +01:00
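
Fetching weights from the Hugging Face Hub with the hf-hub crate looks roughly like this (using anyhow for errors; the repo id and file name below are placeholders, not necessarily the ones the example points at):

```rust
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    let api = Api::new()?;
    // Placeholder repo id and file name; the example uses its own repo.
    let repo = api.model("user/candle-yolo-v8".to_string());
    let weights = repo.get("yolov8s.safetensors")?;
    println!("weights cached at {weights:?}");
    Ok(())
}
```

The `get` call downloads the file once and then serves it from the local cache, which is why the later commits can drop ad-hoc weight handling.
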
3507e14c0c
Yolo v8 fixes (#542)
* Fixes for the yolo-v8 layout.
* Bugfixes.
* Another silly bugfix.
* Remove the hf-hub dependency.
* Remove the transformers dependency.
2023-08-21 21:05:40 +01:00
de50e66af1
Add yolo v8 as an example (#541)
* Sketching yolo-v8.
* Get the model to load.
* yolo-v8 forward pass.
* Complete(?) the forward pass.
* Fix some shape issues.
* Add the missing padding.
* Process the predictions.
2023-08-21 18:40:09 +01:00
cc2d6cf2e0
Improve the timestamp support in whisper (#539)
* Timestamp support for whisper.
* Properly display the timestamps.
* Bugfix for the timestamp units.
2023-08-21 12:26:59 +01:00
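
Whisper encodes timestamps as dedicated tokens, each representing a 0.02 s step within the current 30 s audio window; the "timestamp units" bugfix above concerns that conversion. A minimal sketch:

```rust
/// Convert a whisper timestamp token to seconds within the current
/// 30s window. Tokens at or above `timestamp_begin` encode timestamps
/// in 0.02s increments; anything below is a regular text token.
fn timestamp_secs(token: u32, timestamp_begin: u32) -> Option<f64> {
    if token >= timestamp_begin {
        Some((token - timestamp_begin) as f64 * 0.02)
    } else {
        None
    }
}
```
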
e3b71851e6
Retrieve the yolo-v3 weights from the hub. (#537)
2023-08-21 10:55:09 +01:00
4300864ce9
Add an optional repeat penalty. (#535)
2023-08-21 09:59:13 +01:00
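
A repeat penalty damps the logits of tokens that already appeared so the model is less likely to loop. A plain-Rust sketch following the llama.cpp convention (positive logits divided by the penalty, negative ones multiplied):

```rust
/// Penalize previously generated tokens before sampling the next one.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, prev_tokens: &[u32]) {
    for &token in prev_tokens {
        if let Some(logit) = logits.get_mut(token as usize) {
            if *logit >= 0.0 {
                *logit /= penalty; // make a positive score less attractive
            } else {
                *logit *= penalty; // push a negative score further down
            }
        }
    }
}
```

A penalty of 1.0 is a no-op; values around 1.1 to 1.3 are common starting points.
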
11c7e7bd67
Some fixes for yolo-v3. (#529)
* Some fixes for yolo-v3.
* Use the running stats for inference in the batch-norm layer.
* Get some proper predictions for yolo.
* Avoid the quadratic insertion.
2023-08-20 23:19:15 +01:00
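
The batch-norm fix in #529 is the classic train/inference distinction: at inference time the layer must normalize with the running mean/variance accumulated during training, not the statistics of the current batch. In scalar form:

```rust
/// Batch-norm at inference time: normalize with the stored running
/// statistics instead of per-batch statistics.
fn batch_norm_inference(
    x: f32, running_mean: f32, running_var: f32,
    gamma: f32, beta: f32, eps: f32,
) -> f32 {
    gamma * (x - running_mean) / (running_var + eps).sqrt() + beta
}
```
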
a1812f934f
Add a yolo-v3 example. (#528)
* Add a couple of functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
2023-08-20 18:19:37 +01:00
aa207f2dd9
Print some per-step timings in stable-diffusion. (#520)
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
* Add some timings to stable-diffusion.
2023-08-20 05:45:12 +01:00
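
The per-step timings are just an `Instant` wrapped around each denoising step; roughly (the step function below is a stand-in, not the example's real UNet call):

```rust
use std::time::Instant;

fn main() {
    let n_steps = 30; // illustrative step count
    for step in 0..n_steps {
        let start = Instant::now();
        run_denoising_step(step); // stand-in for the real model call
        println!("step {step} took {:.2}s", start.elapsed().as_secs_f64());
    }
}

fn run_denoising_step(_step: usize) { /* placeholder */ }
```
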
d73ca3d28e
Line up the llama.cpp implementation with the candle one. (#518)
* Separate the prompt stats from the post-prompt ones in the quantized example.
* Slightly nicer output printing.
* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
b64e782c2d
Use the hub to retrieve dinov2 model weights. (#507)
2023-08-18 18:27:31 +01:00
e5dd5fd1b3
Print the recognized categories in dino-v2. (#506)
2023-08-18 17:32:58 +01:00
cb069d6063
Add the permute op (similar to PyTorch). (#504)
* Add the permute op (similar to PyTorch).
* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
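
The backprop for a dimension permutation is pleasantly simple: the gradient of `permute(dims)` is the inverse permutation applied to the output gradient. Computing that inverse:

```rust
/// The gradient of `permute(dims)` is `permute(inverse(dims))` applied
/// to the incoming gradient; this computes the inverse permutation.
fn inverse_permutation(dims: &[usize]) -> Vec<usize> {
    let mut inv = vec![0; dims.len()];
    for (i, &d) in dims.iter().enumerate() {
        inv[d] = i;
    }
    inv
}

// e.g. inverse_permutation(&[2, 0, 1]) == vec![1, 2, 0]
```
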
4f1541526c
dinov2 - read images from disk and compute the class probabilities (#503)
* Load the image from disk and convert it to a tensor.
* Tweak the function name.
2023-08-18 15:50:33 +01:00
95462c6a2e
Add a vision transformer example (dino-v2). (#502)
* Add a vision transformer example (dino-v2).
* Add some documentation + test.
* CI fix.
* Another fix (still unable to replicate the errors locally :( )
2023-08-18 11:58:06 +01:00
c78ce76501
Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
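
The trait in #500 is tiny: a single forward method over tensors, so individual layers and whole models compose uniformly. In spirit (simplified; the actual trait lives in candle):

```rust
use candle_core::{Result, Tensor};

/// One forward-pass entry point shared by all nn layers.
trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

/// Chaining becomes uniform regardless of the concrete layer type.
fn forward_all(modules: &[Box<dyn Module>], xs: &Tensor) -> Result<Tensor> {
    let mut xs = xs.clone();
    for m in modules {
        xs = m.forward(&xs)?;
    }
    Ok(xs)
}
```
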
13401df4d1
Add an abstract type for RmsNorm. (#499)
2023-08-18 08:52:14 +01:00
26fd37b348
Use the main branch of the HF repo where possible. (#498)
* Use the main branch of the HF repo where possible.
* And add the large model.
2023-08-18 08:18:30 +01:00
f056dcab21
Add medium model (#497)
2023-08-18 08:08:59 +01:00
557b2c28dd
Q6K quantization (#495)
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
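
K-quants like q6k use a two-level scheme: one f16 scale per 256-element super-block plus small per-sub-block integer scales, with 6-bit quants centered on 32. A deliberately simplified dequantization sketch; the real ggml layout packs the 6-bit values across two byte arrays (`ql`/`qh`), which is elided here:

```rust
/// Simplified q6k-style dequantization: a per-super-block scale `d`,
/// sixteen per-sub-block i8 scales, and 6-bit quants with a zero
/// point of 32. The quants are assumed already unpacked into `q`;
/// the actual ggml bit-packing is more involved.
fn dequantize_q6k_like(d: f32, scales: &[i8; 16], q: &[u8; 256]) -> Vec<f32> {
    q.iter()
        .enumerate()
        .map(|(i, &quant)| {
            let sub_block = i / 16; // 16 values share each sub-block scale
            d * scales[sub_block] as f32 * (quant as i32 - 32) as f32
        })
        .collect()
}
```
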
3164cd24fa
Replicate the sot-token logic from the Python implementation more accurately. (#491)
* Replicate the sot-token logic from the Python implementation more accurately.
* Add a flag to control the timestamp mode.
2023-08-17 16:59:36 +01:00
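
For reference, whisper's decoder is primed with a fixed prefix: the start-of-transcript token, then the language token, then the task token, optionally followed by a no-timestamps token when timestamps are disabled. A sketch of building that prefix (token ids are placeholders passed in by the caller):

```rust
/// Build whisper's initial token sequence the way the Python
/// implementation does: <|startoftranscript|>, language, task,
/// and optionally <|notimestamps|>.
fn initial_tokens(
    sot: u32, language: u32, transcribe: u32,
    no_timestamps: u32, with_timestamps: bool,
) -> Vec<u32> {
    let mut tokens = vec![sot, language, transcribe];
    if !with_timestamps {
        tokens.push(no_timestamps);
    }
    tokens
}
```
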
5f30c1e1e0
Add the whisper small model. (#490)
2023-08-17 15:48:34 +01:00
ad7c53953b
Add a verbose-prompt mode, similar to llama.cpp. (#489)
2023-08-17 15:26:44 +01:00
5d99026fd2
F16 support for stable diffusion (#488)
* F16 support for stable diffusion.
* Keep the attention bits in F32.
* Keep more of the attention bits in F32.
* More mixed precision support.
2023-08-17 13:48:56 +01:00
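
Keeping "the attention bits" in F32 while the rest runs in F16 usually means upcasting before the softmax, where F16 easily over- or underflows, and casting back afterwards. A candle-style sketch assuming the candle-nn softmax op (not the exact code from #488):

```rust
use candle_core::{DType, Result, Tensor};

/// Softmax is the numerically fragile part of attention in f16, so
/// compute it in f32 and cast back to the original dtype.
fn stable_softmax(attn_scores: &Tensor) -> Result<Tensor> {
    let dtype = attn_scores.dtype();
    let probs = candle_nn::ops::softmax(
        &attn_scores.to_dtype(DType::F32)?,
        candle_core::D::Minus1,
    )?;
    probs.to_dtype(dtype)
}
```
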
c3176f0dfb
Flash-attention support in stable diffusion (#487)
* Add flash-attention for the stable-diffusion example.
* Change the dtype.
* Silly fix.
* Another fix.
* Revert the dtype back to the query dtype after applying flash-attn.
2023-08-17 12:16:40 +01:00
03be33eea4
Relax the requirements on CustomOp. (#486)
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
d32e8199cd
Layer norm tweaks (#482)
* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
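
The difference between the two norms in #482 is whether the mean is removed: layer-norm subtracts the mean and divides by the standard deviation, while rms-norm only divides by the root-mean-square. That is why one configurable implementation can serve both. In scalar form (affine scale/bias omitted):

```rust
/// Layer-norm and rms-norm share one code path: rms-norm is layer-norm
/// with the mean-removal step turned off.
fn norm(xs: &[f32], remove_mean: bool, eps: f32) -> Vec<f32> {
    let n = xs.len() as f32;
    let mean = if remove_mean { xs.iter().sum::<f32>() / n } else { 0.0 };
    let var = xs.iter().map(|x| (x - mean) * (x - mean)).sum::<f32>() / n;
    xs.iter().map(|x| (x - mean) / (var + eps).sqrt()).collect()
}
```
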
d99cac3ec3
Move the avx specific bits to a separate file. (#481)
2023-08-17 09:01:06 +01:00
098909de40
Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.
* Add some testing for q8k.
* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
c5f45887dc
Add some tracing to the quantized example. (#473)
2023-08-16 18:49:08 +01:00
fa4590d7fd
Merge pull request #469 from huggingface/fix_llama_v1
Fixing llamav1
2023-08-16 17:47:40 +02:00
2e206e269d
Add the model argument. (#471)
2023-08-16 16:41:06 +01:00
575e88a999
Add a quantized test that uses negative values. (#470)
* Add a quantized test that uses negative values.
* Add a default tokenizer.
2023-08-16 16:32:58 +01:00
a9101700b6
Add a kv-cache to the quantized llama example. (#466)
* Add a kv-cache to the quantized llama example.
* Also print the prompt.
* Bugfix in q6k dequantizing.
* Another bugfix.
2023-08-16 14:28:42 +01:00
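
A kv-cache stores the key/value tensors from previous steps so each new token only needs a single-position forward pass; the current step's keys and values are concatenated onto the cached ones along the sequence dimension. A candle-style sketch (function name and cache shape are illustrative):

```rust
use candle_core::{Result, Tensor};

/// Append the current step's keys/values to the cache along the
/// sequence axis (dim 2 for shapes (batch, heads, seq, head_dim)).
fn update_cache(
    cache: &mut Option<(Tensor, Tensor)>, k: Tensor, v: Tensor,
) -> Result<(Tensor, Tensor)> {
    let (k, v) = match cache.take() {
        None => (k, v), // first step: the cache is just this step's k/v
        Some((prev_k, prev_v)) => (
            Tensor::cat(&[prev_k, k], 2)?,
            Tensor::cat(&[prev_v, v], 2)?,
        ),
    };
    *cache = Some((k.clone(), v.clone()));
    Ok((k, v))
}
```
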
102fa4c2e3
Fixing llamav1
2023-08-16 14:53:29 +02:00
3071134788
Get the ggml-based llama to generate some text. (#464)
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval in the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
2023-08-16 12:41:07 +01:00
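
The sampling loop referenced in #464 follows the usual shape: run the model on the tokens so far, pick the next token from the logits, append, repeat. A minimal greedy version, with a placeholder closure standing in for the quantized llama forward pass:

```rust
/// Minimal greedy generation loop: forward pass, argmax over the
/// logits, append, repeat. `forward` stands in for the real model.
fn generate(
    mut tokens: Vec<u32>, steps: usize,
    forward: impl Fn(&[u32]) -> Vec<f32>,
) -> Vec<u32> {
    for _ in 0..steps {
        let logits = forward(&tokens);
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .unwrap_or(0);
        tokens.push(next);
    }
    tokens
}
```

The real example samples from the softmax distribution (with temperature) rather than taking the argmax, but the loop structure is the same.
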
33c882ea74
Clippy.
2023-08-16 10:41:00 +02:00
76804730c6
Using the real config from the hub when available.
2023-08-16 10:36:01 +02:00
ca449f9ee1
Add quantized tensors. (#458)
* Add quantized tensors.
* Implement the debug trait for QTensor.
* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
b8263aa15c
Quantized support for f16 and f32 (#457)
* Add f32 as a quantized type.
* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4
Split out the quantized file. (#456)
2023-08-15 20:26:27 +01:00
5b1690fffa
Tweak the llama example. (#450)
2023-08-15 12:18:20 +01:00
3cc87058b7
Support local weights & dynamic outputs (#447)
* Support local weights & dynamic outputs
* Revise as suggested
* Cargo code format
2023-08-15 11:51:57 +01:00
c84883ecf2
Add a cuda kernel for upsampling. (#441)
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
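
Nearest-neighbor upsampling, the common case for such a kernel, maps each output pixel back to the source pixel at the floor of the scaled coordinate. A CPU reference in Rust to show the mapping (the PR itself adds the CUDA equivalent, which computes the same thing with one thread per output element):

```rust
/// CPU reference for nearest-neighbor 2D upsampling over a single
/// channel stored row-major.
fn upsample_nearest2d(
    src: &[f32], (h, w): (usize, usize), (out_h, out_w): (usize, usize),
) -> Vec<f32> {
    let mut dst = vec![0f32; out_h * out_w];
    for oy in 0..out_h {
        let sy = oy * h / out_h; // floor of the scaled coordinate
        for ox in 0..out_w {
            let sx = ox * w / out_w;
            dst[oy * out_w + ox] = src[sy * w + sx];
        }
    }
    dst
}
```
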