44420d8ae1
Add some llama-v2 variants. ( #545 )
2023-08-22 08:35:15 +01:00
f16bb97401
Use the yolo-v8 weights from the hub. ( #544 )
...
* Use the weights from the hub.
* Add to the readme.
2023-08-21 22:07:36 +01:00
3507e14c0c
Yolo v8 fixes ( #542 )
...
* Fixes for the yolo-v8 layout.
* Bugfixes.
* Another silly bugfix.
* Remove the hf-hub dependency.
* Remove the transformers dependency.
2023-08-21 21:05:40 +01:00
de50e66af1
Add yolo v8 as an example ( #541 )
...
* Sketching yolo-v8.
* Get the model to load.
* yolo-v8 forward pass.
* Complete(?) the forward pass.
* Fix some shape issues.
* Add the missing padding.
* Process the predictions.
2023-08-21 18:40:09 +01:00
cc2d6cf2e0
Improve the timestamps support in whisper ( #539 )
...
* Timestamp support for whisper.
* Properly display the timestamps.
* Bugfix for the timestamp units.
2023-08-21 12:26:59 +01:00
e3b71851e6
Retrieve the yolo-v3 weights from the hub. ( #537 )
2023-08-21 10:55:09 +01:00
4300864ce9
Add some optional repeat penalty. ( #535 )
2023-08-21 09:59:13 +01:00
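A minimal sketch of what an optional repeat penalty does, in the llama.cpp spirit: logits of recently generated tokens are nudged towards "less likely" before sampling. The function name and the exact scaling rule below are illustrative assumptions, not the code added in #535.

```rust
// Hypothetical repeat penalty applied to raw logits before sampling.
// Tokens already present in the recent context get their scores reduced.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, recent_tokens: &[u32]) {
    for &token in recent_tokens {
        if let Some(logit) = logits.get_mut(token as usize) {
            // Positive logits are divided, negative ones multiplied, so the
            // adjustment always pushes the score towards "less likely".
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5, 3.0];
    apply_repeat_penalty(&mut logits, 1.1, &[0, 3]);
    println!("{logits:?}"); // tokens 0 and 3 are now slightly less likely
}
```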
d70cffdab6
Fix the minimum/maximum gradient computations. ( #534 )
2023-08-21 08:28:41 +01:00
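For context on the gradient fix: the backward rule for an elementwise max routes the incoming gradient to whichever input held the larger value. The plain-Rust sketch below illustrates that rule (the tie-break towards the left operand is an assumption); it is not the candle kernel itself.

```rust
// Elementwise-max backward: the gradient flows to the winning input.
fn max_backward(a: &[f32], b: &[f32], grad_out: &[f32]) -> (Vec<f32>, Vec<f32>) {
    let mut ga = vec![0.0; a.len()];
    let mut gb = vec![0.0; b.len()];
    for i in 0..a.len() {
        if a[i] >= b[i] {
            ga[i] = grad_out[i];
        } else {
            gb[i] = grad_out[i];
        }
    }
    (ga, gb)
}

fn main() {
    let (ga, gb) = max_backward(&[1.0, 5.0], &[2.0, 3.0], &[1.0, 1.0]);
    assert_eq!((ga, gb), (vec![0.0, 1.0], vec![1.0, 0.0]));
}
```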
912561614f
Better handling of zero temperatures. ( #532 )
2023-08-21 07:51:46 +01:00
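Handling a zero temperature typically means falling back to greedy argmax rather than dividing the logits by zero. A hypothetical sketch of that branch (the actual candle sampling code is not reproduced here):

```rust
// With a zero (or near-zero) temperature, pick the top token greedily.
fn sample_token(logits: &[f32], temperature: f64) -> usize {
    if temperature <= 0.0 {
        logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .unwrap_or(0)
    } else {
        // Real code would softmax `logits / temperature` and sample from the
        // resulting distribution; elided in this sketch.
        unimplemented!("temperature sampling")
    }
}

fn main() {
    let logits = [0.1_f32, 2.3, -0.5];
    assert_eq!(sample_token(&logits, 0.0), 1);
}
```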
8c232d706b
Small tweaks to the pickle handling to be able to use libtorch files. ( #530 )
...
* Small tweaks to the pickle handling to be able to use libtorch files.
* Move the pytorch specific bits in a different function.
2023-08-20 23:25:34 +01:00
11c7e7bd67
Some fixes for yolo-v3. ( #529 )
...
* Some fixes for yolo-v3.
* Use the running stats for inference in the batch-norm layer.
* Get some proper predictions for yolo.
* Avoid the quadratic insertion.
2023-08-20 23:19:15 +01:00
a1812f934f
Add a yolo-v3 example. ( #528 )
...
* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
2023-08-20 18:19:37 +01:00
e3d2786ffb
Add a couple functions required for yolo. ( #527 )
2023-08-20 17:02:05 +01:00
372f8912c5
Minor readme tweaks. ( #526 )
2023-08-20 14:33:21 +01:00
d2622a8160
Move the VarMap to a separate file ( #525 )
...
* Move the var-map struct in a separate file.
* Fix some typos.
2023-08-20 14:25:07 +01:00
2fcb386f17
Add a broadcast variant to matmul. ( #523 )
...
* Add a broadcast variant to matmul.
* Get the test to pass.
2023-08-20 13:20:42 +01:00
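A hedged usage sketch of a broadcasting matmul: the right-hand side is broadcast over the batch dimension of the left-hand side. The method name `broadcast_matmul` and the `randn`/`dims` signatures follow candle's public API as I understand it; treat them as assumptions for this exact revision.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // A batch of 4 matrices multiplied by a single shared weight matrix:
    // (4, 2, 3) x (1, 3, 5) broadcasts the right-hand side over the batch dim.
    let lhs = Tensor::randn(0f32, 1., (4, 2, 3), &dev)?;
    let rhs = Tensor::randn(0f32, 1., (1, 3, 5), &dev)?;
    let out = lhs.broadcast_matmul(&rhs)?;
    assert_eq!(out.dims(), &[4, 2, 5]);
    Ok(())
}
```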
a8f61e66cc
Bump the crates version to 0.1.2. ( #522 )
2023-08-20 08:07:07 +01:00
aa207f2dd9
Print some per-step timings in stable-diffusion. ( #520 )
...
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
* Add some timings to stable-diffusion.
2023-08-20 05:45:12 +01:00
82410995a2
Neon support for quantization. ( #519 )
...
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00
d73ca3d28e
Line up the llama.cpp implementation with the candle one. ( #518 )
...
* Separate the prompt stats from the post-prompt ones in the quantized example.
* Slightly nicer output printing.
* Line up with the llama.cpp implementation.
2023-08-19 20:12:07 +01:00
551409092e
Small tweaks to tensor-tools. ( #517 )
2023-08-19 16:50:26 +01:00
6431140250
Retrieve tensor data from PyTorch files. ( #516 )
2023-08-19 15:57:18 +01:00
607ffb9f1e
Retrieve more information from PyTorch checkpoints. ( #515 )
...
* Retrieve more information from PyTorch checkpoints.
* Add enough support to load dino-v2 backbone weights.
2023-08-19 15:05:34 +01:00
f861a9df6e
Add ggml support to tensor-tools ( #512 )
...
* Pickle work-in-progress.
* More unpickling.
* More pickling.
* Proper handling of setitems.
* Clippy.
* Again more pickling.
* Restore the example.
* Add enough pickle support to get the list of tensors.
* Read the data from zip files.
* Retrieve the tensor shape.
* Extract the size and dtype.
* More storage types.
* Improve the destructuring.
* Also support ggml files.
2023-08-19 11:45:22 +01:00
ad33715c61
Preliminary support for importing PyTorch weights. ( #511 )
...
* Pickle work-in-progress.
* More unpickling.
* More pickling.
* Proper handling of setitems.
* Clippy.
* Again more pickling.
* Restore the example.
* Add enough pickle support to get the list of tensors.
* Read the data from zip files.
* Retrieve the tensor shape.
* Extract the size and dtype.
* More storage types.
* Improve the destructuring.
2023-08-19 11:26:32 +01:00
90ff04e77e
Add the tensor-tools binary. ( #510 )
2023-08-19 09:06:44 +01:00
42e1cc8062
Add a batch normalization layer ( #508 )
...
* Add BatchNormalization.
* More batch-norm.
* Add some validation of the inputs.
* More validation.
2023-08-18 20:05:56 +01:00
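For reference, batch normalization at inference time uses the stored running statistics per channel (the same behaviour is referenced in the yolo-v3 fixes above). A plain-Rust sketch of that computation, independent of the candle layer:

```rust
// Batch-norm inference for one channel: normalize with the *running* mean and
// variance recorded during training, then apply the learned scale and shift.
fn batch_norm_inference(
    x: &[f32],
    running_mean: f32,
    running_var: f32,
    weight: f32, // gamma
    bias: f32,   // beta
    eps: f32,
) -> Vec<f32> {
    let inv_std = 1.0 / (running_var + eps).sqrt();
    x.iter()
        .map(|&v| (v - running_mean) * inv_std * weight + bias)
        .collect()
}

fn main() {
    let y = batch_norm_inference(&[1.0, 2.0, 3.0], 2.0, 1.0, 1.0, 0.0, 1e-5);
    println!("{y:?}"); // roughly [-1.0, 0.0, 1.0]
}
```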
b64e782c2d
Use the hub to retrieve dinov2 model weights. ( #507 )
2023-08-18 18:27:31 +01:00
e5dd5fd1b3
Print the recognized categories in dino-v2. ( #506 )
2023-08-18 17:32:58 +01:00
cb069d6063
Add the permute op (similar to pytorch). ( #504 )
...
* Add the permute op (similar to pytorch).
* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
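A hedged usage sketch of the permute op: like PyTorch's `permute`, it reorders tensor dimensions. The `Tensor::zeros`/`permute` signatures below follow candle's public API as I recall it and should be treated as assumptions:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::zeros((2, 3, 4), DType::F32, &Device::Cpu)?;
    // Reorder the dimensions: (2, 3, 4) -> (4, 2, 3).
    let p = t.permute((2, 0, 1))?;
    assert_eq!(p.dims(), &[4, 2, 3]);
    Ok(())
}
```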
4f1541526c
dinov2 - read images from disk and compute the class probabilities ( #503 )
...
* Load the image from disk and convert it to a tensor.
* Tweak the function name.
2023-08-18 15:50:33 +01:00
95462c6a2e
Add a vision transformer example (dino-v2). ( #502 )
...
* Add a vision transformer example (dino-v2).
* Add some documentation + test.
* CI fix.
* Another fix (still unable to replicate the errors locally :( )
2023-08-18 11:58:06 +01:00
b9661a1c25
Enable the image crate by default in examples ( #501 )
...
* Enable the image crate by default so that it's easier to compile the stable diffusion example.
* Also update the readme.
2023-08-18 10:00:05 +01:00
109e95b189
Basic qmatmul parallelization ( #492 )
...
* Basic `par_iter` parallelization
* Pass errors up
* Disable `avx` for x86 macs
2023-08-18 09:45:37 +01:00
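To illustrate the `par_iter` approach named in the commit body: each output element is computed on a Rayon worker and errors are propagated up instead of panicking. This is a hypothetical matrix-vector sketch (it assumes the `rayon` crate), not the actual qmatmul kernel:

```rust
use rayon::prelude::*;

// Each output row is computed in parallel; a mismatched row surfaces as an error.
fn matvec(rows: &[Vec<f32>], v: &[f32]) -> Result<Vec<f32>, String> {
    rows.par_iter()
        .map(|row| {
            if row.len() != v.len() {
                return Err("row/vector length mismatch".to_string());
            }
            let dot: f32 = row.iter().zip(v.iter()).map(|(a, b)| a * b).sum();
            Ok(dot)
        })
        .collect()
}

fn main() {
    let rows = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    println!("{:?}", matvec(&rows, &[1.0, 1.0])); // Ok([3.0, 7.0])
}
```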
c78ce76501
Add a simple Module trait and implement it for the various nn layers ( #500 )
...
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
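Sketch of the idea behind a shared Module trait: every layer exposes a uniform `forward`, so code can be written generically over layers. The trait below mirrors the shape of candle_nn's trait but is a local approximation, not the actual definition:

```rust
use candle_core::{Device, Result, Tensor};

trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

// A toy layer that just scales its input.
struct Scale(f64);

impl Module for Scale {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        xs.affine(self.0, 0.)
    }
}

// A generic helper that works with any layer implementing the trait.
fn run<M: Module>(layer: &M, xs: &Tensor) -> Result<Tensor> {
    layer.forward(xs)
}

fn main() -> Result<()> {
    let xs = Tensor::new(&[1f32, 2., 3.], &Device::Cpu)?;
    let ys = run(&Scale(2.0), &xs)?;
    println!("{ys}");
    Ok(())
}
```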
13401df4d1
Add an abstract type for RmsNorm. ( #499 )
2023-08-18 08:52:14 +01:00
a22b1bed7b
Tensor -> QTensor conversion ( #496 )
...
* Sketch some qmatmul test.
* Add the quantization function.
* More testing.
* Make the test smaller and faster.
* Add some shape checking.
2023-08-18 08:19:20 +01:00
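For intuition about the Tensor -> QTensor conversion: ggml-style formats group values into blocks and store a per-block scale plus small integers. The plain-Rust sketch below shows an 8-bit variant of that idea; the real Q4/Q6K formats pack bits more aggressively and are not reproduced here.

```rust
// Illustrative 8-bit block quantization (in the spirit of ggml's Q8_0).
const BLOCK: usize = 32;

fn quantize_block(xs: &[f32]) -> (f32, Vec<i8>) {
    // One scale per block, chosen so the largest magnitude maps to 127.
    let amax = xs.iter().fold(0f32, |m, &v| m.max(v.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    let qs = xs.iter().map(|&v| (v / scale).round() as i8).collect();
    (scale, qs)
}

fn dequantize_block(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs: Vec<f32> = (0..BLOCK).map(|i| (i as f32) / 10.0).collect();
    let (scale, qs) = quantize_block(&xs);
    let ys = dequantize_block(scale, &qs);
    // The round-trip error stays within one quantization step.
    assert!(xs.iter().zip(&ys).all(|(a, b)| (a - b).abs() <= scale));
}
```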
26fd37b348
Use the main branch of the HF repo where possible. ( #498 )
...
* Use the main branch of the HF repo where possible.
* And add the large model.
2023-08-18 08:18:30 +01:00
f056dcab21
Add medium model ( #497 )
2023-08-18 08:08:59 +01:00
557b2c28dd
Q6K quantization ( #495 )
...
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
fc81af1712
AVX version of the q6k vec-dot. ( #493 )
...
* AVX version of the q6k vec-dot.
* Use the avx sum.
2023-08-17 20:13:18 +01:00
3164cd24fa
Replicate the sot-token logic from the Python implementation more accurately ( #491 )
...
* Replicate the sot-token logic from the Python implementation more accurately.
* Add a flag to control the timestamp mode.
2023-08-17 16:59:36 +01:00
5f30c1e1e0
Add the whisper small model. ( #490 )
2023-08-17 15:48:34 +01:00
ad7c53953b
Add a verbose-prompt mode, similar to llama.cpp. ( #489 )
2023-08-17 15:26:44 +01:00
5d99026fd2
F16 support for stable diffusion ( #488 )
...
* F16 support for stable diffusion.
* Keep the attention bits in F32.
* Keep more of the attention bits in F32.
* More mixed precision support.
2023-08-17 13:48:56 +01:00
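The "keep the attention bits in F32" part follows a common mixed-precision pattern: run the model in F16 but upcast the numerically sensitive softmax to F32, then cast back. A hedged sketch assuming candle's `to_dtype` and `candle_nn::ops::softmax` (the actual stable-diffusion code is not reproduced):

```rust
use candle_core::{DType, Result, Tensor, D};

// Compute the softmax of attention scores in F32 and return the result in the
// original (possibly F16) dtype.
fn softmax_in_f32(scores: &Tensor) -> Result<Tensor> {
    let original = scores.dtype();
    let probs = candle_nn::ops::softmax(&scores.to_dtype(DType::F32)?, D::Minus1)?;
    probs.to_dtype(original)
}
```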
c3176f0dfb
Flash-attention support in stable diffusion ( #487 )
...
* Add flash-attention for the stable-diffusion example.
* Change the dtype.
* Silly fix.
* Another fix.
* Revert the dtype back to the query dtype after applying flash-attn.
2023-08-17 12:16:40 +01:00
03be33eea4
Relax the requirements on CustomOp. ( #486 )
...
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
d32e8199cd
Layer norm tweaks ( #482 )
...
* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
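On the layer-norm/rms-norm relationship: rms-norm is layer-norm without the mean subtraction, which is why a single configurable implementation can cover both variants. A plain-Rust sketch of the shared computation (the affine weight/bias is omitted):

```rust
// One implementation, one switch: remove the mean (layer-norm) or not (rms-norm).
fn norm(xs: &[f32], remove_mean: bool, eps: f32) -> Vec<f32> {
    let mean = if remove_mean {
        xs.iter().sum::<f32>() / xs.len() as f32
    } else {
        0.0
    };
    let var = xs.iter().map(|&v| (v - mean) * (v - mean)).sum::<f32>() / xs.len() as f32;
    let inv = 1.0 / (var + eps).sqrt();
    xs.iter().map(|&v| (v - mean) * inv).collect()
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    println!("layer-norm: {:?}", norm(&xs, true, 1e-5));
    println!("rms-norm:   {:?}", norm(&xs, false, 1e-5));
}
```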
d99cac3ec3
Move the avx specific bits to a separate file. ( #481 )
2023-08-17 09:01:06 +01:00
f708efb19c
Add some accelerate details on the readme. ( #480 )
2023-08-17 08:26:02 +01:00