Few fixes.
Going back to remote metal-rs.
Reusing a single buffer (for now) to speed things up.
Adding some half kernels.
All tests now panic instead of failing randomly.
Putting back f16 index select.
Add erf.
Working version for llama2-c.
Fixes + cache compute_pipeline_state.
BF16 metal fix.
Remove some prints.
new_owned -> new().to_owned().
Better batched matmul.
Metal operational.
Reuse buffers based on our own reference counts.
Tmp gemm.
Revert "Tmp gemm."
This reverts commit c65f68e988.
Interleave committing.
Speeding up copies using blit.
Fmt.
Remove the assert!
Fmt all.
Fixes after big rebase.
Add softmax for half and bfloat + tests
Fixing Llama example + accumulate softmax in float.
* add bce with logit loss
* remove imports
* fix tiny bug
* add test documentation and refactor function
* fix test cases and formatting
* distilbert files
* Apply various cleanups.
* More cleanups.
* More polish.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
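For reference, a minimal sketch of a binary cross-entropy loss computed directly from logits. It uses the common numerically stable form `max(x, 0) - x * y + ln(1 + exp(-|x|))` averaged over all elements. The function name `bce_with_logits` and the slice-based signature are assumptions made for illustration; this is not necessarily how the loss added in this commit is implemented.

```rust
/// Mean binary cross-entropy computed directly from logits.
/// Hypothetical stand-alone helper for illustration only; it works on plain
/// slices rather than tensors and is not the library's implementation.
fn bce_with_logits(logits: &[f64], targets: &[f64]) -> f64 {
    assert_eq!(logits.len(), targets.len(), "length mismatch");
    assert!(!logits.is_empty(), "empty input");
    let total: f64 = logits
        .iter()
        .zip(targets)
        // Stable form: never exponentiates a positive number, so large
        // logits cannot overflow.
        .map(|(&x, &y)| x.max(0.0) - x * y + (-x.abs()).exp().ln_1p())
        .sum();
    total / logits.len() as f64
}
```

Writing the loss this way avoids taking the logarithm of a sigmoid output that has already rounded to 0 or 1.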
* Fix linspace implementation
`steps` should be strictly greater than 1 to make it consistent with the context.
* Handle steps == 0 and steps == 1.
* Fix rustfmt.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
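The `steps` edge cases mentioned above can be sketched as follows. This is a stand-alone illustration that returns a `Vec<f64>` instead of a tensor; the function body and the choice to return `[start]` for `steps == 1` are assumptions (mirroring NumPy's behaviour), not a copy of the actual fix.

```rust
/// Evenly spaced values from `start` to `stop`, both inclusive.
/// Hypothetical stand-alone version for illustration only.
fn linspace(start: f64, stop: f64, steps: usize) -> Vec<f64> {
    match steps {
        // No points requested: return an empty vector.
        0 => Vec::new(),
        // A single point: the step size would divide by zero, so return
        // `start` only (assumed behaviour).
        1 => vec![start],
        _ => {
            let delta = (stop - start) / (steps - 1) as f64;
            (0..steps).map(|i| start + i as f64 * delta).collect()
        }
    }
}
```

Without the two special cases, `steps - 1` underflows for `steps == 0` and the step size is a division by zero for `steps == 1`.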
* Add OpenChat to quantized examples
* Add chat prompt
* Make the openchat example more in line with the other models.
* Fix a typo.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
Update the readme to match the other examples. Running the command as previously written fails with a "cannot find the path specified" error.
* Add support for the UL2 model family
* Update docs with UL2
* Create ActivationWithOptionalGating to avoid polluting activations
* Also refactor quantized t5
* Remove useless conversion
* Revert Activation::NewGelu name change
* Remove useless return
* Apply rustfmt and clippy recommendations
* Reuse t5::ActivationWithOptionalGating in quantized version
* (cosmetic change) use a match rather than ifs + avoid early returns.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* add bce with logit loss
* remove imports
* fix tiny bug
* add test documentation and refactor function
* fix test cases and formatting
* add trocr model
* fix formatting
* commit the actual model lol
* more formatting
* remove tokenizer config
* Support the shape op in ONNX.
* Share the axis normalization bits.
* Add some limited support for gather.
* Unsqueeze.
* Comparison with broadcasting.
* Add Not + handle i32.
* Tweaks for the quantized model.
* Adds check for 7b-zephyr and uses correct template
* Handle zephyr as mistral.
* Disable the protoc bits of the CI.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add more models to the onnx example.
* Input validation.
* Bugfix.
* Implement clip.
* BatchNorm support.
* Get the efficientnet onnx to work.
* Negative and `*args` shape handling
* Rename to `PyShapeWithHole` + validate that only one hole exists
* Regenerate stubs
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Skeleton files for the marian MT model.
* Marian initialization.
* Implement the attention forward method.
* Forward pass for the encoder side.
* Expose the encoder and decoder.
* Start plugging the decoder.
* Forward pass for the decoder layer.
* Set up the marian example.
* Add some missing backtraces.
* Bugfix.
* feat: implement VGG13, VGG16 and VGG19
* Cosmetic fixes.
* More cosmetic tweaks + avoid re-loading the weights on each final layer.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Fix Gym wrapper
- It was returning things in the wrong order
- Gym now differentiates between terminated and truncated
* Add DDPG
* Apply fixes
* Remove Result annotations
* Also remove Vec annotation
* rustfmt
* Various small improvements (avoid cloning, mutability, get clippy to pass, ...)
---------
Co-authored-by: Travis Hammond <travis.hammond@alexanderthamm.com>
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add the jina-bert model.
* Use alibi.
* Remove the unused pragma.
* Recompute the alibi embeddings.
* Generate the token type ids.
* Use the module trait.
* Add the jina-bert example.
* DType fix.
* Get the inference to work.