- Most kernels just copy themselves to get the shapes correct.
- Matmul works in only a single case and simply allocates an empty result otherwise.
- Logits are randomized so that the demo can finish on its own.
Performance is quite bad (30ms/token), but there are lots of prints and allocations, plus some actual dispatching to Metal.
Couldn't speed it up much even after removing the obvious blockers (the printlns and actually running the matmuls).
Allocations take between 1µs and 100µs and seem very stable. Maybe Metal doesn't really have a smart allocator and we'll need to own it.
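For reference, a minimal sketch (not this PR's code) of timing raw buffer allocations with the `metal` crate; the buffer sizes and the shared storage mode are illustrative assumptions:

```rust
// Minimal allocation-latency probe using the `metal` crate; sizes and
// storage mode are illustrative, not this PR's actual setup.
use std::time::Instant;

fn main() {
    let device = metal::Device::system_default().expect("no Metal device found");
    for size in [1024u64, 1024 * 1024, 16 * 1024 * 1024] {
        let start = Instant::now();
        let _buf = device.new_buffer(size, metal::MTLResourceOptions::StorageModeShared);
        println!("allocated {size} bytes in {:?}", start.elapsed());
    }
}
```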
* Negative and `*args` shape handling
* Rename to `PyShapeWithHole` + validate that only one hole exists (resolution sketched after this list)
* Regenerate stubs
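For context, a hole is a single inferred dimension, encoded here as `-1`. A minimal sketch of the resolution logic plus the single-hole validation; `resolve_hole` is a hypothetical helper, not the actual binding code:

```rust
/// Resolve a shape that may contain one hole (-1) given the element count.
/// Hypothetical helper for illustration, not the actual binding code.
fn resolve_hole(dims: &[i64], el_count: usize) -> Result<Vec<usize>, String> {
    if dims.iter().filter(|&&d| d == -1).count() > 1 {
        return Err("at most one hole (-1) is allowed in a shape".to_string());
    }
    // Product of the known dimensions; the hole must account for the rest.
    let known: usize = dims.iter().filter(|&&d| d != -1).map(|&d| d as usize).product();
    dims.iter()
        .map(|&d| {
            if d != -1 {
                Ok(d as usize)
            } else if known > 0 && el_count % known == 0 {
                Ok(el_count / known)
            } else {
                Err(format!("cannot infer the hole for {el_count} elements"))
            }
        })
        .collect()
}
```

`resolve_hole(&[2, -1], 6)` then yields `[2, 3]`, while a shape with two holes is rejected.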
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Skeleton files for the marian MT model.
* Marian initialization.
* Implement the attention forward method (a generic sketch follows this list).
* Forward pass for the encoder side.
* Expose the encoder and decoder.
* Start plugging the decoder.
* Forward pass for the decoder layer.
* Set up the marian example.
* Add some missing backtraces.
* Bugfix.
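As a rough illustration of what an attention forward method like the one above does, a minimal scaled dot-product sketch in candle; the `(batch, heads, seq_len, head_dim)` layout is an assumption, and this is not the Marian-specific code:

```rust
use candle_core::{Result, Tensor, D};

/// Scaled dot-product attention over (batch, heads, seq_len, head_dim)
/// tensors; a generic sketch, not the Marian forward pass.
fn attention(q: &Tensor, k: &Tensor, v: &Tensor) -> Result<Tensor> {
    let head_dim = q.dim(D::Minus1)? as f64;
    let scores = (q.matmul(&k.transpose(2, 3)?)? / head_dim.sqrt())?;
    let weights = candle_nn::ops::softmax(&scores, D::Minus1)?;
    weights.matmul(v)
}
```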
* feat: implement VGG13, VGG16 and VGG19
* Cosmetic fixes.
* More cosmetic tweaks + avoid re-loading the weights for each final layer (see the sketch below).
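On the weight-reloading point: the usual candle pattern is to load the checkpoint into a single `VarBuilder` once and hand a sub-prefix to each layer, so nothing is read twice. A minimal sketch with assumed tensor names and sizes (following the PyTorch VGG classifier naming), not the code from this PR:

```rust
use candle_core::Result;
use candle_nn::{Module, VarBuilder};

/// Build the VGG classifier head from an already-loaded VarBuilder;
/// each layer borrows its weights, nothing is re-read from disk.
fn classifier(vb: VarBuilder) -> Result<impl Module> {
    let fc1 = candle_nn::linear(512 * 7 * 7, 4096, vb.pp("classifier.0"))?;
    let fc2 = candle_nn::linear(4096, 4096, vb.pp("classifier.3"))?;
    let fc3 = candle_nn::linear(4096, 1000, vb.pp("classifier.6"))?;
    Ok(candle_nn::seq().add(fc1).add(fc2).add(fc3))
}
```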
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Fix Gym wrapper
- It was returning things in the wrong order
- Gym now differentiates between terminated and truncated (see the sketch after this list)
* Add DDPG (soft target update sketched after this list)
* Apply fixes
* Remove Result annotations
* Also remove Vec annotation
* rustfmt
* Various small improvements (avoid cloning, mutability, get clippy to pass, ...)
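Two minimal sketches for the points above: the terminated/truncated split in the step result, and DDPG's soft target update θ' ← τθ + (1-τ)θ'. Names and types are illustrative assumptions, not this PR's code:

```rust
/// Illustrative step result: recent Gym versions report `terminated` and
/// `truncated` separately instead of a single `done` flag.
pub struct Step {
    pub obs: Vec<f64>,
    pub reward: f64,
    pub terminated: bool, // the episode reached a terminal state
    pub truncated: bool,  // the episode was cut short, e.g. by a time limit
}

impl Step {
    /// The episode ends in either case, but only `terminated` marks a true
    /// end state when bootstrapping returns.
    pub fn is_done(&self) -> bool {
        self.terminated || self.truncated
    }
}

/// DDPG soft target update over flat parameter slices:
/// target <- tau * source + (1 - tau) * target.
fn soft_update(target: &mut [f64], source: &[f64], tau: f64) {
    for (t, s) in target.iter_mut().zip(source) {
        *t = tau * s + (1.0 - tau) * *t;
    }
}
```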
---------
Co-authored-by: Travis Hammond <travis.hammond@alexanderthamm.com>
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add the jina-bert model.
* Use alibi.
* Remove the unused pragma.
* Recompute the alibi embeddings (slope computation sketched after this list).
* Generate the token type ids.
* Use the module trait.
* Add the jina-bert example.
* DType fix.
* Get the inference to work.
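On the alibi side, the per-head slopes (for `n` heads, `n` a power of two) are `2^(-8·(i+1)/n)`, and the attention bias is the slope times the negated token distance. A minimal slope sketch, illustrative only and not the jina-bert code:

```rust
/// ALiBi slopes for `num_heads` heads (a power of two is assumed):
/// slope_i = 2^(-8 * (i + 1) / num_heads).
fn alibi_slopes(num_heads: usize) -> Vec<f64> {
    let base = 2f64.powf(-8.0 / num_heads as f64);
    (0..num_heads).map(|i| base.powi(i as i32 + 1)).collect()
}
```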
* Add BCE-with-logits loss (stable formula sketched after this list).
* Remove unused imports.
* Fix a tiny bug.
* Add test documentation and refactor the function.
* Fix test cases and formatting.
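For reference, the numerically stable form such a loss is usually built on is `max(x, 0) - x·y + ln(1 + e^(-|x|))`. A scalar sketch of that formula; the PR itself adds a tensor version:

```rust
/// Stable binary cross-entropy with logits for a single value;
/// scalar illustration, not the tensor implementation from this PR.
fn bce_with_logits(x: f64, y: f64) -> f64 {
    x.max(0.0) - x * y + (-x.abs()).exp().ln_1p()
}
```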