* Add the onnx protos.
* Move the reading bits.
* Install protoc on the CI.
* Install protoc on the cuda CI too.
* Use clap for the onnx tool (see the CLI sketch below).
* Tweak the CI protoc install.
* Add a simple evaluation function.
* Add some binary operator support.
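A rough sketch of what the clap-based CLI could look like (assuming clap 4 with the derive feature; the subcommand names and bodies are illustrative placeholders, not the actual tool):

```rust
use clap::{Parser, Subcommand};

#[derive(Parser)]
#[command(about = "Inspect and evaluate ONNX files")]
struct Args {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Print basic information about the model graph.
    Info { file: std::path::PathBuf },
    /// Run a simple evaluation on dummy inputs.
    Eval { file: std::path::PathBuf },
}

fn main() {
    let args = Args::parse();
    match args.command {
        Command::Info { file } => println!("info: {}", file.display()),
        Command::Eval { file } => println!("eval: {}", file.display()),
    }
}
```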
* add phi wasm module
* replace input with textarea
* trim input prompt
* stop on <|endoftext|> (see the generation-loop sketch below)
* formatting
* clean up
* add blurb and syntax highlighting
* add phi-v1.5 wasm
* add note
* hide Options on details
* add first token to generated text
* whitespaces for new line
* fix: abort -> aborted
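The prompt trimming and end-of-text handling above boil down to a small generation loop. The sketch below is a hypothetical, model-agnostic version of that control flow; the encode/next_token/decode closures stand in for the tokenizer and the phi forward pass, which are not shown here:

```rust
/// Hypothetical sketch of the worker's generation loop: trim the prompt,
/// stop as soon as the end-of-text token is produced, and only decode the
/// newly generated tokens.
fn generate_until_eos(
    prompt: &str,
    eos_token_id: u32,
    max_tokens: usize,
    encode: impl Fn(&str) -> Vec<u32>,
    mut next_token: impl FnMut(&[u32]) -> u32,
    decode: impl Fn(&[u32]) -> String,
) -> String {
    // Trim stray whitespace from the prompt before tokenizing it.
    let mut tokens = encode(prompt.trim());
    let prompt_len = tokens.len();
    for _ in 0..max_tokens {
        let token = next_token(&tokens);
        // Stop on <|endoftext|> rather than emitting it.
        if token == eos_token_id {
            break;
        }
        tokens.push(token);
    }
    // Return only the generated part, not the prompt.
    decode(&tokens[prompt_len..])
}
```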
* Use yoke to provide a self-referential container for mmapped safetensor files (see the sketch below).
* Add the new self-owned type for safetensor files without removing the previous version.
* Add routing.
* Add an initializer for the case of multiple files.
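A condensed sketch of the yoke pattern, using only the Yokeable impls that yoke ships with: the Yoke owns the mmap (the "cart") together with a view that borrows from it, so both travel as one value. The real container presumably wraps safetensors::SafeTensors in a small Yokeable newtype; here the borrowed view is just the JSON header of the file as a Cow<'static, str>. This assumes yoke 0.7-style APIs and memmap2 built with its stable_deref_trait feature:

```rust
use std::borrow::Cow;

// Self-referential pair: the mapping and the data borrowed from it move together.
type MmapedHeader = yoke::Yoke<Cow<'static, str>, memmap2::Mmap>;

fn mmap_safetensors_header(
    path: &std::path::Path,
) -> Result<MmapedHeader, Box<dyn std::error::Error>> {
    let file = std::fs::File::open(path)?;
    // Safety: the file must not be modified or truncated while it is mapped.
    let mmap = unsafe { memmap2::MmapOptions::new().map(&file)? };
    yoke::Yoke::<Cow<'static, str>, memmap2::Mmap>::try_attach_to_cart(mmap, |data: &[u8]| {
        // safetensors layout: an 8-byte little-endian header length, then a
        // JSON header of that length, then the raw tensor data.
        if data.len() < 8 {
            return Err("file too short for a safetensors header".into());
        }
        let header_len = u64::from_le_bytes(data[..8].try_into()?) as usize;
        let header_bytes = data
            .get(8..8 + header_len)
            .ok_or("header length out of bounds")?;
        let header = std::str::from_utf8(header_bytes)?;
        Ok(Cow::Borrowed(header))
    })
}
```

For the multi-file initializer, the natural extension is a vector of such yokes plus a routing map from tensor name to file index, so a lookup knows which mmapped file to query.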
* Add a groups parameter to convolutions (see the sketch below).
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
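An illustrative, framework-free sketch of what the groups parameter means for a 1D convolution (no padding, stride 1, cross-correlation convention). This shows the semantics only, not the actual kernels:

```rust
// input:  [c_in][l_in]            kernel: [c_out][c_in / groups][k]
// output: [c_out][l_in - k + 1]
// Each output channel only sees the input channels of its own group, and the
// per-group results are laid out group after group along the channel dimension.
fn conv1d_grouped(input: &[Vec<f32>], kernel: &[Vec<Vec<f32>>], groups: usize) -> Vec<Vec<f32>> {
    let (c_in, c_out) = (input.len(), kernel.len());
    assert_eq!(c_in % groups, 0, "c_in must be divisible by groups");
    assert_eq!(c_out % groups, 0, "c_out must be divisible by groups");
    let (c_in_g, c_out_g) = (c_in / groups, c_out / groups);
    let k = kernel[0][0].len();
    let l_out = input[0].len() + 1 - k;
    let mut output = vec![vec![0f32; l_out]; c_out];
    for g in 0..groups {
        for oc in 0..c_out_g {
            let out_c = g * c_out_g + oc;
            for ic in 0..c_in_g {
                let in_c = g * c_in_g + ic;
                for pos in 0..l_out {
                    for j in 0..k {
                        output[out_c][pos] += input[in_c][pos + j] * kernel[out_c][ic][j];
                    }
                }
            }
        }
    }
    output
}
```

With groups equal to the number of input channels (and one output channel per group) this reduces to a depthwise convolution; groups = 1 is the usual dense convolution.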
* Sketch the yolo wasm example.
* Web UI.
* Get the web UI to work.
* UI tweaks.
* More UI tweaks.
* Use the natural width/height.
* Add a link to the hf space in the readme.
* Change distributions
Standard samples uniformly in [0, 1); Normal is the correct choice here (see the sketch below).
* Add test
Not sure if this is the best place to put the test
* Remove unnecessary use
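The distinction behind the distribution fix, in terms of the rand 0.8 / rand_distr crates (a minimal illustration, not the project's init code):

```rust
use rand::distributions::Standard;
use rand::Rng;
use rand_distr::{Distribution, Normal};

fn main() {
    let mut rng = rand::thread_rng();
    // Standard samples f64 uniformly in [0, 1) -- not a Gaussian.
    let uniform: f64 = rng.sample(Standard);
    // Normal::new(mean, std_dev) is the right distribution for randn-style values.
    let normal = Normal::new(0.0_f64, 1.0).unwrap();
    let gaussian = normal.sample(&mut rng);
    println!("uniform: {uniform}, gaussian: {gaussian}");
}
```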
* Move the vision datasets to a separate crate.
* Move the batcher bits.
* Update the readme.
* Move the tiny-stories bits.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
* Add the flash-attn kernels, importing the flash-attn v2 code from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files in a different directory.
* Build the static C library with nvcc (see the build.rs sketch below).
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate (see the sketch below).
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
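One way to build the CUDA sources into a static library from build.rs is the cc crate's nvcc support. This is a hypothetical sketch: the file name, flags, and linked libraries are placeholders, not the actual build script:

```rust
// build.rs
fn main() {
    // Rebuild when any of the CUDA sources change.
    println!("cargo:rerun-if-changed=kernels/");
    cc::Build::new()
        .cuda(true) // drive nvcc instead of the host C++ compiler
        .flag("-O3")
        .flag("-std=c++17")
        .flag("-arch=sm_80") // target architecture; adjust for the GPUs you build for
        .include("kernels")
        .file("kernels/flash_fwd_placeholder.cu") // placeholder file name
        .compile("flashattn"); // archives into libflashattn.a and links it statically
    // The kernels need the CUDA runtime at link time.
    println!("cargo:rustc-link-lib=dylib=cudart");
}
```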
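And the feature-gating pattern on the Rust side, roughly as used in the llama example: the flash-attn path only compiles when the flash-attn cargo feature pulls in the optional candle-flash-attn dependency, and callers get a clear error otherwise. The call below follows that crate's flash_attn function, but treat the exact signature as an approximation:

```rust
use candle_core::{Result, Tensor};

#[cfg(feature = "flash-attn")]
fn flash_attn(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32, causal: bool) -> Result<Tensor> {
    // Only compiled when the optional candle-flash-attn dependency is enabled.
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, causal)
}

#[cfg(not(feature = "flash-attn"))]
fn flash_attn(_q: &Tensor, _k: &Tensor, _v: &Tensor, _: f32, _: bool) -> Result<Tensor> {
    unimplemented!("compile with '--features flash-attn' to enable flash attention")
}
```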