Commit Graph

1124 Commits

Author SHA1 Message Date
28c87f6a34 Automatic mask generator + point base mask (#773)
* Add more to the automatic mask generator.

* Add the target point.

* Fix.

* Remove the allow-unused.

* Mask post-processing.
2023-09-08 12:26:56 +01:00
c1453f00b1 Improve the safetensor loading in the segment-anything example. (#772)
* Improve the safetensor loading in the segment-anything example.

* Properly handle the labels when embedding the point prompts.
2023-09-08 09:39:10 +01:00
989a4807b1 Use shape with holes. (#771) 2023-09-08 08:50:27 +01:00
0e250aee4f Shape with holes (#770)
* Shape with holes.

* rustfmt.
2023-09-08 08:38:13 +01:00
cfcbec9fc7 Add small customization to the build (#768)
* Add ability to override the compiler used by NVCC from an environment variable

* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR

* Add the compilation failure to the readme, with a possible solution

* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
3898e500de Generate a mask image + the scaled input image. (#769)
* Also round-trip the original image.

* Make it possible to use a safetensors input.
2023-09-08 05:53:08 +01:00
79c27fc489 Segment-anything fixes: avoid normalizing twice. (#767)
* Segment-anything fixes: avoid normalizing twice.

* More fixes for the image aspect ratio.
2023-09-07 21:45:16 +01:00
7396b8ed1a Segment Anything - process images (#766)
* Start processing images.

* Add LayerNorm2d.

* Properly use LayerNorm2d.

* Tweak eps.

* Use LayerNorm on inputs with a rank different from 3.

* Window partitioning.

* Fix a couple todos.

* More todos.

* Hard-code the einsums.

* More padding support.

* Some sizes tweaks.

* Use the hub to get the weights.

* Use a batch matmul.

* Tweaks.

* More fixes.

* Get some predictions to be generated.
2023-09-07 19:22:45 +01:00
7b50f3e106 More segment-anything again. (#764)
* More segment-anything again.

* Transformer block forward.

* Two-ways transformer.

* Position embeddings.

* Sketch the prompt encoder.

* More prompt-encoder.

* More prompt-encoder.

* Add the main sam module.

* Embed the transformer.

* And hook the transformer forward step.

* Build the model.

* Handle the global attn indexes.

* Get the model to load.
2023-09-07 12:06:55 +01:00
8c991df394 More segment-anything. (#763)
* More segment-anything.

* Split the model in multiple files.

* Start adding the transformer.

* Add the attention block.

* Move the MLP Block.
2023-09-07 07:28:30 +01:00
000fa00e31 Expose the conv2d-transpose layers. (#761) 2023-09-07 06:04:52 +01:00
a17a7c42c1 Add a nn layer for conv-transpose2d. (#760) 2023-09-07 05:47:28 +01:00
6527ab81a3 Sketch the segment anything model. (#759)
* Sketch the segment anything model.

* Fix some clippy lint.

* Add the mask decoder.
2023-09-07 05:34:05 +01:00
7b1f2da828 Cudnn fix. (#758) 2023-09-06 17:39:39 +01:00
bdc9d46fe3 Use an arc in the varbuilder rather than rc. (#757)
* Use an arc in the varbuilder rather than rc.

* Require the backends to be send.

* Request send and sync.
2023-09-06 15:29:09 +01:00
dcf708559d Fix for cudnn to work with img2img. (#753) 2023-09-06 07:49:28 +01:00
7299a68353 img2img pipeline for stable diffusion. (#752)
* img2img pipeline for stable diffusion.

* Rename the arguments + fix.

* Fix for zero strength.

* Another fix.

* Another fix.

* Revert.

* Include the backtrace.

* Noise scaling.

* Fix the height/width.
2023-09-06 07:06:49 +01:00
16bf44f6e9 force model cache (#751) 2023-09-06 05:53:31 +02:00
a4f40f3dc8 Use rayon directly rather than constraining the number of threads. (#749) 2023-09-05 20:26:15 +01:00
6a40decc76 Minor WASM UI improvements (#748)
* add stats

* random seed btn

* minor ui improvements
2023-09-05 19:24:43 +01:00
a0d65585db Softmax implementation for cuda. (#747) 2023-09-05 18:38:03 +01:00
94c6a8d3d3 Add a dedicated cuda kernel for softmax. (#746) 2023-09-05 17:53:20 +02:00
6615daf242 Tweaks to softmax. (#745) 2023-09-05 15:22:27 +01:00
1c9e5394a5 Add a custom softmax implementation. (#744)
* Add a custom softmax implementation.

* Add softmaxlastdim to the benchmarks.

* And add a test.

* Support more dtypes.

* Polish the code.

* Use the slow implementation on cuda.

* Add a todo for the cuda kernel.
2023-09-05 14:20:23 +01:00
a8410bf35e Add some documentation. (#743) 2023-09-05 09:51:12 +01:00
cda45a7443 Let outside CustomOp2 implementations use binary_map/binary_map_vec (#741) 2023-09-05 09:27:32 +01:00
4698eb5cb6 Fix typo in the nll function document (#742) 2023-09-05 09:25:11 +01:00
000487c36f Add a python function to save as safetensors. (#740) 2023-09-04 20:32:14 +01:00
ab0d9fbdd1 Properly set the is_bf16 flag. (#738) 2023-09-04 16:45:26 +01:00
f80fd44201 BF16 support for flash-attn. (#737) 2023-09-04 16:35:43 +01:00
0d00c06a83 Fix clippy lint. (#736) 2023-09-04 16:09:19 +01:00
8395152d20 Llama2c WASM UI improvements (#732)
* pass seed, expose model seq_len

* wip new llama2.c ui

* final new UI example

* small copy

* copy
2023-09-04 15:59:22 +01:00
e2f9f60ac2 Avoid some redundant clone. (#731) 2023-09-04 09:18:32 +02:00
d0cdea95a5 Add back the bf16 flash-attn kernels. (#730) 2023-09-04 07:50:52 +01:00
20512ba408 Return the metadata in the gguf pyo3 bindings. (#729)
* Return the metadata in the gguf pyo3 bindings.

* Read the metadata in the quantized llama example.

* Get inference to work on gguf files.
2023-09-04 07:07:00 +01:00
9c61b0fc9b Proper log buckets for t5. (#727)
* Proper log buckets for t5.

* Properly pass the position bias.
2023-09-03 20:33:50 +01:00
26cd266e65 Musicgen text embeddings. (#726)
* Musicgen text embeddings.

* Bugfix for layer norm.

* Proper position bias.

* Expose the weights.
2023-09-03 18:27:48 +01:00
bbec527bb9 Fix the musicgen example. (#724)
* Fix the musicgen example.

* Retrieve the weights from the hub.
2023-09-03 14:50:39 +01:00
f7980e07e0 Add ggufv2 support (#725) 2023-09-03 14:41:57 +01:00
74a82c358a Add the mse loss. (#723) 2023-09-03 10:51:40 +01:00
84d003ff53 Handle arbitrary shapes in Tensor::new. (#718) 2023-09-02 19:59:21 +01:00
21109e1983 Recommend using maturin. (#717) 2023-09-02 16:19:35 +01:00
ad796eb4be More quantized llama in python. (#716)
* More quantized llama in python.

* Expose a couple more functions.

* Apply the last layer.

* Use the vocab from the ggml files.
2023-09-02 13:41:48 +01:00
e8e33752f4 Sketch a quantized llama using the pyo3 api. (#715)
* Sketch a quantized llama using the pyo3 api.

* Add more ops.

* Expose a few more functions to use in the quantized model.

* Rope embeddings.

* Get the forward pass to work.
2023-09-02 11:26:05 +01:00
dabaa479b9 Update README.md (#714) 2023-09-02 07:56:12 +01:00
2c1df6bba1 Add a repeat penalty to the llama2-c command line example. (#713)
* Add a repeat penalty to the llama2-c command line example.

* Another fix attempt.
2023-09-01 20:38:58 +01:00
4d56cef583 Handle the empty sequence case properly. (#712)
* Handle the empty sequence case properly.

* Proper fix.
2023-09-01 20:12:30 +01:00
19042962d5 Whisper fix (#711)
* Remove unnecessary file.

* Whisper fix.
2023-09-01 20:04:07 +01:00
731e3ffb03 Remove unnecessary file. (#710) 2023-09-01 19:42:23 +01:00
2fef14cb14 Add a repeat penalty to the llama2.c wasm example. (#709) 2023-09-01 19:32:28 +01:00