a910ec5993
CustomOp for einsum.
2023-09-08 20:46:30 +01:00
acf8f10ae1
Get the comparison operation to work on scalar values. ( #780 )
* Get the comparison operation to work on scalar values.
* Add some time measurement.
2023-09-08 20:13:29 +01:00
0906acab91
Automatic mask generation ( #779 )
* A few more contiguous fixes for cuda.
* Mask generation.
* Generic bbox.
* Generate all the masks.
2023-09-08 19:11:34 +01:00
158ff3c609
Add tracing to segment-anything ( #777 )
* Tracing support for segment-anything.
* More tracing.
* Handle the empty slice case.
2023-09-08 15:31:29 +01:00
e5703d2f56
Draw the mask on a merged image. ( #775 )
* Draw the mask on a merged image.
* Clippy fix.
* Enable the target point by default.
* Add to the readme.
2023-09-08 14:04:34 +01:00
98172d46fa
Fix some errors about BlockQ8_1 ( #776 )
* Use int8 instead of uint8 for BlockQ8_1.qs
The uint8 type for BlockQ8_1.qs caused a large precision loss for negative weights
Ref: ebc96086af/ggml.c (L904)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ4_1
Ref: ebc96086af/ggml.c (L2840)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ5_1
Ref: ebc96086af/ggml.c (L3490)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
---------
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
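The BlockQ8_1 fix above hinges on a sign issue: quantized weights are centered around zero, so storing them in a uint8 field silently reinterprets every negative value as a large positive one. A minimal plain-Rust sketch of the wrap-around (not the ggml/candle code itself):

```rust
fn main() {
    // A quantized weight of -3 stored as i8 keeps its sign...
    let q: i8 = -3;
    // ...but reinterpreted as u8 it wraps to 253, a large positive weight.
    let as_unsigned = q as u8;
    assert_eq!(as_unsigned, 253);
    // Casting back through i8 recovers the original negative value.
    assert_eq!(as_unsigned as i8, -3);
}
```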
28c87f6a34
Automatic mask generator + point base mask ( #773 )
* Add more to the automatic mask generator.
* Add the target point.
* Fix.
* Remove the allow-unused.
* Mask post-processing.
2023-09-08 12:26:56 +01:00
c1453f00b1
Improve the safetensor loading in the segment-anything example. ( #772 )
* Improve the safetensor loading in the segment-anything example.
* Properly handle the labels when embedding the point prompts.
2023-09-08 09:39:10 +01:00
989a4807b1
Use shape with holes. ( #771 )
2023-09-08 08:50:27 +01:00
0e250aee4f
Shape with holes ( #770 )
* Shape with holes.
* rustfmt.
2023-09-08 08:38:13 +01:00
cfcbec9fc7
Add small customization to the build ( #768 )
* Add ability to override the compiler used by NVCC from an environment variable
* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR
* Add the compilation failure to the readme, with a possible solution
* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
3898e500de
Generate a mask image + the scaled input image. ( #769 )
* Also round-trip the original image.
* Make it possible to use a safetensors input.
2023-09-08 05:53:08 +01:00
79c27fc489
Segment-anything fixes: avoid normalizing twice. ( #767 )
* Segment-anything fixes: avoid normalizing twice.
* More fixes for the image aspect ratio.
2023-09-07 21:45:16 +01:00
7396b8ed1a
Segment Anything - process images ( #766 )
* Start processing images.
* Add LayerNorm2d.
* Properly use LayerNorm2d.
* Tweak eps.
* Use LayerNorm on inputs with a rank different from 3.
* Window partitioning.
* Fix a couple todos.
* More todos.
* Hard-code the einsums.
* More padding support.
* Some sizes tweaks.
* Use the hub to get the weights.
* Use a batch matmul.
* Tweaks.
* More fixes.
* Get some predictions to be generated.
2023-09-07 19:22:45 +01:00
7b50f3e106
More segment-anything again. ( #764 )
* More segment-anything again.
* Transformer block forward.
* Two-ways transformer.
* Position embeddings.
* Sketch the prompt encoder.
* More prompt-encoder.
* More prompt-encoder.
* Add the main sam module.
* Embed the transformer.
* And hook the transformer forward step.
* Build the model.
* Handle the global attn indexes.
* Get the model to load.
2023-09-07 12:06:55 +01:00
8c991df394
More segment-anything. ( #763 )
* More segment-anything.
* Split the model in multiple files.
* Start adding the transformer.
* Add the attention block.
* Move the MLP Block.
2023-09-07 07:28:30 +01:00
000fa00e31
Expose the conv2d-transpose layers. ( #761 )
2023-09-07 06:04:52 +01:00
a17a7c42c1
Add a nn layer for conv-transpose2d. ( #760 )
2023-09-07 05:47:28 +01:00
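For the conv-transpose2d layer, the spatial output size follows the standard transposed-convolution arithmetic. A hedged helper illustrating that formula (names are illustrative, not candle's API; the layer computes this internally):

```rust
// Output size of a transposed 2d convolution along one spatial axis:
// out = (in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1
fn conv_transpose2d_out_dim(
    in_dim: usize,
    kernel: usize,
    stride: usize,
    padding: usize,
    output_padding: usize,
    dilation: usize,
) -> usize {
    (in_dim - 1) * stride + dilation * (kernel - 1) + output_padding + 1 - 2 * padding
}

fn main() {
    // A stride-2 transposed conv with k=3, padding=1, output_padding=1
    // doubles the spatial size: 4 -> 8.
    assert_eq!(conv_transpose2d_out_dim(4, 3, 2, 1, 1, 1), 8);
}
```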
6527ab81a3
Sketch the segment anything model. ( #759 )
* Sketch the segment anything model.
* Fix some clippy lint.
* Add the mask decoder.
2023-09-07 05:34:05 +01:00
7b1f2da828
Cudnn fix. ( #758 )
2023-09-06 17:39:39 +01:00
bdc9d46fe3
Use an arc in the varbuilder rather than rc. ( #757 )
* Use an arc in the varbuilder rather than rc.
* Require the backends to be send.
* Request send and sync.
2023-09-06 15:29:09 +01:00
dcf708559d
Fix for cudnn to work with img2img. ( #753 )
2023-09-06 07:49:28 +01:00
7299a68353
img2img pipeline for stable diffusion. ( #752 )
* img2img pipeline for stable diffusion.
* Rename the arguments + fix.
* Fix for zero strength.
* Another fix.
* Another fix.
* Revert.
* Include the backtrace.
* Noise scaling.
* Fix the height/width.
2023-09-06 07:06:49 +01:00
16bf44f6e9
Force model cache. ( #751 )
2023-09-06 05:53:31 +02:00
a4f40f3dc8
Use rayon directly rather than constraining the number of threads. ( #749 )
2023-09-05 20:26:15 +01:00
6a40decc76
Minor WASM UI improvements ( #748 )
* add stats
* random seed button
* minor UI improvements
2023-09-05 19:24:43 +01:00
a0d65585db
Softmax implementation for cuda. ( #747 )
2023-09-05 18:38:03 +01:00
94c6a8d3d3
Add a dedicated cuda kernel for softmax. ( #746 )
2023-09-05 17:53:20 +02:00
6615daf242
Tweaks to softmax. ( #745 )
2023-09-05 15:22:27 +01:00
1c9e5394a5
Add a custom softmax implementation. ( #744 )
* Add a custom softmax implementation.
* Add softmaxlastdim to the benchmarks.
* And add a test.
* Support more dtypes.
* Polish the code.
* Use the slow implementation on cuda.
* Add a todo for the cuda kernel.
2023-09-05 14:20:23 +01:00
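The custom softmax above operates on the last dimension. A plain-Rust sketch of the usual numerically stable formulation for one row (the per-row math only, not candle's dtype-generic or CUDA code): subtract the row max before exponentiating so large logits cannot overflow, then normalize.

```rust
// Numerically stable softmax over a single row (the last dim of a tensor).
fn softmax_last_dim(row: &[f32]) -> Vec<f32> {
    // Subtracting the max leaves the result unchanged but keeps exp() bounded.
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let probs = softmax_last_dim(&[1.0, 2.0, 3.0]);
    // Probabilities sum to 1 and are monotone in the logits.
    let total: f32 = probs.iter().sum();
    assert!((total - 1.0).abs() < 1e-6);
    assert!(probs[0] < probs[1] && probs[1] < probs[2]);
}
```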
a8410bf35e
Add some documentation. ( #743 )
2023-09-05 09:51:12 +01:00
cda45a7443
Let outside CustomOp2 implementations use binary_map/binary_map_vec ( #741 )
2023-09-05 09:27:32 +01:00
4698eb5cb6
Fix typo in the nll function document ( #742 )
2023-09-05 09:25:11 +01:00
000487c36f
Add a python function to save as safetensors. ( #740 )
2023-09-04 20:32:14 +01:00
ab0d9fbdd1
Properly set the is_bf16 flag. ( #738 )
2023-09-04 16:45:26 +01:00
f80fd44201
BF16 support for flash-attn. ( #737 )
2023-09-04 16:35:43 +01:00
0d00c06a83
Fix clippy lint. ( #736 )
2023-09-04 16:09:19 +01:00
8395152d20
Llama2c WASM UI improvements ( #732 )
* pass seed, expose model seq_len
* wip new llama2.c ui
* final new UI example
* small copy fix
* copy
2023-09-04 15:59:22 +01:00
e2f9f60ac2
Avoid some redundant clone. ( #731 )
2023-09-04 09:18:32 +02:00
d0cdea95a5
Add back the bf16 flash-attn kernels. ( #730 )
2023-09-04 07:50:52 +01:00
20512ba408
Return the metadata in the gguf pyo3 bindings. ( #729 )
* Return the metadata in the gguf pyo3 bindings.
* Read the metadata in the quantized llama example.
* Get inference to work on gguf files.
2023-09-04 07:07:00 +01:00
9c61b0fc9b
Proper log buckets for t5. ( #727 )
* Proper log buckets for t5.
* Properly pass the position bias.
2023-09-03 20:33:50 +01:00
26cd266e65
Musicgen text embeddings. ( #726 )
* Musicgen text embeddings.
* Bugfix for layer norm.
* Proper position bias.
* Expose the weights.
2023-09-03 18:27:48 +01:00
bbec527bb9
Fix the musicgen example. ( #724 )
* Fix the musicgen example.
* Retrieve the weights from the hub.
2023-09-03 14:50:39 +01:00
f7980e07e0
Add ggufv2 support ( #725 )
2023-09-03 14:41:57 +01:00
74a82c358a
Add the mse loss. ( #723 )
2023-09-03 10:51:40 +01:00
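The mse loss added here is the mean of squared differences between predictions and targets. A scalar sketch of that definition in plain Rust (in candle the loss operates on tensors; this only shows the arithmetic):

```rust
// Mean squared error over two equal-length slices.
fn mse(pred: &[f32], target: &[f32]) -> f32 {
    assert_eq!(pred.len(), target.len());
    let sum: f32 = pred
        .iter()
        .zip(target)
        .map(|(p, t)| (p - t) * (p - t))
        .sum();
    sum / pred.len() as f32
}

fn main() {
    // ((1 - 0)^2 + (2 - 2)^2) / 2 = 0.5
    assert!((mse(&[1.0, 2.0], &[0.0, 2.0]) - 0.5).abs() < 1e-6);
}
```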
84d003ff53
Handle arbitrary shapes in Tensor::new. ( #718 )
2023-09-02 19:59:21 +01:00
21109e1983
Recommend using maturin. ( #717 )
2023-09-02 16:19:35 +01:00
ad796eb4be
More quantized llama in python. ( #716 )
* More quantized llama in python.
* Expose a couple more functions.
* Apply the last layer.
* Use the vocab from the ggml files.
2023-09-02 13:41:48 +01:00
e8e33752f4
Sketch a quantized llama using the pyo3 api. ( #715 )
* Sketch a quantized llama using the pyo3 api.
* Add more ops.
* Expose a few more functions to use in the quantized model.
* Rope embeddings.
* Get the forward pass to work.
2023-09-02 11:26:05 +01:00