805bf9ffa7
Implement top_p / nucleus sampling ( #819 )
* Implement top_p / nucleus sampling
* Update changelog
* rustfmt
* Add tests
* Fix clippy warning
* Fix another clippy error
2023-09-12 18:10:16 +02:00
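For readers unfamiliar with the technique named in #819, here is a minimal sketch of top-p (nucleus) sampling over a probability vector. It is not the code from this commit: the function names `top_p_filter` and `sample_with_draw` are hypothetical, and the uniform draw `u` is passed in explicitly so the example stays deterministic and free of external crates.

```rust
/// Keep the smallest set of highest-probability tokens whose cumulative
/// probability reaches `top_p`, zero out the rest, and renormalize.
/// (Hypothetical helper for illustration, not the library's API.)
fn top_p_filter(probs: &[f32], top_p: f32) -> Vec<f32> {
    // Token indices sorted by descending probability.
    let mut order: Vec<usize> = (0..probs.len()).collect();
    order.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    let mut kept = vec![0.0f32; probs.len()];
    let mut cumulative = 0.0f32;
    for &idx in &order {
        kept[idx] = probs[idx];
        cumulative += probs[idx];
        if cumulative >= top_p {
            break; // The nucleus is complete.
        }
    }

    // Renormalize so the surviving probabilities sum to 1.
    let total: f32 = kept.iter().sum();
    if total > 0.0 {
        for p in kept.iter_mut() {
            *p /= total;
        }
    }
    kept
}

/// Pick a token index by inverse-CDF sampling, given a uniform draw `u` in [0, 1).
/// (Hypothetical helper; a real sampler would draw `u` from an RNG.)
fn sample_with_draw(probs: &[f32], u: f32) -> usize {
    let mut acc = 0.0f32;
    for (i, &p) in probs.iter().enumerate() {
        acc += p;
        if u < acc {
            return i;
        }
    }
    // Guard against rounding: fall back to the last token with non-zero mass.
    probs.iter().rposition(|&p| p > 0.0).unwrap_or(0)
}
```

With probabilities `[0.5, 0.3, 0.15, 0.05]` and `top_p = 0.7`, only the first two tokens survive, so the low-probability tail can never be sampled.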
42da17694a
Segment Anything readme ( #827 )
* Add a readme for the segment-anything model.
* Add the original image.
* Clean-up the segment anything cli example.
* Also print the mask id in the outputs.
2023-09-12 14:35:55 +01:00
25aacda28e
Add useful libraries section ( #825 )
* Add useful libraries section
* Add link
2023-09-12 11:06:21 +01:00
7a62aad24a
Add a readme for yolo-v8. ( #824 )
2023-09-12 11:01:06 +01:00
bb23b90b1d
Add a small readme for the quantized example. ( #823 )
2023-09-12 10:17:31 +01:00
2257f4d475
Bump the crate version + update the changelog. ( #822 )
2023-09-12 06:39:24 +01:00
871efc0307
Bugfix for the conv2d cpu kernel. ( #820 )
2023-09-11 23:11:27 +01:00
c5a058b169
Use the module trait in stable-diffusion. ( #817 )
2023-09-11 20:40:07 +01:00
59e63d690c
Add weight, bias, and hidden_size methods ( #816 )
* Add weight, bias methods to Conv(1|2)
* Add hidden_size method to Embedding
* Expose hidden_size
2023-09-11 16:01:11 +01:00
dbd4561416
im2col version of the conv1d kernel. ( #815 )
* im2col version of the cuda conv1d kernel.
* im2col version of the conv1d cpu kernel.
2023-09-11 14:40:09 +01:00
5c35fbbb13
Stable-Diffusion readme ( #814 )
* Stable Diffusion readme.
* Fix the image path.
* Move the assets.
* Resize the sample image.
* Lower resolution.
2023-09-11 13:06:29 +01:00
70f38c2069
Proper error on unsupported dtypes when using gemm. ( #813 )
2023-09-11 12:10:51 +01:00
d7b9fec849
Move the stable-diffusion modeling code so that it's easier to re-use. ( #812 )
2023-09-11 11:45:57 +01:00
84ee870efd
Use softmax-last-dim in whisper. ( #810 )
2023-09-11 11:05:05 +01:00
df712ecf64
Handle the case where the kernel is not contiguous in the cuda backend. ( #809 )
2023-09-11 09:48:31 +01:00
6fb665004c
Enable im2col on the cpu side. ( #805 )
* Enable im2col on the cpu side.
* Hook im2col on the cpu backend.
* Use the kernel offset.
* Avoid an unnecessary copy.
* Handle non-contiguous kernels.
* Add a const to select the conv2d kernel.
2023-09-11 09:28:13 +01:00
1cd74129d4
Add Im2Col support on the gpu side. ( #808 )
* Add Im2Col support on the gpu side.
* Actually enable.
2023-09-11 08:52:33 +01:00
98d1242b8f
im2col based conv2d ( #802 )
* im2col implementation for conv2d.
* Fix for the im2col implementation to match the current conv2d.
* Small optimization.
* Add a cuda kernel.
* Handle arbitrary layouts.
* Im2Col cuda code.
2023-09-10 21:02:42 +01:00
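As background for the im2col commits in this range, the following is a rough sketch of the general idea: unfold every kernel-sized patch of the input into a row of a matrix, then compute the convolution as a matrix product with the flattened kernels. It assumes stride 1, no padding, a square kernel, and row-major `(channels, height, width)` storage. The function names and signatures are hypothetical and are not taken from the commits.

```rust
/// Unfold a (c, h, w) input into a matrix with one row per output position
/// and one column per (channel, ky, kx) kernel tap.
/// Returns (matrix, rows, cols). Assumes stride 1 and no padding.
fn im2col(input: &[f32], c: usize, h: usize, w: usize, k: usize) -> (Vec<f32>, usize, usize) {
    let (out_h, out_w) = (h - k + 1, w - k + 1);
    let cols = c * k * k;
    let mut patches = Vec::with_capacity(out_h * out_w * cols);
    for y in 0..out_h {
        for x in 0..out_w {
            for ch in 0..c {
                for ky in 0..k {
                    for kx in 0..k {
                        patches.push(input[ch * h * w + (y + ky) * w + (x + kx)]);
                    }
                }
            }
        }
    }
    (patches, out_h * out_w, cols)
}

/// conv2d expressed as im2col followed by a matrix product.
/// `kernel` is (out_c, c, k, k) row-major; the result is (out_c, out_h, out_w).
/// Like most deep-learning conv2d ops, this is a cross-correlation (no kernel flip).
fn conv2d_im2col(
    input: &[f32],
    kernel: &[f32],
    c: usize,
    h: usize,
    w: usize,
    k: usize,
    out_c: usize,
) -> Vec<f32> {
    let (patches, rows, cols) = im2col(input, c, h, w, k);
    let mut out = vec![0.0f32; out_c * rows];
    for oc in 0..out_c {
        // Each output channel's kernel is one flattened row of length `cols`.
        let weights = &kernel[oc * cols..(oc + 1) * cols];
        for r in 0..rows {
            let patch = &patches[r * cols..(r + 1) * cols];
            out[oc * rows + r] = weights.iter().zip(patch).map(|(a, b)| a * b).sum();
        }
    }
    out
}
```

The appeal of this formulation is that the inner loops collapse into a single matrix multiplication, which can be handed to an optimized gemm routine. The cost is the extra memory for the unfolded patch matrix.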
18d6db2180
more doc fixes ( #804 )
2023-09-10 20:36:29 +01:00
4f18180fc7
Bugfix so that im2col produces the same results as conv2d. ( #801 )
2023-09-10 16:59:46 +01:00
559944146f
Add an im2col based benchmark. ( #800 )
* Add an im2col based benchmark.
* Reshape the final result.
2023-09-10 16:56:28 +01:00
3dd5804299
Fix typo in readme. ( #799 )
2023-09-10 13:49:47 +01:00
90e077e409
Return the low res mask in the wasm segment-anything module. ( #798 )
* Return the low res mask.
* Add some validations.
2023-09-10 13:03:02 +01:00
584171cae1
Add a wasm module for the segment anything example. ( #797 )
2023-09-10 12:29:37 +01:00
6c58fc59fd
Little docs changes ( #791 )
* Little doc fixes
* change imports in lib
* rename candle_core to candle
* revert "rename candle_core to candle"
2023-09-10 12:02:52 +01:00
35f72514f5
Move more models to candle-transformers ( #796 )
* Move dinov2.
* Move efficientnet.
* Move the quantized llama model.
* Move segment-anything.
2023-09-10 10:20:18 +01:00
d3f05eae8c
Move some models to candle-transformers so that it's easier to re-use. ( #794 )
* Move some models to candle-transformers so that they can be shared.
* Also move falcon.
* Move Llama.
* Move whisper (partial).
2023-09-10 09:40:27 +01:00
258ac32c38
Fix cuda randn when generating an odd number of values. ( #793 )
2023-09-09 18:44:21 +01:00
31936c08fe
ViT tracing. ( #790 )
2023-09-09 17:26:39 +01:00
74ad4deb42
Get the MobileSAM TinyViT based version to work. ( #789 )
* More TinyViT support in SA.
* More mobilesam work.
* Add the mobile-sam weights to the hub.
2023-09-09 16:21:44 +01:00
b7cd58473b
TinyViT backbone for segment-anything. ( #787 )
* TinyViT.
* More TinyViT.
* Add more to the tinyvit backbone.
* Proper padding.
* Plus ViT.
* Add the tiniest vit spec.
2023-09-09 15:10:06 +01:00
3cd7e7b51d
Fuse the rel-pos additions via a custom-op. ( #786 )
* Fuse the rel-pos additions via a custom-op.
* Run with rayon.
* Add more tracing.
2023-09-09 10:46:09 +01:00
722c50bb0c
Use byteorder in mnist. ( #785 )
2023-09-09 09:03:59 +01:00
976a1086ee
feat: u32 from_be_bytes ( #765 )
2023-09-09 08:55:35 +01:00
c88d6fd4b9
Remove set_training. ( #784 )
2023-09-09 08:27:37 +01:00
057f7909bc
Accelerate support for gelu. ( #782 )
2023-09-08 21:58:56 +01:00
acf8f10ae1
Get the comparison operation to work on scalar values. ( #780 )
* Get the comparison operation to work on scalar values.
* Add some time measurement.
2023-09-08 20:13:29 +01:00
0906acab91
Automatic mask generation ( #779 )
* A few more contiguous fixes for cuda.
* Mask generation.
* Generic bbox.
* Generate all the masks.
2023-09-08 19:11:34 +01:00
158ff3c609
Add tracing to segment-anything ( #777 )
* Tracing support for segment-anything.
* More tracing.
* Handle the empty slice case.
2023-09-08 15:31:29 +01:00
e5703d2f56
Draw the mask on a merged image. ( #775 )
* Draw the mask on a merged image.
* Clippy fix.
* Enable the target point by default.
* Add to the readme.
2023-09-08 14:04:34 +01:00
98172d46fa
Fix some errors about BlockQ8_1 ( #776 )
* use int8 type instead of uint8 for BlockQ8_1.qs
The uint8 type of BlockQ8_1.qs causes great loss for negative weights
Ref: ebc96086af/ggml.c (L904)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ4_1
Ref: ebc96086af/ggml.c (L2840)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ5_1
Ref: ebc96086af/ggml.c (L3490)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
---------
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
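The first fix in #776 notes that an unsigned storage type loses negative weights. The sketch below illustrates that effect with a simplified symmetric 8-bit quantizer. It is an assumption-laden toy, not the BlockQ8_1 layout or code from the commit: the helper names are hypothetical, and the block holds only a scale `d` and the quantized values.

```rust
/// Quantize to signed 8-bit values with a shared scale `d` (simplified toy block).
fn quantize_i8(xs: &[f32]) -> (f32, Vec<i8>) {
    let amax = xs.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let d = amax / 127.0;
    let inv = if d != 0.0 { 1.0 / d } else { 0.0 };
    let qs = xs.iter().map(|x| (x * inv).round() as i8).collect();
    (d, qs)
}

/// Same quantizer, but storing into u8. In Rust, a float-to-u8 `as` cast
/// saturates, so every negative value collapses to 0.
fn quantize_u8_lossy(xs: &[f32]) -> (f32, Vec<u8>) {
    let amax = xs.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    let d = amax / 127.0;
    let inv = if d != 0.0 { 1.0 / d } else { 0.0 };
    let qs = xs.iter().map(|x| (x * inv).round() as u8).collect();
    (d, qs)
}

/// Map quantized values back to floats by multiplying with the scale.
fn dequantize<T: Copy + Into<f32>>(d: f32, qs: &[T]) -> Vec<f32> {
    qs.iter()
        .map(|&q| {
            let v: f32 = q.into();
            v * d
        })
        .collect()
}
```

Quantizing `[-1.0, 0.25, 1.0]` with the signed variant round-trips the `-1.0` to within rounding error, while the unsigned variant dequantizes it to `0.0`.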
28c87f6a34
Automatic mask generator + point base mask ( #773 )
* Add more to the automatic mask generator.
* Add the target point.
* Fix.
* Remove the allow-unused.
* Mask post-processing.
2023-09-08 12:26:56 +01:00
c1453f00b1
Improve the safetensor loading in the segment-anything example. ( #772 )
* Improve the safetensor loading in the segment-anything example.
* Properly handle the labels when embedding the point prompts.
2023-09-08 09:39:10 +01:00
989a4807b1
Use shape with holes. ( #771 )
2023-09-08 08:50:27 +01:00
0e250aee4f
Shape with holes ( #770 )
* Shape with holes.
* rustfmt.
2023-09-08 08:38:13 +01:00
cfcbec9fc7
Add small customization to the build ( #768 )
* Add ability to override the compiler used by NVCC from an environment variable
* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR
* Add the compilation failure to the readme, with a possible solution
* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
3898e500de
Generate a mask image + the scaled input image. ( #769 )
* Also round-trip the original image.
* Make it possible to use a safetensors input.
2023-09-08 05:53:08 +01:00
79c27fc489
Segment-anything fixes: avoid normalizing twice. ( #767 )
* Segment-anything fixes: avoid normalizing twice.
* More fixes for the image aspect ratio.
2023-09-07 21:45:16 +01:00
7396b8ed1a
Segment Anything - process images ( #766 )
* Start processing images.
* Add LayerNorm2d.
* Properly use LayerNorm2d.
* Tweak eps.
* Use LayerNorm on inputs with a rank different from 3.
* Window partitioning.
* Fix a couple todos.
* More todos.
* Hard-code the einsums.
* More padding support.
* Some sizes tweaks.
* Use the hub to get the weights.
* Use a batch matmul.
* Tweaks.
* More fixes.
* Get some predictions to be generated.
2023-09-07 19:22:45 +01:00
7b50f3e106
More segment-anything again. ( #764 )
* More segment-anything again.
* Transformer block forward.
* Two-ways transformer.
* Position embeddings.
* Sketch the prompt encoder.
* More prompt-encoder.
* More prompt-encoder.
* Add the main sam module.
* Embed the transformer.
* And hook the transformer forward step.
* Build the model.
* Handle the global attn indexes.
* Get the model to load.
2023-09-07 12:06:55 +01:00