Commit Graph

1141 Commits

Author SHA1 Message Date
6c58fc59fd Little docs changes (#791)
* Little doc fixes

* change imports in lib

* rename candle_core to candle

* revert "rename candle_core to candle"
2023-09-10 12:02:52 +01:00
35f72514f5 Move more models to candle-transformers (#796)
* Move dinov2.

* Move efficientnet.

* Move the quantized llama model.

* Move segment-anything.
2023-09-10 10:20:18 +01:00
d3f05eae8c Move some models to candle-transformers so that it's easier to re-use. (#794)
* Move some models to candle-transformers so that they can be shared.

* Also move falcon.

* Move Llama.

* Move whisper (partial).
2023-09-10 09:40:27 +01:00
258ac32c38 Fix cuda randn when generating an odd number of values. (#793) 2023-09-09 18:44:21 +01:00
31936c08fe ViT tracing. (#790) 2023-09-09 17:26:39 +01:00
74ad4deb42 Get the MobileSAM TinyViT based version to work. (#789)
* More TinyViT support in SA.

* More mobilesam work.

* Add the mobile-sam weights to the hub.
2023-09-09 16:21:44 +01:00
b7cd58473b TinyViT backbone for segment-anything. (#787)
* TinyViT.

* More TinyViT.

* Add more to the tinyvit backbone.

* Proper padding.

* Plus ViT.

* Add the tiniest vit spec.
2023-09-09 15:10:06 +01:00
3cd7e7b51d Fuse the rel-pos additions via a custom-op. (#786)
* Fuse the rel-pos additions via a custom-op.

* Run with rayon.

* Add more tracing.
2023-09-09 10:46:09 +01:00
722c50bb0c Use byteorder in mnist. (#785) 2023-09-09 09:03:59 +01:00
976a1086ee feat: u32 from_be_bytes (#765) 2023-09-09 08:55:35 +01:00
c88d6fd4b9 Remove set_training. (#784) 2023-09-09 08:27:37 +01:00
057f7909bc Accelerate support for gelu. (#782) 2023-09-08 21:58:56 +01:00
acf8f10ae1 Get the comparison operation to work on scalar values. (#780)
* Get the comparison operation to work on scalar values.

* Add some time measurement.
2023-09-08 20:13:29 +01:00
0906acab91 Automatic mask generation (#779)
* A few more contiguous fixes for cuda.

* Mask generation.

* Generic bbox.

* Generate all the masks.
2023-09-08 19:11:34 +01:00
158ff3c609 Add tracing to segment-anything (#777)
* Tracing support for segment-anything.

* More tracing.

* Handle the empty slice case.
2023-09-08 15:31:29 +01:00
e5703d2f56 Draw the mask on a merged image. (#775)
* Draw the mask on a merged image.

* Clippy fix.

* Enable the target point by default.

* Add to the readme.
2023-09-08 14:04:34 +01:00
98172d46fa Fix some errors about BlockQ8_1 (#776)
* use int8 type instead of uint8 for BlockQ8_1.qs

The uint8 type of BlockQ8_1.qs causes great loss for negative weights
Ref: ebc96086af/ggml.c (L904)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

* fix sum error in vec_dot of BlockQ4_1

Ref: ebc96086af/ggml.c (L2840)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

* fix sum error in vec_dot of BlockQ5_1

Ref: ebc96086af/ggml.c (L3490)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

---------

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
28c87f6a34 Automatic mask generator + point base mask (#773)
* Add more to the automatic mask generator.

* Add the target point.

* Fix.

* Remove the allow-unused.

* Mask post-processing.
2023-09-08 12:26:56 +01:00
c1453f00b1 Improve the safetensor loading in the segment-anything example. (#772)
* Improve the safetensor loading in the segment-anything example.

* Properly handle the labels when embedding the point prompts.
2023-09-08 09:39:10 +01:00
989a4807b1 Use shape with holes. (#771) 2023-09-08 08:50:27 +01:00
0e250aee4f Shape with holes (#770)
* Shape with holes.

* rustfmt.
2023-09-08 08:38:13 +01:00
cfcbec9fc7 Add small customization to the build (#768)
* Add ability to override the compiler used by NVCC from an environment variable

* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR

* Add the compilation failure to the readme, with a possible solution

* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
3898e500de Generate a mask image + the scaled input image. (#769)
* Also round-trip the original image.

* Make it possible to use a safetensors input.
2023-09-08 05:53:08 +01:00
79c27fc489 Segment-anything fixes: avoid normalizing twice. (#767)
* Segment-anything fixes: avoid normalizing twice.

* More fixes for the image aspect ratio.
2023-09-07 21:45:16 +01:00
7396b8ed1a Segment Anything - process images (#766)
* Start processing images.

* Add LayerNorm2d.

* Properly use LayerNorm2d.

* Tweak eps.

* Use LayerNorm on inputs with a rank different from 3.

* Window partitioning.

* Fix a couple todos.

* More todos.

* Hard-code the einsums.

* More padding support.

* Some sizes tweaks.

* Use the hub to get the weights.

* Use a batch matmul.

* Tweaks.

* More fixes.

* Get some predictions to be generated.
2023-09-07 19:22:45 +01:00
7b50f3e106 More segment-anything again. (#764)
* More segment-anything again.

* Transformer block forward.

* Two-ways transformer.

* Position embeddings.

* Sketch the prompt encoder.

* More prompt-encoder.

* More prompt-encoder.

* Add the main sam module.

* Embed the transformer.

* And hook the transformer forward step.

* Build the model.

* Handle the global attn indexes.

* Get the model to load.
2023-09-07 12:06:55 +01:00
8c991df394 More segment-anything. (#763)
* More segment-anything.

* Split the model in multiple files.

* Start adding the transformer.

* Add the attention block.

* Move the MLP Block.
2023-09-07 07:28:30 +01:00
000fa00e31 Expose the conv2d-transpose layers. (#761) 2023-09-07 06:04:52 +01:00
a17a7c42c1 Add a nn layer for conv-transpose2d. (#760) 2023-09-07 05:47:28 +01:00
6527ab81a3 Sketch the segment anything model. (#759)
* Sketch the segment anything model.

* Fix some clippy lint.

* Add the mask decoder.
2023-09-07 05:34:05 +01:00
7b1f2da828 Cudnn fix. (#758) 2023-09-06 17:39:39 +01:00
bdc9d46fe3 Use an arc in the varbuilder rather than rc. (#757)
* Use an arc in the varbuilder rather than rc.

* Require the backends to be send.

* Request send and sync.
2023-09-06 15:29:09 +01:00
dcf708559d Fix for cudnn to work with img2img. (#753) 2023-09-06 07:49:28 +01:00
7299a68353 img2img pipeline for stable diffusion. (#752)
* img2img pipeline for stable diffusion.

* Rename the arguments + fix.

* Fix for zero strength.

* Another fix.

* Another fix.

* Revert.

* Include the backtrace.

* Noise scaling.

* Fix the height/width.
2023-09-06 07:06:49 +01:00
16bf44f6e9 force model cache (#751) 2023-09-06 05:53:31 +02:00
a4f40f3dc8 Use rayon directly rather than constraining the number of threads. (#749) 2023-09-05 20:26:15 +01:00
6a40decc76 Minor WASM UI improvements (#748)
* add stats

* random seed btn

* minor ui improvoments
2023-09-05 19:24:43 +01:00
a0d65585db Softmax implementation for cuda. (#747) 2023-09-05 18:38:03 +01:00
94c6a8d3d3 Add a dedicated cuda kernel for softmax. (#746) 2023-09-05 17:53:20 +02:00
6615daf242 Tweaks to softmax. (#745) 2023-09-05 15:22:27 +01:00
1c9e5394a5 Add a custom softmax implementation. (#744)
* Add a custom softmax implementation.

* Add softmaxlastdim to the benchmarks.

* And add a test.

* Support more dtypes.

* Polish the code.

* Use the slow implementation on cuda.

* Add a todo for the cuda kernel.
2023-09-05 14:20:23 +01:00
a8410bf35e Add some documentation. (#743) 2023-09-05 09:51:12 +01:00
cda45a7443 Let outside CustomOp2 implementations use binary_map/binary_map_vec (#741) 2023-09-05 09:27:32 +01:00
4698eb5cb6 Fix typo in the nll function document (#742) 2023-09-05 09:25:11 +01:00
000487c36f Add a python function to save as safetensors. (#740) 2023-09-04 20:32:14 +01:00
ab0d9fbdd1 Properly set the is_bf16 flag. (#738) 2023-09-04 16:45:26 +01:00
f80fd44201 BF16 support for flash-attn. (#737) 2023-09-04 16:35:43 +01:00
0d00c06a83 Fix clippy lint. (#736) 2023-09-04 16:09:19 +01:00
8395152d20 Llama2c WASM UI improvements (#732)
* pass seed, expose model seq_len

* wip new llama2.c ui

* final new UI example

* small coppy

* copy
2023-09-04 15:59:22 +01:00
e2f9f60ac2 Avoid some redundant clone. (#731) 2023-09-04 09:18:32 +02:00