Commit Graph

86 Commits

Author SHA1 Message Date
fb660b8d43 Add a cudnn feature to candle-nn/candle-transformers. (#2890) 2025-04-13 17:43:41 +02:00
e3370c6316 Add the SNAC audio tokenizer. (#2869)
* Add the SNAC audio tokenizer.

* More snac.

* Again more snac.

* Add some example code for snac.

* Get the weights to load.

* Add to the snac model.

* Fixes.

* Get round-tripping to work.

* Save/load code files.

* Clippy fix.

* Fmt fix.
2025-04-06 22:15:36 +02:00
e4c3a71f11 Fix GLM4 alignment issue (#2723)
* Fix GLM4 alignment issue

* Cleanups.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-01-20 22:51:46 +01:00
00d8a0c178 Remove some unused macros. (#2618)
* Remove some unused macros.

* More unused fixes.
2024-11-15 16:46:55 +01:00
3d1dc06cdb Enable stable-diffusion 3 on metal. (#2560) 2024-10-14 08:59:12 +02:00
ca7cf5cb3b Add Stable Diffusion 3 Example (#2558)
* Add stable diffusion 3 example

Add get_qkv_linear to handle different dimensionality in linears

Add stable diffusion 3 example

Add use_quant_conv and use_post_quant_conv for vae in stable diffusion

adapt existing AutoEncoderKLConfig to the change

add forward_until_encoder_layer to ClipTextTransformer

rename sd3 config to sd3_medium in mmdit; minor clean-up

Enable flash-attn for mmdit impl when the feature is enabled.

Add sd3 example codebase

add document

crediting references

pass the cargo fmt test

pass the clippy test

* fix typos

* expose cfg_scale and time_shift as options

* Replace the sample image with JPG version. Change image output format accordingly.

* make meaningful error messages

* remove the tail-end assignment in sd3_vae_vb_rename

* remove the CUDA requirement

* use default_value in clap args

* add use_flash_attn to turn on/off flash-attn for MMDiT at runtime

* resolve clippy errors and warnings

* use default_value_t

* Pin the web-sys dependency.

* Clippy fix.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-10-13 22:08:40 +02:00
f856b5c3a7 pyo3 update. (#2545)
* pyo3 update.

* Stub fix.
2024-10-06 10:09:38 +02:00
888d886dd8 Add ColPali (#2524)
* add colpali

* cleanup

* fix clippy
2024-10-01 11:48:39 +02:00
6110ad8d4f Refactor the whisper microphone example. (#2523)
* Refactor the whisper microphone example.

* Tweak the whisper microphone example more.
2024-10-01 00:24:17 +02:00
c58c5d5b01 Add the mimi audio-tokenizer. (#2488)
* Add the mimi audio-tokenizer.

* Formatting tweaks.

* Add a full example.

* Use the transformers names.

* More renamings.

* Get encoding and decoding to work.

* Clippy fixes.
2024-09-20 14:31:20 -06:00
e3c146ada6 silero-vad v5 example (#2321)
* silero-vad v5 example

This change adds an example of how to run silero-vad v5

* PR: rename 'vad' to 'silero-vad'

* Update README.md

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
2024-08-22 22:50:42 +02:00
0f5cbb08b3 Add support for Llama 3.1 (#2359)
* Add Llama 3.1 rope

* Clippy

* Format

* Clippy

* Add support for multiple eos tokens:

* Untagged either

* Remove either dep and fix settings.json

* Make the max positional embeddings configurable
2024-07-26 21:32:26 +02:00
242e006bbb Depth Anything v2 (#2279)
* define structs

* construct ResidualConvUnit

* forward() for ResidualConvUnit

* implement FeatureFusionBlock

* implement Scratch

* implement DPTHead

* add identity module

* implement forward for DTPHead

* add get_intermediate_layers to DinoVisionTransformer

* implement DepthAnythingV2

* some minor tweaks

* fix compile errors

* fix var builder prefixes

* setup initial example

* use fixed patch size of 37 (518 / 14)

* debugged until output

* print min and max values

* add some dynamism to the output location

* scale input image

* extract prep function

* extract output path function

* normalize image with magic mean and std

* add spectral coloring

* squeeze in the right place

* make enterpolation optional

* use bail instead of panic

* omit unnecessary Shape call

* remove empty curly braces

* use bail instead of assert

* use vb and pp

* remove closures

* extract config object

* Apply rustfmt.

* Fix some clippy lints.

* More lints.

* Use the array methods.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-24 19:12:52 +02:00
b20acd622c Update for pyo3 0.21. (#1985)
* Update for pyo3 0.21.

* Also adapt the RL example.

* Fix for the pyo3-onnx bindings...

* Print details on failures.

* Revert pyi.
2024-04-01 17:07:02 +02:00
18036c6ccb Update the image crate + use the re-exported version. (#1893)
* Update the image crate + use the re-exported version.

* Update to using ab_glyph.
2024-03-21 10:56:41 +01:00
d365ef32d9 Improve the encodec example: handle resampling. (#1865)
* Improve the encodec example: handle resampling.

* Play the audio directly.
2024-03-18 10:09:40 +01:00
60ee5cfd4d Support more modes in the encodec example. (#1777)
* Support more modes in the encodec example.

* Remove the old encodec model from the musicgen bits.
2024-02-28 09:22:33 +01:00
56e44aabe3 Make some dependencies optional in the examples. (#1776) 2024-02-28 07:17:03 +01:00
13c67226e6 feat: support microphone whisper streaming (#1678)
* feat: support microphone whisper streaming

* fix: cleanup print stmts and adjust how input is read

* fix: remove incorrect comment

* feat: split into new example and simplify

* fix: feature flag example file

* fix: fmt fixes

* feat: simplify and remove redundant files
2024-02-12 18:01:21 +01:00
dfab45e1c8 Supports more audio formats (#1628)
* Supports more audio formats

* Simplify the handling of the different buffer types.

* Check the sample rate.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-03 14:26:04 +01:00
89b5a06858 Use bindgen-cuda for the custom-kernel example. (#1536)
* Use bindgen-cuda for the custom-kernel example.

* Only depend on the kernels when cuda is enabled.

* Skip rustfmt.
2024-01-07 17:18:46 +01:00
b4cb982e49 Simplifying our internal cargo dependencies. (#1529) 2024-01-07 12:04:14 +01:00
d35f0a1376 Bump the crate version to 0.3.3. (#1490) 2023-12-28 13:38:30 +01:00
37c539f2b7 Helper function to load sharded safetensors files (#1481)
* Fix the quantized mistral example.

* Add a helper function to load sharded safetensors weights.

* Use the sharded loader.
2023-12-25 21:49:21 +01:00
5b35fd0fcf MMLU evaluation for Phi. (#1474)
* MMLU evaluation for Phi.

* Improve the evaluation.
2023-12-23 15:28:36 +01:00
9fc210fae8 Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
94817dac56 Bump the crate version to 0.3.2. (#1452) 2023-12-17 05:34:53 -06:00
4349ff1fc2 Starting to fix some tests.
Few fixes.

Going back on remote metal-rs.

Reusing a single buffer (for now) to speed things up.

Adding some half kernels.

All tests are panicking instead of random failure.

Putting back f16 index select.

Add erf.

Working version for llama2-c.

Fixes + cache compute_pipeline_state.

BF16 metal fix.

Remove some prints.

new_owned -> new()..to_owned().

Better batched matmul.

Metal operational.

Reuse buffers on our own reference counts.

Tmp gemm.

Revert "Tmp gemm."

This reverts commit c65f68e988.

Interleave committing.

Speeding up copies using blit.

Fmt.

Fmt.

Remove the assert!

Fmt all.

Fixes after big rebase.

Add softmax for half and bfloat + tests

Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
a209ce8ceb Update for 0.3.1. (#1324) 2023-11-11 18:48:52 +00:00
a773a4b22b [ONNX] Support a couple more ops. (#1284)
* Support the shape op in ONNX.

* Share the axis normalization bits.

* Add some limited support for gather.

* Unsqueeze.

* Comparison with broadcasting.

* Add Not + handle i32.
2023-11-06 22:44:58 +01:00
2a45bcf943 Put the onnx example behind a feature flag. (#1276)
* Put the onnx example behind a feature flag.

* Exclude the onnx bits from the workspace.

* README tweaks.
2023-11-06 07:45:07 +01:00
928a9d906e [ONNX] Do not generate values for constants. (#1272)
* Do not generate values for constants.

* Add an onnx based example using squeezenet.
2023-11-05 11:23:14 +01:00
174b208052 PyO3: Better shape handling (#1143)
* Negative and `*args` shape handling

* Rename to `PyShapeWithHole` + validate that only one hole exists

* Regenerate stubs

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
2023-10-29 15:41:44 +00:00
29c7f2565d Add some reinforcement learning example. (#1090)
* Add some reinforcement learning example.

* Python initialization.

* Get the example to run.

* Vectorized gym envs for the atari wrappers.

* Get some simulation loop to run.
2023-10-14 16:46:43 +01:00
096dee7073 Bump the version to 0.3.0. (#1014)
* Bump the version to 0.3.0.

* Changelog update.
2023-10-01 13:51:57 +01:00
06207332bc Streaming mode for reporting the generated tokens (#1007)
* Token streaming.

* Use the token output stream.

* Flush the output.

* Ensure that the last characters get reported.
2023-09-30 15:04:11 +01:00
bcb0ed8f1c Self-contained safetensors for the multiprocess llama example. (#950) 2023-09-24 06:54:49 +01:00
fb1c2ac535 Add flash-attn support. (#912)
* Add flash-attn support.

* Add the use-flash-attn flag.

* Re-enable flash-attn.
2023-09-20 14:07:55 +01:00
7dd8e12472 Bump the crate versions to v0.2.3. (#886)
* Bump the crate version.

* Also update the python bindings.
2023-09-18 12:14:03 +01:00
2257f4d475 Bump the crate version + update the changelog. (#822) 2023-09-12 06:39:24 +01:00
d3f05eae8c Move some models to candle-transformers so that it's easier to re-use. (#794)
* Move some models to candle-transformers so that they can be shared.

* Also move falcon.

* Move Llama.

* Move whisper (partial).
2023-09-10 09:40:27 +01:00
3cd7e7b51d Fuse the rel-pos additions via a custom-op. (#786)
* Fuse the rel-pos additions via a custom-op.

* Run with rayon.

* Add more tracing.
2023-09-09 10:46:09 +01:00
618f4e4c78 Add some documentation. (#673)
* Add some documentation.

* Bump the crate version.
2023-08-30 11:54:00 +01:00
a3f97c143d Bump the crate version + update CHANGELOG. (#628) 2023-08-27 18:17:11 +01:00
0afbc435df Add some configurable legend for yolo detection. (#603)
* Add some configurable legend for yolo detection.

* Clippyness.
2023-08-25 13:50:31 +01:00
97909e5068 Move the yolo model bits in a separate file. (#602)
* Move the yolo model bits in a separate file.

* Improve the drawing.

* Bugfix.
2023-08-25 12:47:55 +01:00
aba1e90797 Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Properh handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
a8f61e66cc Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
b9661a1c25 Enable the image crate by default in examples (#501)
* Enable the image crate by default so that it's easier to compile the stable diffusion example.

* Also update the readme.
2023-08-18 10:00:05 +01:00
531f23b4d0 Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00