56a1b7d97e
Apply rustfmt.
2024-06-04 22:47:20 +02:00
47c7ecc948
Merge branch 'refs/heads/leaky_relu' into operators-argmin-argmax-leakyrelu
2024-06-04 21:13:38 +02:00
c441716bd2
Fix a weird automatic RustRover change
2024-06-04 21:13:30 +02:00
a5b81e2c02
Merge branch 'refs/heads/argmin-argmax' into operators-argmin-argmax-leakyrelu
...
# Conflicts:
# candle-onnx/src/eval.rs
# candle-onnx/tests/ops.rs
2024-06-04 21:09:59 +02:00
9182c828e6
Automatically upcast for to_u64 ( #2244 )
2024-06-04 11:32:36 +02:00
3f13ad3d79
Fix dataset id for MNIST ( #2238 )
2024-06-04 06:27:24 +02:00
cd4d941ed1
Add LLaVA support ( #2234 )
...
* first commit
* llava
* clippy and fmt
* some fixes
* minor fixes
* remove useless file
* refactor: Remove llava/constants.rs and update llava/mod.rs
* modify variable name
* modify code after clippy
* Minor tweaks.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-03 11:54:09 +02:00
03344d3c19
ONNX: Add Floor and Ceil ( #2235 )
2024-06-02 21:45:20 +02:00
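For context, ONNX unary ops like these map almost one-to-one onto existing tensor methods. A minimal sketch of such a dispatch using candle's `Tensor::floor`/`Tensor::ceil`; the match arm is illustrative, not the exact code in candle-onnx/src/eval.rs:

```rust
use candle_core::{Result, Tensor};

// Illustrative dispatch for unary ONNX nodes; the real eval loop in
// candle-onnx resolves inputs and outputs by name from the graph.
fn eval_unary(op_type: &str, xs: &Tensor) -> Result<Tensor> {
    match op_type {
        "Floor" => xs.floor(),
        "Ceil" => xs.ceil(),
        _ => candle_core::bail!("unsupported unary op {op_type}"),
    }
}
```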
1ec3b2cc18
add where_cond f32 for metal ( #2236 )
2024-06-02 14:30:06 +02:00
f7773d498a
Deactivate some book tests that break the CI. ( #2233 )
...
* Deactivate some book tests that break the CI.
* Clippy fix.
2024-06-01 09:44:22 +02:00
7abc3b8cd7
Bump cudarc version to 0.11.4 ( #2230 )
2024-06-01 08:18:35 +02:00
46012ed31f
Another cudarc update. ( #2229 )
2024-05-30 22:27:06 +02:00
f3fade3b03
Update cudarc to 0.11.2. ( #2227 )
2024-05-29 18:50:52 +02:00
ea260aeffd
Add Debug, Clone, Deserialize to moondream config ( #2222 )
2024-05-28 06:08:00 +02:00
0814dfd148
Add a metal kernel for col2im1d. ( #2214 )
...
* Add a metal kernel for col2im1d.
* Enable the col2im variant.
* Bugfix.
* Revert the quantized tweak.
2024-05-25 11:03:23 +02:00
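The col2im1d kernel backs transposed 1d convolutions, so the user-visible effect is that `conv_transpose1d` runs natively on Metal. A minimal usage sketch, assuming the six-argument signature ending in `groups`; shapes are illustrative:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_metal(0)?;
    // Input: (batch, c_in, length); kernel: (c_in, c_out, k) for groups = 1.
    let xs = Tensor::randn(0f32, 1., (1, 4, 16), &dev)?;
    let kernel = Tensor::randn(0f32, 1., (4, 8, 3), &dev)?;
    // Arguments: padding, output_padding, stride, dilation, groups.
    let ys = xs.conv_transpose1d(&kernel, 0, 0, 2, 1, 1)?;
    println!("{:?}", ys.shape()); // (1, 8, 33)
    Ok(())
}
```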
3ceca9901a
Enable the new layer-norm. ( #2213 )
...
* Enable the new layer-norm.
* Shape fixes.
2024-05-24 16:48:21 +02:00
1df2bddccf
Add the layernorm specialized op. ( #2212 )
...
* Add the layernorm cuda kernels.
* Dedicated layer norm op.
* Add the slower variant.
* Plug the cuda implementation.
* Add the metal variant.
* Add a dedicated test.
* Bugfix.
2024-05-24 15:58:01 +02:00
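The dedicated kernels sit behind the usual module interface, so callers keep going through `candle_nn::LayerNorm`. A minimal sketch with hand-built weights:

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{LayerNorm, Module};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let hidden = 8;
    // Identity scale and zero shift, i.e. a plain normalization.
    let weight = Tensor::ones(hidden, DType::F32, &dev)?;
    let bias = Tensor::zeros(hidden, DType::F32, &dev)?;
    let ln = LayerNorm::new(weight, bias, 1e-5);
    let xs = Tensor::randn(0f32, 1., (2, 4, hidden), &dev)?;
    let ys = ln.forward(&xs)?;
    println!("{:?}", ys.shape()); // same shape as the input
    Ok(())
}
```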
6f0b807ffd
More efficient cuda implementation for ConvTranspose1d. ( #2211 )
...
* More efficient cuda implementation for ConvTranspose1d.
* Small tweak.
2024-05-24 11:05:43 +02:00
d54e02d73d
Avoid a contiguous call in the quantized phi 3 model. ( #2209 )
...
* Simplify the KvCache api.
* Avoid a contiguous call in the quantized phi3 model.
2024-05-23 21:24:55 +02:00
45e235a747
Simplify the KvCache api. ( #2207 )
2024-05-23 17:07:21 +02:00
31cf64147b
Add a couple kv-cache helper functions. ( #2206 )
2024-05-23 16:21:47 +02:00
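These kv-cache commits add a reusable cache to candle_nn. A minimal sketch of the append-style API, assuming `candle_nn::kv_cache::KvCache` with a `new(dim, max_seq_len)` constructor and an `append` that returns all keys/values accumulated so far:

```rust
use candle_core::{Device, Result, Tensor};
use candle_nn::kv_cache::KvCache;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Concatenate along dim 2 (the sequence axis), up to 512 positions.
    let mut cache = KvCache::new(2, 512);
    for step in 0..3 {
        // One new position per decoding step: (batch, heads, seq = 1, head_dim).
        let k = Tensor::randn(0f32, 1., (1, 4, 1, 32), &dev)?;
        let v = Tensor::randn(0f32, 1., (1, 4, 1, 32), &dev)?;
        let (k_all, v_all) = cache.append(&k, &v)?;
        println!("step {step}: {:?} {:?}", k_all.shape(), v_all.shape());
    }
    Ok(())
}
```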
77ea479a18
Add Phi-3 Medium ( #2205 )
2024-05-23 13:33:17 +02:00
72e7ca529a
Add some missing where-cond kernels for metal. ( #2203 )
2024-05-22 09:44:52 +02:00
7ff921c538
Add RandomNormal ONNX operator ( #2200 )
(tag: 0.5.1)
2024-05-21 21:47:32 +02:00
9b8537a62f
Remove the deprecated wav crate in favor of hound. ( #2202 )
2024-05-21 21:43:35 +02:00
7ebc3548e1
Use flash-attn in gemma. ( #2195 )
...
* Use flash-attn in gemma.
* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
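In the models this call is feature-gated; when `flash-attn` is enabled, attention goes through the dedicated CUDA kernel. A sketch of the crate's entry point, assuming the usual (batch, seq_len, num_heads, head_dim) layout and f16/bf16 inputs:

```rust
// Requires the cuda and flash-attn features plus a recent NVIDIA GPU.
use candle_core::{Result, Tensor};

fn attend(q: &Tensor, k: &Tensor, v: &Tensor, head_dim: usize) -> Result<Tensor> {
    let scale = 1f32 / (head_dim as f32).sqrt();
    // q/k/v in (batch, seq_len, num_heads, head_dim), f16 or bf16.
    candle_flash_attn::flash_attn(q, k, v, scale, /* causal= */ true)
}
```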
eefc1c77ef
Support flash-attn in quantized phi3. ( #2194 )
2024-05-18 17:12:56 +02:00
01545f7303
Add a slice_set op. ( #2193 )
...
* Add a slice_set op.
* Add some testing.
* Add the dedicated kv-cache module.
* Derive debug and clone.
* Expose more kv-cache functions.
* Return the current data when appending.
* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
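slice_set copies a source tensor into a same-rank destination at a given offset along one dimension, which is exactly what a pre-allocated kv-cache needs. A minimal sketch, assuming a `slice_set(src, dim, offset)` form that writes into the destination in place:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Pre-allocated buffer with room for 8 positions along dim 1.
    let cache = Tensor::zeros((2, 8, 4), DType::F32, &dev)?;
    let chunk = Tensor::ones((2, 3, 4), DType::F32, &dev)?;
    // Write `chunk` into positions 2..5 of dim 1.
    cache.slice_set(&chunk, 1, 2)?;
    println!("{}", cache.narrow(1, 2, 3)?);
    Ok(())
}
```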
349c3e806a
Support embedding model gte-Qwen1.5-7B-instruct ( #2190 )
...
* Support embedding model gte-Qwen1.5-7B-instruct
This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit makes
minimal modifications to the old Qwen2 implementation so that it
supports both models.
An example is provided and has been verified against the official
PyTorch implementation.
* Avoid doing the 'last-token filtering' based on the absence of attention mask.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-05-16 21:34:10 +02:00
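The last bullet refers to pooling: for this embedding model, the sentence vector comes from the final token's hidden state, which is then L2-normalized. A hedged sketch of that pooling step (simplified; the actual example also has to handle padding):

```rust
use candle_core::{Result, Tensor};

// hidden: (batch, seq_len, hidden_size) from the transformer.
fn last_token_pool(hidden: &Tensor) -> Result<Tensor> {
    let (_b, seq_len, _h) = hidden.dims3()?;
    // Take the hidden state at the last position as the embedding.
    let emb = hidden.narrow(1, seq_len - 1, 1)?.squeeze(1)?;
    // L2-normalize so cosine similarity reduces to a dot product.
    let norm = emb.sqr()?.sum_keepdim(1)?.sqrt()?;
    emb.broadcast_div(&norm)
}
```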
bdaa34216a
chore: add the cudarc fix for Windows to the readme ( #2189 )
2024-05-16 14:32:50 +02:00
cc80e065e5
Allow the threshold argument to be negative in the segment-anything example ( #2187 )
...
The threshold defaults to 0.0; negative values include more points,
expanding the mask, while positive values are more selective, shrinking
it.
Negative numbers start with a minus sign, which clap would normally
parse as a flag.
2024-05-15 13:17:20 +02:00
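The usual clap-side fix is to opt the argument into negative-number parsing. A sketch of what that looks like with clap's derive API; illustrative, not necessarily the exact change made in the example:

```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Detection threshold; negative values expand the mask.
    #[arg(long, default_value_t = 0.0, allow_negative_numbers = true)]
    threshold: f32,
}

fn main() {
    let args = Args::parse();
    println!("threshold: {}", args.threshold);
}
```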
13c64f6828
Fix VarBuilder::from_slice_safetensors ( #2180 )
...
Also implement SimpleBackend for SliceSafetensors
Signed-off-by: Harry Stern <harry@harrystern.net>
2024-05-12 07:26:06 +02:00
21f82a5155
Add SliceSafetensors. ( #2179 )
...
* Add SliceSafetensors.
* And add some testing.
2024-05-11 13:15:42 +02:00
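Together, these two commits let a VarBuilder read weights straight from an in-memory safetensors buffer. A minimal sketch, assuming a `from_slice_safetensors(bytes, dtype, device)` constructor; the tensor name and shape are hypothetical and would come from the checkpoint:

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::VarBuilder;

fn load_weight(bytes: &[u8]) -> Result<Tensor> {
    let dev = Device::Cpu;
    // Parse the safetensors payload without writing it to disk first.
    let vb = VarBuilder::from_slice_safetensors(bytes, DType::F32, &dev)?;
    vb.get((768, 768), "model.layer.0.weight")
}
```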
9cff7bc3f4
Make it possible to use TF32 accumulation in F32 matmuls. ( #2178 )
...
* Allow the use of tf32 accumulation in matmul.
* Better timings.
* Dummy versions for use when cuda is not enabled.
2024-05-11 12:28:39 +02:00
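TF32 trades a few mantissa bits for much faster tensor-core matmuls while keeping f32 storage. A sketch of flipping the switch, assuming the toggle is exposed as `candle_core::cuda::set_gemm_reduced_precision_f32` (with a no-op dummy when CUDA is disabled, per the last bullet):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Opt in to TF32 accumulation for f32 GEMMs on CUDA.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);
    let dev = Device::new_cuda(0)?;
    let a = Tensor::randn(0f32, 1., (1024, 1024), &dev)?;
    let b = Tensor::randn(0f32, 1., (1024, 1024), &dev)?;
    println!("{:?}", a.matmul(&b)?.shape());
    Ok(())
}
```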
08fd7f7119
Typo fix
2024-05-10 00:51:01 +02:00
2ced31b530
Added a test for LeakyRelu
2024-05-10 00:50:05 +02:00
91b0d526ee
Added LeakyRelu implementation
2024-05-10 00:49:54 +02:00
4de76b89a2
Added tests for ArgMax
2024-05-09 20:45:53 +02:00
8f1119b3e0
Added ArgMax operator implementation
2024-05-09 20:45:41 +02:00
c4743aa570
Added tests from PyTorch examples
2024-05-09 20:22:55 +02:00
9a273196b7
ArgMin now returns an i64 tensor
2024-05-09 20:22:22 +02:00
d9bc5ec151
Switch cudarc back to dynamic linking. ( #2176 )
2024-05-09 10:35:44 +02:00
13b88547f7
Added tests for ArgMin
2024-05-09 03:00:22 +02:00
1caf62e4a6
Added ArgMin operator implementation
2024-05-09 03:00:15 +02:00
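These operator commits can lean on existing tensor ops: candle's argmin/argmax return u32 indices, so a cast to i64 matches the ONNX spec (hence the "returns an i64 tensor" commit above), and LeakyRelu maps onto `candle_nn::ops::leaky_relu`. A hedged sketch:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let xs = Tensor::new(&[[1f32, 5., 3.], [4., 0., 2.]], &dev)?;
    // ONNX ArgMax/ArgMin produce int64 indices; candle returns u32, so cast.
    let argmax = xs.argmax_keepdim(1)?.to_dtype(DType::I64)?;
    let argmin = xs.argmin_keepdim(1)?.to_dtype(DType::I64)?;
    // LeakyRelu with the ONNX default alpha of 0.01.
    let leaky = candle_nn::ops::leaky_relu(&xs, 0.01)?;
    println!("{argmax}\n{argmin}\n{leaky}");
    Ok(())
}
```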
84328e2b60
Update cudarc requirement from 0.11.0 to 0.11.1 ( #2174 )
...
* Upgrade the cudarc dependency from v0.11.0 to v0.11.1, as that version resolves a compile-time bug.
See: https://github.com/huggingface/candle/issues/2173
2024-05-08 20:40:36 +02:00
82b641fd27
Update cudarc requirement from 0.10.0 to 0.11.0 ( #2165 )
...
* Update cudarc requirement from 0.10.0 to 0.11.0
Updates the requirements on [cudarc](https://github.com/coreylowman/cudarc) to permit the latest version.
- [Release notes](https://github.com/coreylowman/cudarc/releases)
- [Commits](https://github.com/coreylowman/cudarc/compare/v0.10.0...v0.10.0)
---
updated-dependencies:
- dependency-name: cudarc
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
* Use the default cuda version.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-05-06 17:12:14 +02:00
01794dc16e
Use write rather than try-write on the metal rw-locks. ( #2162 )
2024-05-05 07:22:46 +02:00
a75cd8164f
Force the revision for the phi3-llama quantized models. ( #2159 )
2024-05-04 10:41:18 +02:00
b13a82a438
Separate quantized phi-3 implementation. ( #2157 )
...
* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
2024-05-04 10:14:57 +02:00
59b18d974e
Pin the version used for the quantized phi 3 gguf file. ( #2156 )
2024-05-03 15:03:22 +02:00