f65e90e7ef
Bump the crate version. ( #2248 )
2024-06-05 15:49:15 +02:00
d39462856b
Apply rustfmt. ( #2247 )
2024-06-04 22:54:09 +02:00
cb180eb23a
ONNX: add ArgMin, ArgMax and LeakyRelu ( #2246 )
...
* Add basic RandomUniform implementation
* Use is_some to check if seed is present
* Added Exp operator implementation
* Added ArgMin operator implementation
* Added tests for ArgMin
* ArgMin now returns a tensor with i64
* Added tests from pytorch examples
* Added ArgMax operator implementation
* Added tests for ArgMax
* Added LeakyRelu implementation
* Added a test for LeakyRelu
* Typo fix
* Fix a weird automatic RustRover change
---------
Co-authored-by: Mateusz Okulus <mmokulus@gmail.com>
2024-06-04 22:49:02 +02:00
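A quick illustration of the tensor-level ops these ONNX nodes lower to, using candle's argmax plus a dtype conversion to i64 (per the ArgMin bullet above). This is an illustrative sketch, not the actual onnx.rs code:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[[1f32, 5., 3.], [4., 2., 6.]], &Device::Cpu)?;
    // Tensor-level argmax returns u32 indices; the ONNX ArgMin/ArgMax
    // implementations additionally convert to i64 to match the spec.
    let idx = t.argmax(1)?.to_dtype(DType::I64)?;
    println!("{idx}"); // expected indices along dim 1: [1, 2]
    Ok(())
}
```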
9182c828e6
Automatically upcast for to_u64 ( #2244 )
2024-06-04 11:32:36 +02:00
3f13ad3d79
Fix dataset id for MNIST ( #2238 )
2024-06-04 06:27:24 +02:00
cd4d941ed1
Add LLaVA support ( #2234 )
...
* first commit
* llava
* clippy and fmt
* some fixes
* minor fixes
* remove useless file
* refactor: Remove llava/constants.rs and update llava/mod.rs
* modify variable name
* modify code after clippy
* Minor tweaks.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-03 11:54:09 +02:00
03344d3c19
ONNX: Add Floor and Ceil ( #2235 )
2024-06-02 21:45:20 +02:00
1ec3b2cc18
add where_cond f32 for metal ( #2236 )
2024-06-02 14:30:06 +02:00
f7773d498a
Deactivate some book tests that break the CI. ( #2233 )
...
* Deactivate some book tests that break the CI.
* Clippy fix.
2024-06-01 09:44:22 +02:00
7abc3b8cd7
Bump cudarc version to 0.11.4 ( #2230 )
2024-06-01 08:18:35 +02:00
46012ed31f
Another cudarc update. ( #2229 )
2024-05-30 22:27:06 +02:00
f3fade3b03
Update cudarc to 0.11.2. ( #2227 )
2024-05-29 18:50:52 +02:00
ea260aeffd
Add Debug, Clone, Deserialize to moondream config ( #2222 )
2024-05-28 06:08:00 +02:00
0814dfd148
Add a metal kernel for col2im1d. ( #2214 )
...
* Add a metal kernel for col2im1d.
* Enable the col2im variant.
* Bugfix.
* Revert the quantized tweak.
2024-05-25 11:03:23 +02:00
3ceca9901a
Enable the new layer-norm. ( #2213 )
...
* Enable the new layer-norm.
* Shape fixes.
2024-05-24 16:48:21 +02:00
1df2bddccf
Add the layernorm specialized op. ( #2212 )
...
* Add the layernorm cuda kernels.
* Dedicated layer norm op.
* Add the slower variant.
* Plug the cuda implementation.
* Add the metal variant.
* Add a dedicated test.
* Bugfix.
2024-05-24 15:58:01 +02:00
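For reference, the computation the fused cuda/metal kernels implement can be written with plain tensor ops. A minimal sketch over the last dimension; the function name and shapes are illustrative, not the new op's API:

```rust
use candle_core::{Result, Tensor, D};

// Reference layer-norm: normalize over the last dim, then scale and shift.
fn layer_norm_ref(x: &Tensor, weight: &Tensor, bias: &Tensor, eps: f64) -> Result<Tensor> {
    let mean = x.mean_keepdim(D::Minus1)?;
    let xc = x.broadcast_sub(&mean)?;
    let var = xc.sqr()?.mean_keepdim(D::Minus1)?;
    let x_norm = xc.broadcast_div(&(var + eps)?.sqrt()?)?;
    x_norm.broadcast_mul(weight)?.broadcast_add(bias)
}
```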
6f0b807ffd
More efficient cuda implementation for ConvTranspose1d. ( #2211 )
...
* More efficient cuda implementation for ConvTranspose1d.
* Small tweak.
2024-05-24 11:05:43 +02:00
d54e02d73d
Avoid a contiguous call in the quantized phi 3 model. ( #2209 )
...
* Simplify the KvCache api.
* Avoid a contiguous call in the quantized phi3 model.
2024-05-23 21:24:55 +02:00
45e235a747
Simplify the KvCache api. ( #2207 )
2024-05-23 17:07:21 +02:00
31cf64147b
Add a couple kv-cache helper functions. ( #2206 )
2024-05-23 16:21:47 +02:00
77ea479a18
Add Phi-3 Medium ( #2205 )
2024-05-23 13:33:17 +02:00
72e7ca529a
Add some missing where-cond kernels for metal. ( #2203 )
2024-05-22 09:44:52 +02:00
7ff921c538
Add RandomNormal ONNX operator ( #2200 )
(tag: 0.5.1)
2024-05-21 21:47:32 +02:00
9b8537a62f
Remove the deprecated wav crate in favor of hound. ( #2202 )
2024-05-21 21:43:35 +02:00
7ebc3548e1
Use flash-attn in gemma. ( #2195 )
...
* Use flash-attn in gemma.
* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
eefc1c77ef
Support flash-attn in quantized phi3. ( #2194 )
2024-05-18 17:12:56 +02:00
01545f7303
Add a slice_set op. ( #2193 )
...
* Add a slice_set op.
* Add some testing.
* Add the dedicated kv-cache module.
* Derive debug and clone.
* Expose more kv-cache functions.
* Return the current data when appending.
* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
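A minimal sketch of what slice_set provides and what the new kv-cache module builds on: copying a tensor into a pre-allocated buffer at a given offset. Shapes are illustrative and the exact signature should be checked against the crate:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Pre-allocated cache of shape (batch, heads, max_seq_len, head_dim).
    let cache = Tensor::zeros((1, 8, 64, 32), DType::F32, &dev)?;
    // Write 4 new tokens at offset 10 along the sequence dim, in place.
    let k = Tensor::randn(0f32, 1., (1, 8, 4, 32), &dev)?;
    cache.slice_set(&k, 2, 10)?;
    Ok(())
}
```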
349c3e806a
Support embedding model gte-Qwen1.5-7B-instruct ( #2190 )
...
* Support embedding model gte-Qwen1.5-7B-instruct
This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit brings
minimal modifications to the old Qwen2 implementation to support both
models.
An example is provided and has been verified against the official
PyTorch implementation.
* Avoid doing the 'last-token filtering' based on the absence of an attention mask.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-05-16 21:34:10 +02:00
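The 'last-token filtering' mentioned above refers to pooling the hidden state of the final non-padded token of each sequence. A hypothetical helper sketching the idea, not the example's actual code:

```rust
use candle_core::{IndexOp, Result, Tensor};

// hidden: (batch, seq_len, dim); seq_lens: the real length of each sequence.
fn last_token_pool(hidden: &Tensor, seq_lens: &[usize]) -> Result<Tensor> {
    let pooled = seq_lens
        .iter()
        .enumerate()
        .map(|(b, &len)| hidden.i((b, len - 1)))
        .collect::<Result<Vec<_>>>()?;
    Tensor::stack(&pooled, 0) // (batch, dim)
}
```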
bdaa34216a
chore: add fix for windows cudarc into the readme ( #2189 )
2024-05-16 14:32:50 +02:00
cc80e065e5
Allow the threshold argument to be negative in the segment-anything example ( #2187 )
...
The threshold is 0.0 by default. Negative values include more points,
expanding the mask; positive values are more selective, making the mask
smaller.
Negative numbers start with a minus sign, which clap normally treats as
introducing a flag.
2024-05-15 13:17:20 +02:00
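One way to let clap accept such values is to allow hyphen-prefixed values on the argument. A sketch using clap's derive API (requires the derive feature); this is illustrative, not necessarily the exact change made in the example:

```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    // Without allow_hyphen_values, `--threshold -1.5` fails because clap
    // tries to interpret "-1.5" as a flag.
    #[arg(long, default_value_t = 0.0, allow_hyphen_values = true)]
    threshold: f32,
}

fn main() {
    let args = Args::parse();
    println!("threshold: {}", args.threshold);
}
```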
13c64f6828
Fix VarBuilder::from_slice_safetensors ( #2180 )
...
Also implement SimpleBackend for SliceSafetensors
Signed-off-by: Harry Stern <harry@harrystern.net>
2024-05-12 07:26:06 +02:00
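A sketch of what this enables: building a VarBuilder from safetensors bytes already in memory, without touching the filesystem. The file path, weight name, and shape are made up, and the exact signature should be checked against candle-nn:

```rust
use candle_core::{DType, Device};
use candle_nn::VarBuilder;

fn main() -> candle_core::Result<()> {
    // Contents of a .safetensors file already in memory (embedded here).
    let bytes: &[u8] = include_bytes!("model.safetensors");
    let vb = VarBuilder::from_slice_safetensors(bytes, DType::F32, &Device::Cpu)?;
    let _w = vb.get((768, 768), "encoder.layer.0.weight")?;
    Ok(())
}
```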
21f82a5155
Add SliceSafetensors. ( #2179 )
...
* Add SliceSafetensors.
* And add some testing.
2024-05-11 13:15:42 +02:00
9cff7bc3f4
Make it possible to use TF32 accumulation in F32 matmuls. ( #2178 )
...
* Allow the use of tf32 accumulation in matmul.
* Better timings.
* Dummy versions for use when cuda is not enabled.
2024-05-11 12:28:39 +02:00
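Usage sketch: per the bullets above, the toggle has no-op dummies when cuda is disabled, so it can be called unconditionally. The function name below follows the f16/bf16 gemm-precision setters appearing further down this log and should be treated as an assumption:

```rust
fn main() {
    // Opt in to tf32 accumulation for f32 matmuls on cuda; with the cuda
    // feature disabled this is a no-op dummy.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);
    // ... create tensors and run matmuls as usual ...
}
```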
d9bc5ec151
Switch cudarc back to dynamic linking. ( #2176 )
2024-05-09 10:35:44 +02:00
84328e2b60
Update cudarc requirement from 0.11.0 to 0.11.1 ( #2174 )
...
* Upgrade the cudarc dependency from v0.11.0 to v0.11.1, as that version resolves a compile-time bug.
See: https://github.com/huggingface/candle/issues/2173
2024-05-08 20:40:36 +02:00
82b641fd27
Update cudarc requirement from 0.10.0 to 0.11.0 ( #2165 )
...
* Update cudarc requirement from 0.10.0 to 0.11.0
Updates the requirements on [cudarc](https://github.com/coreylowman/cudarc) to permit the latest version.
- [Release notes](https://github.com/coreylowman/cudarc/releases)
- [Commits](https://github.com/coreylowman/cudarc/compare/v0.10.0...v0.11.0)
---
updated-dependencies:
- dependency-name: cudarc
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
* Use the default cuda version.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-05-06 17:12:14 +02:00
01794dc16e
Use write rather than try-write on the metal rw-locks. ( #2162 )
2024-05-05 07:22:46 +02:00
a75cd8164f
Force the revision for the phi3-llama quantized models. ( #2159 )
2024-05-04 10:41:18 +02:00
b13a82a438
Separate quantized phi-3 implementation. ( #2157 )
...
* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
2024-05-04 10:14:57 +02:00
59b18d974e
Pin the version used for the quantized phi 3 gguf file. ( #2156 )
2024-05-03 15:03:22 +02:00
89f53b9d7b
Bump the version number to 0.5.1. ( #2155 )
...
* Bump the version number to 0.5.1.
* Fix clippy lints for 1.78.
* More clippy fixes.
2024-05-03 11:17:05 +02:00
a09d451d11
Support top-k in the llama example. ( #2150 )
2024-05-01 22:25:47 +02:00
fa06f5f5f9
F16/BF16 bugfix (bis). ( #2143 )
...
* F16/BF16 bugfix (bis).
* Another fix.
* Yet another fix.
2024-04-29 14:08:44 +02:00
09d4845aa8
Bugfix the recent f16/bf16 changes. ( #2142 )
2024-04-29 13:30:11 +02:00
a0d03aded1
Bug Fix: When converting a tensor to a variable, clone if the tensor is already a variable. ( #2124 )
...
* When converting a tensor to a variable, clone if the tensor is already a variable.
* Add a test to ensure training a batch norm works with VarMaps
---------
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@Jeffreys-Laptop.local>
2024-04-29 11:21:53 +02:00
3bbb88fcb4
Fix sigmoid gradient calculation and move sigmoid into a specialized op ( #2114 )
...
* add sigmoid op
* small fix
* add as a method on `Tensor`
* implement gradient calculation for sigmoid
* add sigmoid tests
* we should have a specialized op for this
* fix clippy
* fix clippy 2
* Revert all previous commits in favor of a `CustomOp` based solution
* use `CustomOp1` implementation
* fix rustfmt
* experimental add metal impl
* add cuda kernel impl
* fix fmt
* Add a test + reduce some cuda duplication.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-29 11:04:43 +02:00
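A toy CustomOp1 in the spirit of this change: an elementwise op with a cpu-only forward (f32, contiguous input), skipping the cuda/metal kernels and the custom gradient the PR also adds. A sketch, not the actual implementation:

```rust
use candle_core::{CpuStorage, CustomOp1, Device, Layout, Result, Shape, Tensor};

struct Sigmoid;

impl CustomOp1 for Sigmoid {
    fn name(&self) -> &'static str {
        "sigmoid"
    }

    // Forward pass on cpu: read the input slice, apply 1 / (1 + e^-x).
    fn cpu_fwd(&self, storage: &CpuStorage, layout: &Layout) -> Result<(CpuStorage, Shape)> {
        let xs = storage.as_slice::<f32>()?;
        let n = layout.shape().elem_count();
        let xs = &xs[layout.start_offset()..layout.start_offset() + n];
        let ys: Vec<f32> = xs.iter().map(|&x| 1. / (1. + (-x).exp())).collect();
        Ok((CpuStorage::F32(ys), layout.shape().clone()))
    }
}

fn main() -> Result<()> {
    let t = Tensor::new(&[-1f32, 0., 1.], &Device::Cpu)?;
    println!("{}", t.apply_op1(Sigmoid)?);
    Ok(())
}
```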
ed7b99f525
Add a toggle for F16/BF16 accumulation in gemm. ( #2141 )
...
* Add a toggle to control f16/bf16 gemm precision.
* Use the faster variant in the quantized example.
* Bugfix.
2024-04-29 09:21:07 +02:00
287013ef28
Add a forward_via_f16 method to the qmatmul op. ( #2138 )
2024-04-28 20:35:01 +02:00
eb26e2467e
Add the cuda dequantize f16 kernels. ( #2137 )
...
* Add the cuda dequantize f16 kernels.
* Expose the cuda kernels.
* Add some testing + fix.
* Test the other cases too.
* A few more tests.
* Add an environment variable to enable the dequantize f16 + matmul behavior.
2024-04-28 20:05:05 +02:00
c68ed8963f
chore: fix some typos in comments ( #2121 )
...
Signed-off-by: hardlydearly <799511800@qq.com>
2024-04-28 08:34:32 +02:00