2ced31b530
Added a test for LeakyRelu
2024-05-10 00:50:05 +02:00
91b0d526ee
Added LeakyRelu implementation
2024-05-10 00:49:54 +02:00
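The two LeakyRelu commits above add the ONNX operator and its test. As a rough sketch of the semantics being exercised (plain Rust over a slice; `leaky_relu` and `alpha` are illustrative names, not the actual candle tensor code):

```rust
/// LeakyRelu as specified by ONNX: f(x) = x for x >= 0, alpha * x otherwise.
/// Plain-slice sketch of the semantics, not the candle implementation.
fn leaky_relu(xs: &[f32], alpha: f32) -> Vec<f32> {
    xs.iter()
        .map(|&x| if x >= 0.0 { x } else { alpha * x })
        .collect()
}

fn main() {
    // Negative inputs are scaled by alpha, non-negative inputs pass through.
    let out = leaky_relu(&[-1.0, 0.0, 2.0], 0.1);
    assert_eq!(out, vec![-0.1, 0.0, 2.0]);
}
```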
4de76b89a2
Added tests for ArgMax
2024-05-09 20:45:53 +02:00
8f1119b3e0
Added ArgMax operator implementation
2024-05-09 20:45:41 +02:00
c4743aa570
Added tests from pytorch examples
2024-05-09 20:22:55 +02:00
9a273196b7
ArgMin now returns a tensor with i64 values
2024-05-09 20:22:22 +02:00
13b88547f7
Added tests for ArgMin
2024-05-09 03:00:22 +02:00
1caf62e4a6
Added ArgMin operator implementation
2024-05-09 03:00:15 +02:00
a06b2ded28
Merge branch 'refs/heads/random' into operators-random-exp
...
# Conflicts:
# candle-onnx/tests/ops.rs
2024-04-23 17:36:33 +02:00
a867d652d3
Merge branch 'refs/heads/exp' into operators-random-exp
2024-04-23 17:33:05 +02:00
8a05743a21
Add StorageRef. (#2113)
...
* Add the storage-ref bits.
* Add the metal implementation.
2024-04-23 13:23:27 +02:00
b2e816752b
Use the faster rms-norm kernel for llama. (#2107)
...
* Use the faster rms-norm kernel for llama.
* Use the fast variant by default.
2024-04-22 18:52:00 +02:00
618ecf5e23
Better time measurement for the llama example. (#2106)
2024-04-22 17:54:27 +02:00
267601eec1
Update tokenizers requirement from 0.15.0 to 0.19.1 (#2104)
...
Updates the requirements on [tokenizers](https://github.com/huggingface/tokenizers) to permit the latest version.
- [Release notes](https://github.com/huggingface/tokenizers/releases)
- [Changelog](https://github.com/huggingface/tokenizers/blob/main/RELEASE.md)
- [Commits](https://github.com/huggingface/tokenizers/compare/v0.15.0...v0.15.2)
---
updated-dependencies:
- dependency-name: tokenizers
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-22 17:10:46 +02:00
08a15cb79e
Update zip requirement from 0.6.6 to 1.1.1 (#2103)
...
* Update zip requirement from 0.6.6 to 1.1.1
---
updated-dependencies:
- dependency-name: zip
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
* Fix for the zip crate update.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-22 16:23:27 +02:00
c388be93e7
Updated quantized phi model (#2099)
...
* Quantized phi in a separate file.
* Add the quantized phi example + rework the model code.
* Improve the phi model.
* Get some generation out.
* Use the appropriate rope shape.
* Tweak the default prompt.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
2024-04-21 07:37:07 +02:00
d22f1d4f4e
Derive clone and debug traits for Moondream model (#2100)
...
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
* Derive debug and clone traits for Moondream model.
2024-04-21 07:08:28 +02:00
0067fe00a8
Metal Unary: Add benchmarks and process kernels in a tile-based fashion (#2056)
...
* add basic unary bench for sqrt
* process unary commands in tiles of 4
* re-enable all benchmarks
* rename helper to unary
* modify approach to split up tiled and non-tiled operations
* undo bench ignore for other tests
* update tile size to 2
* only perform the optimization on the contiguous, even-numbered element case
2024-04-21 00:10:33 +02:00
587ee3bb6f
Small cleanups to the llama multi-process example. (#2098)
2024-04-20 22:19:46 +02:00
dd78422701
Handle multiple dimensions in metal QMM + two fixes. (#2097)
2024-04-20 18:55:45 +02:00
9215e9ce8c
Add missing onnx operations (#2096)
...
* Add missing onnx operations
* Add tests and fix errors
* Run rustfmt
2024-04-20 18:44:22 +02:00
52ae332910
Use llama v3 by default + add to readme. (#2094)
2024-04-20 16:11:24 +02:00
8b390ddd29
Only download the weights in the main process (and not in the child processes). (#2093)
2024-04-20 13:01:23 +02:00
c97d639fa0
Multiprocess/multi-GPU support for llama 3. (#2092)
...
* Multiprocess/multi-GPU support for llama 3.
* Modernize the mp example a bit.
2024-04-20 12:49:21 +02:00
70388c27b6
Added Exp operator implementation
2024-04-19 22:48:05 +02:00
b45c710dbf
Fix for gemma MQA. (#2091)
2024-04-19 21:49:55 +02:00
0fa41a791f
Use is_some to check if seed is present
2024-04-19 16:09:45 +02:00
46073c5f73
Add basic RandomUniform implementation
2024-04-19 16:06:43 +02:00
9c532aef47
Also enable llama-v3 8b instruct. (#2088)
2024-04-19 08:50:06 +02:00
f7a6468238
Add support for llama3 on the quantized example (#2086)
...
* add support for l3b, new tokenizer
* add todo
* Add todo and use k_s model
* Use the official tokenizers.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-18 22:52:00 +02:00
2b93dffe64
Use faster rotary embeddings for llama-like models. (#2087)
2024-04-18 22:34:29 +02:00
e6ee7ba4d4
Llama v3. (#2085)
...
* Llama v3.
* Tweak the default params + handle special tokens.
* Small tweak.
2024-04-18 22:19:54 +02:00
1690ab45d2
Fix the silu gradient issue on 0. (#2083)
2024-04-18 14:31:41 +02:00
8de0ce6cba
Add more QMMV cuda kernels. (#2077)
...
* Add more QMMV cuda kernels.
* Enable the new kernels.
* Adapt the testing.
2024-04-18 08:36:43 +02:00
ce6d08df94
Minor fix to the readme. (#2080)
...
Co-authored-by: Jane Doe <jane.doe@example.org>
2024-04-17 22:43:00 +02:00
2817643db9
Add the mmv kernels for small batch sizes. (#2075)
...
* Add the mmv kernels for smaller sizes.
* Support more mmv kernels.
* Use the new kernels.
* Fix the call.
* Silly fix.
* Improve the testing.
* Fix for dmmv.
* Add another dedicated test for the batching mmv.
2024-04-16 21:30:51 +02:00
4d14777673
Utilize batches in Stable Diffusion (#2071)
...
* Utilize batches in Stable Diffusion that were already there, but unutilized.
Also refactor out the `save_image` function.
* Clippy + cosmetic fixes.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-16 06:49:04 +02:00
f135b7963d
Fix for the batch dim in the quantized matmul example. (#2073)
...
* Fix for the batch dim in the quantized matmul example.
* Enable more tests on cuda.
* Add a test for qmm with a batch.
* Fix the zeros-dim test on metal.
2024-04-15 20:00:28 +02:00
af955f260c
Make the falcon model cloneable. (#2067)
2024-04-15 09:39:03 +02:00
8ad822a983
Add a function to clear the KV cache in falcon. (#2066)
...
* Add a function to clear the KV cache in falcon.
* Clippy.
2024-04-15 09:29:25 +02:00
e198bb0816
Handle zero dims in some simple operations. (#2064)
...
* Handle zero dims in some simple operations.
* Handle zero-dims in matmul.
* More testing.
2024-04-15 09:18:54 +02:00
f7d5bf5b97
Faster kernels for quantized matmul on cuda (#2060)
...
* Hook the quantized matmul cuda kernels.
* Add a (currently broken) test.
* Kernel fixes.
* Fix by transposing the rhs matrix.
* Add the q4-1 kernels.
* Proper block sizes.
* More details in the tests.
2024-04-15 08:32:47 +02:00
c119600d6e
Move image tensor to device in trocr example (#2063)
...
Signed-off-by: Harry Stern <harry@harrystern.net>
2024-04-15 06:50:32 +02:00
c449f65b12
Expose the synchronize function on the generic device. (#2062)
2024-04-14 23:02:03 +02:00
db7dbf3071
Add missing bfloat unary strided kernels and fix typo (#2058)
2024-04-14 20:01:13 +02:00
4ecedb1598
Add the full quantized matmul kernels for cuda. (#2057)
2024-04-14 17:52:08 +02:00
53e5380bf6
Add a synchronize method to devices. (#2055)
...
* Add a synchronize method to devices.
* Metal version.
2024-04-14 16:32:55 +02:00
50e49ecc5f
Add a quantized version of recurrent-gemma. (#2054)
...
* Add a quantized version of recurrent-gemma.
* Share the rglru part.
* Get the quantized gemma model to work.
2024-04-13 20:07:01 +02:00
4c88c3ce06
Add benchmarks for qmatmul operations (#2048)
...
* Add qmatmul bench
* add all dtypes
2024-04-13 12:30:14 +02:00
8b8fb630df
Add a convenient way to rename tensors accessed through a varbuilder. (#2052)
2024-04-13 12:09:41 +02:00