* Update cudarc to v0.13.5 to support CUDA 12.8
* Bump the crate version.
---------
Co-authored-by: Michael McCulloch <michael.james.mcculloch@fastmail.com>
* Stella_en_1.5B_v5
* Separated creation. This is a critical step for numerical accuracy and would be documented in the readme
* EmbedDim would require clone and copy
* WIP: example
* Examples added
* A little more in README
* WIP: ONNX Reduce-max ops
* WIP: tests for ReduceMin
* Reduce min/max v18+
* Reformatting tests for better review readability
* Error on empty set, backward compatibility (13 and below) with 'axes'
* candle-onnx: Add Split and Expand operators, Fix Where Op
Implemented based on https://github.com/onnx/onnx/blob/main/docs/Operators.md
Test cases based on those examples.
TODO: Should add the remaining Split examples as tests
TODO: Add a test case that motivates the Where fix
* candle-onnx: Add ReduceSum operator
Implemented based on https://github.com/onnx/onnx/blob/main/docs/Operators.md
Test cases based on those examples.
TODO: Should add the remaining ReduceSum examples as tests
* candle-onnx: Add ReduceL2 operator
Implemented based on https://github.com/onnx/onnx/blob/main/docs/Operators.md
Test cases based on those examples.
TODO: Should add the remaining ReduceL2 examples as tests
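For reference, ReduceL2 takes the square root of the sum of squares along the reduced axes. The sketch below is a minimal plain-Rust illustration over the last axis of a 2-D input. The function name `reduce_l2_rows` is hypothetical and this is not the candle-onnx implementation.

```rust
/// Hypothetical helper: L2 norm of each row, i.e. sqrt(sum(x^2)) over the last axis.
fn reduce_l2_rows(rows: &[Vec<f32>]) -> Vec<f32> {
    rows.iter()
        .map(|row| row.iter().map(|v| v * v).sum::<f32>().sqrt())
        .collect()
}
```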
* candle-onnx: Fix Clip operator empty string as default arg issue
Optional input args may be signified by an empty string. The length of the input array is not enough because non optional args may follow optional ones.
I encountered this when trying to use the ONNX model found at https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 for example.
The LSTM op has a utility which I factored to be more generally accessible, and I have used it in the ops I have recently created or debugged.
I believe it is likely that this issue may also manifest in other ops, but I didn't want to change anything that I'm not testing.
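The lookup described above can be sketched roughly as follows. This is a minimal illustration, assuming node inputs are available as a slice of names. The function name `optional_input_name` is hypothetical and is not the utility factored out of the LSTM op.

```rust
/// Hypothetical helper: returns the input name at `index`, or `None` when the
/// slot is missing or holds an empty string (the marker for an omitted
/// optional input).
fn optional_input_name(inputs: &[String], index: usize) -> Option<&str> {
    inputs
        .get(index)
        .map(String::as_str)
        .filter(|name| !name.is_empty())
}
```

With Clip inputs `["x", "", "max"]`, index 1 (`min`) resolves to `None` while index 2 (`max`) is still found, which a length check alone would not distinguish.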
* fix formatting
* fix small mistake made during refactor
index_select does not support negative indexing, but
this change adds just enough workarounds in onnx to
allow evaluating silero-vad models (which make use of
negative indices).
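One way to picture such a workaround is to map negative indices onto the valid range before selecting. The sketch below is plain Rust under that assumption. `normalize_indices` is a hypothetical name and does not reflect the actual change in candle-onnx.

```rust
/// Hypothetical helper: maps possibly negative indices onto `0..dim_size`,
/// counting negative values from the end. Returns `None` if any index falls
/// outside `-dim_size..dim_size`.
fn normalize_indices(indices: &[i64], dim_size: usize) -> Option<Vec<usize>> {
    let size = dim_size as i64;
    indices
        .iter()
        .map(|&i| {
            let shifted = if i < 0 { i + size } else { i };
            if (0..size).contains(&shifted) {
                Some(shifted as usize)
            } else {
                None
            }
        })
        .collect()
}
```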
* onnx: workaround pow with negative base
rather than fully defining pow in the cpu backend (as in #2318),
this implements a much smaller change which is sufficient to evaluate silero-vad
onnx models. Specifically, checking if pow is run with 2.0 exponent, and if so
evaluate as simply `x*x` instead of the cpu backend of `e^(2.0 * ln(x))`.
* PR: use Tensor::powf instead
powf correctly handles a negative base.
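The failure mode is easy to reproduce on scalars. The sketch below uses plain `f32` from the Rust standard library as a stand-in for tensor ops. The function names are hypothetical and chosen only for illustration.

```rust
/// Computes x^y as exp(y * ln(x)). For negative x, ln(x) is NaN, so the
/// result is NaN even when the true power is well defined.
fn pow_via_exp_ln(x: f32, y: f32) -> f32 {
    (y * x.ln()).exp()
}

/// Squares by direct multiplication, valid for any sign of x.
fn square(x: f32) -> f32 {
    x * x
}
```

For `x = -3.0`, `pow_via_exp_ln(x, 2.0)` is NaN, while both `square(x)` and the standard library's `x.powf(2.0)` give 9.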
* onnx: fix pad, unsqueeze
both implementations have off-by-one errors:
- Pad 'reflect' cycle for e.g. `dim==3` is `[0,1,2,1]`, which has a length of
4 (or `dim*2 - 2`), not 5 (current code `dim*2 - 1`)
- Unsqueeze(-1) for a tensor with `dim==3` should be 3 (i.e. `dim+index+1`),
not 2 (i.e. currently `dim+index`)
In addition, Pad is incorrectly calculating the starting padding.
If we want to pad out 2 elements to the start, and we have this cycle
of indices of length 6, then we should skip 4 elements, but currently
we skip 2. A more visual representation of what's going on is below:
```
pad_start: 2
data: [a,b,c,d]
indices: [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, ..] // zigzag between 0..4
actual: skip [ c d| c b a b]
expected: ~ skip ~ [ c b| a b c d]
```
The values between `[` and `|` are padding and the values between
`|` and `]` in the example should match the original data being padded.
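Both corrections can be checked with a small standalone sketch. The code below is plain Rust with hypothetical function names. It computes the index arithmetic directly rather than mirroring the candle-onnx code, so treat it as an illustration of the expected results only.

```rust
/// Hypothetical helper: source indices for reflect padding along one axis.
/// The cycle has length `2 * len - 2`, e.g. len 3 gives [0, 1, 2, 1].
fn reflect_pad_indices(len: usize, pad_start: usize, pad_end: usize) -> Vec<usize> {
    assert!(len >= 2, "reflect padding needs at least two elements");
    let cycle = 2 * len - 2;
    // Start far enough into the cycle that index 0 is reached after
    // exactly `pad_start` steps (cycle 6, pad_start 2 gives skip 4).
    let skip = (cycle - pad_start % cycle) % cycle;
    (0..pad_start + len + pad_end)
        .map(|i| {
            let p = (skip + i) % cycle;
            if p < len { p } else { cycle - p }
        })
        .collect()
}

/// Hypothetical helper: resolves an Unsqueeze axis against the input rank.
/// Negative axes count from the end of the output shape, which has rank + 1
/// dimensions, hence `rank + axis + 1`.
fn unsqueeze_axis(rank: usize, axis: i64) -> Option<usize> {
    let out_rank = rank as i64 + 1;
    let resolved = if axis < 0 { axis + out_rank } else { axis };
    if (0..out_rank).contains(&resolved) {
        Some(resolved as usize)
    } else {
        None
    }
}
```

For `data: [a,b,c,d]` with `pad_start: 2`, `reflect_pad_indices(4, 2, 0)` yields `[2, 1, 0, 1, 2, 3]`, i.e. `[c b| a b c d]`, and `unsqueeze_axis(3, -1)` yields 3.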
* Fix clippy lints.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* feat(gemm): implement Gemm operator in candle-onnx
* feat(onnx): Add support for ArgMax operator in candle-onnx
* Apply rustfmt.
* Remove argmax as it was already present.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* implement if, and pad reflect mode
The intent of this change is to allow eval of the current silero_vad.onnx (v4).
This onnx file uses 'If' and 'Pad' nodes, which had not been supported
by simple_eval until now
* Cleanup (fmt, clippy, minor test tweaks).
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>