* silero-vad v5 example
This change adds an example of how to run silero-vad v5.
* PR: rename 'vad' to 'silero-vad'
* Update README.md
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
`index_select` does not support negative indexing, but this change adds
just enough workarounds in the onnx support to allow evaluating
silero-vad models (which make use of negative indices).
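As a rough, self-contained sketch of the kind of workaround involved (the helper below is illustrative, not the actual onnx code), negative Gather indices can be remapped onto the `0..axis_len` range that `index_select` expects:
```rust
/// Remap possibly-negative ONNX Gather indices onto the 0..axis_len range
/// that index_select expects. Illustrative sketch only.
fn normalize_indices(indices: &[i64], axis_len: i64) -> Result<Vec<u32>, String> {
    indices
        .iter()
        .map(|&idx| {
            let idx = if idx < 0 { idx + axis_len } else { idx };
            if idx < 0 || idx >= axis_len {
                Err(format!("index {idx} out of range for axis of length {axis_len}"))
            } else {
                Ok(idx as u32)
            }
        })
        .collect()
}

fn main() {
    // -1 refers to the last element of an axis of length 4, i.e. index 3.
    assert_eq!(normalize_indices(&[0, -1, 2], 4), Ok(vec![0, 3, 2]));
}
```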
* onnx: workaround pow with negative base
Rather than fully defining pow in the cpu backend (as in #2318), this implements
a much smaller change that is sufficient to evaluate silero-vad onnx models:
when pow is called with an exponent of 2.0, evaluate it simply as `x*x` instead
of via the cpu backend's `e^(2.0 * ln(x))`, which yields NaN for a negative base.
* PR: use Tensor::powf instead
powf correctly handles a negative base.
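For reference, a standalone illustration of the failure mode and of both fixes, using plain `f32` instead of tensors (std's `powf` stands in for `Tensor::powf`):
```rust
// exp(e * ln(x)) is NaN for x < 0, which is why the generic pow path
// breaks on silero-vad's negative bases.
fn pow_via_exp_ln(x: f32, e: f32) -> f32 {
    (e * x.ln()).exp()
}

// The first workaround: special-case exponent 2.0 as x*x.
fn pow_special_case(x: f32, e: f32) -> f32 {
    if e == 2.0 {
        x * x
    } else {
        pow_via_exp_ln(x, e)
    }
}

fn main() {
    let x = -3.0f32;
    assert!(pow_via_exp_ln(x, 2.0).is_nan()); // ln(-3) is NaN
    assert!((pow_special_case(x, 2.0) - 9.0).abs() < 1e-6);
    // The final fix: powf handles a negative base with an integral exponent.
    assert!((x.powf(2.0) - 9.0).abs() < 1e-6);
}
```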
* Start sketching parler-tts support.
* Implement the attention.
* Add the example code.
* Fix the example.
* Add the description and encode it with T5.
* More of the parler forward pass.
* Fix the positional embeddings.
* Support random sampling in generation.
* Handle EOS.
* Add the python decoder.
* Proper causality mask.
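A loose sketch of the sampling and EOS handling steps listed above, not the actual parler-tts decoding loop; the logits, the `EOS` id, and the use of the rand 0.8 API are assumptions for the example:
```rust
use rand::Rng;

// Sample a token id from the softmax of a logits vector.
fn sample(logits: &[f32], rng: &mut impl Rng) -> usize {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut r = rng.gen_range(0.0..sum);
    for (i, e) in exps.iter().enumerate() {
        r -= e;
        if r <= 0.0 {
            return i;
        }
    }
    exps.len() - 1
}

fn main() {
    const EOS: usize = 2; // hypothetical eos token id
    let mut rng = rand::thread_rng();
    let mut tokens = Vec::new();
    for _ in 0..32 {
        // In the real model these logits come from the decoder forward pass.
        let logits = [0.1f32, 0.5, 2.0, 0.3];
        let next = sample(&logits, &mut rng);
        if next == EOS {
            break; // handle EOS: stop generating
        }
        tokens.push(next);
    }
    println!("generated {} tokens before EOS", tokens.len());
}
```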
* Soft NMS with thresholds
* NMS Test
* Soft nms w/ boxes removed below threshold
* Soft nms test
* No longer removing bounding boxes, in keeping with the Soft-NMS approach
* Initialize confidence
* Added comments
* Refactored out updating based on IOU/sigma
* Score_threshold -> confidence_threshold for clarity
* Remove bboxes below confidence threshold
* Softnms basic functionality test
* Softnms confidence decay test
* Softnms confidence threshold test
* Softnms no overlapping bbox test
* Testing confidence after no overlap test
* Single bbox and no bbox tests
* Signify test completion
* Handling result of test functions
* Checking all pairs of bboxes instead of a forward pass
* Equal confidence overlap test
* Clarified tests for implementation
* No longer dropping boxes, just setting to 0.0
* Formatted w/ cargo
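For reference, a minimal sketch of the behaviour these commits describe (Gaussian decay of confidences by overlap, zeroing rather than removing boxes below the confidence threshold); the types and the exact pairwise handling in the real implementation differ:
```rust
// Hypothetical box type for the sketch; the real types differ.
struct BBox {
    xmin: f32,
    ymin: f32,
    xmax: f32,
    ymax: f32,
    confidence: f32,
}

fn iou(a: &BBox, b: &BBox) -> f32 {
    let ix = (a.xmax.min(b.xmax) - a.xmin.max(b.xmin)).max(0.0);
    let iy = (a.ymax.min(b.ymax) - a.ymin.max(b.ymin)).max(0.0);
    let inter = ix * iy;
    let area_a = (a.xmax - a.xmin) * (a.ymax - a.ymin);
    let area_b = (b.xmax - b.xmin) * (b.ymax - b.ymin);
    inter / (area_a + area_b - inter)
}

// Soft-NMS: higher-confidence boxes decay the confidence of overlapping
// lower-confidence boxes by exp(-iou^2 / sigma); boxes falling below the
// confidence threshold are set to 0.0 rather than removed.
fn soft_nms(boxes: &mut [BBox], sigma: f32, confidence_threshold: f32) {
    for i in 0..boxes.len() {
        for j in 0..boxes.len() {
            if i == j || boxes[i].confidence <= boxes[j].confidence {
                continue;
            }
            let overlap = iou(&boxes[i], &boxes[j]);
            boxes[j].confidence *= (-overlap * overlap / sigma).exp();
        }
    }
    for b in boxes.iter_mut() {
        if b.confidence < confidence_threshold {
            b.confidence = 0.0;
        }
    }
}

fn main() {
    let mut boxes = vec![
        BBox { xmin: 0.0, ymin: 0.0, xmax: 2.0, ymax: 2.0, confidence: 0.9 },
        BBox { xmin: 0.5, ymin: 0.5, xmax: 2.5, ymax: 2.5, confidence: 0.6 },
    ];
    soft_nms(&mut boxes, 0.5, 0.1);
    println!("confidences: {:.3} {:.3}", boxes[0].confidence, boxes[1].confidence);
}
```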
* Add the flux autoencoder.
* Add the encoder down-blocks.
* Upsampling in the decoder.
* Sketch the flow matching model.
* More flux model.
* Add some of the positional embeddings.
* Add the rope embeddings.
* Add the sampling functions.
* Add the flux example.
* Fix the T5 bits.
* Proper T5 tokenizer.
* Clip encoder path fix.
* Get the clip embeddings.
* No configurable weights in layer norm.
* More weights related fixes.
* Yet another shape fix.
* DType fix.
* Fix a couple more shape issues.
* DType fixes.
* Fix the latent dims.
* Fix more shape issues.
* Autoencoder fixes.
* Get some generations out.
* Bugfix.
* T5 padding.
* Clippy fix.
* Add the decode only mode.
* Fix.
* More fixes.
* Finally get some generations to work.
* Add readme.
* bert attention mask
* Allow for using None as a mask.
* Revert part of the changes so that the proper default mask applies.
* Cosmetic change.
* Another cosmetic tweak.
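A tiny sketch of the calling convention described above: the caller may pass an explicit attention mask or `None`, in which case a default mask applies (here: attend everywhere); the types and names are illustrative, not the bert model's actual API:
```rust
// Illustrative only: 1 = attend, 0 = masked out.
fn attention_mask(user_mask: Option<&[u8]>, seq_len: usize) -> Vec<u8> {
    match user_mask {
        Some(m) => m.to_vec(),
        None => vec![1u8; seq_len], // proper default: no positions masked
    }
}

fn main() {
    assert_eq!(attention_mask(None, 4), vec![1, 1, 1, 1]);
    let m: &[u8] = &[1, 1, 0, 0];
    assert_eq!(attention_mask(Some(m), 4), vec![1, 1, 0, 0]);
}
```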
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add Llama 3.1 rope
* Clippy
* Format
* Clippy
* Add support for multiple eos tokens:
* Untagged either
* Remove either dep and fix settings.json
* Make the max positional embeddings configurable
* onnx: fix pad, unsqueeze
Both implementations have off-by-one errors:
- The Pad 'reflect' cycle for e.g. `dim==3` is `[0,1,2,1]`, which has length 4
  (i.e. `dim*2 - 2`), not 5 (the current `dim*2 - 1`).
- Unsqueeze(-1) for a tensor with `dim==3` should insert the new axis at 3
  (i.e. `dim+index+1`), not 2 (the current `dim+index`).
In addition, Pad incorrectly calculates the starting padding. If we want to pad
2 elements at the start and the cycle of indices has length 6, we should skip
4 elements, but currently we skip 2. A more visual representation of what's
going on is below:
```
pad_start: 2
data: [a,b,c,d]
indices: [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, ..] // zigzag between 0..4
actual: skip [ c d| c b a b]
expected: ~ skip ~ [ c b| a b c d]
```
The values between `[` and `|` are padding and the values between
`|` and `]` in the example should match the original data being padded.
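A hedged sketch of both corrected computations, the 'reflect' index cycle (including the start-padding skip) and the negative Unsqueeze axis, operating on plain index vectors rather than tensors:
```rust
// Indices to gather for a 'reflect' pad along one axis of length `dim`,
// with `pad_start`/`pad_end` padded elements on each side. Sketch only.
fn reflect_pad_indices(dim: usize, pad_start: usize, pad_end: usize) -> Vec<usize> {
    assert!(dim >= 2, "reflect padding needs an axis of length >= 2");
    // The reflect cycle for dim == 4 is [0, 1, 2, 3, 2, 1]:
    // length dim*2 - 2, not dim*2 - 1.
    let mut cycle: Vec<usize> = (0..dim).collect();
    cycle.extend((1..dim - 1).rev());
    let cycle_len = cycle.len();
    // Start padding reads the cycle backwards from index 0, which is the
    // same as skipping cycle_len - pad_start elements and reading forward.
    let skip = cycle_len - (pad_start % cycle_len);
    cycle
        .iter()
        .cycle()
        .skip(skip)
        .take(pad_start + dim + pad_end)
        .copied()
        .collect()
}

// Normalize a possibly-negative Unsqueeze axis: for a rank-3 tensor,
// axis -1 must map to 3 (rank + axis + 1), not 2.
fn unsqueeze_axis(rank: i64, axis: i64) -> i64 {
    if axis < 0 { rank + axis + 1 } else { axis }
}

fn main() {
    // data [a, b, c, d], pad_start 2, pad_end 2:
    // [c, b | a, b, c, d | c, b], i.e. indices [2, 1, 0, 1, 2, 3, 2, 1].
    assert_eq!(reflect_pad_indices(4, 2, 2), vec![2, 1, 0, 1, 2, 3, 2, 1]);
    assert_eq!(unsqueeze_axis(3, -1), 3);
}
```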
* Fix clippy lints.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>