be9c200cbb
Expose the t5 config fields + allow t5-large. ( #1987 )
2024-04-01 20:58:34 +02:00
1b12142a02
Add min to buckets in relative_position_bucket ( #1312 )
2023-11-10 11:57:25 +01:00
18d30005c5
Add support to UL2 model family ( #1300 )
...
* Add support to UL2 model family
* Update docs with UL2
* Create ActivationWithOptionalGating to avoid polluting activations
* Also refactor quantized t5
* Remove useless conversion
* Revert Activation::NewGelu name change
* Remove useless return
* Apply rustfmt and clippy recommendations
* Reuse t5::ActivationWithOptionalGating in quantized version
* (cosmetic change) use a match rather than ifs + avoid early returns.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-09 18:55:09 +01:00
f772213e84
Fix bug introduced in madlad PR ( #1298 )
2023-11-08 17:55:46 +01:00
508f811b93
Add support for MADLAD400 ( #1285 )
...
* Add support for madlad
* Add support for quantized MADLAD
2023-11-07 05:35:37 +01:00
612f5b8156
Make more models cloneable. ( #1203 )
2023-10-28 07:43:08 +01:00
783735cf22
Use softmax-last-dim where possible. ( #1057 )
2023-10-08 13:16:42 +01:00
2e5fb0b251
Do not use the kv-cache on external key-value states. ( #1054 )
2023-10-07 22:37:19 +01:00
f47bd9bab5
Delete invalid comment ( #1038 )
2023-10-05 19:28:08 +01:00
b54acfa3d0
Tracing for the phi model ( #936 )
...
* Add some tracing bits to mixformers.
* Add the missing file.
* Add the conv2d layer to with-tracing.
* Improve the tracing usage.
2023-09-23 09:19:34 +01:00
2619c4307f
Add a quantized version of the t5 model. ( #921 )
2023-09-21 11:13:39 +01:00
c89b82b2d4
Add a clear cache function to the t5 model. ( #919 )
2023-09-21 09:01:06 +01:00
ab1d40ea97
Add more t5 tracing. ( #915 )
2023-09-20 20:20:54 +01:00
3a0d3e05df
Add more t5 tracing. ( #914 )
...
* Add more t5 tracing.
* Revert the sm change.
2023-09-20 16:37:51 +01:00
9b24d89d2d
Tracing mode for T5. ( #913 )
...
* Tracing mode for T5.
* Tracing for the linear layer.
2023-09-20 15:03:35 +01:00
05626ef492
Flan T5: Read lm_head when word embeddings are not tied ( #903 )
...
* Read lm_head when word embeddings are not tied
* Fix formatting
* Address comments
2023-09-19 22:36:47 +01:00
8696f64bae
Fix T5 kv cache ( #899 )
...
* Fix T5 kv cache
* Add argument for decoder prompt
* Fix range
2023-09-19 20:36:15 +01:00
7f65af1f0d
Avoid re-encoding the input in the T5 example. ( #875 )
2023-09-17 10:25:54 +01:00
1a276b5da7
Add a KV cache to T5. ( #873 )
...
* Add a KV cache to T5.
* Suggest using release mode.
* Use the kv cache in decoding.
* Add a comment.
2023-09-17 08:00:45 +01:00
3e49f8fce5
Implement T5 decoding ( #864 )
...
* Load t5 decoder
* Run enc, dec, and lm head, but no cross attn
* Cross-attention over key_value_states
* New arg for decoder input ids
* Add mask, don't forward position biases through decoder
* Update t5 examples
* Clippy + rustfmt
2023-09-15 22:05:12 +02:00
49d3f7f708
Add support to flan-t5 ( #840 )
2023-09-13 19:27:20 +02:00
3e94324012
Add some sentence similarity part to the t5 example. ( #835 )
...
* Add some sentence similarity part to the t5 example.
* Clippy fix.
2023-09-13 10:44:02 +01:00
e4553fb355
T5 tweaks ( #831 )
...
* Use default values rather than options.
* Avoid exposing the device field.
* More tweaks.
2023-09-13 07:37:04 +01:00
d801e1d564
Clippy fix. ( #830 )
2023-09-13 07:16:20 +01:00
9daa6dbe87
Extract T5 module and add main function to use it ( #829 )
...
* Extract t5 out of musicgen
* Add main for t5 module
2023-09-13 07:14:05 +01:00