candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-14 18:06:36 +00:00

Author	SHA1	Message	Date
shua	6056fd5c90	onnx: fix pad, unsqueeze (#2317) * onnx: fix pad, unsqueeze both implementations have off-by-one errors: - Pad 'reflect' cycle for eg `dim==3` is `[0,1,2,1]` which has length of 4 (or `dim2 - 2`) not 5 (current code `dim2 - 1`) - Unsqueeze(-1) for tensor with `dim==3` should be 3 (ie `dim+index+1`) not 2 (ie currently `dim+index`) in addition, Pad is incorrectly calculating the starting padding. If we want to pad out 2 elements to the start, and we have this cycle of indices of length 6, then we should skip 4 elements, but currently we skip 2. A more visual representation of what's going on is below: ``` pad_start: 2 data: [a,b,c,d] indices: [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, ..] // zigzag between 0..4 actual: skip [ c d| c b a b] expected: ~ skip ~ [ c b| a b c d] ``` The values between `[` and `|` are padding and the values between `|` and `]` in the example should match the original data being padded. * Fix clippy lints. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-07-23 23:10:57 +02:00
Nicolas Patry	403680f17d	Quantized GGUF style (#1523) * Metal quantized modifications proposal. - Add a device param, wherever needed. - Create new QMetal storage thing that implements QuantizedType. - Update everywhere needed. Fix Python. Fixing examples. Fix: fmt + clippy + stub. Moving everything around. Only missing the actual implems. Fixing everything + adding dequantized kernels. More work. Fixing matmul. Fmt + Clippy Some clippy fixes. Working state. Q2K Metal -> Bugged (also present in GGML). Q4K CPU -> Bugged (present previously, new test catch it). Q5K CPU -> Bugged (present previously). Q8_1 Both -> Never really implemented it seems Q8K metal -> Never implemented in metal Fixing Q2K bug (present in ggml). * Cleanup. * Fix the rebase. * Removing the fences speeds everything up and is correct this time... * Cleanup the fence. * After rebase. * Bad code removal. * Rebase after phi2 merge + fix replit default to CPU. * Making the CI happy. * More happy tests. --------- Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>	2024-01-17 10:27:58 +01:00
Laurent Mazare	c12ad45562	Add a KV cache to marian decoding. (#1226)	2023-10-31 08:47:44 +00:00
Laurent Mazare	a11af79e23	Add a quantized blip model. (#1155) * Add a quantized blip model. * Integrate the quantized blip model to the actual example.	2023-10-22 20:33:25 +01:00
Laurent Mazare	df2f89b6cf	Add some KV cache to blip. (#1150) * Add some KV cache to blip. * Mention BLIP in the readme.	2023-10-22 09:44:48 +01:00
Laurent Mazare	3115fe42e4	Blip attention mask + readme (#1146) * Add the attention mask to the blip model. * Add a readme.	2023-10-21 22:44:13 +01:00
Laurent Mazare	2531b13bf8	Blip fixes (#1145) * Some fixes for the blip example. * Stop generating on sep tokens. * Clippy fixes. * rustfmt.	2023-10-21 21:34:48 +01:00
Laurent Mazare	0d9bb4eb18	Add the blip example. (#1144) * Add the blip example. * Tweak the example. * Implement the cross-attn logic. * Fix some shape mismatches. * Get some logits out. * Get some caption to be generated.	2023-10-21 20:05:02 +01:00

8 Commits
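
The two off-by-one fixes described in commit 6056fd5c90 can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from the candle repository: the function names `reflect_pad_indices` and `normalize_unsqueeze_axis` are hypothetical, only a single axis is handled, and the axis length is assumed to be at least 2.

```rust
/// Gather indices for 'reflect' padding along one axis of length `dim`.
/// Hypothetical helper for illustration; not candle's actual implementation.
fn reflect_pad_indices(dim: usize, pad_start: usize, pad_end: usize) -> Vec<usize> {
    assert!(dim >= 2, "reflect padding needs at least two elements");
    // One zigzag period: 0, 1, .., dim-1, dim-2, .., 1 (length 2*dim - 2).
    let cycle_len = 2 * dim - 2;
    let period = (0..dim).chain((1..dim - 1).rev());
    // Start far enough into the period that, after `pad_start` padding
    // entries, the sequence reaches index 0 (the first original element).
    let skip = (cycle_len - pad_start % cycle_len) % cycle_len;
    period
        .cycle()
        .skip(skip)
        .take(pad_start + dim + pad_end)
        .collect()
}

/// Resolve a possibly negative Unsqueeze axis for an input of rank `rank`.
/// Assumes `axis` lies in the valid range `-(rank + 1)..=rank`.
fn normalize_unsqueeze_axis(rank: usize, axis: i64) -> usize {
    if axis < 0 {
        // The output has rank + 1 dimensions, hence the extra + 1.
        (rank as i64 + axis + 1) as usize
    } else {
        axis as usize
    }
}
```

For `data = [a, b, c, d]` and `pad_start = 2`, `reflect_pad_indices(4, 2, 0)` gives `[2, 1, 0, 1, 2, 3]`, that is `c b | a b c d`, which matches the "expected" row in the commit message. `normalize_unsqueeze_axis(3, -1)` gives 3.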