candle-llava
LLaVA (Large Language-and-Vision Assistant) is an end-to-end trained large multimodal model. This example is adapted from the candle-llava project.
The code is based on https://github.com/haotian-liu/LLaVA, so the llava-hf version of the config may perform differently.
Model Zoo
Right now this has been tested on liuhaotian/llava-v1.6-vicuna-7b and llava-hf/llava-v1.6-vicuna-7b-hf. Memory usage might have room for optimization.
Tokenizer Setup
The llava-hf models contain a tokenizer.json file, so they can be used directly with the --hf command line flag.
For the original llava models, you can use the following commands to generate the tokenizer.json file.
```bash
conda create -n llava python=3.10
conda activate llava
pip install transformers protobuf
python -c "from transformers import AutoTokenizer;tokenizer=AutoTokenizer.from_pretrained('liuhaotian/llava-v1.6-vicuna-7b');tokenizer.save_pretrained('tokenizer')"
```
Then the tokenizer.json file should be in tokenizer/tokenizer.json (which is the default path).
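As a quick sanity check, the generated file can also be loaded from Rust (a minimal sketch using the tokenizers crate and the default tokenizer/tokenizer.json path; the sample prompt is only illustrative):

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the tokenizer.json produced by the Python snippet above.
    let tokenizer = Tokenizer::from_file("tokenizer/tokenizer.json")?;
    // Encode a sample prompt (true = add special tokens) and print the token ids.
    let encoding = tokenizer.encode("is this a cat?", true)?;
    println!("{:?}", encoding.get_ids());
    Ok(())
}
```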
Eval
```bash
cargo run --example llava --features cuda -- --image-file "llava_logo.png" --prompt "is this a cat?" --hf # default args, uses llava-hf/llava-v1.6-vicuna-7b-hf; image-file is required^_^
cargo run --example llava --features cuda -- --model-path liuhaotian/llava-v1.6-vicuna-7b --image-file "llava_logo.png" --prompt "is this a cat?" # uses liuhaotian/llava-v1.6-vicuna-7b; the tokenizer setup above must be done first
```
Major Limitations
- Currently only llama-2/vicuna LLMs are supported; Mistral is not supported yet.
- Some ops, such as split, nonzero, and where, are not supported by candle; one possible workaround for where is sketched below.
- Quantization and LoRA are not supported.
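For instance, until a native where op is available, a where-style select can often be emulated with mask arithmetic, using the identity where(m, a, b) = m*a + (1-m)*b for a 0/1 mask. Below is a minimal sketch against candle's Tensor API; the select helper and the hand-built float mask are assumptions for illustration:

```rust
use candle_core::{Device, Result, Tensor};

/// Hypothetical helper emulating where(mask, on_true, on_false) as
/// mask * on_true + (1 - mask) * on_false. The mask must hold exactly
/// 0.0 or 1.0 values for the identity to hold.
fn select(mask: &Tensor, on_true: &Tensor, on_false: &Tensor) -> Result<Tensor> {
    let inv_mask = (mask.ones_like()? - mask)?;
    mask.mul(on_true)? + inv_mask.mul(on_false)?
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let mask = Tensor::new(&[1f32, 0., 1., 0.], &device)?;
    let a = Tensor::new(&[10f32, 20., 30., 40.], &device)?;
    let b = Tensor::new(&[1f32, 2., 3., 4.], &device)?;
    // Prints [10.0, 2.0, 30.0, 4.0]
    println!("{:?}", select(&mask, &a, &b)?.to_vec1::<f32>()?);
    Ok(())
}
```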