candle-llava

LLaVA (Large Language and Vision Assistant) is an end-to-end trained large multimodal model. This example comes from the standalone candle-llava project (https://github.com/chenwanqq/candle-llava).

The code is based on https://github.com/haotian-liu/LLaVA; the llava-hf version of the config may therefore behave slightly differently.

model zoo

So far, this example has been tested on liuhaotian/llava-v1.6-vicuna-7b and llava-hf/llava-v1.6-vicuna-7b-hf. Memory usage likely has room for optimization.
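
The checkpoints are hosted on the Hugging Face Hub. If you want to pre-fetch files into the local cache before running the example, a minimal sketch using the hf-hub crate could look like the following (this helper is illustrative and not part of the example):

use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download config.json of one of the tested checkpoints into the local
    // Hugging Face cache; other files can be fetched the same way.
    let api = Api::new()?;
    let repo = api.model("llava-hf/llava-v1.6-vicuna-7b-hf".to_string());
    let config_path = repo.get("config.json")?;
    println!("config.json cached at {}", config_path.display());
    Ok(())
}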

Tokenizer Setup

The llava-hf models contain a tokenizer.json file, so they can be used directly with the --hf command-line flag.

For the original llava models, you can use the following commands to generate the tokenizer.json file.

conda create -n llava python=3.10
conda activate llava
pip install transformers protobuf
python -c "from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained('liuhaotian/llava-v1.6-vicuna-7b'); tokenizer.save_pretrained('tokenizer')"

The tokenizer.json file should then be at tokenizer/tokenizer.json, which is the default path expected by the example.
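
If you want to sanity-check the generated file, a minimal sketch using the tokenizers crate (illustrative only, not part of the example) could be:

use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the tokenizer.json generated above from the default path.
    let tokenizer = Tokenizer::from_file("tokenizer/tokenizer.json")?;
    // Encode a sample prompt to confirm the tokenizer works.
    let encoding = tokenizer.encode("is this a cat?", true)?;
    println!("token ids: {:?}", encoding.get_ids());
    Ok(())
}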

eval

cargo run --example llava --features cuda -- --image-file "llava_logo.png" --prompt "is this a cat?" --hf # default args, uses llava-hf/llava-v1.6-vicuna-7b-hf; --image-file is required
cargo run --example llava --features cuda -- --model-path liuhaotian/llava-v1.6-vicuna-7b --image-file "llava_logo.png" --prompt "is this a cat?" # uses liuhaotian/llava-v1.6-vicuna-7b; requires the tokenizer setup above

Major Limitations

  1. Only llama-2/vicuna LLMs are currently supported; Mistral is not supported yet.
  2. Some ops, such as split, nonzero, and where, are not supported by candle (see the sketch after this list for a possible host-side fallback).
  3. Quantization and LoRA are not supported.
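
For illustration, a missing op like nonzero can be emulated on the host. The sketch below is a hypothetical 1-D fallback, assuming a U8 mask tensor and the candle_core API; it is not necessarily how the example implements it:

use candle_core::{Device, Result, Tensor};

// Hypothetical fallback: emulate a 1-D nonzero by scanning on the host,
// since candle has no native nonzero op.
fn nonzero_1d(mask: &Tensor) -> Result<Tensor> {
    let vals = mask.to_vec1::<u8>()?; // copy the U8 mask to host memory
    let idx: Vec<u32> = vals
        .iter()
        .enumerate()
        .filter_map(|(i, &v)| (v != 0).then_some(i as u32))
        .collect();
    let n = idx.len();
    Tensor::from_vec(idx, n, &Device::Cpu)
}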