candle-llava

LLaVA (Large Language and Vision Assistant) is an end-to-end trained large multimodal model. This example comes from the standalone candle-llava project (https://github.com/chenwanqq/candle-llava).

The code is based on https://github.com/haotian-liu/LLaVA; the llava-hf version of the config may therefore behave slightly differently.

model zoo

So far, this example has been tested on liuhaotian/llava-v1.6-vicuna-7b and llava-hf/llava-v1.6-vicuna-7b-hf. Memory usage likely has room for optimization.
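
The checkpoints are hosted on the Hugging Face Hub. If you want to pre-fetch files into the local cache before running the example, a minimal sketch using the hf-hub crate could look like the following (this helper is illustrative and not part of the example):

use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download config.json of one of the tested checkpoints into the local
    // Hugging Face cache; other files can be fetched the same way.
    let api = Api::new()?;
    let repo = api.model("llava-hf/llava-v1.6-vicuna-7b-hf".to_string());
    let config_path = repo.get("config.json")?;
    println!("config.json cached at {}", config_path.display());
    Ok(())
}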

Tokenizer Setup

The llava-hf models contain a tokenizer.json file, so they can be used directly with the --hf command-line flag.

For the original llava models, you can use the following commands to generate the tokenizer.json file.

conda create -n llava python=3.10
conda activate llava
pip install transformers protobuf
python -c "from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained('liuhaotian/llava-v1.6-vicuna-7b'); tokenizer.save_pretrained('tokenizer')"

The tokenizer.json file should then be at tokenizer/tokenizer.json, which is the default path expected by the example.
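
If you want to sanity-check the generated file, a minimal sketch using the tokenizers crate (illustrative only, not part of the example) could be:

use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the tokenizer.json generated above from the default path.
    let tokenizer = Tokenizer::from_file("tokenizer/tokenizer.json")?;
    // Encode a sample prompt to confirm the tokenizer works.
    let encoding = tokenizer.encode("is this a cat?", true)?;
    println!("token ids: {:?}", encoding.get_ids());
    Ok(())
}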

eval

cargo run --example llava --features cuda -- --image-file "llava_logo.png" --prompt "is this a cat?" --hf # default args, uses llava-hf/llava-v1.6-vicuna-7b-hf; --image-file is required
cargo run --example llava --features cuda -- --model-path liuhaotian/llava-v1.6-vicuna-7b --image-file "llava_logo.png" --prompt "is this a cat?" # uses liuhaotian/llava-v1.6-vicuna-7b; requires the tokenizer setup above

Major Limitations

  1. Only llama-2/vicuna LLMs are currently supported; Mistral is not supported yet.
  2. Some ops, such as split, nonzero, and where, are not supported by candle (see the sketch after this list for a possible host-side fallback).
  3. Quantization and LoRA are not supported.
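
For illustration, a missing op like nonzero can be emulated on the host. The sketch below is a hypothetical 1-D fallback, assuming a U8 mask tensor and the candle_core API; it is not necessarily how the example implements it:

use candle_core::{Device, Result, Tensor};

// Hypothetical fallback: emulate a 1-D nonzero by scanning on the host,
// since candle has no native nonzero op.
fn nonzero_1d(mask: &Tensor) -> Result<Tensor> {
    let vals = mask.to_vec1::<u8>()?; // copy the U8 mask to host memory
    let idx: Vec<u32> = vals
        .iter()
        .enumerate()
        .filter_map(|(i, &v)| (v != 0).then_some(i as u32))
        .collect();
    let n = idx.len();
    Tensor::from_vec(idx, n, &Device::Cpu)
}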