mirror of
https://github.com/huggingface/candle.git
synced 2025-06-16 18:48:51 +00:00

* first commit * llava * clippy and fmt * some fixes * minor fixes * remove useless file * refactor: Remove llava/constants.rs and update llava/mod.rs * modify variable name * modify code after clippy * Minor tweaks. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>
41 lines
1.8 KiB
Markdown
41 lines
1.8 KiB
Markdown
# candle-llava
|
|
|
|
LLaVA (Large Language-and-Vision Assistant) is an end-to-end trained large
|
|
multimodal model. This example is from [candle-llava](https://github.com/chenwanqq/candle-llava)
|
|
|
|
The code is based on [https://github.com/haotian-liu/LLaVA](https://github.com/haotian-liu/LLaVA), Hence the llava-hf version of config may perform differently.
|
|
|
|
## model zoo
|
|
* [liuhaotian/LLaVA](https://huggingface.co/liuhaotian)
|
|
* [llava-hf](https://huggingface.co/llava-hf)
|
|
|
|
Right now this has been tested on `liuhaotian/llava-v1.6-vicuna-7b` and
|
|
`llava-hf/llava-v1.6-vicuna-7b-hf`. Memory usage might have room for optimization.
|
|
|
|
## Tokenizer Setup
|
|
The llava-hf models contain a `tokenizer.json` file so can be used directly with
|
|
the `-hf` command line flag.
|
|
|
|
For the original llava models, you can use the following code to generate the `tokenizer.json` file.
|
|
|
|
```bash
|
|
conda create -n llava python=3.10
|
|
pip install transformers protobuf
|
|
conda activate llava
|
|
python -c "from transformers import AutoTokenizer;tokenizer=AutoTokenizer.from_pretrained('liuhaotian/llava-v1.6-vicuna-7b');tokenizer.save_pretrained('tokenizer')"
|
|
```
|
|
Then the `tokenizer.json` file should be in `tokenizer/tokenizer.json` (which is the default path).
|
|
|
|
|
|
## eval
|
|
|
|
```bash
|
|
cargo run --example llava --features cuda -- --image-file "llava_logo.png" --prompt "is this a cat?" --hf # default args, use llava-hf/llava-v1.6-vicuna-7b-hf. image-file is required^_^
|
|
cargo run --example llava --features cuda -- --model-path liuhaotian/llava-v1.6-vicuna-7b --image-file "llava_logo.png" --prompt "is this a cat?" # use liuhaotian/llava-v1.6-vicuna-7b, tokenizer setup should be done
|
|
```
|
|
|
|
## Major Limitations
|
|
1. Currently only support llama-2/vicuna llm. Haven't supoort Mistral yet.
|
|
2. There are some ops like split, nonzero and where are not supported by candle.
|
|
3. Lack of quantization and LoRA support.
|