
# candle-moondream
[Moondream](https://github.com/vikhyat/moondream) is a computer-vision model that can answer real-world questions about images. It's tiny by today's standards, with only 1.6B parameters, which enables it to run on a variety of devices, including mobile phones and edge devices.
## Running some examples
First download an example image:
```bash
$ wget https://raw.githubusercontent.com/vikhyat/moondream/main/assets/demo-1.jpg
```
<img src="https://raw.githubusercontent.com/vikhyat/moondream/main/assets/demo-1.jpg" width="200">
Now you can run Moondream from the `candle-examples` crate:
```bash
$ cargo run --example moondream --release -- --prompt "What is the girl eating?" --image "./demo-1.jpg"

avx: false, neon: true, simd128: false, f16c: false
temp: 0.00 repeat-penalty: 1.00 repeat-last-n: 64
retrieved the files in 3.395583ms
Running on CPU, to run on GPU(metal), build this example with `--features metal`
loaded the model in 5.485493792s
loaded and encoded the image Tensor[dims 3, 378, 378; f32] in 4.801396417s
starting the inference loop
The girl is eating a hamburger.<
9 tokens generated (0.68 token/s)
```
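The log above notes that this run used the CPU and that GPU (Metal) support sits behind a Cargo feature flag. On an Apple-silicon machine, a GPU-accelerated run might look like the following sketch; the `--features metal` flag is taken directly from the log hint, and the prompt and image are the same ones used above:

```shell
# Rebuild the example with Metal support (per the log hint) and rerun.
# Assumes demo-1.jpg was downloaded into the current directory as shown earlier.
$ cargo run --example moondream --release --features metal -- \
    --prompt "What is the girl eating?" --image "./demo-1.jpg"
```

On NVIDIA hardware, candle examples are typically built with `--features cuda` instead.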