More Model Module Docs (#2623)

* dinov2

* add another example

* add dinov2reg4

* eva2

* efficientvit

* moondream

* update t5

* update t5

* rwkv

* stable diffusion docs

* add wasm link

* add segment_anything

* adjust for clippy

* ignore bertdoc

* dinov2 ignore

* update block to be text

* remove the rust blocks for the moment

* bump python to 3.11

* add a setup-python step

* add py311 to test as well
zachcp
2024-11-17 14:27:24 -05:00
committed by GitHub
parent a3f200e369
commit 12d7e7b145
12 changed files with 291 additions and 72 deletions

@@ -1,9 +1,40 @@
//! EfficientViT (MSRA) inference implementation based on timm.
//!
//! See ["EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention"](https://arxiv.org/abs/2305.07027)
//! This module provides an implementation of the EfficientViT model from Microsoft Research Asia
//! for efficient image classification. The model uses cascaded group attention modules
//! to achieve strong performance while maintaining low memory usage.
//!
//! The model was originally described in the paper:
//! ["EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention"](https://arxiv.org/abs/2305.07027)
//!
//! This implementation is based on the reference implementation from
//! [pytorch-image-models](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_msra.py).
//!
//! # Example Usage
//!
//! The candle example runs inference with a pre-trained EfficientViT (Microsoft Research Asia) network.
//! The classification head was trained on the ImageNet dataset; the example prints the five most
//! probable classes with their probabilities.
//!
//! ```bash
//! cargo run \
//! --example efficientvit \
//! --release -- \
//! --image candle-examples/examples/yolo-v8/assets/bike.jpg --which m1
//!
//! > loaded image Tensor[dims 3, 224, 224; f32]
//! > model built
//! > mountain bike, all-terrain bike, off-roader: 69.80%
//! > unicycle, monocycle : 13.03%
//! > bicycle-built-for-two, tandem bicycle, tandem: 9.28%
//! > crash helmet : 2.25%
//! > alp : 0.46%
//! ```
//!
//! <div align=center>
//! <img src="https://github.com/huggingface/candle/raw/main/candle-examples/examples/yolo-v8/assets/bike.jpg" alt="" width=640>
//! </div>
//!
//! Based on implementation from [pytorch-image-models](https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/efficientvit_msra.py)
use candle::{Result, Tensor, D};
use candle_nn::{
batch_norm, conv2d, conv2d_no_bias, linear, ops::sigmoid, ops::softmax, Conv2dConfig, Func,