Use HF Papers
@@ -4,7 +4,7 @@ Experimental, not instruction-tuned small LLM from the Hazy Research group, comb
 
 [Blogpost](https://hazyresearch.stanford.edu/blog/2024-03-03-based)
 
-[Simple linear attention language models balance the recall-throughput tradeoff](https://arxiv.org/abs/2402.18668)
+[Simple linear attention language models balance the recall-throughput tradeoff](https://huggingface.co/papers/2402.18668)
 
 ## Running an example
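The tradeoff in the paper's title comes from swapping softmax attention for a linear-attention recurrence whose state has a fixed size, so decoding throughput no longer degrades with context length (at the cost of recall). Below is a minimal pure-Rust illustration of that recurrence with an identity feature map; Based itself combines a Taylor-style feature map with sliding-window attention, so everything model-specific here is an assumption.

```rust
// Minimal linear-attention recurrence (illustrative, not the Based model):
// keep running sums S = sum_i k_i v_i^T and z = sum_i k_i, then
//   y_t = (q_t^T S) / (q_t^T z)
// so each step costs O(d * d_v) regardless of sequence length.
fn linear_attention(qs: &[Vec<f64>], ks: &[Vec<f64>], vs: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let d = qs[0].len();
    let dv = vs[0].len();
    let mut s = vec![vec![0.0f64; dv]; d]; // running sum of k v^T
    let mut z = vec![0.0f64; d]; // running sum of k
    let mut ys = Vec::with_capacity(qs.len());
    for ((q, k), v) in qs.iter().zip(ks).zip(vs) {
        // Update the constant-size state instead of attending over all past tokens.
        for i in 0..d {
            for j in 0..dv {
                s[i][j] += k[i] * v[j];
            }
            z[i] += k[i];
        }
        // Numerator q^T S, denominator q^T z (epsilon keeps the toy version stable).
        let denom: f64 = q.iter().zip(&z).map(|(a, b)| a * b).sum::<f64>() + 1e-9;
        let y: Vec<f64> = (0..dv)
            .map(|j| (0..d).map(|i| q[i] * s[i][j]).sum::<f64>() / denom)
            .collect();
        ys.push(y);
    }
    ys
}

fn main() {
    let qs = vec![vec![1.0, 0.0], vec![0.5, 0.5]];
    let ks = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let vs = vec![vec![1.0], vec![2.0]];
    println!("{:?}", linear_attention(&qs, &ks, &vs));
}
```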
@@ -1,6 +1,6 @@
 # candle-beit
 
-[Beit](https://arxiv.org/abs/2106.08254) is a computer vision model.
+[Beit](https://huggingface.co/papers/2106.08254) is a computer vision model.
 In this example, it is used as an ImageNet classifier: the model returns the
 probability for the image to belong to each of the 1000 ImageNet categories.
 
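This and several other hunks in the commit describe the same pattern: the network emits one logit per ImageNet class, and the example reports class probabilities. For reference, a minimal dependency-free sketch of that final softmax / top-5 step (the actual examples do this on candle `Tensor`s rather than plain slices):

```rust
// Softmax over raw logits, then take the top-k (class index, probability) pairs.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtracting the max keeps exp() from overflowing on large logits.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn top_k(probs: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = probs.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);
    indexed
}

fn main() {
    // Stand-in for a 1000-way logit vector.
    let logits = [0.3, 2.1, -1.0, 0.9, 1.7];
    for (class, p) in top_k(&softmax(&logits), 5) {
        println!("class {class}: p = {p:.3}");
    }
}
```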
@@ -3,7 +3,7 @@
 [HuggingFace Model Card](https://huggingface.co/vidore/colpali-v1.2-merged)
 
 ```
-wget https://arxiv.org/pdf/1706.03762.pdf
+wget https://huggingface.co/papers/1706.03762
 cargo run --features cuda,pdf2image --release --example colpali -- --prompt "What is Positional Encoding" --pdf "1706.03762.pdf"
 ```
@@ -2,7 +2,7 @@
 
 A lightweight CNN architecture that processes image patches similarly to a vision transformer, with separate spatial and channel convolutions.
 
-ConvMixer from [Patches Are All You Need?](https://arxiv.org/pdf/2201.09792) and [ConvMixer](https://github.com/locuslab/convmixer).
+ConvMixer from [Patches Are All You Need?](https://huggingface.co/papers/2201.09792) and [ConvMixer](https://github.com/locuslab/convmixer).
 
 ## Running an example
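The phrase "separate spatial and channel convolutions" is the whole trick: a depthwise convolution filters each channel independently (mixing only spatial positions), and a pointwise 1x1 convolution then mixes channels at each position. A bare-bones sketch of the two operations on a small CHW array, with the real block's residual connection, activation, and normalization deliberately omitted:

```rust
// Depthwise 3x3 convolution with "same" padding: spatial mixing only,
// one 3x3 filter per channel, channels never interact.
fn depthwise3x3(x: &[Vec<Vec<f32>>], w: &[[[f32; 3]; 3]]) -> Vec<Vec<Vec<f32>>> {
    let (c, h, wid) = (x.len(), x[0].len(), x[0][0].len());
    let mut out = vec![vec![vec![0.0; wid]; h]; c];
    for ch in 0..c {
        for i in 0..h {
            for j in 0..wid {
                for di in 0..3 {
                    for dj in 0..3 {
                        // Input tap at (i + di - 1, j + dj - 1); skip out-of-bounds taps.
                        let (ii, jj) = (i + di, j + dj);
                        if ii >= 1 && jj >= 1 && ii - 1 < h && jj - 1 < wid {
                            out[ch][i][j] += x[ch][ii - 1][jj - 1] * w[ch][di][dj];
                        }
                    }
                }
            }
        }
    }
    out
}

// Pointwise 1x1 convolution: channel mixing only, no spatial extent.
fn pointwise(x: &[Vec<Vec<f32>>], w: &[Vec<f32>]) -> Vec<Vec<Vec<f32>>> {
    let (c_in, h, wid) = (x.len(), x[0].len(), x[0][0].len());
    let mut out = vec![vec![vec![0.0; wid]; h]; w.len()];
    for (co, row) in w.iter().enumerate() {
        for ci in 0..c_in {
            for i in 0..h {
                for j in 0..wid {
                    out[co][i][j] += row[ci] * x[ci][i][j];
                }
            }
        }
    }
    out
}

fn main() {
    let x = vec![vec![vec![1.0; 4]; 4]; 2]; // 2 channels, 4x4 feature map
    let dw = [[[1.0 / 9.0; 3]; 3]; 2]; // per-channel 3x3 box filter
    let pw = vec![vec![0.5, 0.5]]; // mix 2 channels down to 1
    let y = pointwise(&depthwise3x3(&x, &dw), &pw);
    println!("{:?}", y[0][1][1]);
}
```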
@@ -1,7 +1,7 @@
 # candle-convnext
 
-[A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) and
-[ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808).
+[A ConvNet for the 2020s](https://huggingface.co/papers/2201.03545) and
+[ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://huggingface.co/papers/2301.00808).
 
 This candle implementation uses a pre-trained ConvNeXt network for inference. The
 classification head has been trained on the ImageNet dataset and returns the
@@ -1,6 +1,6 @@
 # candle-dinov2-reg4
 
-[DINOv2-reg4](https://arxiv.org/abs/2309.16588) is the latest version of DINOv2 with registers.
+[DINOv2-reg4](https://huggingface.co/papers/2309.16588) is the latest version of DINOv2 with registers.
 In this example, it is used as a plant species classifier: the model returns the
 probability for the image to belong to each of the 7806 PlantCLEF2024 categories.
 
@@ -1,5 +1,5 @@
 //! DINOv2 reg4 finetuned on PlantCLEF 2024
-//! https://arxiv.org/abs/2309.16588
+//! https://huggingface.co/papers/2309.16588
 //! https://huggingface.co/spaces/BVRA/PlantCLEF2024
 //! https://zenodo.org/records/10848263
 
@@ -1,6 +1,6 @@
 //! EfficientNet implementation.
 //!
-//! https://arxiv.org/abs/1905.11946
+//! https://huggingface.co/papers/1905.11946
 
 #[cfg(feature = "mkl")]
 extern crate intel_mkl_src;
@@ -1,6 +1,6 @@
 # candle-efficientvit
 
-[EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention](https://arxiv.org/abs/2305.07027).
+[EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention](https://huggingface.co/papers/2305.07027).
 
 This candle implementation uses a pre-trained EfficientViT (from Microsoft Research Asia) network for inference.
 The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.
@@ -1,6 +1,6 @@
 # candle-eva2
 
-[EVA-02](https://arxiv.org/abs/2303.11331) is a computer vision model.
+[EVA-02](https://huggingface.co/papers/2303.11331) is a computer vision model.
 In this example, it is used as an ImageNet classifier: the model returns the
 probability for the image to belong to each of the 1000 ImageNet categories.
 
@@ -1,6 +1,6 @@
 # candle-fastvit
 
-[FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://arxiv.org/abs/2303.14189).
+[FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://huggingface.co/papers/2303.14189).
 This candle implementation uses a pre-trained FastViT network for inference. The
 classification head has been trained on the ImageNet dataset and returns the
 probabilities for the top-5 classes.
@@ -3,7 +3,7 @@
 gte-Qwen1.5-7B-instruct is a variant of the GTE embedding model family.
 
 - [Model card](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct) on the HuggingFace Hub.
-- [Technical report](https://arxiv.org/abs/2308.03281) *Towards General Text Embeddings with Multi-stage Contrastive Learning*
+- [Technical report](https://huggingface.co/papers/2308.03281) *Towards General Text Embeddings with Multi-stage Contrastive Learning*
 
 
 ## Running the example
@@ -1,6 +1,6 @@
 # hiera
 
-[Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/abs/2306.00989)
+[Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://huggingface.co/papers/2306.00989)
 This candle implementation uses pre-trained Hiera models from timm for inference.
 The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.
 
@@ -5,7 +5,7 @@ the transformer architecture. It leverages State Space Models (SSMs) with the
 goal of being computationally efficient on long sequences. The implementation is
 based on [mamba.rs](https://github.com/LaurentMazare/mamba.rs).
 
-- [1]. [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752).
+- [1]. [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://huggingface.co/papers/2312.00752).
 
 Compared to the mamba-minimal example, this version is far more efficient but
 only works for inference.
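The reason this version is efficient, but inference-only, is visible in the SSM recurrence itself: generation updates a fixed-size hidden state per token instead of attending over the whole history (training-time efficiency instead relies on a parallel scan, which this sketch skips). A toy single-channel version of the discretized recurrence, with made-up constants where Mamba would use learned, input-dependent ("selective") parameters:

```rust
// Toy diagonal state-space recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.
// Memory and per-token cost are constant, so cost does not grow with context length.
fn ssm_scan(xs: &[f64], a: f64, b: f64, c: f64) -> Vec<f64> {
    let mut h = 0.0;
    xs.iter()
        .map(|&x| {
            h = a * h + b * x; // fixed-size state update, no attention over history
            c * h
        })
        .collect()
}

fn main() {
    // a < 1 gives an exponentially fading memory of past inputs.
    let ys = ssm_scan(&[1.0, 0.0, 0.0, 1.0], 0.9, 1.0, 1.0);
    println!("{ys:?}");
}
```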
@@ -2,7 +2,7 @@
 
 MobileCLIP is a family of efficient CLIP-like models using FastViT-based image encoders.
 
-See [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://arxiv.org/abs/2311.17049)
+See [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://huggingface.co/papers/2311.17049)
 
 
 ## Running an example on cpu
@@ -1,6 +1,6 @@
 # candle-mobilenetv4
 
-[MobileNetV4 - Universal Models for the Mobile Ecosystem](https://arxiv.org/abs/2404.10518)
+[MobileNetV4 - Universal Models for the Mobile Ecosystem](https://huggingface.co/papers/2404.10518)
 This candle implementation uses pre-trained MobileNetV4 models from timm for inference.
 The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.
 
@@ -1,6 +1,6 @@
 # candle-mobileone
 
-[MobileOne: An Improved One millisecond Mobile Backbone](https://arxiv.org/abs/2206.04040).
+[MobileOne: An Improved One millisecond Mobile Backbone](https://huggingface.co/papers/2206.04040).
 
 This candle implementation uses a pre-trained MobileOne network for inference. The
 classification head has been trained on the ImageNet dataset and returns the
@@ -1,6 +1,6 @@
 # candle-musicgen
 
-Candle implementation of musicgen from [Simple and Controllable Music Generation](https://arxiv.org/pdf/2306.05284).
+Candle implementation of musicgen from [Simple and Controllable Music Generation](https://huggingface.co/papers/2306.05284).
 
 ## Running an example
 
@@ -3,7 +3,7 @@
 OLMo is a series of Open Language Models designed to enable the science of language models.
 
 - **Project Page:** https://allenai.org/olmo
-- **Papers:** [OLMo](https://arxiv.org/abs/2402.00838) [OLMo 2](https://arxiv.org/abs/2501.00656)
+- **Papers:** [OLMo](https://huggingface.co/papers/2402.00838) [OLMo 2](https://huggingface.co/papers/2501.00656)
 - **Technical blog post:** https://blog.allenai.org/olmo-open-language-model-87ccfc95f580
 - **W&B Logs:** https://wandb.ai/ai2-llm/OLMo-1B/reports/OLMo-1B--Vmlldzo2NzY1Njk1
 <!-- - **Press release:** TODO -->
@@ -2,7 +2,7 @@
 
 This example demonstrates how to run [ONNX](https://github.com/onnx/onnx) based models in Candle.
 
-It contains small variants of two models, [SqueezeNet](https://arxiv.org/pdf/1602.07360.pdf) (default) and [EfficientNet](https://arxiv.org/pdf/1905.11946.pdf).
+It contains small variants of two models, [SqueezeNet](https://huggingface.co/papers/1602.07360) (default) and [EfficientNet](https://huggingface.co/papers/1905.11946).
 
 You can run the examples with the following commands:
 
@@ -51,7 +51,7 @@ cargo run --example quantized-t5 --release -- \
 Note that a storm surge is what forecasters consider a hurricane's most dangerous part.
 ```
 
-### [MADLAD-400](https://arxiv.org/abs/2309.04662)
+### [MADLAD-400](https://huggingface.co/papers/2309.04662)
 
 MADLAD-400 is a series of multilingual machine translation T5 models trained on 250 billion tokens covering over 450 languages using publicly available data. These models are competitive with significantly larger models.
 
@@ -1,6 +1,6 @@
 # candle-repvgg
 
-[RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697).
+[RepVGG: Making VGG-style ConvNets Great Again](https://huggingface.co/papers/2101.03697).
 
 This candle implementation uses a pre-trained RepVGG network for inference. The
 classification head has been trained on the ImageNet dataset and returns the
@@ -1,6 +1,6 @@
 # candle-resnet
 
-A candle implementation of inference using a pre-trained [ResNet](https://arxiv.org/abs/1512.03385).
+A candle implementation of inference using a pre-trained [ResNet](https://huggingface.co/papers/1512.03385).
 This uses a classification head trained on the ImageNet dataset and returns the
 probabilities for the top-5 classes.
 
@@ -7,7 +7,7 @@
 Stable Diffusion 3 Medium is a text-to-image model based on the Multimodal Diffusion Transformer (MMDiT) architecture.
 
 - [huggingface repo](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
-- [research paper](https://arxiv.org/pdf/2403.03206)
+- [research paper](https://huggingface.co/papers/2403.03206)
 - [announcement blog post](https://stability.ai/news/stable-diffusion-3-medium)
 
 Stable Diffusion 3.5 is a family of text-to-image models with the latest improvements:
@@ -69,7 +69,7 @@ pub fn euler_sample(
 }
 
 // The "Resolution-dependent shifting of timestep schedules" recommended in the SD3 tech report.
-// https://arxiv.org/pdf/2403.03206
+// https://huggingface.co/papers/2403.03206
 // Following the implementation in ComfyUI:
 // https://github.com/comfyanonymous/ComfyUI/blob/3c60ecd7a83da43d694e26a77ca6b93106891251/
 // comfy/model_sampling.py#L181
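For reference, the shifting this comment cites is a one-line remap of the schedule, t' = shift * t / (1 + (shift - 1) * t), which in my reading corresponds to the shift helper in the ComfyUI file linked above. The sketch below shows the transform in isolation; the shift value used is an illustrative assumption (the report makes it resolution-dependent), not a value taken from the candle example.

```rust
// Resolution-dependent timestep shift from the SD3 report:
//   t' = shift * t / (1 + (shift - 1) * t)
// shift = 1 is the identity; larger shifts spend more of the schedule
// at high noise levels, which the report recommends for larger images.
fn shift_timestep(t: f64, shift: f64) -> f64 {
    shift * t / (1.0 + (shift - 1.0) * t)
}

fn main() {
    let shift = 3.0; // illustrative value, not taken from the candle example
    for i in 0..=4 {
        let t = i as f64 / 4.0;
        println!("t = {t:.2} -> t' = {:.3}", shift_timestep(t, shift));
    }
}
```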
@@ -1,6 +1,6 @@
 # candle-starcoder2
 
-Candle implementation of the StarCoder2 family of code generation models from [StarCoder 2 and The Stack v2: The Next Generation](https://arxiv.org/pdf/2402.19173).
+Candle implementation of the StarCoder2 family of code generation models from [StarCoder 2 and The Stack v2: The Next Generation](https://huggingface.co/papers/2402.19173).
 
 ## Running an example
 
@@ -16,7 +16,7 @@ $ cargo run --example stella-en-v5 --release -- --query "What are safetensors?"
 > Tensor[[1, 1024], f32]
 ```
 
-Stella_en_1.5B_v5 is trained with [MRL](https://arxiv.org/abs/2205.13147), enabling multiple embedding dimensions.
+Stella_en_1.5B_v5 is trained with [MRL](https://huggingface.co/papers/2205.13147), enabling multiple embedding dimensions.
 
 The following reproduces the example in the [model card](https://huggingface.co/dunzhang/stella_en_1.5B_v5) for a retrieval task (s2p). The sample queries and docs are hardcoded in the example.
 
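MRL here is Matryoshka Representation Learning: the 1024-dim embedding is trained so that its leading dimensions also work as smaller embeddings, which is what "multiple embedding dimensions" means in practice. A sketch of the resulting truncate-and-renormalize step; keeping the prefix dimensions is the standard MRL convention, assumed here rather than checked against this example's code:

```rust
// Truncate a Matryoshka-style embedding to `dim` dimensions and L2-normalize,
// so cosine similarity still behaves in the reduced space.
fn truncate_embedding(full: &[f32], dim: usize) -> Vec<f32> {
    let prefix = &full[..dim.min(full.len())];
    let norm = prefix.iter().map(|v| v * v).sum::<f32>().sqrt().max(1e-12);
    prefix.iter().map(|v| v / norm).collect()
}

fn main() {
    // Stand-in for the example's [1, 1024] output tensor.
    let full = [0.4f32, -0.2, 0.1, 0.3];
    println!("{:?}", truncate_embedding(&full, 2));
}
```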
@@ -13,7 +13,7 @@ $ cargo run --example t5 --release -- --model-id "t5-small" --prompt "translate
 
 Variants such as [flan-t5](https://huggingface.co/google/flan-t5-small), [flan-ul2](https://huggingface.co/google/flan-ul2) (with `--revision "refs/pr/25"`), and [Co-EdIT](https://huggingface.co/grammarly/coedit-large) are also supported.
 
-## Translation with [MADLAD-400](https://arxiv.org/abs/2309.04662)
+## Translation with [MADLAD-400](https://huggingface.co/papers/2309.04662)
 
 MADLAD-400 is a series of multilingual machine translation T5 models trained on 250 billion tokens covering over 450 languages using publicly available data. These models are competitive with significantly larger models.
 
@@ -8,7 +8,7 @@ The candle implementation reproduces the same structure/files for models and
 pipelines. Useful resources:
 
 - [Official implementation](https://github.com/dome272/Wuerstchen).
-- [Arxiv paper](https://arxiv.org/abs/2306.00637).
+- [Arxiv paper](https://huggingface.co/papers/2306.00637).
 - Blog post: [Introducing Würstchen: Fast Diffusion for Image Generation](https://huggingface.co/blog/wuerstchen).
 
 ## Getting the weights