Use HF Papers

This commit is contained in:
Quentin Gallouédec
2025-05-17 03:41:24 +00:00
parent 92106c8762
commit ffb8d63324
97 changed files with 113 additions and 113 deletions

View File

@@ -4,7 +4,7 @@ Experimental, not instruction-tuned small LLM from the Hazy Research group, comb
[Blogpost](https://hazyresearch.stanford.edu/blog/2024-03-03-based)
-[Simple linear attention language models balance the recall-throughput tradeoff](https://arxiv.org/abs/2402.18668)
+[Simple linear attention language models balance the recall-throughput tradeoff](https://huggingface.co/papers/2402.18668)
## Running an example

View File

@@ -1,6 +1,6 @@
# candle-beit
-[Beit](https://arxiv.org/abs/2106.08254) is a computer vision model.
+[Beit](https://huggingface.co/papers/2106.08254) is a computer vision model.
In this example, it is used as an ImageNet classifier: the model returns the
probability for the image to belong to each of the 1000 ImageNet categories.

View File

@@ -3,7 +3,7 @@
[HuggingFace Model Card](https://huggingface.co/vidore/colpali-v1.2-merged)
```
-wget https://arxiv.org/pdf/1706.03762.pdf
+wget https://huggingface.co/papers/1706.03762
cargo run --features cuda,pdf2image --release --example colpali -- --prompt "What is Positional Encoding" --pdf "1706.03762.pdf"
```

View File

@@ -2,7 +2,7 @@
A lightweight CNN architecture that processes image patches similarly to a vision transformer, with separate spatial and channel convolutions.
-ConvMixer from [Patches Are All You Need?](https://arxiv.org/pdf/2201.09792) and [ConvMixer](https://github.com/locuslab/convmixer).
+ConvMixer from [Patches Are All You Need?](https://huggingface.co/papers/2201.09792) and [ConvMixer](https://github.com/locuslab/convmixer).
## Running an example

View File

@@ -1,7 +1,7 @@
# candle-convnext
-[A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) and
-[ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://arxiv.org/abs/2301.00808).
+[A ConvNet for the 2020s](https://huggingface.co/papers/2201.03545) and
+[ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https://huggingface.co/papers/2301.00808).
This candle implementation uses a pre-trained ConvNeXt network for inference. The
classification head has been trained on the ImageNet dataset and returns the

View File

@@ -1,6 +1,6 @@
# candle-dinov2-reg4
-[DINOv2-reg4](https://arxiv.org/abs/2309.16588) is the latest version of DINOv2 with registers.
+[DINOv2-reg4](https://huggingface.co/papers/2309.16588) is the latest version of DINOv2 with registers.
In this example, it is used as a plant species classifier: the model returns the
probability for the image to belong to each of the 7806 PlantCLEF2024 categories.

View File

@@ -1,5 +1,5 @@
//! DINOv2 reg4 finetuned on PlantCLEF 2024
-//! https://arxiv.org/abs/2309.16588
+//! https://huggingface.co/papers/2309.16588
//! https://huggingface.co/spaces/BVRA/PlantCLEF2024
//! https://zenodo.org/records/10848263

View File

@@ -1,6 +1,6 @@
//! EfficientNet implementation.
//!
-//! https://arxiv.org/abs/1905.11946
+//! https://huggingface.co/papers/1905.11946
#[cfg(feature = "mkl")]
extern crate intel_mkl_src;

View File

@@ -1,6 +1,6 @@
# candle-efficientvit
-[EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention](https://arxiv.org/abs/2305.07027).
+[EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention](https://huggingface.co/papers/2305.07027).
This candle implementation uses a pre-trained EfficientViT (from Microsoft Research Asia) network for inference.
The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.

View File

@@ -1,6 +1,6 @@
# candle-eva2
-[EVA-02](https://arxiv.org/abs/2303.11331) is a computer vision model.
+[EVA-02](https://huggingface.co/papers/2303.11331) is a computer vision model.
In this example, it is used as an ImageNet classifier: the model returns the
probability for the image to belong to each of the 1000 ImageNet categories.

View File

@@ -1,6 +1,6 @@
# candle-fastvit
-[FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://arxiv.org/abs/2303.14189).
+[FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization](https://huggingface.co/papers/2303.14189).
This candle implementation uses a pre-trained FastViT network for inference. The
classification head has been trained on the ImageNet dataset and returns the
probabilities for the top-5 classes.

View File

@@ -3,7 +3,7 @@
gte-Qwen1.5-7B-instruct is a variant of the GTE embedding model family.
- [Model card](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct) on the HuggingFace Hub.
-- [Technical report](https://arxiv.org/abs/2308.03281) *Towards General Text Embeddings with Multi-stage Contrastive Learning*
+- [Technical report](https://huggingface.co/papers/2308.03281) *Towards General Text Embeddings with Multi-stage Contrastive Learning*
## Running the example

View File

@@ -1,6 +1,6 @@
# hiera
-[Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/abs/2306.00989)
+[Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://huggingface.co/papers/2306.00989)
This candle implementation uses pre-trained Hiera models from timm for inference.
The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.

View File

@@ -5,7 +5,7 @@ the transformer architecture. It leverages State Space Models (SSMs) with the
goal of being computationally efficient on long sequences. The implementation is
based on [mamba.rs](https://github.com/LaurentMazare/mamba.rs).
-- [1]. [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752).
+- [1]. [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://huggingface.co/papers/2312.00752).
Compared to the mamba-minimal example, this version is far more efficient but
only works for inference.
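For intuition, the linear-time recurrence behind an SSM layer can be sketched as a toy scalar scan (illustrative only, with made-up fixed parameters; Mamba's actual selective scan in mamba.rs uses input-dependent, vectorized parameters):
```rust
// Toy scalar SSM scan: O(1) state update per token, O(n) over the sequence.
// Illustrative sketch only, not the selective scan used by Mamba.
fn ssm_scan(xs: &[f32], a: f32, b: f32, c: f32) -> Vec<f32> {
    let mut h = 0.0f32; // hidden state
    xs.iter()
        .map(|&x| {
            h = a * h + b * x; // state update
            c * h // readout
        })
        .collect()
}
```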

View File

@@ -2,7 +2,7 @@
MobileCLIP is a family of efficient CLIP-like models using FastViT-based image encoders.
-See [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://arxiv.org/abs/2311.17049)
+See [MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training](https://huggingface.co/papers/2311.17049)
## Running an example on cpu

View File

@@ -1,6 +1,6 @@
# candle-mobilenetv4
-[MobileNetV4 - Universal Models for the Mobile Ecosystem](https://arxiv.org/abs/2404.10518)
+[MobileNetV4 - Universal Models for the Mobile Ecosystem](https://huggingface.co/papers/2404.10518)
This candle implementation uses pre-trained MobileNetV4 models from timm for inference.
The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.

View File

@@ -1,6 +1,6 @@
# candle-mobileone
-[MobileOne: An Improved One millisecond Mobile Backbone](https://arxiv.org/abs/2206.04040).
+[MobileOne: An Improved One millisecond Mobile Backbone](https://huggingface.co/papers/2206.04040).
This candle implementation uses a pre-trained MobileOne network for inference. The
classification head has been trained on the ImageNet dataset and returns the

View File

@@ -1,6 +1,6 @@
# candle-musicgen
-Candle implementation of MusicGen from [Simple and Controllable Music Generation](https://arxiv.org/pdf/2306.05284).
+Candle implementation of MusicGen from [Simple and Controllable Music Generation](https://huggingface.co/papers/2306.05284).
## Running an example

View File

@@ -3,7 +3,7 @@
OLMo is a series of Open Language Models designed to enable the science of language models.
- **Project Page:** https://allenai.org/olmo
-- **Papers:** [OLMo](https://arxiv.org/abs/2402.00838) [OLMo 2](https://arxiv.org/abs/2501.00656)
+- **Papers:** [OLMo](https://huggingface.co/papers/2402.00838) [OLMo 2](https://huggingface.co/papers/2501.00656)
- **Technical blog post:** https://blog.allenai.org/olmo-open-language-model-87ccfc95f580
- **W&B Logs:** https://wandb.ai/ai2-llm/OLMo-1B/reports/OLMo-1B--Vmlldzo2NzY1Njk1
<!-- - **Press release:** TODO -->

View File

@@ -2,7 +2,7 @@
This example demonstrates how to run [ONNX](https://github.com/onnx/onnx) based models in Candle.
-It contains small variants of two models, [SqueezeNet](https://arxiv.org/pdf/1602.07360.pdf) (default) and [EfficientNet](https://arxiv.org/pdf/1905.11946.pdf).
+It contains small variants of two models, [SqueezeNet](https://huggingface.co/papers/1602.07360) (default) and [EfficientNet](https://huggingface.co/papers/1905.11946).
You can run the examples with the following commands:

View File

@@ -51,7 +51,7 @@ cargo run --example quantized-t5 --release -- \
Note that a storm surge is what forecasters consider a hurricane's most dangerous part.
```
-### [MADLAD-400](https://arxiv.org/abs/2309.04662)
+### [MADLAD-400](https://huggingface.co/papers/2309.04662)
MADLAD-400 is a series of multilingual machine translation T5 models trained on 250 billion tokens covering over 450 languages using publicly available data. These models are competitive with significantly larger models.

View File

@@ -1,6 +1,6 @@
# candle-repvgg
-[RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697).
+[RepVGG: Making VGG-style ConvNets Great Again](https://huggingface.co/papers/2101.03697).
This candle implementation uses a pre-trained RepVGG network for inference. The
classification head has been trained on the ImageNet dataset and returns the

View File

@@ -1,6 +1,6 @@
# candle-resnet
-A candle implementation of inference using a pre-trained [ResNet](https://arxiv.org/abs/1512.03385).
+A candle implementation of inference using a pre-trained [ResNet](https://huggingface.co/papers/1512.03385).
This uses a classification head trained on the ImageNet dataset and returns the
probabilities for the top-5 classes.

View File

@@ -7,7 +7,7 @@
Stable Diffusion 3 Medium is a text-to-image model based on the Multimodal Diffusion Transformer (MMDiT) architecture.
- [huggingface repo](https://huggingface.co/stabilityai/stable-diffusion-3-medium)
-- [research paper](https://arxiv.org/pdf/2403.03206)
+- [research paper](https://huggingface.co/papers/2403.03206)
- [announcement blog post](https://stability.ai/news/stable-diffusion-3-medium)
Stable Diffusion 3.5 is a family of text-to-image models with the latest improvements:

View File

@@ -69,7 +69,7 @@ pub fn euler_sample(
}
// The "Resolution-dependent shifting of timestep schedules" recommended in the SD3 tech report
-// https://arxiv.org/pdf/2403.03206
+// https://huggingface.co/papers/2403.03206
// Following the implementation in ComfyUI:
// https://github.com/comfyanonymous/ComfyUI/blob/3c60ecd7a83da43d694e26a77ca6b93106891251/
// comfy/model_sampling.py#L181
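For reference, that shift has a simple closed form; a minimal sketch, assuming the alpha = sqrt(m/n) resolution factor from the tech report (the function name and signature here are hypothetical, not this file's API):
```rust
// Hypothetical sketch, not the code in this file: shift a base timestep
// t in [0, 1] by a resolution-dependent factor
// alpha = sqrt(seq_len / base_seq_len), so larger images follow a
// schedule shifted toward higher noise levels.
fn shift_timestep(t: f64, seq_len: f64, base_seq_len: f64) -> f64 {
    let alpha = (seq_len / base_seq_len).sqrt();
    alpha * t / (1.0 + (alpha - 1.0) * t)
}
```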

View File

@@ -1,6 +1,6 @@
# candle-starcoder2
-Candle implementation of the StarCoder2 family of code generation models from [StarCoder 2 and The Stack v2: The Next Generation](https://arxiv.org/pdf/2402.19173).
+Candle implementation of the StarCoder2 family of code generation models from [StarCoder 2 and The Stack v2: The Next Generation](https://huggingface.co/papers/2402.19173).
## Running an example

View File

@@ -16,7 +16,7 @@ $ cargo run --example stella-en-v5 --release -- --query "What are safetensors?"
> Tensor[[1, 1024], f32]
```
-Stella_en_1.5B_v5 is trained with [MRL](https://arxiv.org/abs/2205.13147), enabling multiple embedding dimensions.
+Stella_en_1.5B_v5 is trained with [MRL](https://huggingface.co/papers/2205.13147), enabling multiple embedding dimensions.
The following reproduces the example in the [model card](https://huggingface.co/dunzhang/stella_en_1.5B_v5) for a retrieval task (s2p). The sample queries and docs are hardcoded in the example.
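As a rough illustration of what MRL enables (a hypothetical helper, not part of this example): an MRL-trained embedding can be truncated to a leading prefix and re-normalized to obtain a lower-dimensional embedding.
```rust
// Hypothetical helper, not part of this example: keep the first `dim`
// components of an MRL-trained embedding and re-normalize to unit length.
fn truncate_embedding(full: &[f32], dim: usize) -> Vec<f32> {
    let head = &full[..dim.min(full.len())];
    let norm = head.iter().map(|x| x * x).sum::<f32>().sqrt().max(1e-12);
    head.iter().map(|x| x / norm).collect()
}
```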

View File

@@ -13,7 +13,7 @@ $ cargo run --example t5 --release -- --model-id "t5-small" --prompt "translate
Variants such as [flan-t5](https://huggingface.co/google/flan-t5-small), [flan-ul2](https://huggingface.co/google/flan-ul2) (with `--revision "refs/pr/25"`), and [Co-EdIT](https://huggingface.co/grammarly/coedit-large) are also supported.
-## Translation with [MADLAD-400](https://arxiv.org/abs/2309.04662)
+## Translation with [MADLAD-400](https://huggingface.co/papers/2309.04662)
MADLAD-400 is a series of multilingual machine translation T5 models trained on 250 billion tokens covering over 450 languages using publicly available data. These models are competitive with significantly larger models.

View File

@@ -8,7 +8,7 @@ The candle implementation reproduces the same structure/files for models and
pipelines. Useful resources:
- [Official implementation](https://github.com/dome272/Wuerstchen).
-- [Arxiv paper](https://arxiv.org/abs/2306.00637).
+- [Paper](https://huggingface.co/papers/2306.00637).
- Blog post: [Introducing Würstchen: Fast Diffusion for Image Generation](https://huggingface.co/blog/wuerstchen).
## Getting the weights