Quantized version of flux. (#2500)

* Quantized version of flux.

* More generic sampling.

* Hook the quantized model.

* Use the newly minted gguf file.

* Fix for the quantized model.

* Default to avoid the faster cuda kernels.
Laurent Mazare
2024-09-26 10:23:43 +02:00
committed by GitHub
parent d01207dbf3
commit 10d47183c0
6 changed files with 555 additions and 26 deletions
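
The "More generic sampling" change makes `denoise` generic over a `WithForward` trait, so the same sampling loop can drive either the regular or the quantized model. The block below is a minimal sketch of that pattern, not the code from this commit. The trait signature, the `FullModel` and `QuantizedModel` structs, and the Euler-style update are illustrative assumptions, and plain `f32` slices stand in for candle `Tensor`s so the sketch compiles without external crates.

```rust
// Hypothetical, simplified stand-in for the `WithForward` trait named in the diff.
// The real trait works on candle `Tensor`s and takes more arguments.
trait WithForward {
    fn forward(&self, img: &[f32], t: f32) -> Vec<f32>;
}

// Hypothetical stand-in for the full-precision model.
struct FullModel {
    scale: f32,
}

// Hypothetical stand-in for the quantized model: an i8 weight plus a
// dequantization step size.
struct QuantizedModel {
    q_scale: i8,
    step: f32,
}

impl WithForward for FullModel {
    fn forward(&self, img: &[f32], t: f32) -> Vec<f32> {
        img.iter().map(|x| x * self.scale * t).collect()
    }
}

impl WithForward for QuantizedModel {
    fn forward(&self, img: &[f32], t: f32) -> Vec<f32> {
        // Dequantize the weight before using it.
        let scale = self.q_scale as f32 * self.step;
        img.iter().map(|x| x * scale * t).collect()
    }
}

// Generic over any model implementing the trait, mirroring the shape of
// `denoise<M: super::WithForward>(model: &M, ...)` in the diff.
fn denoise<M: WithForward>(model: &M, img: &[f32], timesteps: &[f32]) -> Vec<f32> {
    let mut img = img.to_vec();
    // Assumed update rule: img += (t_prev - t_curr) * pred for each
    // consecutive pair of timesteps.
    for w in timesteps.windows(2) {
        let (t_curr, t_prev) = (w[0], w[1]);
        let pred = model.forward(&img, t_curr);
        for (x, p) in img.iter_mut().zip(pred.iter()) {
            *x += (t_prev - t_curr) * p;
        }
    }
    img
}
```

With this shape, a caller can pass `&FullModel { .. }` or `&QuantizedModel { .. }` to the same `denoise` function, and the compiler monomorphizes one copy per model type with no dynamic dispatch.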

@@ -92,8 +92,8 @@ pub fn unpack(xs: &Tensor, height: usize, width: usize) -> Result<Tensor> {
 }
 #[allow(clippy::too_many_arguments)]
-pub fn denoise(
-    model: &super::model::Flux,
+pub fn denoise<M: super::WithForward>(
+    model: &M,
     img: &Tensor,
     img_ids: &Tensor,
     txt: &Tensor,