candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 10:26:33 +00:00

Author	SHA1	Message	Date
Laurent Mazare	08c049def3	Improve the handling of matmul with squeezed layouts. (#1998) * Improve the handling of matmul with squeezed layouts. * Fix for the cuda backend. * Revert the temporary fix.	2024-04-02 23:17:05 +02:00
Santiago Medina	d17b2cdad9	Match Moondream's latest release (#1997) * moondream implementation * add moondream example * change config default activation * Add assets and integrate phi mixformer with example * Make use of kv cache and fix seq_len bug; Clean up example code * Add README link to example * Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig * Delete image * Use apply instead of forward * Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2	2024-04-02 21:37:09 +02:00
Jorge António	fb918a23c8	first commit (#1994)	2024-04-02 16:31:05 +02:00
Laurent Mazare	b23436bf90	Stable diffusion fix. (#1993) * Stable diffusion fix. * And add a comment.	2024-04-02 14:36:28 +02:00
Laurent Mazare	be9c200cbb	Expose the t5 config fields + allow t5-large. (#1987)	2024-04-01 20:58:34 +02:00
Santiago Medina	ea0d8d3753	Quantized moondream implementation and BOS token (#1980) * moondream implementation * add moondream example * change config default activation * Add assets and integrate phi mixformer with example * Make use of kv cache and fix seq_len bug; Clean up example code * Add README link to example * Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig * Delete image * Use apply instead of forward * Pass bos token at the beginning of tensor. * Quantize moondream. * Forward with image bos token. * Clippy. * Use q4_0 quantization. * Add pointers for sequence and tokens; Remove seq_len conditional	2024-04-01 19:37:54 +02:00
Laurent Mazare	f9954b73ba	Add options to use local files + specify a custom repo or branch. (#1973)	2024-03-31 09:32:50 +02:00
Santiago Medina	92f81d2fcb	Add Moondream transformer implementation and example (#1970) * moondream implementation * add moondream example * change config default activation * Add assets and integrate phi mixformer with example * Make use of kv cache and fix seq_len bug; Clean up example code * Add README link to example * Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig * Delete image * Use apply instead of forward	2024-03-31 08:54:56 +02:00
Laurent Mazare	b190fd8592	Remove some unnecessary calls to contiguous. (#1968) * Remove some unnecessary calls to contiguous. * Slightly improved kv cache concatenation.	2024-03-30 13:22:00 +01:00
Laurent Mazare	708e422456	Qwen MoE model. (#1960) * Qwen MoE model. * Add the MoE model to the example. * Fix the scaling. * Readme updates. * Readme tweaks.	2024-03-28 23:10:57 +01:00
Laurent Mazare	cdc8b57b5c	Fix clippy lints + minor cleanups. (#1957) * Fix clippy lints + minor cleanups. * fmt. * Derive clone.	2024-03-28 14:17:46 +01:00
Tigran Zhampeissov	b0340d72ec	CLIP model implementation with example (#1950) * CLIP model implementation with example * CLIP Implementation fixes, batch images * CLIP model remove images from git * CLIP model remove unnecessary use of batch_indices	2024-03-28 13:44:12 +01:00
Jorge António	ada5d7c096	add send and sync trait bounds for scheduler config in stable diffusion models (#1952) * first commit * add Sync deriving * static * remove static	2024-03-28 10:03:00 +01:00
Jorge António	75b6d4b0da	add config for mamba 2.8b model parameter (#1946) * first commit * Make the mamba config public. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-03-27 07:47:23 +01:00
Laurent Mazare	66f0a4eeea	Another fix for squeezing. (#1943)	2024-03-26 17:05:26 +01:00
Laurent Mazare	4523ecfb2a	Faster repeat penalty (#1940) * Avoid the attention mask where possible. * Faster repeat penalty.	2024-03-26 11:31:20 +01:00
Laurent Mazare	196765e995	Use the new rope kernel in mistral. (#1937) * Use the new rope kernel in mistral. * Compute the cos and sin with full precision. * Bugfix.	2024-03-25 23:26:05 +01:00
Laurent Mazare	d3a8d291d5	Avoid the attention mask where possible. (#1933)	2024-03-25 15:31:04 +01:00
Laurent Mazare	1b98f84a2b	Fast kernels for rotary embeddings. (#1928) * Fast kernels for rotary embeddings. * Add a test for the fast CPU kernel. * Rope cuda bindings. * Cuda kernel. * Metal kernel (part 1). * Cuda kernels. * Finish the metal kernel. * Use the new kernels in the quantized example. * Fix warning.	2024-03-24 22:48:52 +01:00
laurent	cf7d7fcf2f	Also avoid the mask in the llama example.	2024-03-24 19:04:32 +01:00
laurent	8c0db87992	Avoid using the attn mask when not necessary.	2024-03-24 18:55:56 +01:00
Laurent Mazare	e2b4829531	Support more mistral models. (#1927) * Support more mistral models. * Use the appropriate rope parameter.	2024-03-24 08:04:04 +01:00
laurent	5e70821dd0	Allow for arbitrary temperature modifications.	2024-03-23 15:47:39 +01:00
Laurent Mazare	a62a97340c	Add topk sampling. (#1923)	2024-03-23 15:26:09 +01:00
Laurent Mazare	6f877592a7	Avoid broadcasting on the batch dimension for the attention mask. (#1920)	2024-03-23 13:08:53 +01:00
Laurent Mazare	32f567bac4	Fix loading the gguf files. (#1913)	2024-03-22 10:28:38 +01:00
Laurent Mazare	c07e4057ab	Fix for the llama model. (#1906)	2024-03-21 19:36:10 +01:00
Laurent Mazare	c0bdd9c7a6	Use the fast RmsNorm in the quantized model. (#1904)	2024-03-21 18:49:35 +01:00
Laurent Mazare	455c42aa72	Avoid copying the data on squeeze and unsqueeze. (#1884) * Avoid copying the data on squeeze and unsqueeze. * Fix the quantized llama example. * Unrelated fix for the quantized stable-lm example on cuda. * Fix for mamba on cuda (unrelated to the PR).	2024-03-20 13:04:36 +01:00
Jani Monoses	90fc82211f	Use a common with_tracing::RmsNorm in a few models. (#1871) * Add RmsNorm with tracing. * Use with_tracing::RmsNorm in some models.	2024-03-18 21:40:06 +01:00
Laurent Mazare	ff03fd3fb3	Expose some helper functions to create quantized models. (#1837)	2024-03-12 11:30:24 +01:00
Laurent Mazare	0c5eecbc0f	Add some tracing to metavoice. (#1826)	2024-03-09 12:24:11 +01:00
Laurent Mazare	dd00482ea3	Quantized version of the metavoice model. (#1824) * Quantized version of the metavoice model. * Integrate the quantized version of metavoice.	2024-03-09 11:06:04 +01:00
Laurent Mazare	8a99cf7dd2	Add a flag to select the dtype used in metavoice. (#1805)	2024-03-05 12:16:00 +01:00
Laurent Mazare	8cc0a183ba	Speaker embeddings computation for metavoice. (#1800) * Speaker embeddings computation for metavoice. * Compute the speaker embeddings.	2024-03-04 14:13:01 +01:00
Jiayu Liu	924ccae30c	Add an initial Segformer implementation (#1617) * add segformer * Make the id2label field optional. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-03-03 16:01:46 +01:00
Laurent Mazare	60dc72b96b	More metavoice tweaks. (#1796)	2024-03-03 15:05:25 +01:00
Laurent Mazare	4fff5b51f5	Metavoice - first cut (#1717) * Add the metavoice transformer. * Sketch the speaker-encoder module. * Adding to the metavoice model. * Start adding the metavoice example. * Get some logits out. * Load the second stage model. * Get the second step to run. * Tweak the example. * Add encodec tilting. * Glue the different bits together. * Fix a shape issue. * Use a constant. * BPE tokenization. * Add a warning.	2024-03-02 18:50:01 +01:00
Laurent Mazare	314630638d	Rustfmt fix. (#1788)	2024-03-02 10:35:07 +01:00
Frkri	3e3def4134	Update StableLM config (#1787)	2024-03-02 09:56:57 +01:00
Jani Monoses	979deaca07	EfficientVit (MSRA) model (#1783) * Add EfficientVit (Microsoft Research Asia) model. * Mention models in README	2024-03-01 08:53:52 +01:00
Jack Shih	b485e4b6ee	add models of rwkv v6 and quantized rwkv v6 (#1781) * add models of rwkv v6 and quantized rwkv v6 * fix ci clippy fail	2024-03-01 08:37:56 +01:00
Laurent Mazare	4fd00b8900	Add the StarCoder2 model. (#1779) * Add the StarCoder2 model. * Add the example code and get things to work. * And also tweak the readme.	2024-02-28 21:02:41 +01:00
Laurent Mazare	d0aca6c3c6	Encodec encoding demo. (#1775)	2024-02-28 06:49:03 +01:00
Laurent Mazare	15e8644149	Apply dilations in the encodec model. (#1772) * Apply dilations in the encodec model. * Add some encoding bits.	2024-02-27 23:26:35 +01:00
Laurent Mazare	0c49e95dfb	Encodec model. (#1771) * Encodec model. * Fixes. * Add the padding functions. * Get the LSTM bit to work. * Get the encodec model to generate some tokens (decoder only for now). * Minor tweak. * Minor tweak.	2024-02-27 22:59:40 +01:00
Laurent Mazare	205767f9de	Avoid tensor copying in the quantized example. (#1770)	2024-02-27 20:32:30 +01:00
Jack Shih	918136ba46	add quantized rwkv v5 model (#1743) * and quantized rwkv v5 model * Integrate the quantized rwkv model in the initial example. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-02-25 21:43:40 +01:00
Laurent Mazare	1a6043af51	Tweak the VarMap set type. (#1758)	2024-02-25 20:50:08 +01:00
Laurent Mazare	28057781aa	Make the cache for the llama model explicit too. (#1745)	2024-02-22 12:04:33 +01:00

409 Commits