candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Daniel Clough	cd889c0f8a	add config_amazon_mistral_lite (#1493 ) Co-authored-by: Ubuntu <danielclough@users.noreply.github.com>	2023-12-28 19:59:58 +01:00
Laurent Mazare	d35f0a1376	Bump the crate version to 0.3.3. (#1490 )	2023-12-28 13:38:30 +01:00
drbh	f6408a3779	feat: add clear_kv_cache to mistral and qmistral models (#1464 )	2023-12-21 21:19:19 +01:00
Daniel Clough	563a79afa1	make fn name generic (#1459 ) Co-authored-by: Ubuntu <danielclough@users.noreply.github.com>	2023-12-21 02:16:31 +01:00
Daniel Clough	8ede5f4210	add fn config_chat_ml (#1458 ) * add fn config_chat_ml * Add a link to the original config. --------- Co-authored-by: Ubuntu <danielclough@users.noreply.github.com> Co-authored-by: laurent <laurent.mazare@gmail.com>	2023-12-20 21:03:24 +01:00
Nicolas Patry	9fc210fae8	Merge pull request #1318 from huggingface/metal4 Starting to fix some tests.	2023-12-20 15:37:31 +01:00
Laurent Mazare	94817dac56	Bump the crate version to 0.3.2. (#1452 )	2023-12-17 05:34:53 -06:00
Laurent Mazare	1e86717bf2	Fix a couple typos (#1451 ) * Mixtral quantized instruct. * Fix a couple typos.	2023-12-17 05:20:05 -06:00
Laurent Mazare	30a958e5dd	Quantized mixtral model (#1442 ) * Add the Mixtral model. * Add more of the mixtral layers. * Add the final layers for mixtral. * Sketch the expert selection. * Add some expert routing logic. * Hopefully finish the routing logic for mixtral. * Add the mixtral example. * Fix the weight filenames. * Bugfix. * Another fix. * Yet another fix + remove the unused pragma. * Shape fix. * Support for quantized mixtral. * Support mixtral in the quantized example. * Mlp or moe type. * Fix the expert field namings. * Refactor the mlp bit. * More MoE logic. * Add the MoE quantized logic. * Fix the experts length.	2023-12-15 19:16:06 -06:00
Laurent Mazare	614842b311	Add the Mixtral model. (#1437 ) * Add the Mixtral model. * Add more of the mixtral layers. * Add the final layers for mixtral. * Sketch the expert selection. * Add some expert routing logic. * Hopefully finish the routing logic for mixtral. * Add the mixtral example. * Fix the weight filenames. * Bugfix. * Another fix. * Yet another fix + remove the unused pragma. * Shape fix. * Add a readme.	2023-12-15 14:19:56 -06:00
Nicolas Patry	916a8c5464	Revert candle-transformers.	2023-12-15 11:15:21 +01:00
Nicolas Patry	ece4c69a68	Fixing softmax.	2023-12-15 01:35:08 +01:00
Laurent Mazare	5e33c85c8f	Quantized version for phi-v2. (#1430 ) * Quantized version for phi-v2. * More quantized support.	2023-12-13 21:16:34 -06:00
Laurent Mazare	2b3a018be7	Support for phi-2. (#1429 ) * Support for phi-2. * Use the v2 naming scheme.	2023-12-13 20:59:29 -06:00
Nicolas Patry	931432ed55	Fixing tests + matmul from MFA	2023-12-13 16:58:36 +01:00
Nicolas Patry	a9d0657432	Better version ?	2023-12-13 12:09:20 +01:00
nicolas	87dc559817	Lots of updates including some stack of command buffers.	2023-12-12 17:41:56 +01:00
Juarez Bochi	9bd94c1ffa	Speed up bert with approx gelu (#1410 )	2023-12-06 17:46:37 +01:00
Edwin Cheng	37bf1ed012	Stable Diffusion Turbo Support (#1395 ) * Add support for SD Turbo * Set Leading as default in euler_ancestral discrete * Use the appropriate default values for n_steps and guidance_scale. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2023-12-03 08:37:10 +01:00
Edwin Cheng	dd40edfe73	Add Euler Ancestral Discrete Scheduler (#1390 ) * Add Euler Ancestral Discrete Scheduler * Fix a bug of init_noise_sigma generation * minor fixes * use partition_point instead of custom bsearch * Fix some clippy lints. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2023-12-02 19:59:23 +00:00
Laurent Mazare	7c3cfd1086	Use the llama weight names for the Yi example. (#1381 )	2023-11-27 20:42:52 +00:00
Odunayo	762e996ce6	Distibert (#1366 ) * add bce with logit loss * add bce with logit loss * remove imports * fix tiny bug * add test documentation and refactor function * fix test cases and formatting * distilbet files * Apply various cleanups. * More cleanups. * More polish. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2023-11-24 15:09:14 +00:00
MilkFather	ca19a9af62	Fix linspace implementation (#1358 ) * Fix linspace implementation `steps` should be strictly greater than 1 to make it consistent with the context. * Handle steps == 0 and steps == 1. * Fix rustfmt. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2023-11-23 07:35:13 +00:00
Laurent Mazare	9ab3f9729f	Use the whisper-v3 tokenizer now that it has been added. (#1337 ) * Use the whisper-v3 tokenizer now that it has been added. * Use the appropriate nospeech token.	2023-11-16 22:10:31 +00:00
drbh	a1f41ab37b	feat: adds reset_kv_cache (#1335 )	2023-11-16 21:17:42 +00:00
drbh	92a05b51cf	fix: address clippy 0.1.74 issues (#1336 ) - clippy::needless-borrows-for-generic-args - clippy::reserve-after-initialization	2023-11-16 21:15:22 +00:00
Laurent Mazare	a209ce8ceb	Update for 0.3.1. (#1324 )	2023-11-11 18:48:52 +00:00
Laurent Mazare	a007f8fdb4	Add the Yi-6b and Yi-34b models. (#1320 ) * Add the Yi-6b model. * Add the 34b model. * Add the yi example. * Fix the weight file names.	2023-11-11 12:00:48 +01:00
Andy Braga	1b12142a02	Add min to buckets in relative_position_bucket (#1312 )	2023-11-10 11:57:25 +01:00
Juarez Bochi	18d30005c5	Add support to UL2 model family (#1300 ) * Add support to UL2 model family * Update docs with UL2 * Create ActivationWithOptionalGating to avoid polluting activations * Also refactor quantized t5 * Remove useless conversion * Revert Activation::NewGelu name change * Remove useless return * Apply rustfmt and clippy recommendations * Reuse t5::ActivationWithOptionalGating in quantized version * (cosmetic change) use a match rather than ifs + avoid early returns. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2023-11-09 18:55:09 +01:00
Ogundepo Odunayo	6958384327	Add support for TrOCR Model (#1303 ) * add bce with logit loss * add bce with logit loss * remove imports * fix tiny bug * add test documentation and refactor function * fix test cases and formatting * add trocr model * fix formatting * commit the actual model lol * more formatting * remove tokenizer config	2023-11-09 18:49:17 +01:00
Juarez Bochi	f772213e84	Fix bug introduced in madlad PR (#1298 )	2023-11-08 17:55:46 +01:00
Laurent Mazare	2d28497197	Preliminary support for whisper v3. (#1294 ) * Preliminary support for whisper v3. * Add the missing files.	2023-11-08 06:42:52 +01:00
Juarez Bochi	508f811b93	Add support for MADLAD400 (#1285 ) * Add support for madlad * Add support for quantized MADLAD	2023-11-07 05:35:37 +01:00
Laurent Mazare	6975c65112	Share the layer-norm implementation. (#1248 )	2023-11-03 06:30:05 +01:00
Laurent Mazare	6c990a33ea	Remove the unused pragma for marian. (#1236 )	2023-11-01 20:04:52 +00:00
Laurent Mazare	1704f1b3ae	Consolidate the with-tracing usage. (#1234 )	2023-11-01 18:21:36 +00:00
Laurent Mazare	693fad511c	Preliminary support for ssd1b. (#1233 )	2023-11-01 14:37:52 +00:00
Laurent Mazare	c12ad45562	Add a KV cache to marian decoding. (#1226 )	2023-10-31 08:47:44 +00:00
Laurent Mazare	392a00a147	Add support for the marian base model. (#1221 )	2023-10-30 19:20:36 +00:00
Laurent Mazare	4c967b9184	Use the hub files for the marian example. (#1220 ) * Use the hub files for the marian example. * Use the secondary decoder. * Add a readme. * More readme.	2023-10-30 17:29:36 +00:00
Laurent Mazare	969960847a	Bugfixes for marian-mt. (#1219 ) * Bugfixes for marian-mt. * Apply the final decoding head. * More fixes.	2023-10-30 11:44:19 +00:00
Laurent Mazare	7bbde55c61	Marian MT model (#1210 ) * Skeleton files for the marian MT model. * Marian initialization. * Implement the attention forward method. * Forward pass for the encoder side. * Expose the encoder and decoder. * Start plugging the decoder. * Forward pass for the decoder layer. * Set up the marian example. * Add some missing backtraces. * Bugfix.	2023-10-29 15:12:22 +00:00
Laurent Mazare	55bc3382cf	Allow for different behavior between training and eval (#1213 ) * Forward with training. * Do not use dropout on vgg evaluation.	2023-10-29 07:53:09 +01:00
drbh	dece37c6f4	feat: implement VGG13, VGG16 and VGG19 (#1211 ) * feat: implement VGG13, VGG16 and VGG19 * Cosmetic fixes. * More cosmetic tweaks + avoid re-loading the weights on each final layer. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2023-10-29 06:10:23 +00:00
Laurent Mazare	012ae0090e	Infer the config for llama2-c. (#1208 )	2023-10-28 19:00:39 +01:00
Laurent Mazare	95a857cf57	Move the llama2-c model in transformers. (#1205 )	2023-10-28 16:51:19 +01:00
Laurent Mazare	612f5b8156	Make more models cloneable. (#1203 )	2023-10-28 07:43:08 +01:00
Laurent Mazare	c8face3f95	Add the relu2 and relu6 activations. (#1201 )	2023-10-27 20:51:16 +01:00
Laurent Mazare	85bea43e5b	Make the whisper model cloneable (#1200 ) * Add a quantized variant of llama2.c * Clippy fixes. * Make the whisper model cloneable.	2023-10-27 16:59:19 +01:00

378 Commits