candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	30a958e5dd	Quantized mixtral model (#1442) * Add the Mixtral model. * Add more of the mixtral layers. * Add the final layers for mixtral. * Sketch the expert selection. * Add some expert routing logic. * Hopefully finish the routing logic for mixtral. * Add the mixtral example. * Fix the weight filenames. * Bugfix. * Another fix. * Yet another fix + remove the unused pragma. * Shape fix. * Support for quantized mixtral. * Support mixtral in the quantized example. * Mlp or moe type. * Fix the expert field namings. * Refactor the mlp bit. * More MoE logic. * Add the MoE quantized logic. * Fix the experts length.	2023-12-15 19:16:06 -06:00
Laurent Mazare	16161145ae	Add the leo models to the quantized examples. (#1398)	2023-12-03 12:30:41 +00:00
Lucas de Ávila Martins	5aa1a65dab	Add quantized Starling, fix open-chat prompt (#1393) * Add quantized Starling, fix open-chat prompt * Fix open-chat and starling prompts	2023-12-02 16:47:19 +00:00
Lucas de Ávila Martins	f49bf6a81d	Fix OpenChat 3.5 tokenizer (#1347)	2023-11-19 18:48:04 +00:00
Lucas de Ávila Martins	992a788da1	Add OpenChat 3.5 to quantized examples (#1346) * Add OpenChat to quantized examples * Add chat prompt * Make the openchat example more in line with the other models. * Fix a typo. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2023-11-19 18:28:52 +00:00
Michael Leandersson	2341aa079e	Fix quantized zephyr chat prompt (#1314) (#1317) * Fix quantized zephyr chat prompt (#1314) * Avoid using a mutable variable. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2023-11-11 09:14:12 +01:00
Laurent Mazare	d4a45c936a	Quantized model small tweaks (#1290) * Support the shape op in ONNX. * Share the axis normalization bits. * Add some limited support for gather. * Unsqueeze. * Comparison with broadcasting. * Add Not + handle i32. * Tweaks for the quantized model.	2023-11-07 21:21:37 +01:00
DTJ11235	5a363dbc26	Adds check for 7b-zephyr and uses correct template (#1283) * Adds check for 7b-zephyr and uses correct template * Handle zephyr as mistral. * Disable the protoc bits of the CI. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2023-11-06 21:05:39 +01:00
Laurent Mazare	620c94d12e	Add support for Zephyr-7b in the quantized model. (#1124)	2023-10-18 17:31:26 +01:00
Laurent Mazare	f6054e9d60	Fix the prompt for mistral when using instruct/interactive mode. (#1013)	2023-10-01 06:44:30 +01:00
Laurent Mazare	328167ec04	Integrate TheBloke quantized mistral weights. (#1012)	2023-09-30 22:39:42 +01:00
Laurent Mazare	cbd36157ac	Add a gif to the quantized readme. (#833) * Add a gif to the quantized readme. * gif update.	2023-09-13 08:43:52 +01:00
Laurent Mazare	e82fcf1c59	Add more example readmes. (#828) * Add more readmes. * Add a readme for dinov2. * Add some skeleton files for a couple more examples. * More whisper details.	2023-09-12 17:21:24 +01:00
Juarez Bochi	805bf9ffa7	Implement top_p / nucleus sampling (#819) * Implement top_p / nucleus sampling * Update changelog * rustfmt * Add tests * Fix clippy warning * Fix another clippy error	2023-09-12 18:10:16 +02:00
Laurent Mazare	bb23b90b1d	Add a small readme for the quantized example. (#823)	2023-09-12 10:17:31 +01:00
Laurent Mazare	35f72514f5	Move more models to candle-transformers (#796) * Move dinov2. * Move efficientnet. * Move the quantized llama model. * Move segment-anything.	2023-09-10 10:20:18 +01:00
Laurent Mazare	7cef35c84d	Tweak some quantized args (#692) * Print the args + change the default temp/repeat penalty. * Minor formatting tweak.	2023-08-31 17:25:21 +01:00
Laurent Mazare	7509c98970	Interactive mode for the quantized model. (#690)	2023-08-31 10:52:42 +01:00
Laurent Mazare	a1a5ab8b0a	Neon optimized vecdot (#666) * Q5k vecdot. * Add the q3k vecdot. * Q2k vecdot. * Move the quantized model to its own file.	2023-08-29 22:28:46 +01:00
Laurent Mazare	72ebb12bca	Remove some dead-code annotations. (#629) * Remove some dead-code annotations. * More dead code removal. * One more. * CI fix.	2023-08-27 18:52:33 +01:00
Laurent Mazare	6e485f2deb	Add some optional repeat penalty. (#623) * Add some optional repeat penalty. * Add the missing files.	2023-08-27 10:48:45 +01:00
Laurent Mazare	c093b03d51	Generic implementation of vecdot for q80. (#596) * Generic implementation of vecdot for q80. * Add support for code-llama 7b. * Support more code-llama.	2023-08-25 09:04:05 +01:00
Laurent Mazare	4ee1cf038a	Get the rms epsilon from GGUF. (#565)	2023-08-23 11:40:20 +01:00
Laurent Mazare	0f4ff8a739	Fix the quantized example. (#564)	2023-08-23 11:09:55 +01:00
cksac	89a00b56cc	add chat models in quantized example (#551) * add chat models in quantized example * cargo fmt	2023-08-23 11:05:33 +01:00
Laurent Mazare	508d34daf2	GGUF support in the quantized model. (#559) * GGUF support in the quantized model. * Get the GGUF support to work on llama.	2023-08-23 09:20:57 +01:00
Laurent Mazare	f9ecc84477	GQA support in the quantized model. (#555) * GQA support in the quantized model. * Fix the reshaping. * Fix the main llama model. * Infer the proper gqa from the model kind.	2023-08-22 19:41:10 +01:00
Laurent Mazare	44420d8ae1	Add some llama-v2 variants. (#545)	2023-08-22 08:35:15 +01:00
Laurent Mazare	4300864ce9	Add some optional repeat penalty. (#535)	2023-08-21 09:59:13 +01:00
Laurent Mazare	a1812f934f	Add a yolo-v3 example. (#528) * Add a couple functions required for yolo. * Add the yolo-v3 example. * Add minimum and maximum. * Use the newly introduced maximum. * Cuda support for min/max + add some testing. * Allow for more tests to work with accelerate. * Fix a typo.	2023-08-20 18:19:37 +01:00
Laurent Mazare	d73ca3d28e	Line up the llama.cpp implementation with the candle one. (#518) * Separate the prompt stats from the post-prompt ones in the quantized example. * Slightly nicer output printing. * Line up with the llama.cpp implementation.	2023-08-19 20:12:07 +01:00
Laurent Mazare	c78ce76501	Add a simple Module trait and implement it for the various nn layers (#500) * Start adding the module trait. * Use the module trait. * Implement module for qmatmul.	2023-08-18 09:38:22 +01:00
Laurent Mazare	557b2c28dd	Q6K quantization (#495) * Print the detected arch options. * Add the q6k quantization. * Add a currently broken test. * Bugfix. * Bugfix. * Another bugfix. * Another bugfix + get the test to work.	2023-08-17 22:22:57 +01:00
Laurent Mazare	5f30c1e1e0	Add the whisper small model. (#490)	2023-08-17 15:48:34 +01:00
Laurent Mazare	ad7c53953b	Add a verbose-prompt mode, similar to llama.cpp. (#489)	2023-08-17 15:26:44 +01:00
Laurent Mazare	d32e8199cd	Layer norm tweaks (#482) * Add some options to make layer-norm more configurable. * Add the rms-norm variant. * Replace the RmsNorm with the shared bits.	2023-08-17 10:07:13 +01:00
Laurent Mazare	d99cac3ec3	Move the avx specific bits to a separate file. (#481)	2023-08-17 09:01:06 +01:00

37 Commits