109e95b189
Basic qmatmul parallelization ( #492 )
* Basic `par_iter` parallelization
* Pass errors up
* Disable `avx` for x86 macs
2023-08-18 09:45:37 +01:00
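The `par_iter` change parallelizes the quantized matmul over output rows with rayon. A minimal sketch of the idea on a plain row-major `f32` matrix (the real kernel works on quantized blocks; `matvec_par` is an illustrative name, not candle's API):

```rust
use rayon::prelude::*;

/// Multiply an `m x k` row-major matrix by a `k`-vector, one output row per task.
fn matvec_par(a: &[f32], x: &[f32], m: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(x.len(), k);
    a.par_chunks_exact(k) // each chunk is one row of `a`, processed on the rayon pool
        .map(|row| row.iter().zip(x).map(|(w, v)| w * v).sum())
        .collect()
}

fn main() {
    let (m, k) = (4, 3);
    let a: Vec<f32> = (0..m * k).map(|i| i as f32).collect();
    let x = vec![1.0f32; k];
    println!("{:?}", matvec_par(&a, &x, m, k));
}
```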
c78ce76501
Add a simple Module trait and implement it for the various nn layers ( #500 )
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
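The module trait boils down to a single `forward` method that every layer implements, so layers can be used interchangeably. A self-contained sketch with stand-in `Tensor`/`Result` types rather than candle's own:

```rust
// Illustrative stand-ins for candle's Tensor and Result types.
type Tensor = Vec<f32>;
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// A layer is anything that maps an input tensor to an output tensor.
trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

/// Example layer: elementwise scale, standing in for Linear, QMatMul, etc.
struct Scale(f32);

impl Module for Scale {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        Ok(xs.iter().map(|x| x * self.0).collect())
    }
}

fn main() -> Result<()> {
    let layer = Scale(2.0);
    println!("{:?}", layer.forward(&vec![1.0, 2.0, 3.0])?);
    Ok(())
}
```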
13401df4d1
Add an abstract type for RmsNorm. ( #499 )
2023-08-18 08:52:14 +01:00
a22b1bed7b
Tensor -> QTensor conversion ( #496 )
* Sketch some qmatmul test.
* Add the quantization function.
* More testing.
* Make the test smaller and faster.
* Add some shape checking.
2023-08-18 08:19:20 +01:00
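The conversion needs a quantization function plus a test that the round trip stays within one quantization step. A toy version with a single per-tensor scale (ggml formats use one scale per block of 32 or 256 values); the function names here are illustrative:

```rust
/// Toy symmetric 8-bit quantization: one scale for the whole tensor.
fn quantize(xs: &[f32]) -> (f32, Vec<i8>) {
    let amax = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    let qs = xs.iter().map(|x| (x / scale).round() as i8).collect();
    (scale, qs)
}

fn dequantize(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs: Vec<f32> = (0..32).map(|i| (i as f32 - 16.0) / 4.0).collect();
    let (scale, qs) = quantize(&xs);
    let ys = dequantize(scale, &qs);
    // The round-trip error should stay within half a quantization step.
    let max_err = xs.iter().zip(&ys).map(|(a, b)| (a - b).abs()).fold(0f32, f32::max);
    assert!(max_err <= scale * 0.5 + 1e-6, "max_err {max_err}");
}
```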
26fd37b348
Use the main branch of the HF repo where possible. ( #498 )
* Use the main branch of the HF repo where possible.
* And add the large model.
2023-08-18 08:18:30 +01:00
f056dcab21
Add medium model ( #497 )
2023-08-18 08:08:59 +01:00
557b2c28dd
Q6K quantization ( #495 )
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
fc81af1712
AVX version of the q6k vec-dot. ( #493 )
* AVX version of the q6k vec-dot.
* Use the avx sum.
2023-08-17 20:13:18 +01:00
3164cd24fa
Replicate the sot-token logic from the Python implementation more accurately. ( #491 )
* Replicate the sot-token logic from the Python implementation more accurately.
* Add a flag to control the timestamp mode.
2023-08-17 16:59:36 +01:00
5f30c1e1e0
Add the whisper small model. ( #490 )
2023-08-17 15:48:34 +01:00
ad7c53953b
Add a verbose-prompt mode, similar to llama.cpp. ( #489 )
2023-08-17 15:26:44 +01:00
5d99026fd2
F16 support for stable diffusion ( #488 )
* F16 support for stable diffusion.
* Keep the attention bits in F32.
* Keep more of the attention bits in F32.
* More mixed precision support.
2023-08-17 13:48:56 +01:00
c3176f0dfb
Flash-attention support in stable diffusion ( #487 )
* Add flash-attention for the stable-diffusion example.
* Change the dtype.
* Silly fix.
* Another fix.
* Revert the dtype back to the query dtype after applying flash-attn.
2023-08-17 12:16:40 +01:00
03be33eea4
Relax the requirements on CustomOp. ( #486 )
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
d32e8199cd
Layer norm tweaks ( #482 )
* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
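The rms-norm variant only rescales by the root-mean-square of the input, with no mean subtraction and no bias. A scalar sketch of the formula, independent of candle's tensor API:

```rust
/// RMS norm over one vector: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
/// Unlike layer norm it neither subtracts the mean nor adds a bias.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    xs.iter().zip(weight).map(|(x, w)| x * inv_rms * w).collect()
}

fn main() {
    let xs = [1.0f32, -2.0, 3.0, -4.0];
    let w = [1.0f32; 4];
    println!("{:?}", rms_norm(&xs, &w, 1e-5));
}
```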
d99cac3ec3
Move the avx specific bits to a separate file. ( #481 )
2023-08-17 09:01:06 +01:00
f708efb19c
Add some accelerate details on the readme. ( #480 )
2023-08-17 08:26:02 +01:00
306c8eee7a
AVX version of the vecdot for q4_0. ( #474 )
* AVX version of the vecdot for q4_0.
* Tweak the avx bits.
* Add a qmatmul benchmark.
* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40
Add vecdot for q6k-q8k. ( #476 )
* Add vecdot for q6k-q8k.
* Add some testing for q8k.
* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
3bedba1fce
Use a zipped iterator. ( #475 )
* Use a zipped iterator.
* Add to/from float for q8k.
2023-08-16 20:15:11 +01:00
c5f45887dc
Add some tracing to the quantized example. ( #473 )
2023-08-16 18:49:08 +01:00
fa4590d7fd
Merge pull request #469 from huggingface/fix_llama_v1
Fixing llamav1
2023-08-16 17:47:40 +02:00
2e206e269d
Add the model argument. ( #471 )
2023-08-16 16:41:06 +01:00
575e88a999
Add a quantized test that uses negative values. ( #470 )
* Add a quantized test that uses negative values.
* Add a default tokenizer.
2023-08-16 16:32:58 +01:00
a9101700b6
Add a kv-cache to the quantized llama example. ( #466 )
* Add a kv-cache to the quantized llama example.
* Also print the prompt.
* Bugfix in q6k dequantizing.
* Another bugfix.
2023-08-16 14:28:42 +01:00
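The kv-cache stores the keys and values of already-processed tokens so each decoding step only computes them for the newest token and attends over the accumulated history. A minimal sketch of the bookkeeping, with plain `Vec`s standing in for tensors concatenated along the sequence dimension:

```rust
/// Per-layer key/value cache: new time steps are appended so that per-token
/// work is done once and reused by every later decoding step.
struct KvCache {
    k: Vec<Vec<f32>>, // one entry per generated position
    v: Vec<Vec<f32>>,
}

impl KvCache {
    fn new() -> Self {
        Self { k: Vec::new(), v: Vec::new() }
    }
    /// Append this step's key/value and return views over the full history.
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) -> (&[Vec<f32>], &[Vec<f32>]) {
        self.k.push(k);
        self.v.push(v);
        (&self.k, &self.v)
    }
}

fn main() {
    let mut cache = KvCache::new();
    for step in 0..3 {
        let (ks, _vs) = cache.append(vec![step as f32; 4], vec![step as f32; 4]);
        println!("step {step}: attending over {} cached positions", ks.len());
    }
}
```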
102fa4c2e3
Fixing llamav1
2023-08-16 14:53:29 +02:00
3071134788
Get the ggml based llama to generate some text. ( #464 )
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval in the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
2023-08-16 12:41:07 +01:00
fec87e86f5
Merge pull request #465 from huggingface/llama_hub_config
Using the real config from the hub when available.
2023-08-16 13:28:59 +02:00
33c882ea74
Clippy.
2023-08-16 10:41:00 +02:00
76804730c6
Using the real config from the hub when available.
2023-08-16 10:36:01 +02:00
965597a873
Add a test for qmatmul. ( #459 )
2023-08-16 06:36:27 +01:00
ca449f9ee1
Add quantized tensors. ( #458 )
* Add quantized tensors.
* Implement the debug trait for QTensor.
* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
b8263aa15c
Quantized support for f16 and f32 ( #457 )
* Add f32 as a quantized type.
* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4
Split out the quantized file. ( #456 )
2023-08-15 20:26:27 +01:00
08effe3762
More quantization support ( #455 )
* Properly initialize wdata.
* Simplify the matmul bits.
* Add from_float for q4_0.
* Fix a couple bugs.
* Get the test to work.
* Get clippy to be happy.
2023-08-15 18:58:04 +01:00
8ad4a21ffc
Add a basic optimizer example. ( #454 )
2023-08-15 17:19:18 +01:00
5e49922be2
Basic quantization support ( #453 )
* Add a vecdot trait.
* Start implementing mul_mat.
* Add to the mul mat implementation.
* Add q8_0 quantization.
* Implement the GgmlType trait for all types.
* Add the missing block.
* Add a TODO.
2023-08-15 15:53:19 +01:00
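With q8_0-style blocks, the matmul inner loop becomes integer multiply-accumulates plus one rescale per pair of blocks. A simplified sketch (ggml's real block stores the scale as f16, and the vec-dot trait also covers mixed pairs such as q4_0 with q8_0 activations):

```rust
const BLOCK: usize = 32;

/// A q8_0-style block: 32 signed bytes plus one scale.
/// (f32 scale here to keep the sketch dependency-free.)
struct BlockQ8 {
    scale: f32,
    qs: [i8; BLOCK],
}

fn quantize_block(xs: &[f32; BLOCK]) -> BlockQ8 {
    let amax = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    let mut qs = [0i8; BLOCK];
    for (q, x) in qs.iter_mut().zip(xs) {
        *q = (x / scale).round() as i8;
    }
    BlockQ8 { scale, qs }
}

/// Dot product of two quantized blocks: integer multiply-accumulate,
/// then a single rescale by the two block scales.
fn vec_dot(a: &BlockQ8, b: &BlockQ8) -> f32 {
    let acc: i32 = a.qs.iter().zip(&b.qs).map(|(&x, &y)| x as i32 * y as i32).sum();
    acc as f32 * a.scale * b.scale
}

fn main() {
    let xs: [f32; BLOCK] = std::array::from_fn(|i| i as f32 / 8.0);
    let (a, b) = (quantize_block(&xs), quantize_block(&xs));
    println!("quantized dot = {}", vec_dot(&a, &b));
}
```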
ebcfd96d94
add c++17 flags ( #452 )
2023-08-15 15:29:34 +01:00
5b1690fffa
Tweak the llama example. ( #450 )
2023-08-15 12:18:20 +01:00
3cc87058b7
Support local weights & dynamic outputs ( #447 )
* Support local weights & dynamic outputs
* Revise as suggested
* Cargo code format
2023-08-15 11:51:57 +01:00
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
495e0b7580
Simd support ( #448 )
* Import the simd intrinsics in candle-core.
* simd version of reduce-sum.
* Bugfix.
* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
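A reduce-sum is the simplest place to start with the imported intrinsics: accumulate eight f32 lanes at a time, then fold the lanes and the scalar tail. A hedged sketch, not the crate's actual kernel:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Sum a slice 8 lanes at a time. The caller must have checked
/// `is_x86_feature_detected!("avx")` before calling.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(xs: &[f32]) -> f32 {
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    let mut acc = _mm256_setzero_ps();
    for chunk in chunks {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(chunk.as_ptr()));
    }
    // Horizontal reduction of the 8 accumulator lanes, plus the scalar tail.
    let mut lanes = [0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let xs: Vec<f32> = (0..1000).map(|i| i as f32).collect();
    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx") {
        println!("{}", unsafe { sum_avx(&xs) });
        return;
    }
    println!("{}", xs.iter().sum::<f32>());
}
```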
90374097dc
Cudnn support ( #445 )
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
2023-08-14 21:30:41 +01:00
c84883ecf2
Add a cuda kernel for upsampling. ( #441 )
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
a094dc503d
Add a cuda kernel for avg-pool2d. ( #440 )
* Add a cuda kernel for avg-pool2d.
* Avoid running out of bounds.
* Finish wiring the avg pool kernel + add some testing.
* Support for max-pool + testing.
2023-08-14 12:32:05 +01:00
34f4b3187e
Add a naive conv2d cuda kernel. ( #438 )
* Add a naive conv2d cuda kernel.
* Proper conv2d support on the rust side.
* Conv1d testing on gpu.
* Also use the test on gpus.
* Fix the clean-ptx target.
2023-08-14 10:34:42 +01:00
eab54e4490
Fix the tests for mkl. ( #437 )
2023-08-14 08:09:27 +01:00
9e7e6e0288
Add dequantization for ggml's `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` ( #407 )
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`
* expose `tensor_from_ggml` for external usage
* bugfixes & example
2023-08-13 23:22:57 +01:00
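q4_0 packs 32 weights into 16 bytes (two 4-bit quants per byte) plus one scale, so dequantization is just unpacking nibbles and rescaling. A sketch with an f32 scale standing in for ggml's f16:

```rust
const QK4_0: usize = 32;

/// One q4_0 block: a scale plus 32 unsigned 4-bit quants packed two per byte.
#[allow(non_camel_case_types)]
struct BlockQ4_0 {
    d: f32,
    qs: [u8; QK4_0 / 2],
}

/// Dequantize one block: value = (nibble - 8) * d.
/// Low nibbles hold the first 16 values, high nibbles the last 16.
fn dequantize_q4_0(block: &BlockQ4_0) -> [f32; QK4_0] {
    let mut out = [0f32; QK4_0];
    for (j, &byte) in block.qs.iter().enumerate() {
        out[j] = ((byte & 0x0f) as i32 - 8) as f32 * block.d;
        out[j + QK4_0 / 2] = ((byte >> 4) as i32 - 8) as f32 * block.d;
    }
    out
}

fn main() {
    let block = BlockQ4_0 { d: 0.5, qs: [0x8fu8; 16] }; // low nibble 15, high nibble 8
    let ys = dequantize_q4_0(&block);
    println!("{:?} ... {:?}", &ys[..4], &ys[16..20]); // 3.5s, then 0.0s
}
```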
8bd2b22b33
Optimize the logit computations in the whisper example. ( #434 )
2023-08-13 22:00:13 +01:00
d379a76a9e
Add a softmax bench. ( #433 )
* Add a softmax bench.
* Add the vectorized sum reduce.
2023-08-13 20:09:18 +01:00
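The softmax being benchmarked is dominated by an exp pass and a sum reduction, which is where the vectorized sum reduce helps. The numerically stable scalar version for reference:

```rust
/// Numerically stable softmax: subtract the max before exponentiating so the
/// exponentials stay in range, then normalize by the sum.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().fold(f32::NEG_INFINITY, |m, &x| m.max(x));
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum(); // this reduction is what the vectorized sum speeds up
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    println!("{:?}", softmax(&[1.0, 2.0, 3.0]));
}
```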