candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-20 12:06:35 +00:00

Author	SHA1	Message	Date
Laurent Mazare	3bedba1fce	Use a zipped iterator. (#475 ) * Use a zipped iterator. * Add to/from float for q8k.	2023-08-16 20:15:11 +01:00
Laurent Mazare	a9101700b6	Add a kv-cache to the quantized llama example. (#466 ) * Add a kv-cache to the quantized llama example. * Also print the prompt. * Bugfix in q6k dequantizing. * Another bugfix.	2023-08-16 14:28:42 +01:00
Laurent Mazare	3071134788	Get the ggml based llama to generate some text. (#464 ) * Add more stats to the ggml example. * Build a quantized model from the file content. * Move the tensor retrieval in the main crate. * Start adding the forward pass. * Add more to the forward pass of the quantized llama. * Apply the attention layers. * Add the sampling loop. * Get the sampling loop to work. * Minor tweak. * Add a quantize/dequantize test. * Bugfix. * Add a comment + swap the order. * Bugfixes.	2023-08-16 12:41:07 +01:00
Laurent Mazare	ca449f9ee1	Add quantized tensors. (#458 ) * Add quantized tensors. * Implement the debug trait for QTensor. * Add the QMatMul custom op.	2023-08-15 22:45:53 +01:00
Laurent Mazare	b8263aa15c	Quantized support for f16 and f32 (#457 ) * Add f32 as a quantized type. * Add f16 as a quantized type too.	2023-08-15 21:09:37 +01:00
Laurent Mazare	e68b2accb4	Split out the quantized file. (#456 )	2023-08-15 20:26:27 +01:00
Laurent Mazare	08effe3762	More quantization support (#455 ) * Properly initialize wdata. * Simplify the matmul bits. * Add from_float for q4_0. * Fix a couple bugs. * Get the test to work. * Get clippy to be happy.	2023-08-15 18:58:04 +01:00
Laurent Mazare	5e49922be2	Basic quantization support (#453 ) * Add a vecdot trait. * Start implementing mul_mat. * Add to the mul mat implementation. * Add q8_0 quantization. * Implement the GgmlType trait for all types. * Add the missing block. * Add a TODO.	2023-08-15 15:53:19 +01:00
Laurent Mazare	531f23b4d0	Rename vec-dot to vec-ops. (#449 ) * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme.	2023-08-15 10:48:57 +01:00
Laurent Mazare	495e0b7580	Simd support (#448 ) * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints.	2023-08-15 09:50:38 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445 ) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Laurent Mazare	c84883ecf2	Add a cuda kernel for upsampling. (#441 ) * Add a cuda kernel for upsampling. * Update for the latest tokenizers version.	2023-08-14 13:12:17 +01:00
Laurent Mazare	a094dc503d	Add a cuda kernel for avg-pool2d. (#440 ) * Add a cuda kernel for avg-pool2d. * Avoid running out of bounds. * Finish wiring the avg pool kernel + add some testing. * Support for max-pool + testing.	2023-08-14 12:32:05 +01:00
Laurent Mazare	34f4b3187e	Add a naive conv2d cuda kernel. (#438 ) * Add a naive conv2d cuda kernel. * Proper conv2d support on the rust side. * Conv1d testing on gpu. * Also use the test on gpus. * Fix the clean-ptx target.	2023-08-14 10:34:42 +01:00
Lukas Kreussel	9e7e6e0288	Add dequantization for ggmls `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` (#407 ) * Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` * expose `tensor_from_ggml` for external usage * bugfixes & example	2023-08-13 23:22:57 +01:00
Laurent Mazare	d379a76a9e	Add a softmax bench. (#433 ) * Add a softmax bench. * Add the vectorized sum reduce.	2023-08-13 20:09:18 +01:00
Laurent Mazare	9af438ac1b	Track the conv2d operations in stable-diffusion. (#431 ) * Track the conv2d operations in stable-diffusion. * Add more tracing to stable-diffusion. * Also trace the resnet bits. * Trace the attention blocks. * Also trace the attention inner part. * Small tweak.	2023-08-13 15:58:26 +01:00
Laurent Mazare	9aca398a4f	More accelerate optimizations (#427 ) * Add more tracing to the whisper example. * Support accelerate in more examples. * Use accelerate for pointwise functions. * Use accelerate for binary operations too. * Bugfix for binary operation: use the rhs before the lhs.	2023-08-13 12:53:34 +01:00
Yumin Wu	16b89f5b83	fix: can directly save the loaded weights (#421 )	2023-08-12 16:33:29 +01:00
Laurent Mazare	e12372021b	Expose the tensor write-bytes function. (#412 )	2023-08-11 17:13:42 +01:00
Laurent Mazare	662db45fc3	Use zero padding in conv1d and conv2d (same as pytorch). (#408 )	2023-08-11 14:53:05 +01:00
Laurent Mazare	e29c7809ec	Parallelise the CPU kernels for the conv ops. (#401 ) * Parallelise the conv2d op. * Tighter control on threading. * Also parallelise conv1d. * Add some safety comment.	2023-08-11 05:51:58 +01:00
Laurent Mazare	a325c1aa50	Upsample test + bugfix. (#399 )	2023-08-10 21:02:35 +02:00
Laurent Mazare	94eff56aee	Optimize the cpu conv2d kernel (#396 ) * Conv2d simd optimization. * Fix the contiguous copying. * Small tweak.	2023-08-10 17:40:09 +01:00
Laurent Mazare	c8039579a5	Conv1d optimize (#392 ) * Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints.	2023-08-10 15:23:52 +01:00
Laurent Mazare	f3fe730a30	Npy tweaks & error with path (#384 ) * Simplify the npy writing. * Wrap the file path so as to provide better errors.	2023-08-10 06:21:58 +01:00
Laurent Mazare	c7f92f985e	Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. (#383 )	2023-08-10 05:48:19 +01:00
Lei	3bbc08a8df	Fix randn cpu (#382 ) * Change distributions Standard generates in [0, 1), Normal is correct. * Add test Not sure if this is the best place to put the test * Remove unnecessary use	2023-08-10 05:33:44 +01:00
Ciarán Curley	25ec2d9f6b	fix: remove incorrect unwrap (#379 )	2023-08-09 21:45:24 +01:00
Laurent Mazare	fcfdcbd337	Add a conv1d benchmark based on the whisper sizes. (#377 ) * Add a conv1d benchmark based on the whisper sizes. * Enforce the batch-dim in conv1d.	2023-08-09 20:27:03 +01:00
LeeeSe	a5c5a893aa	add max_pool2d (#371 ) Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>	2023-08-09 18:05:26 +01:00
Laurent Mazare	1892bd139c	Extract the strides in the conv ops. (#370 )	2023-08-09 17:57:05 +01:00
Laurent Mazare	cd225bd3b1	More testing for avg-pool2d. (#366 ) * More testing for avg-pool2d. * Another fix. * Add a max-pool test with non-divisible kernel sizes.	2023-08-09 16:12:23 +01:00
Nicolas Patry	dece0b8a76	Merge pull request #263 from huggingface/book_3 Book 3 (advanced loading + hub)	2023-08-09 16:50:11 +02:00
Laurent Mazare	b80348d22f	Bugfix for avg-pool + add some test. (#365 )	2023-08-09 15:44:16 +01:00
Laurent Mazare	dbc6f281c9	Conv1d test with padding. (#356 )	2023-08-09 05:45:38 +01:00
Laurent Mazare	cf965ecaa8	Simplify the conv1d and conv2d code. (#352 )	2023-08-08 22:10:59 +01:00
Laurent Mazare	b9864e1357	Fix size-in-bytes for u8. (#351 )	2023-08-08 21:15:18 +01:00
Laurent Mazare	608b2358c6	Add some conv1d test + bugfix using padding. (#349 )	2023-08-08 20:50:20 +01:00
Laurent Mazare	1e6dbeac01	Add some conv2d tests. (#347 ) * Add some conv2d tests. * Add a simpler conv2d test. * More conv2d testing + bugfix. * Add a todo.	2023-08-08 19:02:42 +01:00
Laurent Mazare	13ce68ff9b	Bugfix for conv2d. (#343 )	2023-08-08 15:20:00 +01:00
Laurent Mazare	ab35684326	Naive implementation for conv2d. (#341 )	2023-08-08 06:34:36 +01:00
Laurent Mazare	b5bb5e056d	Add more conv2d support. (#340 ) * Add more conv2d support. * Conv2d cpu work. * Conv2d output shape.	2023-08-08 06:04:32 +01:00
Laurent Mazare	d0d7010682	CPU implementation for upsample-nearest2d. (#339 )	2023-08-07 20:07:10 +01:00
Laurent Mazare	fc265d9dcf	Some CLIP fixes for stable diffusion. (#338 ) * Some CLIP fixes for stable diffusion. * Add the avg-pool2d operation on cpu.	2023-08-07 18:31:45 +01:00
Laurent Mazare	2345b8ce3f	Skeleton for the avg-pool2d and upsample-nearest2d ops. (#337 ) * Skeleton for the avg-pool2d and upsample-nearest2d ops. * Preliminary conv2d support.	2023-08-07 16:15:38 +01:00
Laurent Mazare	f53a333ea9	Simple pad support. (#336 ) * Simple pad support. * Fix the tensor indexing when padding.	2023-08-07 15:24:56 +01:00
Laurent Mazare	2c9f605976	Add rand-like/randn-like. (#333 )	2023-08-06 21:51:08 +01:00
Laurent Mazare	166bfd5847	Add the recip op + use it in stable-diffusion. (#331 ) * Add the recip unary op. * Fix the cuda kernel. * Use the recip op in sigmoid.	2023-08-06 21:14:52 +01:00
Laurent Mazare	d34039e352	Add a stable diffusion example (#328 ) * Start adding a stable-diffusion example. * Proper computation of the causal mask. * Add the chunk operation. * Work in progress: port the attention module. * Add some dummy modules for conv2d and group-norm, get the attention module to compile. * Re-enable the 2d convolution. * Add the embeddings module. * Add the resnet module. * Add the unet blocks. * Add the unet. * And add the variational auto-encoder. * Use the pad function from utils.	2023-08-06 17:49:43 +01:00

254 Commits