candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-18 11:37:11 +00:00

Author	SHA1	Message	Date
Laurent Mazare	607ffb9f1e	Retrieve more information from PyTorch checkpoints. (#515 ) * Retrieve more information from PyTorch checkpoints. * Add enough support to load dino-v2 backbone weights.	2023-08-19 15:05:34 +01:00
Laurent Mazare	f861a9df6e	Add ggml support to tensor-tools (#512 ) * Pickle work-in-progress. * More unpickling. * More pickling. * Proper handling of setitems. * Clippy. * Again more pickling. * Restore the example. * Add enough pickle support to get the list of tensors. * Read the data from zip files. * Retrieve the tensor shape. * Extract the size and dtype. * More storage types. * Improve the destructuring. * Also support ggml files.	2023-08-19 11:45:22 +01:00
Laurent Mazare	ad33715c61	Preliminary support for importing PyTorch weights. (#511 ) * Pickle work-in-progress. * More unpickling. * More pickling. * Proper handling of setitems. * Clippy. * Again more pickling. * Restore the example. * Add enough pickle support to get the list of tensors. * Read the data from zip files. * Retrieve the tensor shape. * Extract the size and dtype. * More storage types. * Improve the destructuring.	2023-08-19 11:26:32 +01:00
Laurent Mazare	90ff04e77e	Add the tensor-tools binary. (#510 )	2023-08-19 09:06:44 +01:00
Laurent Mazare	cb069d6063	Add the permute op (similar to pytorch). (#504 ) * Add the permute op (similar to pytorch). * Add the backprop for dimension permutation.	2023-08-18 16:30:53 +01:00
Laurent Mazare	95462c6a2e	Add a vision transformer example (dino-v2). (#502 ) * Add a vision transformer example (dino-v2). * Add some documentation + test. * CI fix. * Another fix (still unable to replicate the errors locally :( )	2023-08-18 11:58:06 +01:00
Lukas Kreussel	109e95b189	Basic `qmatmul` parallelization (#492 ) * Basic `par_iter` parallelization * Pass errors up * Disable `avx` for x86 macs	2023-08-18 09:45:37 +01:00
Laurent Mazare	c78ce76501	Add a simple Module trait and implement it for the various nn layers (#500 ) * Start adding the module trait. * Use the module trait. * Implement module for qmatmul.	2023-08-18 09:38:22 +01:00
Laurent Mazare	a22b1bed7b	Tensor -> QTensor conversion (#496 ) * Sketch some qmatmul test. * Add the quantization function. * More testing. * Make the test smaller and faster. * Add some shape checking.	2023-08-18 08:19:20 +01:00
Laurent Mazare	557b2c28dd	Q6K quantization (#495 ) * Print the detected arch options. * Add the q6k quantization. * Add a currently broken test. * Bugfix. * Bugfix. * Another bugfix. * Another bugfix + get the test to work.	2023-08-17 22:22:57 +01:00
Laurent Mazare	fc81af1712	AVX version of the q6k vec-dot. (#493 ) * AVX version of the q6k vec-dot. * Use the avx sum.	2023-08-17 20:13:18 +01:00
Laurent Mazare	03be33eea4	Relax the requirements on CustomOp. (#486 ) * Relax the requirements on CustomOp. * Simplify the custom-ops when no backward is required.	2023-08-17 11:12:05 +01:00
Laurent Mazare	d99cac3ec3	Move the avx specific bits to a separate file. (#481 )	2023-08-17 09:01:06 +01:00
Laurent Mazare	306c8eee7a	AVX version of the vecdot for q4_0. (#474 ) * AVX version of the vecdot for q4_0. * Tweak the avx bits. * Add a qmatmul benchmark. * Fix the quantized test.	2023-08-17 07:03:32 +01:00
Laurent Mazare	098909de40	Add vecdot for q6k-q8k. (#476 ) * Add vecdot for q6k-q8k. * Add some testing for q8k. * Use QMatMul for the output layer.	2023-08-16 20:59:40 +01:00
Laurent Mazare	3bedba1fce	Use a zipped iterator. (#475 ) * Use a zipped iterator. * Add to/from float for q8k.	2023-08-16 20:15:11 +01:00
Laurent Mazare	575e88a999	Add a quantized test that use negative values. (#470 ) * Add a quantized test that use negative values. * Add a default tokenizer.	2023-08-16 16:32:58 +01:00
Laurent Mazare	a9101700b6	Add a kv-cache to the quantized llama example. (#466 ) * Add a kv-cache to the quantized llama example. * Also print the prompt. * Bugfix in q6k dequantizing. * Another bugfix.	2023-08-16 14:28:42 +01:00
Laurent Mazare	3071134788	Get the ggml based llama to generate some text. (#464 ) * Add more stats to the ggml example. * Build a quantized model from the file content. * Move the tensor retrieval in the main crate. * Start adding the forward pass. * Add more to the forward pass of the quantized llama. * Apply the attention layers. * Add the sampling loop. * Get the sampling loop to work. * Minor tweak. * Add a quantize/dequantize test. * Bugfix. * Add a comment + swap the order. * Bugfixes.	2023-08-16 12:41:07 +01:00
Laurent Mazare	965597a873	Add a test for qmatmul. (#459 )	2023-08-16 06:36:27 +01:00
Laurent Mazare	ca449f9ee1	Add quantized tensors. (#458 ) * Add quantized tensors. * Implement the debug trait for QTensor. * Add the QMatMul custom op.	2023-08-15 22:45:53 +01:00
Laurent Mazare	b8263aa15c	Quantized support for f16 and f32 (#457 ) * Add f32 as a quantized type. * Add f16 as a quantized type too.	2023-08-15 21:09:37 +01:00
Laurent Mazare	e68b2accb4	Split out the quantized file. (#456 )	2023-08-15 20:26:27 +01:00
Laurent Mazare	08effe3762	More quantization support (#455 ) * Properly initialize wdata. * Simplify the matmul bits. * Add from_float for q4_0. * Fix a couple bugs. * Get the test to work. * Get clippy to be happy.	2023-08-15 18:58:04 +01:00
Laurent Mazare	5e49922be2	Basic quantization support (#453 ) * Add a vecdot trait. * Start implementing mul_mat. * Add to the mul mat implementation. * Add q8_0 quantization. * Implement the GgmlType trait for all types. * Add the missing block. * Add a TODO.	2023-08-15 15:53:19 +01:00
Laurent Mazare	531f23b4d0	Rename vec-dot to vec-ops. (#449 ) * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme.	2023-08-15 10:48:57 +01:00
Laurent Mazare	495e0b7580	Simd support (#448 ) * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints.	2023-08-15 09:50:38 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445 ) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Laurent Mazare	c84883ecf2	Add a cuda kernel for upsampling. (#441 ) * Add a cuda kernel for upsampling. * Update for the latest tokenizers version.	2023-08-14 13:12:17 +01:00
Laurent Mazare	a094dc503d	Add a cuda kernel for avg-pool2d. (#440 ) * Add a cuda kernel for avg-pool2d. * Avoid running out of bounds. * Finish wiring the avg pool kernel + add some testing. * Support for max-pool + testing.	2023-08-14 12:32:05 +01:00
Laurent Mazare	34f4b3187e	Add a naive conv2d cuda kernel. (#438 ) * Add a naive conv2d cuda kernel. * Proper conv2d support on the rust side. * Conv1d testing on gpu. * Also use the test on gpus. * Fix the clean-ptx target.	2023-08-14 10:34:42 +01:00
Lukas Kreussel	9e7e6e0288	Add dequantization for ggmls `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` (#407 ) * Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` * expose `tensor_from_ggml` for external usage * bugfixes & example	2023-08-13 23:22:57 +01:00
Laurent Mazare	d379a76a9e	Add a softmax bench. (#433 ) * Add a softmax bench. * Add the vectorized sum reduce.	2023-08-13 20:09:18 +01:00
Laurent Mazare	9af438ac1b	Track the conv2d operations in stable-diffusion. (#431 ) * Track the conv2d operations in stable-diffusion. * Add more tracing to stable-diffusion. * Also trace the resnet bits. * Trace the attention blocks. * Also trace the attention inner part. * Small tweak.	2023-08-13 15:58:26 +01:00
Laurent Mazare	5a63b51f14	Add a matmul benchmark. (#429 )	2023-08-13 13:41:03 +01:00
Laurent Mazare	9aca398a4f	More accelerate optimizations (#427 ) * Add more tracing to the whisper example. * Support accelerate in more examples. * Use accelerate for pointwise functions. * Use accelerate for binary operations too. * Bugfix for binary operation: use the rhs before the lhs.	2023-08-13 12:53:34 +01:00
Yumin Wu	16b89f5b83	fix: can directly save the loaded weights (#421 )	2023-08-12 16:33:29 +01:00
Laurent Mazare	e12372021b	Expose the tensor write-bytes function. (#412 )	2023-08-11 17:13:42 +01:00
Laurent Mazare	01ea57da8c	Fix the conv tests. (#409 )	2023-08-11 14:59:54 +01:00
Laurent Mazare	662db45fc3	Use zero padding in conv1d and conv2d (same as pytorch). (#408 )	2023-08-11 14:53:05 +01:00
Laurent Mazare	e29c7809ec	Parallelise the CPU kernels for the conv ops. (#401 ) * Parallelise the conv2d op. * Tighter control on threading. * Also parallelise conv1d. * Add some safety comment.	2023-08-11 05:51:58 +01:00
Laurent Mazare	a325c1aa50	Upsample test + bugfix. (#399 )	2023-08-10 21:02:35 +02:00
Laurent Mazare	94eff56aee	Optimize the cpu conv2d kernel (#396 ) * Conv2d simd optimization. * Fix the contiguous copying. * Small tweak.	2023-08-10 17:40:09 +01:00
Laurent Mazare	ff53f38467	Small example for benchmarking some cpu ops (#394 ) * Refactor the benchmark example. * Rename the example. * Add some comments.	2023-08-10 17:00:17 +01:00
Laurent Mazare	c8039579a5	Conv1d optimize (#392 ) * Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints.	2023-08-10 15:23:52 +01:00
Laurent Mazare	f3fe730a30	Npy tweaks & error with path (#384 ) * Simplify the npy writing. * Wrap the file path so as to provide better errors.	2023-08-10 06:21:58 +01:00
Laurent Mazare	c7f92f985e	Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. (#383 )	2023-08-10 05:48:19 +01:00
Lei	3bbc08a8df	Fix randn cpu (#382 ) * Change distributions: Standard generates in [0, 1), Normal is correct. * Add test (not sure if this is the best place to put the test). * Remove unnecessary use.	2023-08-10 05:33:44 +01:00
Ciarán Curley	25ec2d9f6b	fix: remove incorrect unwrap (#379 )	2023-08-09 21:45:24 +01:00
Laurent Mazare	fcfdcbd337	Add a conv1d benchmark based on the whisper sizes. (#377 ) * Add a conv1d benchmark based on the whisper sizes. * Enforce the batch-dim in conv1d.	2023-08-09 20:27:03 +01:00

460 Commits