candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-21 20:22:49 +00:00

Author	SHA1	Message	Date
Nicolas Patry	0bb344f798	[RFC] Start removal of `VarBuilder`. - Uses `Initializer` trait instead. - Allows more decoupling between init and load, which are very different ops. - Allows more decoupling between backends (safetensors, npy, ggml, etc...) This is a minimum viable change. There are 3 kind of objects with various relations. The `Model`: This is `Llama`, `Linear`, `Rms` ... They contain tensors (and possibly other things). and are used to call `forward` basically. They should have no ownership of any internals like Rng state or actual shapes of the tensors (the tensors already own those) The `Initializer`: This is a struct containing necessary information to generate new random tensors. Typically they should own a random generator, and generate different kind of random tensors based on what kind of `Model` they are initializing. This do not own any information about the `Model` itself. Default init stores the `Vec<Var>` for now, in order to send to the optimizer. Ths `Config`: This is the necessary information to link between the `Model` and the `Initializer`. This is another struct which is a companion of the implementation of the initalization. Typical information is the shape of the tensors for simple `Model`, the `eps` for RMS, the `use_bias` boolean to know whether we should have a bias in the linear layer. This should remove all need for `VarBuilder` during intialization, and allow removing every initialization bit within `VarBuilder`. Modifying `llama2-c` to follow that initialization is left on purpose for a follow-up to keep the current PR rather small.	2023-08-16 14:39:36 +02:00
Laurent Mazare	965597a873	Add a test for qmatmul. (#459 )	2023-08-16 06:36:27 +01:00
Laurent Mazare	ca449f9ee1	Add quantized tensors. (#458 ) * Add quantized tensors. * Implement the debug trait for QTensor. * Add the QMatMul custom op.	2023-08-15 22:45:53 +01:00
Laurent Mazare	b8263aa15c	Quantized support for f16 and f32 (#457 ) * Add f32 as a quantized type. * Add f16 as a quantized type too.	2023-08-15 21:09:37 +01:00
Laurent Mazare	e68b2accb4	Split out the quantized file. (#456 )	2023-08-15 20:26:27 +01:00
Laurent Mazare	08effe3762	More quantization support (#455 ) * Properly initialize wdata. * Simplify the matmul bits. * Add from_float for q4_0. * Fix a couple bugs. * Get the test to work. * Get clippy to be happy.	2023-08-15 18:58:04 +01:00
Laurent Mazare	8ad4a21ffc	Add a basic optimizer example. (#454 )	2023-08-15 17:19:18 +01:00
Laurent Mazare	5e49922be2	Basic quantization support (#453 ) * Add a vecdot trait. * Start implementing mul_mat. * Add to the mul mat implementation. * Add q8_0 quantization. * Implement the GgmlType trait for all types. * Add the missing block. * Add a TODO.	2023-08-15 15:53:19 +01:00
Chengxu Yang	ebcfd96d94	add c++17 flags (#452 )	2023-08-15 15:29:34 +01:00
Laurent Mazare	5b1690fffa	Tweak the llama example. (#450 )	2023-08-15 12:18:20 +01:00
Guoqing Bao	3cc87058b7	Support local weights & dynamic outputs (#447 ) * Support local weights & dynamic outputs * Revise as suggested * Cargo code format	2023-08-15 11:51:57 +01:00
Laurent Mazare	531f23b4d0	Rename vec-dot to vec-ops. (#449 ) * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme.	2023-08-15 10:48:57 +01:00
Laurent Mazare	495e0b7580	Simd support (#448 ) * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints.	2023-08-15 09:50:38 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445 ) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Laurent Mazare	c84883ecf2	Add a cuda kernel for upsampling. (#441 ) * Add a cuda kernel for upsampling. * Update for the latest tokenizers version.	2023-08-14 13:12:17 +01:00
Laurent Mazare	a094dc503d	Add a cuda kernel for avg-pool2d. (#440 ) * Add a cuda kernel for avg-pool2d. * Avoid running out of bounds. * Finish wiring the avg pool kernel + add some testing. * Support for max-pool + testing.	2023-08-14 12:32:05 +01:00
Laurent Mazare	34f4b3187e	Add a naive conv2d cuda kernel. (#438 ) * Add a naive conv2d cuda kernel. * Proper conv2d support on the rust side. * Conv1d testing on gpu. * Also use the test on gpus. * Fix the clean-ptx target.	2023-08-14 10:34:42 +01:00
Laurent Mazare	eab54e4490	Fix the tests for mkl. (#437 )	2023-08-14 08:09:27 +01:00
Lukas Kreussel	9e7e6e0288	Add dequantization for ggmls `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` (#407 ) * Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` * expose `tensor_from_ggml` for external usage * bugfixes & example	2023-08-13 23:22:57 +01:00
Laurent Mazare	8bd2b22b33	Optimize the logit computations in the whisper example. (#434 )	2023-08-13 22:00:13 +01:00
Laurent Mazare	d379a76a9e	Add a softmax bench. (#433 ) * Add a softmax bench. * Add the vectorized sum reduce.	2023-08-13 20:09:18 +01:00
Laurent Mazare	9af438ac1b	Track the conv2d operations in stable-diffusion. (#431 ) * Track the conv2d operations in stable-diffusion. * Add more tracing to stable-diffusion. * Also trace the resnet bits. * Trace the attention blocks. * Also trace the attention inner part. * Small tweak.	2023-08-13 15:58:26 +01:00
Laurent Mazare	b1ff78f762	Allow using accelerate with stable-diffusion. (#430 )	2023-08-13 14:14:20 +01:00
Laurent Mazare	5a63b51f14	Add a matmul benchmark. (#429 )	2023-08-13 13:41:03 +01:00
Laurent Mazare	6d694554b8	Support longer sequences in language detection. (#428 )	2023-08-13 13:16:15 +01:00
Laurent Mazare	9aca398a4f	More accelerate optimizations (#427 ) * Add more tracing to the whisper example. * Support accelerate in more examples. * Use accelerate for pointwise functions. * Use accelerate for binary operations too. * Bugfix for binary operation: use the rhs before the lhs.	2023-08-13 12:53:34 +01:00
Laurent Mazare	60cd1551ca	Add a KV cache to whisper. (#426 )	2023-08-12 21:17:08 +01:00
Laurent Mazare	a0908d212c	Add a -language argument. (#425 )	2023-08-12 17:08:40 +01:00
Laurent Mazare	972078e1ae	Update the readme with the discord server and common errors. (#423 )	2023-08-12 16:45:58 +01:00
Yumin Wu	16b89f5b83	fix: can directly save the loaded weights (#421 )	2023-08-12 16:33:29 +01:00
Laurent Mazare	0741ebbd51	More multilingual support for whisper. (#419 ) * More multilingual support for whisper. * Use the language token appropriately.	2023-08-12 15:32:52 +01:00
Laurent Mazare	0c3f109faa	Basic multilingual support for whisper (#417 ) * Multi-lingual support for whisper. * Avoid hardcoding the token names. * More multi-lingual support. * Remove the todo.	2023-08-12 11:23:04 +01:00
Laurent Mazare	2ba6b2826f	Fix the readme instructions for stable-diffusion. (#415 )	2023-08-11 18:59:04 +01:00
Laurent Mazare	1d0157bbc4	Stable diffusion: retrieve the model files from the HF hub. (#414 ) * Retrieve the model files from the HF hub in the stable diffusion example. * Add to the readme.	2023-08-11 18:57:06 +01:00
Laurent Mazare	91dbf907d3	Add more whisper variants. (#413 )	2023-08-11 17:33:55 +01:00
Laurent Mazare	e12372021b	Expose the tensor write-bytes function. (#412 )	2023-08-11 17:13:42 +01:00
Laurent Mazare	55e428c8ae	Expose the varmap inner data. (#411 )	2023-08-11 16:58:56 +01:00
Laurent Mazare	01ea57da8c	Fix the conv tests. (#409 )	2023-08-11 14:59:54 +01:00
Laurent Mazare	662db45fc3	Use zero padding in conv1d and conv2d (same as pytorch). (#408 )	2023-08-11 14:53:05 +01:00
Laurent Mazare	906c0f3eb5	Remove the checkpoint conversion script. (#405 ) * Remove the checkpoint conversion script. * Remove references to the script.	2023-08-11 05:59:48 +01:00
Laurent Mazare	e29c7809ec	Parallelise the CPU kernels for the conv ops. (#401 ) * Parallelise the conv2d op. * Tighter control on threading. * Also parallelise conv1d. * Add some safety comment.	2023-08-11 05:51:58 +01:00
Laurent Mazare	a325c1aa50	Upsample test + bugfix. (#399 )	2023-08-10 21:02:35 +02:00
Nicolas Patry	b6cf26e48e	Merge pull request #393 from huggingface/older_gpus Working on older GPUs (still not compute 52 it seems but > 6 could be OK)	2023-08-10 20:49:23 +02:00
Nicolas Patry	379eadc68e	Working now.	2023-08-10 19:43:25 +02:00
Nicolas Patry	7e4fbc1e17	[DO NOT MERGE] temporary PR so users can try out on older GPUs.	2023-08-10 19:36:31 +02:00
Laurent Mazare	80f0482f26	Fix the stable-diffusion vae. (#398 ) * Fix the stable-diffusion vae. * Fix for saving images.	2023-08-10 18:24:31 +01:00
Laurent Mazare	94eff56aee	Optimize the cpu conv2d kernel (#396 ) * Conv2d simd optimization. * Fix the contiguous copying. * Small tweak.	2023-08-10 17:40:09 +01:00
Nicolas Patry	a55133effd	Merge pull request #395 from huggingface/fix_compat_windows Compat windows.	2023-08-10 18:05:12 +02:00
Laurent Mazare	ff53f38467	Small example for benchmarking some cpu ops (#394 ) * Refactor the benchmark example. * Rename the example. * Add some comments.	2023-08-10 17:00:17 +01:00
Nicolas Patry	4a95d34c83	Compat windows.	2023-08-10 17:46:47 +02:00

1 2 3 4 5 ...

859 Commits