candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	3e89df938c	Starcoder fix (#264) * Bugfix for starcoder. * Get some proper code generation. * Slightly simpler softmax.	2023-07-28 11:17:49 +01:00
Laurent Mazare	6a54ca115e	Add some Bigcode model (#260) * Start sketching the bigcode gpt model. * Sketch the bigcode model. * Implement the attention mechanism. * Random reshaping. * Sketch more of the example. * Add some kv cache. * Properly generate the position ids. * Proper attention mask. * Bail on upcasting. * Properly apply the attention mask. * Add the smaller starcoder variants. * Update for the new hub api. * Fix a shape issue. * Fix another shape issue. * Get some logits out. * Adjust the weight names.	2023-07-28 09:57:32 +01:00
Nicolas Patry	4f260ef025	Merge pull request #216 from LaurentMazare/llama_multiprocess2 TP sharding v2	2023-07-28 08:06:13 +01:00
Nicolas Patry	ca479a873e	Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time)	2023-07-27 20:05:02 +02:00
Nicolas Patry	25a2086e8f	Putting back Send + Sync	2023-07-27 09:58:47 +02:00
Nicolas Patry	7c7e6ba201	Removing inner dependency on safetensors.	2023-07-27 09:58:47 +02:00
Nicolas Patry	ed58de7551	Fixed TP sharded version.	2023-07-27 09:58:46 +02:00
Nicolas Patry	1735e4831e	TP sharding v2	2023-07-27 09:58:14 +02:00
Laurent Mazare	209f06d7c3	Micro-cleanup. (#256)	2023-07-27 07:55:54 +01:00
Laurent Mazare	84ad558e50	Switch to using llama-v2 by default. (#251)	2023-07-26 17:18:27 +01:00
Laurent Mazare	1235aa2536	Use bail rather than wrapping a string where possible. (#249) * Use bail rather than wrapping a string where possible. * Revert the cuda default bit.	2023-07-26 15:42:46 +01:00
Laurent Mazare	f052ba76cb	Lining up the flash attn version with the non-flash one. (#248) * Move the flash-attn function in the proper crate. * Causality tweak.	2023-07-26 15:11:45 +01:00
Nicolas Patry	8b1d12bead	Merge pull request #246 from LaurentMazare/rename_custom_op Rename exposed ops.	2023-07-26 14:20:29 +01:00
Laurent Mazare	2ce5f12513	Again set a few extra params in flash-attn. (#245) * Again set a few extra params. * Use the appropriate kernel sizes. * Add all the kernel sizes. * Parallel compiling. * Reduce the amount of parallelism. * Add the missing kernel. * Fix a typo. * Remove bf16 support for now.	2023-07-26 14:16:37 +01:00
Nicolas Patry	1a5416ec35	Rename exposed ops.	2023-07-26 12:43:19 +02:00
Laurent Mazare	fa2b64d678	Proper flash-attn parameters. (#244) * Proper flash-attn parameters. * Set the flash attention parameters. * Add more validations. * Setup the o_ flash attn parameters. * More flash-attn support. * Set more flash attn parameters.	2023-07-26 10:13:40 +01:00
Laurent Mazare	e40b150bbe	Better handling of dtypes in llama. (#243)	2023-07-26 08:28:33 +01:00
Laurent Mazare	d9f9c859af	Add flash attention (#241) * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.	2023-07-26 07:48:10 +01:00
Laurent Mazare	550a13a547	Use the binary decoder for llama2.c. (#230) * Use the binary decoder for llama2.c. * Add the temperature. * Formatting tweak. * Fix the rotary embeddings.	2023-07-24 10:56:08 +01:00
Laurent Mazare	35b65fed88	Add llama2.c as an example. (#229) * Start adding llama2.c. * Model loading. * Add the llama-v2 model. * Start converting the weights. * Rotary embedding tweaks. * Get the model to generate some tokens.	2023-07-24 09:13:50 +01:00
Laurent Mazare	b6f7dfb682	CPU implementation for the custom RMS example. (#228) * CPU implementation for the custom RMS example. * Add the eps parameter.	2023-07-23 20:04:20 +01:00
Laurent Mazare	e449ce53a2	Wrapping code to call the custom op. (#225) * Wrapping code to call the custom op. * Get the rms example to work. * Get around rustfmt failing in the CI. * Fix the rms computation.	2023-07-23 11:31:17 +01:00
Laurent Mazare	b8a10425ad	Kernel build example (#224) * Build example kernels. * Add some sample custom kernel. * Get the example kernel to compile. * Add some cuda code. * More cuda custom op. * More cuda custom ops.	2023-07-23 07:15:37 +01:00
Laurent Mazare	1f26042693	Move some shared functions to the nn module. (#221)	2023-07-22 13:25:11 +01:00
Laurent Mazare	43c7223292	Rename the .r functions to .dims so as to be a bit more explicit. (#220)	2023-07-22 10:39:27 +01:00
Laurent Mazare	52c5d8c087	Add the gather op. (#219) * Start adding gather. * Gather cpu implementation + use in simple training. * Add scatter_add for the gradient of gather. * Simple cpu implementation of scatter_add. * Use gather in the simple-training backprop.	2023-07-22 07:21:28 +01:00
Laurent Mazare	410654525f	Refactor the reduce ops in order to introduce argmin/argmax. (#212) * Refactor the reduce ops in order to introduce argmin/argmax. * Clippy fixes. * Use the newly introduced argmax. * Fix the strided case. * Handle the non-contiguous case.	2023-07-21 11:41:08 +01:00
Laurent Mazare	c60831aad4	Add more gradient tests + bugfixes. (#211) * Add more gradient tests + bugfixes. * More tests and fixes. * More tests.	2023-07-21 06:52:39 +01:00
Laurent Mazare	4845d5cc64	More realistic training setup. (#210) * More realistic training setup. * Compute the model accuracy. * Very inefficient backprop for index select. * More backprop. * Fix some backprop issues. * Backprop fix. * Another broadcasting backprop fix. * Better backprop for reducing ops. * Training again. * Add some gradient tests. * Get the training to work.	2023-07-20 18:25:41 +01:00
Laurent Mazare	12d6dc018d	Support for MQA for llama v2. (#205) * Support for MQA for llama v2. * More llama-v2. * Move the rotary embedding precomputation in the cache. * Add a v2 flag. * Use the hf model.	2023-07-20 06:39:04 +01:00
Nicolas Patry	9515e8ea6c	Merge branch 'main' into remove_wrapper	2023-07-19 18:53:55 +02:00
Nicolas Patry	e6584476c4	Merge pull request #200 from LaurentMazare/removing_candle_hub Removing `candle-hub` internal to extract into `hf-hub` standalone.	2023-07-19 17:27:55 +02:00
Laurent Mazare	cb687b4897	Add some more developed training examples. (#199) * Use contiguous tensors for variables. * Sketch the mnist example. * Start adding the reduce ops. * Renaming. * Refactor the reduce operations. * Bugfix for the broadcasting vectorization.	2023-07-19 15:37:52 +01:00
Nicolas Patry	dfd624dbd3	[Proposal] Remove SafeTensor wrapper (allows finer control for users).	2023-07-19 16:25:44 +02:00
Nicolas Patry	439321745a	Removing `candle-hub` internal to extract into `hf-hub` standalone.	2023-07-19 15:04:38 +02:00
Laurent Mazare	ff61a42ad7	Use mkl to accelerate binary ops. (#190) * Vectorized binary ops with mkl. * Improve the binary op mkl support. * Push the support for mkl binary ops. * Proper vectorization of binary ops. * Proper mkl'isation when broadcasting binary ops.	2023-07-18 12:04:39 +01:00
Laurent Mazare	b706f32839	Add Shape try into (#189) * Add the TryInto trait for shapes. * Use the vectorized operations in block mode too.	2023-07-18 10:52:16 +01:00
Laurent Mazare	d6313d2447	Add more tracing details to bert. (#188)	2023-07-18 08:11:05 +01:00
Laurent Mazare	b8abe2bb4b	Factorize the tokenizers version in the workspace cargo def. (#186)	2023-07-18 06:48:13 +01:00
Laurent Mazare	f0cccd08f0	Bert tracing (#184) * Add some tracing to bert. * More tracing. * Add a flag for tracing.	2023-07-17 19:40:42 +01:00
Laurent Mazare	104f89df31	Centralize the dependency versions and inherit them. (#177)	2023-07-16 07:47:17 +01:00
Laurent Mazare	66750f9827	Add some 'cuda-if-available' helper function. (#172)	2023-07-15 08:25:15 +01:00
Nicolas Patry	4ed56d7861	Removing cuda default. Seems very important for a lot of exploring users usually on laptop without GPUs. Adding more README instructions in a follow up.	2023-07-14 16:52:15 +02:00
Laurent Mazare	a2f72edc0d	Simplify the parameters used by sum and sum_keepdim. (#165)	2023-07-14 08:22:08 +01:00
Laurent Mazare	2bfa791336	Use the same default as pytorch for sum. (#164)	2023-07-13 21:32:32 +01:00
Laurent Mazare	3c02ea56b0	Add a cli argument to easily switch the dtype. (#161)	2023-07-13 19:18:49 +01:00
Laurent Mazare	50b0946a2d	Tensor mutability (#154) * Working towards tensor mutability. * Use a ref-cell to provide tensor mutability.	2023-07-13 11:04:40 +01:00
Laurent Mazare	a3663ce2f2	Encodec forward pass (#153) * Sketch the forward pass for encodec. * Forward pass for the encodec resnet block. * Encodec decoding.	2023-07-13 08:18:39 +01:00
Laurent Mazare	6c75a98ad2	Add the forward pass for the T5 model. (#152) * Add the forward pass for the T5 model. * More t5 forward pass.	2023-07-12 22:02:40 +01:00
Laurent Mazare	ba35d895e7	Sketch the candle-transformers crate. (#147) * Sketch the candle-transformers crate. * Format the empty files.	2023-07-12 13:49:31 +01:00

346 Commits