candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-17 19:18:50 +00:00

Author	SHA1	Message	Date
Laurent Mazare	bfa7c8fc01	Implement the module trait directly for QMatMul. (#1372)	2023-11-25 10:09:45 +00:00
Nicolas Patry	df6814f34e	Refactor to simplify our lives for settings the params in the encoder.	2023-11-20 14:12:57 +01:00
Nicolas Patry	26c4e5bf1d	Metal part 1 - Scaffolding for metal. (#1308) * Metal part 1 - Scaffolding for metal. * Remove tracing.	2023-11-10 08:35:48 +01:00
Laurent Mazare	55bc3382cf	Allow for different behavior between training and eval (#1213) * Forward with training. * Do not use dropout on vgg evaluation.	2023-10-29 07:53:09 +01:00
Laurent Mazare	130fe5a087	Add the upblocks. (#853)	2023-09-14 22:24:56 +01:00
Laurent Mazare	c88d6fd4b9	Remove set_training. (#784)	2023-09-09 08:27:37 +01:00
Laurent Mazare	acf8f10ae1	Get the comparison operation to work on scalar values. (#780) * Get the comparison operation to work on scalar values. * Add some time measurement.	2023-09-08 20:13:29 +01:00
Laurent Mazare	2d3fcad267	Simplify usage of the pool functions. (#662) * Simplify usage of the pool functions. * Small tweak. * Attempt at using apply to simplify the convnet definition.	2023-08-29 19:12:16 +01:00
Laurent Mazare	5320aa6b7d	Move the test-utils bits to a shared place. (#619)	2023-08-27 09:42:22 +01:00
Laurent Mazare	9c8d6dbc2a	Neon intrinsics for the q8_0 vecdot. (#604) * Neon intrinsics for the q8_0 vecdot. * Get the tests to run with accelerate (with some numerical error failures).	2023-08-25 14:42:18 +01:00
Laurent Mazare	ad33715c61	Preliminary support for importing PyTorch weights. (#511) * Pickle work-in-progress. * More unpickling. * More pickling. * Proper handling of setitems. * Clippy. * Again more pickling. * Restore the example. * Add enough pickle support to get the list of tensors. * Read the data from zip files. * Retrieve the tensor shape. * Extract the size and dtype. * More storage types. * Improve the destructuring.	2023-08-19 11:26:32 +01:00
Laurent Mazare	e68b2accb4	Split out the quantized file. (#456)	2023-08-15 20:26:27 +01:00
Laurent Mazare	495e0b7580	Simd support (#448) * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints.	2023-08-15 09:50:38 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Laurent Mazare	c8039579a5	Conv1d optimize (#392) * Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints.	2023-08-10 15:23:52 +01:00
Laurent Mazare	b278834267	Support the Accelerate BLAS on macOS. (#325) * Add the accelerate feature. * Ffi tweaks.	2023-08-05 17:25:24 +01:00
Laurent Mazare	e635f18eda	Initial support for reading ggml files. (#311) * Start adding support for reading ggml files. * Compute the proper tensor size. * Print the read tensors. * Fix file reading.	2023-08-02 21:59:02 +01:00
Laurent Mazare	0902846f25	Add the AdamW optimizer. (#307) * Add the AdamW optimizer. * Add some AdamW test validated against PyTorch.	2023-08-02 14:03:49 +01:00
Laurent Mazare	51e51da896	Rename the candle crate to candle-core (#301) * Rename to candle-core. * More candle-core renaming.	2023-08-02 08:20:22 +01:00
Laurent Mazare	a27239f3d9	Add training for the llama2.c example (#296) * Rework the commands and run inference by default. * Add the training module and load the training dataset. * Random dataset iterator. * Proper valid-loss computation. * Compute the evaluation loss. * Add more substance to the training loop.	2023-08-01 17:23:07 +01:00
Laurent Mazare	6475bfadfe	Simplify Tensor::randn. (#255) * Simplify Tensor::randn. * Also switch Tensor::rand to use a generic dtype. * Support sampling for f16. * Cleanup.	2023-07-27 07:40:36 +01:00
Laurent Mazare	d9f9c859af	Add flash attention (#241) * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.	2023-07-26 07:48:10 +01:00
Laurent Mazare	23827c49cd	Cleanup some todos. (#226) * Cleanup some todos. * Fix more todo. * Optimize for the contiguous case. * Add the IntDType trait. * Handle the intdtype trait for more ops. * Remove a todo. * Remove a todo.	2023-07-23 16:00:00 +01:00
Laurent Mazare	a6bcdfb269	Custom ops with a single argument (#214) * Add the CustomOp1 trait. * Add an example of custom op. * Polish the custom op example. * Add some backward pass test for custom ops.	2023-07-21 15:18:05 +01:00
Laurent Mazare	4845d5cc64	More realistic training setup. (#210) * More realistic training setup. * Compute the model accuracy. * Very inefficient backprop for index select. * More backprop. * Fix some backprop issues. * Backprop fix. * Another broadcasting backprop fix. * Better backprop for reducing ops. * Training again. * Add some gradient tests. * Get the training to work.	2023-07-20 18:25:41 +01:00
Laurent Mazare	acb2f90469	Broadcasting performance optimization (cpu) (#182) * Avoid recomputing the index from scratch each time. * More performance optimisations.	2023-07-17 13:41:09 +01:00
Laurent Mazare	18ea92d83b	Iteration over strided blocks (#175) * Introduce the strided blocks. * Use the strided blocks to fasten the copy. * Add more testing.	2023-07-15 21:30:35 +01:00
Laurent Mazare	ded93a1169	Add the SGD optimizer (#160) * Add the nn::optim and some conversion traits. * Add the backward_step function for SGD. * Get the SGD optimizer to work and add a test. * Make the test slighly simpler.	2023-07-13 19:05:44 +01:00
Laurent Mazare	5ee3c95582	Move the variable creation to the variable module. (#159) * Move the variable creation to the variable module. * Make it possible to set a variable. * Add some basic gradient descent test. * Get the gradient descent test to work.	2023-07-13 16:55:40 +01:00
Laurent Mazare	6991036bc5	Introduce the variables api used for adjusting parameters during the training loop. (#158) * Add the variable api. * And add a comment.	2023-07-13 14:09:51 +01:00
Laurent Mazare	20599172ac	Add from_iter and arange, use it in the doctests. (#145)	2023-07-12 12:03:01 +01:00
Laurent Mazare	fa760759e5	Allow for lazy loading of npz files, use it in llama to reduce memory usage in the cpu version. (#141)	2023-07-11 20:22:34 +01:00
Laurent Mazare	64264d97c1	Modular backends (#138) * Add some trait to formalize backends. * Use the generic backend trait.	2023-07-11 11:17:02 +01:00
Nicolas Patry	fba07d6b6b	Merge pull request #127 from LaurentMazare/tensor_indexing `i(..)` indexing sugar (partial).	2023-07-10 19:56:34 +02:00
Nicolas Patry	ef0375d8bc	`i(..)` indexing sugar (partial). - Only range, and select (no tensor_select) - No negative indexing	2023-07-10 17:34:04 +02:00
Laurent Mazare	e2807c78a4	Enable the doctests to run with mkl (though they are broken for now). (#126)	2023-07-10 16:27:46 +01:00
Laurent Mazare	548b1df7ea	Remove the dependency to blas and use mkl directly. (#125)	2023-07-10 15:52:03 +01:00
Nicolas Patry	868743b8b9	Expanding a bit the README	2023-07-10 12:51:37 +02:00
laurent	2c3d871b2e	Add a simpler way to specify the dim index for some ops.	2023-07-05 20:22:43 +01:00
laurent	a424d95473	Add more of the conv1d op.	2023-07-04 11:15:45 +01:00
laurent	cf2789fb81	Move some safetensors bits in the candle-core crate.	2023-07-03 08:37:46 +01:00
laurent	c1bbbf94f6	Start refactoring the stride.	2023-06-28 12:57:30 +01:00
laurent	8c81a70170	PyTorch like display implementation.	2023-06-27 21:16:35 +01:00
laurent	1d504cc6b3	Rework the debug trait.	2023-06-27 19:10:30 +01:00
laurent	ca6aa8ff12	Use num-cpus to enable parallelism.	2023-06-27 14:42:26 +01:00
Nicolas Patry	d7f729fb8f	Refactor the hierarchy.	2023-06-27 11:57:27 +02:00

46 Commits