* Again, set a few extra params.
* Use the appropriate kernel sizes.
* Add all the kernel sizes.
* Parallel compiling.
* Reduce the amount of parallelism.
* Add the missing kernel.
* Fix a typo.
* Remove bf16 support for now.
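The kernel-size items above refer to the per-head-dimension forward kernels from the flash-attn v2 sources. Below is a sketch of how head dimensions might map onto those kernels; the file names follow the Dao-AILab naming scheme, but the exact set and the rounding rule are assumptions, and only fp16 variants appear since bf16 was dropped for now.

```rust
// Sketch of mapping head dimensions to the per-size forward kernels from
// the flash-attn v2 sources. Only fp16 variants are listed since bf16
// support was dropped for now; file names and the rounding rule are
// assumptions, not the exact set compiled by the crate.
const SUPPORTED_HEAD_DIMS: &[usize] = &[32, 64, 96, 128, 160, 192, 224, 256];

/// Round a model's head dimension up to the nearest supported kernel size.
fn kernel_head_dim(head_dim: usize) -> Option<usize> {
    SUPPORTED_HEAD_DIMS.iter().copied().find(|&d| d >= head_dim)
}

/// Name of the .cu file that implements the forward kernel for `head_dim`.
fn kernel_file(head_dim: usize) -> Option<String> {
    kernel_head_dim(head_dim).map(|d| format!("flash_fwd_hdim{d}_fp16_sm80.cu"))
}

fn main() {
    // LLaMA-7B uses 128-dimensional heads, so it maps to the hdim128 kernel.
    assert_eq!(
        kernel_file(128).as_deref(),
        Some("flash_fwd_hdim128_fp16_sm80.cu")
    );
    // Odd sizes round up to the next supported kernel (here, hdim96).
    println!("{:?}", kernel_file(80));
}
```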
* Proper flash-attn parameters.
* Set the flash attention parameters.
* Add more validations.
* Set up the o_ flash attn parameters.
* More flash-attn support.
* Set more flash attn parameters.
* Add some flash-attn kernels, importing the flash-attn v2 code from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
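Several of the items above are about wiring up the forward-pass parameters (including the o_ output fields). As a rough illustration, a host-side parameter block modeled on flash-attn v2's `Flash_fwd_params` could look like the sketch below; the field names are assumptions rather than the crate's actual FFI bindings.

```rust
// Illustrative host-side parameter block for the flash-attn v2 forward
// kernel. Field names are modeled on the upstream Flash_fwd_params struct;
// this shows the shape of the data, not the crate's actual FFI definition.
use std::ffi::c_void;

#[allow(dead_code)]
#[repr(C)]
pub struct FlashFwdParams {
    // Device pointers to the query / key / value / output buffers.
    pub q_ptr: *const c_void,
    pub k_ptr: *const c_void,
    pub v_ptr: *const c_void,
    pub o_ptr: *mut c_void,
    // Strides (in elements) between rows and between heads of each tensor.
    pub q_row_stride: u32,
    pub k_row_stride: u32,
    pub v_row_stride: u32,
    pub o_row_stride: u32,
    pub q_head_stride: u32,
    pub k_head_stride: u32,
    pub v_head_stride: u32,
    pub o_head_stride: u32,
    // Problem sizes.
    pub b: u32,        // batch size
    pub h: u32,        // number of query heads
    pub h_k: u32,      // number of key/value heads (multi-query attention)
    pub d: u32,        // head dimension
    pub seqlen_q: u32,
    pub seqlen_k: u32,
    // Softmax scaling, usually 1 / sqrt(head_dim).
    pub softmax_scale: f32,
    // Whether to apply a causal mask.
    pub is_causal: bool,
}

fn main() {
    // The usual scaling factor for 128-dimensional heads.
    let softmax_scale = 1f32 / 128f32.sqrt();
    println!("softmax scale for d=128: {softmax_scale}");
}
```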
* Get things to compile locally.
* Move the flash attention files to a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
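The build-related items above (compiling the kernels into a static C library with nvcc, caching, parallel compilation) could be handled by a `build.rs` along these lines. The sketch relies on the `cc` crate's CUDA support, and the file names and flags are illustrative; it is not the repository's actual build script.

```rust
// build.rs sketch: compile the flash-attn CUDA kernels into a static
// library with nvcc via the `cc` crate (its "parallel" feature builds the
// kernels concurrently). Illustrative stand-in only; file names and flags
// are assumptions.
fn main() {
    let kernels = [
        "kernels/flash_api.cu",
        "kernels/flash_fwd_hdim64_fp16_sm80.cu",
        "kernels/flash_fwd_hdim128_fp16_sm80.cu",
    ];
    for k in &kernels {
        // Only rebuild when a kernel source changes (cheap caching).
        println!("cargo:rerun-if-changed={k}");
    }
    cc::Build::new()
        .cuda(true) // drive the compilation through nvcc
        .flag("-O3")
        .flag("-arch=sm_80")
        .flag("--expt-relaxed-constexpr")
        .files(kernels)
        // Produces the static library and emits the link instructions.
        .compile("flashattn");
}
```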
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
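The feature-gating items suggest a pattern like the one below, where the CUDA-backed path only exists when an assumed `flash-attn` Cargo feature is enabled and the crate otherwise falls back to a CPU placeholder, so the default workspace build never needs nvcc. The feature name and module layout are illustrative.

```rust
// Sketch: gate the flash-attn path behind a Cargo feature (declared as
// `flash-attn = []` under [features]) so the default build does not need
// nvcc. The feature name and function bodies are assumptions.

#[cfg(feature = "flash-attn")]
mod attn {
    /// Placeholder for the CUDA-backed flash-attention call.
    pub fn forward(_q: &[f32], _k: &[f32], _v: &[f32]) -> Vec<f32> {
        unimplemented!("call into the nvcc-built static library here")
    }
}

#[cfg(not(feature = "flash-attn"))]
mod attn {
    /// CPU fallback so the crate still builds without the feature.
    pub fn forward(q: &[f32], _k: &[f32], _v: &[f32]) -> Vec<f32> {
        // Placeholder: a real fallback would compute softmax(QK^T/sqrt(d))V.
        q.to_vec()
    }
}

fn main() {
    let out = attn::forward(&[0.1, 0.2], &[0.3, 0.4], &[0.5, 0.6]);
    println!("attention output: {out:?}");
}
```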
* Enable flash attention in llama.
* Use flash attention in llama.
* Refactor the llama example to bring it more in line with the other examples.
* Make clippy happy.
* Properly load the safetensor weights.
* Get llama back to a working state for the safetensors case.
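For the safetensors-related items, loading the weights boils down to memory-mapping the `.safetensors` file and deserializing its header. The sketch below uses the `safetensors` and `memmap2` crates directly with a placeholder file name; it is not the llama example's actual loading code.

```rust
// Sketch of loading weights from a .safetensors file by memory-mapping it
// and deserializing the header. The file name is a placeholder.
use memmap2::Mmap;
use safetensors::SafeTensors;
use std::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("model.safetensors")?;
    // Safety: the file must not be modified while the map is alive.
    let mmap = unsafe { Mmap::map(&file)? };
    let tensors = SafeTensors::deserialize(&mmap)?;

    // List the stored tensors; a real loader would turn each view into the
    // framework's tensor type instead of just printing metadata.
    for (name, view) in tensors.tensors() {
        println!("{name}: {:?} {:?}", view.dtype(), view.shape());
    }
    Ok(())
}
```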
- `api::Api` -> `api::tokio::Api` (and created a new `api::sync::Api`).
- Remove `tokio` from all our examples.
- Use a similar codebase for now instead of ureq (for simplicity).
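The hub API notes above describe splitting the client into an async `api::tokio::Api` and a new blocking `api::sync::Api`, which is what lets the examples drop tokio. A usage sketch for the blocking variant, assuming the published hf-hub crate's shape and with placeholder repo and file names:

```rust
// Sketch of fetching a file through the blocking hub API, so the examples
// no longer need a tokio runtime. Repo and file names are placeholders.
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = Api::new()?;
    let repo = api.model("meta-llama/Llama-2-7b-hf".to_string());
    // Downloads (or reuses the cached copy of) the tokenizer file and
    // returns the local path.
    let tokenizer_path = repo.get("tokenizer.json")?;
    println!("tokenizer at {}", tokenizer_path.display());
    Ok(())
}
```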
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
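The mkl items put the Intel MKL backend behind an optional Cargo feature. One common pattern, sketched below with an assumed feature name `mkl` and the `intel-mkl-src` crate declared as an optional dependency (`mkl = ["dep:intel-mkl-src"]`), is to pull in the MKL link-time dependency only when the feature is enabled:

```rust
// Sketch of optional MKL support: when the (assumed) `mkl` feature is on,
// link the MKL runtime by pulling in the intel-mkl-src crate; otherwise the
// crate builds with its default compute path.
#[cfg(feature = "mkl")]
extern crate intel_mkl_src;

fn backend() -> &'static str {
    // Conditional compilation picks the label (and, in real code, the
    // kernels) at build time based on the feature flag.
    if cfg!(feature = "mkl") {
        "mkl"
    } else {
        "default"
    }
}

fn main() {
    println!("compute backend: {}", backend());
}
```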