candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 18:48:51 +00:00

Author	SHA1	Message	Date
Nicolas Patry	40c80bfbb2	Merge branch 'main' into update_multiprocess	2023-07-29 16:38:35 +02:00
Laurent Mazare	07eb899729	More mnist training. (#275)	2023-07-29 13:29:31 +01:00
Laurent Mazare	c0a8ed19eb	Support for where-cond on cuda for u8 and u32. (#274)	2023-07-29 11:48:58 +01:00
Laurent Mazare	4bf2ebf836	Use u8 tensors for masks. (#273)	2023-07-29 11:32:58 +01:00
Nicolas Patry	97d8712ba5	Remove single function.	2023-07-28 23:31:25 +02:00
Nicolas Patry	97181a77c0	Making multiprocess require flash-attn.	2023-07-28 23:31:24 +02:00
Laurent Mazare	50d8273ae4	Support both llama v1 and llama v2. (#272)	2023-07-28 18:40:59 +01:00
Laurent Mazare	7513a5e005	Line-up the llama implementation with the python-transformers one. (#271) * Line-up the llama implementation with the python-transformers one. * Also lineup the multiprocess version.	2023-07-28 18:31:28 +01:00
Laurent Mazare	cb8dd5cd53	Back to using the main branch now that the PR has been merged. (#270)	2023-07-28 16:22:44 +01:00
Laurent Mazare	a0e47aba98	Fix the revision used in starcoder to use the safetensors PR. (#269)	2023-07-28 14:02:31 +01:00
Laurent Mazare	fb84ead8f7	Add the starcoder example to the readme. (#268) * Add the starcoder example to the readme. * Tweak.	2023-07-28 13:26:23 +01:00
Laurent Mazare	3eb2bc6d07	Softmax numerical stability. (#267) * Softmax numerical stability. * Fix the flash-attn test.	2023-07-28 13:13:01 +01:00
Laurent Mazare	68eab38de6	Cuda fix for starcoder. (#266) * Cuda fix for starcoder. * Nicer output.	2023-07-28 12:13:41 +01:00
Nicolas Patry	54ccf94472	Merge pull request #265 from LaurentMazare/fix_nccl Fix nccl	2023-07-28 11:37:58 +01:00
Nicolas Patry	4002968cf5	Put back `"dep:half"	2023-07-28 10:34:21 +00:00
Nicolas Patry	be256a6ba6	Fixing.	2023-07-28 10:23:05 +00:00
Nicolas Patry	d2dea11ef6	Fixing nccl feature.	2023-07-28 12:19:20 +02:00
Laurent Mazare	3e89df938c	Starcoder fix (#264) * Bugfix for starcoder. * Get some proper code generation. * Slightly simpler softmax.	2023-07-28 11:17:49 +01:00
Laurent Mazare	6a54ca115e	Add some Bigcode model (#260) * Start sketching the bigcode gpt model. * Sketch the bigcode model. * Implement the attention mechanism. * Random reshaping. * Sketch more of the example. * Add some kv cache. * Properly generate the position ids. * Proper attention mask. * Bail on upcasting. * Properly apply the attention mask. * Add the smaller starcoder variants. * Update for the new hub api. * Fix a shape issue. * Fix another shape issue. * Get some logits out. * Adjust the weigth names.	2023-07-28 09:57:32 +01:00
Nicolas Patry	4f260ef025	Merge pull request #216 from LaurentMazare/llama_multiprocess2 TP sharding v2	2023-07-28 08:06:13 +01:00
Nicolas Patry	0b97987b21	Merge pull request #261 from LaurentMazare/upgrade_hf_hub Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time)	2023-07-28 07:03:30 +01:00
Nicolas Patry	8435a99edd	Added comment about offsets.	2023-07-27 20:11:57 +02:00
Nicolas Patry	ca479a873e	Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time)	2023-07-27 20:05:02 +02:00
Nicolas Patry	952eca6b54	Fixing slice errors + comments.	2023-07-27 16:59:32 +02:00
Laurent Mazare	f291065f6c	Do not panic on empty ranges. (#257)	2023-07-27 09:28:47 +01:00
Nicolas Patry	25a2086e8f	Putting back Send + Sync	2023-07-27 09:58:47 +02:00
Nicolas Patry	7c7e6ba201	Removing inner dependency on safetensors.	2023-07-27 09:58:47 +02:00
Nicolas Patry	1553b58fe5	Tensor are not necessarily sendable (CustomOp1).	2023-07-27 09:58:47 +02:00
Nicolas Patry	b7814f66b4	PyO3 is back.	2023-07-27 09:58:47 +02:00
Nicolas Patry	ed58de7551	Fixed TP sharded version.	2023-07-27 09:58:46 +02:00
Nicolas Patry	1735e4831e	TP sharding v2	2023-07-27 09:58:14 +02:00
Laurent Mazare	209f06d7c3	Micro-cleanup. (#256)	2023-07-27 07:55:54 +01:00
Laurent Mazare	6475bfadfe	Simplify Tensor::randn. (#255) * Simplify Tensor::randn. * Also switch Tensor::rand to use a generic dtype. * Support sampling for f16. * Cleanup.	2023-07-27 07:40:36 +01:00
Laurent Mazare	89ba005962	Support backprop for a few more ops. (#254)	2023-07-26 21:31:54 +01:00
Laurent Mazare	4f92420132	Add some flash attn test (#253) * Add some flash-attn test. * Add the cpu test. * Fail when the head is not a multiple of 8. * Polish the flash attention test.	2023-07-26 20:56:00 +01:00
Nicolas Patry	ded197497c	Merge pull request #252 from LaurentMazare/add_book Adding a cargo book	2023-07-26 17:35:54 +01:00
Laurent Mazare	84ad558e50	Switch to using llama-v2 by default. (#251)	2023-07-26 17:18:27 +01:00
Nicolas Patry	368f169c6a	Permissions.	2023-07-26 18:12:02 +02:00
Nicolas Patry	8da6568c20	Typo.	2023-07-26 18:11:10 +02:00
Nicolas Patry	07a22fe606	Releasing within the branch to test the setup.	2023-07-26 18:08:34 +02:00
Nicolas Patry	834e1b197b	Adding a documentation book.	2023-07-26 18:06:31 +02:00
Laurent Mazare	89fd988836	Update to the latest gemm. (#250)	2023-07-26 17:00:02 +01:00
Laurent Mazare	1235aa2536	Use bail rather than wrapping a string where possible. (#249) * Use bail rather than wrapping a string where possible. * Revert the cuda default bit.	2023-07-26 15:42:46 +01:00
Laurent Mazare	f052ba76cb	Lining up the flash attn version with the non-flash one. (#248) * Move the flash-attn function in the proper crate. * Causality tweak.	2023-07-26 15:11:45 +01:00
Nicolas Patry	46f2d9f0ac	Merge pull request #247 from LaurentMazare/add_number_of_tokens Add number of tokens.	2023-07-26 14:45:53 +01:00
Nicolas Patry	81bfa46702	Updated.	2023-07-26 15:21:50 +02:00
Nicolas Patry	8b1d12bead	Merge pull request #246 from LaurentMazare/rename_custom_op Rename exposed ops.	2023-07-26 14:20:29 +01:00
Nicolas Patry	035372248e	Simple QOL. - Add ms/token on llama2.c (15ms/token on my personal machine) - Hide `Run` buttons while models are not ready - Add dummy `progress` while weights are downloading (I briefly looked at putting a real progressbar.. and nothing easy enough came up.)	2023-07-26 15:17:32 +02:00
Laurent Mazare	2ce5f12513	Again set a few extra params in flash-attn. (#245) * Again set a few extra params. * Use the appropriate kernel sizes. * Add all the kernel sizes. * Parallel compiling. * Reduce the amount of parallelism. * Add the missing kernel. * Fix a typo. * Remove bf16 support for now.	2023-07-26 14:16:37 +01:00
Nicolas Patry	97990f4afc	Add number of tokens.	2023-07-26 14:57:20 +02:00

675 Commits