candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	37c539f2b7	Helper function to load sharded safetensors files (#1481 ) * Fix the quantized mistral example. * Add a helper function to load sharded safetensors weights. * Use the sharded loader.	2023-12-25 21:49:21 +01:00
Laurent Mazare	5b35fd0fcf	MMLU evaluation for Phi. (#1474 ) * MMLU evaluation for Phi. * Improve the evaluation.	2023-12-23 15:28:36 +01:00
Nicolas Patry	9fc210fae8	Merge pull request #1318 from huggingface/metal4 Starting to fix some tests.	2023-12-20 15:37:31 +01:00
Laurent Mazare	94817dac56	Bump the crate version to 0.3.2. (#1452 )	2023-12-17 05:34:53 -06:00
Nicolas Patry	4349ff1fc2	Starting to fix some tests. Few fixes. Going back on remote metal-rs. Reusing a single buffer (for now) to speed things up. Adding some half kernels. All tests are panicking instead of random failure. Putting back f16 index select. Add erf. Working version for llama2-c. Fixes + cache compute_pipeline_state. BF16 metal fix. Remove some prints. new_owned -> new()..to_owned(). Better batched matmul. Metal operational. Reuse buffers on our own reference counts. Tmp gemm. Revert "Tmp gemm." This reverts commit `c65f68e988`. Interleave committing. Speeding up copies using blit. Fmt. Fmt. Remove the assert! Fmt all. Fixes after big rebase. Add softmax for half and bfloat + tests Fixing Llama example + accumulate softmax in float.	2023-11-30 11:30:31 +01:00
Laurent Mazare	a209ce8ceb	Update for 0.3.1. (#1324 )	2023-11-11 18:48:52 +00:00
Laurent Mazare	a773a4b22b	[ONNX] Support a couple more ops. (#1284 ) * Support the shape op in ONNX. * Share the axis normalization bits. * Add some limited support for gather. * Unsqueeze. * Comparison with broadcasting. * Add Not + handle i32.	2023-11-06 22:44:58 +01:00
Laurent Mazare	2a45bcf943	Put the onnx example behind a feature flag. (#1276 ) * Put the onnx example behind a feature flag. * Exclude the onnx bits from the workspace. * README tweaks.	2023-11-06 07:45:07 +01:00
Laurent Mazare	928a9d906e	[ONNX] Do not generate values for constants. (#1272 ) * Do not generate values for constants. * Add an onnx based example using squeezenet.	2023-11-05 11:23:14 +01:00
Lukas Kreussel	174b208052	PyO3: Better shape handling (#1143 ) * Negative and `args` shape handling Rename to `PyShapeWithHole` + validate that only one hole exists * Regenerate stubs --------- Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>	2023-10-29 15:41:44 +00:00
Laurent Mazare	29c7f2565d	Add some reinforcement learning example. (#1090 ) * Add some reinforcement learning example. * Python initialization. * Get the example to run. * Vectorized gym envs for the atari wrappers. * Get some simulation loop to run.	2023-10-14 16:46:43 +01:00
Laurent Mazare	096dee7073	Bump the version to 0.3.0. (#1014 ) * Bump the version to 0.3.0. * Changelog update.	2023-10-01 13:51:57 +01:00
Laurent Mazare	06207332bc	Streaming mode for reporting the generated tokens (#1007 ) * Token streaming. * Use the token output stream. * Flush the output. * Ensure that the last characters get reported.	2023-09-30 15:04:11 +01:00
Laurent Mazare	bcb0ed8f1c	Self-contained safetensors for the multiprocess llama example. (#950 )	2023-09-24 06:54:49 +01:00
Laurent Mazare	fb1c2ac535	Add flash-attn support. (#912 ) * Add flash-attn support. * Add the use-flash-attn flag. * Re-enable flash-attn.	2023-09-20 14:07:55 +01:00
Laurent Mazare	7dd8e12472	Bump the crate versions to v0.2.3. (#886 ) * Bump the crate version. * Also update the python bindings.	2023-09-18 12:14:03 +01:00
Laurent Mazare	2257f4d475	Bump the crate version + update the changelog. (#822 )	2023-09-12 06:39:24 +01:00
Laurent Mazare	d3f05eae8c	Move some models to candle-transformers so that it's easier to re-use. (#794 ) * Move some models to candle-transformers so that they can be shared. * Also move falcon. * Move Llama. * Move whisper (partial).	2023-09-10 09:40:27 +01:00
Laurent Mazare	3cd7e7b51d	Fuse the rel-pos additions via a custom-op. (#786 ) * Fuse the rel-pos additions via a custom-op. * Run with rayon. * Add more tracing.	2023-09-09 10:46:09 +01:00
Laurent Mazare	618f4e4c78	Add some documentation. (#673 ) * Add some documentation. * Bump the crate version.	2023-08-30 11:54:00 +01:00
Laurent Mazare	a3f97c143d	Bump the crate version + update CHANGELOG. (#628 )	2023-08-27 18:17:11 +01:00
Laurent Mazare	0afbc435df	Add some configurable legend for yolo detection. (#603 ) * Add some configurable legend for yolo detection. * Clippyness.	2023-08-25 13:50:31 +01:00
Laurent Mazare	97909e5068	Move the yolo model bits in a separate file. (#602 ) * Move the yolo model bits in a separate file. * Improve the drawing. * Bugfix.	2023-08-25 12:47:55 +01:00
Laurent Mazare	aba1e90797	Add some group parameter to convolutions. (#566 ) * Add some group parameter to convolutions. * Avoid some unnecessary groups checks. * Move the tensor convolution bits. * Properh handling of groups. * Bump the crate version. * And add a changelog.	2023-08-23 12:58:55 +01:00
Laurent Mazare	a8f61e66cc	Bump the crates version to 0.1.2. (#522 )	2023-08-20 08:07:07 +01:00
Laurent Mazare	b9661a1c25	Enable the image crate by default in examples (#501 ) * Enable the image crate by default so that it's easier to compile the stable diffusion example. * Also update the readme.	2023-08-18 10:00:05 +01:00
Laurent Mazare	531f23b4d0	Rename vec-dot to vec-ops. (#449 ) * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme.	2023-08-15 10:48:57 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445 ) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Nicolas Patry	dece0b8a76	Merge pull request #263 from huggingface/book_3 Book 3 (advanced loading + hub)	2023-08-09 16:50:11 +02:00
Laurent Mazare	3a62aee91f	Write the generated images using the image crate. (#363 ) * Use the image crate to write the generated images. * Make the dependency optional.	2023-08-09 15:26:44 +01:00
Laurent Mazare	b278834267	Support the Accelerate BLAS on macOS. (#325 ) * Add the accelerate feature. * Ffi tweaks.	2023-08-05 17:25:24 +01:00
Laurent Mazare	620f83cf66	Add the candle-datasets crate (#322 ) * Move the vision datasets to a separate crate. * Move the batcher bits. * Update the readme. * Move the tiny-stories bits. --------- Co-authored-by: Jane Doe <jane.doe@example.org>	2023-08-05 08:56:50 +01:00
Nicolas Patry	c11e78b334	Odd rebase artifact.	2023-08-02 18:40:24 +02:00
Nicolas Patry	1b705a426f	Remove duplicate.	2023-08-02 18:40:24 +02:00
Nicolas Patry	a44471a305	Adding more details on how to load things. - Loading with memmap - Loading a sharded tensor - Moved some snippets to `candle-examples/src/lib.rs` This is because managing book specific dependencies is a pain https://github.com/rust-lang/mdBook/issues/706 - This causes a non aligned inclusion https://github.com/rust-lang/mdBook/pull/1856 which we have to ignore fmt to remove. mdbook might need some more love :)	2023-08-02 18:40:24 +02:00
Laurent Mazare	4fe8a02f88	Update the repo location. (#305 )	2023-08-02 11:12:18 +01:00
Laurent Mazare	d38943aadc	Add version numbers for all the candle crates (#303 ) * Switch to candle-gemm for the time being. * Add the missing versions.	2023-08-02 10:52:13 +01:00
Laurent Mazare	51e51da896	Rename the candle crate to candle-core (#301 ) * Rename to candle-core. * More candle-core renaming.	2023-08-02 08:20:22 +01:00
Laurent Mazare	a27239f3d9	Add training for the llama2.c example (#296 ) * Rework the commands and run inference by default. * Add the training module and load the training dataset. * Random dataset iterator. * Proper valid-loss computation. * Compute the evaluation loss. * Add more substance to the training loop.	2023-08-01 17:23:07 +01:00
Laurent Mazare	38ff693af0	Add a flag to save the trained weights. (#279 )	2023-07-30 15:41:42 +01:00
Nicolas Patry	97181a77c0	Making multiprocess require flash-attn.	2023-07-28 23:31:24 +02:00
Nicolas Patry	4002968cf5	Put back `"dep:half"	2023-07-28 10:34:21 +00:00
Nicolas Patry	be256a6ba6	Fixing.	2023-07-28 10:23:05 +00:00
Nicolas Patry	d2dea11ef6	Fixing nccl feature.	2023-07-28 12:19:20 +02:00
Nicolas Patry	1735e4831e	TP sharding v2	2023-07-27 09:58:14 +02:00
Laurent Mazare	d9f9c859af	Add flash attention (#241 ) * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.	2023-07-26 07:48:10 +01:00
Laurent Mazare	35b65fed88	Add llama2.c as an example. (#229 ) * Start adding llama2.c. * Model loading. * Add the llama-v2 model. * Start converting the weights. * Rotary embedding tweaks. * Get the model to generate some tokens.	2023-07-24 09:13:50 +01:00
Laurent Mazare	b8a10425ad	Kernel build example (#224 ) * Build example kernels. * Add some sample custom kernel. * Get the example kernel to compile. * Add some cuda code. * More cuda custom op. * More cuda custom ops.	2023-07-23 07:15:37 +01:00
Nicolas Patry	439321745a	Removing `candle-hub` internal to extract into `hf-hub` standalone.	2023-07-19 15:04:38 +02:00
Laurent Mazare	b8abe2bb4b	Factorize the tokenizers version in the workspace cargo def. (#186 )	2023-07-18 06:48:13 +01:00

1 2

63 Commits