Commit Graph

63 Commits

SHA1 Message Date
37c539f2b7 Helper function to load sharded safetensors files (#1481)
* Fix the quantized mistral example.

* Add a helper function to load sharded safetensors weights.

* Use the sharded loader.
2023-12-25 21:49:21 +01:00
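For context, loading sharded safetensors in candle typically goes through `candle_nn::VarBuilder`. A minimal sketch assuming the current candle-nn API (`load_sharded` is a hypothetical name, not the helper added in #1481):

```rust
use candle_core::{DType, Device, Result};
use candle_nn::VarBuilder;

/// Map every shard into one VarBuilder so model code can look weights up
/// by name regardless of which shard they live in.
fn load_sharded(paths: &[std::path::PathBuf], device: &Device) -> Result<VarBuilder<'static>> {
    // `from_mmaped_safetensors` is unsafe because the mapped files must not
    // be mutated while the returned VarBuilder is alive.
    unsafe { VarBuilder::from_mmaped_safetensors(paths, DType::F16, device) }
}
```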
5b35fd0fcf MMLU evaluation for Phi. (#1474)
* MMLU evaluation for Phi.

* Improve the evaluation.
2023-12-23 15:28:36 +01:00
9fc210fae8 Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
94817dac56 Bump the crate version to 0.3.2. (#1452) 2023-12-17 05:34:53 -06:00
4349ff1fc2 Starting to fix some tests.
A few fixes.

Going back on remote metal-rs.

Reusing a single buffer (for now) to speed things up.

Adding some half kernels.

All tests now panic instead of failing randomly.

Putting back f16 index select.

Add erf.

Working version for llama2-c.

Fixes + cache compute_pipeline_state.

BF16 metal fix.

Remove some prints.

new_owned -> new()..to_owned().

Better batched matmul.

Metal operational.

Reuse buffers on our own reference counts.

Tmp gemm.

Revert "Tmp gemm."

This reverts commit c65f68e988.

Interleave committing.

Speeding up copies using blit.

Fmt.

Fmt.

Remove the assert!

Fmt all.

Fixes after big rebase.

Add softmax for half and bfloat + tests

Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
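The last item above, accumulating the softmax in float, is a standard mixed-precision trick: exponentials and their sum are computed in f32 even when inputs and outputs are f16. A minimal CPU illustration using the `half` crate (the Metal kernel in the commit does the equivalent on the GPU):

```rust
use half::f16;

/// Softmax over an f16 slice, accumulating in f32 for numerical stability.
fn softmax_f16(xs: &[f16], ys: &mut [f16]) {
    let max = xs.iter().fold(f32::NEG_INFINITY, |m, x| m.max(x.to_f32()));
    let mut sum = 0f32;
    for (y, x) in ys.iter_mut().zip(xs) {
        let e = (x.to_f32() - max).exp();
        sum += e;
        *y = f16::from_f32(e);
    }
    for y in ys.iter_mut() {
        *y = f16::from_f32(y.to_f32() / sum);
    }
}
```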
a209ce8ceb Update for 0.3.1. (#1324) 2023-11-11 18:48:52 +00:00
a773a4b22b [ONNX] Support a couple more ops. (#1284)
* Support the shape op in ONNX.

* Share the axis normalization bits.

* Add some limited support for gather.

* Unsqueeze.

* Comparison with broadcasting.

* Add Not + handle i32.
2023-11-06 22:44:58 +01:00
2a45bcf943 Put the onnx example behind a feature flag. (#1276)
* Put the onnx example behind a feature flag.

* Exclude the onnx bits from the workspace.

* README tweaks.
2023-11-06 07:45:07 +01:00
928a9d906e [ONNX] Do not generate values for constants. (#1272)
* Do not generate values for constants.

* Add an onnx based example using squeezenet.
2023-11-05 11:23:14 +01:00
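Running such an ONNX model with candle-onnx boils down to parsing the protobuf and evaluating the graph. A sketch against the current `candle-onnx` API; the input name "data" is an assumption that depends on the exported model:

```rust
use std::collections::HashMap;

use candle_core::{Result, Tensor};

fn run_onnx(image: Tensor) -> Result<Tensor> {
    let model = candle_onnx::read_file("squeezenet1.1.onnx")?;
    let mut inputs = HashMap::new();
    inputs.insert("data".to_string(), image);
    // simple_eval walks the graph node by node and returns the named outputs.
    let mut outputs = candle_onnx::simple_eval(&model, inputs)?;
    let graph = model.graph.as_ref().unwrap();
    Ok(outputs.remove(&graph.output[0].name).unwrap())
}
```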
174b208052 PyO3: Better shape handling (#1143)
* Negative and `*args` shape handling

* Rename to `PyShapeWithHole` + validate that only one hole exists

* Regenerate stubs

---------

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
2023-10-29 15:41:44 +00:00
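The "only one hole" rule above is the usual reshape convention: at most one dimension may be -1, and it is inferred from the element count. A sketch of that validation, independent of the PyO3 bindings themselves:

```rust
/// Resolve a shape that may contain one -1 "hole" so that the product of
/// all dimensions equals `numel`. More than one hole is rejected.
fn resolve_shape(dims: &[i64], numel: usize) -> Result<Vec<usize>, String> {
    match dims.iter().filter(|&&d| d == -1).count() {
        0 => {
            let total: usize = dims.iter().map(|&d| d as usize).product();
            if total != numel {
                return Err(format!("shape {dims:?} does not match {numel} elements"));
            }
        }
        1 => {}
        _ => return Err("only one -1 dimension is allowed".to_string()),
    }
    let known: usize = dims.iter().filter(|&&d| d != -1).map(|&d| d as usize).product();
    if known == 0 || numel % known != 0 {
        return Err(format!("cannot fit {numel} elements into {dims:?}"));
    }
    Ok(dims
        .iter()
        .map(|&d| if d == -1 { numel / known } else { d as usize })
        .collect())
}
```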
29c7f2565d Add some reinforcement learning example. (#1090)
* Add some reinforcement learning example.

* Python initialization.

* Get the example to run.

* Vectorized gym envs for the atari wrappers.

* Get some simulation loop to run.
2023-10-14 16:46:43 +01:00
096dee7073 Bump the version to 0.3.0. (#1014)
* Bump the version to 0.3.0.

* Changelog update.
2023-10-01 13:51:57 +01:00
06207332bc Streaming mode for reporting the generated tokens (#1007)
* Token streaming.

* Use the token output stream.

* Flush the output.

* Ensure that the last characters get reported.
2023-09-30 15:04:11 +01:00
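The token output stream solves a real problem: a single character can span several tokens, so printing token-by-token garbles the output. A simplified sketch of the idea (candle's actual `TokenOutputStream` works with the tokenizers crate; the `decode` closure here stands in for it):

```rust
/// Re-decode the whole token buffer at each step and emit only the newly
/// stabilized suffix; the final char is held back since more tokens may
/// still change it. Decode once more at the end of generation to flush it.
struct TokenStream<F: Fn(&[u32]) -> String> {
    decode: F,
    tokens: Vec<u32>,
    printed: usize, // number of bytes already emitted
}

impl<F: Fn(&[u32]) -> String> TokenStream<F> {
    fn push(&mut self, token: u32) -> String {
        self.tokens.push(token);
        let text = (self.decode)(&self.tokens);
        let stable = text.len() - text.chars().last().map_or(0, |c| c.len_utf8());
        if stable <= self.printed {
            return String::new();
        }
        let out = text[self.printed..stable].to_string();
        self.printed = stable;
        out
    }
}
```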
bcb0ed8f1c Self-contained safetensors for the multiprocess llama example. (#950) 2023-09-24 06:54:49 +01:00
fb1c2ac535 Add flash-attn support. (#912)
* Add flash-attn support.

* Add the use-flash-attn flag.

* Re-enable flash-attn.
2023-09-20 14:07:55 +01:00
7dd8e12472 Bump the crate versions to v0.2.3. (#886)
* Bump the crate version.

* Also update the python bindings.
2023-09-18 12:14:03 +01:00
2257f4d475 Bump the crate version + update the changelog. (#822) 2023-09-12 06:39:24 +01:00
d3f05eae8c Move some models to candle-transformers so that it's easier to re-use. (#794)
* Move some models to candle-transformers so that they can be shared.

* Also move falcon.

* Move Llama.

* Move whisper (partial).
2023-09-10 09:40:27 +01:00
3cd7e7b51d Fuse the rel-pos additions via a custom-op. (#786)
* Fuse the rel-pos additions via a custom-op.

* Run with rayon.

* Add more tracing.
2023-09-09 10:46:09 +01:00
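Custom ops are how candle lets user code fuse work the graph would otherwise run as several kernels. A CPU-only sketch of the mechanism, assuming candle's `CustomOp1` trait, with a fused GELU standing in for the actual rel-pos addition (and without the rayon parallelism the commit mentions); it assumes a contiguous f32 input:

```rust
use candle_core::{CpuStorage, CustomOp1, Layout, Result, Shape, Tensor};

struct FusedGelu;

impl CustomOp1 for FusedGelu {
    fn name(&self) -> &'static str {
        "fused-gelu"
    }

    // One pass over the data instead of the half-dozen elementwise kernels
    // a naive graph would launch.
    fn cpu_fwd(&self, storage: &CpuStorage, layout: &Layout) -> Result<(CpuStorage, Shape)> {
        let xs = storage.as_slice::<f32>()?;
        let ys = xs
            .iter()
            .map(|&x| 0.5 * x * (1. + (0.797_884_56 * (x + 0.044715 * x * x * x)).tanh()))
            .collect();
        Ok((CpuStorage::F32(ys), layout.shape().clone()))
    }
}

fn apply(x: &Tensor) -> Result<Tensor> {
    x.apply_op1(FusedGelu)
}
```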
618f4e4c78 Add some documentation. (#673)
* Add some documentation.

* Bump the crate version.
2023-08-30 11:54:00 +01:00
a3f97c143d Bump the crate version + update CHANGELOG. (#628) 2023-08-27 18:17:11 +01:00
0afbc435df Add some configurable legend for yolo detection. (#603)
* Add some configurable legend for yolo detection.

* Clippyness.
2023-08-25 13:50:31 +01:00
97909e5068 Move the yolo model bits in a separate file. (#602)
* Move the yolo model bits in a separate file.

* Improve the drawing.

* Bugfix.
2023-08-25 12:47:55 +01:00
aba1e90797 Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
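With groups > 1 the input channels are split into independent groups, each convolved with its own slice of the kernel, so the kernel's second dimension is `in_channels / groups`. A small usage sketch with the resulting `conv2d` signature:

```rust
use candle_core::{Device, Result, Tensor};

fn grouped_conv() -> Result<Tensor> {
    let dev = Device::Cpu;
    // 8 input channels in 4 groups of 2: the kernel is
    // (out_channels, in_channels / groups, kh, kw) = (8, 2, 3, 3).
    let x = Tensor::randn(0f32, 1., (1, 8, 16, 16), &dev)?;
    let w = Tensor::randn(0f32, 1., (8, 2, 3, 3), &dev)?;
    x.conv2d(&w, /* padding */ 1, /* stride */ 1, /* dilation */ 1, /* groups */ 4)
}
```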
a8f61e66cc Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
b9661a1c25 Enable the image crate by default in examples (#501)
* Enable the image crate by default so that it's easier to compile the stable diffusion example.

* Also update the readme.
2023-08-18 10:00:05 +01:00
531f23b4d0 Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
90374097dc Cudnn support (#445)
* Add a cudnn feature to be used for conv2d.

* Allocate the proper workspace.

* Only create a single cudnn handle per cuda device.

* Proper cudnn usage.

* Bugfix.
2023-08-14 21:30:41 +01:00
dece0b8a76 Merge pull request #263 from huggingface/book_3
Book 3 (advanced loading + hub)
2023-08-09 16:50:11 +02:00
3a62aee91f Write the generated images using the image crate. (#363)
* Use the image crate to write the generated images.

* Make the dependency optional.
2023-08-09 15:26:44 +01:00
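Writing the output with the image crate amounts to flattening a (3, height, width) u8 tensor into HWC bytes. A sketch under that layout assumption (`save_image` is our name, not the example's):

```rust
use candle_core::Tensor;

fn save_image(t: &Tensor, path: &str) -> anyhow::Result<()> {
    let (c, h, w) = t.dims3()?;
    anyhow::ensure!(c == 3, "expected a (3, h, w) rgb tensor");
    // CHW -> HWC, then hand the raw bytes to the image crate.
    let pixels = t.permute((1, 2, 0))?.flatten_all()?.to_vec1::<u8>()?;
    image::save_buffer(path, &pixels, w as u32, h as u32, image::ColorType::Rgb8)?;
    Ok(())
}
```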
b278834267 Support the Accelerate BLAS on macOS. (#325)
* Add the accelerate feature.

* Ffi tweaks.
2023-08-05 17:25:24 +01:00
620f83cf66 Add the candle-datasets crate (#322)
* Move the vision datasets to a separate crate.

* Move the batcher bits.

* Update the readme.

* Move the tiny-stories bits.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
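The batcher moved here turns an iterator of per-sample tensors into an iterator of stacked batches. A usage sketch, assuming the `Batcher::new_r2` constructor for iterators of `Result` pairs:

```rust
use candle_core::{Result, Tensor};

fn batch(samples: impl Iterator<Item = Result<(Tensor, Tensor)>>) -> Result<()> {
    // Each yielded batch stacks 32 samples along a new leading dimension.
    let batcher = candle_datasets::Batcher::new_r2(samples).batch_size(32);
    for batch in batcher {
        let (inputs, targets) = batch?;
        println!("{:?} {:?}", inputs.shape(), targets.shape());
    }
    Ok(())
}
```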
c11e78b334 Odd rebase artifact. 2023-08-02 18:40:24 +02:00
1b705a426f Remove duplicate. 2023-08-02 18:40:24 +02:00
a44471a305 Adding more details on how to load things.
- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs`, because managing
book-specific dependencies is a pain (https://github.com/rust-lang/mdBook/issues/706).
- This causes a non-aligned inclusion (https://github.com/rust-lang/mdBook/pull/1856), which we have
to tell fmt to ignore.

mdbook might need some more love :)
2023-08-02 18:40:24 +02:00
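The memmap snippet mentioned above is the standard pattern with the `memmap2` and `safetensors` crates: the file is mapped rather than read, so tensor data is paged in lazily. A minimal sketch:

```rust
use std::fs::File;

use memmap2::Mmap;
use safetensors::SafeTensors;

fn main() -> anyhow::Result<()> {
    let file = File::open("model.safetensors")?;
    // Safe only as long as nothing mutates the file while it is mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    let tensors = SafeTensors::deserialize(&mmap)?;
    for (name, view) in tensors.tensors() {
        println!("{name}: {:?} {:?}", view.dtype(), view.shape());
    }
    Ok(())
}
```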
4fe8a02f88 Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
d38943aadc Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
51e51da896 Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
a27239f3d9 Add training for the llama2.c example (#296)
* Rework the commands and run inference by default.

* Add the training module and load the training dataset.

* Random dataset iterator.

* Proper valid-loss computation.

* Compute the evaluation loss.

* Add more substance to the training loop.
2023-08-01 17:23:07 +01:00
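The training loop described above follows the usual candle shape: forward pass, cross-entropy on flattened logits, optimizer step. A sketch where `model` is a hypothetical stand-in for the llama2.c forward pass:

```rust
use candle_core::{Result, Tensor};
use candle_nn::{loss, AdamW, Optimizer, ParamsAdamW, VarMap};

fn train(
    model: impl Fn(&Tensor) -> Result<Tensor>, // tokens -> (b, t, vocab) logits
    varmap: &VarMap,
    batches: &[(Tensor, Tensor)],
) -> Result<()> {
    let mut opt = AdamW::new(varmap.all_vars(), ParamsAdamW::default())?;
    for (inputs, targets) in batches {
        let logits = model(inputs)?;
        // cross_entropy wants (n, classes) logits and (n,) u32 class indices.
        let loss = loss::cross_entropy(&logits.flatten_to(1)?, &targets.flatten_all()?)?;
        opt.backward_step(&loss)?;
    }
    Ok(())
}
```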
38ff693af0 Add a flag to save the trained weights. (#279) 2023-07-30 15:41:42 +01:00
97181a77c0 Making multiprocess require flash-attn. 2023-07-28 23:31:24 +02:00
4002968cf5 Put back "dep:half". 2023-07-28 10:34:21 +00:00
be256a6ba6 Fixing. 2023-07-28 10:23:05 +00:00
d2dea11ef6 Fixing nccl feature. 2023-07-28 12:19:20 +02:00
1735e4831e TP sharding v2 2023-07-27 09:58:14 +02:00
d9f9c859af Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.

* More flash attn.

* Set up the flash attn parameters.

* Get things to compile locally.

* Move the flash attention files in a different directory.

* Build the static C library with nvcc.

* Add more flash attention.

* Update the build part.

* Better caching.

* Exclude flash attention from the default workspace.

* Put flash-attn behind a feature gate.

* Get the flash attn kernel to run.

* Move the flags to a more appropriate place.

* Enable flash attention in llama.

* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
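Behind the feature gate, the example picks between the fused kernel and a naive attention. A sketch of that dispatch (layout and masking details elided; `candle_flash_attn::flash_attn` is the gated entry point):

```rust
use candle_core::{Result, Tensor, D};

fn attention(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32) -> Result<Tensor> {
    #[cfg(feature = "flash-attn")]
    {
        // Fused kernel; note it expects (batch, seq, heads, head_dim) layout.
        candle_flash_attn::flash_attn(q, k, v, scale, /* causal= */ true)
    }
    #[cfg(not(feature = "flash-attn"))]
    {
        // Naive softmax(q k^T * scale) v fallback.
        let att = (q.matmul(&k.t()?)? * scale as f64)?;
        let att = candle_nn::ops::softmax(&att, D::Minus1)?;
        att.matmul(v)
    }
}
```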
35b65fed88 Add llama2.c as an example. (#229)
* Start adding llama2.c.

* Model loading.

* Add the llama-v2 model.

* Start converting the weights.

* Rotary embedding tweaks.

* Get the model to generate some tokens.
2023-07-24 09:13:50 +01:00
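"Get the model to generate some tokens" is the usual sampling loop. A sketch with today's `candle_transformers::generation::LogitsProcessor`; the `model` closure, mapping a token batch and a position to logits, is hypothetical:

```rust
use candle_core::{Device, Result, Tensor};
use candle_transformers::generation::LogitsProcessor;

fn generate(
    mut model: impl FnMut(&Tensor, usize) -> Result<Tensor>, // -> (vocab,) logits
    prompt: &[u32],
    steps: usize,
) -> Result<Vec<u32>> {
    // Seeded temperature sampling; None means no top-p filtering.
    let mut lp = LogitsProcessor::new(42, Some(0.8), None);
    let mut tokens = prompt.to_vec();
    for _ in 0..steps {
        let pos = tokens.len() - 1;
        // Feed only the last token; the model is assumed to keep a kv-cache.
        let input = Tensor::new(&tokens[pos..], &Device::Cpu)?.unsqueeze(0)?;
        let logits = model(&input, pos)?;
        tokens.push(lp.sample(&logits)?);
    }
    Ok(tokens)
}
```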
b8a10425ad Kernel build example (#224)
* Build example kernels.

* Add some sample custom kernel.

* Get the example kernel to compile.

* Add some cuda code.

* More cuda custom op.

* More cuda custom ops.
2023-07-23 07:15:37 +01:00
439321745a Removing the internal candle-hub crate to extract it into the standalone hf-hub. 2023-07-19 15:04:38 +02:00
b8abe2bb4b Factorize the tokenizers version in the workspace cargo def. (#186) 2023-07-18 06:48:13 +01:00