candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 02:16:37 +00:00

Author	SHA1	Message	Date
Laurent Mazare	495e0b7580	Simd support (#448 ) * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints.	2023-08-15 09:50:38 +01:00
Laurent Mazare	c84883ecf2	Add a cuda kernel for upsampling. (#441 ) * Add a cuda kernel for upsampling. * Update for the latest tokenizers version.	2023-08-14 13:12:17 +01:00
Laurent Mazare	e29c7809ec	Parallelise the CPU kernels for the conv ops. (#401 ) * Parallelise the conv2d op. * Tighter control on threading. * Also parallelise conv1d. * Add some safety comment.	2023-08-11 05:51:58 +01:00
Nicolas Patry	379eadc68e	Working now.	2023-08-10 19:43:25 +02:00
Nicolas Patry	7e4fbc1e17	[DO NOT MERGE] temporary PR so users can try out on older GPUs.	2023-08-10 19:36:31 +02:00
Laurent Mazare	c8039579a5	Conv1d optimize (#392 ) * Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints.	2023-08-10 15:23:52 +01:00
Lei	3bbc08a8df	Fix randn cpu (#382 ) * Change distributions Standard generates in [0, 1), Normal is correct. * Add test Not sure if this is the best place to put the test * Remove unnecessary use	2023-08-10 05:33:44 +01:00
Laurent Mazare	da26e2832c	Update gemm to 0.15.6. (#378 )	2023-08-09 21:04:28 +01:00
Laurent Mazare	3a62aee91f	Write the generated images using the image crate. (#363 ) * Use the image crate to write the generated images. * Make the dependency optional.	2023-08-09 15:26:44 +01:00
Laurent Mazare	e72ba0b9e7	Add the license files. (#335 )	2023-08-07 14:11:27 +01:00
Laurent Mazare	b278834267	Support the Accelerate BLAS on macOS. (#325 ) * Add the accelerate feature. * Ffi tweaks.	2023-08-05 17:25:24 +01:00
Laurent Mazare	620f83cf66	Add the candle-datasets crate (#322 ) * Move the vision datasets to a separate crate. * Move the batcher bits. * Update the readme. * Move the tiny-stories bits. --------- Co-authored-by: Jane Doe <jane.doe@example.org>	2023-08-05 08:56:50 +01:00
Laurent Mazare	4fe8a02f88	Update the repo location. (#305 )	2023-08-02 11:12:18 +01:00
Laurent Mazare	d38943aadc	Add version numbers for all the candle crates (#303 ) * Switch to candle-gemm for the time being. * Add the missing versions.	2023-08-02 10:52:13 +01:00
Laurent Mazare	6e33ff62d6	Update cudarc now that it includes the cublas-f16 and nccl changes. (#300 )	2023-08-02 05:54:28 +01:00
Nicolas Patry	d2dea11ef6	Fixing nccl feature.	2023-07-28 12:19:20 +02:00
Nicolas Patry	4f260ef025	Merge pull request #216 from LaurentMazare/llama_multiprocess2 TP sharding v2	2023-07-28 08:06:13 +01:00
Nicolas Patry	ca479a873e	Upgrading hf-hub to `0.2.0` (Modified API to not pass the Repo around all the time)	2023-07-27 20:05:02 +02:00
Nicolas Patry	b7814f66b4	PyO3 is back.	2023-07-27 09:58:47 +02:00
Nicolas Patry	ed58de7551	Fixed TP sharded version.	2023-07-27 09:58:46 +02:00
Nicolas Patry	1735e4831e	TP sharding v2	2023-07-27 09:58:14 +02:00
Laurent Mazare	6475bfadfe	Simplify Tensor::randn. (#255 ) * Simplify Tensor::randn. * Also switch Tensor::rand to use a generic dtype. * Support sampling for f16. * Cleanup.	2023-07-27 07:40:36 +01:00
Laurent Mazare	89fd988836	Update to the latest gemm. (#250 )	2023-07-26 17:00:02 +01:00
Laurent Mazare	d9f9c859af	Add flash attention (#241 ) * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.	2023-07-26 07:48:10 +01:00
Laurent Mazare	5a26cba733	Re-organize the wasm examples (#231 ) * Move the whisper example. * More renaming. * Add llama2 as a new wasm example. * Live generation. * More of the llama wasm example. * Formatting.	2023-07-24 12:36:02 +01:00
Laurent Mazare	dc416243a3	Bump the hf-hub dependency to 0.1.3. (#206 )	2023-07-20 07:27:52 +01:00
Laurent Mazare	c34f932319	Fix the mkl build. (#204 ) * Fix the mkl build. * Fix the build properly.	2023-07-19 19:41:11 +01:00
Nicolas Patry	439321745a	Removing `candle-hub` internal to extract into `hf-hub` standalone.	2023-07-19 15:04:38 +02:00
Laurent Mazare	b8abe2bb4b	Factorize the tokenizers version in the workspace cargo def. (#186 )	2023-07-18 06:48:13 +01:00
Laurent Mazare	f0cccd08f0	Bert tracing (#184 ) * Add some tracing to bert. * More tracing. * Add a flag for tracing.	2023-07-17 19:40:42 +01:00
Laurent Mazare	49ea09c73c	Gemm update (#183 ) * Update the gemm dependency. * Update the comment too. * Pin the sha256 dependency.	2023-07-17 14:05:39 +01:00
Laurent Mazare	104f89df31	Centralize the dependency versions and inherit them. (#177 )	2023-07-16 07:47:17 +01:00
Laurent Mazare	d1f5d44c04	Reenable pyo3 in the workspace list (#170 ) * Enable pyo3 back. * Adapt the CI.	2023-07-14 19:54:38 +01:00
Nicolas Patry	4ed56d7861	Removing cuda default. Seems very important for a lot of exploring users usually on laptop without GPUs. Adding more README instructions in a follow up.	2023-07-14 16:52:15 +02:00
Laurent Mazare	88f666781f	Wasm proof of concept. (#167 ) * Wasm proof of concept. * Run whisper inference in the browser. * Some fixes. * Move the wasm example. * Change the tokenizer config.	2023-07-14 14:51:46 +01:00
Laurent Mazare	21aa29ddce	Use a rwlock for inner mutability. (#156 ) * Use a rw-lock. * Make clippy happier.	2023-07-13 11:25:24 +01:00
Laurent Mazare	50b0946a2d	Tensor mutability (#154 ) * Working towards tensor mutability. * Use a ref-cell to provide tensor mutability.	2023-07-13 11:04:40 +01:00
Laurent Mazare	ba35d895e7	Sketch the candle-transformers crate. (#147 ) * Sketch the candle-transformers crate. * Format the empty files.	2023-07-12 13:49:31 +01:00
Laurent Mazare	9ce0f1c010	Sketch the candle-nn crate. (#115 ) * Sketch the candle-nn crate. * Tweak the cuda dependencies. * More cuda tweaks.	2023-07-10 08:50:09 +01:00
Laurent Mazare	4afa461b34	Sketch the Falcon model. (#93 ) * Sketch the Falcon model. * Add more substance to the falcon example. * Falcon (wip). * Falcon (wip again). * Falcon inference. * Get the weights from the api and properly generate the model. * Use the proper model. * Fix the file/revision names. * Fix bias handling. * Recompute the rot embeddings. * Fix the input shape. * Add the release-with-debug profile. * Silly bugfix. * More bugfixes. * Stricter shape checking in matmul.	2023-07-06 19:01:21 +01:00
laurent	fdb1acd2ff	Move llama in a cargo-examples directory.	2023-07-03 11:30:58 +01:00
laurent	ebb0fedf14	Very simple pyo3 bindings for candle.	2023-07-01 20:36:44 +01:00
laurent	af66f0829e	Revert the new profile.	2023-06-29 19:08:50 +01:00
laurent	3232df9458	Add some KV cache to llama.	2023-06-29 15:29:40 +01:00
Nicolas Patry	1a82bc50c9	[Tmp] Adding candle-hub	2023-06-27 13:58:23 +02:00
Nicolas Patry	d7f729fb8f	Refactor the hierarchy.	2023-06-27 11:57:27 +02:00
laurent	22da2c7e02	More f16 and bf16 support.	2023-06-26 20:52:01 +01:00
laurent	a31411fd91	Start adding f16/bf16 support.	2023-06-26 19:37:47 +01:00
laurent	11696e6377	Faster model weight loading.	2023-06-26 07:40:11 +01:00
laurent	96c098b6cd	Remove the unecessary features.	2023-06-24 18:15:44 +01:00

157 Commits