candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 18:28:24 +00:00

Author	SHA1	Message	Date
Laurent Mazare	32f567bac4	Fix loading the gguf files. (#1913)	2024-03-22 10:28:38 +01:00
Thomas Santerre	fee33b45c2	Add support for strided index-select on Metal (#1909) * initial implementation * use correct index, but still not breaking like it should have... * fix test	2024-03-22 07:30:02 +01:00
Laurent Mazare	6708870e63	Add the alloc_uninit function. (#1901) * Add the alloc_uninit function. * Dummy metal fix. * Lazy initialization.	2024-03-22 07:25:23 +01:00
Laurent Mazare	a00e24d752	Improve the error message on overlong prompts. (#1908)	2024-03-21 21:08:07 +01:00
Laurent Mazare	c07e4057ab	Fix for the llama model. (#1906)	2024-03-21 19:36:10 +01:00
Laurent Mazare	c0bdd9c7a6	Use the fast RmsNorm in the quantized model. (#1904)	2024-03-21 18:49:35 +01:00
Thomas Santerre	9563a5fee4	Add support for conv_transpose2d on Metal backend (#1903) * add support for conv transpose 2d and add bench mark for float types * update bench calculation * enable testing all conv operations on metal	2024-03-21 18:08:45 +01:00
Laurent Mazare	ec97c98e81	Async tensor copying. (#1900)	2024-03-21 13:09:42 +01:00
Sanchit Gandhi	bb3ee48039	whisper readme (#1899)	2024-03-21 12:54:09 +01:00
Sanchit Gandhi	0c11e055be	support distil-large-v3 (#1898)	2024-03-21 11:46:49 +01:00
Laurent Mazare	18036c6ccb	Update the image crate + use the re-exported version. (#1893) * Update the image crate + use the re-exported version. * Update to using ab_glyph.	2024-03-21 10:56:41 +01:00
Laurent Mazare	0fddec762e	RmsNorm kernel for metal. (#1895) * RmsNorm kernel for metal. * Wrapper for the metal kernel. * Get the ops to actually work. * Fix, get the tests to pass.	2024-03-21 09:48:56 +01:00
Laurent Mazare	74b7f59261	Prepare for the custom-op extension. (#1892)	2024-03-21 07:02:20 +01:00
Laurent Mazare	af7f8b87d3	Custom op for RmsNorm (#1890) * Trying out a custom RmsNorm cuda kernel. * CPU implementation for rms-norm. * Cuda wrappers. * Add some validation. * Add some testing. * More testing.	2024-03-21 06:36:28 +01:00
Laurent Mazare	b219903d0f	Cuda backend optimization (#1886) * Attempt at making the kernel faster. * Also adapt the cast kernels. * Also apply to binary ops.	2024-03-20 18:32:55 +01:00
Laurent Mazare	469635a3eb	Minor cleanup. (#1885)	2024-03-20 14:38:27 +01:00
Laurent Mazare	455c42aa72	Avoid copying the data on squeeze and unsqueeze. (#1884) * Avoid copying the data on squeeze and unsqueeze. * Fix the quantized llama example. * Unrelated fix for the quantized stable-lm example on cuda. * Fix for mamba on cuda (unrelated to the PR).	2024-03-20 13:04:36 +01:00
Thomas Santerre	2a8679509e	Add support for conv_transpose1d for metal backend (#1874) * first attempt * progress * integrate into metal backend * finish and get test passing * add other dtype support * update transpose1d dtypes supported	2024-03-19 08:46:58 +01:00
Laurent Mazare	143c481c20	Expose candle gather op in pyo3. (#1870)	2024-03-18 21:54:15 +01:00
Laurent Mazare	f115895b9e	Apply rustfmt. (#1873)	2024-03-18 21:43:31 +01:00
Jani Monoses	90fc82211f	Use a common with_tracing::RmsNorm in a few models. (#1871) * Add RmsNorm with tracing. * Use with_tracing::RmsNorm in some models.	2024-03-18 21:40:06 +01:00
Gabriel	6a966cf9e0	Add a DQN example to the reinforcement-learning section (#1872)	2024-03-18 21:22:53 +01:00
Thomas Santerre	04a61a9c72	Add avg_pool2d metal implementation for the metal backend (#1869) * implement metal avg pool 2d * fix * add suggested precision workaround for the accumulator	2024-03-18 18:50:14 +01:00
Laurent Mazare	58605252e8	Microphone support for the encodec example. (#1866)	2024-03-18 11:19:46 +01:00
Laurent Mazare	d365ef32d9	Improve the encodec example: handle resampling. (#1865) * Improve the encodec example: handle resampling. * Play the audio directly.	2024-03-18 10:09:40 +01:00
Thomas Santerre	754fa1e813	Add support for max_pool2d for Metal backend (#1863) * first pass at implementation of maxpool2d * Add definitions for other dtypes * add tests for other dtypes * Cosmetic tweaks + re-enable maxpool2d tests for metal. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-03-18 08:33:30 +01:00
Thomas Santerre	184105792f	add test for index add and add missing match statements (#1862)	2024-03-17 22:19:12 +01:00
Laurent Mazare	a15f859ab4	Fix for the encodec example. (#1861)	2024-03-17 21:15:12 +01:00
Thomas Santerre	e316cb6997	add support for casting between all datatypes (#1860)	2024-03-17 20:55:11 +01:00
Laurent Mazare	ce9fbc3682	Optimize the cat operation on contiguous tensors (#1855) * Add a specialized kernel for copy2d. * Move the cat operations. * Avoid transpositions in cat. * Bugfix. * Bugfix for the cuda kernel. * Add a benchmark. * Add more testing. * Test fix. * Faster kernel. * Add the missing kernel. * Tweak the test. * Add a metal kernel. * Fix for the metal kernel. * Get the tests to pass on metal. * Also use this opportunity to fix the metal kernel for ELU. * Add some bf16 kernels. * Clippy fixes.	2024-03-17 10:49:13 +01:00
Thomas Santerre	db8b24ae92	Add support for index u8/i64 and input f16/bf16 scatter-add on metal (#1849) * add support and tests for scatter add on metal * add support for all datatypes	2024-03-17 08:09:43 +01:00
Laurent Mazare	74bf6994b1	Move the image tensor to the appropriate device. (#1856)	2024-03-16 22:25:46 +01:00
Laurent Mazare	cdc4c172c4	Implement the error trait for DTypeParseError. (#1852)	2024-03-15 08:37:27 +01:00
Jani Monoses	e1f9c3776d	StableLM-2 models were updated to use GPT-2 tokenization. (#1847)	2024-03-14 21:01:36 +01:00
Tyler Rockwood	3318fe30fb	Update gemma README (#1843) * Update gemma README * Fixit	2024-03-13 21:41:36 +01:00
Thomas Santerre	2bb9c683b9	Update README.md (#1840) Adds the candle-einops to the readme as an external resource	2024-03-13 14:36:25 +01:00
Laurent Mazare	ff03fd3fb3	Expose some helper functions to create quantized models. (#1837)	2024-03-12 11:30:24 +01:00
Laurent Mazare	df5f69444e	Properly handle the batch dimension in cuda quantized matmul. (#1832)	2024-03-10 20:23:43 +01:00
Laurent Mazare	0c5eecbc0f	Add some tracing to metavoice. (#1826)	2024-03-09 12:24:11 +01:00
Laurent Mazare	56c9d3ee7b	Fix the model path for rwkv. (#1825)	2024-03-09 11:21:48 +01:00
Laurent Mazare	dd00482ea3	Quantized version of the metavoice model. (#1824) * Quantized version of the metavoice model. * Integrate the quantized version of metavoice.	2024-03-09 11:06:04 +01:00
Laurent Mazare	936f6a4840	Fix dequantization. (#1823)	2024-03-08 23:12:13 +01:00
Laurent Mazare	3440cec3a0	Fast CPU kernel for transposed 1d convolutions. (#1822) * Fast CPU kernel for transposed 1d convolutions. * Bugfix.	2024-03-08 22:43:07 +01:00
Laurent Mazare	e7fc1daa21	Bump the crate versions to 0.4.2. (#1821)	2024-03-08 22:01:51 +01:00
Niklas Hallqvist	be5b68cd0b	Metal random-generation bug fixes (#1811) * use_resource API misunderstood. It is not additive. Several usages must be bit-ORed together. * The seeding was incorrect and used the address instead of the value of the passed in seed. * Add a check that likely exhibits failure to update the seed between generation of random tensors. * Buffer overrun, the length given to the std::ptr::copy call was in bytes, and not 32-bit units. * By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted. * Revert "By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted." This reverts commit `d7302de9` Discussion in https://github.com/huggingface/candle/pull/1811#issuecomment-1983079119 * The Metal random kernel failed to set element N/2 of tensors with N elements, N being even. The reason was that all threads but thread 0 all created 2 random samples, but thread 0 only one, i.e. an odd number. In order to produce an even number of samples, the early termination of thread 0 should only ever occur for odd sized tensors. * Add a test catching any deterministic tensor element in rand and randn output. --------- Co-authored-by: niklas <niklas@appli.se> Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>	2024-03-08 16:11:50 +01:00
Laurent Mazare	ea984d0421	Expose more printer options. (#1817)	2024-03-08 15:04:18 +01:00
Laurent Mazare	9634583781	Expose a couple layout methods. (#1816)	2024-03-08 10:52:22 +01:00
Kirpal Grewal	758366160e	add clone to candle dropout (#1814)	2024-03-08 08:18:01 +01:00
Niklas Hallqvist	0a3487a776	Add a --seed argument to the stable-diffusion example. (#1812) * Add a --seed argument to the stable-diffusion example. * Make the case when no seed is specified, that it will not be set, but use the engine's default. This will make the CPU engine work again when no --seed is given, and will cause a bailout when a seed is there, as the engine does not currently support it. --------- Co-authored-by: niklas <niklas@appli.se>	2024-03-08 08:17:36 +01:00
ivarflakstad	0c09d10f32	Improve metal buffer usage (#1807) * Improve metal buffer usage * Clone cpu storage when loading to reduce wait_until_complete calls * Use powers of two for buffer sizes so reuse is more likely. * Select best available buffer by size. * Add count to MetalStorage -> can use buffer with different size Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co> * Simplify new buffer creation without blit copy. Revert &[] -> Vec * Add documentation on newBufferWithBytes safety / synchronization * Drop unused buffers after command buffer is done syncing. --------- Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>	2024-03-07 09:42:34 +01:00

2079 Commits