candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-18 11:37:11 +00:00

Author	SHA1	Message	Date
Laurent Mazare	949f1eae6f	Implement a couple more binary ops. (#693 )	2023-08-31 21:30:15 +01:00
Laurent Mazare	9874d843f1	Fix the accelerate build (#678 ) * Cosmetic changes. * Fix the accelerate build for tanh.	2023-08-30 18:31:14 +02:00
Laurent Mazare	ad8a62dbf5	Add tanh. (#675 ) * Add tanh. * Use tanh in the lstm block. * Add a test for tanh forward and backward passes.	2023-08-30 13:54:50 +01:00
Laurent Mazare	618f4e4c78	Add some documentation. (#673 ) * Add some documentation. * Bump the crate version.	2023-08-30 11:54:00 +01:00
Laurent Mazare	393690387f	Support dilation in conv-transpose2d. (#671 )	2023-08-30 09:22:00 +01:00
Laurent Mazare	9b25113393	Small cleanups (avoid some possible mutations) (#670 ) * More mut cleanup. * Factor out some common bits.	2023-08-30 08:54:00 +01:00
Laurent Mazare	a1a5ab8b0a	Neon optimized vecdot (#666 ) * Q5k vecdot. * Add the q3k vecdot. * Q2k vecdot. * Move the quantized model to its own file.	2023-08-29 22:28:46 +01:00
Laurent Mazare	59b731de99	Add the powf op. (#664 ) * Add the powf op. * Cuda kernels and backprop. * Add a test.	2023-08-29 20:48:18 +01:00
Laurent Mazare	2d3fcad267	Simplify usage of the pool functions. (#662 ) * Simplify usage of the pool functions. * Small tweak. * Attempt at using apply to simplify the convnet definition.	2023-08-29 19:12:16 +01:00
Laurent Mazare	71221559d3	Fix the dilated convolutions. (#659 )	2023-08-29 16:37:42 +01:00
Laurent Mazare	a044907ffc	Dilated convolutions (#657 ) * Add the dilation parameter. * Restore the basic optimizer example. * Dilation support in cudnn. * Use the dilation parameter in the cpu backend. * More dilation support. * No support for dilation in transposed convolutions. * Add dilation to a test. * Remove a print. * Helper function.	2023-08-29 16:12:11 +01:00
Lukas Kreussel	ee8bb1bde1	Add `avx` implementations of `q2k`, `q3k` and `q5k` vec-dot functions (#654 ) * `q2k` avx implementation * `q3k` avx implementation * `q5k` avx implementation * `avx` make masks constant * clippy stuff	2023-08-29 13:35:56 +01:00
Laurent Mazare	d0a330448d	Backprop support for pooling ops. (#652 ) * Backprop support for pooling ops. * max-pool gradient.	2023-08-29 10:17:59 +01:00
Laurent Mazare	4b8d57ba15	AVX version of the q4k vecdot. (#651 )	2023-08-29 09:41:17 +01:00
Laurent Mazare	fd3131a4ce	Fix the debug implementation. (#648 )	2023-08-28 22:51:39 +01:00
Laurent Mazare	037b41c9dc	Cuda conv transpose (#645 ) * Cuda kernel for conv-transpose. * Fix the cuda kernel. * Fix the tests.	2023-08-28 20:58:49 +01:00
Laurent Mazare	72fae3140c	Optimize the conv2d transpose cpu kernel. (#644 ) * Optimize the conv2d transpose cpu kernel. * Use multiple cores.	2023-08-28 20:06:31 +01:00
Laurent Mazare	ca26198b95	Fix the cpu kernel for conv-transpose. (#643 )	2023-08-28 16:45:12 +01:00
Laurent Mazare	b292047882	Backprop for conv2d. (#638 ) * Start adding backprop for conv2d. * Backprop for conv2d. * Bugfix + start adding a conv2d test. * Conv2d backprop testing. * More conv fixes.	2023-08-28 16:08:55 +01:00
Laurent Mazare	3cca89cc70	Add conv-transpose. (#635 ) * Add conv-transpose. * Return zeros for now. * Naive CPU implementation. * Add a conv-transpose test + fix the cpu implementation. * Add a second test.	2023-08-28 10:10:12 +01:00
Laurent Mazare	1da71a5da1	Neon optimized version of the q4k vecdot product. (#632 )	2023-08-27 21:30:47 +01:00
Laurent Mazare	a3f97c143d	Bump the crate version + update CHANGELOG. (#628 )	2023-08-27 18:17:11 +01:00
Laurent Mazare	be471d50ab	Llama quantization. (#625 )	2023-08-27 14:08:15 +01:00
Laurent Mazare	7151f2cf63	Add the quantize command. (#624 ) * Add the quantize command. * Bugfix for writing gguf files. * And add a comment.	2023-08-27 11:35:19 +01:00
Laurent Mazare	5320aa6b7d	Move the test-utils bits to a shared place. (#619 )	2023-08-27 09:42:22 +01:00
Laurent Mazare	a8b39dd7b7	Fix for q5_1 quantization. (#617 ) * Fix for q5_1 quantization. * Fix some typos.	2023-08-27 08:31:18 +01:00
Laurent Mazare	fa0d75b18d	Quantization tests + fix some issues. (#616 )	2023-08-27 08:17:38 +01:00
Laurent Mazare	28658054ff	More missing quantized bits. (#615 ) * Q4_1 support. * Add Q5_1 quantization. * Tweak.	2023-08-27 07:52:26 +01:00
Laurent Mazare	ab36a7f3e3	Fix for when f16c is not available. (#614 )	2023-08-27 07:19:52 +01:00
Laurent Mazare	f704e39761	Missing quants ops (#611 ) * Another transmute tweak. * Changelog tweak. * Add some missing quantized ops.	2023-08-26 20:09:04 +01:00
Laurent Mazare	fdf15f0e05	Another transmute tweak. (#610 ) * Another transmute tweak. * Changelog tweak.	2023-08-26 13:00:24 +01:00
Laurent Mazare	06b37ea7ad	Avoid using tmp values. (#609 )	2023-08-26 12:28:28 +01:00
Lukas Kreussel	c72eb3d75b	Add reference implementation for `q4k` and `q5k` (#586 ) * add `q2k` vec-dot * `q3k` vec-dot + quantization bugfix * `q4k` vec-dot * `q5k` vec-dot * Validate against GGML unit test results. * Remove some more `transmutes`	2023-08-26 12:07:54 +01:00
Patrick von Platen	71518caeee	Align tensor device print more with PyTorch (#590 ) * Improve tensor print * Use CudaDevice only if enabled with cuda feature * run rust fmt * up * improve * rustfmt	2023-08-26 11:20:22 +01:00
Laurent Mazare	6559eae72c	Avoid some transmutes. (#607 )	2023-08-25 18:21:37 +01:00
Laurent Mazare	9c8d6dbc2a	Neon intrinsics for the q8_0 vecdot. (#604 ) * Neon intrinsics for the q8_0 vecdot. * Get the tests to run with accelerate (with some numerical error failures).	2023-08-25 14:42:18 +01:00
Laurent Mazare	afc10a3232	AVX version for the q8-0 multiplications. (#598 )	2023-08-25 10:14:49 +01:00
Laurent Mazare	c093b03d51	Generic implementation of vecdot for q80. (#596 ) * Generic implementation of vecdot for q80. * Add support for code-llama 7b. * Support more code-llama.	2023-08-25 09:04:05 +01:00
Laurent Mazare	d8ba0452dc	Fail on bf16. (#594 )	2023-08-25 06:10:38 +01:00
Laurent Mazare	2cde0cb74b	More pickle support. (#588 ) * More pickle support. * Be more verbose.	2023-08-24 18:45:10 +01:00
Laurent Mazare	e21c686cdc	Fixes for clippy 1.72. (#587 )	2023-08-24 17:46:17 +01:00
Laurent Mazare	c265ac50fa	Add a function to write gguf files. (#585 ) * Add a function to write gguf files. * More GGUF file writing. * Write the tensor data in GGUF files.	2023-08-24 17:03:06 +01:00
Laurent Mazare	afd965f77c	More non square testing (#582 ) * Add more non square testing. * More testing.	2023-08-24 13:01:04 +01:00
Lukas Kreussel	d2f42ab086	Reference implementations of `q2k` and `q3k` vec-dot functions (#580 ) * add `q2k` vec-dot * `q3k` vec-dot + quantization bugfix	2023-08-24 12:35:54 +01:00
Laurent Mazare	ca318a6ec7	Add to the cuda example a reproduction of the issue. (#579 ) * Add to the cuda example a reproduction of the issue. * Tweak. * Add a test using non-square matrixes. * Fix the conv2d kernel. * Display the error. * And tweak the comment.	2023-08-24 12:07:31 +01:00
Laurent Mazare	dd64465899	Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578 ) * Add a test for conv2d with padding. * Cosmetic changes. * Bugfix the rand function on the cuda backend.	2023-08-24 10:16:37 +01:00
Laurent Mazare	431051cc32	Add Efficientnet (#572 ) * EfficientNet. * Complete the efficientnet implementation. * Improve group handling. * Get the efficientnet to work.	2023-08-23 18:02:58 +01:00
Laurent Mazare	7478dda255	Cosmetic tweaks. (#570 )	2023-08-23 15:45:40 +01:00
Laurent Mazare	329f661d9b	Trace softmax (#568 ) * Trace the softmax op. * Inline the sum. * Add min/max vec operations.	2023-08-23 15:25:50 +01:00
Lukas Kreussel	075b505480	Mirror GGML's unit tests (#569 ) * Add ggml unit tests * simplify random matmul test for other test cases	2023-08-23 15:25:17 +01:00

779 Commits