fc59bc31bf
fix: add missing gpu fill_* ( #996 )
2023-09-29 15:49:30 +01:00
01b92cd959
fixes slice_scatter dim type ( #988 )
2023-09-29 07:54:45 +01:00
25657804ef
Simd128 q2k vecdot ( #982 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
2023-09-28 12:16:35 +01:00
5e1c595e00
Optimize the index-select cuda kernel. ( #976 )
2023-09-28 09:05:29 +01:00
9cb110c44c
Sketch a simd128 optimized q4k vecdot. ( #977 )
...
* Sketch a simd128 optimized q4k vecdot.
* Simdify.
* More quantization optimizations.
* Again more simdification.
* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173
Simd128 vec-dot for q4_0. ( #974 )
...
* Simd128 vec-dot for q4_0.
* Bugfix.
* Add wasm tests.
* Bugfix for the q40 vecdot.
* More quantization tests.
2023-09-27 14:15:30 +01:00
e59784e353
simd128 optimized q8_0 vecdot ( #972 )
...
* wasm/simd128 version of the quantized q8_0 vecdot.
* Add the missing conversion.
2023-09-27 11:03:20 +01:00
ce0a4e3a85
Use the gelu-erf activation. ( #969 )
2023-09-26 22:30:21 +01:00
4abc1ea34d
Avoid some overflows on wasm32. ( #968 )
2023-09-26 11:15:38 +01:00
dc47224ab9
Override the default cudnn heuristics. ( #957 )
2023-09-25 10:31:53 +01:00
e32c89d90c
Add the buffered safetensor wrapper. ( #948 )
2023-09-23 22:57:42 +01:00
890d069092
Self-contained safetensor wrappers ( #946 )
...
* Self-contained safetensor wrappers.
* Use the new safetensor container in varbuilders.
2023-09-23 20:39:52 +01:00
ccf352f3d1
Use yoke to provide a self-referential container for mmaped safetenso… ( #939 )
...
* Use yoke to provide a self-referential container for mmaped safetensor files.
* Add the new self-owned type for safetensor files without removing the previous version.
* Add routing.
* Add an initializer for the case of multiple files.
2023-09-23 15:43:11 +01:00
912a3d63b0
Use the proper block size for quantizing models. ( #933 )
...
* Use the proper block size for quantizing models.
* Use the proper dimension.
2023-09-22 21:36:56 +01:00
8601537e31
Add slice-scatter. ( #927 )
...
* Add slice-scatter.
* Add the op.
* Make transpose be a no-op when the dimensions are identical.
* Add the backprop.
* And add some gradient test.
2023-09-22 12:18:16 +01:00
3b557765e8
T5 quantized example ( #922 )
...
* Load gguf files for the quantized t5.
* Add the quantized t5 example.
* Allow for loading local files.
* Add some support for quantizing safetensor files.
* Transpose before quantizing.
* Quantized t5.
* Retrieve the weights from the hub.
2023-09-21 12:33:15 +01:00
2619c4307f
Add a quantized version of the t5 model. ( #921 )
2023-09-21 11:13:39 +01:00
7b26e513f1
Add the erf function. ( #917 )
2023-09-21 06:19:10 +01:00
d7e48234d4
Add an erf based gelu op ( #900 )
...
* Erf based gelu.
* Add the erf backed gelu.
* Test the new gelu op (which is not gelu_new).
2023-09-19 19:54:28 +01:00
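A minimal sketch of the distinction noted in the entry above, between an erf-based gelu and `gelu_new`. It assumes `gelu_new` refers to the tanh approximation, as that name is used in other frameworks; the log itself does not say so. The function names are hypothetical and the erf polynomial (Abramowitz and Stegun 7.1.26) is a stand-in, not the kernel added in this commit.

```rust
// Illustrative only: hypothetical helpers, not the implementation from the commit.

/// Polynomial approximation of erf (Abramowitz & Stegun 7.1.26, abs. error ~1.5e-7).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Erf-based gelu: 0.5 * x * (1 + erf(x / sqrt(2))).
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf_approx(x / std::f64::consts::SQRT_2))
}

/// Tanh approximation (assumed here to be what `gelu_new` denotes).
fn gelu_tanh(x: f64) -> f64 {
    let c = (2.0 / std::f64::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}
```

The two functions agree closely but not exactly, which is why a separate test for the erf variant is useful.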
4f91c8e109
Improve the error message on shape mismatch for cat. ( #897 )
...
* Improve the error message on shape mismatch for cat.
* Cosmetic tweak.
2023-09-19 15:09:47 +01:00
7dd8e12472
Bump the crate versions to v0.2.3. ( #886 )
...
* Bump the crate version.
* Also update the python bindings.
2023-09-18 12:14:03 +01:00
635012d770
Do not backprop through argmin/argmax. ( #865 )
2023-09-15 22:15:40 +01:00
2746f2c4be
DiffNeXt/unet ( #859 )
...
* DiffNeXt/unet
* Start adding the vae.
* VAE residual block.
* VAE forward pass.
* Add pixel shuffling.
* Actually use pixel shuffling.
2023-09-15 10:14:02 +01:00
130fe5a087
Add the upblocks. ( #853 )
2023-09-14 22:24:56 +01:00
d6447ad635
Tensor based indexing. ( #842 )
2023-09-14 07:47:07 +01:00
9a465e1b26
Add 1d upsampling. ( #839 )
...
* Add 1d upsampling.
* Add the interpolate functions.
2023-09-13 16:50:39 +01:00
b11a2a7b9d
Move the constant to avoid some unused warning. ( #837 )
2023-09-13 11:56:53 +01:00
1c09164021
Add CANDLE_NVCC_CCBIN support for candle-kernels, and eliminate warning. ( #836 )
2023-09-13 11:39:22 +01:00
18d3c803a8
Scalar support in minimum/maximum. ( #832 )
...
* Scalar support in minimum/maximum.
* Add a clamp method to tensors.
2023-09-13 08:24:58 +01:00
2257f4d475
Bump the crate version + update the changelog. ( #822 )
2023-09-12 06:39:24 +01:00
871efc0307
Bugfix for the conv2d cpu kernel. ( #820 )
2023-09-11 23:11:27 +01:00
c5a058b169
Use the module trait in stable-diffusion. ( #817 )
2023-09-11 20:40:07 +01:00
dbd4561416
im2col version of the conv1d kernel. ( #815 )
...
* im2col version of the cuda conv1d kernel.
* im2col version of the conv1d cpu kernel.
2023-09-11 14:40:09 +01:00
70f38c2069
Proper error on unsupported dtypes when using gemm. ( #813 )
2023-09-11 12:10:51 +01:00
df712ecf64
Handle the case where the kernel is not contiguous in the cuda backend. ( #809 )
2023-09-11 09:48:31 +01:00
6fb665004c
Enable im2col on the cpu side. ( #805 )
...
* Enable im2col on the cpu side.
* Hook im2col on the cpu backend.
* Use the kernel offset.
* Avoid an unnecessary copy.
* Handle non-contiguous kernels.
* Add a const to select the conv2d kernel.
2023-09-11 09:28:13 +01:00
1cd74129d4
Add Im2Col support on the gpu side. ( #808 )
...
* Add Im2Col support on the gpu side.
* Actually enable.
2023-09-11 08:52:33 +01:00
98d1242b8f
im2col based conv2d ( #802 )
...
* im2col implementation for conv2d.
* Fix for the im2col implementation to match the current conv2d.
* Small optimization.
* Add a cuda kernel.
* Handle arbitrary layouts.
* Im2Col cuda code.
2023-09-10 21:02:42 +01:00
258ac32c38
Fix cuda randn when generating an odd number of values. ( #793 )
2023-09-09 18:44:21 +01:00
c88d6fd4b9
Remove set_training. ( #784 )
2023-09-09 08:27:37 +01:00
057f7909bc
Accelerate support for gelu. ( #782 )
2023-09-08 21:58:56 +01:00
acf8f10ae1
Get the comparison operation to work on scalar values. ( #780 )
...
* Get the comparison operation to work on scalar values.
* Add some time measurement.
2023-09-08 20:13:29 +01:00
158ff3c609
Add tracing to segment-anything ( #777 )
...
* Tracing support for segment-anything.
* More tracing.
* Handle the empty slice case.
2023-09-08 15:31:29 +01:00
98172d46fa
Fix some errors about BlockQ8_1 ( #776 )
...
* use int8 type instead of uint8 for BlockQ8_1.qs
The uint8 type of BlockQ8_1.qs causes great loss for negative weights
Ref: ebc96086af/ggml.c (L904)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ4_1
Ref: ebc96086af/ggml.c (L2840)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ5_1
Ref: ebc96086af/ggml.c (L3490)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
---------
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
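A small sketch of why storing 8-bit quantized values as unsigned bytes can lose negative weights, as the first bullet of the entry above describes. The helper names and the symmetric scale `d = max|x| / 127` are assumptions made for illustration; this is not the `BlockQ8_1` code, and the original bug may have manifested differently.

```rust
// Illustrative only: hypothetical helpers, not candle's BlockQ8_1 implementation.

/// Quantize a block to signed 8-bit values with one shared scale `d`.
fn quantize_i8(xs: &[f32]) -> (f32, Vec<i8>) {
    let amax = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let d = amax / 127.0;
    let id = if d != 0.0 { 1.0 / d } else { 0.0 };
    let qs = xs.iter().map(|x| (x * id).round() as i8).collect();
    (d, qs)
}

/// Same scheme, but stored as unsigned bytes.
/// Rust float-to-int `as` casts saturate, so every negative value becomes 0.
fn quantize_u8(xs: &[f32]) -> (f32, Vec<u8>) {
    let amax = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let d = amax / 127.0;
    let id = if d != 0.0 { 1.0 / d } else { 0.0 };
    let qs = xs.iter().map(|x| (x * id).round() as u8).collect();
    (d, qs)
}

/// Map signed quantized values back to floats.
fn dequantize_i8(d: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * d).collect()
}

/// Map unsigned quantized values back to floats.
fn dequantize_u8(d: f32, qs: &[u8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * d).collect()
}
```

Round-tripping a block such as `[-1.0, -0.5, 0.0, 0.5, 1.0]` through the signed path recovers each value to within one quantization step, while the unsigned path returns zero for the negative entries.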
0e250aee4f
Shape with holes ( #770 )
...
* Shape with holes.
* rustfmt.
2023-09-08 08:38:13 +01:00
7396b8ed1a
Segment Anything - process images ( #766 )
...
* Start processing images.
* Add LayerNorm2d.
* Properly use LayerNorm2d.
* Tweak eps.
* Use LayerNorm on inputs with a rank different from 3.
* Window partitioning.
* Fix a couple todos.
* More todos.
* Hard-code the einsums.
* More padding support.
* Some sizes tweaks.
* Use the hub to get the weights.
* Use a batch matmul.
* Tweaks.
* More fixes.
* Get some predictions to be generated.
2023-09-07 19:22:45 +01:00
6527ab81a3
Sketch the segment anything model. ( #759 )
...
* Sketch the segment anything model.
* Fix some clippy lint.
* Add the mask decoder.
2023-09-07 05:34:05 +01:00
7b1f2da828
Cudnn fix. ( #758 )
2023-09-06 17:39:39 +01:00
7299a68353
img2img pipeline for stable diffusion. ( #752 )
...
* img2img pipeline for stable diffusion.
* Rename the arguments + fix.
* Fix for zero strength.
* Another fix.
* Another fix.
* Revert.
* Include the backtrace.
* Noise scaling.
* Fix the height/width.
2023-09-06 07:06:49 +01:00
a4f40f3dc8
Use rayon directly rather than constraining the number of threads. ( #749 )
2023-09-05 20:26:15 +01:00