548b1df7ea
Remove the dependency on blas and use mkl directly. (#125)
2023-07-10 15:52:03 +01:00
221b1aff65
Support dgemm in mkl matmul. (#122)
2023-07-10 15:02:37 +01:00
270997a055
Add the elu op. (#113)
2023-07-09 21:56:31 +01:00
dd60bd84bb
MKL adjustments. (#87)
2023-07-06 11:37:27 +01:00
c297a50960
Add mkl support for matrix multiply. (#86)
...
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
2023-07-06 11:05:05 +01:00
459e2e1ae3
Properly handle the stride in conv1d.
2023-07-04 15:05:04 +01:00
b3d4d0fd0f
Very inefficient conv1d implementation.
2023-07-04 13:50:41 +01:00
950b4af49e
Proper conv1d dispatch.
2023-07-04 11:29:28 +01:00
a424d95473
Add more of the conv1d op.
2023-07-04 11:15:45 +01:00
3aac1047fe
Sketch the conv1d op.
2023-07-04 10:52:34 +01:00
a57b314780
Add a batch dimension on the bert example.
2023-07-04 06:10:52 +01:00
86d691c74c
Better handling of the batch dimension in matmul.
2023-07-03 22:51:40 +01:00
bbe0c5fbaa
Do not use rayon for a single thread (again).
2023-06-30 18:47:22 +01:00
6b67d25d9f
Do not use rayon for a single thread.
2023-06-30 18:46:32 +01:00
fbc329ed85
Add the verbose cpu cast operations.
2023-06-30 10:33:29 +01:00
8ad47907f3
Add the kernels.
2023-06-30 10:26:56 +01:00
b4aab7b95f
Put more requirements on the WithDType trait.
2023-06-29 11:37:42 +01:00
eaa3ce359e
Cosmetic change.
2023-06-28 22:02:23 +01:00
1328b5cb20
Factor some code out.
2023-06-28 21:56:44 +01:00
c583ee0f2c
Add map2.
2023-06-28 21:38:01 +01:00
46c07b924c
Tweak some comments.
2023-06-28 21:10:54 +01:00
2ae368e98e
Switch from a macro to a trait to make things more generic.
2023-06-28 21:06:56 +01:00
3f0d9fbb25
Adapt the cuda bits.
2023-06-28 15:43:03 +01:00
cca699be6c
Fix a cpu issue.
2023-06-28 15:09:15 +01:00
1c755c0e5b
Remove some todos.
2023-06-28 14:33:06 +01:00
caafef6cc1
Get the cpu tests to run.
2023-06-28 14:32:02 +01:00
14449ff80c
Get the cpu backend to compile.
2023-06-28 14:12:38 +01:00
54a6c40f27
Propagate the changes on the cpu backend.
2023-06-28 14:00:49 +01:00
303b853098
Propagate the layout refactoring.
2023-06-28 13:42:23 +01:00
c1bbbf94f6
Start refactoring the stride.
2023-06-28 12:57:30 +01:00
19183b8e4f
Factor out the gemm bits.
2023-06-28 08:51:13 +01:00
0417d9cec8
Add more cuda testing again.
2023-06-28 08:33:43 +01:00
ca6aa8ff12
Use num-cpus to enable parallelism.
2023-06-27 14:42:26 +01:00
d7f729fb8f
Refactor the hierarchy.
2023-06-27 11:57:27 +02:00