Commit Graph

105 Commits

SHA1 Message Date
e9c052bf94 Add the comparison operations. (#207)
* Add the comparison operations.

* Add the helper functions on the tensor side.

* More cmp operations.

* Cpu implementation for the comparison operations.
2023-07-20 09:40:31 +01:00
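
As a reading aid, a minimal sketch of what such a CPU comparison kernel can look like, assuming elementwise semantics and a u8 (0/1) mask as output; the function name and layout are illustrative, not the library's actual API:

```rust
// Hypothetical elementwise equality kernel; the real code dispatches per
// dtype and goes through the storage/layout machinery.
fn cmp_eq(lhs: &[f32], rhs: &[f32]) -> Vec<u8> {
    lhs.iter().zip(rhs.iter()).map(|(l, r)| u8::from(l == r)).collect()
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [1.0f32, 0.0, 3.0];
    assert_eq!(cmp_eq(&a, &b), vec![1, 0, 1]);
}
```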
ad12e20f6b Add cpu support for min and max. (#202)
* Add cpu support for min and max.

* Add min/max all.
2023-07-19 17:11:44 +01:00
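
A hedged sketch of the "min/max all" part, i.e. a full reduction to a scalar; the names are illustrative:

```rust
// `reduce` seeds the fold with the first element, so no artificial initial
// value is needed and an empty input yields None.
fn min_all(data: &[f32]) -> Option<f32> {
    data.iter().copied().reduce(f32::min)
}

fn max_all(data: &[f32]) -> Option<f32> {
    data.iter().copied().reduce(f32::max)
}

fn main() {
    assert_eq!(max_all(&[1.0, 5.0, 3.0]), Some(5.0));
    assert_eq!(min_all(&[]), None);
}
```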
cb687b4897 Add some more developed training examples. (#199)
* Use contiguous tensors for variables.

* Sketch the mnist example.

* Start adding the reduce ops.

* Renaming.

* Refactor the reduce operations.

* Bugfix for the broadcasting vectorization.
2023-07-19 15:37:52 +01:00
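
To make the reduce ops concrete, a sketch of a sum over the last dimension of a contiguous row-major buffer; the (rows, cols) calling convention is an assumption for illustration:

```rust
// Each output element is the sum of one contiguous row, so the inner loop
// is a plain slice traversal.
fn sum_last_dim(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    (0..rows)
        .map(|r| data[r * cols..(r + 1) * cols].iter().sum())
        .collect()
}

fn main() {
    let data = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0]; // a 2 x 3 matrix
    assert_eq!(sum_last_dim(&data, 2, 3), vec![6.0, 15.0]);
}
```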
67e20c3792 Sum over more dims. (#197) 2023-07-19 06:46:32 +01:00
76dcc7a381 Test the broadcasting binary ops. (#196) 2023-07-19 06:18:36 +01:00
fd55fc9592 Add an optimized case when performing the softmax over the last dimension. (#195) 2023-07-18 17:59:50 +01:00
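The fast path in question relies on each row being one contiguous slice; a hedged sketch of the idea, with names and layout assumed:

```rust
// Last-dimension softmax over a contiguous buffer: max, exp, normalize,
// all as tight per-row loops. Subtracting the max keeps exp() stable.
fn softmax_last_dim(data: &mut [f32], cols: usize) {
    for row in data.chunks_mut(cols) {
        let max = row.iter().fold(f32::NEG_INFINITY, |a, &b| a.max(b));
        let mut sum = 0f32;
        for v in row.iter_mut() {
            *v = (*v - max).exp();
            sum += *v;
        }
        for v in row.iter_mut() {
            *v /= sum;
        }
    }
}
```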
6623c227d8 Allow the compiler to vectorize some broadcasting loops. (#194)
* Allow the compiler to vectorize some broadcasting loops.

* Improve the symmetrical broadcasting case.
2023-07-18 17:12:32 +01:00
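
The gist of making such loops auto-vectorizable is keeping the inner loop unit-stride with no index recomputation; a sketch under an assumed (rows, cols) layout:

```rust
// lhs is (rows, cols) contiguous; rhs_row is a (cols,) operand broadcast
// over the rows. The inner loop is unit-stride, so the compiler can emit SIMD.
fn broadcast_add(lhs: &[f32], rhs_row: &[f32], out: &mut [f32], cols: usize) {
    for (out_row, lhs_row) in out.chunks_mut(cols).zip(lhs.chunks(cols)) {
        for i in 0..cols {
            out_row[i] = lhs_row[i] + rhs_row[i];
        }
    }
}
```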
79a5b686d0 Properly use the offset when broadcasting on a narrow slice. (#193) 2023-07-18 16:36:23 +01:00
a45a3f0312 Optimize the sum for the contiguous case. (#192) 2023-07-18 14:57:06 +01:00
ff61a42ad7 Use mkl to accelerate binary ops. (#190)
* Vectorized binary ops with mkl.

* Improve the binary op mkl support.

* Push the support for mkl binary ops.

* Proper vectorization of binary ops.

* Proper mkl'isation when broadcasting binary ops.
2023-07-18 12:04:39 +01:00
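
The MKL route replaces the scalar loop with a call into the vector math library; vsAdd below is a real MKL VML entry point, while the safe wrapper (and the LP64 assumption that MKL_INT is 32-bit) is illustrative:

```rust
extern "C" {
    // MKL VML: y[i] = a[i] + b[i] for i in 0..n (single precision).
    fn vsAdd(n: i32, a: *const f32, b: *const f32, y: *mut f32);
}

// Hypothetical wrapper; real code only compiles this under an `mkl` feature
// and needs to link against MKL.
fn add_mkl(a: &[f32], b: &[f32], y: &mut [f32]) {
    assert!(a.len() == b.len() && b.len() == y.len());
    unsafe { vsAdd(a.len() as i32, a.as_ptr(), b.as_ptr(), y.as_mut_ptr()) }
}
```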
b706f32839 Add Shape TryInto (#189)
* Add the TryInto trait for shapes.

* Use the vectorized operations in block mode too.
2023-07-18 10:52:16 +01:00
d73df74cb2 Preliminary support for mkl based gelu. (#187)
* Preliminary support for mkl based gelu.

* Add the vectorized function for unary ops.

* Get the mkl specialized gelu to work.
2023-07-18 07:48:48 +01:00
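
One plausible way to specialize gelu with MKL is the exact erf form, gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))); vsErf is a real VML routine, but whether the commit uses this form or the tanh approximation isn't shown here, and the wrapper is an assumption:

```rust
extern "C" {
    // MKL VML: r[i] = erf(a[i]) (single precision).
    fn vsErf(n: i32, a: *const f32, r: *mut f32);
}

// Hypothetical MKL-backed gelu; requires linking against MKL.
fn gelu_mkl(xs: &[f32], ys: &mut [f32]) {
    assert_eq!(xs.len(), ys.len());
    let scaled: Vec<f32> = xs.iter().map(|x| x / std::f32::consts::SQRT_2).collect();
    unsafe { vsErf(scaled.len() as i32, scaled.as_ptr(), ys.as_mut_ptr()) }
    for (y, &x) in ys.iter_mut().zip(xs.iter()) {
        *y = 0.5 * x * (1.0 + *y);
    }
}
```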
acb2f90469 Broadcasting performance optimization (cpu) (#182)
* Avoid recomputing the index from scratch each time.

* More performance optimisations.
2023-07-17 13:41:09 +01:00
5b1c0bc9be Performance improvement. (#181) 2023-07-17 11:07:14 +01:00
28e1c07304 Process unary functions per block (#180)
* Process unary functions per block.

* Add some inline hints.
2023-07-17 10:22:33 +01:00
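
Per-block processing means walking the tensor as contiguous runs rather than element-by-element through strided indexing; a sketch with an assumed (start, len) block representation:

```rust
// Apply a unary function to each contiguous run of a strided tensor.
// `blocks` stands in for whatever iterator yields (start, len) pairs.
fn unary_per_block(data: &mut [f32], blocks: &[(usize, usize)], f: impl Fn(f32) -> f32) {
    for &(start, len) in blocks {
        for v in data[start..start + len].iter_mut() {
            *v = f(*v);
        }
    }
}
```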
18ea92d83b Iteration over strided blocks (#175)
* Introduce the strided blocks.

* Use the strided blocks to speed up the copy.

* Add more testing.
2023-07-15 21:30:35 +01:00
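
The core idea of strided blocks: decompose a strided view into (start, len) runs that are contiguous in memory, so copies and elementwise ops can proceed run-by-run. A 2D-only sketch (the real iterator handles arbitrary rank):

```rust
// Each row of a (rows, cols) view with the given row stride is one
// contiguous block of `cols` elements starting at `offset + r * row_stride`.
fn strided_blocks_2d(
    rows: usize, cols: usize, row_stride: usize, offset: usize,
) -> impl Iterator<Item = (usize, usize)> {
    (0..rows).map(move |r| (offset + r * row_stride, cols))
}

fn main() {
    // A 2x3 view into a wider buffer: row stride 5, starting at offset 1.
    let blocks: Vec<_> = strided_blocks_2d(2, 3, 5, 1).collect();
    assert_eq!(blocks, vec![(1, 3), (6, 3)]);
}
```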
d88b6cdca9 Add backtrace information to errors where relevant. (#166)
* Add backtrace information to errors where relevant.

* More backtrace information.

* Add to the FAQ.
2023-07-14 09:31:25 +01:00
bcf96e3cf3 Implement the backend trait for the cpu backend. (#143) 2023-07-12 09:54:33 +01:00
a76ec797da Cleanup the main crate error and add a couple of dedicated ones (#142)
* Cosmetic cleanups to the error enum.

* More error cleanup.

* Proper error handling rather than panicking.

* Add some dedicated conv1d errors.
2023-07-12 09:17:08 +01:00
ae79c00e48 Allow for uniform initialization in a single step. (#136) 2023-07-11 08:52:29 +01:00
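"In a single step" plausibly means sampling directly in [lo, hi) instead of sampling in [0, 1) and rescaling afterwards; a sketch using the rand crate, with an illustrative function name:

```rust
use rand::Rng;

// Fill a buffer with uniform samples in [lo, hi) in one pass.
fn rand_uniform(lo: f32, hi: f32, n: usize) -> Vec<f32> {
    let mut rng = rand::thread_rng();
    (0..n).map(|_| rng.gen_range(lo..hi)).collect()
}
```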
f29b77ec19 Random initializers. (#128)
* Random initialization.

* CPU rng generation.
2023-07-10 18:26:21 +01:00
548b1df7ea Remove the dependency on blas and use mkl directly. (#125) 2023-07-10 15:52:03 +01:00
221b1aff65 Support dgemm in mkl matmul. (#122) 2023-07-10 15:02:37 +01:00
270997a055 Add the elu op. (#113) 2023-07-09 21:56:31 +01:00
dd60bd84bb MKL adjustments. (#87) 2023-07-06 11:37:27 +01:00
c297a50960 Add mkl support for matrix multiply. (#86)
* Fix some rebase issues.

* Use mkl instead.

* Use mkl in bert.

* Add the optional mkl feature.

* Conditional compilation based on the mkl feature.

* Add more mkl support.
2023-07-06 11:05:05 +01:00
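
Dispatching matmul to MKL usually goes through the standard CBLAS interface; cblas_sgemm below is the real entry point, while the row-major wrapper (f32 only, LP64 integer width) is an illustrative assumption. The dgemm commit (#122) adds the analogous f64 path via cblas_dgemm:

```rust
extern "C" {
    // Standard CBLAS sgemm: C = alpha * op(A) * op(B) + beta * C.
    fn cblas_sgemm(
        layout: i32, transa: i32, transb: i32,
        m: i32, n: i32, k: i32,
        alpha: f32, a: *const f32, lda: i32,
        b: *const f32, ldb: i32,
        beta: f32, c: *mut f32, ldc: i32,
    );
}

const CBLAS_ROW_MAJOR: i32 = 101;
const CBLAS_NO_TRANS: i32 = 111;

// Hypothetical wrapper: a is (m, k), b is (k, n), c is (m, n), all row-major.
fn matmul_mkl(a: &[f32], b: &[f32], c: &mut [f32], m: usize, n: usize, k: usize) {
    unsafe {
        cblas_sgemm(
            CBLAS_ROW_MAJOR, CBLAS_NO_TRANS, CBLAS_NO_TRANS,
            m as i32, n as i32, k as i32,
            1.0, a.as_ptr(), k as i32,
            b.as_ptr(), n as i32,
            0.0, c.as_mut_ptr(), n as i32,
        )
    }
}
```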
459e2e1ae3 Properly handle the stride in conv1d. 2023-07-04 15:05:04 +01:00
b3d4d0fd0f Very inefficient conv1d implementation. 2023-07-04 13:50:41 +01:00
950b4af49e Proper conv1d dispatch. 2023-07-04 11:29:28 +01:00
a424d95473 Add more of the conv1d op. 2023-07-04 11:15:45 +01:00
3aac1047fe Sketch the conv1d op. 2023-07-04 10:52:34 +01:00
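The conv1d entries above sketch, dispatch, and then refine the op; as a baseline, here is a deliberately naive direct convolution with explicit stride handling. The (c_in, l_in) input and (c_out, c_in, k) kernel layouts and the lack of padding are assumptions:

```rust
// O(c_out * l_out * c_in * k) direct loops; "very inefficient" by design.
fn conv1d(
    input: &[f32], kernel: &[f32],
    c_in: usize, l_in: usize, c_out: usize, k: usize, stride: usize,
) -> Vec<f32> {
    assert!(l_in >= k && stride > 0);
    let l_out = (l_in - k) / stride + 1;
    let mut out = vec![0f32; c_out * l_out];
    for co in 0..c_out {
        for lo in 0..l_out {
            let mut acc = 0f32;
            for ci in 0..c_in {
                for ki in 0..k {
                    acc += input[ci * l_in + lo * stride + ki]
                        * kernel[co * c_in * k + ci * k + ki];
                }
            }
            out[co * l_out + lo] = acc;
        }
    }
    out
}
```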
a57b314780 Add a batch dimension on the bert example. 2023-07-04 06:10:52 +01:00
86d691c74c Better handling of the batch dimension in matmul. 2023-07-03 22:51:40 +01:00
bbe0c5fbaa Do not use rayon for a single thread (bis). 2023-06-30 18:47:22 +01:00
6b67d25d9f Do not use rayon for a single thread. 2023-06-30 18:46:32 +01:00
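The guard these two commits add is about avoiding rayon's scheduling overhead when only one thread is available; a sketch, with the dispatch condition as an assumption (requires the rayon crate):

```rust
use rayon::prelude::*;

fn map_maybe_parallel(xs: &[f32], f: fn(f32) -> f32) -> Vec<f32> {
    // With a single-threaded pool, rayon can only add overhead, so fall back
    // to a plain sequential map.
    if rayon::current_num_threads() == 1 {
        xs.iter().map(|&x| f(x)).collect()
    } else {
        xs.par_iter().map(|&x| f(x)).collect()
    }
}
```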
fbc329ed85 Add the verbose cpu cast operations. 2023-06-30 10:33:29 +01:00
8ad47907f3 Add the kernels. 2023-06-30 10:26:56 +01:00
b4aab7b95f Put more requirements on the WithDType trait. 2023-06-29 11:37:42 +01:00
eaa3ce359e Cosmetic change. 2023-06-28 22:02:23 +01:00
1328b5cb20 Factor some code out. 2023-06-28 21:56:44 +01:00
c583ee0f2c Add map2. 2023-06-28 21:38:01 +01:00
46c07b924c Tweak some comments. 2023-06-28 21:10:54 +01:00
2ae368e98e Switch from a macro to a trait to make things more generic. 2023-06-28 21:06:56 +01:00
3f0d9fbb25 Adapt the cuda bits. 2023-06-28 15:43:03 +01:00
cca699be6c Fix some cpu issues. 2023-06-28 15:09:15 +01:00
1c755c0e5b Remove some todos. 2023-06-28 14:33:06 +01:00
caafef6cc1 Get the cpu tests to run. 2023-06-28 14:32:02 +01:00
14449ff80c Get the cpu backend to compile. 2023-06-28 14:12:38 +01:00
54a6c40f27 Propagate the changes on the cpu backend. 2023-06-28 14:00:49 +01:00
303b853098 Propagate the layout refactoring. 2023-06-28 13:42:23 +01:00