Commit Graph

  • ece3ec6167 Final updates -> moving to deterministic for easier comparison. Ubuntu 2023-06-28 12:27:03 +00:00
  • 926fffa0b7 Ok. Ubuntu 2023-06-27 17:27:54 +00:00
  • e29dae044d Tmp. Ubuntu 2023-06-27 17:24:26 +00:00
  • 6c9e6b5a99 Get the cuda tests to pass. laurent 2023-06-28 15:53:23 +01:00
  • 8b4b2d1830 Merge pull request #28 from LaurentMazare/fix_hub Nicolas Patry 2023-06-28 16:49:07 +02:00
  • 3f0d9fbb25 Adapt the cuda bits. laurent 2023-06-28 15:43:03 +01:00
  • cfdfc04d5c Remove the unnecessary file lock, attempt to rename before copying. Nicolas Patry 2023-06-28 16:42:26 +02:00
  • cca699be6c Fix some cpu issue. laurent 2023-06-28 15:09:15 +01:00
  • 1c755c0e5b Remove some todos. laurent 2023-06-28 14:33:06 +01:00
  • caafef6cc1 Get the cpu tests to run. laurent 2023-06-28 14:32:02 +01:00
  • 14449ff80c Get the cpu backend to compile. laurent 2023-06-28 14:12:38 +01:00
  • 54a6c40f27 Propagate the changes on the cpu backend. laurent 2023-06-28 14:00:49 +01:00
  • 303b853098 Propagate the layout refactoring. laurent 2023-06-28 13:42:23 +01:00
  • 30b355ccd2 Simplify the narrow implementation. laurent 2023-06-28 13:09:59 +01:00
  • c1bbbf94f6 Start refactoring the stride. laurent 2023-06-28 12:57:30 +01:00
  • d461d9d751 Merge pull request #26 from LaurentMazare/narrow-grad Laurent Mazare 2023-06-28 11:46:13 +01:00
  • 666d6dbcac Merge remote-tracking branch 'origin/main' into narrow-grad laurent 2023-06-28 11:45:46 +01:00
  • 2998ff6ef7 Merge pull request #25 from LaurentMazare/fix_hub Nicolas Patry 2023-06-28 12:42:38 +02:00
  • 7938d2b848 Add the grad for narrow. laurent 2023-06-28 10:46:00 +01:00
  • 9c86e4afa8 Fix flaky test? Nicolas Patry 2023-06-28 11:40:41 +02:00
  • d0ff3b2d13 Merge pull request #24 from LaurentMazare/more-grads Laurent Mazare 2023-06-28 10:04:51 +01:00
  • 615196e7be Add more gradients. laurent 2023-06-28 09:59:52 +01:00
  • 50eff0005b Merge pull request #23 from LaurentMazare/relu Laurent Mazare 2023-06-28 09:44:24 +01:00
  • 1ce3843cab Add the relu op. laurent 2023-06-28 09:38:54 +01:00
  • b805c4114b Merge pull request #22 from LaurentMazare/more-cuda-testing2 Laurent Mazare 2023-06-28 09:01:25 +01:00
  • 19183b8e4f Factor out the gemm bits. laurent 2023-06-28 08:51:13 +01:00
  • 0417d9cec8 Add more cuda testing again. laurent 2023-06-28 08:33:43 +01:00
  • 64c6bc4f5e Merge pull request #21 from LaurentMazare/more-cuda-tests Laurent Mazare 2023-06-28 08:19:01 +01:00
  • 395c84e80a Also run the backprop tests on cuda. laurent 2023-06-28 08:15:03 +01:00
  • a457020d50 Merge pull request #20 from LaurentMazare/tensor-display Laurent Mazare 2023-06-27 21:53:09 +01:00
  • b0f5f2d22d Add some display tests + bugfixes. laurent 2023-06-27 21:37:28 +01:00
  • 8c81a70170 PyTorch like display implementation. laurent 2023-06-27 21:16:35 +01:00
  • 934655a60d Add squeeze/unsqueeze/stack. laurent 2023-06-27 19:32:00 +01:00
  • 1d504cc6b3 Rework the debug trait. laurent 2023-06-27 19:10:30 +01:00
  • d28bf64ed6 Merge pull request #18 from LaurentMazare/tensor-helper Laurent Mazare 2023-06-27 17:43:04 +01:00
  • 684f66326d Add the get method. laurent 2023-06-27 17:39:58 +01:00
  • c44e5346f4 Add some helper functions. laurent 2023-06-27 17:37:09 +01:00
  • efc39b71c5 Merge pull request #17 from LaurentMazare/cuda-test-utils Laurent Mazare 2023-06-27 16:24:04 +01:00
  • dbe3e4e7c0 Add some test utils module. laurent 2023-06-27 16:20:28 +01:00
  • aa35c418a5 Merge pull request #16 from LaurentMazare/cuda-tests Laurent Mazare 2023-06-27 15:51:28 +01:00
  • 47937650aa And add back some readme :) laurent 2023-06-27 15:50:43 +01:00
  • e221d38819 Factor the slicing code in cuda. laurent 2023-06-27 15:45:59 +01:00
  • 07a682c2ff Run the tensor tests for the cuda backend too. laurent 2023-06-27 15:37:01 +01:00
  • b3622c972f Merge pull request #15 from LaurentMazare/num-cpus Laurent Mazare 2023-06-27 14:45:08 +01:00
  • ca6aa8ff12 Use num-cpus to enable parallelism. laurent 2023-06-27 14:42:26 +01:00
  • 64ae526af4 Merge pull request #11 from LaurentMazare/add_hub Nicolas Patry 2023-06-27 15:37:52 +02:00
  • 70a90a1465 Clippy without features. Nicolas Patry 2023-06-27 14:04:20 +02:00
  • 75e0905832 Adding fully offline version. Nicolas Patry 2023-06-27 13:35:57 +02:00
  • 1a82bc50c9 [Tmp] Adding candle-hub Nicolas Patry 2023-06-27 12:07:34 +02:00
  • 8371890996 Merge pull request #12 from LaurentMazare/fix_ci Nicolas Patry 2023-06-27 13:58:01 +02:00
  • c2edaf83eb Ignoring candle-kernels during CI. Nicolas Patry 2023-06-27 13:53:23 +02:00
  • 140a8edf01 Merge pull request #14 from LaurentMazare/llama-opt Laurent Mazare 2023-06-27 12:21:31 +01:00
  • 318503cd38 Cache the causal mask in llama. laurent 2023-06-27 12:21:08 +01:00
  • 527a71fdad Merge pull request #13 from LaurentMazare/cuda-bugfixes Laurent Mazare 2023-06-27 11:32:26 +01:00
  • 380d61e990 Fix two cuda bugs (matmul and where_cond). laurent 2023-06-27 11:31:04 +01:00
  • 0fed864bbf Does this prevent candle-kernels test suite from being run? Nicolas Patry 2023-06-27 12:14:53 +02:00
  • d7f729fb8f Refactor the hierarchy. Nicolas Patry 2023-06-27 11:57:27 +02:00
  • 6c4a960b15 Embedding bugfix. laurent 2023-06-27 09:56:19 +01:00
  • 18707891b7 Fix an error message. laurent 2023-06-27 09:45:38 +01:00
  • bb262ecc99 More casting kernels. laurent 2023-06-27 09:36:35 +01:00
  • ee3d290f8b Cuda support for dtype conversions. laurent 2023-06-27 09:15:46 +01:00
  • 51640ba7e6 Merge pull request #10 from LaurentMazare/f16 Laurent Mazare 2023-06-27 05:59:59 +01:00
  • e152c1273d Add more context for missing cuda kernels. laurent 2023-06-27 05:56:19 +01:00
  • 4d19889acc where_cond for f16. laurent 2023-06-26 22:14:32 +01:00
  • a6a7477bea Matmul cublas support for f16. laurent 2023-06-26 22:08:22 +01:00
  • 36a4749e95 Add the f16 affine kernel. laurent 2023-06-26 22:05:31 +01:00
  • 53fdbda683 Add the f16 sum kernel (fix). laurent 2023-06-26 22:02:22 +01:00
  • 93e24f29f4 Add the f16 sum kernel. laurent 2023-06-26 22:01:29 +01:00
  • d204f1c7c0 Cuda support for embedding f16. laurent 2023-06-26 21:58:15 +01:00
  • becb822ce0 Support more types in the cpu matmul. laurent 2023-06-26 21:37:41 +01:00
  • 7cfa4c307c Handle f16/bf16 in npy. laurent 2023-06-26 21:10:03 +01:00
  • de1f612645 Remove the default features from the CI as cuda is not available. laurent 2023-06-26 20:56:13 +01:00
  • 22da2c7e02 More f16 and bf16 support. laurent 2023-06-26 20:52:01 +01:00
  • a31411fd91 Start adding f16/bf16 support. laurent 2023-06-26 19:37:47 +01:00
  • 36a1a48ba0 Avoid a cast when no conversion is required. laurent 2023-06-26 18:16:19 +01:00
  • 46789c403c Cublas fixes. laurent 2023-06-26 17:59:27 +01:00
  • 1ad5baecc5 Handle transposed matrices in cublas. laurent 2023-06-26 17:49:29 +01:00
  • 3761f02aa8 Use atomicAdd as a quick way to work around a cuda synchronisation issue. laurent 2023-06-26 16:31:24 +01:00
  • f2ac5547fc Avoid the race condition on cuda sums. laurent 2023-06-26 16:19:06 +01:00
  • 687c5beb6a Decompose the softmax op so that it can be run on cuda. laurent 2023-06-26 15:36:21 +01:00
  • 33c0234a33 (Properly) add the where kernels. laurent 2023-06-26 13:25:56 +01:00
  • cd2a171c06 Add the where kernels. laurent 2023-06-26 13:25:02 +01:00
  • b1d6e264da Sketch the where_cond cuda kernel wrapper. laurent 2023-06-26 13:11:14 +01:00
  • 95a2c8e7da Add helper functions for fortran contiguous data. laurent 2023-06-26 13:02:06 +01:00
  • f6104c4b64 Add the reduce-sum kernel. laurent 2023-06-26 12:35:26 +01:00
  • 16f0f5b9d2 Add a cuda kernel for embeddings. laurent 2023-06-26 11:47:57 +01:00
  • 5952c3fa91 Cleanup the broadcast setup. laurent 2023-06-26 10:49:34 +01:00
  • 217bdcdf4d Fix the error message. laurent 2023-06-26 10:14:34 +01:00
  • 59a59f41a6 Add the cuda mode to llama. laurent 2023-06-26 10:06:44 +01:00
  • 512d12e38d Avoid copying the data around when loading weights. laurent 2023-06-26 08:09:03 +01:00
  • 4ad5d17d8c Slightly more efficient weight loading. laurent 2023-06-26 07:56:25 +01:00
  • 11696e6377 Faster model weight loading. laurent 2023-06-26 07:40:11 +01:00
  • d867155ef2 Load the weights for llama. laurent 2023-06-26 07:23:59 +01:00
  • 7a3101f15f Llama bugfix. laurent 2023-06-26 07:07:56 +01:00
  • 97424289d1 Fix the llama causal mask inversion. laurent 2023-06-25 21:16:54 +01:00
  • 117f014b55 Add where_cond and properly apply the causal mask. laurent 2023-06-25 21:08:03 +01:00
  • 25bcad290e Fix the causal mask computation. laurent 2023-06-25 20:19:30 +01:00
  • 8e404eb125 Get some first inference to work on llama. laurent 2023-06-25 18:26:15 +01:00
  • 87c5aab005 More llama fixes. laurent 2023-06-25 18:08:41 +01:00
  • 60a5598c8b Fix some shape errors. laurent 2023-06-25 17:56:59 +01:00