Commit Graph

4 Commits

Author SHA1 Message Date
0794e70a19 Debugging index_add. 2023-11-03 12:09:05 +01:00
f57e3164ae Implemented cos for now. 2023-11-03 01:24:51 +01:00
7161002a34 Finished scaffolding, lots of TODOs
- Most kernels just copy themselfs to get the shapes correct
- Matmul works only in 1 case and simply empty allocates otherwise
- Logits and randomized to make the demo finish itself.

Performance is quite bad (30ms/token), but lot's of prints and allocs and some actual sending to metal.

Couln't get it super high by removing the obvious blockers (println + the actual running matmuls).

Allocations takes between 1us and 100us and seems very stable, Maybe metal doesn't really have a smart allocator and we'll need to own it.
2023-11-02 15:32:28 +01:00
82cce52e73 Rename candle-metal -> candle-metal-kernels 2023-11-02 09:53:29 +01:00