Finished scaffolding, lots of TODOs

- Most kernels just return a copy of their input to get the shapes correct
- Matmul works in only one case and otherwise just allocates an empty output buffer
- Logits are randomized so that the demo finishes on its own.
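Here is what "getting the shapes correct" means for a placeholder matmul. This is a minimal sketch in plain Rust, not candle's API: `matmul_out_shape` and `matmul_placeholder` are hypothetical names. It assumes both operands have the same rank and identical batch dimensions, with no broadcasting. Instead of computing the product, it returns a zero-filled buffer of the right size.

```rust
/// Hypothetical helper (not part of candle): output shape of a batched matmul
/// `[..., m, k] x [..., k, n] -> [..., m, n]`, assuming identical batch dims
/// on both sides (no broadcasting).
fn matmul_out_shape(lhs: &[usize], rhs: &[usize]) -> Result<Vec<usize>, String> {
    if lhs.len() < 2 || lhs.len() != rhs.len() {
        return Err(format!("rank mismatch: {:?} vs {:?}", lhs, rhs));
    }
    let r = lhs.len();
    let (batch_l, mk) = lhs.split_at(r - 2);
    let (batch_r, kn) = rhs.split_at(r - 2);
    if batch_l != batch_r {
        return Err(format!("batch dims differ: {:?} vs {:?}", batch_l, batch_r));
    }
    if mk[1] != kn[0] {
        return Err(format!("inner dims differ: {} vs {}", mk[1], kn[0]));
    }
    let mut out = batch_l.to_vec();
    out.push(mk[0]); // m
    out.push(kn[1]); // n
    Ok(out)
}

/// Hypothetical placeholder "kernel": allocate a zero-filled output of the
/// correct size without computing anything, so downstream ops see valid shapes.
fn matmul_placeholder(lhs: &[usize], rhs: &[usize]) -> Result<(Vec<f32>, Vec<usize>), String> {
    let shape = matmul_out_shape(lhs, rhs)?;
    let len: usize = shape.iter().product();
    Ok((vec![0.0; len], shape))
}

fn main() {
    let (buf, shape) = matmul_placeholder(&[1, 4, 8], &[1, 8, 16]).unwrap();
    println!("{:?} {}", shape, buf.len()); // prints "[1, 4, 16] 64"
}
```

Shape errors still surface early, so the rest of the model can be wired up before the real kernel exists.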

Performance is quite bad (30 ms/token), but there are lots of prints and allocations, and some work is actually being sent to Metal.
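For scale, 30 ms/token is 1 / 0.030 s, or roughly 33 tokens/s. Figures like this can be checked with a small timing harness built on `std::time::Instant`. This is a minimal sketch: `mean_latency` and `tokens_per_sec` are hypothetical helpers, not part of this commit.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: run `step` `n` times and return the mean wall-clock
/// time per call. Panics if `n == 0` (Duration division by zero).
fn mean_latency<F: FnMut()>(n: u32, mut step: F) -> Duration {
    let start = Instant::now();
    for _ in 0..n {
        step();
    }
    start.elapsed() / n
}

/// Hypothetical helper: convert a per-token latency into tokens per second.
fn tokens_per_sec(per_token: Duration) -> f64 {
    1.0 / per_token.as_secs_f64()
}

fn main() {
    // 30 ms/token, the figure quoted above, is roughly 33 tokens/s.
    println!("{:.1}", tokens_per_sec(Duration::from_millis(30))); // prints "33.3"
    // Time a 1 MiB allocation; black_box keeps it from being optimized away.
    let d = mean_latency(10, || {
        std::hint::black_box(vec![0u8; 1 << 20]);
    });
    println!("mean per-iteration time: {:?}", d);
}
```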

Removing the obvious blockers (the println calls and the matmuls that actually run) didn't make it much faster.

Allocations take between 1 µs and 100 µs and seem very stable. Maybe Metal doesn't really have a smart allocator and we'll need to own it.
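"Owning the allocator" could start as a size-bucketed cache that reuses released buffers instead of asking the device for new ones. This is a minimal sketch under stated assumptions. `BufferPool` is a hypothetical type, `Vec<u8>` stands in for a Metal buffer, and power-of-two bucketing is one common choice among several. It is not what candle does.

```rust
use std::collections::HashMap;

/// Hypothetical size-bucketed buffer pool. `Vec<u8>` stands in for a GPU
/// buffer; a Metal-backed version would cache device buffers instead.
struct BufferPool {
    free: HashMap<usize, Vec<Vec<u8>>>, // bucket size -> cached buffers
    fresh_allocs: usize,                // how many real allocations happened
}

impl BufferPool {
    fn new() -> Self {
        Self { free: HashMap::new(), fresh_allocs: 0 }
    }

    /// Round up to the next power of two so nearby sizes share a bucket.
    fn bucket(size: usize) -> usize {
        size.max(1).next_power_of_two()
    }

    /// Reuse a cached buffer from the matching bucket, or allocate a new one.
    /// Note: reused buffers are NOT re-zeroed.
    fn acquire(&mut self, size: usize) -> Vec<u8> {
        let b = Self::bucket(size);
        if let Some(buf) = self.free.get_mut(&b).and_then(|v| v.pop()) {
            return buf;
        }
        self.fresh_allocs += 1;
        vec![0u8; b]
    }

    /// Return a buffer to the pool instead of freeing it. Assumes the buffer
    /// came from `acquire` and was not resized, so its length is a bucket size.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new();
    let a = pool.acquire(1000); // allocates a 1024-byte buffer
    pool.release(a);
    let b = pool.acquire(900); // same bucket, so the cached buffer is reused
    println!("len = {}, fresh allocations = {}", b.len(), pool.fresh_allocs); // prints "len = 1024, fresh allocations = 1"
}
```

Rounding up wastes up to half of each buffer. It buys cheap reuse across the slightly different tensor sizes seen from token to token.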
Nicolas Patry
2023-11-02 15:32:28 +01:00
parent 82cce52e73
commit 7161002a34
11 changed files with 212 additions and 52 deletions


@ -190,6 +190,16 @@ impl candle::CustomOp1 for SoftmaxLastDim {
            device: dev.clone(),
        };
        Ok((dst, layout.shape().clone()))
    }

    #[cfg(feature = "metal")]
    fn metal_fwd(
        &self,
        storage: &candle::MetalStorage,
        layout: &Layout,
    ) -> Result<(candle::MetalStorage, Shape)> {
        println!("TODO softmax-last-dim");
        Ok((storage.clone(), layout.shape().clone()))
    }
}
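The Metal `metal_fwd` above is still a stub that returns its input unchanged. A real kernel would have to compute something like the following minimal CPU reference. It works on a contiguous, row-major `f32` buffer and ignores strided layouts. The function name and signature are hypothetical, not candle's API. Subtracting each row's maximum before exponentiating is an assumed numerical-stability choice; it does not change the mathematical result.

```rust
/// Hypothetical reference: softmax over the last dimension of a contiguous,
/// row-major buffer whose last dimension has length `dim_m1`.
fn softmax_last_dim(src: &[f32], dim_m1: usize) -> Vec<f32> {
    assert!(dim_m1 > 0 && src.len() % dim_m1 == 0, "length must be a multiple of the last dim");
    let mut dst = vec![0.0f32; src.len()];
    // Each row (one slice of length `dim_m1`) is normalized independently.
    for (row_in, row_out) in src.chunks_exact(dim_m1).zip(dst.chunks_exact_mut(dim_m1)) {
        // Subtract the row max so exp() cannot overflow for large logits.
        let max = row_in.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0f32;
        for (o, &x) in row_out.iter_mut().zip(row_in) {
            *o = (x - max).exp();
            sum += *o;
        }
        for o in row_out.iter_mut() {
            *o /= sum;
        }
    }
    dst
}

fn main() {
    // Two rows of length 3; the second row becomes uniform (1/3 each).
    let out = softmax_last_dim(&[1.0, 2.0, 3.0, 0.0, 0.0, 0.0], 3);
    println!("{:?}", out);
}
```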