mirror of
https://github.com/huggingface/candle.git
synced 2025-06-20 04:00:28 +00:00
Finished scaffolding, lots of TODOs
- Most kernels just copy themselfs to get the shapes correct - Matmul works only in 1 case and simply empty allocates otherwise - Logits and randomized to make the demo finish itself. Performance is quite bad (30ms/token), but lot's of prints and allocs and some actual sending to metal. Couln't get it super high by removing the obvious blockers (println + the actual running matmuls). Allocations takes between 1us and 100us and seems very stable, Maybe metal doesn't really have a smart allocator and we'll need to own it.
This commit is contained in:
@ -190,6 +190,16 @@ impl candle::CustomOp1 for SoftmaxLastDim {
|
||||
device: dev.clone(),
|
||||
};
|
||||
Ok((dst, layout.shape().clone()))
|
||||
}
|
||||
|
||||
#[cfg(feature = "metal")]
|
||||
fn metal_fwd(
|
||||
&self,
|
||||
storage: &candle::MetalStorage,
|
||||
layout: &Layout,
|
||||
) -> Result<(candle::MetalStorage, Shape)> {
|
||||
println!("TODO softmax-last-dim");
|
||||
Ok((storage.clone(), layout.shape().clone()))
|
||||
}
|
||||
}
|
||||
|
||||
|
Reference in New Issue
Block a user