Get the ggml-based llama to generate some text. (#464)

* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
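The "sampling loop" steps above can be sketched as a minimal greedy decoding loop. The `forward` closure below is a toy stand-in, not the quantized-llama forward pass from this commit; argmax over the logits is the simplest sampling strategy and is only an illustration.

```rust
// Minimal greedy sampling-loop sketch. The toy "model" is an assumption for
// the example: it just makes the next token (last + 1) % 4 most likely.

fn argmax(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Toy "model": the logit is highest at (last_token + 1) % 4.
    let forward = |tokens: &[u32]| -> Vec<f32> {
        let last = *tokens.last().unwrap() as usize;
        let mut logits = vec![0f32; 4];
        logits[(last + 1) % 4] = 1.0;
        logits
    };
    let mut tokens = vec![0u32];
    for _ in 0..5 {
        // Run the model on the tokens so far, pick the best next token,
        // and append it before the next iteration.
        let logits = forward(&tokens);
        tokens.push(argmax(&logits) as u32);
    }
    assert_eq!(tokens, vec![0, 1, 2, 3, 0, 1]);
    println!("{tokens:?}");
}
```

A real loop would also stop on an end-of-sequence token and typically sample from a softmax distribution rather than taking the argmax.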
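The "quantize/dequantize test" mentioned above can be illustrated with a round-trip check. The scheme below is a simple symmetric 8-bit absmax quantization chosen for the example; it is not ggml's actual block format (e.g. q4_0), just an assumption to show the shape of such a test.

```rust
// Hypothetical symmetric 8-bit absmax quantization: scale the largest
// magnitude to 127, store i8 values plus one f32 scale.

fn quantize(xs: &[f32]) -> (f32, Vec<i8>) {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0. { 1. } else { amax / 127. };
    let qs = xs.iter().map(|&x| (x / scale).round() as i8).collect();
    (scale, qs)
}

fn dequantize(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs: Vec<f32> = (0..16).map(|i| (i as f32 - 8.) / 4.).collect();
    let (scale, qs) = quantize(&xs);
    let ys = dequantize(scale, &qs);
    // The round-trip error should stay within half a quantization step.
    let max_err = xs
        .iter()
        .zip(&ys)
        .map(|(x, y)| (x - y).abs())
        .fold(0f32, f32::max);
    assert!(max_err <= scale / 2. + 1e-6);
    println!("max round-trip error: {max_err}");
}
```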
This commit is contained in:
Laurent Mazare
2023-08-16 12:41:07 +01:00
committed by GitHub
parent fec87e86f5
commit 3071134788
7 changed files with 381 additions and 37 deletions


@@ -210,6 +210,10 @@ impl Error {
Self::Wrapped(Box::new(err))
}
pub fn msg(err: impl std::error::Error + Send + Sync + 'static) -> Self {
Self::Msg(err.to_string())
}
pub fn bt(self) -> Self {
let backtrace = std::backtrace::Backtrace::capture();
match backtrace.status() {
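The diff above shows an error helper that stores a message string and optionally attaches a backtrace. A self-contained sketch of that pattern follows; the enum here is a simplified stand-in for the crate's real `Error` type, with a hypothetical `WithBacktrace` variant to hold the captured trace.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

// Simplified stand-in for the crate's Error type.
#[derive(Debug)]
enum Error {
    Msg(String),
    WithBacktrace {
        inner: Box<Error>,
        backtrace: Box<Backtrace>,
    },
}

impl Error {
    // Build a Msg variant from any std error, as in the diff.
    fn msg(err: impl std::error::Error + Send + Sync + 'static) -> Self {
        Self::Msg(err.to_string())
    }

    // Attach a backtrace only when one was actually captured
    // (i.e. RUST_BACKTRACE=1 or RUST_LIB_BACKTRACE=1 is set).
    fn bt(self) -> Self {
        let backtrace = Backtrace::capture();
        match backtrace.status() {
            BacktraceStatus::Captured => Self::WithBacktrace {
                inner: Box::new(self),
                backtrace: Box::new(backtrace),
            },
            _ => self,
        }
    }
}

fn main() {
    let parse_err = "not a number".parse::<i32>().unwrap_err();
    let err = Error::msg(parse_err).bt();
    println!("{err:?}");
}
```

Capturing the backtrace lazily in `bt` keeps the common error path cheap: when backtraces are disabled, `status()` is not `Captured` and the error is returned unchanged.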